Whatami: Musings on Architecture Identification Strings

Remy Evard, [email protected]


Abstract

We often need to be able to determine what type of ``architecture'' a given machine is, so that we can put the appropriate directories in a path or execute the correct program. No unix utility quite fulfills our requirements, so we're creating the ``whatami'' program to return a single unique string on each architecture. This design doc describes the need for the ``whatami'' program, the architecture strings it returns, and the mechanism for creating new strings over time.


Requirements

You often need to be able to discover what type of machine you're using. This comes up when setting a path, deciding what code to execute, or which configuration file to run. It should be possible to do this by executing one simple program. This ends up turning into quite a set of requirements.


Architecture Differentiation

One would think that it's trivial to figure out what kind of architecture you're using. You should be able to run a simple unix command that would return a unique identifier. This turns out to not be as true as you'd like. For example:

I'm picking on Suns here, but other architectures are equally pathological, and uname, while theoretically standard, is so specific to OS implementations that you almost have to know which OS you're running to know what flags to call it with.
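
To make the raw material concrete, here is a small sketch that prints the three fields of uname most relevant to this problem. The helper name show_uname is made up for illustration; the only things assumed are the -s, -r, and -m flags, which POSIX specifies. None of the three is, by itself, a string you'd want to put in a path.

```shell
#!/bin/sh
# Illustration only: show_uname is a made-up helper, not part of any tool.
# It prints three POSIX uname fields, one per line.
show_uname() {
    echo "sysname: `uname -s`"   # operating system name
    echo "release: `uname -r`"   # operating system release level
    echo "machine: `uname -m`"   # hardware type
}

show_uname
```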


What Is an Architecture

Another problem is deciding just what you mean by ``type'' of machine. It depends on your particular problem of the moment. In most cases, for example, you only care about basic binary compatibility and aren't too worried about low-level hardware differences. But in a few cases, like when you're building ``top'', you care in detail about the hardware differences.

For the vast majority of cases, however, what we really want in an architecture specifying program is something that we can use to differentiate paths. For example, I, as a user, want to be able to tell whether my PATH environment should include ``~/bin/solaris'' or ``~/bin/aix-4''. As a system administrator, I want to be able to set global default PATHs. It turns out that this is the differentiation that we really need the most, and that pathological cases like top are quite a bit more rare. Therefore, we will specify this as a requirement:

The words ``should'' and ``most'' leave that wide-open for interpretation, so we'll just say that the final decision is up to the systems administration team. In the vast majority of cases, it'll be very obvious when a new architecture needs to be defined.

It turns out that returning a unique string to designate an architecture is very useful, so that's what we'll do:


Further Specification

This doesn't solve cases like top, which are more particular than others. In that case, we'll do what we do now - we'll put a wrapper around the top binaries that looks to see what specific kernel architecture we're running, and then the wrapper will invoke the correct binary. This turns out to occasionally be necessary for hardware architectures or OS versions. It may be necessary for other things as well, but those will be quite a bit more rare.
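
A wrapper of that kind might look roughly like the sketch below. This is not our actual top wrapper: the function name select_binary, the ``top-<kernel>'' naming scheme, the directory in the comment, and the use of uname -m to get the kernel architecture are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a wrapper for an architecture-sensitive program like top.
# The "top-<kernel>" naming scheme is an assumption, not our real layout.

select_binary() {
    # $1 = directory holding the real binaries
    # $2 = kernel architecture, e.g. the output of `uname -m`
    candidate="$1/top-$2"
    if [ -x "$candidate" ]; then
        echo "$candidate"
    else
        echo "no top binary for kernel architecture '$2'" >&2
        return 1
    fi
}

# A real wrapper would finish with something like this
# (the directory is hypothetical):
#   kernel=`uname -m`
#   real=`select_binary /usr/local/lib/top "$kernel"` || exit 1
#   exec "$real" "$@"
```

The point of the design is that only the wrapper knows about the finer distinction; everything else keeps using the ordinary architecture string.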

Therefore:


Sticking to Standards

This is a pretty simple idea. Surely someone has solved this before, and we should use one of those solutions.

Well, here are the options that we're aware of:

UNIX builtins.
In an ideal world, the program we need would be supplied with the operating system, and would be the same on every UNIX. The POSIX spec suggests that uname is the way to go, but uname alone will not give you the string you need to set your path. You end up having to wrap code around invocations of uname, which is exactly what we're trying to avoid.

GNU's configuration.
The FSF has some very cool code that it uses to differentiate between zillions of architectures for emacs and gcc versions. Those turn out to be far more specific than we need for setting paths and implementing scripts, and the names are sort of long and ugly (like ``mpi-sgi-irix5.3'').

AFS.
AFS has a similar mechanism built into it, available through the @sys directory macro, which also turns out to be extraordinarily specific, and tends to differentiate on a much broader level than one typically does for building software and writing scripts.

tcsh.
Tcsh has a builtin environment variable called HOSTTYPE, which holds a string of the form we need that reflects the current architecture type. This is very close to the right thing, but it doesn't have the right architectures. In particular, it doesn't differentiate between SunOS and Solaris, AIX 3 and 4, or IRIX 5 and 6. Also, it's not easily extensible, since it requires that tcsh source be modified and rebuilt.

These are the only widely available mechanisms that we're aware of, although it's quite possible that there are more. The conclusion from all of these is that most implementations are at too low a level (specific to the needs of compilers), and that we need to create our own mechanism to work at the application installation and script layer.

The downside of writing it ourselves is that anything that relies on our mechanism is tied to it, and will not work at sites that don't have it. For example, if I build my own path in my .cshrc by running our ``whatami'' program, then my .cshrc won't be portable to other sites. One simple solution is to make our mechanism portable and available to other sites, so that I could just install ``whatami'' at that other site (possibly in my own directory). This isn't the best solution, but there doesn't appear to be another one.
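
One way to soften the portability problem is to have the login script cope when ``whatami'' is missing. The sketch below is written for a Bourne-style shell rather than csh, and the helper name arch_path and the ``~/bin/<arch>'' layout are assumptions for illustration. If whatami isn't installed, or reports ``unknown'', the path is simply left alone.

```shell
#!/bin/sh
# Sketch for a Bourne-style login script. arch_path is a hypothetical
# helper; the $HOME/bin/<arch> layout is an assumed convention.

arch_path() {
    # $1 = current PATH, $2 = architecture string (may be empty or "unknown")
    case "$2" in
        ""|unknown) echo "$1" ;;               # nothing useful to add
        *)          echo "$HOME/bin/$2:$1" ;;  # put the per-arch directory first
    esac
}

# If whatami is not installed, or exits non-zero, fall back to "unknown".
arch=`whatami 2>/dev/null` || arch=unknown
PATH=`arch_path "$PATH" "$arch"`
export PATH
```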

Therefore:


Speed

Finally, if we run this program every time someone logs in, it needs to be fast. One particularly fast implementation would be something like:

        #!/bin/sh
        echo "solaris"

But this isn't particularly portable. Therefore, in the interests of portability, we'll have slightly more complex code and simply specify that it should be written to be as fast as reasonable.
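
For a sense of what ``slightly more complex'' might mean, here is a sketch that maps uname fields to a string with a single case statement and no extra processes. It is not the real whatami source. The function name arch_string and the particular mapping are assumptions; in particular, treating SunOS 4.x as ``sun4'' is a guess, and hardware variants such as ``solaris86'' are not handled at all.

```shell
#!/bin/sh
# Sketch only: map uname fields to an architecture string.
# arch_string and the mapping below are assumptions for illustration.

arch_string() {
    # $1 = `uname -s`, $2 = `uname -r`, $3 = `uname -v`
    case "$1" in
        SunOS)
            case "$2" in
                5.*) echo solaris ;;            # Solaris 2.x reports release 5.x
                4.*) echo sun4 ;;               # assumed mapping for SunOS 4.x
                *)   echo unknown; return 1 ;;
            esac ;;
        AIX)     echo "aix-$3" ;;               # AIX puts its major version in -v
        IRIX*)   echo "irix-${2%%.*}" ;;        # keep only the major release number
        Linux)   echo linux ;;
        FreeBSD) echo freebsd ;;
        HP-UX)   echo hpux ;;
        OSF1)    echo osf ;;
        *)       echo unknown; return 1 ;;
    esac
}

# On a live system it would be called as:
#   arch_string "`uname -s`" "`uname -r`" "`uname -v`"
```

Taking the uname fields as arguments keeps the mapping easy to check on one machine for all the others.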


Other Desirables

Now that we've decided we're going to write our own program to differentiate architectures and that it should be portable and fast, we may as well list other desirable features.


Specification

Given the above requirements, it's pretty simple to outline a specification for the program.


Program Name: whatami

In the grand tradition of who am i and whoami, the program will be called whatami. This has the important feature of not having the same name as any other UNIX command that we know of.


Usage

Called with no arguments, whatami will print the architecture string for the machine it is run on.

It may also be called with one command line argument. Those arguments and their results are:

   -t:  prints the machine type - the same as if no option were given
   -n:  prints the name of the operating system
   -r:  prints the name and release of the OS, separated by a space
   -m:  prints the machine hardware type
   -a:  prints type, hardware, OS, and version
   -l:  lists all known description strings
   -h, --help:     prints this help message
   -v, --version:  prints the version of whatami


Architecture String Formats

Architecture strings will be designated by the MCS Systems Group. They should be chosen to be as indicative of the architecture as possible, and to follow any conventions that may already be in use.

Arch strings will always be lowercase (to avoid seismological disasters like ``NeXT'' :-), and will follow this format:

    <arch>[<hardware>][-<number>[.<number>]]

Where:

Recalling that the resulting strings will probably be used as directory names in a lot of places, they should also be kept to a reasonable length.
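
As a sanity check on the format, a new string can be tested mechanically before it's adopted. The sketch below is only our reading of the format line above: the helper name is_arch_string is hypothetical, and the regular expression assumes that the arch and hardware parts together are lowercase letters and digits beginning with a letter.

```shell
#!/bin/sh
# is_arch_string is a hypothetical helper. The pattern is one reading of
#   <arch>[<hardware>][-<number>[.<number>]]
# and assumes arch+hardware is lowercase alphanumeric, starting with a letter.
is_arch_string() {
    printf '%s\n' "$1" |
        grep -E '^[a-z][a-z0-9]*(-[0-9]+(\.[0-9]+)?)?$' >/dev/null
}

for s in aix-4 solaris86 NeXT irix-5.3.1; do
    if is_arch_string "$s"; then
        echo "$s: ok"
    else
        echo "$s: rejected"
    fi
done
```

This accepts ``aix-4'' and ``solaris86'' and rejects ``NeXT'' (uppercase) and ``irix-5.3.1'' (too many numeric parts). It checks form only; it says nothing about whether a string is short enough or sensible.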


Existing Architecture Strings

We'll use these strings for the architectures now in the division:

    aix-3
    aix-4
    aux
    digital
    freebsd
    hpux
    irix-5
    irix-6
    linux
    next
    nt
    ntalpha
    osf
    solaris
    solaris86
    solarishp
    sun4


Errors

If whatami can't figure out what kind of architecture it's on, it will print the string ``unknown'' and abort with an exit status of 1.

(This means that the string ``unknown'' is not a legitimate architecture name, as if there were any doubt.)


Implementation

It's done.

Notes made before it was actually done...


Random Ideas

Wow, it's annoying that UNIX doesn't do this right for us already.

Also, this is another case where there's an awful lot of text that needs to be written to explain a simple idea, but this concept of reliable architecture strings is central to the software installation mechanism, and our rationale needs to be documented.