Quick Start Guide to SUT

This guide shows you how to get started quickly with SUT. Note that SUT is currently in beta, so you may not want to use it in a production environment.

Conventions

In this guide, the dollar sign ($) is used to indicate a shell prompt. This may vary at your site.

Installing MPICH

SUT has only been tested with the MPICH implementation of MPI, so this section will show you how to install it.
  1. Get MPICH

    Download the most recent version of MPICH from the MPICH download site. (See the MPICH homepage for more information.)

  2. Unpack the distribution

        $ gunzip -c mpich.tar.gz | tar xf -
    

  3. Configure MPICH

        $ cd mpich
        $ ./configure --with-device=ch_p4mpd
    
    The configure step may take some time. You may also want to change the installation directory by adding a --prefix flag to the configure line. For example, to install MPICH in /usr/local/mpich, use
        $ ./configure --with-device=ch_p4mpd --prefix=/usr/local/mpich
    
    The option --with-device=ch_p4mpd means that MPICH will build and use MPD.

  4. Build and Install MPICH

        $ make
    
    If there were no errors in the build, you can install by typing
        $ make install
    
    You may need to become root to install MPICH, depending on the location you specified in the configure step. Both build and installation take some time.

  5. Distribute MPICH

    At this point, you need to copy MPICH to all the nodes in your cluster. Make sure that MPICH gets installed in the same place on all machines or else your MPI programs (including SUT) may not work. If you are not the cluster administrator, consult with him or her for the proper method of copying MPICH to all of your nodes.
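
    One way to keep the install path identical on every node is to generate the copy commands from a single prefix. The sketch below assumes rsync over ssh is available at your site; the function name print_copy_cmds, the prefix, and the node names are placeholders. It only prints the commands, so they can be reviewed (or handed to your administrator) before anything is copied.

```shell
# Sketch: print one rsync command per node so that the same install
# prefix ends up at the same path everywhere. Nothing is copied until
# the printed commands are actually run.
# The prefix and node names below are placeholders, not real hosts.
print_copy_cmds() {
    prefix="$1"
    shift
    for node in "$@"; do
        printf 'rsync -a %s/ %s:%s/\n' "$prefix" "$node" "$prefix"
    done
}

print_copy_cmds /usr/local/mpich host2 host3
# prints:
#   rsync -a /usr/local/mpich/ host2:/usr/local/mpich/
#   rsync -a /usr/local/mpich/ host3:/usr/local/mpich/
```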

Installing SUT

  1. Get SUT

    Go to the download page and get the current distribution.

  2. Unpack SUT

    If you got the tar gzip distribution, use

        $ gunzip -c sut-<version>.tar.gz | tar xf -
    
    If you got the tar bzip2 distribution, use
        $ bunzip2 -c sut-<version>.tar.bz2 | tar xf -
    
    If you got the tar Z (compress) distribution, use
        $ uncompress -c sut-<version>.tar.Z | tar xf -
    
    If you got the zip distribution, use
        $ unzip sut-<version>.zip
    

  3. Configure SUT

        $ cd sut-<version>
        $ ./configure
    
    Again, you can give a --prefix option to specify where SUT will be installed.

  4. Build and Install SUT

        $ make
    
    If there were no errors in the build, you can install by typing
        $ make install
    
    You may need to become root to install SUT, depending on the location you specified in the configure step.

  5. Distribute SUT

    At this point, you need to copy SUT to all the nodes in your cluster. SUT only needs to be in the execution path on each node, but it may be easier for administrative purposes to put it in the same location on each node. If you are not the cluster administrator, consult with him or her for the proper method of copying SUT to all of your nodes.
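
    The four unpack commands from step 2 differ only by archive type, so they can be selected by file extension. The helper below is a hypothetical convenience, not part of SUT, and the version number in the example call is made up. It prints the command rather than running it.

```shell
# Hypothetical helper: print the unpack command for a SUT archive,
# chosen by file extension. It prints the command instead of running it.
unpack_cmd() {
    case "$1" in
        *.tar.gz)  printf 'gunzip -c %s | tar xf -\n' "$1" ;;
        *.tar.bz2) printf 'bunzip2 -c %s | tar xf -\n' "$1" ;;
        *.tar.Z)   printf 'uncompress -c %s | tar xf -\n' "$1" ;;
        *.zip)     printf 'unzip %s\n' "$1" ;;
        *)         printf 'unknown archive type: %s\n' "$1" >&2
                   return 1 ;;
    esac
}

# "1.0" is a made-up version number used only for illustration.
unpack_cmd sut-1.0.tar.gz
```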

Using SUT

Before you use SUT, you need to start MPD on all of the nodes that you want to run on.

Setting up an MPD ring

Because MPD uses a ring topology, a network of MPDs is called an MPD ring. This section gives an example of how to set up such a network.
  1. Setting up .mpdpasswd

    MPD uses a file in the user's home directory named '.mpdpasswd' to authenticate connections. This file contains a random string and must match on all hosts on which you plan to run MPD. You must create this file yourself and distribute it to all your hosts. Make sure the file is only readable and writable by you (or the appropriate user).

    As an example, .mpdpasswd might contain the text

        ,@#af!#13ng,01nkav
    
    which was generated randomly. Then the permissions are set appropriately:
        $ chmod 600 .mpdpasswd
    

  2. Start the first node

    Log on to any machine that you want to be in your ring. For this example, assume that this machine is called myhost. Start MPD in the background by using

        $ mpd &
    
    Now run mpdtrace to see the result
        $ mpdtrace
        mpdtrace: myhost_4075:  lhs=myhost_4075  rhs=myhost_4075  rhs2=myhost_4075
    
    From this output, you can see that myhost is in the ring, running on port 4075. Because lhs, rhs, and rhs2 all point back to myhost_4075, the output shows an MPD ring that is up and running with a single node.

  3. Starting MPD on the remaining nodes

    Now that you have one node running, you can start the entire ring by running MPD on all the other nodes

        $ mpd -h myhost -p 4075 &
    
    Once you have run this command on all the nodes that you want in your ring, you can see what the ring looks like by running mpdtrace on any node in the ring
        $ mpdtrace
        mpdtrace: myhost_4075:  lhs=host2_2988  rhs=host3_1628  rhs2=host2_2988
        mpdtrace: host2_2988:  lhs=host3_1628  rhs=myhost_4075  rhs2=host3_1628
        mpdtrace: host3_1628:  lhs=myhost_4075  rhs=host2_2988  rhs2=myhost_4075
    
    In this example, there is a ring of three hosts: myhost, host2, and host3.
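
    For step 1, the random string does not have to be typed by hand. The sketch below is one way to generate it, assuming /dev/urandom, head -c, and od are available on your system. The function name make_mpdpasswd and the 32-character hexadecimal secret are illustrative choices, not MPD requirements. The function refuses to overwrite an existing file and creates the new one with owner-only permissions.

```shell
# Sketch: write a random secret to the given path with mode 600.
# The 32-character hex string is an illustrative choice.
make_mpdpasswd() {
    target="$1"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    # 16 random bytes rendered as 32 hexadecimal characters.
    secret=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # The umask in the subshell makes sure the file is never
    # readable by other users, even briefly.
    ( umask 077; printf '%s\n' "$secret" > "$target" )
    chmod 600 "$target"
}
```

    Calling make_mpdpasswd "$HOME/.mpdpasswd" on one host creates the file. The same file must then be copied unchanged to every other host in the ring.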

For more information about MPD, check the MPICH User's Guide.

Examples of using the actual tools

Now that everything is installed and you have an MPD ring, you can begin to use SUT. This section gives a few examples to get you started.

Example 1: Listing the directory

To list the current directory on all the nodes in your MPD ring, use

    $ ptls -all
    myfile1
    myfile2
    myfile1

The output above may look confusing, since myfile1 appears twice. By default, ptls simply lists the files in the directories on all the nodes you specified, without saying which node each file is on. This can be useful for some applications, but in many cases, you would like to see what nodes the files are on. To do this, use the -h option:

    $ ptls -all -h
    [host2.domain.tld]
    myfile1
    myfile2
    [host3.domain.tld]
    myfile1
Here, each header names the node to which the files listed below it belong.

The -C option is also useful for getting columnar output from ptls

    $ ptls -all -Ch
    [host2.domain.tld]
    myfile1   myfile2
    [host3.domain.tld]
    myfile1

A Note on Nodes

In the preceding example, the first option given was -all. This means that ptls should run on all of the nodes in the MPD ring. The -all option is a useful shorthand, but often you would like to run a command on only a subset of the hosts in your ring.

For this, the -m and -M options are useful. These two options are basically the same, except that -m gets the list of nodes from a file while -M gets the list of nodes from the next argument.

Basic node specification is fairly easy and obvious: simply list the names of the nodes on which you wish to run separated by white space. For example, the command

    $ ptls -M "myhost host2"
is valid.

For a few nodes, this verbose listing of the node names is acceptable, but when you start using the commands on a large number of machines, it becomes unwieldy. Thus, SUT offers an abbreviation syntax for nodes. Suppose that you wanted to run ptls on hosts host1 through host30 and host52 in your large cluster. You can do this by using

    $ ptls -M "host%d@1-30,52"
The host specification here is broken into two parts: the part before the '@' symbol and the part after it. The part before the '@' is the format. This specifies how the node names look. In this example, the node names are of the form host<number>. The %d in the format is where the numbers belong in the node name. (The format is similar to those specified for the printf C function.) The part after the '@' is the list of numbers that belong in the format given before the '@'. This list can consist of single numbers, such as 52 in this example, or ranges of numbers, such as 1-30 in this example.
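
To make the abbreviation syntax concrete, the following sketch expands a specification of the form format@list into individual node names. It is an illustration of the rules described above, not SUT's own parser. The function name expand_nodes is hypothetical, and it handles only a single %d format with comma-separated numbers and ranges.

```shell
# Illustration only: expand "format@list" into one node name per line.
# This mimics the syntax described above; it is not SUT's own parser.
expand_nodes() {
    fmt="${1%%@*}"     # part before '@', for example host%d
    list="${1#*@}"     # part after '@', for example 1-30,52
    # Split the number list on commas.
    for item in $(printf '%s' "$list" | tr ',' ' '); do
        case "$item" in
            *-*)
                # A range such as 1-30: emit every number in it.
                i="${item%-*}"
                last="${item#*-}"
                while [ "$i" -le "$last" ]; do
                    printf "$fmt\n" "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single number such as 52.
                printf "$fmt\n" "$item"
                ;;
        esac
    done
}

expand_nodes "host%d@1-3,52"
# prints host1, host2, host3, and host52, one per line
```

Running expand_nodes "host%d@1-30,52" in the same way yields 31 names, host1 through host30 followed by host52.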

Note that in Example 1, when running ptls with the -all option, the command only ran on host2 and host3. This is because -all actually means "run on all nodes except the current one." (Example 1 was assumed to be run on myhost.) This makes sense in many cases, as you will see in the ptcp example below. In order to run on the current host, it must be specified explicitly in a node list.
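
To preview which hosts -all would cover, you can take the mpdtrace output shown earlier and drop the current host. The sketch below assumes the mpdtrace line format from the ring example (mpdtrace: name_port: ...). The function name ring_hosts_except is hypothetical, and this is not how SUT itself resolves -all.

```shell
# Illustration: read mpdtrace-style lines on stdin and print every ring
# host except the one named in $1 (the "current" host).
ring_hosts_except() {
    awk -v self="$1" '
        $1 == "mpdtrace:" {
            name = $2
            sub(/_[0-9]+:$/, "", name)   # drop the "_port:" suffix
            if (name != self) print name
        }'
}

# Sample input copied from the three-host ring shown earlier.
ring_hosts_except myhost <<'EOF'
mpdtrace: myhost_4075:  lhs=host2_2988  rhs=host3_1628  rhs2=host2_2988
mpdtrace: host2_2988:  lhs=host3_1628  rhs=myhost_4075  rhs2=host3_1628
mpdtrace: host3_1628:  lhs=myhost_4075  rhs=host2_2988  rhs2=myhost_4075
EOF
# prints host2 and host3, one per line
```

On a live ring the input would come from mpdtrace itself, for example mpdtrace | ring_hosts_except myhost. Take care that the name you pass matches the form mpdtrace prints, which may be a short or a fully qualified name depending on your site.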

NOTE: The -all, -m <machine file>, or -M <machine list> option must be the first option given to any of the parallel SUT commands.

Example 2: Copying files

To copy the file 'BIGFILE' to all nodes (except the current one), use

    $ ptcp -all BIGFILE .
This will copy 'BIGFILE' to the current directory on all of the nodes in the MPD ring.

Recursive copying of directories is also possible, just as with the normal cp

    $ ptcp -all -r mydir/ .

A Note on the Current Working Directory

In the examples above, the 'current directory' was mentioned, but not explained. When you run a command on one node, your current working directory on that node is considered to be the same across all nodes on which you are running the command. If the directory you are in on one node does not exist on another, the current working directory on that other node is considered to be your home directory.
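
The fallback rule can be stated as a few lines of shell. This is a sketch of the behavior described above as seen from a single node, not SUT's implementation, and the function name effective_cwd is hypothetical.

```shell
# Sketch of the rule described above: use the requested directory if it
# exists on this node, otherwise fall back to the home directory.
effective_cwd() {
    if [ -d "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$HOME"
    fi
}

# Pass the directory you are in on the node where you type the command.
effective_cwd "$PWD"
```

Running this check by hand on a node before a ptcp or ptrm shows where that node would actually operate if your current directory is missing there.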

NOTE: Pay very close attention to what your current working directory on all nodes is whenever using SUT commands. You can get unexpected results if you are not careful, including data loss. This is especially true when using potentially destructive commands such as ptrm or even ptcp!