This guide will show you how to get started quickly using SUT. Note that SUT is currently in Beta, so you may not want to use it in a production environment.
In this guide, the dollar sign ($) is used to indicate a shell prompt. This may vary at your site.
Download the most recent version of MPICH from here. (See the MPICH homepage for more information.)
$ gunzip -c mpich.tar.gz | tar xf -
$ cd mpich
$ ./configure --with-device=ch_p4mpd
The configure step may take some time. You may also want to change the installation directory by adding a --prefix flag to the configure line. For example, to install MPICH in /usr/local/mpich, use
$ ./configure --with-device=ch_p4mpd --prefix=/usr/local/mpich
The option --with-device=ch_p4mpd means that MPICH will build and use MPD.
$ make
If there were no errors in the build, you can install by typing
$ make install
You may need to become root to install MPICH, depending on the location you specified in the configure step. Both build and installation take some time.
At this point, you need to copy MPICH to all the nodes in your cluster. Make sure that MPICH gets installed in the same place on all machines or else your MPI programs (including SUT) may not work. If you are not the cluster administrator, consult with him or her for the proper method of copying MPICH to all of your nodes.
Go to the download page and get the current distribution.
If you got the tar gzip distribution, use
$ gunzip -c sut-<version>.tar.gz | tar xf -
If you got the tar bzip2 distribution, use
$ bunzip2 -c sut-<version>.tar.bz2 | tar xf -
If you got the tar Z (compress) distribution, use
$ uncompress -c sut-<version>.tar.Z | tar xf -
If you got the zip distribution, use
$ unzip sut-<version>.zip
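The four cases above can be folded into one small helper. This is only a sketch: the function name unpack_command is made up for illustration, and it prints the matching command from this guide rather than running it, so you can check it first.

```shell
# Sketch: print the unpack command from this guide that matches an archive name.
# unpack_command is a hypothetical helper, not part of SUT.
unpack_command() {
    case "$1" in
        *.tar.gz)  printf 'gunzip -c %s | tar xf -\n' "$1" ;;
        *.tar.bz2) printf 'bunzip2 -c %s | tar xf -\n' "$1" ;;
        *.tar.Z)   printf 'uncompress -c %s | tar xf -\n' "$1" ;;
        *.zip)     printf 'unzip %s\n' "$1" ;;
        *)         echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

unpack_command 'sut-<version>.tar.bz2'
# prints: bunzip2 -c sut-<version>.tar.bz2 | tar xf -
```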
$ cd sut-<version>
$ ./configure
Again, you can give a --prefix option to specify where SUT will be installed.
$ make
If there were no errors in the build, you can install by typing
$ make install
You may need to become root to install SUT, depending on the location you specified in the configure step.
At this point, you need to copy SUT to all the nodes in your cluster. SUT only needs to be in the execution path on each node, but it may be easier for administrative purposes to put it in the same location on each node. If you are not the cluster administrator, consult with him or her for the proper method of copying SUT to all of your nodes.
MPD uses a file in the user's home directory named '.mpdpasswd' to authenticate connections. This file contains a random string and must match on all hosts on which you plan to run MPD. You must create this file yourself and distribute it to all your hosts. Make sure the file is only readable and writable by you (or the appropriate user).
As an example, .mpdpasswd might contain the text
,@#af!#13ng,01nkav
which was generated randomly. Then the permissions are set appropriately:
$ chmod 600 .mpdpasswd
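One possible way to produce such a file is sketched below. The function name make_mpdpasswd, the use of /dev/urandom, and the 32-character hexadecimal string are all assumptions made for illustration; MPD only needs the contents to match on every host. The function refuses to overwrite an existing file.

```shell
# Sketch: write a random string to a password file readable only by its owner.
# make_mpdpasswd is a hypothetical helper; the string format is an assumption.
make_mpdpasswd() (
    target="$1"
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi
    umask 077    # the file is never group- or world-readable, even briefly
    # 16 random bytes rendered as 32 hexadecimal characters.
    secret=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s\n' "$secret" > "$target"
    chmod 600 "$target"
)

# Example (run once, then distribute the same file to every host):
#   make_mpdpasswd "$HOME/.mpdpasswd"
```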
Log on to any machine that you want to be in your ring. For this example, assume that this machine is called myhost. Start MPD in the background by using
$ mpd &
Now run mpdtrace to see the result
$ mpdtrace
mpdtrace: myhost_4075: lhs=myhost_4075 rhs=myhost_4075 rhs2=myhost_4075
From this output, you can see that myhost is in the ring, running on port 4075. The output shows that an MPD ring is now up and running with one node.
Now that you have one node running, you can start the entire ring by running MPD on all the other nodes
$ mpd -h myhost -p 4075 &
Once you have run this command on all the nodes that you want in your ring, you can see what the ring looks like by running mpdtrace on any node in the ring
$ mpdtrace
mpdtrace: myhost_4075: lhs=host2_2988 rhs=host3_1628 rhs2=host2_2988
mpdtrace: host2_2988: lhs=host3_1628 rhs=myhost_4075 rhs2=host3_1628
mpdtrace: host3_1628: lhs=myhost_4075 rhs=host2_2988 rhs2=myhost_4075
In this example, there is a ring of 3 hosts, myhost, host2, and host3.
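The host name and port needed to join a ring can be read off a line of mpdtrace output. The sketch below assumes the line format shown in this guide ("mpdtrace: <host>_<port>: ..."); join_command is a hypothetical helper that only prints the mpd command to run on another node.

```shell
# Sketch: build the "join the ring" command from one line of mpdtrace output.
# Assumes the format "mpdtrace: <host>_<port>: lhs=..." shown in this guide.
join_command() {
    # The second whitespace-separated field is "<host>_<port>:".
    id=$(printf '%s\n' "$1" | awk '{ sub(/:$/, "", $2); print $2 }')
    host=${id%_*}     # everything before the last underscore
    port=${id##*_}    # everything after the last underscore
    printf 'mpd -h %s -p %s &\n' "$host" "$port"
}

join_command 'mpdtrace: myhost_4075: lhs=myhost_4075 rhs=myhost_4075 rhs2=myhost_4075'
# prints: mpd -h myhost -p 4075 &
```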
For more information about MPD, check the MPICH User's Guide.
Now that everything is installed and you have an MPD ring, you can begin to use SUT. This section gives a few examples to get you started.
To list the current directory on all the nodes in your MPD ring, use
$ ptls -all
myfile1
myfile2
myfile1
Looking at the output above, you may be confused. By default, ptls simply lists the files in the directories on all the nodes you specified. This can be useful for some applications, but in many cases, you would like to see which nodes the files are on. To do this, use the -h option
$ ptls -all -h
[host2.domain.tld]
myfile1
myfile2
[host3.domain.tld]
myfile1
Here, each header names the node to which the files listed after it belong.
The -C option is also useful for getting columnar output from ptls
$ ptls -all -Ch
[host2.domain.tld]
myfile1 myfile2
[host3.domain.tld]
myfile1
A Note on Nodes
In the preceding example, the first option given was -all. This means that ptls should run on all of the nodes in the MPD ring. The -all option is a useful shorthand, but often you would like to run a command on only a subset of the hosts in your ring. For this, the -m and -M options are useful. These two options are basically the same, except that -m gets the list of nodes from a file while -M gets the list of nodes from the next argument.
Basic node specification is straightforward: simply list the names of the nodes on which you wish to run, separated by white space. For example, the command
$ ptls -M "myhost host2"
is valid. For a few nodes, this verbose listing of the node names is acceptable, but when you start using the commands on a large number of machines, it becomes unwieldy. Thus, SUT offers an abbreviation syntax for nodes. Suppose that you wanted to run ptls on hosts host1 through host30 and host52 in your large cluster. You can do this by using
$ ptls -M "host%d@1-30,52"
The host specification here is broken into two parts: the part before the '@' symbol and the part after it. The part before the '@' is the format. This specifies how the node names look. In this example, the node names are of the form host<number>. The %d in the format is where the numbers belong in the node name. (The format is similar to those specified for the printf C function.) The part after the '@' is the list of numbers that belong in the format given before the '@'. This list can consist of single numbers, such as 52 in this example, or ranges of numbers, such as 1-30 in this example.
Note that in example 1, when running ptls with the -all option, the command only ran on host2 and host3. This is because -all actually means "run on all nodes except the current one." (Example 1 was assumed to be run on myhost.) This makes sense in many cases, as you will see in the ptcp example below. To run a command on the current host, you must specify that host explicitly in a node list.
NOTE: The -all, -m <machine file>, or -M <machine list> must be the first option given to any of the parallel SUT commands.
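To see which node names an abbreviated specification stands for, the syntax can be expanded by hand. The following sketch is not SUT's own parser; expand_nodes is a hypothetical helper that handles only a single %d format with plain numbers and N-M ranges, as in the example above.

```shell
# Sketch: expand a specification such as "host%d@1-30,52" into node names.
# Illustrative only; SUT's real parser may accept more than this does.
expand_nodes() (
    spec="$1"
    case "$spec" in
        *@*) ;;                                   # has a format and a list
        *)   printf '%s\n' "$spec"; return 0 ;;   # plain node name
    esac
    fmt=${spec%%@*}     # part before '@': printf-style format
    list=${spec#*@}     # part after '@': comma-separated numbers and ranges
    IFS=,               # split the list on commas (local to this subshell)
    for item in $list; do
        case "$item" in
            *-*) lo=${item%-*}; hi=${item#*-} ;;  # range N-M
            *)   lo=$item; hi=$item ;;            # single number
        esac
        n=$lo
        while [ "$n" -le "$hi" ]; do
            printf "$fmt\n" "$n"
            n=$((n + 1))
        done
    done
)

expand_nodes 'host%d@1-3,52'
# prints host1, host2, host3 and host52, one per line
```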
To copy the file 'BIGFILE' to all nodes (except the current one), use
$ ptcp -all BIGFILE .
This will copy 'BIGFILE' to the current directory on all of the nodes in the MPD ring.
Recursive copying of directories is also possible, just as with the normal cp
$ ptcp -all -r mydir/ .
A Note on the Current Working Directory
In the examples above, the 'current directory' was mentioned, but not explained. When you run a command on one node, your current working directory on that node is used as the working directory on every node on which the command runs. If the directory you are in on one node does not exist on another, the current working directory on that other node is taken to be your home directory.
NOTE: Pay very close attention to what your current working directory is on all nodes whenever you use SUT commands. If you are not careful, you can get unexpected results, including data loss. This is especially true when using potentially destructive commands such as ptrm or even ptcp!
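The fallback rule can be written down as a small model. The sketch below only restates the rule described here and runs locally; effective_cwd is a hypothetical helper, and on a real cluster the directory test would have to be made on each node.

```shell
# Sketch: model of the working-directory rule described in this guide.
# Prints the directory a command would use on a node: the caller's working
# directory if it exists there, otherwise the user's home directory.
effective_cwd() {
    wanted="$1"    # working directory on the node where the command was run
    home="$2"      # the user's home directory on the node being checked
    if [ -d "$wanted" ]; then
        printf '%s\n' "$wanted"
    else
        printf '%s\n' "$home"
    fi
}

# Example: check the local node before running a destructive command.
effective_cwd "$PWD" "$HOME"
```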