Scalable Unix Commands on Parallel Computers

Jim Cownie
Meiko, Inc.

Dan Frye
IBM

Bill Gropp
Rusty Lusk
Argonne National Laboratory

Contents

  • A Tale
  • Motivation
  • Summary of Commands
  • Some Simple Examples
  • Arguments
  • Specifying a Set of Nodes
  • Output
  • Parallel Process Find
  • Options for Parallel Process Find
  • More Options for Parallel Process Find
  • Parallel Predicate and Parallel Test
  • Parallel Distribution of tasks
  • Parallel Display
  • Options for Parallel Display
  • Status



    A Tale

    Once upon a time, a programmer was testing a 128-processor machine with local disks on each node. The nodes were connected by a fast switch and an Ethernet. Program startup could be greatly accelerated by putting copies of the executable in /tmp on each node. With small numbers of processes,

    foreach i (node1 node2 node3 node4)
        rcp foo $i:/tmp/foo
    end

    was OK, but for 128 nodes it was very, very bad.

    So, being a parallel library writer, he wrote a shell script to copy the file in parallel using a tree of intermediate processes. This reduced the copying time by a factor of 10, and he lived happily ever after.
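
The tree-structured copy in the tale can be sketched as a schedule of rounds. This is only an illustration: the source does not say what tree shape the script used, so a doubling (binomial) tree is assumed here, the file is assumed to start on the first node in the list, and the function name `tree_copy_schedule` is hypothetical.

```python
def tree_copy_schedule(nodes):
    """Return a list of rounds; each round is a list of (source, dest) copies.

    Assumption: the file starts on nodes[0], and in every round each node
    that already holds the file copies it to one node that does not.
    """
    have = [nodes[0]]        # nodes that already hold the file
    need = list(nodes[1:])   # nodes still waiting for it
    rounds = []
    while need:
        # Pair every current holder with at most one waiting node.
        pairs = list(zip(have, need))
        rounds.append(pairs)
        have.extend(dest for _, dest in pairs)
        need = need[len(pairs):]
    return rounds


if __name__ == "__main__":
    schedule = tree_copy_schedule([f"node{i}" for i in range(1, 129)])
    for number, pairs in enumerate(schedule, start=1):
        print(f"round {number}: {len(pairs)} copies in parallel")
```

Under these assumptions the number of holders doubles each round, so 128 nodes need 7 rounds of concurrent copies instead of 127 sequential ones. The measured speedup depends on the network and disks, which this sketch does not model.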




    Motivation





    Summary of Commands

     
    Program Name   Action
    pcp            Parallel copy (for systems with local disks on each node)
    pcat           Parallel concatenation of files
    pls            Parallel directory list (ls)
    prm            Parallel remove
    pmv            Parallel move
    pfind          Parallel find
    pps            Parallel ps
    pfps           Parallel process find
    pkill          Parallel process kill
    pexec          Run a command on all selected processors
    ptest          Run a test on all selected processors, ANDing the results and returning a single status value
    ppred          Run a command when a condition is satisfied
    pdistrib       Run a command on a collection of files
    pdisp          Parallel display
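
The "ANDing" of results described for ptest can be sketched as a reduction over per-node exit statuses. This is a minimal sketch, not the tool's implementation: it assumes the usual Unix convention that 0 means success, and the failure value 1 and the name `and_statuses` are hypothetical, since the source does not say what value is returned.

```python
def and_statuses(statuses):
    """Combine per-node exit statuses into a single status value.

    Assumption: 0 means the test succeeded on a node. The combined
    status is 0 only if every node reported 0, otherwise 1.
    """
    return 0 if all(status == 0 for status in statuses) else 1


if __name__ == "__main__":
    print(and_statuses([0, 0, 0]))  # every node passed
    print(and_statuses([0, 2, 0]))  # one node failed
```

Note that with this definition an empty set of nodes yields 0, which may or may not match what a real implementation would do.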
    




    Some Simple Examples

    To copy mycode to /tmp/myname/mycode on processors 1 and 32 through 63, use
     
    pcp 1,32-63 mycode /tmp/myname/mycode 
    
    The command
     
    pcat 1-10 /tmp/testfile > myfile 
    
    concatenates the copies of /tmp/testfile on nodes 1 through 10 into the file myfile. The results are concatenated in the listed node-number order.
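
The ordering guarantee can be illustrated with a small sketch: pieces are fetched concurrently but joined in the order the nodes were listed. This is not how pcat is implemented (the source does not say); `gather_in_node_order` and the stand-in `fake_fetch` are hypothetical names, and the sleep only simulates nodes answering out of order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def gather_in_node_order(nodes, fetch):
    """Call fetch(node) for all nodes concurrently and join the results
    in the order the nodes were listed."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # Executor.map returns results in input order, regardless of
        # which call finishes first.
        pieces = list(pool.map(fetch, nodes))
    return b"".join(pieces)


def fake_fetch(node):
    """Hypothetical stand-in for reading a file from a node's local disk.
    Lower-numbered nodes answer more slowly, so completion order is the
    reverse of the listed order."""
    time.sleep(0.01 * (10 - node))
    return f"data from node {node}\n".encode()


if __name__ == "__main__":
    print(gather_in_node_order(list(range(1, 11)), fake_fetch).decode(), end="")
```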

    The command prm executes rm on the specified nodes.

    The command pmv executes mv on the specified nodes. Files can be moved only within a single processor: on each selected processor, a file is moved from one place to another on that processor's local disk.

    To find all of the files on the local disks that were last accessed more than two days ago, use

     
    pfind 1-128 /tmp -atime ... -print 
    




    Arguments


    (See Specifying a Set of Nodes for details.)




    Specifying a Set of Nodes


     
    nodelist -> '-all'
    nodelist -> domain ':' nodelist
    nodelist -> range [ ',' nodelist ]
    nodelist -> nodenum [ ',' nodelist ]
    nodelist -> nodename [ ',' nodelist ]
    domain   -> <predefinedname>
    domain   -> '@' <valid filename>
    range    -> nodenum '-' nodenum
    nodenum  -> <integer>
    nodenum  -> 'last'
    nodename -> <any valid nodename>
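
To make the grammar concrete, here is a minimal sketch of a parser for it. Several things are assumptions not stated in the source: nodes are numbered from 1 to `num_nodes`, 'last' means the highest node number, and a domain is returned as an uninterpreted string (how predefined names or '@' files are resolved is not described). The function name `expand_nodelist` and the domain name in the usage lines are hypothetical.

```python
def expand_nodelist(spec, num_nodes):
    """Expand a nodelist string into (domain, items).

    Assumptions: nodes are numbered 1..num_nodes, 'last' is num_nodes,
    and the optional domain prefix is returned without being resolved.
    Items are ints for node numbers and strings for node names.
    """
    domain = None
    if ":" in spec:
        # nodelist -> domain ':' nodelist
        domain, spec = spec.split(":", 1)
    if spec == "-all":
        return domain, list(range(1, num_nodes + 1))

    def is_num(token):
        return token == "last" or token.isdigit()

    def to_num(token):
        return num_nodes if token == "last" else int(token)

    items = []
    for part in spec.split(","):
        if not part:
            raise ValueError("empty element in nodelist")
        low, dash, high = part.partition("-")
        if dash and is_num(low) and is_num(high):
            # range -> nodenum '-' nodenum (inclusive at both ends)
            items.extend(range(to_num(low), to_num(high) + 1))
        elif is_num(part):
            items.append(to_num(part))
        else:
            # Anything else is treated as a node name.
            items.append(part)
    return domain, items


if __name__ == "__main__":
    print(expand_nodelist("1,32-63", 128))
    print(expand_nodelist("126-last", 128))
    print(expand_nodelist("batch:1-2,nodeA", 8))  # "batch" is a made-up domain
```

The sketch accepts hyphenated node names because a part is only treated as a range when both sides of the hyphen are node numbers.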
      




    Output





    Parallel Process Find





    Options for Parallel Process Find

     
    Option            Description
    -n name           Match with the name of the process. The name may contain wildcards.
    -tn               Match the tail name of the executable.
    -o owner          Match with the owner (by name) of the process. By default, only the user name of the caller is matched. Use -o '*' to match any user name.
    -pty name         Match with the controlling terminal of the process.
    -rtime hh:mm      Match with jobs that have run for hh:mm or longer.
    -stime dd:hh:mm   Match with jobs that started at least dd days, hh hours, and mm minutes ago.
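
The two time formats above can be illustrated with a short sketch that parses them and applies the "at least this long ago" test. The helper names `parse_age` and `started_at_least` are hypothetical, and the sketch assumes that each field is a plain decimal integer; the source does not specify how the tools parse these arguments.

```python
from datetime import datetime, timedelta


def parse_age(spec):
    """Parse 'hh:mm' (as for -rtime) or 'dd:hh:mm' (as for -stime)
    into a timedelta. Assumes every field is a decimal integer."""
    fields = [int(field) for field in spec.split(":")]
    if len(fields) == 2:
        days = 0
        hours, minutes = fields
    elif len(fields) == 3:
        days, hours, minutes = fields
    else:
        raise ValueError(f"expected hh:mm or dd:hh:mm, got {spec!r}")
    return timedelta(days=days, hours=hours, minutes=minutes)


def started_at_least(start_time, spec, now):
    """True if a job that began at start_time started at least
    the given dd:hh:mm before now."""
    return now - start_time >= parse_age(spec)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    print(started_at_least(datetime(2024, 1, 8, 11, 0), "02:00:15", now))
```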
                                                                                                                                                                                              
    




    More Options for Parallel Process Find

     
    -r state         Match with jobs in the specified run state.
    -or              Combine matching criteria by OR'ing them.
    -print           Causes matching jobs to be printed in the selected ps format.
    -id              Causes matching jobs to be printed as nodename:pid.
    -sort            Causes the output to be sorted by nodename.
    -exec pgm args   Executes pgm for each matching process. Similar to find, the string {} stands for the pid of the matched process, and \; indicates the end of the list of arguments to give to the program.
    -kill signal     Causes all matched processes to be killed with the specified signal. The signal value may be either the number or the name (for example, -kill 9 and -kill SIGKILL are the same).
    -nice n          Sets the nice value of matched jobs.
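
Accepting a signal either by number or by name, as -kill does, can be sketched with Python's standard signal module. This is an illustration only: the function name `resolve_signal` is hypothetical, the mapping comes from the platform Python runs on, and the source does not describe how pfps itself resolves signal names.

```python
import signal


def resolve_signal(value):
    """Map a signal given as a decimal number ('15') or as a name
    ('SIGTERM') to the platform's signal.Signals member."""
    if value.isdigit():
        return signal.Signals(int(value))   # ValueError if unknown number
    return signal.Signals[value]            # KeyError if unknown name


if __name__ == "__main__":
    by_number = resolve_signal(str(int(signal.SIGTERM)))
    by_name = resolve_signal("SIGTERM")
    print(by_number is by_name, by_name.name, int(by_name))
```

Resolving both spellings to the same object before sending anything means the rest of a tool only has to deal with one representation.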
                                                                                                                                                                                                                                                          
    




    Parallel Predicate and Parallel Test





    Parallel Distribution of tasks





    Parallel Display





    Options for Parallel Display

     
    Option            Action
    -yes colorname    Color of nodes appearing in input
    -no colorname     Color of nodes not appearing in input
    -down colorname   Color of down nodes
    -text string      Text for nodes appearing in input. This string may contain formatting information such as 3 for the third token in the line.
    -small            Do not display text unless button pressed (produces small display)
    -store            Save text with node; pushing the left mouse button will display the text.
    -layout RxC       Layout of R rows and C columns
    -domain name      Name of the machine's domain
    -pserver name     Use a pre-existing display
    -pstart name      Make this a pdisp display server
                                                                                                                                                                               
    




    Status