This directory contains a very simple accounting statistics tool
"pbsacct" for the Portable Batch System (tested with PBS versions
2.2 and 2.3).

The latest version of this software may be downloaded from
ftp://ftp.fysik.dtu.dk/pub/PBS/

Thanks for feedback go to: Jan-Frode Myklebust

-------------------------------------------------------

Usage: pbsacct <files>

where <files> are daily accounting records (such as 20000705) located
in $PBSHOME/server_priv/accounting/ (PBSHOME is usually /var/spool/pbs).

A sample output is:

# pbsacct 200006??

Portable Batch System accounting statistics
-------------------------------------------

A total of 30 accounting files will be processed.
First record is dated 06/01/2000, last record is dated 06/30/2000.

                                             Average  Average
Username  #jobs  CPU-days  Wall-days  Efcy.   #nodes   q-days
--------  -----  --------  ---------  -----  -------  -------
TOTAL       237   1415.31    1578.68  0.897     6.80     3.50
user0001     12    278.67     301.06  0.926    12.00     5.98
user0002     29    226.96     244.98  0.926     4.71     4.04
user0003     52    221.65     271.37  0.817    10.83     3.21
user0004     26    201.35     204.27  0.986     5.66     4.94
user0005     13    130.26     151.13  0.862     8.69     3.44
user0006     38    112.23     114.87  0.977     3.22     2.33
user0007     18    109.85     117.90  0.932     7.53     4.75
user0008     14     75.43      85.88  0.878     8.86     2.58
user0009      8     38.88      41.63  0.934     6.72     2.62
user0010      4     12.05      12.36  0.975     3.97     3.25
user0011      4      5.88      31.12  0.189     6.40     7.30
user0012      3      1.47       1.48  0.991     2.08     2.61
user0013      5      0.37       0.37  0.986     1.00     1.36
user0014     10      0.26       0.27  0.973     1.00     0.87
user0015      1      0.00       0.00  0.797     1.00     2.84

--

The usernames have been made anonymous.

We prefer to count CPU- and wall-time in days rather than hours or
seconds.

It should be noted that PBS records only the CPU-time spent on the
Master-node of parallel jobs. The spawning of parallel processes by,
e.g., MPI is outside the control of PBS, and no accounting of the
Slave nodes is currently performed. The total CPU-time is therefore
estimated as the CPU-time on the Master times the number of nodes.
The only reliable measure is actually the Wall-time times the number
of nodes.

The column "Efcy." is the ratio of CPU-time to wall-time. Some jobs
spend a long time in waiting states, likely because of I/O, or because
of parallel processes waiting for network communication. A low value
may indicate that a user's jobs should be analyzed for possible
improvements.

The column "Average #nodes" is a weighted average of the number of
nodes used in parallel by the user's jobs.

The column "Average q-days" is the average number of days that the
jobs spent in the queue while being eligible to run. This shows how
difficult it is for jobs to get CPU-time on this system.

-------------------------------------------------------

The script "pbsreportmonth" is a convenient way to automatically
generate a monthly report for the previous month. It may be run on
the first day of every month using crontab with a line like this:

0 2 1 * * (cd ; /usr/local/bin/pbsreportmonth)

The accounting report may be mailed to the administrators by
uncommenting some lines at the end of the script.

-------------------------------------------------------

The helper script "pbsjobs" processes the raw accounting files,
looking for records with an "E" in the second field, meaning a job
that Ended. The script extracts some fields of interest, and prints
one line of relevant information for each job. This list of
information is then summarized by the pbsacct script.

The PBS server records accounting information in the module
src/server/accounting.c, where the meaning of the various accounting
fields can be found. This is also documented in the PBS External
Reference Specification, see the chapter on Batch Server Functions.

-------------------------------------------------------

Author:  Ole Holm Nielsen
         Department of Physics, Technical University of Denmark,
         Building 307, DK-2800 Lyngby, Denmark.
E-mail:  Ole.H.Nielsen@fysik.dtu.dk
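-------------------------------------------------------

Illustration (not part of the distributed scripts): the selection of
"E" records and the CPU-time estimate described above can be sketched
in a few lines of Python. This is only a sketch under stated
assumptions. It assumes that accounting records are semicolon-separated
with key=value attributes in the last field, and that the attribute
names user, Resource_List.nodect, resources_used.cput and
resources_used.walltime are present; these names should be checked
against src/server/accounting.c for the PBS version in use. The sample
records and the function names are made up for the example.

```python
# Sketch only: the record layout and attribute names are assumptions,
# not taken from the pbsjobs/pbsacct scripts themselves.

SECONDS_PER_DAY = 86400.0

# Made-up sample records: one "S" (start) record and two "E" (end) records.
SAMPLE = [
    "06/01/2000 08:00:00;S;101.master;user=user0001 queue=workq",
    "06/01/2000 20:00:00;E;101.master;user=user0001 queue=workq "
    "Resource_List.nodect=4 resources_used.cput=09:00:00 "
    "resources_used.walltime=12:00:00",
    "06/02/2000 06:00:00;E;102.master;user=user0002 queue=workq "
    "resources_used.cput=01:00:00 resources_used.walltime=02:00:00",
]


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_record(line):
    """Return the key=value attributes of an "E" record, or None."""
    fields = line.rstrip("\n").split(";", 3)
    if len(fields) < 4 or fields[1] != "E":
        return None  # not a job-ended record
    attrs = {}
    for item in fields[3].split():
        key, sep, value = item.partition("=")
        if sep:
            attrs[key] = value
    return attrs


def summarize(lines):
    """Per user: job count and node-weighted CPU and wall seconds."""
    totals = {}
    for line in lines:
        attrs = parse_end_record(line)
        if attrs is None:
            continue
        # Assumed attribute for the node count; default to a serial job.
        nodes = int(attrs.get("Resource_List.nodect", "1"))
        cpu = hms_to_seconds(attrs.get("resources_used.cput", "00:00:00"))
        wall = hms_to_seconds(attrs.get("resources_used.walltime", "00:00:00"))
        entry = totals.setdefault(
            attrs.get("user", "unknown"), {"jobs": 0, "cpu": 0, "wall": 0}
        )
        entry["jobs"] += 1
        entry["cpu"] += cpu * nodes    # Master CPU-time times number of nodes
        entry["wall"] += wall * nodes  # Wall-time times number of nodes
    return totals


def efficiency(entry):
    """Ratio of estimated CPU-time to wall-time (0.0 if no wall-time)."""
    return entry["cpu"] / entry["wall"] if entry["wall"] else 0.0


for user, entry in sorted(summarize(SAMPLE).items()):
    print("%-8s %5d %8.2f %9.2f %5.3f" % (
        user, entry["jobs"], entry["cpu"] / SECONDS_PER_DAY,
        entry["wall"] / SECONDS_PER_DAY, efficiency(entry)))
```

Since both CPU-time and wall-time are multiplied by the same number of
nodes, the efficiency of a single job is unaffected by the estimate;
only the per-user totals become weighted by job size.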