Compilers: /home/balay/bin/mpicc and mpif77

Stream
-------
mpicc -O5 -o stream -qbgl -qtune=440 -qarch=440 stream_d.c second_wall.c
mpirun -partition Pkaushik -np 1 -cwd `pwd` -exe `pwd`/stream1

0: -------------------------------------------------------------
0: This system uses 8 bytes per DOUBLE PRECISION word.
0: -------------------------------------------------------------
0: Array size = 2000000, Offset = 0
0: Total memory required = 45.8 MB.
0: Each test is run 10 times, but only
0: the *best* time for each is used.
0: -------------------------------------------------------------
0: Your clock granularity/precision appears to be 5 microseconds.
0: Each test below will take on the order of 14321 microseconds.
0:    (= 2864 clock ticks)
0: Increase the size of the arrays if this shows that
0: you are not getting at least 20 clock ticks per test.
0: -------------------------------------------------------------
0: WARNING -- The above is only a rough guideline.
0: For best results, please be sure you know the
0: precision of your system timer.
0: -------------------------------------------------------------
0: Function      Rate (MB/s)   RMS time     Min time     Max time
0: Copy:          1335.5593      0.0241       0.0240       0.0252
0: Scale:         1316.3307      0.0243       0.0243       0.0243
0: Add:           1405.1111      0.0342       0.0342       0.0342
0: Triad:         1415.2195      0.0339       0.0339       0.0339

Stream2 (http://www.cs.virginia.edu/stream/stream2)
---------------------------------------------------
STREAM2 is an attempt to extend the functionality of the STREAM benchmark in two important ways:
# STREAM2 measures sustained bandwidth at all levels of the cache hierarchy, and
# STREAM2 more clearly exposes the performance differences between reads and writes

STREAM2 is based on the same ideas as STREAM, but uses a different set of vector kernels:
# FILL: similar to bzero(), but fills with a constant instead of zero
# COPY: similar to bcopy(), and the same as STREAM Copy
# DAXPY: similar to STREAM Triad, but overwrites one of the input vectors instead of writing results to a third vector
# SUM: sum reduction on a single vector -- reads only, no writes

mpif77 -O5 -o stream2 -qbgl -qtune=440 -qarch=440 stream2.f second_wall.o
mpirun -partition Pkaushik -np 1 -cwd `pwd` -exe `pwd`/stream2

0: Smallest time delta is 0.199999976757680997E-05
0:      Size  Iter     FILL     COPY    DAXPY      SUM
0:        30    10   3168.3   5884.5   5326.8   1640.7     26.4
0:        43    10   4716.9   7117.3   6199.1   1702.1     27.4
0:        61    10   5093.8   7529.2   6418.7   1746.7     20.9
0:        88    10   5932.4   8856.8   7822.6   1761.7     16.9
0:       126    10   5923.7   8531.0   7369.9   1790.9     11.8
0:       180    10   6896.5   9016.5   7780.8   1812.0      9.6
0:       258    10   7216.1   9352.9   8069.0   1826.9      7.0
0:       368    10   7680.1   9643.0   8232.1   1842.2      5.2
0:       527    10   7637.1   9520.8   8032.0   1848.2      3.6
0:       754    10   7795.7   9721.6   8290.0   1850.6      2.6
0:      1079    10   7910.5   9753.1   8227.9   1859.0      1.8
0:      1545    10   7965.1   9875.8   8334.5   1861.3      1.3
0:      2210    10   8015.4   3433.1   3427.6    851.4       .9
0:      3163    10   8044.3   3312.4   3442.9    723.8       .6
0:      4525    10   3334.2   3169.4   3175.5    658.2       .2
0:      6475    10   3482.7   3178.5   3182.2    658.4       .1
0:      9266    10   3401.1   3161.9   3190.1    658.4       .1
0:     13258    10   3488.9   3174.0   3193.2    658.5       .1
0:     18971    10   3559.4   3189.7   3191.4    658.2       .0
0:     27146    10   3554.5   3192.0   3194.1    658.3       .0
0:     38844    10   3574.3   3193.9   3195.4    658.4       .0
0:     55582    10   3571.9   3193.7   3195.9    658.3       .0
0:     79532    10   3565.7   3195.0   3196.8    658.4       .0
0:    113802    10   3574.4   3195.4   3197.1    658.5       .0
0:    162840    10   3578.1   3193.9   3197.3    658.4       .0
0:    233008    10   3577.0   3195.7   3197.1    658.4       .0
0:    333411    10   3489.4   1345.4   1970.5    657.0       .0
0:    477079    10   3303.0   1282.0   1955.2    655.9       .0
0:    682653    10   1713.6   1235.2   1951.9    654.1       .0
0:    976810    10   1615.1   1245.2   1965.5    657.1       .0
0:   1397720    10   1709.2   1228.7   1959.8    654.9       .0
0:   2000000    10   1612.6   1243.9   1957.9    655.1       .0
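
For reference, the four STREAM2 kernels described above can be sketched in a few lines of Python, together with the arithmetic that appears to lie behind the STREAM figures in the first log. This is an illustrative sketch only, not the benchmark source: the function names (`fill`, `copy`, `daxpy`, `vsum`, `stream_rate_mb_s`, `stream_memory_mb`) are hypothetical, and the conventions used (two words moved per element for Copy and Scale, three for Add and Triad, 1 MB = 1e6 bytes for rates, 1 MB = 2**20 bytes for the memory total) are assumptions chosen because they reproduce the numbers printed in the log.

```python
# Reference semantics of the four STREAM2 kernels on plain Python lists.
# This only shows what each kernel reads and writes; it is far too slow
# to measure memory bandwidth and is not the benchmark's Fortran source.

def fill(a, s):
    """FILL: store the constant s in every element (writes only)."""
    for i in range(len(a)):
        a[i] = s

def copy(a, b):
    """COPY: a[i] = b[i] (one read and one write per element)."""
    for i in range(len(a)):
        a[i] = b[i]

def daxpy(a, b, s):
    """DAXPY: a[i] = a[i] + s * b[i]; the input vector a is overwritten."""
    for i in range(len(a)):
        a[i] = a[i] + s * b[i]

def vsum(a):
    """SUM: reduce one vector to a scalar (reads only, no writes)."""
    total = 0.0
    for x in a:
        total += x
    return total

# STREAM arithmetic, using values taken from the first log.
WORD_BYTES = 8   # "8 bytes per DOUBLE PRECISION word"
N = 2000000      # "Array size = 2000000"

# Assumed words moved per element for each STREAM kernel.
STREAM_WORDS = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def stream_rate_mb_s(kernel, min_time_s):
    """Rate in MB/s from the best time, assuming 1 MB = 1e6 bytes."""
    return STREAM_WORDS[kernel] * WORD_BYTES * N / min_time_s / 1.0e6

def stream_memory_mb():
    """Memory for three arrays, assuming 1 MB = 2**20 bytes."""
    return 3 * WORD_BYTES * N / 2**20

if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
    daxpy(a, b, 0.5)
    print(a, vsum(a))
    print(round(stream_memory_mb(), 1))
    for name, t in [("Copy", 0.0240), ("Scale", 0.0243),
                    ("Add", 0.0342), ("Triad", 0.0339)]:
        print(name, round(stream_rate_mb_s(name, t), 1))
```

Under these assumptions the computed rates land within about half a percent of the reported ones; the small gap is presumably because the log prints the minimum times rounded to four decimal places.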