Benchmarking
The purpose of benchmarking is to understand the behavior of a system. To this end, benchmarks must be
Documenting Benchmarks
As part of your benchmark, record the environment variables, including PATH. The printenv command will output this data for you.
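The environment can also be captured from inside the benchmark itself, so that it is stored with the results it describes. The following is a minimal sketch, assuming a POSIX system where the global environ array is available. The function name record_environment is hypothetical.

```c
#include <stdio.h>

/* POSIX: the process environment as an array of "NAME=value" strings,
   terminated by a null pointer. PATH is one of these entries. */
extern char **environ;

/* Hypothetical helper: writes every environment variable to out, one per
   line, much as printenv does. Returns the number of variables written. */
int record_environment(FILE *out) {
    int n = 0;
    for (char **e = environ; e != NULL && *e != NULL; e++) {
        fprintf(out, "%s\n", *e);
        n++;
    }
    return n;
}
```

Calling record_environment on the same log file that receives the timings keeps the configuration and the measurements together.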
The compilers also have many options and configuration settings. The compiler options
Compiler Options
There are many compiler options that may affect performance. One goal of this phase of the benchmarking is to understand which of these options are valuable, and on what kinds of code.
The compilers are named
-qbgl through -qnoautoconfig). If possible, run your benchmarks with each of the following combinations (all with -qbgl -qnoautoconfig):
-O5 and -O3, each with the second floating-point unit enabled (-qarch=440d) and disabled (-qarch=440).
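These choices form a small 2 × 2 matrix of builds. The sketch below enumerates the four option sets so that a build script or results log can record exactly which combination produced each measurement. The function names are hypothetical, and because the compiler names are not given here, the compiler command itself is left out.

```c
#include <stdio.h>
#include <string.h>

/* The two optimization levels and the two -qarch settings from the text:
   -qarch=440d enables the second FPU, -qarch=440 disables it. */
static const char *const opt_levels[] = { "-O5", "-O3" };
static const char *const arch_flags[] = { "-qarch=440d", "-qarch=440" };

/* Hypothetical helper: writes option set k (0..3) into buf.
   Returns snprintf's result, or -1 if k is out of range. */
int bgl_option_set(int k, char *buf, size_t len) {
    if (k < 0 || k > 3)
        return -1;
    return snprintf(buf, len, "-qbgl -qnoautoconfig %s %s",
                    opt_levels[k / 2], arch_flags[k % 2]);
}

/* Nonzero if an option string enables the second floating-point unit. */
int uses_second_fpu(const char *opts) {
    return strstr(opts, "-qarch=440d") != NULL;
}
```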
Code Tuning
Data alignment and pointer (non)aliasing are important items to consider in tuning code for BG/L (and for many modern processors).
Data Alignment
Data that is aligned on boundaries larger than 4 or 8 bytes, and that does not cross a cache line, can be handled more efficiently in the Power architecture. There are a number of pseudo-functions that can be used to inform the compiler that data has particular alignment properties (the compiler already knows the alignment of statically allocated data).
In C/C++, the pseudo-function is
void __alignx( int n, const void *addr )
where n is the alignment, in bytes, that applies to the pointer addr. For example, if x is aligned on a 16-byte boundary, you can use
__alignx(16,x)
C users can use
#ifndef HAVE___ALIGNX
#define __alignx(a,b)
#endif
to keep code portable to other compilers.
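A minimal sketch of the guard in use, assuming HAVE___ALIGNX is defined by your build system (for example, by a configure test) only when the compiler provides __alignx. The helpers is_aligned and align_up are hypothetical and exist only to produce and check a 16-byte-aligned pointer. The hint is a promise to the compiler, not a check, so passing a misaligned pointer to scale_aligned would be a bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Portability guard from the text: without __alignx the hint expands
   to nothing. HAVE___ALIGNX is assumed to come from the build system. */
#ifndef HAVE___ALIGNX
#define __alignx(a,b)
#endif

/* Nonzero if p lies on an align-byte boundary. */
int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p % align) == 0;
}

/* Rounds p up to the next multiple of align (align must be a power of 2). */
void *align_up(void *p, size_t align) {
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + align - 1) & ~(uintptr_t)(align - 1));
}

/* Scales x in place. The caller guarantees that x is 16-byte aligned;
   __alignx passes that promise to xlc and is a no-op elsewhere. */
void scale_aligned(double *x, int n, double sc) {
    __alignx(16, x);
    for (int i = 0; i < n; i++)
        x[i] *= sc;
}
```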
In Fortran, the pseudo-function is
Pointer Aliasing
This section applies mostly to C and C++ programmers. For the compiler to schedule load and store instructions efficiently and to unroll loops for performance, it often needs to know whether two pointers can point at the same data. If they can, the pointers are said to alias one another, and the compiler may be unable to perform some optimizations.
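A small illustration of the problem (the function name add_twice is hypothetical). Because x and y may point at the same int, the compiler cannot load *y once and reuse the value: the first store to *x may have changed it.

```c
/* Without restrict (or #pragma disjoint), x and y may alias, so the
   compiler must reload *y after each store to *x. */
void add_twice(int *x, const int *y) {
    *x += *y;   /* if x == y, this store also changes *y ... */
    *x += *y;   /* ... so *y has to be loaded again here */
}
```

With two distinct variables that both hold 1, the result is 3. Called as add_twice(&c, &c) with c equal to 1, the result is 4; code that reused the first load of *y would produce 3, which is why the compiler must be conservative.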
Well-written, modern C code will use the
void scale( double *restrict a, const double *restrict b,
const double sc ) {
int i;
/* restrict: the compiler may assume a[] and b[] never overlap */
for (i=0; i<10; i++) {
a[i] = b[i] * sc;
}
}
With the xlc family of compilers, it may be necessary to use a #pragma. The directive
#pragma disjoint (*a,*b)
tells the xlc compiler that pointers a and b point to different memory. This is more precise than the C restrict qualifier, but is not portable to other systems. The BG/L RedBook recommends the #pragma form. In a few experiments with the xlc compiler on icrunch, the #pragma form appeared to generate better code (independent loads were moved ahead of stores).
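One way to keep a single source portable is to use restrict everywhere and add the pragma only for xlc. This sketch assumes a C99 compilation mode (for restrict) and assumes that xlc defines the __IBMC__ macro while other compilers do not; scale_n is a hypothetical variant of scale above that takes the length as a parameter.

```c
#include <stddef.h>

/* restrict is portable C99; #pragma disjoint is xlc-specific, so it is
   only seen when __IBMC__ is defined (assumed to identify xlc). */
void scale_n(double *restrict a, const double *restrict b,
             double sc, size_t n) {
#ifdef __IBMC__
#pragma disjoint(*a, *b)
#endif
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * sc;
}
```

Other compilers never see the pragma and simply compile the restrict version.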
Parallelism
There are two principal models: the communication co-processor model and the virtual node model. In the communication co-processor model, the second processor on each node is used exclusively to support communication. In the virtual node model, each processor runs a separate MPI process, and each process receives half of the node's memory.
Process to Processor Mapping
The way in which processes are assigned to physical processors can be controlled in several ways. The environment variable BGLMPI_MAPPING may be used to provide simple control of the mapping of processes, relative to their rank in MPI_COMM_WORLD, to processors and nodes. For example, when the system is running in virtual node mode, the processes can be assigned in consecutive pairs on the same node (that is, ranks 0 and 1 on the first node, ranks 2 and 3 on the second) with the mpirun option
-env BGLMPI_MAPPING=TXYZ
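To reason about where a given rank runs, the mapping can be modelled as a mixed-radix decomposition of the rank. This sketch assumes that the first letter of the mapping string varies fastest, which matches the example above (ranks 0 and 1 share a node under TXYZ). The partition dimensions X, Y, Z and the placement type are hypothetical inputs; T is the number of processes per node (2 in virtual node mode, 1 in communication co-processor mode).

```c
#include <assert.h>

/* Hypothetical result type: processor slot t on the node at (x, y, z). */
typedef struct { int t, x, y, z; } placement;

/* Decomposes rank as t + T*(x + X*(y + Y*z)), i.e. T varies fastest,
   then X, then Y, then Z (an assumed reading of "TXYZ"). */
placement txyz_placement(int rank, int T, int X, int Y, int Z) {
    assert(rank >= 0 && rank < T * X * Y * Z);
    placement p;
    p.t = rank % T;  rank /= T;
    p.x = rank % X;  rank /= X;
    p.y = rank % Y;  rank /= Y;
    p.z = rank;
    return p;
}
```

For example, on a hypothetical 4 x 4 x 2 partition in virtual node mode, txyz_placement(2, 2, 4, 4, 2) gives t = 0, x = 1, y = 0, z = 0: rank 2 is the first process on the second node along X.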
| MCS Division | Argonne National Laboratory | University of Chicago |