
Halo Exchange Tests

In these tests, we look at a "halo" or "ghost cell" exchange. The goal is to understand how best to exploit the multiple communication paths in BG/L (Blue Gene/L).

This test assumes a 2-dimensional mesh. The mesh dimensions are chosen with MPI_Dims_create, and neighbors are assigned using the same ordering as MPI_Cart_create with reorder == false. This is not the best ordering, but our current version of MPI does not yet exploit reorder == true; support is expected in a future release. Code similar to a halo exchange (with contiguous data only) is run with 2, 4, or 8 neighbors, corresponding to a 1-d decomposition, a 2-d decomposition with a star stencil, and a 2-d decomposition with a box stencil. In the 8-neighbor case all 8 neighbors receive the same amount of data, so it is not representative of a real edge-and-vertex stencil, in which the four edge neighbors receive a full row or column while the four corner (vertex) neighbors receive only a few values.
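The mesh and neighbor selection described above can be sketched without MPI. This is a hypothetical stand-in, not the test code: `balanced_dims_2d` only approximates the intent of MPI_Dims_create (balanced dimensions in non-increasing order; a given MPI library may choose differently), ranks are numbered in row-major order as MPI_Cart_create does with reorder == false, the grid is assumed to be periodic, and the 1-d case is assumed to run along dimension 0.

```python
# Neighbor offsets: the first 2 give the 1-d decomposition (assumed to run
# along dimension 0), the first 4 the star stencil, and all 8 the box stencil.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]


def balanced_dims_2d(nprocs):
    """Hypothetical stand-in for MPI_Dims_create with ndims = 2: return the
    most nearly square factorization, largest dimension first."""
    best = 1
    d = 1
    while d * d <= nprocs:
        if nprocs % d == 0:
            best = d  # largest divisor not exceeding sqrt(nprocs)
        d += 1
    return nprocs // best, best


def cart_rank(dims, i, j):
    """Row-major rank of (i, j), the MPI_Cart_create numbering when
    reorder == false. Coordinates wrap because the grid is assumed periodic."""
    return (i % dims[0]) * dims[1] + (j % dims[1])


def neighbors(dims, i, j, nneighbors):
    """Ranks of the neighbors of process (i, j) for nneighbors in {2, 4, 8}."""
    return [cart_rank(dims, i + di, j + dj) for di, dj in OFFSETS[:nneighbors]]


if __name__ == "__main__":
    dims = balanced_dims_2d(16)
    print(dims)                        # (4, 4)
    print(neighbors(dims, 1, 1, 8))    # [1, 9, 4, 6, 0, 2, 8, 10]
```

For 16 processes this gives a 4 x 4 mesh, and process (1,1) (rank 5) has the 8 neighbor ranks listed in the last comment.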

There are three tests in this example:

Single sender
Only the process at location (1,1) sends; its neighbors receive. The reported time is the total send time on the (1,1) process divided by the number of neighbors.
All send/receive
All processes both send and receive, using MPI_Isend, MPI_Irecv, and MPI_Waitall. The reported time is the maximum over all processes.
Phased send/receive
The communication is divided into phases in which each process either sends or receives, but not both. Within a phase, communication uses MPI_Isend, MPI_Irecv, and MPI_Waitall. The reported time is the maximum over all processes.
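The page does not say how processes are assigned to phases, so the following Python sketch shows one plausible two-phase scheme under stated assumptions. It uses a hypothetical coloring: the parity of i for the 1-d case and a checkerboard parity of i + j for the star stencil. In phase 0, color-0 processes send to all their neighbors while color-1 processes receive from all of theirs; phase 1 reverses the roles. In an MPI code, each phase would post MPI_Isend (senders) or MPI_Irecv (receivers) to every neighbor and then call MPI_Waitall. The check assumes a periodic grid, which needs even dimensions for the parity argument to hold.

```python
# Neighbor offsets: first 2 = 1-d case, first 4 = star, all 8 = box stencil.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]


def color(i, j, nneighbors):
    """Hypothetical phase coloring: parity of i for the 1-d case,
    checkerboard parity of i + j otherwise."""
    return i % 2 if nneighbors == 2 else (i + j) % 2


def role(i, j, nneighbors, phase):
    """In phase 0 color-0 processes send; in phase 1 color-1 processes send."""
    return "send" if color(i, j, nneighbors) == phase else "recv"


def schedule_is_valid(dims, nneighbors):
    """True if, in both phases, every neighbor pair has opposite roles, so no
    process must send and receive in the same phase (periodic grid assumed)."""
    for i in range(dims[0]):
        for j in range(dims[1]):
            for di, dj in OFFSETS[:nneighbors]:
                pi, pj = (i + di) % dims[0], (j + dj) % dims[1]
                for phase in (0, 1):
                    if role(i, j, nneighbors, phase) == role(pi, pj, nneighbors, phase):
                        return False
    return True


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "neighbors:", schedule_is_valid((4, 4), n))
```

On a 4 x 4 periodic grid the check passes for 2 and 4 neighbors and fails for 8: diagonal neighbors share a checkerboard color, so a box stencil would need more than two colors under this kind of scheme.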
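The results are shown in the following table. Times are in seconds; columns are grouped by test and, within each group, by the number of neighbors (2, 4, or 8). There are no results for phased send/receive with 8 neighbors at this time.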
           single sender                 all send/receive              phased send/receive
    bytes         2         4         8         2         4         8         2         4
        0  0.000002  0.000002  0.000002  0.000003  0.000003  0.000003  0.000002  0.000002
        4  0.000002  0.000002  0.000002  0.000004  0.000003  0.000003  0.000002  0.000002
        8  0.000002  0.000002  0.000002  0.000004  0.000003  0.000003  0.000002  0.000002
       16  0.000002  0.000002  0.000002  0.000004  0.000003  0.000003  0.000002  0.000002
       32  0.000002  0.000002  0.000002  0.000004  0.000003  0.000003  0.000002  0.000002
       64  0.000002  0.000002  0.000002  0.000004  0.000004  0.000004  0.000002  0.000002
      128  0.000002  0.000002  0.000002  0.000004  0.000004  0.000004  0.000002  0.000002
      256  0.000003  0.000003  0.000003  0.000006  0.000006  0.000006  0.000003  0.000003
      512  0.000003  0.000003  0.000003  0.000009  0.000008  0.000009  0.000003  0.000003
     1024  0.000005  0.000004  0.000004  0.000013  0.000011  0.000015  0.000005  0.000004
     2048  0.000008  0.000007  0.000007  0.000019  0.000017  0.000028  0.000008  0.000007
     4096  0.000014  0.000013  0.000012  0.000031  0.000031  0.000050  0.000014  0.000013
     8192  0.000028  0.000027  0.000022  0.000059  0.000062  0.000108  0.000028  0.000027
    16384  0.000061  0.000032  0.000041  0.000140  0.000131  0.000179  0.000062  0.000030
    32768  0.000114  0.000058  0.000080  0.000247  0.000254  0.000344  0.000115  0.000057
    65536  0.000219  0.000111  0.000159  0.000526  0.000491  0.000668  0.000220  0.000110
   131072  0.000432  0.000217  0.000316  0.001026  0.000980  0.001288  0.000432  0.000216
   262144  0.000856  0.000429  0.000630  0.002020  0.001973  0.002616  0.000857  0.000428
   524288  0.001704  0.000853  0.001259  0.004006  0.003946  0.005082  0.001705  0.000853
  1048576  0.003402  0.001702  0.002516  0.008015  0.008027  0.010299  0.003403  0.001702
These results show that the phased approach performs significantly better than the unphased all send/receive approach and is very close to the single-sender performance. The largest gains are in the 4-neighbor case, where the phased version is more than four times faster for messages of 16 KB and larger (a factor of 4.7 at 1 MB); with 2 neighbors the factor at 1 MB is about 2.4.
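The factors quoted above can be checked directly from the 1048576-byte row of the table. The short Python sketch below copies those times (in seconds) and computes the ratios; the dictionary and function names are illustrative only.

```python
# Times in seconds copied from the 1048576-byte row of the table,
# keyed by number of neighbors (there is no 8-neighbor phased result).
single_sender = {2: 0.003402, 4: 0.001702, 8: 0.002516}
all_send_recv = {2: 0.008015, 4: 0.008027, 8: 0.010299}
phased = {2: 0.003403, 4: 0.001702}


def speedup(slower_time, faster_time):
    """How many times faster one variant is, as a ratio of run times."""
    return slower_time / faster_time


for n in sorted(phased):
    print(f"{n} neighbors: phased is {speedup(all_send_recv[n], phased[n]):.1f}x "
          f"faster than all send/receive and takes "
          f"{phased[n] / single_sender[n]:.3f}x the single-sender time")
```

The same ratio applied to the other rows shows where the 4-neighbor factor first exceeds four (16384 bytes: 0.000131 / 0.000030).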

MCS Division Argonne National Laboratory University of Chicago