Nonblocking messages need to be completed eventually; at the MPI level, this is done with the MPI_Wait/MPI_Test family of routines. The corresponding MPID routines are a subset of these and have two forms: a quick test for completion (indicating that no further action is required) and a test-and-complete (similar to MPI_Test). There are separate routines for send and receive operations.
In addition, there is a ``wait-until-something-happens'' routine called MPID_DeviceCheck. This can be null but can be used to allow MPI_Wait to wait more passively (for example, with a select) rather than spinning on tests. It takes one argument of type MPID_BLOCKING_TYPE; it can be used for both blocking and nonblocking checks of the device.
The simplest case is testing for requests that have already completed. This is done by testing the field is_complete in the request structure. The ADI should set this field in the request when the operation is complete. Early versions of this document included a MPID_RecvDone and a MPID_SendDone call, but in the end it was clearer to make this an explicit part of the MPI_Request structure.
If a request is not complete, the routines MPID_xxxxIcomplete can be used. These return true if the request is complete (after calling); that is, they return true if the corresponding MPID_xxxxDone routine would return true. These routines are expected to call the device and perform some additional processing. Note that MPID_RecvIcomplete sets the status variable; just as for a blocking receive, status may be null.
    MPID_RecvIcomplete( handle, &status, &error_code )
    MPID_SendIcomplete( handle, &error_code )

If an error has been detected, these routines return true and set the error_code appropriately. For a send operation (MPID_SendIcomplete), an error is unlikely but could occur, for example, when the destination process disappears. The use of I in Icomplete indicates that these routines do not block waiting for the operation to complete. These really correspond to MPI_Test, which, despite its name, does more than just test (since, on a successful test, it also completes the request).
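The two levels of testing described above (a quick look at is_complete, then a nonblocking call into the device) might be sketched as follows. This is only an illustration under stated assumptions: the mock_ names, the polls_needed and error fields, and the helper mock_send_test are hypothetical and are not part of the ADI; the real MPID_SendIcomplete talks to an actual device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the send handle; the real ADI type differs. */
typedef struct {
    int is_complete;   /* set when the operation has finished */
    int polls_needed;  /* mock only: device passes left before completion */
    int error;         /* mock only: error code reported on completion */
} mock_shandle;

/* Mock of MPID_SendIcomplete: make one nonblocking pass over the
   "device" and report whether the request is now complete. */
static int mock_send_icomplete(mock_shandle *h, int *error_code)
{
    assert(h != NULL && error_code != NULL);
    if (!h->is_complete && --h->polls_needed <= 0) {
        h->is_complete = 1;
    }
    if (h->is_complete) {
        *error_code = h->error;
    }
    return h->is_complete;
}

/* Test-style helper: fast path on the flag, slow path through the device.
   On the fast path error_code is left untouched. */
static int mock_send_test(mock_shandle *h, int *error_code)
{
    if (h->is_complete) {
        return 1;                      /* no device call needed */
    }
    return mock_send_icomplete(h, error_code);
}
```

Neither function blocks: a caller that needs to wait has to combine them with some form of device check.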
The routine MPID_DeviceCheck may be used to wait for something to happen. This routine may return at any time; it must return if the is_complete value for any request is changed during the call. For example, if a request would be completed only when an acknowledgment message or data message arrived, then MPID_DeviceCheck could block until a message (from anywhere) arrived. This call is typically used in the implementation of MPI_Waitxxx, as shown below. Note that calling MPID_DeviceCheck itself is not enough to ensure that a request is completed.
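For a device built on sockets or pipes, the blocking and nonblocking forms of such a check could both be expressed with a single POSIX select() call. The sketch below is an assumption about how one might do this, not the MPICH implementation: mock_device_check, mock_blocking_type and its values are hypothetical names, and a real device would watch all of its connections and then process whatever arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names; the real MPID_BLOCKING_TYPE values are defined
   by the ADI, not here. */
typedef enum { MOCK_NOTBLOCKING = 0, MOCK_BLOCKING = 1 } mock_blocking_type;

/* Returns 1 if the descriptor has data to read, 0 if not, -1 on error.
   With MOCK_BLOCKING the call sleeps in select() until data arrives,
   so the caller does not spin. */
static int mock_device_check(int fd, mock_blocking_type blocking)
{
    fd_set readable;
    struct timeval no_wait = { 0, 0 };
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    /* A NULL timeout makes select() block; a zero timeout makes it poll. */
    n = select(fd + 1, &readable, NULL, NULL,
               blocking == MOCK_BLOCKING ? NULL : &no_wait);
    if (n < 0) {
        return -1;
    }
    return FD_ISSET(fd, &readable) ? 1 : 0;
}
```

As with MPID_DeviceCheck itself, a return from this function only says that something happened on the device; the caller still has to test the request it is interested in.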
There could be separate send versions for each type of send (ready, synchronous, etc.); but since the ADI may map the different send operations into such parts as ``short eager'' or ``long rendezvous'' messages, it makes more sense for the ADI to take the responsibility of remembering what kind of ADI operation the request is performing. In the sample implementation, this information is stored in the MPI request in the completer field (see Section Structure of MPI_Request ).
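One way to picture this is a single completion routine that dispatches on a per-request tag. The sketch below is purely illustrative: the enum values, the ack_arrived field, and the rule that a short eager send completes at once while a long rendezvous send waits for an acknowledgment are assumptions made for the example, and the actual contents of the completer field are those given in the section on the structure of MPI_Request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completer kinds; the real values are device specific. */
typedef enum {
    MOCK_CMPL_NONE,
    MOCK_CMPL_EAGER_SHORT,   /* data already handed to the device */
    MOCK_CMPL_RNDV_LONG      /* waiting for the receiver's go-ahead */
} mock_completer;

typedef struct {
    int is_complete;
    mock_completer completer;  /* which ADI operation is in progress */
    int ack_arrived;           /* mock only: set when the ack is seen */
} mock_shandle;

/* One send-completion routine for all MPI send modes: the ADI operation
   recorded in the request decides what "complete" means. */
static int mock_send_icomplete(mock_shandle *h)
{
    assert(h != NULL);
    switch (h->completer) {
    case MOCK_CMPL_EAGER_SHORT:
        h->is_complete = 1;            /* nothing left to wait for */
        break;
    case MOCK_CMPL_RNDV_LONG:
        if (h->ack_arrived) {
            h->is_complete = 1;        /* receiver has acknowledged */
        }
        break;
    case MOCK_CMPL_NONE:
        break;                         /* no operation pending */
    }
    return h->is_complete;
}
```

The MPI layer never inspects the tag; it only calls the one routine and reads the result.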
An illustrative use of these calls in the implementation of MPI_Wait is
shown below.
    ...
    int err = MPI_SUCCESS;
    switch ((*request)->type) {
    case MPIR_SEND:
        /* Wait on the device only if the send has not already completed. */
        if (!(*request)->shandle.is_complete) {
            do {
                MPID_DeviceCheck( MPID_BLOCKING );
            } while (!MPID_SendIcomplete( *request, &err ));
        }
        MPID_SendFree( request );
        *request = 0;    /* hand a null request back to the caller */
        return err;
    ...
In the case of MPI_Waitall, the code is a little different because of
the need to wait for completion in any order; this
difference explains the need for MPID_DeviceCheck rather than just a
MPID_SendComplete routine:
    ...
    int err = MPI_SUCCESS;
    nleft = n;
    while (nleft) {
        for (i=0; i<n; i++) {
            request = array_of_requests[i];
            if (!request) continue;    /* completed on an earlier pass */
            switch (request->type) {
            case MPIR_SEND:
                if (request->shandle.is_complete) {
                    MPID_SendFree( request );
                    array_of_requests[i] = 0;
                    if (--nleft == 0) return err;
                }
                else {
                    if (MPID_SendIcomplete( request, &err )) {
                        MPID_SendFree( request );
                        array_of_requests[i] = 0;
                        if (--nleft == 0) return err;
                    }
                }
                break;
            case MPIR_RECV:
                ...
            }
        }
        /* A full pass left requests pending; wait for the device. */
        MPID_DeviceCheck( MPID_BLOCKING );
    }
This code is a little sketchy (particularly on the error handling) and is not
appropriate for large n, but it does show the use of
MPID_DeviceCheck to allow the operations in the MPI_Waitall to
complete in any order without requiring the code to constantly call
nonblocking test routines.
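To make the any-order behavior concrete, the loop can be exercised against a mock device. Everything in the sketch below is hypothetical: mock_request, the ready_at field, and the event counter stand in for a real device, and the function returns the number of blocking device checks only so that the behavior can be observed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int is_complete;
    int ready_at;   /* mock only: event count at which this request finishes */
} mock_request;

static int mock_events_seen = 0;   /* mock only: events delivered so far */

/* Mock of a blocking MPID_DeviceCheck: "wait" until one event arrives. */
static void mock_device_check(void) { mock_events_seen++; }

/* Mock of MPID_xxxxIcomplete: nonblocking test-and-complete. */
static int mock_icomplete(mock_request *r)
{
    if (mock_events_seen >= r->ready_at) {
        r->is_complete = 1;
    }
    return r->is_complete;
}

/* Waits for all requests, in whatever order they finish, and returns
   the number of blocking device checks that were needed. */
static int mock_waitall(int n, mock_request *reqs[])
{
    int nleft = 0, checks = 0;

    for (int i = 0; i < n; i++) {
        if (reqs[i] != NULL) nleft++;
    }
    while (nleft > 0) {
        for (int i = 0; i < n; i++) {
            mock_request *r = reqs[i];
            if (r == NULL) continue;           /* already finished */
            if (r->is_complete || mock_icomplete(r)) {
                reqs[i] = NULL;
                nleft--;
            }
        }
        if (nleft > 0) {                       /* block once per full pass */
            mock_device_check();
            checks++;
        }
    }
    return checks;
}
```

With three requests that become ready after two, zero, and one events respectively, the loop retires the second, then the third, then the first, and blocks in the device only twice.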
The implementation of MPI_Wait shown above suggests that there be the
additional functions
    MPID_SendComplete( handle, &error_code )
    MPID_RecvComplete( handle, &status, &error_code )

which block until the specified operation completes. These can be implemented with MPID_DeviceCheck and MPID_xxxIcomplete, but some devices may be more efficient with these calls (for example, in the MPICH implementation, the Intel nx and Meiko meiko devices would implement these directly). The MPICH implementation of MPI_Waitall etc. makes use of these.
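The generic fallback mentioned here, a blocking completion built from a device check and a nonblocking test, might look like the following sketch. The mock_ routines, the events_needed field, and MOCK_SUCCESS are hypothetical stand-ins for the real ADI calls and for MPI_SUCCESS; a device with a native blocking operation would replace the loop entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SUCCESS 0   /* stand-in for MPI_SUCCESS */

typedef struct {
    int is_complete;
    int events_needed;   /* mock only: device events required to finish */
} mock_shandle;

static int mock_events_seen = 0;   /* mock only: events delivered so far */

/* Mock of a blocking MPID_DeviceCheck: "wait" until one event arrives. */
static void mock_device_check(void) { mock_events_seen++; }

/* Mock of MPID_SendIcomplete: nonblocking; completes the request once
   enough events have been seen. */
static int mock_send_icomplete(mock_shandle *h, int *error_code)
{
    if (!h->is_complete && mock_events_seen >= h->events_needed) {
        h->is_complete = 1;
        *error_code = MOCK_SUCCESS;
    }
    return h->is_complete;
}

/* Blocking completion composed from the two routines above: test,
   and wait on the device only while the request is still pending. */
static void mock_send_complete(mock_shandle *h, int *error_code)
{
    assert(h != NULL && error_code != NULL);
    while (!h->is_complete && !mock_send_icomplete(h, error_code)) {
        mock_device_check();
    }
}
```

A request that is already complete returns immediately without touching the device or the error code.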