Implementation of Noncontiguous Operations



A sample implementation of noncontiguous operations is provided with MPICH. It relies on a few MPICH internals; in particular, the special routines MPIR_Pack2 and MPIR_Unpack2 copy data between a buffer described by an MPI_Datatype and a contiguous array of bytes. These routines also check buffer sizes and can be used to move data that is shorter than expected.

The approach used depends on the length of the message. If the message is short enough, it is copied directly into the message envelope (which usually has a small data area). If the message is large, it may be copied into a temporary buffer, and that buffer is then sent as contiguous data. Alternatively, a long message may be sent as several separate messages (as is currently done in the ch_shmem device). Sending a message in pieces requires a way to process a datatype incrementally, which in turn requires saving the current position in the datatype in the request.
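The incremental processing mentioned above can be sketched as follows. This is a minimal illustration, not MPICH code: the types vec_type and pack_pos and the function pack_some are hypothetical names, and the "datatype" is reduced to a simple strided layout of byte blocks. The point is only that the position record (block index plus offset within the block) is enough to resume packing where the previous piece stopped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a simple strided datatype: `count` blocks of
   `blocklen` bytes each, with block starts `stride` bytes apart. */
typedef struct {
    size_t count;
    size_t blocklen;
    size_t stride;
} vec_type;

/* Hypothetical position record, of the kind that could be saved in a request. */
typedef struct {
    size_t block;   /* index of the block currently being packed */
    size_t offset;  /* bytes of that block already packed */
} pack_pos;

/* Pack at most `max` bytes from `src` into `dst`, resuming at *pos and
   updating *pos. Returns the number of bytes written; 0 means that the
   whole datatype has already been processed. */
size_t pack_some(const char *src, const vec_type *t, pack_pos *pos,
                 char *dst, size_t max)
{
    size_t done = 0;
    while (done < max && pos->block < t->count) {
        size_t left = t->blocklen - pos->offset;          /* rest of this block */
        size_t n = (max - done < left) ? max - done : left;
        memcpy(dst + done, src + pos->block * t->stride + pos->offset, n);
        done += n;
        pos->offset += n;
        if (pos->offset == t->blocklen) {                 /* block finished */
            pos->block++;
            pos->offset = 0;
        }
    }
    return done;
}
```

A device that sends a long message in several parts could call such a routine once per part, passing the same position record each time, until it returns 0.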

The noncontiguous operations must manage the datatype reference counts carefully, particularly in the case of a nonblocking receive or a persistent operation. Note that the code

    MPI_Irecv( ..., datatype, ..., &request ); 
    MPI_Type_free( &datatype ); 
    ... 
    MPI_Wait( &request, &status ); 
must work, even though the user freed the datatype before completing the receive.
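One way to satisfy this requirement is for the request to hold its own reference to the datatype. The sketch below illustrates the idea with hypothetical names (dtype, request, start_recv, complete_recv, live_types); it is not the MPICH data structure, only a minimal model of the reference counting that the text calls for.

```c
#include <stdlib.h>

/* Count of datatype objects currently allocated (for illustration only). */
int live_types = 0;

/* Hypothetical datatype object carrying a reference count. */
typedef struct {
    int    ref_count;
    size_t extent;
} dtype;

/* Hypothetical request that remembers the datatype it is using. */
typedef struct {
    dtype *type;
} request;

dtype *dtype_create(size_t extent)
{
    dtype *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->ref_count = 1;            /* the user's reference */
    t->extent = extent;
    live_types++;
    return t;
}

/* Drop one reference; release the storage when the last one goes away. */
static void dtype_release(dtype *t)
{
    if (--t->ref_count == 0) {
        free(t);
        live_types--;
    }
}

/* Plays the role of MPI_Type_free: drops the user's reference and
   invalidates the user's handle. */
void dtype_free(dtype **t)
{
    dtype_release(*t);
    *t = NULL;
}

/* Plays the role of starting a nonblocking receive: the request takes
   its own reference. */
void start_recv(request *r, dtype *t)
{
    t->ref_count++;
    r->type = t;
}

/* Plays the role of completing the receive: the datatype is still valid
   here, and the request's reference is dropped afterwards. */
void complete_recv(request *r)
{
    dtype_release(r->type);
    r->type = NULL;
}
```

With this arrangement, freeing the user's handle between the start and the completion of the receive only decrements the count; the storage is released when the request completes.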

For example, an implementation of MPID_SendDatatype might send small amounts of data in the envelope (the data packet that contains the message tag, size, context, etc.). Large amounts of data are first copied into a buffer allocated just for this operation; the buffer is freed when the operation completes. This is sketched below. Note the use of a function MPID_GetLen to get the length of the message in the chosen destination format dest_format; this format is passed to the sending routine. (The actual choice for the destination format is described in a forthcoming report.) Homogeneous implementations do not need dest_format, of course. (These examples use the msgrep field; a homogeneous implementation can ignore it and can even, through the use of macro redefinitions, eliminate it from the functions.)


len = MPID_GetLen(count, datatype, msgrep ); 
if (len <= MPID_MAX_PKT_SIZE) { 
    MPID_PKT_T pkt; 
    ... set pkt fields ... 
    MPIR_Pack2( ..., &pkt.data ); 
    MPID_SendPkt( &pkt, .... ); 
    } 
else { 
    void *buf = malloc(len);  
    if (!buf) {*error_code=MPI_ERR_EXHAUSTED;return;} 
    MPIR_Pack2( ..., buf ); 
    MPID_SendContig( comm, buf, len, ...,  
                           msgrep, error_code ); 
    free( buf ); 
    return; 
    } 
The routines MPID_GetLen and MPID_SendPkt are not part of the ADI specification; they are merely examples of routines that may be internal to the ADI implementation.

The implementation of MPID_IsendDatatype is similar; it uses fields in the MPI_Request to save the data:

len = MPID_GetLen(count, datatype, dest_format); 
if (len <= MPID_MAX_PKT_SIZE) { 
    MPID_PKT_T pkt; 
    ... set pkt fields ... 
    MPIR_Pack2( ..., &pkt.data ); 
    MPID_SendPkt( &pkt, .... ); 
    } 
else { 
    void *buf = malloc(len);  
    if (!buf) {*error_code=MPI_ERR_EXHAUSTED;return;} 
    MPIR_Pack2( ..., buf ); 
    request->shandle.start = buf; 
    MPID_IsendContig( comm, buf, len, ...,  
                  dest_format, &request, error_code ); 
    return; 
    } 
The MPID wait/test code for a send request for noncontiguous messages in the above implementation includes
if (request->shandle.start) { 
        free( request->shandle.start ); 
        } 
The field start in the request is an example that a particular ADI implementation may choose; it represents a pointer to a memory area allocated by the ADI for handling message data.
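The interaction between the nonblocking send and the wait code can be modeled in a few lines. The sketch below is an assumption-laden illustration rather than ADI code: send_handle, isend_packed, wait_send, and MAX_PKT_SIZE are hypothetical, a memcpy stands in for packing, and the actual transfer is omitted. It makes one point explicit that the fragments above leave implicit: for the wait code to be safe, the start field has to be null whenever no buffer was allocated.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PKT_SIZE 8   /* hypothetical capacity of the envelope data area */

/* Hypothetical send handle; `start` points to memory allocated by the
   device for message data, or is NULL if nothing was allocated. */
typedef struct {
    void *start;
    int   complete;
} send_handle;

/* Begin a send of `len` bytes. Returns 0 on success, -1 if the
   temporary buffer could not be allocated. */
int isend_packed(send_handle *sh, const void *data, size_t len)
{
    sh->start = NULL;            /* short messages travel in the envelope */
    sh->complete = 0;
    if (len > MAX_PKT_SIZE) {
        sh->start = malloc(len);
        if (!sh->start) return -1;
        memcpy(sh->start, data, len);   /* stands in for packing */
    }
    /* ... the transfer itself is omitted in this sketch ... */
    return 0;
}

/* Complete the send and release the temporary buffer, if there is one. */
void wait_send(send_handle *sh)
{
    sh->complete = 1;
    if (sh->start) {
        free(sh->start);
        sh->start = NULL;        /* guards against a second free */
    }
}
```

Deferring the free to the wait or test call is what allows the send to return before the data has left the temporary buffer.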

More sophisticated implementations could avoid allocating a buffer for the entire message; that is a "quality of implementation" issue. Note that a three-case implementation may be most appropriate: small amounts of data in the envelope, modest amounts in an allocated buffer, and large amounts sent in multiple parts.
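The three-case selection can be written as a small dispatch function. The sketch below is only illustrative: the enumeration, the function names, and both thresholds are hypothetical, and a real device would choose the limits from its packet size and available memory.

```c
#include <stddef.h>

/* Hypothetical labels for the three cases described in the text. */
typedef enum {
    SEND_IN_ENVELOPE,   /* small: data travels in the envelope */
    SEND_VIA_BUFFER,    /* modest: pack into an allocated buffer */
    SEND_IN_PARTS       /* large: send in multiple parts */
} send_strategy;

/* Choose a strategy from the packed length. `max_pkt` is the envelope
   data capacity and `max_buf` the largest buffer the device is willing
   to allocate; both are assumptions of this sketch. */
send_strategy choose_strategy(size_t len, size_t max_pkt, size_t max_buf)
{
    if (len <= max_pkt) return SEND_IN_ENVELOPE;
    if (len <= max_buf) return SEND_VIA_BUFFER;
    return SEND_IN_PARTS;
}

/* Number of parts needed to send `len` bytes in pieces of at most
   `part_size` bytes (rounded up); 0 if `part_size` is 0. */
size_t parts_needed(size_t len, size_t part_size)
{
    return part_size ? (len + part_size - 1) / part_size : 0;
}
```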


