First, we determine the device to use, based on the destination (if there are multiple devices, the lookup can use an array of pointers to device structures, indexed by global rank).
Next, we determine which protocol to use (short, eager, rendezvous, get). This is probably based on the size of the message, but could be based on something else, like the number of pending completions, datatype, etc.
We then start the send with the appropriate operation, chosen from the protocol.
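The three steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the type and function names (device_t, choose_protocol, start_send, the stub_* routines) and the two size limits are hypothetical and do not come from the design itself. The protocol choice here uses message size alone, so the get protocol is present in the table but never selected by this simple rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol identifiers; the names follow the text. */
typedef enum { PROTO_SHORT, PROTO_EAGER, PROTO_RNDV, PROTO_GET, PROTO_COUNT } proto_t;

typedef struct device device_t;
typedef int (*send_fn)(device_t *dev, const void *buf, size_t len, int dest);

struct device {
    size_t short_limit;        /* assumed: largest message sent as "short" */
    size_t eager_limit;        /* assumed: largest message sent eagerly */
    send_fn send[PROTO_COUNT]; /* one start-send operation per protocol */
};

/* Stub operations: a real device would start the transfer here.
   Each stub just reports which protocol was used. */
static int stub_short(device_t *d, const void *b, size_t n, int dest)
{ (void)d; (void)b; (void)n; (void)dest; return PROTO_SHORT; }
static int stub_eager(device_t *d, const void *b, size_t n, int dest)
{ (void)d; (void)b; (void)n; (void)dest; return PROTO_EAGER; }
static int stub_rndv(device_t *d, const void *b, size_t n, int dest)
{ (void)d; (void)b; (void)n; (void)dest; return PROTO_RNDV; }
static int stub_get(device_t *d, const void *b, size_t n, int dest)
{ (void)d; (void)b; (void)n; (void)dest; return PROTO_GET; }

/* Step 2: pick a protocol. Size is the only criterion in this sketch;
   pending completions or datatype could be consulted here as well. */
static proto_t choose_protocol(const device_t *dev, size_t len)
{
    if (len <= dev->short_limit) return PROTO_SHORT;
    if (len <= dev->eager_limit) return PROTO_EAGER;
    return PROTO_RNDV;
}

/* Steps 1 to 3: look up the device by global rank, choose the protocol,
   and start the send with the operation that protocol selects. */
static int start_send(device_t **devices_by_rank, int nranks,
                      const void *buf, size_t len, int dest)
{
    assert(dest >= 0 && dest < nranks);
    device_t *dev = devices_by_rank[dest];
    proto_t p = choose_protocol(dev, len);
    return dev->send[p](dev, buf, len, dest);
}
```

Because the operation is taken from a per-device table, two destinations served by different devices can use different limits and different send routines without any change to start_send.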
Now, if the operation is not already complete (e.g., nonblocking or rendezvous or get send), we need to know how to
The most general way to handle this case is to store the functions in the request. An alternative, used in the first-generation system, is to store an integer that selects the function. This limits the choices and makes it hard to add new approaches (see below).
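To make the contrast concrete, here is a minimal C sketch of both styles. All names (request_t, rndv_test, SEL_RNDV, polls_left) are hypothetical, and the "protocol" is a stub that completes after a fixed number of polls. With a stored function pointer the generic code never changes; with an integer selector every new protocol requires a new case in the switch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct request request_t;
typedef int (*progress_fn)(request_t *req);  /* returns 1 when complete */

struct request {
    progress_fn test;  /* function-pointer style: stored at send start */
    int selector;      /* integer style: selects the function by number */
    int polls_left;    /* demo state standing in for real progress */
};

/* Stub for one protocol: completes once polls_left reaches zero. */
static int rndv_test(request_t *req)
{
    if (req->polls_left > 0) req->polls_left--;
    return req->polls_left == 0;
}

/* Function-pointer style: adding a protocol needs no change here. */
static int request_test(request_t *req)
{
    assert(req->test != NULL);
    return req->test(req);
}

/* Integer-selector style: the set of choices is fixed by this switch. */
enum { SEL_RNDV = 0 };

static int request_test_by_selector(request_t *req)
{
    switch (req->selector) {
    case SEL_RNDV:
        return rndv_test(req);
    default:
        return -1;  /* unknown selector: cannot be handled */
    }
}
```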
Note that we can combine the test and push-test functions by adopting the convention that a null push-test function means that the request is complete.
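A small C sketch of this convention follows. The names (preq_t, demo_push_test, request_is_complete, steps_left) are hypothetical, and it assumes, as one possible reading, that the push-test routine clears its own pointer in the request once the operation has completed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct preq preq_t;
typedef int (*push_test_fn)(preq_t *req);  /* returns 1 when complete */

struct preq {
    push_test_fn push_test;  /* NULL means the request is complete */
    int steps_left;          /* demo state standing in for real progress */
};

/* Stub push-test: advances the operation one step and, on completion,
   clears the pointer so later tests return immediately. */
static int demo_push_test(preq_t *req)
{
    assert(req->steps_left > 0);
    if (--req->steps_left == 0) {
        req->push_test = NULL;
        return 1;
    }
    return 0;
}

/* Combined test: a null push-test function needs no call at all. */
static int request_is_complete(preq_t *req)
{
    if (req->push_test == NULL) return 1;
    return req->push_test(req);
}
```

A send that completes immediately (for example, a short message) can simply leave the pointer null when the request is created, so testing it costs one comparison.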
Finally, for reasons that will be clear when discussing the noncontiguous operations, it is helpful to have a routine to call to finish the request locally. This can be used, for example, to free any local buffers. The name of this function is finish.
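As an illustration of the buffer-freeing case, the sketch below stores a finish function in the request and uses it to release a temporary buffer. Only the name finish comes from the text; the request layout, the pack_buf field, and the helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct freq freq_t;
typedef void (*finish_fn)(freq_t *req);

struct freq {
    finish_fn finish;  /* NULL when there is nothing to clean up */
    void *pack_buf;    /* hypothetical temporary buffer owned by the request */
    size_t pack_len;
};

/* A finish routine that frees the request's local buffer. */
static void free_pack_buffer(freq_t *req)
{
    free(req->pack_buf);
    req->pack_buf = NULL;
    req->pack_len = 0;
}

/* Copy the user data into a local buffer and register the matching
   finish routine. Returns 0 on success, -1 if allocation fails. */
static int request_init_packed(freq_t *req, const void *src, size_t len)
{
    assert(req != NULL && src != NULL && len > 0);
    req->pack_buf = malloc(len);
    if (req->pack_buf == NULL) return -1;
    memcpy(req->pack_buf, src, len);
    req->pack_len = len;
    req->finish = free_pack_buffer;
    return 0;
}

/* Finish the request locally; safe to call more than once. */
static void request_finish(freq_t *req)
{
    if (req->finish != NULL) {
        req->finish(req);
        req->finish = NULL;
    }
}
```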