AUTOPACK provides a way to determine the number of incoming messages that result from a group of sends (Figure 5). Each processor calls AP_send_begin() before sending its messages and AP_send_end() afterwards. Between these two calls, the library on each processor automatically keeps track of how many messages are sent to each destination. A global reduction of this data yields the total number of messages to be received on each processor, and the library can perform this reduction without global synchronization. A call to AP_recv_count() queries the result of the reduction. If the return value is zero, the reduction is not yet complete and the processor should wait for additional messages. It is safe to block on incoming messages as long as the AP_DROPOUT flag is specified, because this flag causes the blocking call to return when the result of the reduction arrives. When AP_recv_count() indicates that the reduction is complete, its argument holds the number of messages that were sent to this processor. Comparing this number with the number already received shows how many are yet to arrive. After receiving all its messages, a processor should call AP_check_sends() before proceeding, to ensure that any deferred messages are sent out.
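The counting scheme described above can be illustrated with a small model. The sketch below is plain Python and does not call AUTOPACK or MPI. The function names, the `sends_per_rank` table, and the `("msg", ...)` / `("count", ...)` event encoding are assumptions introduced only for illustration. The first function shows how summing the per-destination tallies kept by every sender gives each processor its expected number of incoming messages. The second models a receiver that keeps accepting messages and stops once the reduction result is known and that many messages have arrived.

```python
def expected_receive_counts(sends_per_rank):
    """Model of the reduction over per-destination send tallies.

    sends_per_rank[src][dst] is the number of messages that processor
    `src` sent to processor `dst` between its begin and end calls.
    Returns a list whose entry `dst` is the total sent to `dst`.
    """
    nprocs = len(sends_per_rank)
    totals = [0] * nprocs
    for row in sends_per_rank:
        if len(row) != nprocs:
            raise ValueError("each tally must have one entry per processor")
        for dst, count in enumerate(row):
            totals[dst] += count
    return totals


def receive_until_complete(events):
    """Model of one receiver's loop (hypothetical event encoding).

    `events` is the order in which things reach this processor:
    ("msg", payload) for an ordinary message, ("count", n) for the
    arrival of the reduction result. Returns the number of messages
    received once the count is known and all of them have arrived.
    """
    expected = None  # unknown until the reduction result arrives
    received = 0
    for kind, value in events:
        if kind == "msg":
            received += 1
        elif kind == "count":
            expected = value
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
        if expected is not None and received == expected:
            return received
    raise RuntimeError("events ended before all expected messages arrived")
```

In this model the `("count", n)` event plays the role of the reduction result that unblocks a waiting receiver. The receiver cannot know it is finished before that event, however many messages it has already seen.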
The entire process may be repeated without performing any synchronization. However, take care that messages sent by processors that enter the second stage early are not accidentally received by processors still in the first stage, since this would make the first-stage count incorrect. One way to avoid this problem is to use a distinct tag for each stage of the communication and to avoid MPI_ANY_TAG when receiving.
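The effect of per-stage tags can be sketched with another small model, again in plain Python with no AUTOPACK or MPI calls. The `ANY_TAG` sentinel (standing in for MPI_ANY_TAG), the `stage_tag` helper, and the base tag value used in the usage note are hypothetical. The mailbox holds two first-stage messages plus one message from a processor that has already moved on to the second stage.

```python
ANY_TAG = None  # stands in for MPI_ANY_TAG in this model only


def stage_tag(base_tag, stage):
    """Hypothetical helper: give every stage its own tag."""
    return base_tag + stage


def count_matching(mailbox, wanted_tag):
    """Count the (tag, payload) pairs a receive with `wanted_tag` would match."""
    return sum(
        1 for tag, _payload in mailbox
        if wanted_tag is ANY_TAG or tag == wanted_tag
    )
```

With `mailbox = [(100, "a"), (100, "b"), (101, "early")]`, a first-stage receiver that asks for `stage_tag(100, 0)` matches only the two first-stage messages, while a receiver that uses `ANY_TAG` also matches the early second-stage message and would count one message too many.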