Introduction

This page describes an interface that can be provided by implementations of MPI and exploited by tools such as debuggers to provide information on MPI programs, particularly the message queues.

This interface is exploited by the TotalView debugger and is implemented by the Compaq, HP, IBM, MPICH, MPI Software Technologies, Quadrics, SCALI, SGI, and other implementations of MPI.

Publications

A Standard Interface for Debugger Access to Message Queue Information in MPI, James Cownie and William Gropp, in the Proceedings of PVMMPI'99, pages 51-58.

The Files

The interface is described precisely in these files, which are drawn from the current version of MPICH.
README
A detailed description of the interface.
mpi_interface.h
The header that defines the interface.
dll_mpich.c
Sample code that uses this interface to provide message queue dumping for MPICH 1.2.1.
mpich_dll_defs.h
Header for the sample code.
msgqdllloc.c
Supporting file for the MPICH implementation; defines a global variable containing the location of the DLL.
debugutil.c
Supporting file for the MPICH implementation; provides MPIR_Breakpoint (see below).
attach.h
Describes the data structures used in the MPICH implementation to help the debugger find all MPI processes and initialize when MPI_Init is called (see below). A more complete description of this interface is in mpich-attach.txt.

Finding Processes

The README file covers only how the debugger can interface with the MPI implementation to gather information on message queues. In addition, a debugger (or other tool) needs to be able to find all of the MPI processes. The interface sketched here is provided by the MPICH implementation and used by TotalView to obtain this information.

The global symbols are:

MPIR_proctable
This is an array of structures with type
typedef struct {
  char * host_name;           /* Something we can pass to inet_addr */
  char * executable_name;     /* The name of the image */
  int    pid;                 /* The pid of the process */
} MPIR_PROCDESC;
(see mpich/mpid/ch2/attach.h in the MPICH implementation).
MPIR_proctable_size
The number of entries in MPIR_proctable.
MPIR_debug_state
This int has the value 0 before MPIR_proctable is initialized and the value MPIR_DEBUG_SPAWNED after MPIR_proctable has been filled in.
MPIR_Breakpoint()
This routine is called by the MPI implementation during MPI_Init, after MPIR_proctable has been set up. A debugger can set a breakpoint here to stop the program after MPI_Init has set up all MPI processes but before MPI_Init returns.
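The way these four symbols fit together can be illustrated with a minimal sketch in C. It is not taken from MPICH: the helpers publish_proctable and dump_proctable are hypothetical names, the value 1 for MPIR_DEBUG_SPAWNED is an assumption (the text above states only that the state is 0 before the table is ready), and the volatile qualifier is a choice made for this sketch. A real debugger would read the symbols out of the stopped process's address space rather than call a function inside it; dump_proctable only shows which fields it would read, and in what order.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same layout as the MPIR_PROCDESC structure described above. */
typedef struct {
  char * host_name;           /* Something we can pass to inet_addr */
  char * executable_name;     /* The name of the image */
  int    pid;                 /* The pid of the process */
} MPIR_PROCDESC;

/* Assumed value; the description only fixes the initial state as 0. */
#define MPIR_DEBUG_SPAWNED 1

/* Global symbols that a debugger looks up by name. */
MPIR_PROCDESC *MPIR_proctable      = NULL;
int            MPIR_proctable_size = 0;
volatile int   MPIR_debug_state    = 0;

/* The debugger plants a breakpoint here; the body is intentionally empty. */
void MPIR_Breakpoint(void) {}

/* Hypothetical helper: fill in the table, mark it valid, then give the
 * debugger its chance to stop. Returns 0 on success, -1 on failure. */
int publish_proctable(char **hosts, char **images, const int *pids, int nprocs)
{
  if (nprocs <= 0)
    return -1;

  MPIR_PROCDESC *table = malloc((size_t)nprocs * sizeof *table);
  if (table == NULL)
    return -1;

  for (int i = 0; i < nprocs; i++) {
    table[i].host_name       = hosts[i];
    table[i].executable_name = images[i];
    table[i].pid             = pids[i];
  }

  /* Publish the table before changing the state, so that a tool that
   * sees MPIR_DEBUG_SPAWNED can rely on the table being complete. */
  MPIR_proctable      = table;
  MPIR_proctable_size = nprocs;
  MPIR_debug_state    = MPIR_DEBUG_SPAWNED;

  MPIR_Breakpoint();
  return 0;
}

/* Hypothetical tool-side view: print one line per process.
 * Returns the number of entries, or -1 if the table is not ready yet. */
int dump_proctable(FILE *out)
{
  if (MPIR_debug_state != MPIR_DEBUG_SPAWNED)
    return -1;

  for (int i = 0; i < MPIR_proctable_size; i++)
    fprintf(out, "entry %d: host=%s pid=%d image=%s\n", i,
            MPIR_proctable[i].host_name,
            MPIR_proctable[i].pid,
            MPIR_proctable[i].executable_name);

  return MPIR_proctable_size;
}
```

In this sketch the state variable is written last, so a tool that checks MPIR_debug_state first never walks a partially filled table. The entries store the caller's string pointers without copying them, so those strings must outlive the table.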
For some tools, one more item of information is needed: the location (node name, pid, and executable name) of one of the MPI processes (often the process with rank 0 in MPI_COMM_WORLD). This is particularly important for systems that run an MPI program on a different set of processors from the one where mpirun is executed. TotalView normally does not need this information, because the usual way to use TotalView with an MPI program is to start the parallel program under TotalView.

An interface that provides this information is still under development. In the short term, a simple file may be provided with this information. In the long term, an API for accessing the information from the system that runs MPI programs will be provided. This API may simply implement access to a file in a known place, or it may use a service such as LDAP.