Introduction
This page describes an interface that can be provided by
implementations of MPI and
exploited by tools such as debuggers to
provide information on MPI programs, particularly the message queues.
This interface is exploited by the TotalView debugger and is implemented
by the Compaq, HP, IBM, MPICH,
MPI Software Technologies, Quadrics, SCALI, SGI, and other
implementations of MPI.
Publications
"A Standard Interface for Debugger Access to Message Queue Information in
MPI" (Postscript; Compressed Postscript), James Cownie and William Gropp,
in the Proceedings of PVMMPI'99, pages 51-58.
The Files
The interface is described precisely in these files, which are drawn
from the current version of MPICH.
- README
- A detailed description of the interface.
- mpi_interface.h
- The header that defines the interface
- dll_mpich.c
- Sample code that uses this interface to provide message
queue dumping for MPICH 1.2.1
- mpich_dll_defs.h
- Header for the sample code.
- msgqdllloc.c
- Supporting file for the MPICH implementation; implements a global
variable containing the location of the DLL.
- debugutil.c
- Supporting file for the MPICH implementation of
MPIR_Breakpoint (see below)
- attach.h
- Describes the data structures used in the MPICH implementation to
help the debugger find all MPI processes and initialize when
MPI_Init is called (see below). A more complete description of this
interface is in mpich-attach.txt.
Finding Processes
The README file describes only how a debugger can interface to the MPI
implementation to gather information on message queues. In addition,
a debugger (or other tool) needs to be able to find all of
the MPI processes. The interface sketched here is provided by
the MPICH implementation and used by TotalView for that purpose.
The global symbols are:
- MPIR_proctable
- An array of structures of type MPIR_PROCDESC:
    typedef struct {
        char *host_name;        /* Something we can pass to inet_addr */
        char *executable_name;  /* The name of the image */
        int   pid;              /* The pid of the process */
    } MPIR_PROCDESC;
  (see mpich/mpid/ch2/attach.h in the MPICH implementation).
- MPIR_proctable_size
- Number of entries in MPIR_proctable
- MPIR_debug_state
- This int has value 0 before
MPIR_proctable is initialized and
MPIR_DEBUG_SPAWNED after MPIR_proctable
has been filled in.
- MPIR_Breakpoint()
- This routine is called by the MPI implementation during
MPI_Init after MPIR_proctable is set up. A debugger
can set a breakpoint here to stop after
MPI_Init has set up all MPI processes but before
MPI_Init returns.
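The way these four symbols fit together can be sketched from the implementation's side. This is a minimal illustration, not the MPICH source. The helper publish_proctable is hypothetical and stands in for the relevant part of MPI_Init. The value 1 for MPIR_DEBUG_SPAWNED, the volatile qualifier on MPIR_debug_state, and the void signature of MPIR_Breakpoint are assumptions; attach.h holds the actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Process descriptor, as given in the text above. */
typedef struct {
    char *host_name;        /* Something we can pass to inet_addr */
    char *executable_name;  /* The name of the image */
    int   pid;              /* The pid of the process */
} MPIR_PROCDESC;

/* ASSUMPTION: the value 1 is illustrative; see attach.h for the real one. */
#define MPIR_DEBUG_SPAWNED 1

/* Global symbols that the debugger looks up by name. */
MPIR_PROCDESC *MPIR_proctable      = NULL;
int            MPIR_proctable_size = 0;
volatile int   MPIR_debug_state    = 0;   /* 0 until the table is filled in */

/* Intentionally empty: it exists only so that a debugger can set a
 * breakpoint on it. ASSUMPTION: the real signature may differ. */
void MPIR_Breakpoint(void) {}

/* Hypothetical helper standing in for the part of MPI_Init that publishes
 * the process table. Returns 0 on success, -1 if allocation fails. */
int publish_proctable(int nprocs, char **hosts, char **images, const int *pids)
{
    assert(nprocs > 0);

    MPIR_PROCDESC *table = malloc((size_t)nprocs * sizeof *table);
    if (table == NULL)
        return -1;

    for (int i = 0; i < nprocs; i++) {
        table[i].host_name       = hosts[i];
        table[i].executable_name = images[i];
        table[i].pid             = pids[i];
    }

    /* Publish the table first, then flip the state, then stop for the tool. */
    MPIR_proctable      = table;
    MPIR_proctable_size = nprocs;
    MPIR_debug_state    = MPIR_DEBUG_SPAWNED;
    MPIR_Breakpoint();
    return 0;
}
```

A tool that has stopped at MPIR_Breakpoint can then read MPIR_proctable_size entries from MPIR_proctable in the target's memory and attach to each (host_name, pid) pair.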
For some tools, one more item of information is needed: the location
(nodename, pid, and executable name) of one of the MPI processes
(often the process with rank 0 in MPI_COMM_WORLD).
This is particularly important for systems that run an MPI program on
a different set of processors than the processor where mpirun is
executed. This information is normally not needed by TotalView,
because the normal use of TotalView with an MPI program is to start
the parallel program under TotalView.
An interface that provides this information is still under
development. In the short term, a simple file may be provided with
this information. In the long term, an API for accessing the
information from the system that runs MPI programs will be provided.
This API may simply implement access to a file in a known place, or it
may use a service such as LDAP.
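Since the file format has not been defined, the following is only a sketch of what a tool-side reader for such a file might look like. Everything here is an assumption: the one-line layout "<nodename> <pid> <executable>", the proc_location type, and the parse_location_line function are all hypothetical and are not part of MPICH or TotalView.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for the location of one MPI process. */
typedef struct {
    char node[256];    /* nodename */
    int  pid;          /* process id on that node */
    char image[1024];  /* executable name */
} proc_location;

/* Parses one line of the ASSUMED form "<nodename> <pid> <executable>".
 * Returns 0 on success, -1 if the line is missing or malformed.
 * Executable names that contain spaces are not handled by this sketch. */
int parse_location_line(const char *line, proc_location *out)
{
    assert(out != NULL);
    if (line == NULL)
        return -1;

    /* Field widths keep sscanf from overrunning the fixed-size buffers. */
    int n = sscanf(line, "%255s %d %1023s", out->node, &out->pid, out->image);
    return (n == 3 && out->pid > 0) ? 0 : -1;
}
```

A tool would read the line from wherever the file is eventually placed, call parse_location_line, and then attach to the named process to locate MPIR_proctable.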