This document describes the MPI-1.2 and MPI-2 standards. They
are both extensions to the MPI-1.1 standard. The
MPI-1.2 part of the document contains clarifications and corrections to
the MPI-1.1 standard and defines MPI-1.2. The MPI-2 part of the document describes
additions to the MPI-1 standard and defines MPI-2. These include miscellaneous
topics, process creation and management, one-sided communications,
extended collective operations, external interfaces, I/O, and
additional language bindings.
(c) 1995, 1996, 1997 University of Tennessee, Knoxville, Tennessee. Permission to copy without fee all or part of this material is granted, provided the University of Tennessee copyright notice and the title of this document appear, and notice is given that copying is by permission of the University of Tennessee.
Acknowledgments
This document represents the work of many people who have served on the MPI Forum. The meetings have been attended by dozens of people from many parts of the world. It is the hard and dedicated work of this group that has led to the MPI standard.
The technical development was carried out by subgroups, whose work was reviewed by the full committee. During the period of development of the Message Passing Interface (MPI-2), many people helped with this effort. Those who served as the primary coordinators are:
The following list includes some of the active participants who attended MPI-2 Forum meetings and are not mentioned above.
| Greg Astfalk | Robert Babb | Ed Benson | Rajesh Bordawekar |
| Pete Bradley | Peter Brennan | Ron Brightwell | Maciej Brodowicz |
| Eric Brunner | Greg Burns | Margaret Cahir | Pang Chen |
| Ying Chen | Albert Cheng | Yong Cho | Joel Clark |
| Lyndon Clarke | Laurie Costello | Dennis Cottel | Jim Cownie |
| Zhenqian Cui | Suresh Damodaran-Kamal | Raja Daoud | Judith Devaney |
| David DiNucci | Doug Doefler | Jack Dongarra | Terry Dontje |
| Nathan Doss | Anne Elster | Mark Fallon | Karl Feind |
| Sam Fineberg | Craig Fischberg | Stephen Fleischman | Ian Foster |
| Hubertus Franke | Richard Frost | Al Geist | Robert George |
| David Greenberg | John Hagedorn | Kei Harada | Leslie Hart |
| Shane Hebert | Rolf Hempel | Tom Henderson | Alex Ho |
| Hans-Christian Hoppe | Joefon Jann | Terry Jones | Karl Kesselman |
| Koichi Konishi | Susan Kraus | Steve Kubica | Steve Landherr |
| Mario Lauria | Mark Law | Juan Leon | Lloyd Lewins |
| Ziyang Lu | Bob Madahar | Peter Madams | John May |
| Oliver McBryan | Brian McCandless | Tyce McLarty | Thom McMahon |
| Harish Nag | Nick Nevin | Jarek Nieplocha | Ron Oldfield |
| Peter Ossadnik | Steve Otto | Peter Pacheco | Yoonho Park |
| Perry Partow | Pratap Pattnaik | Elsie Pierce | Paul Pierce |
| Heidi Poxon | Jean-Pierre Prost | Boris Protopopov | James Pruyve |
| Rolf Rabenseifner | Joe Rieken | Peter Rigsbee | Tom Robey |
| Anna Rounbehler | Nobutoshi Sagawa | Arindam Saha | Eric Salo |
| Darren Sanders | Eric Sharakan | Andrew Sherman | Fred Shirley |
| Lance Shuler | A. Gordon Smith | Ian Stockdale | David Taylor |
| Stephen Taylor | Greg Tensa | Rajeev Thakur | Marydell Tholburn |
| Dick Treumann | Simon Tsang | Manuel Ujaldon | David Walker |
| Jerrell Watts | Klaus Wolf | Parkson Wong | Dave Wright |
The MPI Forum also acknowledges and appreciates the valuable input from people via e-mail and in person.
The following institutions supported the MPI-2 effort through time and travel support for the people listed above.
Argonne National Laboratory
Bolt, Beranek, and Newman
California Institute of Technology
Center for Computing Sciences
Convex Computer Corporation
Cray Research
Digital Equipment Corporation
Dolphin Interconnect Solutions, Inc.
Edinburgh Parallel Computing Centre
General Electric Company
German National Research Center for Information Technology
Hewlett-Packard
Hitachi
Hughes Aircraft Company
Intel Corporation
International Business Machines
Khoral Research
Lawrence Livermore National Laboratory
Los Alamos National Laboratory
MPI Software Technology, Inc.
Mississippi State University
NEC Corporation
National Aeronautics and Space Administration
National Energy Research Scientific Computing Center
National Institute of Standards and Technology
National Oceanic and Atmospheric Administration
Oak Ridge National Laboratory
Ohio State University
PALLAS GmbH
Pacific Northwest National Laboratory
Pratt & Whitney
San Diego Supercomputer Center
Sanders, A Lockheed-Martin Company
Sandia National Laboratories
Schlumberger
Scientific Computing Associates, Inc.
Silicon Graphics Incorporated
Sky Computers
Sun Microsystems Computer Corporation
Syracuse University
The MITRE Corporation
Thinking Machines Corporation
United States Navy
University of Colorado
University of Denver
University of Houston
University of Illinois
University of Maryland
University of Notre Dame
University of San Francisco
University of Stuttgart Computing Center
University of Wisconsin
MPI-2 operated on a very tight budget (in reality, it had no budget when the first meeting was announced). Many institutions helped the MPI-2 effort by supporting the efforts and travel of the members of the MPI Forum. Direct support was given by NSF and DARPA under NSF contract CDA-9115428 for travel by U.S. academic participants and Esprit under project HPC Standards (21111) for European participants.
In March 1995, the MPI Forum began meeting to
consider corrections and extensions to the original MPI Standard
document [5]. The first product of these deliberations was
Version 1.1 of the MPI specification, released in June of 1995 (see
http://www.mpi-forum.org for official MPI document releases).
Since that time, effort has been focused on five types of areas.
Corrections and clarifications (items of type 1 in the above list) have been collected in the chapter of this document titled ``Version 1.2 of MPI.'' This chapter also contains the function for identifying the version number. Additions to MPI-1.1 (items of types 2, 3, and 4 in the above list) are in the remaining chapters, and constitute the specification for MPI-2. This document specifies Version 2.0 of MPI. Items of type 5 in the above list have been moved to a separate document, the ``MPI Journal of Development'' (JOD), and are not part of the MPI-2 Standard.
This structure makes it easy for users and implementors to understand what level of MPI compliance a given implementation has:
It is to be emphasized that forward compatibility is preserved. That is, a valid MPI-1.1 program is both a valid MPI-1.2 program and a valid MPI-2 program, and a valid MPI-1.2 program is a valid MPI-2 program.
This document is organized as follows:
The rest of this document contains the MPI-2 Standard Specification. It adds substantial new types of functionality to MPI, in most cases specifying functions for an extended computational model (e.g., dynamic process creation and one-sided communication) or for a significant new capability (e.g., parallel I/O).
The following is a list of the chapters in MPI-2, along with a brief description of each.
MPI-2 provides various interfaces to facilitate interoperability of distinct MPI implementations. Among these are the canonical data representation for MPI I/O and for MPI_PACK_EXTERNAL and MPI_UNPACK_EXTERNAL. The definition of an actual binding of these interfaces that will enable interoperability is outside the scope of this document.
A separate document consists of ideas that were discussed in the MPI Forum and deemed to have value, but are not included in the MPI Standard. They are part of the ``Journal of Development'' (JOD), lest good ideas be lost and in order to provide a starting point for further work. The chapters in the JOD are
This chapter explains notational terms and conventions used throughout the MPI-2 document, some of the choices that have been made, and the rationale behind those choices. It is similar to the MPI-1 Terms and Conventions chapter but differs in some major and minor ways. Some of the major areas of difference are the naming conventions, some semantic definitions, file objects, Fortran 90 vs Fortran 77, C++, processes, and interaction with signals.
Rationale.
Throughout this document, the rationale for the design choices made in
the interface specification is set off in this format. Some readers may
wish to skip these sections, while readers interested in interface design
may want to read them carefully.
( End of rationale.)
Advice to users.
Throughout this document, material aimed at users and that illustrates
usage is set off in this format. Some readers may
wish to skip these sections, while readers interested in programming in MPI
may want to read them carefully.
( End of advice to users.)
Advice to implementors.
Throughout this document, material that is primarily commentary to implementors
is set off in this format. Some readers may
wish to skip these sections, while readers interested in
MPI implementations may want to read them carefully.
( End of advice to implementors.)
MPI-1 used informal naming conventions. In many cases, MPI-1 names for C functions are of the form Class_action_subset and in Fortran of the form CLASS_ACTION_SUBSET, but this rule is not uniformly applied. In MPI-2, an attempt has been made to standardize names of new functions according to the following rules. In addition, the C++ bindings for MPI-1 functions also follow these rules (see Section C++ Binding Issues ). C and Fortran function names for MPI-1 have not been changed.
2. If the routine is not associated with a class, the name
should be of the form Action_subset in C and
ACTION_SUBSET in Fortran,
and in C++ should be scoped in the MPI namespace,
MPI::Action_subset.
3. The names of certain actions have been standardized. In
particular, Create creates a new object, Get
retrieves information about an object, Set sets
this information, Delete deletes information,
Is asks whether or not an object has a certain property.
MPI identifiers are limited to 30 characters (31 with the profiling interface). This is done to avoid exceeding the limit on some compilation systems.
MPI procedures are specified using a language-independent notation. The arguments of procedure calls are marked as IN, OUT or INOUT. The meanings of these are:
Rationale.
The definition of MPI tries to avoid, to the largest possible extent,
the use of INOUT arguments, because such use is error-prone,
especially for scalar arguments.
( End of rationale.)
MPI's use of IN, OUT and INOUT is intended to indicate to the user how
an argument is to be used, but does not provide a rigorous classification
that can be translated directly into all language bindings (e.g.,
INTENT in Fortran 90 bindings or const in C bindings). For instance,
the ``constant'' MPI_BOTTOM can usually be passed to OUT buffer
arguments. Similarly, MPI_STATUS_IGNORE can be passed as the
OUT status argument.
A common occurrence for MPI functions is an argument that is used as IN by some processes and OUT by other processes. Such an argument is, syntactically, an INOUT argument and is marked as such, although, semantically, it is not used in one call both for input and for output on a single process.
Another frequent situation arises when an argument value is needed only by a subset of the processes. When an argument is not significant at a process then an arbitrary value can be passed as an argument.
Unless specified otherwise, an argument of type OUT or type
INOUT cannot be aliased with any other argument passed to an
MPI procedure. An example of argument aliasing in C appears below.
If we define a C procedure like this,
void copyIntBuffer( int *pin, int *pout, int len )
{ int i;
for (i=0; i<len; ++i) *pout++ = *pin++;
}
then a call to it in the following code fragment has aliased arguments.
int a[10]; copyIntBuffer( a, a+3, 7);
Although the C language allows this, such usage of MPI procedures is forbidden unless otherwise specified. Note that Fortran prohibits aliasing of arguments.
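The aliasing rule can be checked defensively in user code before a call is made. The following is a minimal sketch and not part of MPI: the helper name buffers_overlap is hypothetical, and the sketch assumes that converting pointers to uintptr_t yields comparable linear addresses, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MPI routine): returns 1 if the byte ranges
 * [a, a+alen) and [b, b+blen) share at least one byte, 0 otherwise.
 * Assumption: uintptr_t conversions give comparable linear addresses. */
int buffers_overlap(const void *a, size_t alen,
                    const void *b, size_t blen)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;

    if (alen == 0 || blen == 0)
        return 0;               /* an empty range cannot overlap anything */

    /* Two ranges overlap when each one starts before the other ends. */
    return a0 < b0 + blen && b0 < a0 + alen;
}
```

For the fragment above, buffers_overlap(a, 7*sizeof(int), a+3, 7*sizeof(int)) reports an overlap, because the seven integers read from a and the seven written at a+3 share elements 3 through 6.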
All MPI functions are first specified in the language-independent notation. Immediately below this, the ANSI C version of the function is shown followed by a version of the same function in Fortran and then the C++ binding. Fortran in this document refers to Fortran 90; see Section Language Binding .
When discussing MPI procedures the following semantic terms are used.
MPI manages system memory that is used for buffering messages and for storing internal representations of various MPI objects such as groups, communicators, datatypes, etc. This memory is not directly accessible to the user, and objects stored there are opaque: their size and shape is not visible to the user. Opaque objects are accessed via handles, which exist in user space. MPI procedures that operate on opaque objects are passed handle arguments to access these objects. In addition to their use by MPI calls for object access, handles can participate in assignments and comparisons.
In Fortran, all handles have type INTEGER. In C and C++, a different handle type is defined for each category of objects. In addition, handles themselves are distinct objects in C++. The C and C++ types must support the use of the assignment and equality operators.
Advice to implementors.
In Fortran, the handle can be an index into a table of opaque objects in a system table; in C it can be such an index or a pointer to the object. C++ handles can simply ``wrap up'' a table index or pointer.
( End of advice to implementors.)
Opaque objects are allocated and deallocated
by calls that are specific to each object type.
These are listed in the sections where the objects are described.
The calls accept a handle argument of matching type.
In an allocate call this is an OUT argument that
returns a valid reference to the object.
In a call to deallocate this is an INOUT argument which returns
with an ``invalid handle'' value.
MPI provides an ``invalid handle'' constant
for each object type. Comparisons to this constant are used to test for
validity of the handle.
A call to a deallocate routine invalidates the handle and marks the object for deallocation. The object is not accessible to the user after the call. However, MPI need not deallocate the object immediately. Any operation pending (at the time of the deallocate) that involves this object will complete normally; the object will be deallocated afterwards.
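The handle semantics described above can be illustrated with a toy object table. This is only a sketch under stated assumptions: every name in it (Toy_Handle, TOY_HANDLE_NULL, toy_create, toy_free, and so on) is hypothetical, handles are assumed to be table indices, and a simple counter stands in for pending operations. It is not how any particular MPI implementation works.

```c
#define TOY_MAX_OBJECTS 8
#define TOY_HANDLE_NULL (-1)    /* hypothetical "invalid handle" constant */

typedef int Toy_Handle;         /* assumption: a handle is a table index */

struct toy_object {
    int in_use;                 /* slot currently holds an object */
    int freed;                  /* user has already deallocated it */
    int pending;                /* operations still using the object */
};

static struct toy_object toy_table[TOY_MAX_OBJECTS];

/* Allocate: the handle is an OUT argument. Returns 0 on success. */
int toy_create(Toy_Handle *handle)
{
    for (int i = 0; i < TOY_MAX_OBJECTS; ++i) {
        if (!toy_table[i].in_use) {
            toy_table[i].in_use = 1;
            toy_table[i].freed = 0;
            toy_table[i].pending = 0;
            *handle = i;
            return 0;
        }
    }
    *handle = TOY_HANDLE_NULL;
    return 1;
}

/* Record that an operation involving the object has started. */
void toy_start_op(Toy_Handle h) { toy_table[h].pending++; }

/* Release the slot only when it is both freed and idle. */
static void toy_reclaim(int slot)
{
    if (toy_table[slot].freed && toy_table[slot].pending == 0)
        toy_table[slot].in_use = 0;
}

/* Deallocate: the handle is INOUT and comes back invalid. The object is
 * only marked; it persists until pending operations complete. */
void toy_free(Toy_Handle *handle)
{
    int slot = *handle;
    toy_table[slot].freed = 1;
    *handle = TOY_HANDLE_NULL;
    toy_reclaim(slot);
}

/* Complete one pending operation. The slot is named directly because the
 * user's handle may already be invalid at this point. */
void toy_complete_op(int slot)
{
    toy_table[slot].pending--;
    toy_reclaim(slot);
}

/* Inspection helper for the sketch: is the slot still occupied? */
int toy_slot_in_use(int slot) { return toy_table[slot].in_use; }
```

In this sketch, calling toy_free while an operation is pending invalidates the handle at once, but the slot stays occupied until toy_complete_op runs, which mirrors the rule that pending operations complete normally.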
An opaque object and its handle are significant only at the process where the object was created and cannot be transferred to another process.
MPI provides certain predefined opaque objects and predefined, static handles to these objects. The user must not free such objects. In C++, this is enforced by declaring the handles to these predefined objects to be static const.
Rationale.
This design hides the internal representation used for MPI data structures, thus allowing similar calls in C, C++, and Fortran. It also avoids conflicts with the typing rules in these languages, and easily allows future extensions of functionality. The mechanism for opaque objects used here loosely follows the POSIX Fortran binding standard.
The explicit separation of handles in user space and objects in system space allows space-reclaiming and deallocation calls to be made at appropriate points in the user program. If the opaque objects were in user space, one would have to be very careful not to go out of scope before any pending operation requiring that object completed. The specified design allows an object to be marked for deallocation, the user program can then go out of scope, and the object itself still persists until any pending operations are complete.
The requirement that handles support
assignment/comparison is made since
such operations are common.
This restricts the domain of possible implementations.
The alternative would have been
to allow handles to have been an arbitrary, opaque type. This would
force the introduction of routines to do assignment and comparison, adding
complexity, and was therefore ruled out.
( End of rationale.)
Advice to users.
A user may accidentally create a dangling reference by assigning to a
handle the value of another handle, and then deallocating the object
associated with these handles. Conversely, if a handle variable is
deallocated before the associated object is freed, then the object
becomes inaccessible (this may occur, for example, if the handle is a
local variable within a subroutine, and the subroutine is exited
before the associated object is deallocated). It is the user's
responsibility to avoid adding or deleting references to opaque
objects, except as a result of MPI calls that allocate or deallocate
such objects.
( End of advice to users.)
Advice to implementors.
The intended semantics of opaque objects is that opaque objects are separate
from one another; each call to allocate such an object copies all the information
required for the object. Implementations may avoid excessive copying by
substituting referencing for copying. For example, a derived datatype
may contain
references to its components, rather than copies of its components; a call to
MPI_COMM_GROUP may return a reference to the group associated with
the communicator, rather than a copy of this group. In such cases, the
implementation must maintain reference counts, and allocate and deallocate
objects in such a way that the visible effect is as if the objects were copied.
( End of advice to implementors.)
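The reference-counting approach mentioned in the advice above might look roughly like the following. This is a sketch only: the type toy_group and the functions toy_group_new, toy_group_dup and toy_group_free are hypothetical, and the sketch assumes the object is never modified after creation, so that sharing is indistinguishable from copying.

```c
#include <stdlib.h>

/* Hypothetical immutable object shared between several handles. */
struct toy_group {
    int refcount;               /* number of handles referring to it */
    int size;                   /* payload; never changed after creation */
};

/* Create a new object holding exactly one reference. */
struct toy_group *toy_group_new(int size)
{
    struct toy_group *g = malloc(sizeof *g);
    if (g != NULL) {
        g->refcount = 1;
        g->size = size;
    }
    return g;
}

/* "Copy" the object by sharing it and bumping the count. */
struct toy_group *toy_group_dup(struct toy_group *g)
{
    g->refcount++;
    return g;
}

/* Drop one reference and invalidate the caller's handle.
 * Returns 1 if the storage was actually released, 0 otherwise. */
int toy_group_free(struct toy_group **gp)
{
    struct toy_group *g = *gp;
    *gp = NULL;
    if (--g->refcount == 0) {
        free(g);
        return 1;
    }
    return 0;
}
```

Freeing the first handle leaves the second fully usable, so the visible effect is as if each handle owned its own copy.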
An MPI call may need an argument that is an array of opaque objects, or an array of handles. The array-of-handles is a regular array with entries that are handles to objects of the same type in consecutive locations in the array. Whenever such an array is used, an additional len argument is required to indicate the number of valid entries (unless this number can be derived otherwise). The valid entries are at the beginning of the array; len indicates how many of them there are, and need not be the size of the entire array. The same approach is followed for other array arguments. In some cases NULL handles are considered valid entries. When a NULL argument is desired for an array of statuses, one uses MPI_STATUSES_IGNORE.
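As an illustration of the array-of-handles convention, the sketch below walks only the first len entries of a handle array and treats null handles as valid entries to be skipped. The names Toy_Request, TOY_REQUEST_NULL and toy_count_active are hypothetical and are not MPI definitions.

```c
typedef int Toy_Request;            /* hypothetical handle type */
#define TOY_REQUEST_NULL (-1)       /* hypothetical null handle */

/* Count the non-null handles among the first len entries.
 * Entries at index len and beyond are never read, so the array
 * may be larger than len. */
int toy_count_active(const Toy_Request requests[], int len)
{
    int active = 0;
    for (int i = 0; i < len; ++i) {
        if (requests[i] != TOY_REQUEST_NULL)
            ++active;
    }
    return active;
}
```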
At various places, MPI procedures use arguments with state types. The values of such a data type are all identified by names, and no operation is defined on them. For example, the MPI_TYPE_CREATE_SUBARRAY routine has a state argument order with values MPI_ORDER_C and MPI_ORDER_FORTRAN.
MPI procedures sometimes assign a special meaning to a special value of a basic type argument; e.g., tag is an integer-valued argument of point-to-point communication operations, with a special wild-card value, MPI_ANY_TAG. Such arguments will have a range of regular values, which is a proper subrange of the range of values of the corresponding basic type; special values (such as MPI_ANY_TAG) will be outside the regular range. The range of regular values, such as tag, can be queried using environmental inquiry functions (Chapter 7 of the MPI-1 document). The range of other values, such as source, depends on values given by other MPI routines (in the case of source it is the communicator size).
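The separation between regular and special values can be sketched as follows. The constants TOY_ANY_TAG and TOY_TAG_UB and both functions are hypothetical; the actual values of MPI_ANY_TAG and of the tag upper bound are implementation dependent and should be obtained from the implementation, not assumed.

```c
#define TOY_ANY_TAG (-1)        /* hypothetical wildcard, outside the regular range */
#define TOY_TAG_UB  32767       /* hypothetical upper bound of the regular range */

/* A tag is regular when it lies in 0..TOY_TAG_UB. */
int toy_is_regular_tag(int tag)
{
    return tag >= 0 && tag <= TOY_TAG_UB;
}

/* Receive-side matching: the wildcard matches any incoming tag,
 * a regular posted tag matches only an equal incoming tag. */
int toy_tag_matches(int posted, int incoming)
{
    return posted == TOY_ANY_TAG || posted == incoming;
}
```

Because the wildcard lies outside the regular range, it can never be confused with a tag that a sender legitimately supplied.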
MPI also provides predefined named constant handles, such as MPI_COMM_WORLD.
All named constants, with the exceptions noted below for Fortran, can be used in initialization expressions or assignments. These constants do not change values during execution. Opaque objects accessed by constant handles are defined and do not change value between MPI initialization (MPI_INIT) and MPI completion (MPI_FINALIZE).
The constants that cannot be used in initialization expressions or
assignments in Fortran are:
MPI_BOTTOM MPI_STATUS_IGNORE MPI_STATUSES_IGNORE MPI_ERRCODES_IGNORE MPI_IN_PLACE MPI_ARGV_NULL MPI_ARGVS_NULL
Advice to implementors.
In Fortran the implementation of these special constants may require the
use of language constructs that are outside the Fortran
standard. Using special values for the constants (e.g., by defining
them through parameter statements) is not possible because an
implementation cannot distinguish these values from legal
data. Typically, these constants are implemented as predefined
static variables (e.g., a variable in an MPI-declared COMMON
block), relying on the fact that the target compiler passes data by
address. Inside the subroutine, this address can be extracted by some
mechanism outside the Fortran standard (e.g., by Fortran extensions or
by implementing the function in C).
( End of advice to implementors.)
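The idea of recognizing a special constant by its address instead of its value can be shown with a C analogue. This is a sketch under stated assumptions, not a Fortran implementation: toy_status_ignore_obj, TOY_STATUS_IGNORE and toy_fill_status are hypothetical names, and the static variable plays the role that a variable in an MPI-declared COMMON block would play in Fortran.

```c
/* Hypothetical sentinel object; only its address is meaningful. */
int toy_status_ignore_obj;
#define TOY_STATUS_IGNORE (&toy_status_ignore_obj)

/* Store a completion code unless the caller passed the sentinel.
 * Any integer value is legal data, so the test compares addresses,
 * not contents. Returns 1 if the status was written, 0 if ignored. */
int toy_fill_status(int *status, int code)
{
    if (status == TOY_STATUS_IGNORE)
        return 0;               /* caller asked for no status */
    *status = code;
    return 1;
}
```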
MPI functions sometimes use arguments with a choice (or union) data type. Distinct calls to the same routine may pass by reference actual arguments of different types. The mechanism for providing such arguments will differ from language to language. For Fortran, the document uses <type> to represent a choice variable; for C and C++, we use void *.
Some MPI procedures use address arguments that represent an absolute address in the calling program. The datatype of such an argument is MPI_Aint in C, MPI::Aint in C++ and INTEGER (KIND=MPI_ADDRESS_KIND) in Fortran. There is the MPI constant MPI_BOTTOM to indicate the start of the address range.
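To make the notion of an address-sized integer concrete, the sketch below computes the displacement of structure members from the start of the structure. Toy_Aint is a hypothetical stand-in for MPI_Aint, defined here as intptr_t, and the two helper functions are likewise hypothetical. The sketch assumes a flat address space in which subtracting converted pointers gives a byte distance.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Toy_Aint;      /* hypothetical stand-in for MPI_Aint */

struct particle {
    char   tag;
    double position[3];
    int    id;
};

/* Address of a location expressed as an integer.
 * Assumption: intptr_t can represent every object address. */
Toy_Aint toy_address_of(const void *location)
{
    return (Toy_Aint)(intptr_t)location;
}

/* Byte displacement of a member relative to the start of its structure.
 * For members of one structure this agrees with offsetof. */
Toy_Aint toy_displacement(const void *base, const void *member)
{
    return toy_address_of(member) - toy_address_of(base);
}
```

Displacements obtained this way are the kind of value that is typically supplied when describing the layout of a structure to a datatype constructor.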
For I/O there is a need to give the size, displacement, and offset into a file. These quantities can easily be larger than 32 bits which can be the default size of a Fortran integer. To overcome this, these quantities are declared to be INTEGER (KIND=MPI_OFFSET_KIND) in Fortran. In C one uses MPI_Offset whereas in C++ one uses MPI::Offset.
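The need for a wider offset type shows up in ordinary arithmetic. In the sketch below, Toy_Offset is a hypothetical stand-in for MPI_Offset, assumed here to be 64 bits wide; the function names are also hypothetical. With three million records of 1024 bytes each, the byte offset is 3,072,000,000, which exceeds the largest signed 32-bit value of 2,147,483,647.

```c
#include <stdint.h>

typedef int64_t Toy_Offset;     /* hypothetical stand-in for MPI_Offset */

/* Byte offset of record number `index` in a file of fixed-size records.
 * Both operands are widened before multiplying so that the product is
 * computed in 64 bits and cannot overflow 32-bit arithmetic. */
Toy_Offset toy_record_offset(int32_t index, int32_t record_bytes)
{
    return (Toy_Offset)index * (Toy_Offset)record_bytes;
}

/* Would this offset have fit in a signed 32-bit integer? */
int toy_fits_in_32_bits(Toy_Offset off)
{
    return off >= INT32_MIN && off <= INT32_MAX;
}
```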
This section defines the rules for MPI language binding in general and for Fortran, ANSI C, and C++, in particular. (Note that ANSI C has been replaced by ISO C. References in MPI to ANSI C now mean ISO C.) Defined here are various object representations, as well as the naming conventions used for expressing this standard. The actual calling sequences are defined elsewhere.
MPI bindings are for Fortran 90, though they are designed to be usable in Fortran 77 environments.
Since the word PARAMETER is a keyword in the Fortran language, we use the word ``argument'' to denote the arguments to a subroutine. These are normally referred to as parameters in C and C++; however, we expect that C and C++ programmers will understand the word ``argument'' (which has no specific meaning in C/C++), thus allowing us to avoid unnecessary confusion for Fortran programmers.
Since Fortran is case insensitive, linkers may use either lower case or upper case when resolving Fortran names. Users of case sensitive languages should avoid the ``mpi_'' and ``pmpi_'' prefixes.
A number of chapters refer to deprecated or replaced MPI-1 constructs. These are constructs that continue to be part of the MPI standard, but that users are recommended not to continue using, since MPI-2 provides better solutions. For example, the Fortran binding for MPI-1 functions that have address arguments uses INTEGER. This is not consistent with the C binding, and causes problems on machines with 32 bit INTEGERs and 64 bit addresses. In MPI-2, these functions have new names, and new bindings for the address arguments. The use of the old functions is deprecated. For consistency, here and a few other cases, new C functions are also provided, even though the new functions are equivalent to the old functions. The old names are deprecated. Another example is provided by the MPI-1 predefined datatypes MPI_UB and MPI_LB. They are deprecated, since their use is awkward and error-prone, while the MPI-2 function MPI_TYPE_CREATE_RESIZED provides a more convenient mechanism to achieve the same effect.
The following is a list of all of the deprecated constructs. Note that the constants MPI_LB and MPI_UB are replaced by the function MPI_TYPE_CREATE_RESIZED; this is because their principal use was as input datatypes to MPI_TYPE_STRUCT to create resized datatypes. Also note that some C typedefs and Fortran subroutine names are included in this list; they are the types of callback functions.
| Deprecated | MPI-2 Replacement |
| MPI_ADDRESS | MPI_GET_ADDRESS |
| MPI_TYPE_HINDEXED | MPI_TYPE_CREATE_HINDEXED |
| MPI_TYPE_HVECTOR | MPI_TYPE_CREATE_HVECTOR |
| MPI_TYPE_STRUCT | MPI_TYPE_CREATE_STRUCT |
| MPI_TYPE_EXTENT | MPI_TYPE_GET_EXTENT |
| MPI_TYPE_UB | MPI_TYPE_GET_EXTENT |
| MPI_TYPE_LB | MPI_TYPE_GET_EXTENT |
| MPI_LB | MPI_TYPE_CREATE_RESIZED |
| MPI_UB | MPI_TYPE_CREATE_RESIZED |
| MPI_ERRHANDLER_CREATE | MPI_COMM_CREATE_ERRHANDLER |
| MPI_ERRHANDLER_GET | MPI_COMM_GET_ERRHANDLER |
| MPI_ERRHANDLER_SET | MPI_COMM_SET_ERRHANDLER |
| MPI_Handler_function | MPI_Comm_errhandler_fn |
| MPI_KEYVAL_CREATE | MPI_COMM_CREATE_KEYVAL |
| MPI_KEYVAL_FREE | MPI_COMM_FREE_KEYVAL |
| MPI_DUP_FN | MPI_COMM_DUP_FN |
| MPI_NULL_COPY_FN | MPI_COMM_NULL_COPY_FN |
| MPI_NULL_DELETE_FN | MPI_COMM_NULL_DELETE_FN |
| MPI_Copy_function | MPI_Comm_copy_attr_function |
| COPY_FUNCTION | COMM_COPY_ATTR_FN |
| MPI_Delete_function | MPI_Comm_delete_attr_function |
| DELETE_FUNCTION | COMM_DELETE_ATTR_FN |
| MPI_ATTR_DELETE | MPI_COMM_DELETE_ATTR |
| MPI_ATTR_GET | MPI_COMM_GET_ATTR |
| MPI_ATTR_PUT | MPI_COMM_SET_ATTR |
MPI-1.1 provided bindings for Fortran 77. MPI-2 retains these bindings but they are now interpreted in the context of the Fortran 90 standard. MPI can still be used with most Fortran 77 compilers, as noted below. When the term Fortran is used it means Fortran 90.
All MPI names have an MPI_ prefix, and all characters are capitals. Programs must not declare variables, parameters, or functions with names beginning with the prefix MPI_. To avoid conflicting with the profiling interface, programs should also avoid functions with the prefix PMPI_. This is mandated to avoid possible name collisions.
All MPI Fortran subroutines have a return code in the last argument. A few MPI operations which are functions do not have the return code argument. The return code value for successful completion is MPI_SUCCESS. Other error codes are implementation dependent; see the error codes in Chapter 7 of the MPI-1 document and Annex Language Binding in the MPI-2 document.
Constants representing the maximum length of a string are one smaller in Fortran than in C and C++ as discussed in Section Constants .
Handles are represented in Fortran as INTEGERs. Binary-valued variables are of type LOGICAL.
Array arguments are indexed from one.
The MPI Fortran binding is inconsistent with the Fortran 90 standard in several respects. These inconsistencies, such as register optimization problems, have implications for user codes that are discussed in detail in Section A Problem with Register Optimization. The binding is also inconsistent with Fortran 77.
We use the ANSI C declaration format. All MPI names have an MPI_ prefix, defined constants are in all capital letters, and defined types and functions have one capital letter after the prefix. Programs must not declare variables or functions with names beginning with the prefix MPI_. To support the profiling interface, programs should not declare functions with names beginning with the prefix PMPI_.
The definition of named constants, function prototypes, and type definitions must be supplied in an include file mpi.h.
Almost all C functions return an error code. The successful return code will be MPI_SUCCESS, but failure return codes are implementation dependent.
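Since only the success code is fixed, portable callers compare a return value against the success constant and do not depend on any particular failure value. The sketch below shows that pattern with hypothetical names (TOY_SUCCESS, toy_send, toy_send_ok); the failure value 17 is arbitrary and stands for an implementation-dependent code.

```c
#define TOY_SUCCESS 0           /* hypothetical success code */

/* Hypothetical library call: returns TOY_SUCCESS, or some nonzero
 * implementation-dependent code when given a negative count. */
int toy_send(const void *buf, int count)
{
    (void)buf;                  /* the buffer is not examined in this sketch */
    return count < 0 ? 17 : TOY_SUCCESS;
}

/* Caller pattern: test against the success code only, never against
 * a specific failure value. Returns 1 on success, 0 on failure. */
int toy_send_ok(const void *buf, int count)
{
    return toy_send(buf, count) == TOY_SUCCESS;
}
```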
Type declarations are provided for handles to each category of opaque objects.
Array arguments are indexed from zero.
Logical flags are integers with value 0 meaning ``false'' and a non-zero value meaning ``true.''
Choice arguments are pointers of type void *.
Address arguments are of MPI defined type MPI_Aint. File displacements are of type MPI_Offset. MPI_Aint is defined to be an integer of the size needed to hold any valid address on the target architecture. MPI_Offset is defined to be an integer of the size needed to hold any valid file size on the target architecture.
There are places in the standard that give rules for C and not for C++. In these cases, the C rule should be applied to the C++ case, as appropriate. In particular, the values of constants given in the text are the ones for C and Fortran. A cross index of these with the C++ names is given in Annex Language Binding .
We use the ANSI C++ declaration format. All MPI names are declared within the scope of a namespace called MPI and therefore are referenced with an MPI:: prefix. Defined constants are in all capital letters, and class names, defined types, and functions have only their first letter capitalized. Programs must not declare variables or functions in the MPI namespace. This is mandated to avoid possible name collisions.
The definition of named constants, function prototypes, and type definitions must be supplied in an include file mpi.h.
Advice to implementors.
The file mpi.h may contain both the C and C++ definitions.
Usually one can simply use the defined value (generally __cplusplus,
but not required) to see if one is using
C++ to protect the C++ definitions. It is possible that a C compiler
will require that the source protected this way be legal C code. In
this case, all the C++ definitions can be placed in a different
include file and the ``#include'' directive can be used to include the
necessary C++ definitions in the mpi.h file.
( End of advice to implementors.)
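A header laid out along the lines of the advice above might be sketched as follows. This is not the content of any real mpi.h: the names Toy_Comm, TOY_COMM_NULL and the Toy namespace are hypothetical, and the sketch assumes that the compiler defines __cplusplus when compiling C++.

```c
#ifndef TOY_H_INCLUDED
#define TOY_H_INCLUDED

/* Definitions shared by C and C++ translation units. */
typedef int Toy_Comm;                       /* hypothetical handle type */
#define TOY_COMM_NULL ((Toy_Comm)-1)        /* hypothetical null handle */

#ifdef __cplusplus
/* Seen only by a C++ compiler. If a C compiler required the skipped
 * text to be legal C, this part could move to a separate file that is
 * pulled in here with an #include directive. */
namespace Toy {
    class Comm {
    public:
        Comm() : handle_(TOY_COMM_NULL) {}
        bool is_null() const { return handle_ == TOY_COMM_NULL; }
    private:
        Toy_Comm handle_;
    };
}
#endif /* __cplusplus */

#endif /* TOY_H_INCLUDED */
```

A C translation unit that includes this header sees only the typedef and the constant, while a C++ translation unit additionally sees the class.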
C++ functions that create objects or return information usually place
the object or information in the return value. Since the language
neutral prototypes of MPI functions include the C++ return value as
an OUT parameter, semantic descriptions of MPI functions refer to
the C++ return value by that parameter name (see
Section Function Name Cross Reference).
The remaining C++ functions return void.
In some circumstances, MPI permits users to indicate that they do not want a return value. For example, the user may indicate that the status is not filled in. Unlike C and Fortran where this is achieved through a special input value, in C++ this is done by having two bindings where one has the optional argument and one does not.
C++ functions do not return error codes. If the default error handler has been set to MPI::ERRORS_THROW_EXCEPTIONS, the C++ exception mechanism is used to signal an error by throwing an MPI::Exception object.
It should be noted that the default error handler (i.e., MPI::ERRORS_ARE_FATAL) on a given type has not changed. User error handlers are also permitted. MPI::ERRORS_RETURN simply returns control to the calling function; there is no provision for the user to retrieve the error code.
User callback functions that return integer error codes should not throw exceptions; the returned error will be handled by the MPI implementation by invoking the appropriate error handler.
Advice to users.
C++ programmers who want to handle MPI errors on their own should
use the MPI::ERRORS_THROW_EXCEPTIONS error handler, rather
than MPI::ERRORS_RETURN, which is used for that purpose in
C. Care should be taken when using exceptions in mixed language
situations.
( End of advice to users.)
Opaque object handles must be objects in themselves, and have the
assignment and equality operators overridden to perform semantically
like their C and Fortran counterparts.
Array arguments are indexed from zero.
Logical flags are of type bool.
Choice arguments are pointers of type void *.
Address arguments are of MPI-defined integer type MPI::Aint, defined to be an integer of the size needed to hold any valid address on the target architecture. Analogously, MPI::Offset is an integer to hold file offsets.
Most MPI functions are methods of MPI C++ classes. MPI class names are generated from the language neutral MPI types by dropping the MPI_ prefix and scoping the type within the MPI namespace. For example, MPI_DATATYPE becomes MPI::Datatype.
The names of MPI-2 functions generally follow the naming rules given. In some circumstances, the new MPI-2 function is related to an MPI-1 function with a name that does not follow the naming conventions. In this circumstance, the language neutral name is in analogy to the MPI-1 name even though this gives an MPI-2 name that violates the naming conventions. The C and Fortran names are the same as the language neutral name in this case. However, the C++ names for MPI-1 do reflect the naming rules and can differ from the C and Fortran names. Thus, the C++ name analogous to the MPI-1 name differs from the language neutral name. An example of this is the language neutral name of MPI_FINALIZED and a C++ name of MPI::Is_finalized.
In C++, function typedefs are made publicly within appropriate classes. However, these declarations then become somewhat cumbersome. For example, the typedef
MPI::Grequest::Query_function
would look like the following when written out in full:
namespace MPI {
  class Request {
    // ...
  };
  class Grequest : public MPI::Request {
  public:
    // ...
    typedef int Query_function(void* extra_state, MPI::Status& status);
  };
}
Rather than including this scaffolding when declaring
C++ typedefs, we use an abbreviated form. In
particular, we explicitly indicate the class and namespace scope for
the typedef of the function. Thus, the example above is
shown in the text as follows:
typedef int MPI::Grequest::Query_function(void* extra_state,
MPI::Status& status)
The C++ bindings presented in Annex MPI-1 C++ Language Binding and throughout this document were generated by applying a simple set of name generation rules to the MPI function specifications. While these guidelines may be sufficient in most cases, they may not be suitable for all situations. In cases of ambiguity or where a specific semantic statement is desired, these guidelines may be superseded as the situation dictates.
2. Arrays of MPI handles are always left in the argument list
(whether they are IN or OUT arguments).
3. If the argument list of an MPI function contains a scalar IN
handle, and it makes sense to define the function as a method of the
object corresponding to that handle, the function is made a member
function of the corresponding MPI class.
The member functions are named according to the corresponding MPI
function name, but without the "MPI_" prefix and without
the object name prefix (if applicable). In addition:
2. The function is declared const.
5. If the argument list contains a single OUT argument that is
not of type MPI_STATUS (or an array), that argument is
dropped from the list and the function returns that value.
Example
The C++ binding for MPI_COMM_SIZE is
int MPI::Comm::Get_size(void) const.
6. If there are multiple OUT arguments in the argument list, one
is chosen as the return value and is removed from the list.
7. If the argument list does not contain any OUT arguments,
the function returns void.
Example
The C++ binding for MPI_REQUEST_FREE is
void MPI::Request::Free(void).
8. MPI functions to which the above rules do not apply are not
members of any class, but are defined in the MPI namespace.
Example
The C++ binding for MPI_BUFFER_ATTACH is
void MPI::Attach_buffer(void* buffer, int size).
9. All class names, defined types, and function names have only
their first letter capitalized. Defined constants are in all
capital letters.
10. Any IN pointer, reference, or array argument must be declared
const.
11. Handles are passed by reference.
12. Array arguments are denoted with square brackets ([]), not
pointers, as this is more semantically precise.
An MPI program consists of autonomous processes, executing their own code, in a MIMD style. The codes executed by each process need not be identical. The processes communicate via calls to MPI communication primitives. Typically, each process executes in its own address space, although shared-memory implementations of MPI are possible.
This document specifies the behavior of a parallel program assuming that only MPI calls are used. The interaction of an MPI program with other possible means of communication, I/O, and process management is not specified. Unless otherwise stated in the specification of the standard, MPI places no requirements on the result of its interaction with external mechanisms that provide similar or equivalent functionality. This includes, but is not limited to, interactions with external mechanisms for process control, shared and remote memory access, file system access and control, interprocess communication, process signaling, and terminal I/O. High quality implementations should strive to make the results of such interactions intuitive to users, and attempt to document restrictions where deemed necessary.
Advice to implementors.
Implementations that support such additional mechanisms for
functionality supported within MPI are expected to document how
these interact with MPI.
(End of advice to implementors.)
The interaction of MPI and threads is defined in Section MPI and Threads.
MPI provides the user with reliable message transmission. A message sent is always received correctly, and the user does not need to check for transmission errors, time-outs, or other error conditions. In other words, MPI does not provide mechanisms for dealing with failures in the communication system. If the MPI implementation is built on an unreliable underlying mechanism, then it is the job of the implementor of the MPI subsystem to insulate the user from this unreliability, or to reflect unrecoverable errors as failures. Whenever possible, such failures will be reflected as errors in the relevant communication call. Similarly, MPI itself provides no mechanisms for handling processor failures.
Of course, MPI programs may still be erroneous. A program error can occur when an MPI call is made with an incorrect argument (non-existing destination in a send operation, buffer too small in a receive operation, etc.). This type of error would occur in any implementation. In addition, a resource error may occur when a program exceeds the amount of available system resources (number of pending messages, system buffers, etc.). The occurrence of this type of error depends on the amount of available resources in the system and the resource allocation mechanism used; this may differ from system to system. A high-quality implementation will provide generous limits on the important resources so as to alleviate the portability problem this represents.
In C and Fortran, almost all MPI calls return a code that indicates successful completion of the operation. Whenever possible, MPI calls return an error code if an error occurred during the call. By default, an error detected during the execution of the MPI library causes the parallel computation to abort, except for file operations. However, MPI provides mechanisms for users to change this default and to handle recoverable errors. The user may specify that no error is fatal, and handle error codes returned by MPI calls by himself or herself. Also, the user may provide his or her own error-handling routines, which will be invoked whenever an MPI call returns abnormally. The MPI error handling facilities are described in Chapter 7 of the MPI-1 document and in Section Error Handlers of this document. The return values of C++ functions are not error codes. If the default error handler has been set to MPI::ERRORS_THROW_EXCEPTIONS, the C++ exception mechanism is used to signal an error by throwing an MPI::Exception object.
Several factors limit the ability of MPI calls to return with meaningful error codes when an error occurs. MPI may not be able to detect some errors; other errors may be too expensive to detect in normal execution mode; finally, some errors may be "catastrophic" and may prevent MPI from returning control to the caller in a consistent state.
Another subtle issue arises because of the nature of asynchronous communications: MPI calls may initiate operations that continue asynchronously after the call has returned. Thus, the call may return with a code indicating successful completion, yet the operation may later cause an error exception to be raised. If there is a subsequent call that relates to the same operation (e.g., a call that verifies that an asynchronous operation has completed), then the error argument associated with this call will be used to indicate the nature of the error. In a few cases, the error may occur after all calls that relate to the operation have completed, so that no error value can be used to indicate the nature of the error (e.g., an error on the receiver in a send with the ready mode). Such an error must be treated as fatal, since information cannot be returned for the user to recover from it.
This document does not specify the state of a computation after an erroneous MPI call has occurred. The desired behavior is that a relevant error code be returned, and the effect of the error be localized to the greatest possible extent. E.g., it is highly desirable that an erroneous receive call will not cause any part of the receiver's memory to be overwritten, beyond the area specified for receiving the message.
Implementations may go beyond this document by supporting, in a meaningful manner, MPI calls that are defined here to be erroneous. For example, MPI specifies strict type matching rules between matching send and receive operations: it is erroneous to send a floating point variable and receive an integer. Implementations may go beyond these type matching rules, and provide automatic type conversion in such situations. It will be helpful to generate warnings for such non-conforming behavior.
MPI-2 defines a way for users to create new error codes as defined in Section Error Classes, Error Codes, and Error Handlers .
There are a number of areas where an MPI implementation may interact with the operating environment and system. While MPI does not mandate that any services (such as signal handling) be provided, it does strongly suggest the behavior to be provided if those services are available. This is an important point in achieving portability across platforms that provide the same set of services.
MPI programs require that library routines that are part of the basic language environment (such as write in Fortran and printf and malloc in ANSI C) and are executed after MPI_INIT and before MPI_FINALIZE operate independently and that their completion is independent of the action of other processes in an MPI program.
Note that this in no way prevents the creation of library routines that
provide parallel services whose operation is collective. However, the
following program is expected to complete in an ANSI C environment
regardless of the size of MPI_COMM_WORLD (assuming that
printf is available at the executing nodes).
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int rank;
    MPI_Init((void *)0, (void *)0);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("Starting program\n");
    MPI_Finalize();
    return 0;
}
The corresponding Fortran and C++ programs are also expected to
complete.
An example of what is not required is any particular ordering
of the action of these routines when called by several tasks. For
example, MPI makes neither requirements nor recommendations for the
output from the following program (again assuming that
I/O is available at the executing nodes).
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("Output from task rank %d\n", rank);
In addition, calls that fail because of resource exhaustion or other
error are not considered a violation of the requirements here (however,
they are required to complete, just not to complete successfully).
MPI does not specify the interaction of processes with signals and does not require that MPI be signal safe. The implementation may reserve some signals for its own use. It is required that the implementation document which signals it uses, and it is strongly recommended that it not use SIGALRM, SIGFPE, or SIGIO. Implementations may also prohibit the use of MPI calls from within signal handlers.
In multithreaded environments, users can avoid conflicts between signals and the MPI library by catching signals only on threads that do not execute MPI calls. High quality single-threaded implementations will be signal safe: an MPI call suspended by a signal will resume and complete normally after the signal is handled.
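The multithreaded approach described above can be sketched with POSIX threads. This is a minimal illustration, not part of MPI: the names start_signal_thread, signal_thread, and last_signal are hypothetical, SIGUSR1 is an arbitrary choice, and the availability of pthread_sigmask and sigwait is assumed. The signal is blocked in the thread that would make MPI calls, and a dedicated thread that makes no MPI calls waits for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Signals handled by the dedicated thread (SIGUSR1 is an arbitrary choice). */
static sigset_t handled_signals;

/* Number of the last signal received by the dedicated thread, 0 if none. */
static volatile sig_atomic_t last_signal = 0;

/* Runs in a thread that makes no MPI calls: waits synchronously for one
 * signal from handled_signals and records its number. */
static void *signal_thread(void *arg)
{
    int signo = 0;
    (void)arg;
    if (sigwait(&handled_signals, &signo) == 0)
        last_signal = signo;
    return NULL;
}

/* Call from the thread that will make MPI calls, before creating any other
 * thread. Blocks the handled signals in the caller (threads created later
 * inherit this mask) and starts the dedicated thread.
 * Returns 0 on success, or an error number on failure. */
int start_signal_thread(pthread_t *tid)
{
    int err;

    sigemptyset(&handled_signals);
    sigaddset(&handled_signals, SIGUSR1);

    err = pthread_sigmask(SIG_BLOCK, &handled_signals, NULL);
    if (err != 0)
        return err;

    return pthread_create(tid, NULL, signal_thread, NULL);
}
```

Which signals an implementation reserves, and whether it creates threads of its own, is implementation specific, so the signal chosen here must not be one that the implementation documents as reserved.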
The examples in this document are for illustration purposes only. They are not intended to specify the standard. Furthermore, the examples have not been carefully checked or verified.
This section contains clarifications and minor corrections to Version 1.1 of the MPI Standard. The only new function in MPI-1.2 is one for identifying which version of the MPI Standard the implementation being used conforms to. There are small differences between MPI-1 and MPI-1.1. There are very few differences (only those discussed in this chapter) between MPI-1.1 and MPI-1.2, but large differences (the rest of this document) between MPI-1.2 and MPI-2.
In order to cope with changes to the MPI Standard, there are both compile-time and run-time ways to determine which version of the standard is in use in the environment one is using.
The "version" will be represented by two separate integers, for the version and subversion:
In C and C++,
#define MPI_VERSION 1
#define MPI_SUBVERSION 2
in Fortran,
INTEGER MPI_VERSION, MPI_SUBVERSION
PARAMETER (MPI_VERSION = 1)
PARAMETER (MPI_SUBVERSION = 2)
For runtime determination,
| MPI_GET_VERSION( version, subversion ) | |
| OUT version | version number (integer) |
| OUT subversion | subversion number (integer) |
int MPI_Get_version(int *version, int *subversion)
MPI_GET_VERSION(VERSION, SUBVERSION, IERROR)
INTEGER VERSION, SUBVERSION, IERROR
MPI_GET_VERSION is one of the few functions that can be called before MPI_INIT and after MPI_FINALIZE. Its C++ binding can be found in the Annex, Section C++ Bindings for New 1.2 Functions .
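Since the version is reported as two separate integers, a program that needs a minimum level of the standard has to compare the pair in order. The following C sketch shows one way to do this; the helper name mpi_version_at_least is hypothetical and is not part of MPI.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of MPI): returns true when the pair
 * (version, subversion) is at least (want_version, want_subversion).
 * The version number is compared first; the subversion only matters
 * when the version numbers are equal. */
bool mpi_version_at_least(int version, int subversion,
                          int want_version, int want_subversion)
{
    if (version != want_version)
        return version > want_version;
    return subversion >= want_subversion;
}
```

In a real program the first two arguments would come from a call to MPI_Get_version(&version, &subversion) at run time, or from the MPI_VERSION and MPI_SUBVERSION constants at compile time.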
As experience has been gained since the releases of the 1.0 and 1.1 versions of the MPI Standard, it has become apparent that some specifications were insufficiently clear. In this section we attempt to make clear the intentions of the MPI Forum with regard to the behavior of several MPI-1 functions. An MPI-1-compliant implementation should behave in accordance with the clarifications in this section.
MPI_INITIALIZED returns true if the calling process has called MPI_INIT. Whether MPI_FINALIZE has been called does not affect the behavior of MPI_INITIALIZED.
MPI_FINALIZE cleans up all MPI state. Each process must call MPI_FINALIZE before it exits. Unless there has been a call to MPI_ABORT, each process must ensure that all pending non-blocking communications are (locally) complete before calling MPI_FINALIZE. Further, at the instant at which the last process calls MPI_FINALIZE, all pending sends must be matched by a receive, and all pending receives must be matched by a send.
For example, the following program is correct:
Process 0                Process 1
---------                ---------
MPI_Init();              MPI_Init();
MPI_Send(dest=1);        MPI_Recv(src=0);
MPI_Finalize();          MPI_Finalize();
Without the matching receive, the program is erroneous:
Process 0                Process 1
-----------              -----------
MPI_Init();              MPI_Init();
MPI_Send(dest=1);
MPI_Finalize();          MPI_Finalize();
A successful return from a blocking communication operation or from MPI_WAIT or MPI_TEST tells the user that the buffer can be reused and means that the communication is completed by the user, but does not guarantee that the local process has no more work to do. A successful return from MPI_REQUEST_FREE with a request handle generated by an MPI_ISEND nullifies the handle but provides no assurance of operation completion. The MPI_ISEND is complete only when it is known by some means that a matching receive has completed. MPI_FINALIZE guarantees that all local actions required by communications the user has completed will, in fact, occur before it returns.
MPI_FINALIZE guarantees nothing about pending communications that have not been completed (completion is assured only by MPI_WAIT, MPI_TEST, or MPI_REQUEST_FREE combined with some other verification of completion).
Example
This program is correct:
rank 0                          rank 1
=====================================================
...                             ...
MPI_Isend();                    MPI_Recv();
MPI_Request_free();             MPI_Barrier();
MPI_Barrier();                  MPI_Finalize();
MPI_Finalize();                 exit();
exit();
Example
This program is erroneous and its behavior is undefined:
rank 0                          rank 1
=====================================================
...                             ...
MPI_Isend();                    MPI_Recv();
MPI_Request_free();             MPI_Finalize();
MPI_Finalize();                 exit();
exit();
If no MPI_BUFFER_DETACH occurs between an MPI_BSEND (or other buffered send) and MPI_FINALIZE, the MPI_FINALIZE implicitly supplies the MPI_BUFFER_DETACH.
Example
This program is correct, and after the MPI_Finalize, it is
as if the buffer had been detached.
rank 0                          rank 1
=====================================================
...                             ...
buffer = malloc(1000000);       MPI_Recv();
MPI_Buffer_attach();            MPI_Finalize();
MPI_Bsend();                    exit();
MPI_Finalize();
free(buffer);
exit();
Example
In this example, MPI_Iprobe() must return a FALSE
flag. MPI_Test_cancelled() must return a TRUE flag,
independent of the relative order of execution of MPI_Cancel()
in process 0 and MPI_Finalize() in process 1.
The MPI_Iprobe() call is there to make sure the implementation knows that the "tag1" message exists at the destination, without being able to claim that the user knows about it.
rank 0                          rank 1
========================================================
MPI_Init();                     MPI_Init();
MPI_Isend(tag1);
MPI_Barrier();                  MPI_Barrier();
                                MPI_Iprobe(tag2);
MPI_Barrier();                  MPI_Barrier();
                                MPI_Finalize();
                                exit();
MPI_Cancel();
MPI_Wait();
MPI_Test_cancelled();
MPI_Finalize();
exit();
Advice to implementors.
An implementation may need to delay the return from MPI_FINALIZE
until all potential future message cancellations have been
processed. One possible solution is to place a barrier inside
MPI_FINALIZE.
(End of advice to implementors.)
Once MPI_FINALIZE returns, no MPI routine (not even MPI_INIT) may be called, except for MPI_GET_VERSION, MPI_INITIALIZED, and the MPI-2 function MPI_FINALIZED. Each process must complete any pending communication it initiated before it calls MPI_FINALIZE. If the call returns, each process may continue local computations, or exit, without participating in further MPI communication with other processes. MPI_FINALIZE is collective on MPI_COMM_WORLD.
Advice to implementors.
Even though a process has completed all the communication it initiated, such communication may not yet be completed from the viewpoint of the underlying MPI system. E.g., a blocking send may have completed, even though the data is still