MPI-2: Extensions to the Message-Passing Interface

Message Passing Interface Forum

This document describes the MPI-1.2 and MPI-2 standards. They are both extensions to the MPI-1.1 standard. The MPI-1.2 part of the document contains clarifications and corrections to the MPI-1.1 standard and defines MPI-1.2. The MPI-2 part of the document describes additions to the MPI-1 standard and defines MPI-2. These include miscellaneous topics, process creation and management, one-sided communications, extended collective operations, external interfaces, I/O, and additional language bindings.

(c) 1995, 1996, 1997 University of Tennessee, Knoxville, Tennessee. Permission to copy without fee all or part of this material is granted, provided the University of Tennessee copyright notice and the title of this document appear, and notice is given that copying is by permission of the University of Tennessee.

Acknowledgments

This document represents the work of many people who have served on the MPI Forum. The meetings have been attended by dozens of people from many parts of the world. It is the hard and dedicated work of this group that has led to the MPI standard.

The technical development was carried out by subgroups, whose work was reviewed by the full committee. During the period of development of the Message Passing Interface (MPI-2), many people helped with this effort. Those who served as the primary coordinators are:


The following list includes some of the active participants who attended MPI-2 Forum meetings and are not mentioned above.

Greg Astfalk Robert Babb Ed Benson Rajesh Bordawekar
Pete Bradley Peter Brennan Ron Brightwell Maciej Brodowicz
Eric Brunner Greg Burns Margaret Cahir Pang Chen
Ying Chen Albert Cheng Yong Cho Joel Clark
Lyndon Clarke Laurie Costello Dennis Cottel Jim Cownie
Zhenqian Cui Suresh Damodaran-Kamal Raja Daoud Judith Devaney
David DiNucci Doug Doefler Jack Dongarra Terry Dontje
Nathan Doss Anne Elster Mark Fallon Karl Feind
Sam Fineberg Craig Fischberg Stephen Fleischman Ian Foster
Hubertus Franke Richard Frost Al Geist Robert George
David Greenberg John Hagedorn Kei Harada Leslie Hart
Shane Hebert Rolf Hempel Tom Henderson Alex Ho
Hans-Christian Hoppe Joefon Jann Terry Jones Karl Kesselman
Koichi Konishi Susan Kraus Steve Kubica Steve Landherr
Mario Lauria Mark Law Juan Leon Lloyd Lewins
Ziyang Lu Bob Madahar Peter Madams John May
Oliver McBryan Brian McCandless Tyce McLarty Thom McMahon
Harish Nag Nick Nevin Jarek Nieplocha Ron Oldfield
Peter Ossadnik Steve Otto Peter Pacheco Yoonho Park
Perry Partow Pratap Pattnaik Elsie Pierce Paul Pierce
Heidi Poxon Jean-Pierre Prost Boris Protopopov James Pruyve
Rolf Rabenseifner Joe Rieken Peter Rigsbee Tom Robey
Anna Rounbehler Nobutoshi Sagawa Arindam Saha Eric Salo
Darren Sanders Eric Sharakan Andrew Sherman Fred Shirley
Lance Shuler A. Gordon Smith Ian Stockdale David Taylor
Stephen Taylor Greg Tensa Rajeev Thakur Marydell Tholburn
Dick Treumann Simon Tsang Manuel Ujaldon David Walker
Jerrell Watts Klaus Wolf Parkson Wong Dave Wright

The MPI Forum also acknowledges and appreciates the valuable input from people via e-mail and in person.

The following institutions supported the MPI-2 effort through time and travel support for the people listed above.

Argonne National Laboratory
Bolt, Beranek, and Newman
California Institute of Technology
Center for Computing Sciences
Convex Computer Corporation
Cray Research
Digital Equipment Corporation
Dolphin Interconnect Solutions, Inc.
Edinburgh Parallel Computing Centre
General Electric Company
German National Research Center for Information Technology
Hewlett-Packard
Hitachi
Hughes Aircraft Company
Intel Corporation
International Business Machines
Khoral Research
Lawrence Livermore National Laboratory
Los Alamos National Laboratory
MPI Software Technology, Inc.
Mississippi State University
NEC Corporation
National Aeronautics and Space Administration
National Energy Research Scientific Computing Center
National Institute of Standards and Technology
National Oceanic and Atmospheric Administration
Oak Ridge National Laboratory
Ohio State University
PALLAS GmbH
Pacific Northwest National Laboratory
Pratt & Whitney
San Diego Supercomputer Center
Sanders, A Lockheed-Martin Company
Sandia National Laboratories
Schlumberger
Scientific Computing Associates, Inc.
Silicon Graphics Incorporated
Sky Computers
Sun Microsystems Computer Corporation
Syracuse University
The MITRE Corporation
Thinking Machines Corporation
United States Navy
University of Colorado
University of Denver
University of Houston
University of Illinois
University of Maryland
University of Notre Dame
University of San Francisco
University of Stuttgart Computing Center
University of Wisconsin

MPI-2 operated on a very tight budget (in reality, it had no budget when the first meeting was announced). Many institutions helped the MPI-2 effort by supporting the efforts and travel of the members of the MPI Forum. Direct support was given by NSF and DARPA under NSF contract CDA-9115428 for travel by U.S. academic participants and Esprit under project HPC Standards (21111) for European participants.


Contents

  • Introduction to MPI-2
  • Background
  • Organization of this Document
  • MPI-2 Terms and Conventions
  • Document Notation
  • Naming Conventions
  • Procedure Specification
  • Semantic Terms
  • Data Types
  • Opaque Objects
  • Array Arguments
  • State
  • Named Constants
  • Choice
  • Addresses
  • File Offsets
  • Language Binding
  • Deprecated Names and Functions
  • Fortran Binding Issues
  • C Binding Issues
  • C++ Binding Issues
  • Processes
  • Error Handling
  • Implementation Issues
  • Independence of Basic Runtime Routines
  • Interaction with Signals
  • Examples
  • Version 1.2 of MPI
  • Version Number
  • MPI-1.0 and MPI-1.1 Clarifications
  • Clarification of MPI_INITIALIZED
  • Clarification of MPI_FINALIZE
  • Clarification of status after MPI_WAIT and MPI_TEST
  • Clarification of MPI_INTERCOMM_CREATE
  • Clarification of MPI_INTERCOMM_MERGE
  • Clarification of Binding of MPI_TYPE_SIZE
  • Clarification of MPI_REDUCE
  • Clarification of Error Behavior of Attribute Callback Functions
  • Clarification of MPI_PROBE and MPI_IPROBE
  • Minor Corrections
  • Miscellany
  • Portable MPI Process Startup
  • Passing NULL to MPI_Init
  • Version Number
  • Datatype Constructor MPI_TYPE_CREATE_INDEXED_BLOCK
  • Treatment of MPI_Status
  • Passing MPI_STATUS_IGNORE for Status
  • Non-destructive Test of status
  • Error Class for Invalid Keyval
  • Committing a Committed Datatype
  • Allowing User Functions at Process Termination
  • Determining Whether MPI Has Finished
  • The Info Object
  • Memory Allocation
  • Language Interoperability
  • Introduction
  • Assumptions
  • Initialization
  • Transfer of Handles
  • Status
  • MPI Opaque Objects
  • Datatypes
  • Callback Functions
  • Error Handlers
  • Reduce Operations
  • Addresses
  • Attributes
  • Extra State
  • Constants
  • Interlanguage Communication
  • Error Handlers
  • Error Handlers for Communicators
  • Error Handlers for Windows
  • Error Handlers for Files
  • New Datatype Manipulation Functions
  • Type Constructors with Explicit Addresses
  • Extent and Bounds of Datatypes
  • True Extent of Datatypes
  • Subarray Datatype Constructor
  • Distributed Array Datatype Constructor
  • New Predefined Datatypes
  • Wide Characters
  • Signed Characters and Reductions
  • Unsigned long long Type
  • Canonical MPI_PACK and MPI_UNPACK
  • Functions and Macros
  • Profiling Interface
  • Process Creation and Management
  • Introduction
  • The MPI-2 Process Model
  • Starting Processes
  • The Runtime Environment
  • Process Manager Interface
  • Processes in MPI
  • Starting Processes and Establishing Communication
  • Starting Multiple Executables and Establishing Communication
  • Reserved Keys
  • Spawn Example
  • Manager-worker Example, Using MPI_SPAWN.
  • Establishing Communication
  • Names, Addresses, Ports, and All That
  • Server Routines
  • Client Routines
  • Name Publishing
  • Reserved Key Values
  • Client/Server Examples
  • Simplest Example --- Completely Portable.
  • Ocean/Atmosphere - Relies on Name Publishing
  • Simple Client-Server Example.
  • Other Functionality
  • Universe Size
  • Singleton MPI_INIT
  • MPI_APPNUM
  • Releasing Connections
  • Another Way to Establish MPI Communication
  • One-Sided Communications
  • Introduction
  • Initialization
  • Window Creation
  • Window Attributes
  • Communication Calls
  • Put
  • Get
  • Examples
  • Accumulate Functions
  • Synchronization Calls
  • Fence
  • General Active Target Synchronization
  • Lock
  • Assertions
  • Miscellaneous Clarifications
  • Examples
  • Error Handling
  • Error Handlers
  • Error Classes
  • Semantics and Correctness
  • Atomicity
  • Progress
  • Registers and Compiler Optimizations
  • Extended Collective Operations
  • Introduction
  • Intercommunicator Constructors
  • Extended Collective Operations
  • Intercommunicator Collective Operations
  • Operations that Move Data
  • Broadcast
  • Gather
  • Scatter
  • ``All'' Forms and All-to-all
  • Reductions
  • Other Operations
  • Generalized All-to-all Function
  • Exclusive Scan
  • External Interfaces
  • Introduction
  • Generalized Requests
  • Examples
  • Associating Information with Status
  • Naming Objects
  • Error Classes, Error Codes, and Error Handlers
  • Decoding a Datatype
  • MPI and Threads
  • General
  • Clarifications
  • Initialization
  • New Attribute Caching Functions
  • Communicators
  • Windows
  • Datatypes
  • Duplicating a Datatype
  • I/O
  • Introduction
  • Definitions
  • File Manipulation
  • Opening a File
  • Closing a File
  • Deleting a File
  • Resizing a File
  • Preallocating Space for a File
  • Querying the Size of a File
  • Querying File Parameters
  • File Info
  • Reserved File Hints
  • File Views
  • Data Access
  • Data Access Routines
  • Positioning
  • Synchronism
  • Coordination
  • Data Access Conventions
  • Data Access with Explicit Offsets
  • Data Access with Individual File Pointers
  • Data Access with Shared File Pointers
  • Noncollective Operations
  • Collective Operations
  • Seek
  • Split Collective Data Access Routines
  • File Interoperability
  • Datatypes for File Interoperability
  • External Data Representation: ``external32''
  • User-Defined Data Representations
  • Extent Callback
  • Datarep Conversion Functions
  • Matching Data Representations
  • Consistency and Semantics
  • File Consistency
  • Random Access vs. Sequential Files
  • Progress
  • Collective File Operations
  • Type Matching
  • Miscellaneous Clarifications
  • MPI_Offset Type
  • Logical vs. Physical File Layout
  • File Size
  • Examples
  • Asynchronous I/O
  • I/O Error Handling
  • I/O Error Classes
  • Examples
  • Double Buffering with Split Collective I/O
  • Subarray Filetype Constructor
  • Language Bindings
  • C++
  • Overview
  • Design
  • C++ Classes for MPI
  • Class Member Functions for MPI
  • Semantics
  • C++ Datatypes
  • Communicators
  • Exceptions
  • Mixed-Language Operability
  • Profiling
  • Fortran Support
  • Overview
  • Problems With Fortran Bindings for MPI
  • Problems Due to Strong Typing
  • Problems Due to Data Copying and Sequence Association
  • Special Constants
  • Fortran 90 Derived Types
  • A Problem with Register Optimization
  • Basic Fortran Support
  • Extended Fortran Support
  • The mpi Module
  • No Type Mismatch Problems for Subroutines with Choice Arguments
  • Additional Support for Fortran Numeric Intrinsic Types
  • Parameterized Datatypes with Specified Precision and Exponent Range
  • Support for Size-specific MPI Datatypes
  • Communication With Size-specific Types
  • Bibliography
  • Language Binding
  • Introduction
  • Defined Constants, Error Codes, Info Keys, and Info Values
  • Defined Constants
  • Info Keys
  • Info Values
  • MPI-1.2 C Bindings
  • MPI-1.2 Fortran Bindings
  • MPI-1.2 C++ Bindings
  • MPI-2 C Bindings
  • Miscellany
  • Process Creation and Management
  • One-Sided Communications
  • Extended Collective Operations
  • External Interfaces
  • I/O
  • Language Bindings
  • MPI-2 C Functions
  • MPI-2 Fortran Bindings
  • Miscellany
  • Process Creation and Management
  • One-Sided Communications
  • Extended Collective Operations
  • External Interfaces
  • I/O
  • Language Bindings
  • MPI-2 Fortran Subroutines
  • MPI-2 C++ Bindings
  • Miscellany
  • Process Creation and Management
  • One-Sided Communications
  • Extended Collective Operations
  • External Interfaces
  • I/O
  • Language Bindings
  • MPI-2 C++ Functions
  • MPI-1 C++ Language Binding
  • C++ Classes
  • Defined Constants
  • Typedefs
  • C++ Bindings for Point-to-Point Communication
  • C++ Bindings for Collective Communication
  • C++ Bindings for Groups, Contexts, and Communicators
  • C++ Bindings for Process Topologies
  • C++ Bindings for Environmental Inquiry
  • C++ Bindings for Profiling
  • C++ Bindings for Status Access
  • C++ Bindings for New 1.2 Functions
  • C++ Bindings for Exceptions
  • C++ Bindings on all MPI Classes
  • Construction / Destruction
  • Copy / Assignment
  • Comparison
  • Inter-language Operability
  • Function Name Cross Reference
  • Index


    1. Introduction to MPI-2



    1.1. Background


    In March 1995, the MPI Forum began meeting to consider corrections and extensions to the original MPI Standard document [5]. The first product of these deliberations was Version 1.1 of the MPI specification, released in June of 1995 (see http://www.mpi-forum.org for official MPI document releases). Since that time, effort has been focused on five types of areas.

      1. Further corrections and clarifications for the MPI-1.1 document.
      2. Additions to MPI-1.1 that do not significantly change its types of functionality (new datatype constructors, language interoperability, etc.).
      3. Completely new types of functionality (dynamic processes, one-sided communication, parallel I/O, etc.) that are what everyone thinks of as ``MPI-2 functionality.''
      4. Bindings for Fortran 90 and C++. This document specifies C++ bindings for both MPI-1 and MPI-2 functions, and extensions to the Fortran 77 binding of MPI-1 and MPI-2 to handle Fortran 90 issues.
      5. Discussions of areas in which the MPI process and framework seem likely to be useful, but where more discussion and experience are needed before standardization (e.g. 0-copy semantics on shared-memory machines, real-time specifications).

    Corrections and clarifications (items of type 1 in the above list) have been collected in the chapter ``Version 1.2 of MPI'' of this document. This chapter also contains the function for identifying the version number. Additions to MPI-1.1 (items of types 2, 3, and 4 in the above list) are in the remaining chapters, and constitute the specification for MPI-2. This document specifies Version 2.0 of MPI. Items of type 5 in the above list have been moved to a separate document, the ``MPI Journal of Development'' (JOD), and are not part of the MPI-2 Standard.

    This structure makes it easy for users and implementors to understand what level of MPI compliance a given implementation has:


    It is to be emphasized that forward compatibility is preserved. That is, a valid MPI-1.1 program is both a valid MPI-1.2 program and a valid MPI-2 program, and a valid MPI-1.2 program is a valid MPI-2 program.




    1.2. Organization of this Document


    This document is organized as follows:


    The rest of this document contains the MPI-2 Standard Specification. It adds substantial new types of functionality to MPI, in most cases specifying functions for an extended computational model (e.g., dynamic process creation and one-sided communication) or for a significant new capability (e.g., parallel I/O).

    The following is a list of the chapters in MPI-2, along with a brief description of each.


    The Appendices are:


    The MPI Function Index is a simple index showing the location of the precise definition of each MPI-2 function, together with C, C++, and Fortran bindings.

    MPI-2 provides various interfaces to facilitate interoperability of distinct MPI implementations. Among these are the canonical data representation for MPI I/O and for MPI_PACK_EXTERNAL and MPI_UNPACK_EXTERNAL. The definition of an actual binding of these interfaces that will enable interoperability is outside the scope of this document.

    A separate document consists of ideas that were discussed in the MPI Forum and deemed to have value, but are not included in the MPI Standard. They are part of the ``Journal of Development'' (JOD), lest good ideas be lost and in order to provide a starting point for further work. The chapters in the JOD are





    2. MPI-2 Terms and Conventions


    This chapter explains notational terms and conventions used throughout the MPI-2 document, some of the choices that have been made, and the rationale behind those choices. It is similar to the MPI-1 Terms and Conventions chapter but differs in some major and minor ways. Some of the major areas of difference are the naming conventions, some semantic definitions, file objects, Fortran 90 vs Fortran 77, C++, processes, and interaction with signals.




    2.1. Document Notation



    Rationale.

    Throughout this document, the rationale for the design choices made in the interface specification is set off in this format. Some readers may wish to skip these sections, while readers interested in interface design may want to read them carefully. (End of rationale.)

    Advice to users.

    Throughout this document, material aimed at users and that illustrates usage is set off in this format. Some readers may wish to skip these sections, while readers interested in programming in MPI may want to read them carefully. (End of advice to users.)

    Advice to implementors.

    Throughout this document, material that is primarily commentary to implementors is set off in this format. Some readers may wish to skip these sections, while readers interested in MPI implementations may want to read them carefully. (End of advice to implementors.)




    2.2. Naming Conventions


    MPI-1 used informal naming conventions. In many cases, MPI-1 names for C functions are of the form Class_action_subset and in Fortran of the form CLASS_ACTION_SUBSET, but this rule is not uniformly applied. In MPI-2, an attempt has been made to standardize names of new functions according to the following rules. In addition, the C++ bindings for MPI-1 functions also follow these rules (see Section C++ Binding Issues). C and Fortran function names for MPI-1 have not been changed.

      1. In C, all routines associated with a particular type of MPI object should be of the form Class_action_subset or, if no subset exists, of the form Class_action. In Fortran, all routines associated with a particular type of MPI object should be of the form CLASS_ACTION_SUBSET or, if no subset exists, of the form CLASS_ACTION. For C and Fortran we use the C++ terminology to define the Class. In C++, the routine is a method on Class and is named MPI::Class::Action_subset. If the routine is associated with a certain class, but does not make sense as an object method, it is a static member function of the class.


      2. If the routine is not associated with a class, the name should be of the form Action_subset in C and ACTION_SUBSET in Fortran, and in C++ should be scoped in the MPI namespace, MPI::Action_subset.


      3. The names of certain actions have been standardized. In particular, Create creates a new object, Get retrieves information about an object, Set sets this information, Delete deletes information, and Is asks whether or not an object has a certain property.

    C and Fortran names for MPI-1 functions violate these rules in several cases. The most common exceptions are the omission of the Class name from the routine and the omission of the Action where one can be inferred.

    MPI identifiers are limited to 30 characters (31 with the profiling interface). This is done to avoid exceeding the limit on some compilation systems.
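
    As a rough illustration of rules 1 and 3, the sketch below builds the C, Fortran, and C++ spellings of one routine name from its Class, action, and subset parts, and checks the C spelling against the 30-character limit. The helper names (format_c_name, format_fortran_name, format_cxx_name, within_name_limit) are hypothetical and are not part of MPI; the sketch also assumes the usual MPI_ prefix and that the limit is applied to the C and Fortran identifiers only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { NAME_LIMIT = 30 };  /* identifier limit quoted in the text above */

/* Copy 'in' to 'out' with the first letter upper case and the rest lower
   case, e.g. "comm" -> "Comm". Truncates to fit n bytes. */
static void capitalize(const char *in, char *out, size_t n)
{
    size_t i;
    assert(n > 0);
    for (i = 0; in[i] != '\0' && i + 1 < n; ++i)
        out[i] = (char)(i == 0 ? toupper((unsigned char)in[i])
                               : tolower((unsigned char)in[i]));
    out[i] = '\0';
}

/* C spelling: MPI_Class_action_subset, or MPI_Class_action if subset is NULL.
   'action' and 'subset' are expected in lower case. */
int format_c_name(const char *cls, const char *action, const char *subset,
                  char *out, size_t n)
{
    char c[32];
    assert(n > 0);
    capitalize(cls, c, sizeof c);
    if (subset != NULL)
        return snprintf(out, n, "MPI_%s_%s_%s", c, action, subset);
    return snprintf(out, n, "MPI_%s_%s", c, action);
}

/* Fortran spelling: the C spelling converted to upper case. */
int format_fortran_name(const char *cls, const char *action,
                        const char *subset, char *out, size_t n)
{
    char *p;
    int len = format_c_name(cls, action, subset, out, n);
    for (p = out; *p != '\0'; ++p)
        *p = (char)toupper((unsigned char)*p);
    return len;
}

/* C++ spelling: MPI::Class::Action_subset (a method on Class). */
int format_cxx_name(const char *cls, const char *action, const char *subset,
                    char *out, size_t n)
{
    char c[32], a[32];
    assert(n > 0);
    capitalize(cls, c, sizeof c);
    capitalize(action, a, sizeof a);
    if (subset != NULL)
        return snprintf(out, n, "MPI::%s::%s_%s", c, a, subset);
    return snprintf(out, n, "MPI::%s::%s", c, a);
}

/* Nonzero if the identifier is no longer than the stated limit. */
int within_name_limit(const char *name)
{
    return strlen(name) <= (size_t)NAME_LIMIT;
}
```

    With the parts "comm", "get", and "attr", the three helpers produce MPI_Comm_get_attr, MPI_COMM_GET_ATTR, and MPI::Comm::Get_attr respectively.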




    2.3. Procedure Specification


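    MPI procedures are specified using a language-independent notation. The arguments of procedure calls are marked as IN, OUT or INOUT. The meanings of these are:

      • IN: the call may use the input value but does not update the argument,
      • OUT: the call may update the argument but does not use its input value,
      • INOUT: the call may both use and update the argument.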


    There is one special case --- if an argument is a handle to an opaque object (these terms are defined in Section Opaque Objects), and the object is updated by the procedure call, then the argument is marked OUT. It is marked this way even though the handle itself is not modified --- we use the OUT attribute to denote that what the handle references is updated. Thus, in C++, IN arguments are either references or pointers to const objects.


    Rationale.

    The definition of MPI tries to avoid, to the largest possible extent, the use of INOUT arguments, because such use is error-prone, especially for scalar arguments. (End of rationale.)

    MPI's use of IN, OUT and INOUT is intended to indicate to the user how an argument is to be used, but does not provide a rigorous classification that can be translated directly into all language bindings (e.g., INTENT in Fortran 90 bindings or const in C bindings). For instance, the ``constant'' MPI_BOTTOM can usually be passed to OUT buffer arguments. Similarly, MPI_STATUS_IGNORE can be passed as the OUT status argument.

    A common occurrence for MPI functions is an argument that is used as IN by some processes and OUT by other processes. Such an argument is, syntactically, an INOUT argument and is marked as such, although, semantically, it is not used in one call both for input and for output on a single process.

    Another frequent situation arises when an argument value is needed only by a subset of the processes. When an argument is not significant at a process then an arbitrary value can be passed as an argument.

    Unless specified otherwise, an argument of type OUT or type INOUT cannot be aliased with any other argument passed to an MPI procedure. An example of argument aliasing in C appears below. If we define a C procedure like this,

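    /* Copies len ints from pin to pout, front to back. */
    void copyIntBuffer( int *pin, int *pout, int len )
    {   int i;
        for (i=0; i<len; ++i) *pout++ = *pin++;
    }

    then a call to it in the following code fragment has aliased arguments.

    int a[10];
    /* source a[0..6] and destination a[3..9] overlap in a[3..6] */
    copyIntBuffer( a, a+3, 7);
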
    Although the C language allows this, such usage of MPI procedures is forbidden unless otherwise specified. Note that Fortran prohibits aliasing of arguments.
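
    A caller who wants to guard against accidental aliasing can test two buffers for overlap before passing them on. The following is a minimal sketch, not an MPI facility: the name buffers_overlap is hypothetical, and the comparison assumes a flat address space in which converting pointers to uintptr_t preserves their ordering, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte ranges [a, a + a_bytes) and [b, b + b_bytes)
   share at least one byte, and 0 otherwise. Empty ranges never overlap.
   Assumption: pointer-to-integer conversion preserves address order. */
int buffers_overlap(const void *a, size_t a_bytes,
                    const void *b, size_t b_bytes)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;

    if (a_bytes == 0 || b_bytes == 0)
        return 0;
    /* Two half-open intervals intersect when each starts before the
       other one ends. */
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}
```

    For the fragment above, buffers_overlap(a, 7*sizeof(int), a+3, 7*sizeof(int)) reports an overlap, whereas a three-element source starting at a and a destination starting at a+3 are disjoint.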

    All MPI functions are first specified in the language-independent notation. Immediately below this, the ANSI C version of the function is shown followed by a version of the same function in Fortran and then the C++ binding. Fortran in this document refers to Fortran 90; see Section Language Binding.




    2.4. Semantic Terms


    When discussing MPI procedures the following semantic terms are used.

    nonblocking
    A procedure is nonblocking if the procedure may return before the operation completes, and before the user is allowed to reuse resources (such as buffers) specified in the call. A nonblocking request is started by the call that initiates it, e.g., MPI_ISEND. The word complete is used with respect to operations, requests, and communications. An operation completes when the user is allowed to reuse resources, and any output buffers have been updated; i.e., a call to MPI_TEST will return flag = true. A request is completed by a call to wait, which returns, or a test or get status call which returns flag = true. This completing call has two effects: the status is extracted from the request; in the case of test and wait, if the request was nonpersistent, it is freed. A communication completes when all participating operations complete.

    blocking
    A procedure is blocking if return from the procedure indicates the user is allowed to reuse resources specified in the call.

    local
    A procedure is local if completion of the procedure depends only on the local executing process.

    non-local
    A procedure is non-local if completion of the operation may require the execution of some MPI procedure on another process. Such an operation may require communication occurring with another user process.

    collective
    A procedure is collective if all processes in a process group need to invoke the procedure. A collective call may or may not be synchronizing. Collective calls over the same communicator must be executed in the same order by all members of the process group.

    predefined
    A predefined datatype is a datatype with a predefined (constant) name (such as MPI_INT, MPI_FLOAT_INT, or MPI_UB) or a datatype constructed with MPI_TYPE_CREATE_F90_INTEGER, MPI_TYPE_CREATE_F90_REAL, or MPI_TYPE_CREATE_F90_COMPLEX. The former are named whereas the latter are unnamed.

    derived
    A derived datatype is any datatype that is not predefined.

    portable
    A datatype is portable if it is a predefined datatype, or it is derived from a portable datatype using only the type constructors MPI_TYPE_CONTIGUOUS, MPI_TYPE_VECTOR, MPI_TYPE_INDEXED, MPI_TYPE_CREATE_INDEXED_BLOCK, MPI_TYPE_CREATE_SUBARRAY, MPI_TYPE_DUP, and MPI_TYPE_CREATE_DARRAY. Such a datatype is portable because all displacements in the datatype are in terms of extents of one predefined datatype. Therefore, if such a datatype fits a data layout in one memory, it will fit the corresponding data layout in another memory, if the same declarations were used, even if the two systems have different architectures. On the other hand, if a datatype was constructed using MPI_TYPE_CREATE_HINDEXED, MPI_TYPE_CREATE_HVECTOR or MPI_TYPE_CREATE_STRUCT, then the datatype contains explicit byte displacements (e.g., providing padding to meet alignment restrictions). Displacements chosen to fit the data layout in one memory are unlikely to be correct for the data layout on another process running on a processor with a different architecture.

    equivalent
    Two datatypes are equivalent if they appear to have been created with the same sequence of calls (and arguments) and thus have the same typemap. Two equivalent datatypes do not necessarily have the same cached attributes or the same names.
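
    The distinction drawn under ``portable'' can be seen in plain C without any MPI calls. In the sketch below, which uses a hypothetical record type and hypothetical function names, the byte displacement of a structure member depends on the padding chosen by the compiler and ABI (commonly 8 for the layout shown, but smaller on some 32-bit ABIs), whereas a displacement expressed as a count of elements is the same number everywhere and is converted to bytes locally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: the amount of padding inserted between 'tag' and
   'value' is decided by the compiler and the platform ABI. */
struct sample {
    char   tag;
    double value;
};

/* Explicit byte displacement of 'value' within the record. This is the
   kind of number that may differ from one architecture to another. */
size_t value_byte_displacement(void)
{
    return offsetof(struct sample, value);
}

/* Displacement of element 'index' in an array of double. The index is
   architecture independent; only sizeof(double) is evaluated locally. */
size_t element_byte_displacement(size_t index)
{
    return index * sizeof(double);
}
```

    A description that stores the element index stays valid when the same declarations are compiled elsewhere; one that stores the result of value_byte_displacement() from another machine may not.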




    2.5. Data Types




    2.5.1. Opaque Objects


    MPI manages system memory that is used for buffering messages and for storing internal representations of various MPI objects such as groups, communicators, datatypes, etc. This memory is not directly accessible to the user, and objects stored there are opaque: their size and shape are not visible to the user. Opaque objects are accessed via handles, which exist in user space. MPI procedures that operate on opaque objects are passed handle arguments to access these objects. In addition to their use by MPI calls for object access, handles can participate in assignments and comparisons.

    In Fortran, all handles have type INTEGER. In C and C++, a different handle type is defined for each category of objects. In addition, handles themselves are distinct objects in C++. The C and C++ types must support the use of the assignment and equality operators.


    Advice to implementors.

    In Fortran, the handle can be an index into a table of opaque objects in a system table; in C it can be such an index or a pointer to the object. C++ handles can simply ``wrap up'' a table index or pointer.

    (End of advice to implementors.)

    Opaque objects are allocated and deallocated by calls that are specific to each object type. These are listed in the sections where the objects are described. The calls accept a handle argument of matching type. In an allocate call this is an OUT argument that returns a valid reference to the object. In a call to deallocate this is an INOUT argument which returns with an ``invalid handle'' value. MPI provides an ``invalid handle'' constant for each object type. Comparisons to this constant are used to test for validity of the handle.

    A call to a deallocate routine invalidates the handle and marks the object for deallocation. The object is not accessible to the user after the call. However, MPI need not deallocate the object immediately. Any operation pending (at the time of the deallocate) that involves this object will complete normally; the object will be deallocated afterwards.
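
    The allocate and deallocate behavior described in the two paragraphs above can be modeled with a small handle table. This is only a toy sketch of one possible implementation strategy; every name in it (toy_handle, TOY_HANDLE_NULL, toy_create, toy_free, toy_op_start, toy_op_complete, toy_slot_in_use) is hypothetical. The handle is an index held by the user, the object lives in a table owned by the library, and a freed object is released only when no operation is still pending on it.

```c
#include <assert.h>

#define TOY_HANDLE_NULL (-1)  /* hypothetical "invalid handle" constant */
#define TOY_TABLE_SIZE  8

typedef int toy_handle;       /* user-space handle: an index into the table */

struct toy_object {
    int in_use;   /* slot currently holds an object */
    int marked;   /* the user has asked for deallocation */
    int pending;  /* number of operations still using the object */
};

static struct toy_object toy_table[TOY_TABLE_SIZE];  /* "system" memory */

/* Release the slot once it is both marked and idle. */
static void toy_release_if_idle(int slot)
{
    if (toy_table[slot].marked && toy_table[slot].pending == 0)
        toy_table[slot].in_use = 0;
}

/* Allocate: *h is an OUT argument. Returns 0 on success, -1 if full. */
int toy_create(toy_handle *h)
{
    int i;
    for (i = 0; i < TOY_TABLE_SIZE; ++i) {
        if (!toy_table[i].in_use) {
            toy_table[i].in_use = 1;
            toy_table[i].marked = 0;
            toy_table[i].pending = 0;
            *h = i;
            return 0;
        }
    }
    *h = TOY_HANDLE_NULL;
    return -1;
}

/* Start an operation on the object; returns an internal request id
   (here simply the slot number) that outlives the user's handle. */
int toy_op_start(toy_handle h)
{
    assert(h >= 0 && h < TOY_TABLE_SIZE && toy_table[h].in_use);
    toy_table[h].pending++;
    return h;
}

/* Complete a previously started operation. */
void toy_op_complete(int request)
{
    assert(request >= 0 && request < TOY_TABLE_SIZE);
    assert(toy_table[request].pending > 0);
    toy_table[request].pending--;
    toy_release_if_idle(request);
}

/* Deallocate: *h is INOUT and comes back as the invalid handle. The
   object is only marked; it is released when it becomes idle. */
void toy_free(toy_handle *h)
{
    int slot = *h;
    assert(slot >= 0 && slot < TOY_TABLE_SIZE && toy_table[slot].in_use);
    toy_table[slot].marked = 1;
    *h = TOY_HANDLE_NULL;
    toy_release_if_idle(slot);
}

/* Inspection helper for this sketch only. */
int toy_slot_in_use(int slot)
{
    return toy_table[slot].in_use;
}
```

    If an operation is started and the handle is then freed, the handle compares equal to TOY_HANDLE_NULL immediately, while the slot stays occupied until toy_op_complete is called.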

    An opaque object and its handle are significant only at the process where the object was created and cannot be transferred to another process.

    MPI provides certain predefined opaque objects and predefined, static handles to these objects. The user must not free such objects. In C++, this is enforced by declaring the handles to these predefined objects to be static const.


    Rationale.

    This design hides the internal representation used for MPI data structures, thus allowing similar calls in C, C++, and Fortran. It also avoids conflicts with the typing rules in these languages, and easily allows future extensions of functionality. The mechanism for opaque objects used here loosely follows the POSIX Fortran binding standard.

    The explicit separation of handles in user space and objects in system space allows space-reclaiming and deallocation calls to be made at appropriate points in the user program. If the opaque objects were in user space, one would have to be very careful not to go out of scope before any pending operation requiring that object completed. The specified design allows an object to be marked for deallocation, the user program can then go out of scope, and the object itself still persists until any pending operations are complete.

    The requirement that handles support assignment/comparison is made since such operations are common. This restricts the domain of possible implementations. The alternative would have been to allow handles to be an arbitrary, opaque type. This would force the introduction of routines to do assignment and comparison, adding complexity, and was therefore ruled out. (End of rationale.)

    Advice to users.

    A user may accidentally create a dangling reference by assigning to a handle the value of another handle, and then deallocating the object associated with these handles. Conversely, if a handle variable is deallocated before the associated object is freed, then the object becomes inaccessible (this may occur, for example, if the handle is a local variable within a subroutine, and the subroutine is exited before the associated object is deallocated). It is the user's responsibility to avoid adding or deleting references to opaque objects, except as a result of MPI calls that allocate or deallocate such objects. (End of advice to users.)

    Advice to implementors.

    The intended semantics of opaque objects is that opaque objects are separate from one another; each call to allocate such an object copies all the information required for the object. Implementations may avoid excessive copying by substituting referencing for copying. For example, a derived datatype may contain references to its components, rather than copies of its components; a call to MPI_COMM_GROUP may return a reference to the group associated with the communicator, rather than a copy of this group. In such cases, the implementation must maintain reference counts, and allocate and deallocate objects in such a way that the visible effect is as if the objects were copied. ( End of advice to implementors.)
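
    The reference-counting approach described in the advice above can be sketched in C as follows. This is only an illustration under stated assumptions, not how any particular MPI implementation works: the type ex_group, the counter ex_live_groups, and the functions ex_group_create, ex_group_dup, and ex_group_free are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque object: a group shared by reference, not copied. */
typedef struct {
    int refcount;   /* number of handles referring to this object */
    int size;       /* example payload: number of members         */
} ex_group;

/* Count of live objects, kept only so the sketch can be observed. */
static int ex_live_groups = 0;

/* Allocate a new object with one reference. */
ex_group *ex_group_create(int size) {
    ex_group *g = malloc(sizeof *g);
    assert(g != NULL);
    g->refcount = 1;
    g->size = size;
    ex_live_groups++;
    return g;
}

/* "Copy" an object by adding a reference instead of duplicating it. */
ex_group *ex_group_dup(ex_group *g) {
    assert(g != NULL && g->refcount > 0);
    g->refcount++;
    return g;
}

/* Release one handle; storage is reclaimed with the last reference.
 * The caller's handle is reset to NULL, standing in for a null handle. */
void ex_group_free(ex_group **handle) {
    ex_group *g = *handle;
    assert(g != NULL && g->refcount > 0);
    if (--g->refcount == 0) {
        free(g);
        ex_live_groups--;
    }
    *handle = NULL;
}
```

    Freeing one of two handles leaves the shared object intact, so the visible effect is as if each handle owned its own copy.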



    2.5.2. Array Arguments

    An MPI call may need an argument that is an array of opaque objects, or an array of handles. The array-of-handles is a regular array with entries that are handles to objects of the same type in consecutive locations in the array. Whenever such an array is used, an additional len argument is required to indicate the number of valid entries (unless this number can be derived otherwise). The valid entries are at the beginning of the array; len indicates how many of them there are, and need not be the size of the entire array. The same approach is followed for other array arguments. In some cases NULL handles are considered valid entries. When a NULL argument is desired for an array of statuses, one uses MPI_STATUSES_IGNORE.
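
    The array-plus-length convention can be illustrated with a small C sketch. The handle type ex_handle, the null value EX_HANDLE_NULL, and the function ex_count_active are hypothetical and are not part of MPI; the point is only that len, not the declared array size, bounds what the callee reads, and that null handles may appear among the valid entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle type and null handle, for illustration only. */
typedef int ex_handle;
#define EX_HANDLE_NULL (-1)

/* Count the non-null handles among the first len entries of handles[].
 * Entries at index len and beyond are never read. */
int ex_count_active(const ex_handle handles[], int len) {
    int active = 0;
    assert(len >= 0);
    assert(handles != NULL || len == 0);
    for (int i = 0; i < len; i++) {
        if (handles[i] != EX_HANDLE_NULL) {
            active++;
        }
    }
    return active;
}
```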



    2.5.3. State

    At various places, MPI procedures use arguments with state types. The values of such a data type are all identified by names, and no operation is defined on them. For example, the MPI_TYPE_CREATE_SUBARRAY routine has a state argument order with values MPI_ORDER_C and MPI_ORDER_FORTRAN.



    2.5.4. Named Constants

    MPI procedures sometimes assign a special meaning to a special value of a basic type argument; e.g., tag is an integer-valued argument of point-to-point communication operations, with a special wild-card value, MPI_ANY_TAG. Such arguments will have a range of regular values, which is a proper subrange of the range of values of the corresponding basic type; special values (such as MPI_ANY_TAG) will be outside the regular range. The range of regular values, such as tag, can be queried using environmental inquiry functions (Chapter 7 of the MPI-1 document). The range of other values, such as source, depends on values given by other MPI routines (in the case of source it is the communicator size).
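
    As a rough illustration of a special value lying outside the regular range, consider the following C sketch. The names EX_ANY_TAG, EX_TAG_UB, ex_is_regular_tag, and ex_tag_matches are hypothetical, and the numeric values are assumptions chosen for the example; a real program would query the upper bound through the environmental inquiry functions.

```c
#include <assert.h>

/* Hypothetical values: the wildcard lies outside the regular range. */
#define EX_ANY_TAG (-1)
#define EX_TAG_UB  32767

/* Return 1 if tag is a regular value, i.e., usable when sending. */
int ex_is_regular_tag(int tag) {
    return tag >= 0 && tag <= EX_TAG_UB;
}

/* Return 1 if a receive posted with tag want matches a message
 * that was sent with the regular tag got. */
int ex_tag_matches(int want, int got) {
    assert(ex_is_regular_tag(got));
    return want == EX_ANY_TAG || want == got;
}
```

    Because the wildcard is not a regular value, it can never be confused with a tag that a sender actually used.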

    MPI also provides predefined named constant handles, such as MPI_COMM_WORLD.

    All named constants, with the exceptions noted below for Fortran, can be used in initialization expressions or assignments. These constants do not change values during execution. Opaque objects accessed by constant handles are defined and do not change value between MPI initialization (MPI_INIT) and MPI completion (MPI_FINALIZE).

    The constants that cannot be used in initialization expressions or assignments in Fortran are:

      MPI_BOTTOM 
      MPI_STATUS_IGNORE 
      MPI_STATUSES_IGNORE 
      MPI_ERRCODES_IGNORE 
      MPI_IN_PLACE 
      MPI_ARGV_NULL 
      MPI_ARGVS_NULL 
    

    Advice to implementors.

    In Fortran the implementation of these special constants may require the use of language constructs that are outside the Fortran standard. Using special values for the constants (e.g., by defining them through parameter statements) is not possible because an implementation cannot distinguish these values from legal data. Typically, these constants are implemented as predefined static variables (e.g., a variable in an MPI-declared COMMON block), relying on the fact that the target compiler passes data by address. Inside the subroutine, this address can be extracted by some mechanism outside the Fortran standard (e.g., by Fortran extensions or by implementing the function in C). ( End of advice to implementors.)
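
    The idea of recognizing a special constant by its address rather than by its value can be sketched in C. This is an analogy only: the advice above concerns Fortran, and the names ex_status, ex_status_ignore_storage, EX_STATUS_IGNORE, and ex_report are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status record and sentinel; names are illustrative. */
typedef struct { int source; int tag; } ex_status;

/* A static variable whose address, not its value, marks "ignore". */
static ex_status ex_status_ignore_storage;
#define EX_STATUS_IGNORE (&ex_status_ignore_storage)

/* Fill in *status unless the caller passed the sentinel address.
 * Returns 1 if the status was written, 0 if it was ignored. */
int ex_report(ex_status *status, int source, int tag) {
    assert(status != NULL);
    if (status == EX_STATUS_IGNORE) {
        return 0;
    }
    status->source = source;
    status->tag = tag;
    return 1;
}
```

    Since the test compares addresses, no data value that a user might legally pass can be mistaken for the sentinel.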



    2.5.5. Choice

    MPI functions sometimes use arguments with a choice (or union) data type. Distinct calls to the same routine may pass by reference actual arguments of different types. The mechanism for providing such arguments will differ from language to language. For Fortran, the document uses <type> to represent a choice variable; for C and C++, we use void *.



    2.5.6. Addresses

    Some MPI procedures use address arguments that represent an absolute address in the calling program. The datatype of such an argument is MPI_Aint in C, MPI::Aint in C++ and INTEGER (KIND=MPI_ADDRESS_KIND) in Fortran. There is the MPI constant MPI_BOTTOM to indicate the start of the address range.
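
    A minimal sketch of how an integer address type can be used to compute a displacement is given below. It assumes a flat address space in which converting a pointer to intptr_t preserves byte distances, which C leaves implementation-defined. The names ex_aint, ex_get_address, ex_particle, and ex_value_displacement are hypothetical stand-ins and not MPI names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint: an integer wide enough for a pointer. */
typedef intptr_t ex_aint;

/* Modeled on MPI_GET_ADDRESS: return the address of location as an integer.
 * The pointer-to-integer mapping is implementation-defined in C. */
ex_aint ex_get_address(const void *location) {
    assert(location != NULL);
    return (ex_aint)location;
}

/* Example record whose member displacement is computed below. */
typedef struct { char kind; double value; } ex_particle;

/* Displacement of the value member from the start of *p. */
ex_aint ex_value_displacement(const ex_particle *p) {
    return ex_get_address(&p->value) - ex_get_address(p);
}
```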



    2.5.7. File Offsets

    For I/O there is a need to give the size, displacement, and offset into a file. These quantities can easily be larger than 32 bits, which can be the default size of a Fortran integer. To overcome this, these quantities are declared to be INTEGER (KIND=MPI_OFFSET_KIND) in Fortran. In C one uses MPI_Offset whereas in C++ one uses MPI::Offset.
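
    The need for a wider offset type can be seen in a short C sketch. Here ex_offset is a hypothetical stand-in for MPI_Offset and is assumed to be a 64-bit signed integer; the actual width of MPI_Offset is chosen by the implementation. The function ex_block_offset is likewise an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Offset: a 64-bit signed integer. */
typedef int64_t ex_offset;

/* Byte offset of block number block when each block holds
 * block_bytes bytes. The multiplication is carried out in 64 bits,
 * so the result may exceed the range of a 32-bit integer. */
ex_offset ex_block_offset(int block, int block_bytes) {
    assert(block >= 0 && block_bytes >= 0);
    return (ex_offset)block * (ex_offset)block_bytes;
}
```

    With 4096 blocks of one mebibyte each, the offset of the block following the last one is 2^32 bytes, which no longer fits in a 32-bit integer.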



    2.6. Language Binding

    This section defines the rules for MPI language binding in general and for Fortran, ANSI C, and C++, in particular. (Note that ANSI C has been replaced by ISO C. References in MPI to ANSI C now mean ISO C.) Defined here are various object representations, as well as the naming conventions used for expressing this standard. The actual calling sequences are defined elsewhere.

    MPI bindings are for Fortran 90, though they are designed to be usable in Fortran 77 environments.

    Since the word PARAMETER is a keyword in the Fortran language, we use the word ``argument'' to denote the arguments to a subroutine. These are normally referred to as parameters in C and C++; however, we expect that C and C++ programmers will understand the word ``argument'' (which has no specific meaning in C/C++), thus allowing us to avoid unnecessary confusion for Fortran programmers.

    Since Fortran is case insensitive, linkers may use either lower case or upper case when resolving Fortran names. Users of case sensitive languages should avoid the ``mpi_'' and ``pmpi_'' prefixes.
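
    A tool that checks user identifiers against the reserved prefixes might look roughly like the following C sketch. The function names ex_has_prefix_nocase and ex_is_reserved_name are hypothetical; the comparison ignores letter case because a linker may resolve Fortran names in either case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return 1 if name starts with prefix, ignoring letter case. */
static int ex_has_prefix_nocase(const char *name, const char *prefix) {
    for (; *prefix != '\0'; name++, prefix++) {
        if (tolower((unsigned char)*name) != tolower((unsigned char)*prefix)) {
            return 0;   /* also reached when name is shorter than prefix */
        }
    }
    return 1;
}

/* Return 1 if name could collide with an MPI or profiling name. */
int ex_is_reserved_name(const char *name) {
    assert(name != NULL);
    return ex_has_prefix_nocase(name, "mpi_") ||
           ex_has_prefix_nocase(name, "pmpi_");
}
```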



    2.6.1. Deprecated Names and Functions

    A number of chapters refer to deprecated or replaced MPI-1 constructs. These are constructs that continue to be part of the MPI standard, but that users are recommended not to continue using, since MPI-2 provides better solutions. For example, the Fortran binding for MPI-1 functions that have address arguments uses INTEGER. This is not consistent with the C binding, and causes problems on machines with 32 bit INTEGERs and 64 bit addresses. In MPI-2, these functions have new names, and new bindings for the address arguments. The use of the old functions is deprecated. For consistency, here and in a few other cases, new C functions are also provided, even though the new functions are equivalent to the old functions. The old names are deprecated. Another example is provided by the MPI-1 predefined datatypes MPI_UB and MPI_LB. They are deprecated, since their use is awkward and error-prone, while the MPI-2 function MPI_TYPE_CREATE_RESIZED provides a more convenient mechanism to achieve the same effect.

    The following is a list of all of the deprecated constructs. Note that the constants MPI_LB and MPI_UB are replaced by the function MPI_TYPE_CREATE_RESIZED; this is because their principal use was as input datatypes to MPI_TYPE_STRUCT to create resized datatypes. Also note that some C typedefs and Fortran subroutine names are included in this list; they are the types of callback functions.

    Deprecated                MPI-2 Replacement
    MPI_ADDRESS               MPI_GET_ADDRESS
    MPI_TYPE_HINDEXED         MPI_TYPE_CREATE_HINDEXED
    MPI_TYPE_HVECTOR          MPI_TYPE_CREATE_HVECTOR
    MPI_TYPE_STRUCT           MPI_TYPE_CREATE_STRUCT
    MPI_TYPE_EXTENT           MPI_TYPE_GET_EXTENT
    MPI_TYPE_UB               MPI_TYPE_GET_EXTENT
    MPI_TYPE_LB               MPI_TYPE_GET_EXTENT
    MPI_LB                    MPI_TYPE_CREATE_RESIZED
    MPI_UB                    MPI_TYPE_CREATE_RESIZED
    MPI_ERRHANDLER_CREATE     MPI_COMM_CREATE_ERRHANDLER
    MPI_ERRHANDLER_GET        MPI_COMM_GET_ERRHANDLER
    MPI_ERRHANDLER_SET        MPI_COMM_SET_ERRHANDLER
    MPI_Handler_function      MPI_Comm_errhandler_fn
    MPI_KEYVAL_CREATE         MPI_COMM_CREATE_KEYVAL
    MPI_KEYVAL_FREE           MPI_COMM_FREE_KEYVAL
    MPI_DUP_FN                MPI_COMM_DUP_FN
    MPI_NULL_COPY_FN          MPI_COMM_NULL_COPY_FN
    MPI_NULL_DELETE_FN        MPI_COMM_NULL_DELETE_FN
    MPI_Copy_function         MPI_Comm_copy_attr_function
    COPY_FUNCTION             COMM_COPY_ATTR_FN
    MPI_Delete_function       MPI_Comm_delete_attr_function
    DELETE_FUNCTION           COMM_DELETE_ATTR_FN
    MPI_ATTR_DELETE           MPI_COMM_DELETE_ATTR
    MPI_ATTR_GET              MPI_COMM_GET_ATTR
    MPI_ATTR_PUT              MPI_COMM_SET_ATTR



    2.6.2. Fortran Binding Issues

    MPI-1.1 provided bindings for Fortran 77. MPI-2 retains these bindings but they are now interpreted in the context of the Fortran 90 standard. MPI can still be used with most Fortran 77 compilers, as noted below. When the term Fortran is used it means Fortran 90.

    All MPI names have an MPI_ prefix, and all characters are capitals. Programs must not declare variables, parameters, or functions with names beginning with the prefix MPI_. To avoid conflicting with the profiling interface, programs should also avoid functions with the prefix PMPI_. This is mandated to avoid possible name collisions.

    All MPI Fortran subroutines have a return code in the last argument. A few MPI operations which are functions do not have the return code argument. The return code value for successful completion is MPI_SUCCESS. Other error codes are implementation dependent; see the error codes in Chapter 7 of the MPI-1 document and Annex Language Binding in the MPI-2 document.

    Constants representing the maximum length of a string are one smaller in Fortran than in C and C++, as discussed in Section Constants.

    Handles are represented in Fortran as INTEGERs. Binary-valued variables are of type LOGICAL.

    Array arguments are indexed from one.

    The MPI Fortran binding is inconsistent with the Fortran 90 standard in several respects. These inconsistencies, such as register optimization problems, have implications for user codes that are discussed in detail in Section A Problem with Register Optimization. They are also inconsistent with Fortran 77.


    Additionally, MPI is inconsistent with Fortran 77 in a number of ways, as noted below.



    2.6.3. C Binding Issues

    We use the ANSI C declaration format. All MPI names have an MPI_ prefix, defined constants are in all capital letters, and defined types and functions have one capital letter after the prefix. Programs must not declare variables or functions with names beginning with the prefix MPI_. To support the profiling interface, programs should not declare functions with names beginning with the prefix PMPI_.

    The definition of named constants, function prototypes, and type definitions must be supplied in an include file mpi.h.

    Almost all C functions return an error code. The successful return code will be MPI_SUCCESS, but failure return codes are implementation dependent.

    Type declarations are provided for handles to each category of opaque objects.

    Array arguments are indexed from zero.

    Logical flags are integers with value 0 meaning ``false'' and a non-zero value meaning ``true.''

    Choice arguments are pointers of type void *.

    Address arguments are of MPI defined type MPI_Aint. File displacements are of type MPI_Offset. MPI_Aint is defined to be an integer of the size needed to hold any valid address on the target architecture. MPI_Offset is defined to be an integer of the size needed to hold any valid file size on the target architecture.



    2.6.4. C++ Binding Issues

    There are places in the standard that give rules for C and not for C++. In these cases, the C rule should be applied to the C++ case, as appropriate. In particular, the values of constants given in the text are the ones for C and Fortran. A cross index of these with the C++ names is given in Annex Language Binding.

    We use the ANSI C++ declaration format. All MPI names are declared within the scope of a namespace called MPI and therefore are referenced with an MPI:: prefix. Defined constants are in all capital letters, and class names, defined types, and functions have only their first letter capitalized. Programs must not declare variables or functions in the MPI namespace. This is mandated to avoid possible name collisions.

    The definition of named constants, function prototypes, and type definitions must be supplied in an include file mpi.h.


    Advice to implementors.

    The file mpi.h may contain both the C and C++ definitions. Usually one can simply use the defined value (generally __cplusplus, but not required) to see if one is using C++ to protect the C++ definitions. It is possible that a C compiler will require that the source protected this way be legal C code. In this case, all the C++ definitions can be placed in a different include file and the ``#include'' directive can be used to include the necessary C++ definitions in the mpi.h file. ( End of advice to implementors.)

    C++ functions that create objects or return information usually place the object or information in the return value. Since the language neutral prototypes of MPI functions include the C++ return value as an OUT parameter, semantic descriptions of MPI functions refer to the C++ return value by that parameter name (see Section Function Name Cross Reference). The remaining C++ functions return void.

    In some circumstances, MPI permits users to indicate that they do not want a return value. For example, the user may indicate that the status is not filled in. Unlike C and Fortran where this is achieved through a special input value, in C++ this is done by having two bindings where one has the optional argument and one does not.

    C++ functions do not return error codes. If the default error handler has been set to MPI::ERRORS_THROW_EXCEPTIONS, the C++ exception mechanism is used to signal an error by throwing an MPI::Exception object.

    It should be noted that the default error handler (i.e., MPI::ERRORS_ARE_FATAL) on a given type has not changed. User error handlers are also permitted. MPI::ERRORS_RETURN simply returns control to the calling function; there is no provision for the user to retrieve the error code.

    User callback functions that return integer error codes should not throw exceptions; the returned error will be handled by the MPI implementation by invoking the appropriate error handler.


    Advice to users.

    C++ programmers that want to handle MPI errors on their own should use the MPI::ERRORS_THROW_EXCEPTIONS error handler, rather than MPI::ERRORS_RETURN, which is used for that purpose in C. Care should be taken using exceptions in mixed language situations. ( End of advice to users.)

    Opaque object handles must be objects in themselves, and have the assignment and equality operators overridden to perform semantically like their C and Fortran counterparts.

    Array arguments are indexed from zero.

    Logical flags are of type bool.

    Choice arguments are pointers of type void *.

    Address arguments are of MPI-defined integer type MPI::Aint, defined to be an integer of the size needed to hold any valid address on the target architecture. Analogously, MPI::Offset is an integer to hold file offsets.

    Most MPI functions are methods of MPI C++ classes. MPI class names are generated from the language neutral MPI types by dropping the MPI_ prefix and scoping the type within the MPI namespace. For example, MPI_DATATYPE becomes MPI::Datatype.
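
    The class-name rule just stated (drop the MPI_ prefix, capitalize only the first letter, and scope the result within the MPI namespace) is mechanical enough to sketch as a string transformation in C. The function ex_cxx_class_name is a hypothetical helper introduced for illustration; it covers only this simple rule and not the function-naming rules or their exceptions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Write the C++ class name for a language-neutral type name into out.
 * Returns 0 on success, -1 if the name lacks the MPI_ prefix or the
 * result does not fit in out_size bytes. */
int ex_cxx_class_name(const char *neutral, char *out, size_t out_size) {
    const char *prefix = "MPI_";
    size_t plen = strlen(prefix);
    assert(neutral != NULL && out != NULL);
    if (strncmp(neutral, prefix, plen) != 0 || neutral[plen] == '\0') {
        return -1;
    }
    const char *rest = neutral + plen;
    size_t needed = strlen("MPI::") + strlen(rest) + 1;
    if (needed > out_size) {
        return -1;
    }
    strcpy(out, "MPI::");
    size_t pos = strlen(out);
    for (size_t i = 0; rest[i] != '\0'; i++) {
        unsigned char c = (unsigned char)rest[i];
        /* First letter upper case, all remaining letters lower case. */
        out[pos++] = (char)(i == 0 ? toupper(c) : tolower(c));
    }
    out[pos] = '\0';
    return 0;
}
```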

    The names of MPI-2 functions generally follow the naming rules given. In some circumstances, the new MPI-2 function is related to an MPI-1 function with a name that does not follow the naming conventions. In this circumstance, the language neutral name is in analogy to the MPI-1 name even though this gives an MPI-2 name that violates the naming conventions. The C and Fortran names are the same as the language neutral name in this case. However, the C++ names for MPI-1 do reflect the naming rules and can differ from the C and Fortran names. Thus, the analogous name in C++ to the MPI-1 name is different than the language neutral name. This results in the C++ name differing from the language neutral name. An example of this is the language neutral name of MPI_FINALIZED and a C++ name of MPI::Is_finalized.

    In C++, function typedefs are made publicly within appropriate classes. However, these declarations then become somewhat cumbersome. For example, the typedef

    typedef int MPI::Grequest::Query_function();

    would look like the following:


    namespace MPI { 
      class Request { 
        // ... 
      }; 
     
      class Grequest : public MPI::Request { 
        // ... 
        typedef int Query_function(void* extra_state, MPI::Status& status);
      }; 
    }; 
    
    Rather than including this scaffolding when declaring C++ typedefs, we use an abbreviated form. In particular, we explicitly indicate the class and namespace scope for the typedef of the function. Thus, the example above is shown in the text as follows:
    typedef int MPI::Grequest::Query_function(void* extra_state, 
                                              MPI::Status& status) 
    

    The C++ bindings presented in Annex MPI-1 C++ Language Binding and throughout this document were generated by applying a simple set of name generation rules to the MPI function specifications. While these guidelines may be sufficient in most cases, they may not be suitable for all situations. In cases of ambiguity or where a specific semantic statement is desired, these guidelines may be superseded as the situation dictates.

      1. All functions, types, and constants are declared within the scope of a namespace called MPI.


      2. Arrays of MPI handles are always left in the argument list (whether they are IN or OUT arguments).


      3. If the argument list of an MPI function contains a scalar IN handle, and it makes sense to define the function as a method of the object corresponding to that handle, the function is made a member function of the corresponding MPI class. The member functions are named according to the corresponding MPI function name, but without the `` MPI_'' prefix and without the object name prefix (if applicable). In addition:

        1. The scalar IN handle is dropped from the argument list, and the C++ this object corresponds to the dropped argument.


        2. The function is declared const.


      4. MPI functions are made into class functions (static) when they belong on a class but do not have a unique scalar IN or INOUT parameter of that class.


      5. If the argument list contains a single OUT argument that is not of type MPI_STATUS (or an array), that argument is dropped from the list and the function returns that value.


      Example The C++ binding for MPI_COMM_SIZE is int MPI::Comm::Get_size(void) const.


      6. If there are multiple OUT arguments in the argument list, one is chosen as the return value and is removed from the list.


      7. If the argument list does not contain any OUT arguments, the function returns void.


      Example The C++ binding for MPI_REQUEST_FREE is void MPI::Request::Free(void)


      8. MPI functions to which the above rules do not apply are not members of any class, but are defined in the MPI namespace.


      Example The C++ binding for MPI_BUFFER_ATTACH is void MPI::Attach_buffer(void* buffer, int size).


      9. All class names, defined types, and function names have only their first letter capitalized. Defined constants are in all capital letters.


      10. Any IN pointer, reference, or array argument must be declared const.


      11. Handles are passed by reference.


      12. Array arguments are denoted with square brackets ( []), not pointers, as this is more semantically precise.



    2.7. Processes

    An MPI program consists of autonomous processes, executing their own code, in a MIMD style. The codes executed by each process need not be identical. The processes communicate via calls to MPI communication primitives. Typically, each process executes in its own address space, although shared-memory implementations of MPI are possible.

    This document specifies the behavior of a parallel program assuming that only MPI calls are used. The interaction of an MPI program with other possible means of communication, I/O, and process management is not specified. Unless otherwise stated in the specification of the standard, MPI places no requirements on the result of its interaction with external mechanisms that provide similar or equivalent functionality. This includes, but is not limited to, interactions with external mechanisms for process control, shared and remote memory access, file system access and control, interprocess communication, process signaling, and terminal I/O. High quality implementations should strive to make the results of such interactions intuitive to users, and attempt to document restrictions where deemed necessary.


    Advice to implementors.

    Implementations that support such additional mechanisms for functionality supported within MPI are expected to document how these interact with MPI. ( End of advice to implementors.)

    The interaction of MPI and threads is defined in Section MPI and Threads.



    2.8. Error Handling

    MPI provides the user with reliable message transmission. A message sent is always received correctly, and the user does not need to check for transmission errors, time-outs, or other error conditions. In other words, MPI does not provide mechanisms for dealing with failures in the communication system. If the MPI implementation is built on an unreliable underlying mechanism, then it is the job of the implementor of the MPI subsystem to insulate the user from this unreliability, or to reflect unrecoverable errors as failures. Whenever possible, such failures will be reflected as errors in the relevant communication call. Similarly, MPI itself provides no mechanisms for handling processor failures.

    Of course, MPI programs may still be erroneous. A program error can occur when an MPI call is made with an incorrect argument (non-existing destination in a send operation, buffer too small in a receive operation, etc.). This type of error would occur in any implementation. In addition, a resource error may occur when a program exceeds the amount of available system resources (number of pending messages, system buffers, etc.). The occurrence of this type of error depends on the amount of available resources in the system and the resource allocation mechanism used; this may differ from system to system. A high-quality implementation will provide generous limits on the important resources so as to alleviate the portability problem this represents.

    In C and Fortran, almost all MPI calls return a code that indicates successful completion of the operation. Whenever possible, MPI calls return an error code if an error occurred during the call. By default, an error detected during the execution of the MPI library causes the parallel computation to abort, except for file operations. However, MPI provides mechanisms for users to change this default and to handle recoverable errors. The user may specify that no error is fatal, and handle error codes returned by MPI calls by himself or herself. Also, the user may provide his or her own error-handling routines, which will be invoked whenever an MPI call returns abnormally. The MPI error handling facilities are described in Chapter 7 of the MPI-1 document and in Section Error Handlers of this document. The return values of C++ functions are not error codes. If the default error handler has been set to MPI::ERRORS_THROW_EXCEPTIONS, the C++ exception mechanism is used to signal an error by throwing an MPI::Exception object.
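
    The interplay between returned error codes and user-provided error-handling routines can be sketched in C as follows. This is a simplified model, not the MPI error-handler interface: all names (EX_SUCCESS, EX_ERR_RANK, ex_errhandler_fn, ex_set_errhandler, ex_record_error, ex_send) are hypothetical, and the default behavior of aborting the computation is deliberately left out so that the sketch only shows the non-fatal paths.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SUCCESS  0
#define EX_ERR_RANK 1   /* hypothetical error code */

/* Type of a user error-handling routine (hypothetical). */
typedef void ex_errhandler_fn(int *errcode);

/* Installed handler; NULL means "only return the error code". */
static ex_errhandler_fn *ex_current_handler = NULL;
static int ex_last_reported = EX_SUCCESS;

/* Example user handler: records the code it was invoked with. */
void ex_record_error(int *errcode) {
    ex_last_reported = *errcode;
}

/* Install a user handler, or NULL to return codes only. */
void ex_set_errhandler(ex_errhandler_fn *fn) {
    ex_current_handler = fn;
}

/* Hypothetical call: validate dest against comm_size, invoking the
 * installed handler before returning a nonzero error code. */
int ex_send(int dest, int comm_size) {
    assert(comm_size > 0);
    if (dest < 0 || dest >= comm_size) {
        int code = EX_ERR_RANK;
        if (ex_current_handler != NULL) {
            ex_current_handler(&code);
        }
        return code;
    }
    return EX_SUCCESS;
}
```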

    Several factors limit the ability of MPI calls to return with meaningful error codes when an error occurs. MPI may not be able to detect some errors; other errors may be too expensive to detect in normal execution mode; finally some errors may be ``catastrophic'' and may prevent MPI from returning control to the caller in a consistent state.

    Another subtle issue arises because of the nature of asynchronous communications: MPI calls may initiate operations that continue asynchronously after the call returned. Thus, the operation may return with a code indicating successful completion, yet later cause an error exception to be raised. If there is a subsequent call that relates to the same operation (e.g., a call that verifies that an asynchronous operation has completed) then the error argument associated with this call will be used to indicate the nature of the error. In a few cases, the error may occur after all calls that relate to the operation have completed, so that no error value can be used to indicate the nature of the error (e.g., an error on the receiver in a send with the ready mode). Such an error must be treated as fatal, since information cannot be returned for the user to recover from it.
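
    The situation in which a call reports success and the error surfaces only at a later, related call can be modeled with a small C sketch. The request type ex_request and the functions ex_start, ex_progress, and ex_wait are hypothetical; ex_progress merely stands in for whatever asynchronous activity completes the operation.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SUCCESS      0
#define EX_ERR_TRUNCATE 1   /* hypothetical error code */

/* Hypothetical request: remembers an error found after the start call. */
typedef struct {
    int complete;      /* nonzero once the operation has finished */
    int deferred_err;  /* error discovered asynchronously, if any */
} ex_request;

/* Start the operation; nothing can be checked yet, so report success. */
int ex_start(ex_request *req) {
    assert(req != NULL);
    req->complete = 0;
    req->deferred_err = EX_SUCCESS;
    return EX_SUCCESS;
}

/* Stand-in for asynchronous progress: the operation finishes, possibly
 * with an error that the start call could not have detected. */
void ex_progress(ex_request *req, int err) {
    assert(req != NULL);
    req->deferred_err = err;
    req->complete = 1;
}

/* Completion call: the deferred error is returned here. */
int ex_wait(ex_request *req) {
    assert(req != NULL && req->complete);
    return req->deferred_err;
}
```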

    This document does not specify the state of a computation after an erroneous MPI call has occurred. The desired behavior is that a relevant error code be returned, and the effect of the error be localized to the greatest possible extent. For example, it is highly desirable that an erroneous receive call will not cause any part of the receiver's memory to be overwritten, beyond the area specified for receiving the message.

    Implementations may go beyond this document in supporting in a meaningful manner MPI calls that are defined here to be erroneous. For example, MPI specifies strict type matching rules between matching send and receive operations: it is erroneous to send a floating point variable and receive an integer. Implementations may go beyond these type matching rules, and provide automatic type conversion in such situations. It will be helpful to generate warnings for such non-conforming behavior.

    MPI-2 defines a way for users to create new error codes as defined in Section Error Classes, Error Codes, and Error Handlers.



    2.9. Implementation Issues

    There are a number of areas where an MPI implementation may interact with the operating environment and system. While MPI does not mandate that any services (such as signal handling) be provided, it does strongly suggest the behavior to be provided if those services are available. This is an important point in achieving portability across platforms that provide the same set of services.



    2.9.1. Independence of Basic Runtime Routines

    MPI programs require that library routines that are part of the basic language environment (such as write in Fortran and printf and malloc in ANSI C) and are executed after MPI_INIT and before MPI_FINALIZE operate independently and that their completion is independent of the action of other processes in an MPI program.

    Note that this in no way prevents the creation of library routines that provide parallel services whose operation is collective. However, the following program is expected to complete in an ANSI C environment regardless of the size of MPI_COMM_WORLD (assuming that printf is available at the executing nodes).

    #include <stdio.h>
    #include <mpi.h>
    int main(void) {
        int rank;
        MPI_Init((void *)0, (void *)0);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) printf("Starting program\n");
        MPI_Finalize();
        return 0;
    }
    
    The corresponding Fortran and C++ programs are also expected to complete.

    An example of what is not required is any particular ordering of the action of these routines when called by several tasks. For example, MPI makes neither requirements nor recommendations for the output from the following program (again assuming that I/O is available at the executing nodes).

    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 
    printf("Output from task rank %d\n", rank); 
    
    In addition, calls that fail because of resource exhaustion or other error are not considered a violation of the requirements here (however, they are required to complete, just not to complete successfully).



    2.9.2. Interaction with Signals

    MPI does not specify the interaction of processes with signals and does not require that MPI be signal safe. The implementation may reserve some signals for its own use. It is required that the implementation document which signals it uses, and it is strongly recommended that it not use SIGALRM, SIGFPE, or SIGIO. Implementations may also prohibit the use of MPI calls from within signal handlers.

    In multithreaded environments, users can avoid conflicts between signals and the MPI library by catching signals only on threads that do not execute MPI calls. High quality single-threaded implementations will be signal safe: an MPI call suspended by a signal will resume and complete normally after the signal is handled.



    2.10. Examples

    The examples in this document are for illustration purposes only. They are not intended to specify the standard. Furthermore, the examples have not been carefully checked or verified.





    3. Version 1.2 of MPI



    This section contains clarifications and minor corrections to Version 1.1 of the MPI Standard. The only new function in MPI-1.2 is one for identifying which version of the MPI Standard the implementation being used conforms to. There are small differences between MPI-1 and MPI-1.1. There are very few differences (only those discussed in this chapter) between MPI-1.1 and MPI-1.2, but large differences (the rest of this document) between MPI-1.2 and MPI-2.





    3.1. Version Number



    To cope with changes to the MPI Standard, there are both compile-time and run-time ways to determine which version of the standard the environment in use supports.

    The "version" will be represented by two separate integers, for the version and subversion:

    In C and C++,

        #define MPI_VERSION    1 
        #define MPI_SUBVERSION 2 
    
    in Fortran,
        INTEGER MPI_VERSION, MPI_SUBVERSION 
        PARAMETER (MPI_VERSION    = 1) 
        PARAMETER (MPI_SUBVERSION = 2) 
    

    For runtime determination,

    MPI_GET_VERSION( version, subversion )
      OUT   version      version number (integer)
      OUT   subversion   subversion number (integer)

    int MPI_Get_version(int *version, int *subversion)

    MPI_GET_VERSION(VERSION, SUBVERSION, IERROR)
    INTEGER VERSION, SUBVERSION, IERROR

    MPI_GET_VERSION is one of the few functions that can be called before MPI_INIT and after MPI_FINALIZE. Its C++ binding can be found in the Annex, Section C++ Bindings for New 1.2 Functions.





    3.2. MPI-1.0 and MPI-1.1 Clarifications



    As experience has been gained since the releases of the 1.0 and 1.1 versions of the MPI Standard, it has become apparent that some specifications were insufficiently clear. In this section we attempt to make clear the intentions of the MPI Forum with regard to the behavior of several MPI-1 functions. An MPI-1-compliant implementation should behave in accordance with the clarifications in this section.





    3.2.1. Clarification of MPI_INITIALIZED



    MPI_INITIALIZED returns true if the calling process has called MPI_INIT. Whether MPI_FINALIZE has been called does not affect the behavior of MPI_INITIALIZED.





    3.2.2. Clarification of MPI_FINALIZE



    This routine cleans up all MPI state. Each process must call MPI_FINALIZE before it exits. Unless there has been a call to MPI_ABORT, each process must ensure that all pending non-blocking communications are (locally) complete before calling MPI_FINALIZE. Further, at the instant at which the last process calls MPI_FINALIZE, all pending sends must be matched by a receive, and all pending receives must be matched by a send.

    For example, the following program is correct:

            Process 0                Process 1 
            ---------                --------- 
            MPI_Init();              MPI_Init(); 
            MPI_Send(dest=1);        MPI_Recv(src=0); 
            MPI_Finalize();          MPI_Finalize(); 
    
    Without the matching receive, the program is erroneous:
            Process 0                Process 1 
            -----------              ----------- 
            MPI_Init();              MPI_Init(); 
            MPI_Send(dest=1); 
            MPI_Finalize();          MPI_Finalize(); 
    

    A successful return from a blocking communication operation or from MPI_WAIT or MPI_TEST tells the user that the buffer can be reused and means that the communication is completed by the user, but does not guarantee that the local process has no more work to do. A successful return from MPI_REQUEST_FREE with a request handle generated by an MPI_ISEND nullifies the handle but provides no assurance of operation completion. The MPI_ISEND is complete only when it is known by some means that a matching receive has completed. MPI_FINALIZE guarantees that all local actions required by communications the user has completed will, in fact, occur before it returns.

    MPI_FINALIZE guarantees nothing about pending communications that have not been completed (completion is assured only by MPI_WAIT, MPI_TEST, or MPI_REQUEST_FREE combined with some other verification of completion).


    Example. This program is correct:

    rank 0                          rank 1 
    ===================================================== 
    ...                             ... 
    MPI_Isend();                    MPI_Recv(); 
    MPI_Request_free();             MPI_Barrier(); 
    MPI_Barrier();                  MPI_Finalize(); 
    MPI_Finalize();                 exit(); 
    exit();                         
    


    Example. This program is erroneous and its behavior is undefined:

    rank 0                          rank 1 
    ===================================================== 
    ...                             ... 
    MPI_Isend();                    MPI_Recv(); 
    MPI_Request_free();             MPI_Finalize(); 
    MPI_Finalize();                 exit(); 
    exit();                         
    

    If no MPI_BUFFER_DETACH occurs between an MPI_BSEND (or other buffered send) and MPI_FINALIZE, the MPI_FINALIZE implicitly supplies the MPI_BUFFER_DETACH.


    Example. This program is correct, and after the MPI_Finalize, it is as if the buffer had been detached.

    rank 0                          rank 1 
    ===================================================== 
    ...                             ... 
    buffer = malloc(1000000);       MPI_Recv(); 
    MPI_Buffer_attach();            MPI_Finalize(); 
    MPI_Bsend();                    exit();               
    MPI_Finalize(); 
    free(buffer); 
    exit();                         
    


    Example. In this example, MPI_Iprobe() must return a FALSE flag. MPI_Test_cancelled() must return a TRUE flag, independent of the relative order of execution of MPI_Cancel() in process 0 and MPI_Finalize() in process 1.

    The MPI_Iprobe() call is there to make sure the implementation knows that the "tag1" message exists at the destination, without being able to claim that the user knows about it.


    rank 0                          rank 1 
    ======================================================== 
    MPI_Init();                     MPI_Init(); 
    MPI_Isend(tag1); 
    MPI_Barrier();                  MPI_Barrier(); 
                                    MPI_Iprobe(tag2); 
    MPI_Barrier();                  MPI_Barrier(); 
                                    MPI_Finalize(); 
                                    exit(); 
    MPI_Cancel(); 
    MPI_Wait(); 
    MPI_Test_cancelled(); 
    MPI_Finalize(); 
    exit(); 
     
    

    Advice to implementors.

    An implementation may need to delay the return from MPI_FINALIZE until all potential future message cancellations have been processed. One possible solution is to place a barrier inside MPI_FINALIZE. (End of advice to implementors.)

    Once MPI_FINALIZE returns, no MPI routine (not even MPI_INIT) may be called, except for MPI_GET_VERSION, MPI_INITIALIZED, and the MPI-2 function MPI_FINALIZED. Each process must complete any pending communication it initiated before it calls MPI_FINALIZE. If the call returns, each process may continue local computations, or exit, without participating in further MPI communication with other processes. MPI_FINALIZE is collective on MPI_COMM_WORLD.


    Advice to implementors.

    Even though a process has completed all the communication it initiated, such communication may not yet be completed from the viewpoint of the underlying MPI system. E.g., a blocking send may have completed, even though the data is still