Plans for Future Development

What is it that separates OpenAD as it is from the close-to-ideal adjoint compiler?

We see the following five areas where further effort is required to push OpenAD to the desired level of robustness, efficiency, and ease of use. Priorities need to be set according to the requirements of our primary users and the level of available funding. The last statement probably applies to every large-scale software project based in an environment that is characterized by limited resources - both human and financial.
  1. Strategies for adjoint code generation
    1. pure adjoint code (no preaccumulation) for efficient gradient computation:
      Preaccumulation is advantageous if the adjoint model is supposed to be run many times at the same point, for example, to accumulate the full Jacobian for a problem with a moderate number of dependent outputs and a very large number of independent inputs. This situation does not apply to the evaluation of a single gradient. Adjoint code in the classical sense is likely to perform better.
    2. preaccumulation of adjoint-augmented statement-graphs for efficient gradient computation:
      At the level of single scalar assignments improvements over 1.1 can be achieved by considering the computation of the local gradients as a vertex elimination problem in the adjoint-augmented computational graph. We know how to solve this problem. The result can be used to generate adjoint code that performs less floating-point operations (flops) than the code generated by the classical approach.
    3. automated decisions about local code generation strategy by the adjoint compiler:
      In items 1.1 and 1.2 we strive for a low number of flops. Another relevant measure of complexity is the amount of memory required to store the values that are used by the adjoint computation. For certain bits of code it may be advantageous to preaccumulate the local Jacobian and store all its entries instead of storing the intermediate values that are required to compute this local Jacobian. This decision can be made by the adjoint compiler based on its internal representation of the code and the results of data-flow analyses (see point 2 below).
  2. AD-specific data-flow analyses
    1. activity (flow-sensitive and flow-insensitive):
      Knowing if a variable depends on an independent input and if some dependent output depends on it is fundamental for the ability to generate efficient adjoint code. A conservative estimate can be computed statically (at compile-time). The granularity of the information generated by the activity analysis has a strong impact on the complexity of generating the adjoint code and running it. The effective exploitation of very detailed activity information during the code generation phase is highly desirable.
    2. adjoint requisiteness (TBR + extensions):
      Building on the available information about activity it can be decided if a variable must be stored before its is overwritten or not. Consequently, the memory requirements of adjoint code can be decreased considerably.
    3. checkpoint requisiteness (for various reversal modes and types of checkpoints):
      Different program reversal strategies (see point 3 below) require the storage and retrieval of various types of checkpoints. Depending on the context the size of these checkpoints can be decreased considerably by exploiting the information generated by further data-flow analyses.
  3. Program reversal modes
    1. static (e.g., split, joint, delayed joint, etc.):
      A number of call graph reversal modes are static in the sense that they are fully determined at compile-time. Therefore, the mapping between the nodes in the extended dynamic call tree (EDCT - subroutine call pattern as it is realized at run-time) must be well-defined at compiler-time such that the corresponding code can be generated. Statically reversed codes have the advantage that the native compiler can potentially do a good job when optimizing them.
    2. dynamic (based on heuristic running on extended DCT)
      Often it may be advantageous to build the EDCT explicitly at run-time and to make the decision about the reversal mode dynamically. Thus, the structure of the EDCT as well as potential profiling information about the complexity of executing the subroutines that are associated with the single EDCT nodes can be exploited by heuristics for optimizing the adjoint code.
    3. user-guided (via domain-specific directives):
      Sometimes the user may know that some reversal strategy gives good results for a given section of the code. The user should be able to convey this information to the adjoint compiler via domain-specific directives (see 5).
  4. Language-coverage and library handling in adjoint code
    1. coverage for FortranXX, C, C++:
      OpenAD has been designed to handle various programming languages. Work is underway for FortranXX, C and C++. Further effort is needed to reach the desired level of robustness for the existing front-ends. New front-ends for other programming languages (MatLab, Java, ...) should be added as required. It is important to realize that software engineering problems make up a large fraction of the issues involved in the development of adjoint compiler technology. Without adequate programming support such a project is bound to fail.
    2. language concepts (e.g., array arithmetic, pointers and dynamic memory allocation, polymorphism):
      Many language concepts, in particular those found in object-oriented languages, have never been considered in the context of automatic adjoint code generation. We are aware of several hard theoretical and technical problems that need to be considered in this context. Without an answer to these open questions the correctness of the adjoint code cannot be guaranteed.
    3. support libraries (MPI, netcdf, etc.):
      Modern numerical simulation programs make extensive use of support libraries (for parallelism, i/o, ...) that need to be handled correctly in the adjoint code.
  5. AD-specific directives and intrinsics
    1. iterative linear and nonlinear solvers:
      Certain properties of numerical code that can be exploited during the source transformation are hard to determine automatically at compile-time. Iterative linear solvers do not need to be differentiated if the mathematical context is understood and mapped to the adjoint code. A delayed differentiation of nonlinear solvers (as soon as the values have almost converged) is desirable both for stability and efficiency.
    2. self-adjointness:
      The fact that a given section of the code is self-adjoint is hard to determine statically. The user may know about this property and should be able to convey this information to the adjoint compiler that can exploit it for the generation of more efficient and numerically stable adjoint code.
    3. intrinsics catalog:
      Certain subroutines are quasi-standard implementations in a given application area. Highly efficient and robust derivative code may be available and should be used as part of the automatically generated adjoint. OpenAD provides an catalog that allows the user to define application-specific quasi-intrinsic routines by providing code for the evaluation of their derivative. This code can be used by the adjoint compiler in a plug-and-play fashion to generate improved adjoint code.
Many other important issues, such as automatic sparsity detection, parallel checkpointing, further AD-specific data-flow analyses, are investigated by various other groups. We intent to evaluate their results for a possible inclusion in the pool of algorithms provided by OpenAD.