Plans for Future Development
What is it that separates OpenAD as it is from the close-to-ideal
adjoint compiler?
We see the following five areas where further effort is required to
push
OpenAD to the desired level of robustness, efficiency, and ease of use.
Priorities need to be set according to the requirements of our primary
users
and the level of available funding. The last statement probably applies
to
every large-scale software project based in an environment that is
characterized by limited resources - both human and financial.
- Strategies for adjoint code
generation
- pure adjoint code
(no preaccumulation) for efficient gradient computation:
Preaccumulation is advantageous if the adjoint model is supposed to be
run
many times at the same point, for example, to accumulate the full
Jacobian
for a problem with a moderate number of dependent outputs and a very
large
number of independent inputs. This situation does not apply to the
evaluation of a single gradient. Adjoint code in the classical sense is
likely to perform
better.
- preaccumulation of
adjoint-augmented statement-graphs for efficient gradient computation:
At the level of single scalar assignments improvements over 1.1 can be
achieved
by considering the computation of the local gradients as a vertex
elimination problem in the adjoint-augmented computational graph. We
know how to solve
this problem. The result can be used to generate adjoint code that
performs
less floating-point operations (flops) than the code generated by the
classical approach.
- automated
decisions about local code generation strategy by the adjoint compiler:
In items 1.1 and 1.2 we strive for a low number of flops. Another
relevant measure of complexity is the amount of memory required to
store the values that are used by the adjoint computation. For certain
bits of code it may be advantageous to preaccumulate the local Jacobian
and store all
its entries instead of storing the intermediate values that are
required
to compute this local Jacobian. This decision can be made by the
adjoint compiler based on its internal representation of the code and
the results
of data-flow analyses (see point 2 below).
- AD-specific data-flow analyses
- activity
(flow-sensitive and flow-insensitive):
Knowing if a variable depends on an independent input and if some
dependent
output depends on it is fundamental for the ability to generate
efficient adjoint code. A conservative estimate can be computed
statically (at compile-time). The granularity of the information
generated by the activity
analysis has a strong impact on the complexity of generating the
adjoint code
and running it. The effective exploitation of very detailed activity
information
during the code generation phase is highly desirable.
- adjoint
requisiteness (TBR + extensions):
Building on the available information about activity it can be decided
if
a variable must be stored before its is overwritten or not.
Consequently,
the memory requirements of adjoint code can be decreased considerably.
- checkpoint
requisiteness (for various reversal modes and types of checkpoints):
Different program reversal strategies (see point 3 below) require the
storage and retrieval
of various types of checkpoints. Depending on the context the size of
these checkpoints can be decreased considerably by exploiting the
information generated by further data-flow analyses.
- Program reversal modes
- static (e.g., split,
joint, delayed joint, etc.):
A number of call graph reversal modes are static in the sense that they
are
fully determined at compile-time. Therefore, the mapping between the
nodes in the extended dynamic call tree (EDCT - subroutine call pattern
as it
is realized at run-time) must be well-defined at compiler-time such
that
the corresponding code can be generated. Statically reversed codes have
the advantage that the native compiler can potentially do a good job
when
optimizing them.
- dynamic (based on heuristic running on extended DCT)
Often it may be advantageous to build the EDCT explicitly at run-time
and to
make the decision about the reversal mode dynamically. Thus, the
structure
of the EDCT as well as potential profiling information about the
complexity
of executing the subroutines that are associated with the single EDCT
nodes
can be exploited by heuristics for optimizing the adjoint code.
- user-guided (via
domain-specific directives):
Sometimes the user may know that some reversal strategy gives good
results for
a given section of the code. The user should be able to convey this
information
to the adjoint compiler via domain-specific directives (see 5).
- Language-coverage and library
handling in adjoint code
- coverage for
FortranXX, C, C++:
OpenAD has been designed to handle various programming languages. Work
is
underway for FortranXX, C and C++. Further effort is needed to reach
the desired level of robustness for the existing front-ends. New
front-ends
for other programming languages (MatLab, Java, ...) should be added as
required. It is important to realize that software engineering problems
make up
a large fraction of the issues involved in the development of adjoint
compiler technology. Without adequate programming support such a
project
is bound to fail.
- language concepts
(e.g., array arithmetic, pointers and dynamic
memory allocation, polymorphism):
Many language concepts, in particular those found in object-oriented
languages,
have never been considered in the context of automatic adjoint code
generation.
We are aware of several hard theoretical and technical problems that
need
to be considered in this context. Without an answer to these open
questions
the correctness of the adjoint code cannot be guaranteed.
- support libraries
(MPI, netcdf, etc.):
Modern numerical simulation programs make extensive use of support
libraries
(for parallelism, i/o, ...) that need to be handled correctly in the
adjoint code.
- AD-specific directives and intrinsics
- iterative linear and
nonlinear solvers:
Certain properties of numerical code that can be exploited during the
source transformation are hard to determine automatically
at compile-time. Iterative linear solvers do not need to be
differentiated
if the mathematical context is understood and mapped to the adjoint
code.
A delayed differentiation of nonlinear solvers (as soon as the values
have
almost converged) is desirable both for stability and efficiency.
- self-adjointness:
The fact that a given section of the code is self-adjoint is hard to
determine
statically. The user may know about this property and should be able to
convey this information to the adjoint compiler that can exploit it for
the generation of more efficient and numerically stable adjoint code.
- intrinsics catalog:
Certain subroutines are quasi-standard implementations in a given
application
area. Highly efficient and robust derivative code may be available and
should
be used as part of the automatically generated adjoint. OpenAD provides
an
catalog that allows the user to define application-specific
quasi-intrinsic
routines by providing code for the evaluation of their derivative. This
code can be used by the adjoint compiler in a plug-and-play fashion to
generate improved adjoint code.
Many other important issues, such as automatic sparsity detection,
parallel checkpointing, further AD-specific data-flow analyses, are
investigated by various other groups. We intent to evaluate their
results
for a possible inclusion in the pool of algorithms provided by OpenAD.