ARGONNE NATIONAL LABORATORY

9700 South Cass Avenue

Argonne, Illinois 60439











TAO Users Manual





Steve Benson
Lois Curfman McInnes
Jorge J. Moré
Todd Munson
Jason Sarich





Mathematics and Computer Science Division




Technical Report ANL/MCS-TM-242-Revision 1.9




This manual is intended for use with TAO version 1.9








August 29, 2007











This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.


Contents

Preface

The Toolkit for Advanced Optimization (TAO) focuses on the development of algorithms and software for the solution of large-scale optimization problems on high-performance architectures. Areas of interest include nonlinear least squares, unconstrained and bound-constrained optimization, and general nonlinear optimization.

The development of TAO was motivated by the scattered support for parallel computations and the lack of reuse of external toolkits in current optimization software. Our aim is to use object-oriented techniques to produce high-quality optimization software for a range of computing environments ranging from serial workstations and laptops to massively parallel high-performance architectures. Our design decisions are strongly motivated by the challenges inherent in the use of large-scale distributed memory architectures and the reality of working with large, often poorly structured legacy codes for specific applications.

This manual describes the use of TAO. Since TAO is still under development, changes in usage and calling sequences may occur. TAO is fully supported; see the the web site http://www.mcs.anl.gov/tao for information on contacting the TAO developers.

Acknowledgments

The initial development of TAO was funded by the ACTS Toolkit Project in the Office of Advanced Scientific Computing Research, U.S. Department of Energy. We gratefully acknowledge their support.

TAO owes much to the developers of PETSc. We have benefitted from their experience, tools, software, and advice. In many ways, TAO is a natural outcome of the PETSc development. TAO has also benefitted from the work of various researchers who have provided solvers, test problems, and interfaces. In particular, we acknowledge

Finally, we thank all TAO users for their comments, bug reports, and encouragement.


Introduction to TAO

The Toolkit for Advanced Optimization (TAO) focuses on the design and implementation of optimization software for the solution of large-scale optimization applications on high-performance architectures. Our approach is motivated by the scattered support for parallel computations and lack of reuse of linear algebra software in currently available optimization software. The TAO design allows the reuse of toolkits that provide lower-level support (parallel sparse matrix data structures, preconditioners, solvers), and thus we are able to build on top of these toolkits instead of having to redevelop code. The advantages in terms of efficiency and development time are significant.

The TAO design philosophy uses object-oriented techniques of data and state encapsulation, abstract classes, and limited inheritance to create a flexible optimization toolkit. This chapter provides a short introduction to our design philosophy by describing the objects in TAO and the importance of this design. Since a major concern in the TAO project is the performance and scalability of optimization algorithms on large problems, we also present some performance resuls.

TAO Design Philosophy

The TAO design philosophy place strongs emphasis on the reuse of external tools where appropriate. Our design enables bidirectional connection to lower-level linear algebra support (e.g. parallel sparse matrix data structures) provided in toolkits such as PETSc [3] [4,2] as well as higher-level application frameworks. Our design decisions are strongly motivated by the challenges inherent in the use of large-scale distributed memory architectures and the reality of working with large and often poorly structured legacy codes for specific applications. Figure 1.1 illustrates how the TAO software works with external libraries and application code.

Figure 1.1: TAO Design
\begin{figure}\centerline{\epsfysize=3.5in \epsfbox{taofig.eps}}\end{figure}

The TAO solvers use four fundamental objects to define and solve optimization problems: vectors, index sets, matrices, and linear solvers. The concepts of vectors and matrices are standard, while an index set refers to a set of integers used to identify particular elements of vectors or matrices. An optimization algorithm is a sequence of well defined operations on these objects. These operations include vector sums, inner products, and matrix-vector multiplication. TAO makes no assumptions about the representation of these objects by passing pointers to data-structure-neutral objects for the execution of these numerical operations.

With sufficiently flexible abstract interfaces, TAO can support a variety of implementations of data structures and algorithms. These abstractions allow us to more easily experiment with a range of algorithmic and data structure options for realistic problems, such as within this case study. Such capabilities are critical for making high-performance optimization software adaptable to the continual evolution of parallel and distributed architectures and the research community's discovery of new algorithms that exploit their features.

Our current TAO implementation uses the parallel system infrastructure and linear algebra objects offered by PETSc, which uses MPI [13] for all interprocessor communication. The PETSc package supports objects for vectors, matrices, index sets, and linear solvers.

The TAO design philosophy eliminates some of the barriers in using independently developed software components by accepting data that is independent of representation and calling sequence written for particular data formats. The user can initialize an application with external frameworks, provide function information to a TAO solver, and call TAO to solve the application problem.

The use of abstractions for matrices and vectors in TAO optimization software also enables us to leverage automatic differentiation technology to facilitate the parallel computation of gradients and Hessians needed within optimization algorithms. We have demonstrated the viability of this approach through preliminary interfacing between TAO solvers and the automatic differentiation tools ADIFOR and ADIC. We are currently working on developing TAO interfaces that use special problem features (for example, partial separability, stencil information) in automatic differentiation computations.

Performance Results

A major concern in the TAO project is the performance and scalability of optimization algorithms on large problems. In this section we focus on the GPCG (gradient projection, conjugate gradient) algorithm for the solution of bound-constrained convex quadratic programming problems. Originally developed by Moré and Toraldo [20], the GPCG algorithm was designed for large-scale problems but had only been implemented for a single processor. GPCG combines the advantages of the identification properties of the gradient projection method with the finite termination properties of the conjugate gradient method. Moreover, the performance of the TAO implementation on large optimization problems is noteworthy.

Figure 1.2: The journal bearing problem with $\epsilon$ = 0.9
\begin{figure}\centerline{\epsfysize=3.0in \epsfbox{pjb.eps}}\end{figure}

We illustrate the performance of the GPCG algorithm by presenting results for a journal bearing problem with over 2.5 million variables. The journal bearing problem is a finite element approximation to a variational problem over a rectangular two-dimensional grid. A grid with $1600$ points in each direction, for example, is formulated as a bound constrained quadratic problem with $1600^2=2,560,000$ variables. The triangulation of the grid results in a matrix that has the usual five diagonal nonzero structure that arises from a difference approximation to the Laplacian operator. The journal bearing problem contains an eccentricity parameter, $\varepsilon \in (0,1)$, that influences the number of active variables at the solution and the difficulty in solving it. Figure 1.2 shows the solution of the journal bearing problem for $ \varepsilon = 0.9 $. The steep gradient in the solution makes this problem a difficult benchmark.

The performance results in Table 1.1 are noteworthy is several ways. First of all, the number of faces visited by GPCG is remarkably small. Other strategies can lead to a large number of gradient projection iterates, but the GPCG algorithm is remarkably efficient. Another interesting aspect of these results is that due to the low memory requirements of iterative solvers, we were able to solve these problems with only $ p = 8 $ processors. Strategies that rely on direct solvers are likely to need significantly more storage, and thus more processors. Finally, these results show that the GPCG implementation has excellent efficiency. For example, the efficiency of GPCG with respect to $ p = 8 $ processors ranges between $ 70\% $ and $ 100\% $ when $ \varepsilon = 0.1 $. This sustained efficiency is remarkable since the GPCG algorithm is solving a sequence of linear problems with a coefficient matrix set to the submatrix of the Hessian matrix with respect to the free variables for the current iterate. Thus, our implementation's repartitioning of submatrices effectively deals with the load-balancing problem that is inherent in the GPCG algorithm.


Table 1.1: Performance of GPCG on the journal bearing problem with $ 2.56 \cdot 10^6 $ variables.
$ \varepsilon $ $ p $ faces $n_{CG}$ time $t_{CG}$% $ \cal E $
0.1 8 46 431 7419 86 100
0.1 16 45 423 3706 83 100
0.1 32 45 427 2045 82 91
0.1 64 45 427 1279 82 73
0.9 8 37 105 2134 70 100
0.9 16 37 103 1124 71 95
0.9 32 38 100 618 69 86
0.9 64 38 99 397 68 67


An important aspect of our results that is not apparent from Table 1.1 is that for these results we were able to experiment easily with all the preconditioners offered by PETSc. In particular, we were able to compare the diagonal Jacobi preconditioner with block Jacobi and overlapping additive Schwarz preconditioners that use a zero-fill ILU solver in each block. We also experimented with a parallel zero-fill incomplete Cholesky preconditioner provided by a PETSc interface to the BlockSolve95 [15] package of Jones and Plassmann. Interestingly enough, the diagonal Jacobi preconditioner achieved better performance on this problem.


Basic Usage of TAO Solvers

TAO contains unconstrained minimization, bound constrained minimization, and nonlinear complementarity solvers. The structure of these problems can differ significantly, but TAO has a similar interface to all of its solvers. Routines that most solvers have in common will be discussed in this chapter. A complete list of options can be found by consulting the manual pages. Many of the options can also be set at the command line. These options can also be found in manual pages or by running a program with the -help option.

Initialize and Finalize

The first TAO routine in any application should be TaoInitialize(). Most TAO programs begin with a call to
   info = TaoInitialize(int *argc,char ***argv,char *file_name, 
                        char *help_message);
This command initializes TAO, as well as MPI, PETSc, and other packages to which TAO applications may link (if these have not yet been initialized elsewhere). In particular, the arguments argc and argv are the command line arguments delivered in all C and C++ programs; these arguments initialize the options database. The argument file_name optionally indicates an alternative name for an options file, which by default is called .petscrc and resides in the user's home directory.

One of the last routines that all TAO programs should call is

   info = TaoFinalize();
This routine finalizes TAO and any other libraries that may have been initialized during the TaoInitialize() phase. For example, TaoFinalize() calls MPI_Finalize() if TaoInitialize() began MPI. If MPI was initiated externally from TAO (by either the user or another software package), then the user is responsible for calling MPI_Finalize().

Creation and Destruction

A TAO solver can be created with the command

   info = TaoCreate(MPI_Comm comm,TaoMethod method,TAO_SOLVER *newsolver);
The first argument in this routine is an MPI communicator indicating which processes are involved in the solution process. In most cases, this should be set to MPI_COMM_WORLD. The second argument in this creation routine specifies the default method that should be be used to solve the optimization problem. The third argument in TaoCreate() is a pointer to a TAO solver object. This routine creates the object and returns it to the user. The TAO object is then to be used in all TAO routines.

The various types of TAO solvers and the flags that identify them will be discussed in the following chapters. The solution method should be carefully chosen depending upon the problem that is being solved. Some solvers, for instance, are meant for problems with no constraints, while other solvers acknowledge constraints in the problem and solve them accordingly. The user must also be aware of the derivative information that is available. Some solvers require second-order information, while other solvers require only gradient or function information. The TaoMethod can also be set to TAO_NULL in the TaoCreate() routine if the user selects a method at runtime using the options database. The command line option -tao_method followed by an TAO method will override any method specified by the second argument. The command line option -tao_method tao_lmvm, for instance, will specify the limited memory variable metric method for unconstrained optimization. Note that the TaoMethod variable is a string that requires quotation marks in an application program, but quotation marks are not required at the command line. The method that TAO uses to solve an optimization problem can be changed at a later point in the program with the command TaoSetMethod(), whose arguments are a TAO solver and a string that uniquely identifies a method for solving the problem.

Each TAO solver that has been created should also be destroyed using the command

   info = TaoDestroy(TAO_SOLVER solver);
This routine frees the internal data structures used by the solver.


Convergence

Although TAO and its solvers set default parameters that are useful for many problems, it may be necessary for the user to modify these parameters to change the behavior and convergence of various algorithms.

One convergence criterion for most algorithms concerns the of digits of accuracy needed in the solution. In particular, one convergence test employed by TAO attempts to stop when the error in the constraints is less than $\epsilon_{crtol}$, and either

\begin{displaymath}\frac{ \vert f(X) - f(X^*)\vert}{ \vert f(X)\vert + 1} \leq \...
...rtol}
\;\textnormal{or}\;
f(X) - f(X^*) \leq \epsilon_{fatol}, \end{displaymath}

where $X^*$ is the current approximation to $X$. TAO estimates $f(X) - f(X^*)$ with either the square of the norm of the gradient or the duality gap. A relative tolerance of $\epsilon_{frtol}=0.01$ indicates that two significant digits are desired in the objective function. Each solver sets its own convergence tolerances, but they can be changed using the routine TaoSetTolerances() . Another set of convergence tolerances can be set with TaoSetGradientTolerances(). These tolerances terminate the solver when the norm of the gradient function (or Lagrangian function for bound-constrained problems) is sufficiently close to zero.

Other stopping criteria include a minimum trust region radius or a maximum number of iterations. These parameters can be set with the routines TaoSetTrustRegionTolerance() and TaoSetMaximumIterates(). Similarly, a maximum number of function evaluations can be set with the command TaoSetMaximumFunctionEvaluations() .

Viewing Solutions

The routine

   int TaoSolveApplication(TAO_APPLICATION, TAO_SOLVER);
will apply the solver to the application that has been created by the user.

To see parameters and performance statistics for the solver, the routine

   int TaoView(TAO_SOLVER);
can be used. This routine will display to standard output the number of function evaluations need by the solver and other information specific to the solver.

The progress of the optimization solver can be monitored with the runtime option -tao_monitor. Although monitoring routines can be customized, the default monitoring routine will print out several relevant statistics to the screen.

The user also has access to information about the current solution. The current iteration number, objective function value, gradient norm, infeasibility norm, and step length can be retrieved with the command

   int TaoGetSolutionStatus(TAO_SOLVER tao, int* iterate, double* f, 
                            double* gnorm, double *cnorm, double *xdiff, 
                            TaoTerminateReason *reason)
The last argument returns a code that indicates the reason that the solver terminated. Positive numbers indicate that a solution has been found, while negative numbers indicate a failure. A list of reasons can be found in the manual page for TaoGetTerminationReason().

The user set vectors containing the solution and gradient before solving the problem, but pointers to these vectors can also be retrieved with the commands TaoGetSolution() and TaoGetGradient(). Dual variables and other relevant information are also available. This information can be obtained during user-defined routines such as a function evaluation and customized monitoring routine, or after the solver has terminated.

TAO Solvers


Unconstrained Minimization

Unconstrained minimization is used to minimize a function of many variables without any constraints on the variables, such as bounds. The methods available in TAO for solving these problems can be classified according to the amount of derivative information required:
  1. Function evaluation only - Nelder-Mead method (tao_nm)
  2. Function and gradient evaluations - limited-memory, variable-metric method (tao_lmvm) and nonlinear conjugate gradient method (tao_cg)
  3. Function, gradient, and Hessian evaluations - Newton line-search method (tao_nls) and Newton trust-region method (tao_ntr)
The best method to use depends on the particular problem being solved and the accuracy required in the solution. If a Hessian evaluation routine is available, then the Newton line-search and Newton trust-region methods will be the best performers. When a Hessian evaluation routine is not available, then the limited-memory, variable-metric method is likely to perform best. The Nelder-Mead method should be used only as a last resort when no gradient information is available.

Each solver has a set of options associated with it that can be set with command line arguments. A brief description of these algorithms and the associated options are discussed in this chapter.

Nelder-Mead

The Nelder-Mead algorithm [24] is a direct search method for finding a local minimum of a function $f(x)$. This algorithm does not require any gradient or Hessian information of $f$, and therefore has some expected advantages and disadvantages compared to the other TAO solvers. The obvious advantage is that it is easier to write an application when no derivatives need to be calculated. The downside is that this algorithm can be very slow to converge or can even stagnate, and performs poorly for large numbers of variables.

This solver keeps a set of $N+1$ sorted vectors ${x_1,x_2,\ldots,x_{N+1}}$ and their corresponding objective function values $f_1 \leq f_2 \leq \ldots \leq f_{N+1}$. At each iteration, $x_{N+1}$ is removed from the set and replaced with

\begin{displaymath}
x(\mu) = (1+\mu) \frac{1}{N} \sum_{i=1}^N x_i - \mu x_{N+1},
\end{displaymath}

where $\mu$ can be one of ${\mu_0,2\mu_0,\frac{1}{2}\mu_0,-\frac{1}{2}\mu_0}$ depending upon the values of each possible $f(x(\mu))$.

The algorithm terminates when the residual $f_{N+1} - f_1$ becomes sufficiently small. Because of the way new vectors can be added to the sorted set, the minimum function value and/or the residual may not be impacted at each iteration.

There are two options that can be set specifically for the Nelder-Mead algorithm, -tao_nm_lamda <value> sets the initial set of vectors ($x_0$ plus value in each cartesion direction), the default value is $1$. tao_nm_mu <value> sets the value of $\mu_0$, the default is $\mu_0=1$.


Limited-Memory, Variable-Metric Method

The limited-memory, variable-metric method solves the system of equations

\begin{displaymath}
H_k d_k = -\nabla f(x_k),
\end{displaymath}

where $H_k$ is a positive definite approximation to the Hessian matrix obtained by using the BFGS update formula with a limited number of previous iterates and gradient evaluations. The inverse of $H_k$ can readily be applied to obtain the direction $d_k$. Having obtained the direction, a Moré-Thuente line search is applied to compute a step length, $\tau_k$, that approximately solves the one-dimensional optimization problem

\begin{displaymath}
\min_\tau f(x_k + \tau d_k).
\end{displaymath}

The current iterate and Hessian approximation are updated and the process is repeated until the method converges. This algorithm is the default unconstrained minimization solver and can be selected using the TaoMethod tao_lmvm. For best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.

The primary factors determining the behavior of this algorithm are the number of vectors stored for the Hessian approximation and the scaling matrix used when computing the direction. The number of vectors stored can be set with the command line argument -tao_lmm_vectors <int>; $5$ is the default value. Increasing the number of vectors results in a better Hessian approximation and can decrease the number of iterations required to compute a solution to the optimization problem. However, as the number of vectors increases, more memory is consumed and each direction calculation takes longer to compute. Therefore, a trade off must be made between the quality of the Hessian approximation, the memory requirements, and the time to compute the direction.

During the computation of the direction, the inverse of an initial Hessian approximation $H_{0,k}$ is applied. The choice of $H_{0,k}$ has a significant impact on the quality of the direction obtained and can result in a decrease in the number of function and gradient evaluations required to solve the optimization problem. However, the calculation of $H_{0,k}$ at each iteration can have a significant impact on the time required to update the limited-memory BFGS approximation and the cost of obtaining the direction. By default, $H_{0,k}$ is a diagonal matrix obtained from the diagonal entries of a Broyden approximation to the Hessian matrix. The calculation of $H_{0,k}$ can be modified with the command line argument -tao_lmm_scale_type <none,scalar,broyden>. Each scaling method is described below. The scalar and broyden techniques are inspired by [].

none
This scaling method uses the identity matrix as $H_{0,k}$. No extra computations are required when obtaining the search direction or updating the Hessian approximation. However, the number of functions and gradient evaluations required to converge to a solution is typically much larger than the number required when using other scaling methods.
scalar
This scaling method uses a multiple of the identity matrix as $H_{0,k}$. The scalar value $\sigma$ is chosen by solving the one-dimensional optimization problem

\begin{displaymath}
\min_\sigma \Vert\sigma^\alpha Y - \sigma^{\alpha - 1} S\Vert _F^2,
\end{displaymath}

where $\alpha \in [0,1]$ is given, and $S$ and $Y$ are the matrices of past iterate and gradient information required by the limited-memory BFGS update formula. The optimal value for $\sigma$ can be written down explicitly. This choice of $\sigma$ attempts to satisfy the secant equation $\sigma Y = S$. Since this equation cannot typically be satisfied by a scalar, a least norm solution is computed. The amount of past iterate and gradient information used is set by the command line argument tao_lmm_scalar_history <int>, which must be less than or equal to the number of vectors kept for the BFGS approximation. The default value is 5. The choice for $\alpha$ is made with the command line argument tao_lmm_scalar_alpha <double>; $1$ is the default value. This scaling method offers a good compromise between no scaling and broyden scaling.
broyden
This scaling method uses a positive-definite diagonal matrix obtained from the diagonal entries of the Broyden approximation to the Hessian for the scaling matrix. The Broyden approximation is a family of approximations parametrized by a constant $\phi$; $\phi = 0$ gives the BFGS formula and $\phi = 1$ gives the DFP formula. The value of $\phi$ is set with the command line argument -tao_lmm_broyden_phi <double>. The default value for $\phi$ is $0.125$. This scaling method requires the most computational effort of available choices, but typically results in a significant reduction in the number of function and gradient evaluations taken to compute a solution.

An additional rescaling of the diagonal matrix can be applied to further improve performance when using the broyden scaling method. The rescaling method can be set with the command line argument -tao_lmm_rescale_type <none,scalar,gl>; scalar is the default rescaling method. The rescaling method applied can have a large impact on the number of function and gradient evaluations necessary to compute a solution to the optimization problem, but increases the time required to update the BFGS approximation. Each rescaling method is described below. These techniques are inspired by [].

none
This rescaling method does not modify the diagonal scaling matrix.
scalar
This rescaling method chooses a scalar value $\sigma$ by solving the one-dimensional optimization problem

\begin{displaymath}
\min_\sigma \Vert\sigma^\alpha H_{0,k}^{\beta} Y - \sigma^{\alpha - 1} H_{0,k}^{\beta - 1} S\Vert _F^2,
\end{displaymath}

where $\alpha \in [0,1]$ and $\beta \in [0,1]$ are given, $H_{0,k}$ is the positive-definite diagonal scaling matrix computed by using the Broyden update, and $S$ and $Y$ are the matrices of past iterate and gradient information required by the limited-memory BFGS update formula. This choice of $\sigma$ attempts to satisfy the secant equation $\sigma H_{0,k} Y = S$. Since this equation cannot typically be satisfied by a scalar, a least norm solution is computed. The scaling matrix used is then $\sigma H_{0,k}$. The amount of past iterate and gradient information used is set by the command line argument tao_lmm_rescale_history <int>, which must be less than or equal to the number of vectors kept for the BFGS approximation. The default value is 5. The choice for $\alpha$ is made with the command line argument tao_lmm_rescale_alpha <double>; $1$ is the default value. The choice for $\beta$ is made with the command line argument tao_lmm_rescale_beta <double>; $0.5$ is the default value.
gl
This scaling method is the same as the scalar rescaling method, but the previous value for the scaling matrix $H_{0,k-1}$ is used when computing $\sigma$. This is the rescaling method suggested in [].

Finally, a limit can be placed on the difference between the scaling matrix computed at this iteration and the previous value for the scaling matrix. The limiting type can be set with the command line argument -tao_lmm_limit_type <none,average,relative,absolute>; none is the default value. Each of these methods is described below when using the scalar scaling method. The techniques are the same when using the broyden scaling method, but are applied to each entry in the diagonal matrix.

none
Set $\sigma_k = \sigma$, where $\sigma$ is the value computed by the scaling method.
average
Set $\sigma_k = \mu \sigma + (1 - \mu) \sigma_{k-1}$, where $\sigma$ is the value computed by the scaling method, $\sigma_{k-1}$ is the previous value, and $\mu \in [0,1]$ is given.
relative
Set $\sigma_k = \mbox{median}\left\{ (1 - \mu) \sigma_{k-1}, \sigma, (1+\mu) \sigma_{k-1}\right\}$, where $\sigma$ is the value computed by the scaling method, $\sigma_{k-1}$ is the previous value, and $\mu \in [0,1]$ is given.
absolute
Set $\sigma_k = \mbox{median}\left\{\sigma_{k-1} - \nu, \sigma, \sigma_{k-1} + \nu\right\}$, where $\sigma$ is the value computed by the scaling method, $\sigma_{k-1}$ is the previous value, and $\nu$ is given.
The value for $\mu$ is set with the command line argument -tao_lmm_limit_mu <double>; $1$ is the default value. The value for $\nu$ is set with the command line argument -tao_lmm_limit_nu <double>. The default value is 100.

The default values for the scaling are based on many tests using the unconstrained problems from the MINPACK-2 test set. These tests were used to narrow the choices to a few sets of values. These values were then run on the unconstrained problems from the CUTEr test set to obtain the default values supplied.


Table 4.1: Summary of lmvm options
Name Value Default Description
-tao_lmm_vectors int 5 Number of vectors for Hessian approximation
-tao_lmm_scale_type none, scalar, broyden broyden Type of scaling method to use
-tao_lmm_scalar_history int 5 Number of vectors to use when scaling
-tao_lmm_scalar_alpha double 1 Value of $\alpha$ for scalar scaling method
-tao_lmm_broyden_phi double 0.125 Value of $\alpha$ for scalar scaling method
-tao_lmm_rescale_type none, scalar, gl scalar Type of rescaling method to use
-tao_lmm_rescale_history int 5 Number of vectors to use when rescaling
-tao_lmm_rescale_alpha double 1 Value of $\alpha$ for rescaling method
-tao_lmm_rescale_beta double 0.5 Value of $\beta$ for rescaling method
-tao_lmm_limit_type none, average, relative, absolute none Type of limit to impose on scaling matrix
-tao_lmm_limit_mu double 1 Value of $\mu$ for limit type
-tao_lmm_limit_nu double 100 Value of $\nu$ for limit type


Nonlinear Conjugate Gradient Method

The nonlinear conjugate gradient method can be viewed as an extensions of the conjugate gradient method for solving symmetric, positive-definite linear systems of equations. This algorithm requires only function and gradient evaluations as well as a line search. The TAO implementation uses a Moré-Thuente line search to obtain the step length. The nonlinear conjugate gradient method can be selected by using the TaoMethod tao_cg. For the best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.

Five variations are currently supported by the TAO implementation: the Fletcher-Reeves method, the Polak-Ribiére method, the Polak-Ribiére-Plus method[25], the Hestenes-Stiefel method, and the Dai-Yuan method. These conjugate gradient methods can be specified by using the command line argument tao_cg_type <fr,pr,prp,hs,dy>, respectively. The default value is prp.

The conjugate gradient method incorporates automatic restarts when successive gradients are not sufficiently orthogonal. TAO measures the orthogonality by dividing the inner product of the gradient at the current point and the gradient at the previous point by the square of the Euclidean norm of the gradient at the current point. When the absolute value of this ratio is greater than $\eta$, the algorithm restarts using the gradient direction. The parameter $\eta$ can be set using the command line argument -tao_cg_eta <double>; 0.1 is the default value.


Newton Line-Search Method

The Newton line-search method solves the symmetric system of equations

\begin{displaymath}
H_k d_k = -g_k
\end{displaymath}

to obtain a step $d_k$, where $H_k$ is the Hessian of the objective function at $x_k$ and $g_k$ is the gradient of the objective function at $x_k$. For problems where the Hessian matrix is indefinite, the perturbed system of equations

\begin{displaymath}
(H_k + \rho_k I) d_k = -g_k
\end{displaymath}

is solved to obtain the direction, where $\rho_k$ is a positive constant. If the direction computed is not a descent direction, the (scaled) steepest descent direction is used instead. Having obtained the direction, a Moré-Thuente line search is applied to obtain a step length, $\tau_k$, that approximately solves the one-dimensional optimization problem

\begin{displaymath}
\min_\tau f(x_k + \tau d_k).
\end{displaymath}

The Newton line-search method can be set using the TaoMethod tao_nls. For the best efficiency, function and gradient evaluations should be performed simultaneously when using this algorithm.

The system of equations is approximately solved by applying the conjugate gradient method, Steihaug-Toint conjugate gradient method, generalized Lanczos method, or an alternative Krylov subspace method supplied by PETSc. The method used to solve the systems of equations is specified with the command line argument -tao_nls_ksp_type <cg,stcg,gltr,petsc>; cg is the default. When the type is set to petsc, the method set with the PETSc -ksp_type command line argument is used. For example, to use GMRES as the linear system solver, one would use the the command line arguments -tao_nls_ksp_type petsc -ksp_type gmres. Internally, the PETSc implementations for the conjugate gradient methods and the generalized Lanczos method are used. See the PETSc manual for further information on changing the behavior of the linear system solvers.

A good preconditioner reduces the number of iterations required to solve the linear system of equations. For the conjugate gradient methods and generalized Lanczos method, this preconditioner must be symmetric and positive definite. The available options are to use no preconditioner, the absolute value of the diagonal of the Hessian matrix, a limited-memory BFGS approximation to the Hessian matrix, or one of the other preconditioners provided by the PETSc package. These preconditioners are specified by the command line argument -tao_nls_pc_type <none,ahess,bfgs,petsc>, respectively. The default is the bfgs preconditioner. When the preconditioner type is set to petsc, the preconditioner set with the PETSc -pc_type command line argument is used. For example, to use an incomplete Cholesky factorization for the preconditioner, one would use the command line arguments -tao_nls_pc_type petsc -pc_type icc. See the PETSc manual for further information on changing the behavior of the preconditioners.

The choice of scaling matrix can have a significant impact on the quality of the Hessian approximation when using the bfgs preconditioner and affect the number of iterations required by the linear system solver. The choices for scaling matrices are the same as those discussed for the limited-memory, variable-metric algorithm. For Newton methods, however, the option exists to use a scaling matrix based on the true Hessian matrix. In particular, the implementation supports using the absolute value of the diagonal of the Hessian matrix or the absolute value of the diagonal of the perturbed Hessian matrix. The scaling matrix to use with the bfgs preconditioner is set with the command line argument -tao_nls_bfgs_scale_type <bfgs,ahess,phess>; phess is the default. The bfgs scaling matrix is derived from the BFGS options. The ahess scaling matrix is the absolute value of the diagonal of the Hessian matrix. The phess scaling matrix is the absolute value of the diagonal of the perturbed Hessian matrix.

The perturbation $\rho_k$ is added when the direction returned by the Krylov subspace method is either not a descent direction, the Krylov method diverged due to an indefinite preconditioner or matrix, or a direction of negative curvature was found. In the two latter cases, if the step returned is a descent direction, it is used during the line search. Otherwise, a steepest descent direction is used during the line search. The perturbation is decreased as long as the Krylov subspace method reports success and increased if further problems are encountered. There are three cases: initializing, increasing, and decreasing the perturbation. These cases are described below.

  1. If $\rho_k$ is zero and a problem was detected with either the direction on the Krylov subspace method, the perturbation is initialized to

    \begin{displaymath}
\rho_{k+1} = \mbox{median}\left\{\mbox{imin}, \mbox{imfac} * \Vert g(x_k)\Vert, \mbox{imax}\right\},
\end{displaymath}

    where imin is set with the command line argument -tao_nls_imin <double> with a default value of $10^{-4}$, imfac by -tao_nls_imfac with a default value of 0.1, and imax by -tao_nls_imax with a default value of 100. When using the gltr method to solve the system of equations, an estimate of the minimum eigenvalue $\lambda_1$ of the Hessian matrix is available. This value is use to initialize the perturbation to $\rho_{k+1} = \max\left\{\rho_{k+1}, -\lambda_1\right\}$.
  2. If $\rho_k$ is nonzero and a problem was detected with either the direction or Krylov subspace method, the perturbation is increased to

    \begin{displaymath}
\rho_{k+1} = \min\left\{\mbox{pmax}, \max\left\{\mbox{pgfac} * \rho_k, \mbox{pmgfac} * \Vert g(x_k)\Vert\right\}\right\},
\end{displaymath}

    where pgfac is set with the command line argument -tao_nls_pgfac with a default value of 10, pmgfac by -tao_nls_pmgfac with a default value of 0.1, and pmax by -tao_nls_pmax with a default value of 100.
  3. If $\rho_k$ is nonzero and no problems were detected with either the direction or Krylov subspace method, the perturbation is decreased to

    \begin{displaymath}
\rho_{k+1} = \min\left\{\mbox{psfac} * \rho_k, \mbox{pmsfac} * \Vert g(x_k)\Vert\right\},
\end{displaymath}

    where psfac is set with the command line argument -tao_nls_psfac with a default value of 0.4, and pmsfac by -tao_nls_pmsfac with a default value of 0.1. Moreover, if $\rho_{k+1} < \mbox{pmin}$ then $\rho_{k+1} = 0$, where pmin is set with the command line argument -tao_nls_pmin and has a default value of $10^{-12}$.

When using stcg or gltr to solve the linear systems of equation, a trust-region radius need to be initialized and updated. This trust-region radius limits the size of the step computed. The method for initializing the trust-region radius is set with the command line argument -tao_nls_init_type <constant,direction,interpolation>; interpolation, which chooses an initial value based on the interpolation scheme found in [5], is the default. This scheme performs a number of function and gradient evaluations to determine a radius such that the reduction predicted by the quadratic model along the gradient direction coincides with the actual reduction in the nonlinear function. The iterate obtaining the best objective function value is used as the starting point for the main line-search algorithm. The constant method initializes the trust-region radius by using the value specified with the -tao_trust0 <double> command line argument, where the default value is 100. The direction technique solves the first quadratic optimization problem by using a standard conjugate gradient method and initializes the trust-region to $\Vert s_0\Vert$.

Finally, the method for updating the trust-region radius is set with the command line argument -tao_nls_update_type <step,reduction,interpolation>; step is the default. The step method updates the trust-region radius based on the value of $\tau_k$. In particular,

\begin{displaymath}
\Delta_{k+1} = \left\{\begin{array}{ll}
\omega_1 \mbox{min}(...
...rt) & \mbox{if } \tau_k \in [\nu_4, \infty)
\end{array}\right.
\end{displaymath}

where $0 < \omega_1 < \omega_2 < \omega_3 = 1 < \omega_4 < \omega_5$ and $0 < \nu_1 < \nu_2 < \nu_3 < \nu_4$ are constants. The reduction method computes the ratio of the actual reduction in the objective function to the reduction predicted by the quadratic model for the full step, $\kappa_k = \frac{f(x_k) - f(x_k + d_k)}{q(x_k) - q(x_k + d_k)}$, where $q_k$ is the quadratic model. The radius is then updated as:

\begin{displaymath}
\Delta_{k+1} = \left\{\begin{array}{ll}
\alpha_1 \mbox{min}(...
... & \mbox{if } \kappa_k \in [\eta_4, \infty)
\end{array}\right.
\end{displaymath}

where $0 < \alpha_1 < \alpha_2 < \alpha_3 = 1 < \alpha_4 < \alpha_5$ and $0 < \eta_1 < \eta_2 < \eta_3 < \eta_4$ are constants. The interpolation method uses the same interpolation mechanism as in the initialization to compute a new value for the trust-region radius.


Table 4.2: Summary of nls options
Name Value Default Description
-tao_nls_ksp_type cg, stcg, gltr, petsc cg Type of Krylov subspace method to use when solving linear system
-tao_nls_pc_type none, ahess, bfgs, petsc bfgs Type of preconditioner to use when solving linear system
-tao_nls_bfgs_scale_type ahess, phess, bfgs phess Type of scaling matrix to use with BFGS preconditioner
-tao_nls_sval double $0$ Initial perturbation value
-tao_nls_imin double $10^{-4}$ Minimum initial perturbation value
-tao_nls_imax double $100$ Maximum initial perturbation value
-tao_nls_imfac double $0.1$ Factor applied to norm of gradient when initializing perturbation
-tao_nls_pmax double $100$ Maximum perturbation when increasing value
-tao_nls_pgfac double $10$ Growth factor applied to perturbation when increasing value
-tao_nls_pmgfac double $0.1$ Factor applied to norm of gradient when increasing perturbation
-tao_nls_pmin double $10^{-12}$ Minimum perturbation when decreasing value; smaller values set to zero
-tao_nls_psfac double $0.4$ Shrink factor applied to perturbation when decreasing value
-tao_nls_pmsfac double $0.1$ Factor applied to norm of gradient when decreasing perturbation


Table 4.3: Summary of nls options (continued)
Name Value Default Description
-tao_nls_init_type constant, direction, interpolation interpolation Method used to initialize trust-region radius when using stcg or gltr
-tao_nls_mu1_i double 0.35 $\mu_1$ in interpolation init
-tao_nls_mu2_i double 0.50 $\mu_2$ in interpolation init
-tao_nls_gamma1_i double 0.0625 $\gamma_1$ in interpolation init
-tao_nls_gamma2_i double 0.50 $\gamma_2$ in interpolation init
-tao_nls_gamma3_i double 2.00 $\gamma_3$ in interpolation init
-tao_nls_gamma4_i double 5.00 $\gamma_4$ in interpolation init
-tao_nls_theta_i double 0.25 $\theta$ in interpolation init
-tao_nls_update_type step, reduction, interpolation step Method used to update trust-region radius when using stcg or gltr
-tao_nls_nu1 double 0.25 $\nu_1$ in step update
-tao_nls_nu2 double 0.50 $\nu_2$ in step update
-tao_nls_nu3 double 1.00 $\nu_3$ in step update
-tao_nls_nu4 double 1.25 $\nu_4$ in step update
-tao_nls_omega1 double 0.25 $\omega_1$ in step update
-tao_nls_omega2 double 0.50 $\omega_2$ in step update
-tao_nls_omega3 double 1.00 $\omega_3$ in step update
-tao_nls_omega4 double 2.00 $\omega_4$ in step update
-tao_nls_omega5 double 4.00 $\omega_5$ in step update
-tao_nls_eta1 double $10^{-4}$ $\eta_1$ in reduction update
-tao_nls_eta2 double 0.25 $\eta_2$ in reduction update
-tao_nls_eta3 double 0.50 $\eta_3$ in reduction update
-tao_nls_eta4 double 0.90 $\eta_4$ in reduction update
-tao_nls_alpha1 double 0.25 $\alpha_1$ in reduction update
-tao_nls_alpha2 double 0.50 $\alpha_2$ in reduction update
-tao_nls_alpha3 double 1.00 $\alpha_3$ in reduction update
-tao_nls_alpha4 double 2.00 $\alpha_4$ in reduction update
-tao_nls_alpha5 double 4.00 $\alpha_5$ in reduction update
-tao_nls_mu1 double 0.10 $\mu_1$ in interpolation update
-tao_nls_mu2 double 0.50 $\mu_2$ in interpolation update
-tao_nls_gamma1 double 0.25 $\gamma_1$ in interpolation update
-tao_nls_gamma2 double 0.50 $\gamma_2$ in interpolation update
-tao_nls_gamma3 double 2.00 $\gamma_3$ in interpolation update
-tao_nls_gamma4 double 4.00 $\gamma_4$ in interpolation update
-tao_nls_theta double 0.05 $\theta$ in interpolation update


Newton Trust-Region Method

The Newton trust-region method solves the constrained quadratic programming problem

\begin{displaymath}
\begin{array}{ll}
\min_d & \frac{1}{2}d^T H_k d + g_k^T d \\
\mbox{subject to} & \Vert d\Vert \leq \Delta_k
\end{array}\end{displaymath}

to obtain a direction $d_k$, where $H_k$ is the Hessian of the objective function at $x_k$, $g_k$ is the gradient of the objective function at $x_k$ and $\Delta_k$ is the trust-region radius. If $x_k + d_k$ sufficiently reduces the nonlinear objective function, then the step is accepted and the trust-region radius is updated. However, if $x_k + d_k$ does not sufficiently reduce the nonlinear objective function, then the step is rejected, the trust-region radius is reduced, and the quadratic program is re-solved using the updated trust-region radius. The Newton trust-region method can be set using TaoMethod tao_ntr. For the best efficiency, function and gradient evaluations should be performed separately when using this algorithm.

The quadratic optimization problem is approximately solved by applying the Steihaug-Toint conjugate gradient method or generalized Lanczos method to the symmetric system of equations $H_k d = -g_k$. The method used to solve the system of equations is specified with the command line argument -tao_ntr_ksp_type <stcg,gltr>; stcg is the default. Internally, the PETSc implementations for the Steihaug-Toint method and the generalized Lanczos method are used. See the PETSc manual for further information on changing the behavior of these linear system solvers.

A good preconditioner reduces the number of iterations required to compute the direction. For the Steihaug-Toint conjugate gradient method and generalized Lanczos method, this preconditioner must be symmetric and positive definite. The available options are to use no preconditioner, the absolute value of the diagonal of the Hessian matrix, a limited-memory BFGS approximation to the Hessian matrix, or one of the other preconditioners provided by the PETSc package. These preconditioners are specified by the the command line argument -tao_ntr_pc_type <none,ahess,bfgs,petsc>, respectively. The default is the bfgs preconditioner. When the preconditioner type is set the to petsc, the preconditioner set with the PETSc -pc_type command line argument is used. For example, to use an incomplete Cholesky factorization for the preconditioner, one would use the command line arguments -tao_ntr_pc_type petsc -pc_type icc. See the PETSc manual for further information on changing the behavior of the preconditioners.

The choice of scaling matrix can have a significant impact on the quality of the Hessian approximation when using the bfgs preconditioner and affect the number of iterations required by the linear system solver. The choices for scaling matrices are the same as those discussed for the limited-memory, variable-metric algorithm. For Newton methods, however, the option exists to use a scaling matrix based on the true Hessian matrix. In particular, the implementation supports using the absolute value of the diagonal of the Hessian matrix. The scaling matrix to use with the bfgs preconditioner is set with the command line argument -tao_ntr_bfgs_scale_type <ahess,bfgs>; ahess is the default. The bfgs scaling matrix is derived from the BFGS options. The ahess scaling matrix is the absolute value of the diagonal of the Hessian matrix.

The method for computing an initial trust-region radius is set with the command line argument -tao_ntr_init_type <constant,direction,interpolation>; interpolation, which chooses an initial value based on the interpolation scheme found in [5], is the default. This scheme performs a number of function and gradient evaluations to determine a radius such that the reduction predicted by the quadratic model along the gradient direction coincides with the actual reduction in the nonlinear function. The iterate obtaining the best objective function value is used as the starting point for the main line-search algorithm. The constant method initializes the trust-region radius by using the value specified with the -tao_trust0 <double> command line argument, where the default value is 100. The direction technique solves the first quadratic optimization problem by using a standard conjugate gradient method and initializes the trust-region to $\Vert s_0\Vert$.

Finally, the method for updating the trust-region radius is set with the command line argument -tao_ntr_update_type <reduction,interpolation>; reduction is the default. The reduction method computes the ratio of the actual reduction in the objective function to the reduction predicted by the quadratic model for the full step, $\kappa_k = \frac{f(x_k) - f(x_k + d_k)}{q(x_k) - q(x_k + d_k)}$, where $q_k$ is the quadratic model. The radius is then updated as:

\begin{displaymath}
\Delta_{k+1} = \left\{\begin{array}{ll}
\alpha_1 \mbox{min}(...
... & \mbox{if } \kappa_k \in [\eta_4, \infty)
\end{array}\right.
\end{displaymath}

where $0 < \alpha_1 < \alpha_2 < \alpha_3 = 1 < \alpha_4 < \alpha_5$ and $0 < \eta_1 < \eta_2 < \eta_3 < \eta_4$ are constants. The interpolation method uses the same interpolation mechanism as in the initialization to compute a new value for the trust-region radius.


Table 4.4: Summary of ntr options
Name Value Default Description
-tao_ntr_ksp_type stcg, gltr stcg Type of Krylov subspace method to use when solving linear system
-tao_ntr_pc_type none, ahess, bfgs, petsc bfgs Type of preconditioner to use when solving linear system
-tao_ntr_bfgs_scale_type ahess, bfgs ahess Type of scaling matrix to use with BFGS preconditioner
-tao_ntr_init_type constant, direction, interpolation interpolation Method used to initialize trust-region radius
-tao_ntr_mu1_i double 0.35 $\mu_1$ in interpolation init
-tao_ntr_mu2_i double 0.50 $\mu_2$ in interpolation init
-tao_ntr_gamma1_i double 0.0625 $\gamma_1$ in interpolation init
-tao_ntr_gamma2_i double 0.50 $\gamma_2$ in interpolation init
-tao_ntr_gamma3_i double 2.00 $\gamma_3$ in interpolation init
-tao_ntr_gamma4_i double 5.00 $\gamma_4$ in interpolation init
-tao_ntr_theta_i double 0.25 $\theta$ in interpolation init
-tao_ntr_update_type reduction, interpolation reduction Method used to update trust-region radius
-tao_ntr_eta1 double $10^{-4}$ $\eta_1$ in reduction update
-tao_ntr_eta2 double 0.25 $\eta_2$ in reduction update
-tao_ntr_eta3 double 0.50 $\eta_3$ in reduction update
-tao_ntr_eta4 double 0.90 $\eta_4$ in reduction update
-tao_ntr_alpha1 double 0.25 $\alpha_1$ in reduction update
-tao_ntr_alpha2 double 0.50 $\alpha_2$ in reduction update
-tao_ntr_alpha3 double 1.00 $\alpha_3$ in reduction update
-tao_ntr_alpha4 double 2.00 $\alpha_4$ in reduction update
-tao_ntr_alpha5 double 4.00 $\alpha_5$ in reduction update
-tao_ntr_mu1 double 0.10 $\mu_1$ in interpolation update
-tao_ntr_mu2 double 0.50 $\mu_2$ in interpolation update
-tao_ntr_gamma1 double 0.25 $\gamma_1$ in interpolation update
-tao_ntr_gamma2 double 0.50 $\gamma_2$ in interpolation update
-tao_ntr_gamma3 double 2.00 $\gamma_3$ in interpolation update
-tao_ntr_gamma4 double 4.00 $\gamma_4$ in interpolation update
-tao_ntr_theta double 0.05 $\theta$ in interpolation update


Bound Constrained Optimization

Bound constrained optimization algorithms minimize $f: \, \mathbb{R}^n \to \mathbb{R}$, subject to upper or lower bounds on some of the variables. These solvers also bounds on the variables as well as objective function, gradient, and possibly Hessian information.


Newton Trust Region

The TRON [18] algorithm is an active set method that uses a combination of gradient projections and a preconditioned conjugate gradient method to minimize an objective function. Each iteration of the TRON algorithm requires function, gradient, and Hessian evaluations. In each iteration, the algorithm first applies several conjugate gradients. After these iterates, the TRON solver momentarily ignores the variables that equal one of its bounds and applies a preconditioned conjugate gradient method to a quadratic model of the free variables.

The TRON algorithm solves a reduced linear system defined by the rows and columns corresponding to the variables that lie between the upper and lower bounds. When running in parallel, these rows can either remain on their current processor or be redistributed evenly over all of the processors with the command TaoSelectSubset(). The TRON algorithm applies a trust region to the conjugate gradients to ensure convergence. The initial trust region can be set using the command TaoSetTrustRegionRadius() and the current trust region size can be found using the command TaoGetTrustRegionRadius(). The initial trust region can significantly alter the rate of convergence for the algorithm and should be tuned and adjusted for optimal performance.

Gradient Projection-Conjugate Gradient Method

The GPCG [20] algorithm is much like the TRON algorithm, discussed in Section 4.2.1, except that it assumes that the objective function is quadratic and convex. Therefore, it evaluates the function, gradient, and Hessian only once. Since the objective function is quadratic, the algorithm does not use a trust region. All of the options that apply to TRON, except for trust region options, also apply to GPCG.

Interior Point Newton Algorithm

The BQPIP algorithm is an interior point algorithm for bound constrained quadratic optimization. It can be set using the TaoMethod of tao_bqpip. Since it assumes the objective function is quadratic, it evaluates the function, gradient, and Hessian only once. In this algorithm all of the variables are free variables. This method also requires the solution of systems of linear equations, whose solver can be accessed and modified with the command TaoGetLinearSolver().

Limited Memory Variable Metric Method

This method is the bound constrained variant of the LMVM method for unconstrained optimization. It uses projected gradients to approximate the Hessian - eliminating the need for Hessian evaluations. The method can be set using TaoMethod tao_blmvm. The command TaoLMVMSetSize(), which sets the number of vectors to be used in the Hessian approximation, also applies to this method.


KT Method

This method calculates points satisfying the first-order necessary optimality conditions. The method uses the mixed complementarity problem solvers from Section 4.3 to calculate the solutions. The choice of complementarity solver is specified with the runtime option -tao_kt_method with the default being the tao_ssils method.


Complementarity

Mixed complementarity problems, or box-constrained variational inequalities, are related to nonlinear systems of equations. They are defined by a continuously differentiable function, $F:\mathbb{R}^n \to \mathbb{R}^n$, and bounds, $\ell \in \{\mathbb{R}\cup \{-\infty\}\}^n$ and $u \in \{\mathbb{R}\cup \{\infty\}\}^n$, on the variables such that $\ell \leq u$. Given this information, $\mbox{\boldmath\(x\)}^* \in [\ell,u]$ is a solution to MCP($F$, $\ell$, $u$) if for each $i \in \{1, \ldots, n\}$ we have at least one of the following:

\begin{eqnarray*}
\begin{array}{ll}
F_i(x^*) \geq 0 & \mbox{if } x^*_i = \ell_i ...
...i < u_i \\
F_i(x^*) \leq 0 & \mbox{if } x^*_i = u_i.
\end{array}\end{eqnarray*}


Note that when $\ell = \{-\infty\}^n$ and $u = \{\infty\}^n$ we have a nonlinear system of equations, and $\ell = \{0\}^n$ and $u = \{\infty\}^n$ corresponds to the nonlinear complementarity problem [6].

Simple complementarity conditions arise from the first-order optimality conditions from optimization [16,17]. In the simple bound constrained optimization case, these conditions correspond to MCP($\nabla f$, $\ell$, $u$), where $f: \mathbb{R}^n \to \mathbb{R}$ is the objective function. In a one-dimensional setting these conditions are intuitive. If the solution is at the lower bound, then the function must be increasing and $\nabla f \geq 0$. However, if the solution is at the upper bound, then the function must be decreasing and $\nabla f \leq 0$. Finally, if the solution is strictly between the bounds, we must be at a stationary point and $\nabla f = 0$. Other complementarity problems arise in economics and engineering [9], game theory [23], and finance [14].

Evaluation routines for $F$ and its Jacobian must be supplied prior to solving the application. The bounds, $[\ell,u]$, on the variables must also be provided. If no starting point is supplied, a default starting point of all zeros is used.

Semismooth Methods

TAO has two implementations of semismooth algorithms [22,7,8] for solving mixed complementarity problems. Both are based upon a reformulation of the mixed complementarity problem as a nonsmooth system of equations using the Fischer-Burmeister function [10]. A nonsmooth Newton method is applied to the reformulated system to calculate a solution. The theoretical properties of such methods are detailed in the aforementioned references.

The Fischer-Burmeister function, $\phi:\mathbb{R}^2 \to \mathbb{R}$, is defined as,

\begin{eqnarray*}
\phi(a,b) := \sqrt{a^2 + b^2} - a - b.
\end{eqnarray*}


This function has the following key property

\begin{eqnarray*}
\begin{array}{lcr}
\phi(a,b) = 0 & \Leftrightarrow & a \geq 0,\; b \geq 0,\; ab = 0
\end{array}\end{eqnarray*}


used when reformulating the mixed complementarity problem the system of equations $\Phi(x) = 0$ where $\Phi:\mathbb{R}^n \to \mathbb{R}^n$. The reformulation is defined component-wise as

\begin{eqnarray*}
\Phi_i(x) := \left\{ \begin{array}{ll}
\phi(x_i - l_i, F_i(x)...
... & \mbox{if } -\infty < l_i = u_i < \infty.
\end{array} \right.
\end{eqnarray*}


We note that $\Phi$ is not differentiable everywhere, but satisfies a semismoothness property [19,26,27]. Furthermore, the natural merit function, $\Psi(x) := \frac{1}{2} \Vert \Phi(x) \Vert _2^2$, is continuously differentiable.

The two semismooth TAO solvers both solve the system $\Phi(x) = 0$ by applying a nonsmooth newton method with a line-search. We calculate a direction, $d^k$, by solving the system $H^kd^k = -\Phi(x^k)$ where $H^k$ is an element of the $B$-subdifferential [27] of $\Phi$ at $x^k$. If the direction calculated does not satisfy a suitable descent condition, then we use the negative gradient of the merit function, $-\nabla \Psi(x^k)$, as the search direction. A standard Armijo search [1] is used to find the new iteration. Non-monotone searches [11] are also available by setting appropriate run-time options. See Section 6.2 for further details.

The first semismooth algorithm available in TAO is not guaranteed to remain feasible with respect to the bounds, $[\ell, u]$, and is termed an infeasible semismooth method. This method can be specified using the TaoMethod tao_ssils. In this case, the descent test used is that

\begin{eqnarray*}
\nabla \Psi(x^k)^Td^k \leq -\delta\Vert d^k \Vert^\rho.
\end{eqnarray*}


Both $\delta > 0$ and $\rho > 2$ can be modified using the run-time commands -tao_ssils_delta <delta> and -tao_ssils_rho <rho> respect