
Re: Load Balancing and KSPSolve



Satish,

Logs attached...hope they help.

Thanks,

Tim.

Satish Balay wrote:
Can you send the -log_summary for your runs [say p=1, p=8]

Satish

On Tue, 20 Nov 2007, Tim Stitt wrote:

Hi all (again),

I finally got some data back from the KSP PETSc code that I put together to
solve this sparse inverse matrix problem I was looking into. Ideally I am
aiming for an O(N) (time complexity) approach to getting the first 'k' columns
of the inverse of a sparse matrix.

To recap the method: I have my solver which uses KSPSolve in a loop that
iterates over the first k columns of an identity matrix B and computes the
corresponding x vector.
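
As a rough illustration of the method just described (factor once, then solve
against successive columns of the identity), here is a minimal pure-Python
sketch. It is not the PETSc/Fortran code from this post: it uses a small dense
LU factorisation with partial pivoting in place of KSPSolve and MUMPS, and the
names lu_factor, lu_solve and first_k_inverse_columns are hypothetical helpers
introduced only for this example.

```python
def lu_factor(a):
    """Dense LU with partial pivoting. Returns (lu, perm), where row i of
    lu corresponds to row perm[i] of the original matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for col in range(n):
        # Pick the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            lu[col], lu[pivot] = lu[pivot], lu[col]
            perm[col], perm[pivot] = perm[pivot], perm[col]
        for row in range(col + 1, n):
            lu[row][col] /= lu[col][col]  # store the multiplier in L
            factor = lu[row][col]
            for j in range(col + 1, n):
                lu[row][j] -= factor * lu[col][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors from lu_factor."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = P b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U x = y
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def first_k_inverse_columns(a, k):
    """Factor once, then solve A x = e_j for j = 0 .. k-1."""
    lu, perm = lu_factor(a)
    n = len(a)
    columns = []
    for col in range(k):
        e = [0.0] * n
        e[col] = 1.0  # column `col` of the identity matrix B
        columns.append(lu_solve(lu, perm, e))
    return columns


a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
cols = first_k_inverse_columns(a, 2)
```

The point of the structure is that the factorisation cost is paid once, and
each extra column costs only a forward and a back substitution.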

I am just a bit curious about some of the timings I am obtaining...which I
hope someone can explain. Here are the timings I obtained for a global sparse
matrix (4704 x 4704) and solving for the first 1176 columns in the identity
using P processes (processors) on our cluster.

(Timings are given in seconds for each process performing work in the loop and
were obtained by wrapping the loop in calls to the cpu_time() Fortran intrinsic.
The MUMPS package was requested for factorisation/solving, although similar
timings were obtained for both the native solver and SuperLU.)

P=1  [30.92]
P=2  [15.47, 15.54]
P=4  [4.68, 5.49, 4.67, 5.07]
P=8  [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15]
P=16 [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06, 0.29, 0.34, 0.73, 0.25,
0.43, 1.09, 1.08, 1.1]
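
To put numbers on the scaling and the per-process spread, the timings above
can be reduced to a speedup, a parallel efficiency and a max/min ratio for
each P. The sketch below is only an illustration: it assumes the slowest
process is a fair proxy for the elapsed time of the loop (these are cpu_time()
values, not wall-clock times), it reads the second P=8 entry as 4.23, and
scaling_summary is a hypothetical helper name.

```python
# Per-process loop timings in seconds, copied from the list above.
timings = {
    1: [30.92],
    2: [15.47, 15.54],
    4: [4.68, 5.49, 4.67, 5.07],
    8: [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15],
    16: [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06,
         0.29, 0.34, 0.73, 0.25, 0.43, 1.09, 1.08, 1.1],
}


def scaling_summary(timings):
    """Speedup, efficiency and max/min imbalance for each process count."""
    serial = max(timings[1])
    summary = {}
    for procs, times in sorted(timings.items()):
        slowest = max(times)  # assumption: slowest process sets the pace
        speedup = serial / slowest
        summary[procs] = {
            "speedup": speedup,
            "efficiency": speedup / procs,
            "imbalance": slowest / min(times),
        }
    return summary


summary = scaling_summary(timings)
for procs, row in summary.items():
    print(f"P={procs:2d}  speedup={row['speedup']:6.2f}  "
          f"efficiency={row['efficiency']:.2f}  max/min={row['imbalance']:.2f}")
```

With these figures the efficiency comes out above 1 for P=4 and P=16, and the
max/min ratio grows with P, so it may be worth checking what cpu_time() does
and does not count (for example, time spent waiting on other processes) before
reading too much into the speedup.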

Firstly, I notice very good scalability up to 16 processes...is this expected
(by those people who use these solvers regularly)?

Also I notice that the timings per process vary as we scale up. Is this a
load-balancing problem, caused by some processors holding more non-zero values
than others? Once again, is this expected?

Please excuse my ignorance of matters relating to these solvers and their
operation...as it really isn't my field of expertise.

Regards,

Tim.





--
Dr. Timothy Stitt <timothy_dot_stitt_at_ichec.ie>
HPC Application Consultant - ICHEC (www.ichec.ie)

Dublin Institute for Advanced Studies
5 Merrion Square - Dublin 2 - Ireland

+353-1-6621333 (tel) / +353-1-6621477 (fax)

Creating /localscratch/pbstmp.159913.l2cu33.ichec.ie
Working directory is /ichec/work/staff/tstitt/SolverCode
 
 Running with  1  processes
 
 Matrix has order  4704  rows by  4704  columns
 
 Number of RHS is:  4704
 
 Master Solve Time is:  110.194252
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./solver on a pathscale named h2au40 with 1 processor, by tstitt Tue Nov 20 20:32:25 2007
Using Petsc Release Version 2.3.3, Patch 7, Fri Oct 26 14:21:35 CDT 2007 HG revision: 2e223033ba960114833e1f9713ab393ec78c056f

                         Max       Max/Min        Avg      Total 
Time (sec):           1.173e+02      1.00000   1.173e+02
Objects:              1.100e+01      1.00000   1.100e+01
Flops:                9.561e+10      1.00000   9.561e+10  9.561e+10
Flops/sec:            8.152e+08      1.00000   8.152e+08  8.152e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       5.000e+00      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.1728e+02 100.0%  9.5606e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  5.000e+00 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   The code for various complex numbers numerical       #
      #   kernels uses C++, which generally is not well        #
      #   optimized.  For performance that is about 4-5 times  #
      #   faster, specify --with-fortran-kernels=generic       #
      #   when running config/configure.py.                    #
      #                                                        #
      ##########################################################




      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatSolve            4704 1.0 1.0548e+02 1.0 8.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 90 95  0  0  0  90 95  0  0  0   863
MatLUFactorNum         1 1.0 4.4833e+00 1.0 1.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   4  5  0  0  0  1011
MatILUFactorSym        1 1.0 5.4705e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0 20   0  0  0  0 20     0
MatAssemblyBegin       1 1.0 5.9605e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.0177e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 1.0262e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 7.6232e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0 80   0  0  0  0 80     0
VecSet              4704 1.0 9.1822e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin    4704 1.0 7.7252e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd      4704 1.0 9.2847e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup               1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve            4704 1.0 1.1025e+02 1.0 8.67e+08 1.0 0.0e+00 0.0e+00 5.0e+00 94100  0  0100  94100  0  0100   867
PCSetUp                1 1.0 4.6143e+00 1.0 9.83e+08 1.0 0.0e+00 0.0e+00 5.0e+00  4  5  0  0100   4  5  0  0100   983
PCApply             4704 1.0 1.0551e+02 1.0 8.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 90 95  0  0  0  90 95  0  0  0   863
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix     2              2  155965032     0
           Index Set     5              5      86120     0
                 Vec     2              2     151872     0
       Krylov Solver     1              1          0     0
      Preconditioner     1              1        168     0
========================================================================================================================
Average time to get PetscTime(): 1.50204e-06
OptionTable: -log_summary
OptionTable: -mat_type aijmumps
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16
Configure run at: Thu Nov 15 23:52:44 2007
Configure options: --with-cxx=mpiCC --with-cc=mpicc --with-mpi-dir=/usr/local/mpich2/path3.0/ --with-blas-lib=/opt/packages/path-compat/acml/pathscale64/lib/libacml.a --with-lapack-lib=/opt/packages/path-compat/acml/pathscale64/lib/libacml.a --with-timer=mpi --with-fc=mpif77 --download-mumps=1 --download-scalapack=1 --download-superlu_dist=1 --download-superlu=1 --with-shared=0 --CXXOPTFLAGS=-fast --FOPTFLAGS=-fast --COPTFLAGS=-fast --download-blacs=1 --with-scalar-type=complex --with-debugging=0 --download-spooles=1
-----------------------------------------
Libraries compiled on Thu Nov 15 23:52:50 GMT 2007 on l2cu28 
Machine characteristics: Linux l2cu28 2.6.5-7.287.3-smp_perfctr #3 SMP Wed Oct 17 21:27:48 BST 2007 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /ichec/work/staff/tstitt/petsc-2.3.3-p7
Using PETSc arch: pathscale_O3
-----------------------------------------
Using C compiler: mpicc -fPIC   
Using Fortran compiler: mpif77 -fPIC    
-----------------------------------------
Using include paths: -I/ichec/work/staff/tstitt/petsc-2.3.3-p7 -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/bmake/pathscale_O3 -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/include -I/usr/X11R6/include -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/MUMPS_4.7.3/pathscale_O3/include -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SCALAPACK/pathscale_O3/include -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/blacs-dev/pathscale_O3/include -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SuperLU_DIST_2.0-Jan_5_2006/pathscale_O3/SRC -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/spooles-2.2/pathscale_O3/ -I/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SuperLU_3.0-Jan_5_2006/pathscale_O3/SRC -I/usr/local/mpich2/path3.0/include    
------------------------------------------
Using C linker: mpicc -fPIC 
Using Fortran linker: mpif77 -fPIC  
Using libraries: -Wl,-rpath,/ichec/work/staff/tstitt/petsc-2.3.3-p7/lib/pathscale_O3 -L/ichec/work/staff/tstitt/petsc-2.3.3-p7/lib/pathscale_O3 -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc        -L/usr/X11R6/lib64 -lX11 -Wl,-rpath,/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/MUMPS_4.7.3/pathscale_O3/lib -L/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/MUMPS_4.7.3/pathscale_O3/lib -lcmumps -ldmumps -lsmumps -lzmumps -lpord -Wl,-rpath,/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SCALAPACK/pathscale_O3 -L/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SCALAPACK/pathscale_O3 -lscalapack -Wl,-rpath,/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/blacs-dev/pathscale_O3 -L/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/blacs-dev/pathscale_O3 -lblacs -Wl,-rpath,/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SuperLU_DIST_2.0-Jan_5_2006/pathscale_O3 -L/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SuperLU_DIST_2.0-Jan_5_2006/pathscale_O3 -lsuperlu_dist_2.0 /ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/spooles-2.2/pathscale_O3/MPI/src/spoolesMPI.a /ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/spooles-2.2/pathscale_O3/spooles.a -Wl,-rpath,/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SuperLU_3.0-Jan_5_2006/pathscale_O3 -L/ichec/work/staff/tstitt/petsc-2.3.3-p7/externalpackages/SuperLU_3.0-Jan_5_2006/pathscale_O3 -lsuperlu_3.0  -Wl,-rpath,/opt/packages/path-compat/acml/pathscale64/lib -L/opt/packages/path-compat/acml/pathscale64/lib -lacml -Wl,-rpath,/opt/packages/path-compat/acml/pathscale64/lib -L/opt/packages/path-compat/acml/pathscale64/lib -lacml -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -L/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -L/opt/packages/pathscale3.0/lib/3.0 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 
-Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -Wl,-rpath,/lib/../lib64 -L/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -L/usr/lib/../lib64 -ldl -lpmpich -lmpich -lpthread -lrt -lpscrt -lgcc_eh -lpathfstart -lpathfortran -lmv -lmpath -lm -lm -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -Wl,-rpath,/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -lm -lm -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -L/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -L/opt/packages/pathscale3.0/lib/3.0 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. 
-Wl,-rpath,/lib/../lib64 -L/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -L/usr/lib/../lib64 -ldl -lpmpich -lmpich -lpthread -lrt -lpscrt -lgcc_eh -lpathfstart -lpathfortran -lmv -lmpath -lm -lm -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -Wl,-rpath,/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -lm -lm -lm -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -L/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -L/opt/packages/pathscale3.0/lib/3.0 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -Wl,-rpath,/lib/../lib64 -L/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -L/usr/lib/../lib64 -ldl -lpmpich -lmpich -lpthread -lrt -lpscrt -lgcc_eh -lpathfstart -lpathfortran -lmv -lmpath -lm -lm -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. 
-Wl,-rpath,/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -lm -lm  -Wl,-rpath,/usr/local/mpich2/path3.0/lib64 -L/usr/local/mpich2/path3.0/lib64 -Wl,-rpath,/opt/packages/pathscale3.0/lib/3.0 -L/opt/packages/pathscale3.0/lib/3.0 -ldl -lpmpich -lmpich -lpthread -lrt -lpscrt -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib64 -Wl,-rpath,/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -Wl,-rpath,/lib/../lib64 -L/lib/../lib64 -Wl,-rpath,/usr/lib/../lib64 -L/usr/lib/../lib64 -lgcc_eh -ldl  
------------------------------------------
Deleting /localscratch/pbstmp.159913.l2cu33.ichec.ie
Creating /localscratch/pbstmp.159912.l2cu33.ichec.ie
Working directory is /ichec/work/staff/tstitt/SolverCode
 
 Running with  8  processes
 
 Matrix has order  4704  rows by  4704  columns
 
 Number of RHS is:  4704
 
 Worker Solve Time is:  6.86795616
 Master Solve Time is:  8.66668129
 Worker Solve Time is:  8.85465431
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./solver on a pathscale named h3cu06 with 8 processors, by tstitt Tue Nov 20 20:30:38 2007
Using Petsc Release Version 2.3.3, Patch 7, Fri Oct 26 14:21:35 CDT 2007 HG revision: 2e223033ba960114833e1f9713ab393ec78c056f

 Worker Solve Time is:  10.0444736
 Worker Solve Time is:  13.47995
 Worker Solve Time is:  12.8490467
 Worker Solve Time is:  16.845438
 Worker Solve Time is:  11.9151878

                         Max       Max/Min        Avg      Total 
Time (sec):           2.533e+01      1.00016   2.533e+01
Objects:              2.400e+01      1.00000   2.400e+01
Flops:                7.214e+09      1.89167   5.406e+09  4.325e+10
Flops/sec:            2.847e+08      1.89142   2.134e+08  1.707e+09
MPI Messages:         4.000e+00      1.33333   3.500e+00  2.800e+01
MPI Message Lengths:  6.336e+03      1.20824   1.676e+03  4.693e+04
MPI Reductions:       1.766e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.5333e+01 100.0%  4.3245e+10 100.0%  2.800e+01 100.0%  1.676e+03      100.0%  1.413e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   The code for various complex numbers numerical       #
      #   kernels uses C++, which generally is not well        #
      #   optimized.  For performance that is about 4-5 times  #
      #   faster, specify --with-fortran-kernels=generic       #
      #   when running config/configure.py.                    #
      #                                                        #
      ##########################################################




      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatSolve            4704 1.0 1.6348e+01 2.6 6.21e+08 1.8 0.0e+00 0.0e+00 0.0e+00 42 98  0  0  0  42 98  0  0  0  2590
MatLUFactorNum         1 1.0 1.6377e-01 4.7 1.46e+09 1.2 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  5479
MatILUFactorSym        1 1.0 4.3809e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 4.8079e-02101.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 4.6219e-02 1.2 0.00e+00 0.0 2.8e+01 1.7e+03 7.0e+00  0  0100100  0   0  0100100  0     0
MatGetRowIJ            1 1.0 5.0068e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.0180e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              9408 1.0 3.2616e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin    4704 1.0 1.2538e+01 5.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+04 32  0  0  0100  32  0  0  0100     0
VecAssemblyEnd      4704 1.0 1.8426e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup               2 1.0 4.7684e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve            4704 1.0 1.6674e+01 2.5 6.19e+08 1.8 0.0e+00 0.0e+00 5.0e+00 43100  0  0  0  43100  0  0  0  2594
PCSetUp                2 1.0 1.6838e-01 4.5 1.36e+09 1.2 0.0e+00 0.0e+00 5.0e+00  0  2  0  0  0   0  2  0  0  0  5329
PCSetUpOnBlocks     4704 1.0 1.8029e-01 3.7 1.14e+09 1.1 0.0e+00 0.0e+00 5.0e+00  0  2  0  0  0   0  2  0  0  0  4977
PCApply             4704 1.0 1.6482e+01 2.6 6.15e+08 1.8 0.0e+00 0.0e+00 0.0e+00 42 98  0  0  0  42 98  0  0  0  2569
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix     6              4   29267076     0
           Index Set     7              7      17312     0
                 Vec     6              6      41352     0
         Vec Scatter     1              1          0     0
       Krylov Solver     2              2          0     0
      Preconditioner     2              2        256     0
========================================================================================================================
Average time to get PetscTime(): 2.40803e-06
Average time for MPI_Barrier(): 0.00820498
Average time for zero size MPI_Send(): 0.000125885
OptionTable: -log_summary
OptionTable: -mat_type aijmumps
Deleting /localscratch/pbstmp.159912.l2cu33.ichec.ie
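
The legend in the logs defines Ratio as the maximum over the minimum across
processors, so the fastest process's time for an event can be estimated as
Max / Ratio. The sketch below applies that to a few rows copied from the
8-process event table and lists the events that are both expensive and uneven.
The thresholds min_share and max_ratio, and the name imbalance_report, are
arbitrary choices made for this illustration and are not PETSc settings.

```python
# (event, max time in seconds, max/min time ratio) from the 8-process log.
events = [
    ("MatSolve", 1.6348e+01, 2.6),
    ("VecAssemblyBegin", 1.2538e+01, 5.2),
    ("KSPSolve", 1.6674e+01, 2.5),
    ("PCApply", 1.6482e+01, 2.6),
    ("MatLUFactorNum", 1.6377e-01, 4.7),
]
total_time = 2.533e+01  # "Time (sec)" Max from the same log


def imbalance_report(events, total_time, min_share=0.05, max_ratio=1.5):
    """Return (name, est. min time, max time, share of run) for events
    that take a noticeable share of the run and have a high time ratio."""
    flagged = []
    for name, t_max, ratio in events:
        share = t_max / total_time
        if share >= min_share and ratio >= max_ratio:
            # Ratio = max / min, so min is approximately max / ratio.
            flagged.append((name, t_max / ratio, t_max, share))
    return flagged


report = imbalance_report(events, total_time)
for name, t_min, t_max, share in report:
    print(f"{name:18s} min~{t_min:6.2f}s  max={t_max:6.2f}s  share={share:.0%}")
```

Because the Ratio column is printed to only one decimal place, the estimated
minimum times are approximate.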