
Re: PETSc CG solver uses more iterations than other CG solver



Thank you for your reply.

One boundary cell is held at a constant pressure so that the equation system has a unique solution. I tried your option, and it lowered the number of iterations for most of the time steps, but for some it reached the maximum number of iterations (10000) without converging.

I also tried making all the boundaries Neumann and using your option. That made the number of iterations more uniform: instead of varying between 700 and 2000, it stayed at around 1200. But the average number of iterations actually increased somewhat, and it is still far from the performance of the other solver.
I've also checked the convergence criteria, and they are the same for both solvers.



Quoting Lisandro Dalcin <dalcinl@xxxxxxxxx>:

On 3/9/07, Knut Erik Teigen <knutert@xxxxxxxxxxxx> wrote:
To solve the Navier-Stokes equations, I use an explicit Runge-Kutta
method with Chorin's projection method,
so a Poisson equation with Neumann boundary conditions for the
pressure has to be solved at every time step.

Are all the boundary conditions of Neumann type? In that case, please try running your program with the following command-line option:

-ksp_constant_null_space

and let me know if this corrected your problem.
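For readers unfamiliar with why the option matters: with pure Neumann boundaries the discrete Laplacian maps constant vectors to zero, so the system is singular and CG needs the constant component removed from the right-hand side and the residual. The sketch below illustrates that idea in plain Python on a small 1D Neumann Laplacian. It is not PETSc's implementation; the function names, the 1D operator, and the unpreconditioned CG loop are all assumptions made for illustration.

```python
def apply_neumann_laplacian(x):
    """Matrix-free 1D Laplacian with Neumann ends (hypothetical test operator).
    Constant vectors are mapped to zero, so the operator is singular."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        if i > 0:
            y[i] += x[i] - x[i - 1]
        if i < n - 1:
            y[i] += x[i] - x[i + 1]
    return y


def remove_mean(v):
    """Project out the constant null-space component."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cg_neumann(b, rtol=1e-6, max_it=10000):
    """Unpreconditioned CG with the constant component removed from the
    right-hand side, the residual, and the returned solution."""
    b = remove_mean(b)              # make the right-hand side compatible
    x = [0.0] * len(b)
    r = b[:]
    p = r[:]
    rr = dot(r, r)
    bnorm2 = rr
    if bnorm2 == 0.0:
        return x, 0
    for it in range(1, max_it + 1):
        Ap = apply_neumann_laplacian(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        # re-project the residual so round-off cannot reintroduce a constant
        r = remove_mean([ri - alpha * api for ri, api in zip(r, Ap)])
        rr_new = dot(r, r)
        if rr_new <= rtol ** 2 * bnorm2:
            return remove_mean(x), it
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return remove_mean(x), max_it


b = [float(i) for i in range(20)]   # deliberately has a nonzero mean
x, its = cg_neumann(b)
print(its, abs(sum(x)) < 1e-9)
```

The returned solution has zero mean, which is one way of picking a unique solution without pinning a single cell.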


The equation system is
positive definite, so I use the CG solver with the ICC preconditioner.
The problem is that the PETSc solver seems to need far more iterations
to reach the solution than another CG solver I'm using. On a small test
problem (a rising bubble) with a 60x40 grid, the PETSc solver needs over
1000 iterations on average, while the other solver needs fewer than 100.
I am using KSPSetInitialGuessNonzero; without it the number of
iterations is even higher.
I have also tried applying PETSc to a similar problem, solving the
Poisson equation with Neumann boundaries and a forcing function of
f = sin(pi*x) + sin(pi*y). For this problem, the number of iterations is
almost exactly the same for PETSc and the other solver.

Does anyone know what the problem might be? Any help is greatly
appreciated. I've included the -ksp_view of one of the time steps
and the -log_summary below.

Regards,
Knut Erik Teigen
MSc student
Norwegian University of Science and Technology

Output from -ksp_view:

KSP Object:
 type: cg
 maximum iterations=10000
 tolerances:  relative=1e-06, absolute=1e-50, divergence=10000
 left preconditioning
PC Object:
 type: icc
   ICC: 0 levels of fill
   ICC: factor fill ratio allocated 1
   ICC: factor fill ratio needed 0.601695
        Factored matrix follows
       Matrix Object:
         type=seqsbaij, rows=2400, cols=2400
         total: nonzeros=7100, allocated nonzeros=7100
             block size is 1
 linear system matrix = precond matrix:
 Matrix Object:
   type=seqaij, rows=2400, cols=2400
   total: nonzeros=11800, allocated nonzeros=12000
     not using I-node routines
Poisson converged after 1403 iterations
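As a sanity check on the -ksp_view output above, the matrix sizes can be reproduced with a little arithmetic. The sketch below assumes a standard 5-point stencil on the 60x40 grid with links across the boundary simply dropped; the stencil is an inference from the nonzero counts, not something stated in the post, and the function name is hypothetical.

```python
def five_point_nonzeros(nx, ny):
    """Count stored entries of a 5-point stencil matrix on an nx-by-ny grid,
    assuming links that would cross the boundary are dropped."""
    rows = nx * ny
    horizontal_links = (nx - 1) * ny      # neighbour pairs in x
    vertical_links = nx * (ny - 1)        # neighbour pairs in y
    off_diag = 2 * (horizontal_links + vertical_links)
    full = rows + off_diag                # full (AIJ-style) storage
    upper = rows + off_diag // 2          # diagonal plus upper triangle
    return rows, full, upper


rows, full, upper = five_point_nonzeros(60, 40)
print(rows, full, upper, round(upper / full, 6))
# prints: 2400 11800 7100 0.601695
```

Under that assumption the counts match the 2400 rows, the 11800 nonzeros of the seqaij matrix, the 7100 nonzeros of the symmetric ICC(0) factor, and the reported fill ratio of 0.601695.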

Output from -log_summary

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./run on a gcc-ifc-d named iept0415 with 1 processor, by knutert Fri Mar 9 17:06:05 2007
Using Petsc Release Version 2.3.2, Patch 8, Tue Jan  2 14:33:59 PST 2007
HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80

                        Max       Max/Min        Avg      Total
Time (sec):           5.425e+02      1.00000   5.425e+02
Objects:              7.000e+02      1.00000   7.000e+02
Flops:                6.744e+10      1.00000   6.744e+10  6.744e+10
Flops/sec:            1.243e+08      1.00000   1.243e+08  1.243e+08
Memory:               4.881e+05      1.00000              4.881e+05
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       1.390e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                           e.g., VecAXPY() for real vectors of length N --> 2N flops
                           and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 5.4246e+02 100.0%  6.7437e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.390e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run config/configure.py        #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################




      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot           1668318 1.0 1.9186e+01 1.0 4.17e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 12  0  0  0   4 12  0  0  0   417
VecNorm           835191 1.0 1.0935e+01 1.0 3.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  6  0  0  0   2  6  0  0  0   367
VecCopy              688 1.0 1.6126e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY          1667630 1.0 2.4141e+01 1.0 3.32e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 12  0  0  0   4 12  0  0  0   332
VecAYPX           833815 1.0 1.9062e+01 1.0 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   210
VecAssemblyBegin     688 1.0 8.5354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       688 1.0 7.8177e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult           834503 1.0 1.5044e+02 1.0 1.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 28 26  0  0  0  28 26  0  0  0   118
MatSolve          835191 1.0 1.8130e+02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 33 38  0  0  0  33 38  0  0  0   142
MatCholFctrNum         1 1.0 1.1630e-03 1.0 2.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
MatICCFactorSym        1 1.0 2.9588e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     688 1.0 2.6343e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       688 1.0 1.0964e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.6798e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup               1 1.0 2.7585e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             688 1.0 4.2809e+02 1.0 1.58e+08 1.0 0.0e+00 0.0e+00 1.4e+03 79100  0  0100  79100  0  0100   158
PCSetUp                1 1.0 1.7900e-03 1.0 1.34e+06 1.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     1
PCApply           835191 1.0 1.8864e+02 1.0 1.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00 35 38  0  0  0  35 38  0  0  0   136
------------------------------------------------------------------------------------------------------------------------
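The event counts in the table above give a quick estimate of the average CG iteration count per solve. The sketch below assumes roughly one MatMult and one PCApply per CG iteration (with at most one extra per solve for setup), which is typical for CG but is an assumption here rather than something the log states; the dictionary is just the relevant counts copied from the table.

```python
# Event counts copied from the -log_summary table
counts = {"KSPSolve": 688, "MatMult": 834503, "PCApply": 835191}

solves = counts["KSPSolve"]
avg_matmult = counts["MatMult"] / solves   # matrix-vector products per solve
avg_pcapply = counts["PCApply"] / solves   # preconditioner applications per solve

print(round(avg_matmult, 1), round(avg_pcapply, 1))
# prints: 1212.9 1213.9
```

That is about 1213 iterations per time step over the 688 solves, which agrees with the "over 1000 iterations on average" reported in the message.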

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

           Index Set     2              2      19640     0
                 Vec   694            693     299376     0
              Matrix     2              2      56400     0
       Krylov Solver     1              1         36     0
      Preconditioner     1              1        108     0
========================================================================================================================
Average time to get PetscTime(): 2.86102e-07
OptionTable: -ksp_type cg
OptionTable: -log_summary -ksp_view
OptionTable: -pc_type icc
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4
sizeof(PetscScalar) 8
Configure run at: Thu Mar 8 11:54:22 2007
Configure options: --with-cc=gcc --with-fc=ifort --download-f-blas-lapack=1 --download-mpich=1 --with-debugging=1 --with-shared=0
-----------------------------------------
Libraries compiled on Thu Mar 8 12:08:22 CET 2007 on iept0415
Machine characteristics: Linux iept0415 2.6.16.21-0.8-default #1 Mon Jul
3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux
Using PETSc directory: /opt/petsc-2.3.2-p8
Using PETSc arch: gcc-ifc-debug
-----------------------------------------
Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing
-g3
Using Fortran compiler: ifort -fPIC -g
-----------------------------------------
Using include paths: -I/opt/petsc-2.3.2-p8
-I/opt/petsc-2.3.2-p8/bmake/gcc-ifc-debug -I/opt/petsc-2.3.2-p8/include
-I/opt/petsc-2.3.2-p8/externalpackages/mpich2-1.0.4p1/gcc-ifc-debug/include -I/usr/X11R6/include
------------------------------------------
Using C linker: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -g3
Using Fortran linker: ifort -fPIC -g
Using libraries: -Wl,-rpath,/opt/petsc-2.3.2-p8/lib/gcc-ifc-debug
-L/opt/petsc-2.3.2-p8/lib/gcc-ifc-debug -lpetscts -lpetscsnes -lpetscksp
-lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/opt/petsc-2.3.2-p8/externalpackages/mpich2-1.0.4p1/gcc-ifc-debug/lib -L/opt/petsc-2.3.2-p8/externalpackages/mpich2-1.0.4p1/gcc-ifc-debug/lib -lmpich -lnsl -lrt -L/usr/X11R6/lib -lX11 -Wl,-rpath,/opt/petsc-2.3.2-p8/externalpackages/fblaslapack/gcc-ifc-debug -L/opt/petsc-2.3.2-p8/externalpackages/fblaslapack/gcc-ifc-debug -lflapack -Wl,-rpath,/opt/petsc-2.3.2-p8/externalpackages/fblaslapack/gcc-ifc-debug -L/opt/petsc-2.3.2-p8/externalpackages/fblaslapack/gcc-ifc-debug -lfblas -lm -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0 -L/usr/lib/gcc/i586-suse-linux/4.1.0 -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../../../i586-suse-linux/lib -L/usr/lib/gcc/i586-suse-linux/4.1.0/../../../../i586-suse-linux/lib -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../.. -L/usr/lib/gcc/i586-suse-linux/4.1.0/../../.. -ldl -lgcc_s -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0 -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../../../i586-suse-linux/lib
-Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../.. -Wl,-rpath,/opt/intel/fc/9.1.036/lib -L/opt/intel/fc/9.1.036/lib -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/ -L/usr/lib/gcc/i586-suse-linux/4.1.0/ -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../../ -L/usr/lib/gcc/i586-suse-linux/4.1.0/../../../ -lifport -lifcore -limf -lm -lipgo -lirc -lirc_s -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0 -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../../../i586-suse-linux/lib -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../.. -lm -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0 -L/usr/lib/gcc/i586-suse-linux/4.1.0 -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../../../i586-suse-linux/lib -L/usr/lib/gcc/i586-suse-linux/4.1.0/../../../../i586-suse-linux/lib -Wl,-rpath,/usr/lib/gcc/i586-suse-linux/4.1.0/../../.. -L/usr/lib/gcc/i586-suse-linux/4.1.0/../../.. -ldl -lgcc_s -ldl






--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594