
Re: Slow speed after changing from serial to parallel



Hi,

I was initially using LU and Hypre to solve my serial code. I switched to the default GMRES when I converted to the parallel code. I've now redone the tests using KSPBCGS and also Hypre BoomerAMG. It seems that MatAssemblyBegin, VecAYPX and VecScatterEnd (in bold) are the problems. What should I be checking? Here are the results for 1 and 2 processors for each solver. Thank you so much!

*1 processor KSPBCGS*

************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************


---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c45 with 1 processor, by g0306332 Wed Apr 16 08:32:21 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b


                        Max       Max/Min        Avg      Total
Time (sec):           8.176e+01      1.00000   8.176e+01
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                1.893e+10      1.00000   1.893e+10  1.893e+10
Flops/sec:            2.315e+08      1.00000   2.315e+08  2.315e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       3.743e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops


Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 8.1756e+01 100.0% 1.8925e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 3.743e+03 100.0%


------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################





Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------


--- Event Stage 0: Main Stage

MatMult 1498 1.0 1.6548e+01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 20 31 0 0 0 20 31 0 0 0 355
MatSolve 1500 1.0 3.2228e+01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 39 31 0 0 0 39 31 0 0 0 183
MatLUFactorNum 2 1.0 2.0642e-01 1.0 1.02e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 102
MatILUFactorSym 2 1.0 2.0250e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.7963e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 3.8147e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 2.6301e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 1.0190e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetup 2 1.0 2.8230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 6.7238e+01 1.0 2.81e+08 1.0 0.0e+00 0.0e+00 3.7e+03 82100 0 0100 82100 0 0100 281
PCSetUp 2 1.0 4.3527e-01 1.0 4.85e+07 1.0 0.0e+00 0.0e+00 6.0e+00 1 0 0 0 0 1 0 0 0 0 48
PCApply 1500 1.0 3.2232e+01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 39 31 0 0 0 39 31 0 0 0 183
VecDot 2984 1.0 5.3279e+00 1.0 4.84e+08 1.0 0.0e+00 0.0e+00 3.0e+03 7 14 0 0 80 7 14 0 0 80 484
VecNorm 754 1.0 1.1453e+00 1.0 5.74e+08 1.0 0.0e+00 0.0e+00 7.5e+02 1 3 0 0 20 1 3 0 0 20 574
VecCopy 2 1.0 3.2830e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 3 1.0 3.9389e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 2244 1.0 4.8304e+00 1.0 4.02e+08 1.0 0.0e+00 0.0e+00 0.0e+00 6 10 0 0 0 6 10 0 0 0 402
VecAYPX 752 1.0 1.5623e+00 1.0 4.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 419
VecWAXPY 1492 1.0 5.0827e+00 1.0 2.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 6 7 0 0 0 6 7 0 0 0 254
VecAssemblyBegin 2 1.0 2.6703e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 5.2452e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------


Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

Matrix 4 4 300369852 0
Krylov Solver 2 2 8 0
Preconditioner 2 2 336 0
Index Set 6 6 15554064 0
Vec 13 13 44937496 0
========================================================================================================================
Average time to get PetscTime(): 3.09944e-07
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Jan 8 22:22:08 2008
Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
-----------------------------------------
Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01


*2 processors KSPBCGS*

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------


./a.out on a atlas3-mp named atlas3-c25 with 2 processors, by g0306332 Wed Apr 16 08:37:25 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b


                        Max       Max/Min        Avg      Total
Time (sec):           3.795e+02      1.00000   3.795e+02
Objects:              3.800e+01      1.00000   3.800e+01
Flops:                8.592e+09      1.00000   8.592e+09  1.718e+10
Flops/sec:            2.264e+07      1.00000   2.264e+07  4.528e+07
MPI Messages:         1.335e+03      1.00000   1.335e+03  2.670e+03
MPI Message Lengths:  6.406e+06      1.00000   4.798e+03  1.281e+07
MPI Reductions:       1.678e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops


Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.7950e+02 100.0% 1.7185e+10 100.0% 2.670e+03 100.0% 4.798e+03 100.0% 3.357e+03 100.0%


------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------



##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################


Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------


--- Event Stage 0: Main Stage

MatMult 1340 1.0 7.4356e+01 1.6 5.87e+07 1.6 2.7e+03 4.8e+03 0.0e+00 16 31100100 0 16 31100100 0 72
MatSolve 1342 1.0 4.3794e+01 1.2 7.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00 11 31 0 0 0 11 31 0 0 0 123
MatLUFactorNum 2 1.0 2.5116e-01 1.0 7.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 153
MatILUFactorSym 2 1.0 2.3831e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
*MatAssemblyBegin 2 1.0 7.9380e-0116482.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0*
MatAssemblyEnd 2 1.0 2.4782e-01 1.0 0.00e+00 0.0 2.0e+00 2.4e+03 7.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 5.0068e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 1.8508e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 8.6530e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetup 3 1.0 1.9901e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 3.3575e+02 1.0 2.56e+07 1.0 2.7e+03 4.8e+03 3.3e+03 88100100100100 88100100100100 51
PCSetUp 3 1.0 5.0751e-01 1.0 3.79e+07 1.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 76
PCSetUpOnBlocks 1 1.0 4.4248e-02 1.0 4.39e+07 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 88
PCApply 1342 1.0 4.9832e+01 1.2 6.56e+07 1.2 0.0e+00 0.0e+00 0.0e+00 12 31 0 0 0 12 31 0 0 0 108
VecDot 2668 1.0 2.0710e+02 1.2 6.70e+06 1.2 0.0e+00 0.0e+00 2.7e+03 50 13 0 0 79 50 13 0 0 79 11
VecNorm 675 1.0 2.9565e+01 3.3 3.33e+07 3.3 0.0e+00 0.0e+00 6.7e+02 5 3 0 0 20 5 3 0 0 20 20
VecCopy 2 1.0 2.4400e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1338 1.0 5.9052e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 2007 1.0 2.2173e+01 2.6 1.03e+08 2.6 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 4 10 0 0 0 79
*VecAYPX 673 1.0 2.8062e+00 4.0 4.29e+08 4.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 213*
VecWAXPY 1334 1.0 4.8052e+00 2.4 2.84e+08 2.4 0.0e+00 0.0e+00 0.0e+00 1 7 0 0 0 1 7 0 0 0 240
VecAssemblyBegin 2 1.0 1.4091e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
*VecScatterBegin 1334 1.0 1.1666e-01 5.9 0.00e+00 0.0 2.7e+03 4.8e+03 0.0e+00 0 0100100 0 0 0100100 0 0*
VecScatterEnd 1334 1.0 5.2569e+01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix 6 6 283964900 0
Krylov Solver 3 3 8 0
Preconditioner 3 3 424 0
Index Set 8 8 12965152 0
Vec 17 17 34577080 0
Vec Scatter 1 1 0 0
========================================================================================================================
Average time to get PetscTime(): 8.10623e-07
Average time for MPI_Barrier(): 5.72205e-07
Average time for zero size MPI_Send(): 1.90735e-06
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Jan 8 22:22:08 2008


*1 processor Hypre*

************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************


---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c45 with 1 processor, by g0306332 Wed Apr 16 08:45:38 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b


                        Max       Max/Min        Avg      Total
Time (sec):           2.059e+01      1.00000   2.059e+01
Objects:              3.400e+01      1.00000   3.400e+01
Flops:                3.151e+08      1.00000   3.151e+08  3.151e+08
Flops/sec:            1.530e+07      1.00000   1.530e+07  1.530e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       2.400e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops


Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.0590e+01 100.0% 3.1512e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.400e+01 100.0%


------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################



Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------


--- Event Stage 0: Main Stage

MatMult 12 1.0 2.6237e-01 1.0 4.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 35 0 0 0 1 35 0 0 0 424
MatSolve 7 1.0 4.5932e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 33 0 0 0 2 33 0 0 0 223
MatLUFactorNum 1 1.0 1.2635e-01 1.0 1.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 136
MatILUFactorSym 1 1.0 1.3007e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 1 0 0 0 4 1 0 0 0 4 0
MatConvert 1 1.0 4.1277e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.3946e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatGetRow 432000 1.0 8.4685e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.6376e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 8 0 0 0 0 8 0
MatZeroEntries 2 1.0 8.2422e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 6 1.0 1.0955e-01 1.0 3.31e+08 1.0 0.0e+00 0.0e+00 6.0e+00 1 12 0 0 25 1 12 0 0 25 331
KSPSetup 2 1.0 2.5418e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 5.9363e+00 1.0 5.31e+07 1.0 0.0e+00 0.0e+00 1.8e+01 29100 0 0 75 29100 0 0 75 53
PCSetUp 2 1.0 1.5691e+00 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 5.0e+00 8 5 0 0 21 8 5 0 0 21 11
PCApply 14 1.0 3.7548e+00 1.0 2.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 18 33 0 0 0 18 33 0 0 0 27
VecMDot 6 1.0 7.7139e-02 1.0 2.35e+08 1.0 0.0e+00 0.0e+00 6.0e+00 0 6 0 0 25 0 6 0 0 25 235
VecNorm 14 1.0 9.9192e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 7.0e+00 0 6 0 0 29 0 6 0 0 29 183
VecScale 7 1.0 5.4052e-03 1.0 5.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 559
VecCopy 1 1.0 2.0301e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 9 1.0 1.1883e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 7 1.0 2.8702e-02 1.0 3.91e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 391
VecAYPX 6 1.0 2.8528e-02 1.0 3.63e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 363
VecMAXPY 7 1.0 4.1699e-02 1.0 5.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 559
VecAssemblyBegin 2 1.0 2.3842e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 25 0 0 0 0 25 0
VecAssemblyEnd 2 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 7 1.0 1.3958e-02 1.0 6.50e+08 1.0 0.0e+00 0.0e+00 7.0e+00 0 3 0 0 29 0 3 0 0 29 650
------------------------------------------------------------------------------------------------------------------------


Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

             Matrix     3              3  267569524     0
      Krylov Solver     2              2      17224     0
     Preconditioner     2              2        440     0
          Index Set     3              3   10369032     0
                Vec    24             24   82961752     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
OptionTable: -log_summary
Compiled without FORTRAN kernels


*2 processors Hypre*
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************


---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c48 with 2 processors, by g0306332 Wed Apr 16 08:46:56 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b

                        Max       Max/Min        Avg      Total
Time (sec):           9.614e+01      1.02903   9.478e+01
Objects:              4.100e+01      1.00000   4.100e+01
Flops:                2.778e+08      1.00000   2.778e+08  5.555e+08
Flops/sec:            2.973e+06      1.02903   2.931e+06  5.862e+06
MPI Messages:         7.000e+00      1.00000   7.000e+00  1.400e+01
MPI Message Lengths:  3.120e+04      1.00000   4.457e+03  6.240e+04
MPI Reductions:       1.650e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops


Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 9.4784e+01 100.0% 5.5553e+08 100.0% 1.400e+01 100.0% 4.457e+03 100.0% 3.300e+01 100.0%


------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------



##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################


Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s


--- Event Stage 0: Main Stage

MatMult 12 1.0 4.5412e-01 2.0 4.34e+08 2.0 1.2e+01 4.8e+03 0.0e+00 0 36 86 92 0 0 36 86 92 0 438
MatSolve 7 1.0 5.0386e-01 1.1 2.28e+08 1.1 0.0e+00 0.0e+00 0.0e+00 1 37 0 0 0 1 37 0 0 0 407
MatLUFactorNum 1 1.0 9.5120e-01 1.6 2.98e+07 1.6 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 36
MatILUFactorSym 1 1.0 1.1285e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 9 0 0 0 3 9 0 0 0 3 0
MatConvert 1 1.0 6.2023e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
*MatAssemblyBegin 2 1.0 3.1003e+01246.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 16 0 0 0 6 16 0 0 0 6 0*
MatAssemblyEnd 2 1.0 2.2413e+00 1.9 0.00e+00 0.0 2.0e+00 2.4e+03 7.0e+00 2 0 14 8 21 2 0 14 8 21 0
MatGetRow 216000 1.0 9.2643e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 1.0 5.9605e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.4464e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 6 0 0 0 0 6 0
MatZeroEntries 2 1.0 6.1072e+00 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
KSPGMRESOrthog 6 1.0 4.4529e-02 1.3 5.26e+08 1.3 0.0e+00 0.0e+00 6.0e+00 0 7 0 0 18 0 7 0 0 18 815
KSPSetup 2 1.0 1.8315e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
KSPSolve 2 1.0 3.0572e+01 1.1 9.64e+06 1.1 1.2e+01 4.8e+03 1.8e+01 31100 86 92 55 31100 86 92 55 18
PCSetUp 2 1.0 2.0424e+01 1.3 1.07e+06 1.3 0.0e+00 0.0e+00 5.0e+00 19 6 0 0 15 19 6 0 0 15 2
PCApply 14 1.0 2.9443e+00 1.0 3.56e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 37 0 0 0 3 37 0 0 0 70
VecMDot 6 1.0 2.7561e-02 1.6 5.15e+08 1.6 0.0e+00 0.0e+00 6.0e+00 0 3 0 0 18 0 3 0 0 18 658
*VecNorm 14 1.0 1.4223e+00 5.1 5.45e+07 5.1 0.0e+00 0.0e+00 7.0e+00 1 5 0 0 21 1 5 0 0 21 21*
VecScale 7 1.0 1.8604e-02 1.0 8.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 163
VecCopy 1 1.0 3.0069e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 9 1.0 3.2693e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 7 1.0 3.0581e-02 1.1 3.98e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 706
*VecAYPX 6 1.0 4.4344e+00147.6 3.45e+08147.6 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 5*
VecMAXPY 7 1.0 2.1892e-02 1.0 5.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 1066
VecAssemblyBegin 2 1.0 9.2602e-0412.5 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 18 0 0 0 0 18 0
VecAssemblyEnd 2 1.0 7.8678e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 6 1.0 9.3222e-05 1.1 0.00e+00 0.0 1.2e+01 4.8e+03 0.0e+00 0 0 86 92 0 0 0 86 92 0 0
*VecScatterEnd 6 1.0 1.9959e-011404.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0*
VecNormalize 7 1.0 2.3088e-02 1.0 1.98e+08 1.0 0.0e+00 0.0e+00 7.0e+00 0 2 0 0 21 0 2 0 0 21 393
------------------------------------------------------------------------------------------------------------------------


Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

Matrix 5 5 267571932 0
Krylov Solver 2 2 17224 0
Preconditioner 2 2 440 0
Index Set 5 5 10372120 0
Vec 26 26 53592184 0
Vec Scatter 1 1 0 0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 8.10623e-07
Average time for zero size MPI_Send(): 1.43051e-06
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Tue Jan 8 22:22:08 2008
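
For reference, the wall-clock totals and the VecDot figures in the four reports above can be compared directly. The following is a minimal Python sketch, not part of the original runs; the dictionary layout and the helper names speedup and efficiency are illustrative only, and the numbers are copied from the "Time (sec)" and VecDot lines above.

```python
# Total wall-clock time (Max column of "Time (sec)") from the four reports above.
runs = {
    "KSPBCGS": {"t1": 8.176e+01, "t2": 3.795e+02},
    "Hypre": {"t1": 2.059e+01, "t2": 9.614e+01},
}


def speedup(t_serial, t_parallel):
    """Speedup of the parallel run relative to the 1-processor run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nprocs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / nprocs


for name, r in runs.items():
    s = speedup(r["t1"], r["t2"])
    e = efficiency(r["t1"], r["t2"], 2)
    print(f"{name}: speedup {s:.2f}, efficiency {e:.1%}")

# Average time per VecDot call in the KSPBCGS runs (Time Max / Count).
vecdot_1proc = 5.3279e+00 / 2984
vecdot_2proc = 2.0710e+02 / 2668
vecdot_ratio = vecdot_2proc / vecdot_1proc
print(f"VecDot per call: {vecdot_1proc:.2e} s on 1 processor, "
      f"{vecdot_2proc:.2e} s on 2 processors, ratio {vecdot_ratio:.1f}")
```

Both solvers come out at a speedup of roughly 0.2 on 2 processors, so the parallel runs are more than four times slower, and the average cost of a VecDot call grows by a factor of more than 40 even though each process holds only half of the rows. One possible reading is that the time goes into waiting on the reduction or on the other process rather than into arithmetic, but the logs alone do not prove that.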



Matthew Knepley wrote:
The convergence here is just horrendous. Have you tried using LU to check
your implementation? All the time is in the solve right now. I would first
try a direct method (at least on a small problem) and then try to understand
the convergence behavior. MUMPS can actually scale very well for big problems.

  Matt
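
A rough way to gauge the convergence from the logs above is to divide the MatMult count by the number of KSPSolve calls. The Python sketch below does this for the two 1-processor runs. It assumes about two matrix-vector products per BiCGStab iteration, and it averages over both KSPSolve calls, which may be solving different systems, so the result is only an order-of-magnitude estimate.

```python
# Counts copied from the 1-processor event tables above.
counts = {
    "KSPBCGS": {"MatMult": 1498, "KSPSolve": 2},
    "Hypre": {"MatMult": 12, "KSPSolve": 2},
}

# Average number of matrix-vector products per linear solve.
matmult_per_solve = {
    name: c["MatMult"] / c["KSPSolve"] for name, c in counts.items()
}

# Assumption: BiCGStab performs roughly two MatMult calls per iteration.
bcgs_iterations = matmult_per_solve["KSPBCGS"] / 2

for name, value in matmult_per_solve.items():
    print(f"{name}: about {value:.0f} MatMult calls per KSPSolve")
print(f"KSPBCGS: roughly {bcgs_iterations:.0f} iterations per KSPSolve")
```

Several hundred iterations per solve with KSPBCGS, against a handful of MatMult calls per solve in the Hypre run, is consistent with the suggestion to check the implementation with a direct solver first.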