|
Hi, I've email my school super computing staff and they told me that the queue which I'm using is one meant for testing, hence, it's handling of work load is not good. I've sent my job to another queue and it's run on 4 processors. It's my own code because there seems to be something wrong with the server displaying the summary when using -log_summary with ex2f.F. I'm trying it again. Anyway comparing just kspsolve between the two, the speedup is about 2.7. However, I noticed that for the 4 processors one, its MatAssemblyBegin is 1.5158e+02, which is more than KSPSolve's 4.7041e+00. So is MatAssemblyBegin's time included in KSPSolve? If not, does it mean that there's something wrong about my MatAssemblyBegin? Thank you For 1 processor: ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a atlas3 named atlas3-c28 with 1 processor, by g0306332 Fri Apr 18 08:46:11 2008 Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b Max Max/Min Avg Total Time (sec): 1.322e+02 1.00000 1.322e+02 Objects: 2.200e+01 1.00000 2.200e+01 Flops: 2.242e+08 1.00000 2.242e+08 2.242e+08 Flops/sec: 1.696e+06 1.00000 1.696e+06 1.696e+06 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 2.100e+01 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.3217e+02 100.0% 2.2415e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.100e+01 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops/sec: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was run without the PreLoadBegin() # # macros. To get timing results we always recommend # # preloading. otherwise timing numbers may be # # meaningless. # ########################################################## Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 6 1.0 1.8572e-01 1.0 3.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 31 0 0 0 0 31 0 0 0 377 MatConvert 1 1.0 1.1636e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 8.8531e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 1296000 1.0 2.6576e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 4.4700e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 6 1.0 2.1104e-01 1.0 5.16e+08 1.0 0.0e+00 0.0e+00 6.0e+00 0 49 0 0 29 0 49 0 0 29 516 KSPSetup 1 1.0 6.5601e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 1.2883e+01 1.0 1.74e+07 1.0 0.0e+00 0.0e+00 1.5e+01 10100 0 0 71 10100 0 0 71 17 PCSetUp 1 1.0 4.4342e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 3 0 0 0 10 3 0 0 0 10 0 PCApply 7 1.0 7.7337e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 VecMDot 6 1.0 9.8586e-02 1.0 5.52e+08 1.0 0.0e+00 0.0e+00 6.0e+00 0 24 0 0 29 0 24 0 0 29 552 VecNorm 7 1.0 6.9757e-02 1.0 2.60e+08 1.0 0.0e+00 0.0e+00 7.0e+00 0 8 0 0 33 0 8 0 0 33 260 VecScale 7 1.0 2.9803e-02 1.0 3.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 304 VecCopy 1 1.0 6.1009e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 9 1.0 3.1438e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 1 1.0 7.5161e-03 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 345 VecMAXPY 7 1.0 1.4444e-01 1.0 4.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 31 0 0 0 0 31 0 0 0 485 VecAssemblyBegin 2 1.0 4.2915e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 29 0 0 0 0 29 0 VecAssemblyEnd 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 7 1.0 9.9603e-02 1.0 2.73e+08 1.0 0.0e+00 0.0e+00 7.0e+00 0 12 0 0 33 0 12 0 0 33 273 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. --- Event Stage 0: Main Stage Matrix 1 1 98496004 0 Krylov Solver 1 1 17216 0 Preconditioner 1 1 272 0 Vec 19 19 186638392 0 ======================================================================================================================== Average time to get PetscTime(): 9.53674e-08 OptionTable: -log_summary Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Wed Jan 9 14:33:02 2008 Configure options: --with-cc=icc --with-fc=ifort --with-x=0 --with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared --with-mpi-dir=/lsftmp/g0306332/mpich2/ --with-debugging=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre_shared ----------------------------------------- Libraries compiled on Wed Jan 9 14:33:36 SGT 2008 on atlas3-c01 Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux Using PETSc directory: /home/enduser/g0306332/petsc-2.3.3-p8 Using PETSc arch: atlas3 ----------------------------------------- Using C compiler: icc -fPIC -O for 4 processors ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./a.out on a atlas3-mp named atlas3-c23 with 4 processors, by g0306332 Fri Apr 18 08:22:11 2008 Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b Max Max/Min Avg Total 0.000000000000000E+000 58.1071298622710 0.000000000000000E+000 58.1071298622710 0.000000000000000E+000 58.1071298622710 0.000000000000000E+000 58.1071298622710 Time (sec): 3.308e+02 1.00177 3.305e+02 Objects: 2.900e+01 1.00000 2.900e+01 Flops: 5.605e+07 1.00026 5.604e+07 2.242e+08 Flops/sec: 1.697e+05 1.00201 1.695e+05 6.782e+05 MPI Messages: 1.400e+01 2.00000 1.050e+01 4.200e+01 MPI Message Lengths: 1.248e+05 2.00000 8.914e+03 3.744e+05 MPI Reductions: 7.500e+00 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 3.3051e+02 100.0% 2.2415e+08 100.0% 4.200e+01 100.0% 8.914e+03 100.0% 3.000e+01 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops/sec: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was run without the PreLoadBegin() # # macros. To get timing results we always recommend # # preloading. otherwise timing numbers may be # # meaningless. # # preloading. otherwise timing numbers may be # # meaningless. # ########################################################## Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 6 1.0 8.2640e-02 1.6 3.37e+08 1.6 3.6e+01 9.6e+03 0.0e+00 0 31 86 92 0 0 31 86 92 0 846 MatConvert 1 1.0 2.1472e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1 1.0 1.5158e+022254.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 22 0 0 0 7 22 0 0 0 7 0 MatAssemblyEnd 1 1.0 1.5766e-01 1.1 0.00e+00 0.0 6.0e+00 4.8e+03 7.0e+00 0 0 14 8 23 0 0 14 8 23 0 MatGetRow 324000 1.0 8.9608e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 2 1.0 5.9605e-06 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 5.8902e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 6 1.0 1.1247e-01 1.7 4.11e+08 1.7 0.0e+00 0.0e+00 6.0e+00 0 49 0 0 20 0 49 0 0 20 968 KSPSetup 1 1.0 1.5483e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.7041e+00 1.0 1.19e+07 1.0 3.6e+01 9.6e+03 1.5e+01 1100 86 92 50 1100 86 92 50 48 PCSetUp 1 1.0 1.5953e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 7 0 0 0 0 7 0 PCApply 7 1.0 2.6580e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecMDot 6 1.0 7.3443e-02 2.2 4.13e+08 2.2 0.0e+00 0.0e+00 6.0e+00 0 24 0 0 20 0 24 0 0 20 741 VecNorm 7 1.0 2.5193e-01 1.1 1.94e+07 1.1 0.0e+00 0.0e+00 7.0e+00 0 8 0 0 23 0 8 0 0 23 72 VecScale 7 1.0 6.6319e-03 2.8 9.64e+08 2.8 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 1368 VecCopy 1 1.0 2.3100e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 9 1.0 1.4173e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 1 1.0 2.9502e-03 1.7 3.72e+08 1.7 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 879 VecMAXPY 7 1.0 4.9046e-02 1.4 5.09e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 31 0 0 0 0 31 0 0 0 1427 VecAssemblyBegin 2 1.0 4.3297e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 20 0 0 0 0 20 0 VecAssemblyEnd 2 1.0 5.2452e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 6 1.0 6.9666e-04 6.3 0.00e+00 0.0 3.6e+01 9.6e+03 0.0e+00 0 0 86 92 0 0 0 86 92 0 0 VecScatterEnd 6 1.0 1.4806e-02102.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 7 1.0 2.5431e-01 1.1 2.86e+07 1.1 0.0e+00 0.0e+00 7.0e+00 0 12 0 0 23 0 12 0 0 23 107 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. --- Event Stage 0: Main Stage Matrix 3 3 49252812 0 Krylov Solver 1 1 17216 0 Preconditioner 1 1 272 0 Index Set 2 2 5488 0 Vec 21 21 49273624 0 Vec Scatter 1 1 0 0 ======================================================================================================================== Average time to get PetscTime(): 1.90735e-07 Average time for MPI_Barrier(): 5.62668e-06 Average time for zero size MPI_Send(): 6.73532e-06 OptionTable: -log_summary Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 Configure run at: Tue Jan 8 22:22:08 2008 |