[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: profiling PETSc code



On 8/1/06, Matt Funk <mafunk@xxxxxxxx> wrote:
Hi,

well, now i do get  summary:
...
     #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecNorm              200 1.0 5.6217e-03 1.0 2.07e+08 1.0 0.0e+00 0.0e+00
1.0e+02  0 36  0  0  7   0 36  0  0 31   207
VecCopy              200 1.0 4.2303e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX              100 1.0 3.2036e-03 1.0 1.82e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0 18  0  0  0   0 18  0  0  0   182
MatMult              100 1.0 8.3530e-03 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
0.0e+00  1  9  0  0  0   1  9  0  0  0    35
MatSolve             200 1.0 2.5591e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00
0.0e+00  2 18  0  0  0   2 18  0  0  0    23
MatSolveTranspos     100 1.0 2.1357e-02 1.0 1.36e+07 1.0 0.0e+00 0.0e+00
0.0e+00  2  9  0  0  0   2  9  0  0  0    14
MatLUFactorNum       100 1.0 4.6215e-02 1.0 6.30e+06 1.0 0.0e+00 0.0e+00
0.0e+00  3  9  0  0  0   3  9  0  0  0     6
MatILUFactorSym        1 1.0 4.4894e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00  0  0  0  0  0   0  0  0  0  1     0
MatAssemblyBegin       1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       100 1.0 1.1220e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetOrdering         1 1.0 2.5296e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00  0  0  0  0  0   0  0  0  0  1     0
KSPSetup             100 1.0 5.3692e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
1.4e+01  0  0  0  0  1   0  0  0  0  4     0
KSPSolve             100 1.0 9.0056e-02 1.0 3.23e+07 1.0 0.0e+00 0.0e+00
3.0e+02  7 91  0  0 21   7 91  0  0 93    32
PCSetUp              100 1.0 4.9087e-02 1.0 5.93e+06 1.0 0.0e+00 0.0e+00
4.0e+00  4  9  0  0  0   4  9  0  0  1     6
PCApply              300 1.0 4.9106e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00
0.0e+00  4 27  0  0  0   4 27  0  0  0    18
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

           Index Set     3              3      35976     0
                 Vec   109            109    2458360     0
              Matrix     2              2      23304     0
       Krylov Solver     1              1          0     0
      Preconditioner     1              1        168     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)

...

am i using the push and pop calls in an manner they are not intended to be
used?

Not exactly. You need to register a stage first before pushing it.

 http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html

plus, how can i see what's going on with respect to why it takes so much
longer to solve the system in parallel than in serial without being able to
specify the stages (i.e single out the KSPSolve call)?

There are 100 calls to KSPSolve() which collectively take .1s. Your problem is most likely in matrix setup. I would bet that you have not preallocated the space correctly. Therefore, a malloc() is called every time you insert a value. You can check the number of mallocs using -info.

  Matt


mat





On Tuesday 01 August 2006 15:57, Matthew Knepley wrote:
> ke out your stage push/pop for the moment, and the log_summary
> call. Just run with -log_summary and send the output as a test.
>
> Thanks,
>
> Matt




--
"Failure has a thousand explanations. Success doesn't need one" -- Sir
Alec Guiness