Hi,
I have converted the Poisson equation part of my CFD code to parallel. The
grid size tested is 600x720. For the momentum equation, I used a separate
serial linear solver (nspcg) so that the results would not be mixed. Here's
the output summary:
--- Event Stage 0: Main Stage
MatMult 8776 1.0 1.5701e+02 2.2 2.43e+08 2.2 1.8e+04 4.8e+03
0.0e+00 10 11100100 0 10 11100100 0 217
MatSolve 8777 1.0 2.8379e+02 2.9 1.73e+08 2.9 0.0e+00 0.0e+00
0.0e+00 17 11 0 0 0 17 11 0 0 0 120
MatLUFactorNum 1 1.0 2.7618e-02 1.2 8.68e+07 1.2 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 140
MatILUFactorSym 1 1.0 2.4259e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
*MatAssemblyBegin 1 1.0 5.6334e+01853005.4 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 3 0 0 0 0 3 0 0 0 0 0*
MatAssemblyEnd 1 1.0 4.7958e-02 1.0 0.00e+00 0.0 2.0e+00 2.4e+03
7.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 3.0994e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 3.8640e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 1.8353e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 8493 1.0 6.2636e+02 1.3 2.32e+08 1.3 0.0e+00 0.0e+00
8.5e+03 50 72 0 0 49 50 72 0 0 49 363
KSPSetup 2 1.0 1.0490e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 9.9177e+02 1.0 1.59e+08 1.0 1.8e+04 4.8e+03
1.7e+04 89100100100100 89100100100100 317
PCSetUp 2 1.0 5.5893e-02 1.2 4.02e+07 1.2 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 69
PCSetUpOnBlocks 1 1.0 5.5777e-02 1.2 4.03e+07 1.2 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 69
PCApply 8777 1.0 2.9987e+02 2.9 1.63e+08 2.9 0.0e+00 0.0e+00
0.0e+00 18 11 0 0 0 18 11 0 0 0 114
VecMDot 8493 1.0 5.3381e+02 2.2 2.36e+08 2.2 0.0e+00 0.0e+00
8.5e+03 35 36 0 0 49 35 36 0 0 49 213
*VecNorm 8777 1.0 1.8237e+0210.2 2.13e+0810.2 0.0e+00 0.0e+00
8.8e+03 9 2 0 0 51 9 2 0 0 51 42*
*VecScale 8777 1.0 5.9594e+00 4.7 1.49e+09 4.7 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 636*
VecCopy 284 1.0 4.2563e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 9062 1.0 1.5833e+01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 567 1.0 1.4142e+00 2.8 4.90e+08 2.8 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 346
VecMAXPY 8777 1.0 2.6692e+02 2.7 6.15e+08 2.7 0.0e+00 0.0e+00
0.0e+00 16 38 0 0 0 16 38 0 0 0 453
VecAssemblyBegin 2 1.0 1.6093e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 4.7684e-06 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
*VecScatterBegin 8776 1.0 6.6898e-01 6.7 0.00e+00 0.0 1.8e+04 4.8e+03
0.0e+00 0 0100100 0 0 0100100 0 0*
*VecScatterEnd 8776 1.0 1.7747e+0130.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0*
*VecNormalize 8777 1.0 1.8366e+02 7.7 2.39e+08 7.7 0.0e+00 0.0e+00
8.8e+03 9 4 0 0 51 9 4 0 0 51 62*
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix 4 4 49227380 0
Krylov Solver 2 2 17216 0
Preconditioner 2 2 256 0
Index Set 5 5 2596120 0
Vec 40 40 62243224 0
Vec Scatter 1 1 0 0
========================================================================================================================
Average time to get PetscTime(): 4.05312e-07
Average time for MPI_Barrier(): 7.62939e-07
Average time for zero size MPI_Send(): 2.02656e-06
OptionTable: -log_summary
The PETSc manual states that the ratio should be close to 1. Quite a few
entries *(in bold)* have ratios >1, and the one for MatAssemblyBegin is
very large. What could be the cause?
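To make the ratio column concrete: in -log_summary it is the maximum time over all processes divided by the minimum, so the fastest process's time can be recovered as max / ratio. The sketch below is a small Python illustration using three rows read from the log above; the helper name `fastest_process_time` is hypothetical and not part of PETSc.

```python
def fastest_process_time(max_time, ratio):
    """Return the smallest per-process time implied by a -log_summary row.

    Assumes the 'Ratio' column is (max time) / (min time) over all processes.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return max_time / ratio


# (max time in seconds, time ratio) copied from the log above.
rows = {
    "MatAssemblyBegin": (5.6334e+01, 853005.4),
    "VecNorm": (1.8237e+02, 10.2),
    "VecScatterEnd": (1.7747e+01, 30.1),
}

implied_min = {name: fastest_process_time(t, r) for name, (t, r) in rows.items()}

for name, t_min in implied_min.items():
    t_max = rows[name][0]
    print(f"{name}: slowest {t_max:.4e} s, fastest about {t_min:.4e} s")
```

For MatAssemblyBegin this gives a slowest process at about 56 s and a fastest one well under a millisecond. That pattern is what one would expect if some processes reach the call long before the others and wait there, so it may point to load imbalance in the code that runs before assembly rather than to the cost of assembly itself.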
I wonder if it has to do with the way I insert the matrix values. My steps
are: (Cartesian grid, i loops faster than j, Fortran)
For matrix A and rhs
Insert left extreme cells values belonging to myid
if (myid==0) then
insert corner cells values
insert south cells values
insert internal cells values
else if (myid==num_procs-1) then
insert corner cells values
insert north cells values
insert internal cells values
else
insert internal cells values
end if
Insert right extreme cells values belonging to myid
All these values are entered into a big_A(size_x*size_y,5) matrix. int_A
stores the position of the values. I then do
call MatZeroEntries(A_mat,ierr)
! Loop over the rows (cells) belonging to myid
do k=ksta_p+1,kend_p
  do kk=1,5
    ! PETSc expects zero-based global indices, hence the -1
    II=k-1
    JJ=int_A(k,kk)-1
    call MatSetValues(A_mat,1,II,1,JJ,big_A(k,kk),ADD_VALUES,ierr)
  end do
end do
call MatAssemblyBegin(A_mat,MAT_FINAL_ASSEMBLY,ierr)
call MatAssemblyEnd(A_mat,MAT_FINAL_ASSEMBLY,ierr)
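One thing worth checking in a loop like the one above is whether every row a process sets is a row it owns. PETSc stores a parallel matrix in contiguous blocks of rows, and values set for rows owned by another process are held back and communicated during assembly. The Python sketch below assumes an even split with any remainder going to the lowest ranks, and assumes 4 processes (the process count for this run is not stated); the helper names are hypothetical. In the real code, MatGetOwnershipRange reports the actual range.

```python
def ownership_range(n_rows, n_procs, rank):
    """Zero-based [start, end) block of rows for `rank`.

    Assumes rows are split as evenly as possible, with the remainder
    going to the lowest-numbered ranks.
    """
    base, extra = divmod(n_rows, n_procs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def off_process_rows(rows, n_rows, n_procs, rank):
    """Return the rows in `rows` that `rank` does not own."""
    start, end = ownership_range(n_rows, n_procs, rank)
    return [r for r in rows if not (start <= r < end)]


n_rows = 600 * 720   # grid size from the post
n_procs = 4          # assumption: not stated for this run

ranges = [ownership_range(n_rows, n_procs, r) for r in range(n_procs)]
print(ranges)

# Hypothetical off-by-one: rank 1 sets one row past the end of its block.
start, end = ranges[1]
stray = off_process_rows(range(start, end + 1), n_rows, n_procs, 1)
print(stray)
```

If the equivalent check on ksta_p and kend_p finds no stray rows, the insertion loop itself is unlikely to be generating assembly traffic.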
I wonder if the problem lies here. I used the big_A matrix because I was
migrating from an old linear solver. Lastly, I was told to widen my window
to 120 characters. How do I do that?
Thank you very much.
Matthew Knepley wrote:
On Mon, Apr 14, 2008 at 8:43 AM, Ben Tay <zonexo@xxxxxxxxx> wrote:
Hi Matthew,
I think you've misunderstood what I meant. What I'm trying to say is that
initially I had a serial code. I tried to convert it to a parallel one.
Then I tested it and it was pretty slow. Due to some work requirements, I
needed to go back and make some changes to my code. Since the parallel
version was not working well, I updated and changed the serial one.
Well, that was a while ago and now, due to the updates and changes, the
serial code is different from the old converted parallel code. Some files
were also deleted and I can't seem to get it working now. So I thought I
might as well convert the new serial code to parallel. But I'm not very
sure what I should do first.
Maybe I should rephrase my question: if I just convert my Poisson
equation subroutine from a serial PETSc version to a parallel PETSc
version, will it work? Should I expect a speedup? The rest of my code is
still serial.
You should, of course, only expect speedup in the parallel parts.
Matt
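The point about speedup being limited to the parallel parts is usually quantified with Amdahl's law, S = 1 / ((1 - p) + p / n), where p is the fraction of the serial run time that gets parallelized and n is the number of processes. A minimal Python sketch follows; the fractions used are hypothetical and are not measurements from this code.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Ideal overall speedup when only `parallel_fraction` of the serial
    run time is spread perfectly over `n_procs` processes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / n_procs)


# Hypothetical fractions of run time spent in the parallelized solver.
for p in (0.25, 0.5, 0.9):
    print(f"p = {p:.2f}: speedup on 4 processes is at most {amdahl_speedup(p, 4):.2f}")
```

This is an upper bound that ignores communication cost, so a measured speedup below it is expected.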
Thank you very much.
Matthew Knepley wrote:
I am not sure why you would ever have two codes. I never do this. PETSc
is designed so that you write one code that runs in both serial and
parallel. The PETSc part should look identical. To test, run the code you
have verified in serial and output PETSc data structures (like Mat and
Vec) using a binary viewer. Then run in parallel with the same code, which
will output the same structures. Take the two files and write a small
verification code that loads both versions and calls MatEqual and
VecEqual.
Matt
On Mon, Apr 14, 2008 at 5:49 AM, Ben Tay <zonexo@xxxxxxxxx> wrote:
Thank you Matthew. Sorry to trouble you again.
I tried to run it with -log_summary output and found that there are some
errors in the execution. Well, I was busy with other things and I have
just come back to this problem. Some of my files on the server have also
been deleted. It has been a while, and I remember that it worked before,
only much slower.
Anyway, most of the serial code has been updated and maybe it's easier to
convert the new serial code instead of debugging the old parallel code
now. I believe I can still reuse part of the old parallel code. However, I
hope I can approach it better this time.
So suppose I need to start converting my new serial code to parallel.
There are 2 equations to be solved using PETSc, the momentum and Poisson
equations. I also need to parallelize other parts of my code. I wonder
which route is the best:
1. Don't change the PETSc part, i.e. continue using PETSC_COMM_SELF, and
modify other parts of my code to run in parallel, e.g. looping, updating
of values etc. Once the execution is fine and the speedup is reasonable,
then modify the PETSc part: Poisson equation first, followed by the
momentum equation.
2. Reverse the above order, i.e. modify the PETSc part (Poisson equation
first, followed by the momentum equation), then do the other parts of my
code.
I'm not sure if the above 2 methods can work or if there will be
conflicts. Of course, an alternative would be:
3. Do the Poisson and momentum equations and the other parts of the code
separately. That is, code a standalone parallel Poisson equation solver
and use sample values to test it. Same for the momentum equation and the
other parts of the code. When each of them is working, combine them to
form the full parallel code. However, this will be much more troublesome.
I hope someone can give me some recommendations.
Thank you once again.
Matthew Knepley wrote:
1) There is no way to have any idea what is going on in your code
without -log_summary output
2) Looking at that output, look at the percentage taken by the solver
KSPSolve event. I suspect it is not the biggest component, because it is
very scalable.
Matt
On Sun, Apr 13, 2008 at 4:12 AM, Ben Tay <zonexo@xxxxxxxxx> wrote:
Hi,
I have a serial 2D CFD code. As my grid size requirement increases, the
simulation takes longer. Also, the memory requirement becomes a problem.
The grid size has reached 1200x1200. Going higher is not possible due to
memory problems.
I tried to convert my code to a parallel one, following the examples
given. I also need to restructure parts of my code to enable parallel
looping. I first changed the PETSc solver to be parallel enabled and then
restructured parts of my code. I proceeded as long as the answer for a
simple test case was correct. I thought it was not really possible to do
any speed testing since the code was not fully parallelized yet. When I
had finished most of the conversion, I found that in the actual run it is
much slower, although the answer is correct.
So what is the remedy now? I wonder what I should do to check what's
wrong. Must I restart everything again? Btw, my grid size is 1200x1200. I
believe it should be suitable for a parallel run on 4 processors. Is that
so?
Thank you.