Re: Load Balancing and KSPSolve



Satish,

Thanks for your helpful comments. I am unsure why the VecAssemblyBegin() routine is taking such a high percentage of the wall-clock time when modifications to the parallel vector should be local (all I am doing is working out which element in the RHS vector b should be 1 and setting it).

Here is my loop for iterating through the columns of the RHS identity matrix and setting the relevant element to 1 prior to the call to KSPSolve. I then reset that value to 0 after the solve in preparation for the next iteration.

! Get vector index range per process
call VecGetOwnershipRange(b,firstElement,lastElement,error)

do column=0,rhs-1   ! Loop over RHS columns in identity matrix

    ! Only the process that owns this index sets the unit entry
    if ((column.ge.firstElement).and.(column.lt.lastElement)) then
       call VecSetValue(b,column,one,INSERT_VALUES,error)
    end if

    call VecAssemblyBegin(b,error)
    call VecAssemblyEnd(b,error)

    ! Solve Ax=b
    call KSPSolve(ksp,b,x,error)   ! CHKERRQ(error)

    ! Reset the entry to zero; it is assembled on the next pass through the loop
    if ((column.ge.firstElement).and.(column.lt.lastElement)) then
       call VecSetValue(b,column,zero,INSERT_VALUES,error)
    end if

end do
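
To make the ownership test in the loop above concrete, here is a small standalone Python sketch (not PETSc code). It assumes the vector has the default even row split that PETSc gives when local sizes are left to PETSC_DECIDE, which this thread does not confirm; the function names are hypothetical and exist only for illustration. It counts how many of the first k columns pass the test on each rank.

```python
def ownership_ranges(n_global, n_procs):
    """Contiguous [first, last) index ranges, one per rank.

    Assumed layout: each rank gets n_global // n_procs entries, and the
    first n_global % n_procs ranks get one extra entry.
    """
    ranges = []
    first = 0
    for rank in range(n_procs):
        local = n_global // n_procs + (1 if rank < n_global % n_procs else 0)
        ranges.append((first, first + local))
        first += local
    return ranges


def columns_set_per_rank(n_global, n_procs, n_columns):
    """Number of columns in 0..n_columns-1 that pass the ownership test on each rank."""
    counts = []
    for first, last in ownership_ranges(n_global, n_procs):
        counts.append(max(0, min(last, n_columns) - first))
    return counts


if __name__ == "__main__":
    # Sizes quoted elsewhere in this thread: 4704 rows, first 1176 columns, 8 processes.
    print(columns_set_per_rank(4704, 8, 1176))
```

Under that assumed layout, the 4704-row, 8-process case gives 588 columns each on ranks 0 and 1 and none on the other six, so most ranks never call VecSetValue at all. VecAssemblyBegin() and VecAssemblyEnd() are collective, though, so every rank still enters them on every iteration, and time reported there may include waiting for slower ranks.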

Can you identify if I am doing something stupid which could be compromising the efficiency of the Assembly routine?

Thanks again,

Tim.

Satish Balay wrote:
a couple of comments:

Looks like most of the time is spent in MatSolve(). [90% for np=1]

However, on the np=8 run you have MatSolve() taking 42% of the time, whereas
VecAssemblyBegin() takes 32%. Depending upon what's being done
with VecSetValues()/VecAssembly() - you might be able to reduce this
time considerably. [ If you can generate values locally - then no
communication is required. If you need to communicate values - then
you can explore VecScatters() for more efficient communication]

Wrt MatSolve() on 8 procs, the max/min time between any 2 procs is
2.6.  [i.e. the slowest proc is taking 16 sec, so the fastest proc would
probably be taking 6 sec.]. The max/min ratio of flops across procs is
1.8. So there is indeed a load balance issue that is contributing to
different times on different processors [I guess the slowest proc is
doing almost twice the amount of work as the fastest proc].
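
The arithmetic behind those two ratios can be checked with a few lines of Python. This is only a sketch using the figures quoted above (16 sec, 2.6 and 1.8); the function names are made up for illustration, and treating time as proportional to flops is a simplifying assumption.

```python
def fastest_time(slowest_seconds, max_min_ratio):
    """Time on the fastest process implied by the slowest time and the max/min ratio."""
    return slowest_seconds / max_min_ratio


def unexplained_factor(time_ratio, flop_ratio):
    """Part of the time ratio not accounted for by the flop ratio alone.

    Assumes time scales linearly with flops, which is only an approximation.
    """
    return time_ratio / flop_ratio


if __name__ == "__main__":
    print(round(fastest_time(16.0, 2.6), 2))        # about 6 sec, as estimated above
    print(round(unexplained_factor(2.6, 1.8), 2))   # remaining factor beyond the flop imbalance
```

If the flop imbalance were the whole story, the time ratio would be close to 1.8, so the larger 2.6 suggests something besides the operation count (memory access or communication, for example) may also differ between processes.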

Satish

On Tue, 20 Nov 2007, Tim Stitt wrote:

Satish,

Logs attached...hope they help.

Thanks,

Tim.

Satish Balay wrote:
Can you send the -log_summary for your runs [say p=1, p=8]

Satish

On Tue, 20 Nov 2007, Tim Stitt wrote:

Hi all (again),

I finally got some data back from the KSP PETSc code that I put together to
solve this sparse inverse matrix problem I was looking into. Ideally I am
aiming for an O(N) (time complexity) approach to getting the first 'k'
columns of the inverse of a sparse matrix.

To recap the method: I have my solver which uses KSPSolve in a loop that
iterates over the first k columns of an identity matrix B and computes the
corresponding x vector.

I am just a bit curious about some of the timings I am obtaining...which I
hope someone can explain. Here are the timings I obtained for a global
sparse matrix (4704 x 4704) and solving for the first 1176 columns in the
identity using P processes (processors) on our cluster.

(Timings are given in seconds for each process performing work in the loop
and were obtained by encapsulating the loop with the cpu_time() Fortran
intrinsic. The MUMPS package was requested for factorisation/solving,
although similar timings were obtained for both the native solver and
SuperLU.)

P=1  [30.92]
P=2  [15.47, 15.54]
P=4  [4.68, 5.49, 4.67, 5.07]
P=8  [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15]
P=16 [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06,
      0.29, 0.34, 0.73, 0.25, 0.43, 1.09, 1.08, 1.1]
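
For reference, the scaling and imbalance in these numbers can be summarised with a short Python sketch. It takes the slowest process at each P as the effective loop time, which is an assumption on my part, and the second P=8 value is taken as 4.23 seconds. The function name is hypothetical.

```python
# Per-process loop timings in seconds, copied from the list above.
timings = {
    1: [30.92],
    2: [15.47, 15.54],
    4: [4.68, 5.49, 4.67, 5.07],
    8: [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15],
    16: [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06,
         0.29, 0.34, 0.73, 0.25, 0.43, 1.09, 1.08, 1.1],
}


def summarize(timings):
    """Return {P: (speedup, efficiency, max/min ratio)}.

    The slowest process at each P is used as the loop time (assumption).
    """
    t1 = max(timings[1])
    summary = {}
    for p, ts in sorted(timings.items()):
        slowest = max(ts)
        speedup = t1 / slowest
        summary[p] = (speedup, speedup / p, slowest / min(ts))
    return summary


if __name__ == "__main__":
    for p, (speedup, eff, ratio) in summarize(timings).items():
        print(f"P={p:2d}  speedup={speedup:6.2f}  efficiency={eff:5.2f}  max/min={ratio:5.2f}")
```

Keep in mind that cpu_time() reports processor time rather than elapsed time, so these figures may not include all of the time a process spends waiting on others.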

Firstly, I notice very good scalability up to 16 processes...is this
expected (by those people who use these solvers regularly)?

Also I notice that the timings per process vary as we scale up. Is this a
load-balancing problem related to more non-zero values being on a given
processor than others? Once again is this expected?

Please excuse my ignorance of matters relating to these solvers and their
operation...as it really isn't my field of expertise.

Regards,

Tim.






--
Dr. Timothy Stitt <timothy_dot_stitt_at_ichec.ie>
HPC Application Consultant - ICHEC (www.ichec.ie)

Dublin Institute for Advanced Studies
5 Merrion Square - Dublin 2 - Ireland

+353-1-6621333 (tel) / +353-1-6621477 (fax)