Satish,
! Get vector index range per process
call VecGetOwnershipRange(B,firstElement,lastElement,error);

do column=0,rhs-1 ! Loop over RHS columns in Identity Matrix

   if ((column.ge.firstElement).and.(column.lt.lastElement)) then
      call VecSetValue(B,column,one,INSERT_VALUES,error)
   end if

   call VecAssemblyBegin(B,error)
   call VecAssemblyEnd(B,error)

   ! Solve Ax=b
   call KSPSolve(ksp,b,x,error);!CHKERRQ(error)

   if ((column.ge.firstElement).and.(column.lt.lastElement)) then
      call VecSetValue(B,column,zero,INSERT_VALUES,error)
   end if

end do
Thanks again,
Tim.
A couple of comments:
Looks like most of the time is spent in MatSolve(). [90% for np=1]
However, on the np=8 run, MatSolve() takes 42% of the time, whereas VecAssemblyBegin() takes 32%. Depending on what's being done with VecSetValues()/VecAssembly(), you might be able to reduce this time considerably. [If you can generate values locally, then no communication is required. If you need to communicate values, then you can explore VecScatters() for more efficient communication.]
Wrt MatSolve() on 8 procs, the max/min time ratio between any 2 procs is 2.6 [i.e. the slowest proc is taking 16 sec, so the fastest proc would probably be taking 6 sec]. The max/min ratio of flops across procs is 1.8. So there is indeed a load-balance issue that is contributing to different times on different processors [I guess the slowest proc is doing almost twice the amount of work as the fastest proc].
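The estimate above is just the slowest time divided by the max/min ratio. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and the 16 sec and 2.6 figures are the ones quoted from the np=8 log:

```python
# Figures quoted above for MatSolve() on 8 procs.
slowest_s = 16.0   # time on the slowest proc, in seconds
time_ratio = 2.6   # max/min time ratio across procs

# The fastest proc's time follows from the ratio: min = max / ratio.
fastest_s = slowest_s / time_ratio
print(f"estimated fastest proc: {fastest_s:.2f} s")
```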
Satish
On Tue, 20 Nov 2007, Tim Stitt wrote:
Satish,
Logs attached...hope they help.
Thanks,
Tim.
Satish Balay wrote:
Can you send the -log_summary output for your runs [say p=1, p=8]?
Satish
On Tue, 20 Nov 2007, Tim Stitt wrote:
Hi all (again),
I finally got some data back from the KSP PETSc code that I put together to solve this sparse inverse matrix problem I was looking into. Ideally I am aiming for an O(N) (time complexity) approach to getting the first 'k' columns of the inverse of a sparse matrix.
To recap the method: I have my solver which uses KSPSolve in a loop that iterates over the first k columns of an identity matrix B and computes the corresponding x vector.
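As a rough illustration of the method just described, here is a minimal pure-Python sketch. It is not the PETSc code: a small dense LU factorisation with partial pivoting stands in for the KSP/MUMPS solver, and the helper names (`lu_factor`, `lu_solve`, `first_k_inverse_columns`) are made up for this example. The idea is the same: factor once, then solve A x = e_j for each of the first k identity columns, flipping one entry of b on and off around each solve.

```python
def lu_factor(a):
    """LU-factor a copy of the square matrix a with partial pivoting.

    Returns (lu, perm): lu holds unit-lower L below the diagonal and U on
    and above it; row i of lu corresponds to row perm[i] of a.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        lu[col], lu[pivot] = lu[pivot], lu[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for row in range(col + 1, n):
            lu[row][col] /= lu[col][col]
            for j in range(col + 1, n):
                lu[row][j] -= lu[row][col] * lu[col][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    # Permute b, then forward-substitute through unit-lower L.
    x = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back-substitute through U.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def first_k_inverse_columns(a, k):
    """Return the first k columns of inv(a) as a list of vectors."""
    lu, perm = lu_factor(a)          # factor once, reuse for every column
    b = [0.0] * len(a)
    columns = []
    for col in range(k):
        b[col] = 1.0                 # b is now column col of the identity
        columns.append(lu_solve(lu, perm, b))
        b[col] = 0.0                 # reset for the next column
    return columns


# Small made-up example matrix, only to exercise the sketch.
example = [[4.0, 1.0, 0.0],
           [1.0, 3.0, 1.0],
           [0.0, 1.0, 2.0]]
for j, x in enumerate(first_k_inverse_columns(example, 2)):
    print(f"column {j} of the inverse:", x)
```

The dense factorisation here costs O(N^3) and is only for illustration; the point is the loop structure, not the solver.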
I am just a bit curious about some of the timings I am obtaining...which I hope someone can explain. Here are the timings I obtained for a global sparse matrix (4704 x 4704) and solving for the first 1176 columns in the identity using P processes (processors) on our cluster.
(Timings are given in seconds for each process performing work in the loop and were obtained by encapsulating the loop with the cpu_time() Fortran intrinsic. The MUMPS package was requested for factorisation/solving, although similar timings were obtained for both the native solver and SUPERLU)
P=1  [30.92]
P=2  [15.47, 15.54]
P=4  [4.68, 5.49, 4.67, 5.07]
P=8  [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15]
P=16 [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06, 0.29, 0.34, 0.73, 0.25, 0.43, 1.09, 1.08, 1.1]
Firstly, I notice very good scalability up to 16 processes...is this expected (by those people who use these solvers regularly)?
Also I notice that the timings per process vary as we scale up. Is this a load-balancing problem related to more non-zero values being on a given processor than others? Once again is this expected?
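To put numbers on both questions, the timings listed above can be summarised with a short Python sketch. Two assumptions are mine and not from the original runs: the slowest process is taken as the loop time for each P when computing speedup, and the second P=8 value is taken as 4.23. The function and key names are illustrative.

```python
# Per-process loop timings in seconds, keyed by the number of processes P.
timings = {
    1: [30.92],
    2: [15.47, 15.54],
    4: [4.68, 5.49, 4.67, 5.07],
    8: [2.36, 4.23, 2.81, 2.54, 3.42, 2.22, 1.41, 3.15],
    16: [1.04, 0.45, 1.08, 0.27, 0.87, 0.93, 1.1, 1.06,
         0.29, 0.34, 0.73, 0.25, 0.43, 1.09, 1.08, 1.1],
}


def summarize(timings):
    """Return per-P slowest/fastest times, max/min ratio and speedup."""
    serial = max(timings[1])
    stats = {}
    for p, times in sorted(timings.items()):
        slowest, fastest = max(times), min(times)
        stats[p] = {
            "slowest": slowest,
            "fastest": fastest,
            "imbalance": slowest / fastest,  # max/min ratio across processes
            "speedup": serial / slowest,     # assumes slowest process sets the pace
        }
    return stats


stats = summarize(timings)
for p, s in stats.items():
    print(f"P={p:2d}  slowest={s['slowest']:6.2f}  fastest={s['fastest']:6.2f}  "
          f"max/min={s['imbalance']:5.2f}  speedup={s['speedup']:6.2f}")
```

The max/min column is the same kind of ratio that -log_summary reports per event, so it gives a quick first look at load balance before reading the full logs.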
Please excuse my ignorance of matters relating to these solvers and their operation...as it really isn't my field of expertise.
Regards,
Tim.
-- Dr. Timothy Stitt <timothy_dot_stitt_at_ichec.ie> HPC Application Consultant - ICHEC (www.ichec.ie)
Dublin Institute for Advanced Studies 5 Merrion Square - Dublin 2 - Ireland
+353-1-6621333 (tel) / +353-1-6621477 (fax)