Re: Stalling once linear system becomes a certain size
- To: petsc-users@xxxxxxxxxxx
- From: "Matthew Knepley" <knepley@xxxxxxxxx>
- Date: Mon, 7 Apr 2008 08:34:19 -0500
- In-reply-to: <alpine.LFD.1.10.0804070824080.5395@asterix>
- References: <47F9EEDE.1050604@gmail.com> <alpine.LFD.1.10.0804070824080.5395@asterix>
- Reply-to: petsc-users@xxxxxxxxxxx
- Sender: owner-petsc-users@xxxxxxxxxxx
It sounds like he is saying that the iterative solvers fail to converge. It
could be that the systems become much more ill-conditioned. When solving
anything, first use LU

  -ksp_type preonly -pc_type lu

to determine if the system is consistent. Then use something simple, like
GMRES by itself

  -ksp_type gmres -pc_type none -ksp_monitor_singular_value -ksp_gmres_restart 500

to get an idea of the condition number. Then start trying other solvers and PCs.
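
For readers who want to see what these two checks measure, here is a minimal sketch in plain Python. It is not PETSc code, and the function names (solve_dense, relative_residual, cond1) are made up for illustration. It assumes a small dense matrix, solves it directly with Gaussian elimination (the same idea as the LU run), checks the residual, and computes a 1-norm condition number. The -ksp_monitor_singular_value option instead prints estimates of the extreme singular values, whose ratio is a 2-norm condition estimate, so the numbers will differ somewhat from the 1-norm value here. The large restart value presumably keeps GMRES from restarting on a system of this size, so the estimates come from one full Krylov space.

```python
def solve_dense(a, b, pivot_tol=1e-12):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ValueError when a pivot is negligible relative to the largest
    matrix entry, i.e. the matrix is singular to working precision.
    """
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    scale = max(abs(v) for row in a for v in row)
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k up.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if abs(m[p][k]) <= pivot_tol * scale:
            raise ValueError("matrix is numerically singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def relative_residual(a, x, b):
    """Return ||b - a x||_inf / ||b||_inf (b must be nonzero)."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
    return max(abs(v) for v in r) / max(abs(v) for v in b)


def cond1(a):
    """1-norm condition number ||A||_1 * ||A^-1||_1.

    The inverse is built one column at a time by solving against unit vectors.
    """
    n = len(a)
    a_norm = max(sum(abs(row[j]) for row in a) for j in range(n))
    inv_cols = [
        solve_dense(a, [1.0 if i == j else 0.0 for i in range(n)])
        for j in range(n)
    ]
    # inv_cols[j] is column j of the inverse; the 1-norm is the largest column sum.
    inv_norm = max(sum(abs(v) for v in col) for col in inv_cols)
    return a_norm * inv_norm
```

If solve_dense raises, or relative_residual is far from machine precision, the problem lies in the matrix itself rather than in the choice of iterative solver. If the direct solve is fine but cond1 is very large, slow or stalled convergence of an unpreconditioned Krylov method is to be expected.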
Matt
On Mon, Apr 7, 2008 at 8:28 AM, Satish Balay <balay@xxxxxxxxxxx> wrote:
>
> On Mon, 7 Apr 2008, David Knezevic wrote:
>
> > Hello,
> >
> > I am trying to run a PETSc code on a parallel machine (it may be relevant that
> > each node contains four AMD Opteron Quad-Core 64-bit processors (16 cores in
> > all) as an SMP unit with 32GB of memory) and I'm observing some behaviour I
> > don't understand.
> >
> > I'm using PETSC_COMM_SELF in order to construct the same matrix on each
> > processor (and solve the system with a different right-hand side vector on
> > each processor), and when each linear system is around 315x315 (block-sparse),
> > then each linear system is solved very quickly on each processor (approx
> > 7x10^{-4} seconds), but when I increase the size of the linear system to
> > 350x350 (or larger), the linear solves completely stall. I've tried a number
> > of different solvers and preconditioners, but nothing seems to help. Also,
> > this code has worked very well on other machines, although the machines I have
> > used it on before have not had this architecture in which each node is an SMP
> > unit. I was wondering if you have observed this kind of issue before?
> >
> > I'm using PETSc 2.3.3, compiled with the Intel 10.1 compiler.
>
> I would suggest running the code in a debugger to determine the exact
> location where the stall happens [with the minimum number of procs]
>
> mpiexec -n 4 ./exe -start_in_debugger
>
> By default the above tries to open xterms on the localhost - so to get
> this working on the cluster - you might need proper
> ssh-x11-portforwarding setup to the node, and then use the extra
> command line option '-display'
>
> [when the job kinda hangs - I would do ctrl-c in gdb and look at the
> stack trace for each MPI process]
>
> Satish
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener