
Re: warn message in log summary




When we started PETSc 2 I thought of PETSc errors as ALWAYS being catastrophic; that is, the program could NOT continue running.

  Later we started to play with possibly recovering from some errors
and I added the crude PetscException mechanism. For example,
I used it to allow the user in their SNES FormFunction to indicate
that the input vector was not in the domain; SNES would catch this
and allow the program to continue.

  I am very nervous about mixing a catastrophic error handling system
WITH an exception system. I'd like to go back to the model:
"once SETERRQ() is called ANYWHERE there is no possibility of
continuing the program." This means that all "exceptions" have to
be handled on a case-by-case basis directly in the code. For
example, I just added SNESSetFunctionDomainError() to
replace the previous use of SETERRQ(PETSC_ERR_ARG_DOMAIN).
The "handling" code for each of these custom cases is then required
to properly release resources, for example by calling PetscLogEventEnd().
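
The model proposed above can be sketched in plain C. This is only an illustration under stated assumptions: Solver, solver_set_domain_error(), form_function() and solver_eval() are hypothetical stand-ins, not PETSc code; only the idea of a dedicated call such as SNESSetFunctionDomainError() comes from the message. The callback records the domain problem in a flag and returns normally, so the caller always reaches its cleanup.

```c
#include <assert.h>

/* Hypothetical stand-in for a solver object; not a PETSc type. */
typedef struct {
  int domain_error; /* set by the callback when x is outside the domain */
  int event_depth;  /* stand-in for log-event nesting */
} Solver;

/* Plays the role of SNESSetFunctionDomainError(): records the
   condition without raising an error. */
static void solver_set_domain_error(Solver *s) { s->domain_error = 1; }

/* User callback for f(x) = 1/x, taken here to be valid only for x > 0. */
static int form_function(Solver *s, double x, double *f)
{
  if (x <= 0.0) {
    solver_set_domain_error(s); /* not an error: just flag it */
    return 0;
  }
  *f = 1.0 / x;
  return 0;
}

/* Returns 0 if f was computed, 1 if the point was rejected. */
static int solver_eval(Solver *s, double x, double *f)
{
  int ierr;
  s->domain_error = 0;
  s->event_depth++;             /* like PetscLogEventBegin() */
  ierr = form_function(s, x, f);
  s->event_depth--;             /* always reached, like PetscLogEventEnd() */
  assert(ierr == 0);            /* in this model a real error is fatal */
  return s->domain_error;
}
```

Because the rejected point is reported through the flag rather than through the error return, the begin/end pair stays balanced on both paths.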

  Comments?


Barry





On Nov 26, 2007, at 4:19 PM, Lisandro Dalcin wrote:

On 11/26/07, Barry Smith <bsmith@xxxxxxxxxxx> wrote:
  I've looked long and hard for a PETSc bug that would cause this
problem. No luck. It seems to happen mostly (only?) on certain machines.

Oops! I've found a possible source of the problem, at least for my case! Those negative times I got, of order -1e9, in fact originated from premature returns due to CHKERRQ macros.

As I was doing Python unit testing, I was making calls that generate
errors, and catching the exceptions, in order to check that the error was
correctly set.

However, this way of using PETSc is not safe at all. In general PETSc
does not always recover correctly after an error, and this seems to be
especially true for the log machinery.
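
A minimal sketch of how a premature return could produce such a value. It assumes the logger accumulates elapsed time by subtracting the clock reading when an event begins and adding it back when the event ends; that is an assumption about the implementation, chosen because it matches the -1e9 magnitude (roughly a Unix timestamp in seconds). EventInfo, event_begin(), event_end() and logged_work() are hypothetical names, not PETSc code.

```c
#include <assert.h>

/* Hypothetical per-event record; not PETSc's actual struct. */
typedef struct {
  double time;  /* accumulated seconds */
  int    depth; /* begin calls not yet matched by an end */
} EventInfo;

static void event_begin(EventInfo *e, double now)
{
  e->depth++;
  e->time -= now; /* assumed scheme: subtract the start reading */
}

static void event_end(EventInfo *e, double now)
{
  assert(e->depth > 0);
  e->depth--;
  e->time += now; /* adding the stop reading leaves stop - start */
}

/* Models a logged routine whose error check returns before the end call. */
static int logged_work(EventInfo *e, double t_start, double t_stop, int fail)
{
  event_begin(e, t_start);
  if (fail) return 1; /* premature return, as when CHKERRQ fires */
  event_end(e, t_stop);
  return 0;
}
```

With a start reading near 1.196e9 seconds, the failing path leaves time at about -1.196e9 and depth at 1, while the normal path leaves only the small positive elapsed time.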

After surfing the code and hacking PetscLogPrintSummary(), I added a
check (eventInfo[event].depth == 0) in order to skip reductions of
time values for 'unterminated' events. This worked as expected: the
event info did not show up and the warning was not generated...
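
The check described above might look roughly like the following single-process sketch. It is not the actual PetscLogPrintSummary() code: the EventInfo layout and the summarize() helper are hypothetical, and a plain sum stands in for the reductions the real summary performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-event record; field names are illustrative only. */
typedef struct {
  const char *name;
  double      time;  /* accumulated seconds */
  int         depth; /* nonzero means a begin was never matched by an end */
} EventInfo;

/* Sums the times of terminated events and counts the ones skipped. */
static double summarize(const EventInfo *events, size_t n, size_t *skipped)
{
  double total = 0.0;
  size_t i;
  assert(events != NULL && skipped != NULL);
  *skipped = 0;
  for (i = 0; i < n; i++) {
    if (events[i].depth != 0) { /* unterminated: its time is meaningless */
      (*skipped)++;
      continue;
    }
    total += events[i].time;
  }
  return total;
}
```

Skipping the entry hides the symptom in the report; the unbalanced begin/end pair that caused it is still there.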

Could this be a possible 'fix' for this issue??

Richard... Are you completely sure the negative timings you were
getting are not related to an error being silenced because of a
missing CHKERRQ macro???



On Nov 26, 2007, at 11:03 AM, Lisandro Dalcin wrote:

I even get consistent time deltas using 'gettimeofday' on my box!!
Perhaps PETSc has some bug somewhere?? What do you think??

On 11/26/07, Richard Tran Mills <rmills@xxxxxxxx> wrote:
Lisandro,

Unfortunately, I see the same negative timings problem on the Cray
XT3/4 systems when I configure PETSc to use MPI_Wtime() for all its
timings. So that doesn't necessarily fix anything...

--Richard

Lisandro Dalcin wrote:

Perhaps PETSc should use MPI_Wtime as the default timer. If a better one
is available, then use it. But then MPIUNI also has to provide a
useful default implementation.
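
As an illustration of what such a default might be, here is a minimal wall-clock timer built on POSIX gettimeofday(). The name fallback_wtime() is hypothetical and this is not MPIUNI code; it only shows one way a serial build could return seconds as a double, the way MPI_Wtime() does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical fallback timer: wall-clock seconds as a double. */
static double fallback_wtime(void)
{
  struct timeval tv;
  int rc = gettimeofday(&tv, NULL);
  assert(rc == 0); /* not expected to fail with a valid pointer */
  (void)rc;        /* keeps builds with NDEBUG warning-free */
  return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}
```

Note that gettimeofday() reads the system wall clock, which can be adjusted while a program runs, so by itself it does not guarantee non-negative deltas.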


Running a simple test, like this (MPICH2):

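#include <stdio.h>
#include <mpi.h>

int main(void)
{
  int i;
  double t0[100], t1[100];

  MPI_Init(NULL, NULL);
  /* take 100 pairs of back-to-back timer readings */
  for (i = 0; i < 100; i++) {
    t0[i] = MPI_Wtime();
    t1[i] = MPI_Wtime();
  }
  /* print each pair and its difference */
  for (i = 0; i < 100; i++) {
    printf("t0=%e, t1=%e, dt=%e\n", t0[i], t1[i], t1[i] - t0[i]);
  }
  MPI_Finalize();
  return 0;
}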

on the SAME box where I get the PETSc warning, it consistently gives
me positive time deltas of the order of MPI_Wtick()...




--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594





