
Non repeatability issue



Dear users,
in the notes attached here we address an issue concerning
non-repeatability.
We are aware that this is a known issue in parallel floating point programs,
but we would like to be sure we are interpreting
things correctly and not misusing PETSc.

Comments are welcome.

Regards,
Aldo

--
Dr. Aldo Bonfiglioli
Dip.to di Ingegneria e Fisica dell'Ambiente (DIFA)
Universita' della Basilicata
V.le dell'Ateneo lucano, 10 85100 Potenza ITALY
tel:+39.0971.205203 fax:+39.0971.205160

Dear all,
I have recently observed a non-repeatability issue with my unstructured CFD code.

I solve the steady flow eqns. using pseudo-timestepping.
PETSc is used to solve the large sparse linear systems arising at each pseudo-time-step.
Once close to the steady state solution, the time step size is increased to infinity
so that Newton's method is recovered.
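
To illustrate the last point, here is a minimal Python sketch on a scalar model problem. It assumes an implicit-Euler form of the pseudo-time step, (1/dt + dR/du) du = -R(u), which is a common choice but is not stated in these notes; the residual R(u) = u^2 - 2 and all function names are hypothetical.

```python
import math

def residual(u):
    # Hypothetical scalar "steady" residual; its root is sqrt(2).
    return u * u - 2.0

def jacobian(u):
    # Derivative dR/du of the hypothetical residual.
    return 2.0 * u

def pseudo_time_step(u, dt):
    # Assumed implicit-Euler pseudo-time step: (1/dt + J) du = -R(u).
    return -residual(u) / (1.0 / dt + jacobian(u))

def newton_step(u):
    # Plain Newton update: J du = -R(u).
    return -residual(u) / jacobian(u)

u = 1.5
finite_dt_step = pseudo_time_step(u, 1.0)
infinite_dt_step = pseudo_time_step(u, math.inf)  # 1/dt evaluates to 0.0
print(finite_dt_step, infinite_dt_step, newton_step(u))
```

With a finite dt the update is damped; with dt set to infinity the 1/dt term vanishes and the update coincides with the Newton step.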

Below is a sample 2D calculation performed on 4 procs of a Linux Beowulf cluster
(we also tested on a 4-proc ES40).

The first two columns respectively show the time-step counter (i.e. the
nonlinear iteration counter) and the KSP iterations required to solve the linear system.
Cols. 5 - 8 display the nodal residuals (for mass, energy, etc.)

In the early stages of the (non-linear) iterative process all residuals are
identical among different runs. Once we get close to machine accuracy (iterations 48-50
in the example below), the residuals differ as do the KSP iterations for convergence.

We believe we have identified the source of the differences in the following:
At each step we solve a linear system of the form A x = rhs.
Close to steady state, both x and rhs are close to machine eps. This is because
rhs is the nodal residual, which converges to machine eps as steady state is reached,
and x is the nodal increment (also converging to machine eps at steady state).

When we look at the rhs (for instance) computed on each processor
(that is, the processor-owned elements of a ghosted rhs Vec), it is identical
between different runs. Once we call VecGhostUpdateBegin/End this is no
longer true, although the differences are of the order of machine eps.
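
A check of this kind can be sketched in a few lines of Python. This is only an illustration of how two dumps of the same vector from different runs might be compared; the helper name max_diff_in_eps is hypothetical and the values are made up, not taken from the runs reported here.

```python
import sys

def max_diff_in_eps(a, b):
    """Largest entrywise |a_i - b_i|, expressed in units of machine eps."""
    eps = sys.float_info.epsilon  # about 2.22e-16 for IEEE double precision
    return max(abs(x - y) for x, y in zip(a, b)) / eps

# Made-up values, NOT taken from the runs reported in this message.
rhs_run1 = [1.0e-13, -2.0e-13, 3.0e-13]
rhs_run2 = [1.0e-13 + 2.0e-16, -2.0e-13, 3.0e-13]

print(max_diff_in_eps(rhs_run1, rhs_run1))  # identical vectors
print(max_diff_in_eps(rhs_run1, rhs_run2))  # differ by roughly one eps
```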

Our explanation is the following.
The grid is partitioned so that each triangular cell belongs to only one
processor, but gridpoints (where the variables are stored) are shared among
adjacent subdomains/procs. The Rhs is built in a FE manner by assembling
contributions from surrounding cells. Given a gridpoint i shared by various
subdomains (that is, lying on an interdomain boundary),
(*) Rhs_i = Rhs_i^p1 + Rhs_i^p2 + Rhs_i^p3
where the various contributions p1, p2, etc.
are due to groups of cells (triangles) belonging to different processors p1, p2, etc.
At steady state Rhs_i must be of order machine eps; however, some of the contributions
will be of larger magnitude but of opposite sign (so that they cancel out), while other contributions
may themselves be of order machine eps.
E.g. Rhs_i^p1 = - Rhs_i^p2 and Rhs_i^p3 of order machine eps.
In this situation the result of the summation (*) depends on
the order in which the terms on the right-hand side are summed,
because floating point addition is not associative.
This is what we observe in practice on more than 2 procs.
When using two procs. there are only two contributions in the summation (*)
and the convergence history is repeatable.
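
The order dependence of (*) can be reproduced with a small Python sketch. The three values below are hypothetical stand-ins for Rhs_i^p1, Rhs_i^p2 and Rhs_i^p3 (two large terms that cancel and one term near machine eps); they are illustrative only and do not come from the actual runs.

```python
from itertools import permutations

def sum_in_order(terms):
    # Plain left-to-right floating point accumulation.
    total = 0.0
    for t in terms:
        total += t
    return total

# Hypothetical contributions to one shared gridpoint (illustrative values).
three_procs = [1.0, -1.0, 1.0e-16]
two_procs = [1.0, -1.0]

# Collect the distinct results obtained over all summation orders.
results3 = {sum_in_order(p) for p in permutations(three_procs)}
results2 = {sum_in_order(p) for p in permutations(two_procs)}

print(sorted(results3))  # more than one distinct value
print(sorted(results2))  # a single value

# Summing in one fixed order (here the list order, standing in for
# "sorted by processor rank") gives the same value every time.
fixed = [sum_in_order(three_procs) for _ in range(5)]
```

With three terms the result depends on the order of accumulation. With two terms only one value is possible, since floating point addition is commutative, which is consistent with the repeatable two-processor runs.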

Now, although this may look like a minor issue, and in fact in the example below
(inviscid flow past a 2D profile) as well as in many others
all runs converge towards "steady state",
we have found that on larger 3D simulations, where we
solve the RANS eqns. using Newton's algorithm, the convergence history is
run-dependent and different runs may behave fairly differently.
An example is attached below: here we use ASM + ILU(1) + GMRES(60).
Differences start to show up at the fourth non-linear iteration,
and run1 will eventually "converge" by reducing the non-linear residuals
below 1.e-10, while the residuals of run2 stagnate.


INVISCID 2D run


Newton KSP                       +------- Non-linear residuals ------------+
step   its

run1
    1   19 0.1516E+02 0.1642E+02 0.9616E-02 0.7013E-01 0.1676E-01 0.1564E-01 0.3000E+02
    2   35 0.1186E+01 0.2187E+02 0.3471E-02 0.1860E-01 0.8738E-02 0.7425E-02 0.8312E+02
   47   89 0.3674E+01 0.4492E+03 0.4005E-13 0.2490E-12 0.8549E-13 0.7615E-13 0.7203E+13
   48   89 0.4874E+01 0.4560E+03 0.2639E-13 0.1710E-12 0.5629E-13 0.4947E-13 0.1093E+14
   49   89 0.9156E+01 0.4671E+03 0.1841E-13 0.1334E-12 0.4205E-13 0.3460E-13 0.1567E+14
   50  105 0.9106E+01 0.4780E+03 0.1485E-13 0.1138E-12 0.3245E-13 0.2475E-13 0.1942E+14
run2
    1   19 0.6034E+01 0.7502E+01 0.9616E-02 0.7013E-01 0.1676E-01 0.1564E-01 0.3000E+02
    2   35 0.2107E+01 0.1126E+02 0.3471E-02 0.1860E-01 0.8738E-02 0.7425E-02 0.8312E+02
   47   89 0.1152E+02 0.4406E+03 0.3978E-13 0.2469E-12 0.8471E-13 0.7610E-13 0.7253E+13
   48   89 0.5312E+01 0.4468E+03 0.2585E-13 0.1728E-12 0.5808E-13 0.5017E-13 0.1116E+14
   49   91 0.7624E+01 0.4560E+03 0.1839E-13 0.1333E-12 0.4204E-13 0.3432E-13 0.1569E+14
   50  106 0.5025E+01 0.4620E+03 0.1468E-13 0.1074E-12 0.3237E-13 0.2618E-13 0.1965E+14
run3
    1   19 0.7433E+01 0.9449E+01 0.9616E-02 0.7013E-01 0.1676E-01 0.1564E-01 0.3000E+02
    2   35 0.1460E+01 0.1495E+02 0.3471E-02 0.1860E-01 0.8738E-02 0.7425E-02 0.8312E+02
   47   89 0.9440E+01 0.4151E+03 0.3962E-13 0.2455E-12 0.8454E-13 0.7573E-13 0.7280E+13
   48   89 0.1344E+02 0.4306E+03 0.2564E-13 0.1704E-12 0.5667E-13 0.5045E-13 0.1125E+14
   49   94 0.8316E+01 0.4406E+03 0.1748E-13 0.1274E-12 0.4100E-13 0.3388E-13 0.1651E+14
   50  104 0.4968E+01 0.4469E+03 0.1539E-13 0.1098E-12 0.3231E-13 0.2516E-13 0.1874E+14
run4
    1   19 0.2243E+01 0.3578E+01 0.9616E-02 0.7013E-01 0.1676E-01 0.1564E-01 0.3000E+02
    2   35 0.4621E+01 0.2723E+02 0.3471E-02 0.1860E-01 0.8738E-02 0.7425E-02 0.8312E+02
   47   89 0.7702E+01 0.4093E+03 0.3982E-13 0.2487E-12 0.8516E-13 0.7588E-13 0.7244E+13
   48   89 0.8366E+01 0.4188E+03 0.2588E-13 0.1739E-12 0.5760E-13 0.4989E-13 0.1114E+14
   49   93 0.1078E+02 0.4315E+03 0.1808E-13 0.1248E-12 0.3912E-13 0.3438E-13 0.1596E+14
   50  106 0.3265E+01 0.4363E+03 0.1549E-13 0.1135E-12 0.3467E-13 0.2506E-13 0.1863E+14
run5
    1   19 0.8703E+01 0.1049E+02 0.9616E-02 0.7013E-01 0.1676E-01 0.1564E-01 0.3000E+02
    2   35 0.1402E+01 0.1512E+02 0.3471E-02 0.1860E-01 0.8738E-02 0.7425E-02 0.8312E+02
   47   89 0.9737E+01 0.4423E+03 0.3995E-13 0.2498E-12 0.8584E-13 0.7608E-13 0.7221E+13
   48   89 0.3896E+01 0.4484E+03 0.2608E-13 0.1717E-12 0.5689E-13 0.4955E-13 0.1106E+14
   49   90 0.2708E+01 0.4525E+03 0.1841E-13 0.1318E-12 0.4075E-13 0.3397E-13 0.1567E+14
   50  104 0.8198E+01 0.4622E+03 0.1507E-13 0.1137E-12 0.3450E-13 0.2644E-13 0.1914E+14


3D RANS (on 14 procs.)
                                +--------------- residuals -------------------------- +
run1
    1  300 0.5524E+02 0.7469E+02 0.3694E-07 0.5036E-07 0.2536E-07 0.6433E-07 0.3260E-06 0.5000E+02
    2    8 0.2160E+01 0.9547E+02 0.2600E-06 0.3789E-07 0.1731E-06 0.2833E-06 0.3886E-03 0.7102E+01
    3   10 0.2455E+01 0.1165E+03 0.1314E-06 0.7531E-07 0.1525E-06 0.2746E-06 0.7369E-04 0.1405E+02
    4   16 0.3338E+01 0.1385E+03 0.9384E-07 0.4916E-07 0.1621E-06 0.1764E-06 0.4438E-04 0.1968E+02
    5   18 0.3649E+01 0.1608E+03 0.3103E-07 0.3694E-07 0.1028E-06 0.9826E-07 0.2768E-04 0.5952E+02
...
   38   31 0.5803E+01 0.9777E+03 0.7378E-08 0.1099E-07 0.7864E-08 0.8981E-08 0.1958E-04 0.2503E+03
   39   53 0.1009E+02 0.1006E+04 0.3250E-08 0.9071E-08 0.2718E-08 0.5060E-08 0.8113E-05 0.5683E+03
   40   47 0.8849E+01 0.1034E+04 0.1351E-08 0.6064E-08 0.2013E-08 0.3336E-08 0.4884E-05 0.1367E+04
   41   47 0.8856E+01 0.1061E+04 0.3722E-08 0.3737E-08 0.2168E-08 0.4815E-08 0.5811E-05 0.4963E+03
   42   31 0.5822E+01 0.1086E+04 0.7197E-08 0.5195E-08 0.1029E-07 0.3716E-08 0.1042E-05 0.2566E+03
   43   47 0.8854E+01 0.1113E+04 0.3824E-08 0.1100E-07 0.1788E-08 0.4469E-08 0.5128E-05 0.4830E+03
   44   60 0.1162E+02 0.1143E+04 0.2248E-08 0.5162E-08 0.2033E-08 0.2756E-08 0.8732E-06 0.8216E+03
   45  115 0.2117E+02 0.1183E+04 0.1570E-08 0.2492E-08 0.9909E-09 0.3213E-08 0.1739E-05 0.1176E+04
   46  226 0.4071E+02 0.1243E+04 0.2806E-09 0.1260E-08 0.1742E-09 0.4544E-09 0.5404E-06 0.6582E+04
   47  515 0.9198E+02 0.1353E+04 0.4936E-10 0.5056E-09 0.8438E-10 0.3446E-10 0.5972E-07 0.3742E+05
run2
    1  300 0.5465E+02 0.7417E+02 0.3694E-07 0.5036E-07 0.2536E-07 0.6433E-07 0.3260E-06 0.5000E+02
    2    8 0.2154E+01 0.9502E+02 0.2601E-06 0.3789E-07 0.1731E-06 0.2833E-06 0.3886E-03 0.7102E+01
    3   10 0.2449E+01 0.1162E+03 0.1314E-06 0.7530E-07 0.1526E-06 0.2746E-06 0.7369E-04 0.1406E+02
    4   18 0.3641E+01 0.1386E+03 0.9381E-07 0.4916E-07 0.1621E-06 0.1764E-06 0.4437E-04 0.1969E+02
    5   18 0.3646E+01 0.1610E+03 0.3104E-07 0.3693E-07 0.1028E-06 0.9825E-07 0.2769E-04 0.5950E+02
...
   46   50 0.9383E+01 0.1100E+04 0.4251E-08 0.6184E-07 0.1024E-07 0.9866E-08 0.2737E-04 0.4344E+03
   47   43 0.7984E+01 0.1127E+04 0.5129E-08 0.5265E-07 0.5989E-08 0.4442E-08 0.1972E-04 0.3601E+03
   48    8 0.2164E+01 0.1148E+04 0.7553E-07 0.5328E-07 0.2634E-07 0.1430E-06 0.9913E-04 0.2445E+02
   49   21 0.4108E+01 0.1171E+04 0.1561E-07 0.5173E-07 0.4150E-07 0.6183E-07 0.1828E-04 0.1183E+03
   50   35 0.6471E+01 0.1196E+04 0.6973E-08 0.2463E-07 0.1185E-07 0.1466E-07 0.1397E-04 0.2649E+03
   51   39 0.7233E+01 0.1222E+04 0.4899E-08 0.1581E-07 0.4077E-08 0.6338E-08 0.1131E-04 0.3770E+03
   52   48 0.8969E+01 0.1249E+04 0.3529E-08 0.1495E-07 0.4077E-08 0.3640E-08 0.8881E-05 0.5234E+03
   53   42 0.7789E+01 0.1276E+04 0.4898E-08 0.9435E-08 0.2343E-08 0.4393E-08 0.4517E-05 0.3771E+03
   54   43 0.7984E+01 0.1303E+04 0.4259E-08 0.8999E-08 0.2747E-08 0.7004E-08 0.3261E-05 0.4337E+03
   55   17 0.3496E+01 0.1325E+04 0.5685E-08 0.5167E-08 0.7471E-08 0.7705E-08 0.3686E-05 0.3249E+03