Hgardfou reports NaNs in temperature field
The symptoms
The log gcm.out reads:
i,k,temperature = 1 1 NaN
i,k,temperature = 2 1 NaN
i,k,temperature = 3 1 NaN
etc.
At the end of a long report of NaNs, there is a message like:
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP 1
and
Stopping in hgardfou
Reason = hgardfou stops debutphy
Houston, we have a problem, ierr = 1
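Because the NaN report can be very long, it may help to summarise it before reading further. The following is a minimal sketch of a hypothetical helper, not part of the model. It assumes each report in gcm.out has the form shown above (i,k,temperature = i k value) and counts the reports per level; the function name summarise_reports is made up for illustration.

```python
import re
from collections import Counter

# Assumed report format, taken from the log excerpt on this page:
#   i,k,temperature = <i> <k> <value>
REPORT = re.compile(r"i,k,temperature\s*=\s*(\d+)\s+(\d+)\s+(\S+)")


def summarise_reports(log_text):
    """Return (number of reports, Counter mapping level k to its report count)."""
    per_level = Counter()
    total = 0
    for match in REPORT.finditer(log_text):
        level = int(match.group(2))  # second integer is the level index k
        per_level[level] += 1
        total += 1
    return total, per_level


if __name__ == "__main__":
    # Made-up sample in the same shape as the excerpt above.
    sample = (
        "i,k,temperature = 1 1 NaN\n"
        "i,k,temperature = 2 1 NaN\n"
        "i,k,temperature = 3 1 NaN\n"
    )
    total, per_level = summarise_reports(sample)
    print(total, dict(per_level))
```

In practice the text would be read from gcm.out rather than from a sample string. If every level shows the same number of reports, the whole field is probably bad, and the log alone will not locate the origin.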
Known causes
The message indicates numerical instability in the temperature field, which has several possible causes. The crash may occur immediately after the simulation starts, or only after significant runtime. Note: although the reported NaNs start at gridbox i,k = 1,1, this does not mean the instability began there. The hgardfou subroutine checks the temperature field starting from level 1, so it reports bad values in that order.
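The point about reporting order can be illustrated with a small sketch. This is not the hgardfou source: it only assumes that the check loops over levels and then over columns, which is consistent with the order of the log excerpt. The function name first_reported and the array layout are made up for illustration.

```python
import math


def first_reported(temperature):
    """Return (i, k) of the first non-finite value in scan order, or None.

    temperature[k][i] holds level k, column i (0-based here; the log is 1-based).
    """
    for k, level in enumerate(temperature):   # levels, starting from the lowest
        for i, value in enumerate(level):     # columns within that level
            if not math.isfinite(value):
                return i + 1, k + 1           # 1-based indices, as in the log
    return None


nan = float("nan")
# Suppose the instability started at column 3, level 2, and the NaNs have
# since spread to the whole field by the time the check runs.
field = [[nan, nan, nan], [nan, nan, nan]]
print(first_reported(field))  # (1, 1): reflects the scan order, not the origin
```

Once NaNs have propagated, the first reported gridbox carries no information about where the problem began. Earlier outputs, from before the crash, are a better place to look.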
Wrong value of callthermos
(October 2024): In physiq.def, the flag callthermos must be:
callthermos = y
when running a chemistry simulation with 78 levels. A value of n causes an immediate crash.
Wrong value of nbapp_chem
(October 2024): In physiq.def, the flag nbapp_chem must be:
nbapp_chem = 24000
Other values can cause a slow crash after 0.1-0.2 Venus days.
Age of air tracer included in chemistry array
(October 2024): The age of air cannot be included in chemistry runs. The flags in physiq.def should be:
ok_chem = y
ok_aoa = n
or
ok_chem = n
ok_aoa = y
Alternatively, both can be set to n. Setting both to y causes a crash after 0.1-0.2 Venus days. There is a check in conf_phys which should warn the user and stop the simulation if the settings are inconsistent.
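The physiq.def settings described so far can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, independent of the check in conf_phys. It assumes that physiq.def uses "key = value" lines with "#" starting a comment, that flags are written as y or n, and that ok_chem = y identifies a chemistry run. The function names and the chemistry_78_levels argument are made up for illustration.

```python
def parse_def(text):
    """Parse 'key = value' lines; '#' starts a comment (assumed .def syntax)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank space
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_physiq_def(text, chemistry_78_levels=True):
    """Return a list of warnings for the settings described on this page."""
    s = parse_def(text)
    warnings = []
    if s.get("ok_chem") == "y" and s.get("ok_aoa") == "y":
        warnings.append("ok_chem and ok_aoa are both y")
    if s.get("ok_chem") == "y":  # assumption: this marks a chemistry run
        if chemistry_78_levels and s.get("callthermos") != "y":
            warnings.append("callthermos should be y for a 78-level chemistry run")
        if s.get("nbapp_chem") != "24000":
            warnings.append("nbapp_chem should be 24000")
    return warnings


if __name__ == "__main__":
    # Made-up file content that violates all three recommendations.
    sample = "ok_chem = y\nok_aoa = y\ncallthermos = n\nnbapp_chem = 12000\n"
    for warning in check_physiq_def(sample):
        print(warning)
```

The helper only reports the cases listed on this page. A missing key is treated as a wrong value, which may be too strict if the model supplies a suitable default.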
Compiler optimisation too high
(October 2024): The compiler optimisation flag is set in the arch-[***].fcm files, in PROD_FFLAGS for normal runs and DEBUG_FFLAGS for debug runs. The level should be "-O2", as "-O3" may lead to instability in chemistry runs. The penalty is that the simulation runs more slowly.
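The optimisation level in an arch file can be checked with a short script. The sketch below assumes that each variable is defined on a line beginning with "%" followed by its name (for example "%PROD_FFLAGS"); this line format is an assumption and should be verified against the actual arch file. The function name optimisation_levels is made up for illustration.

```python
import re


def optimisation_levels(arch_text, variable="PROD_FFLAGS"):
    """Return the -O<n> flags found on the line(s) defining `variable`."""
    levels = []
    for line in arch_text.splitlines():
        # Assumed format: "%PROD_FFLAGS   <flags...>"
        if line.strip().startswith("%" + variable):
            levels.extend(re.findall(r"(?<!\S)-O\d\b", line))
    return levels


if __name__ == "__main__":
    # Made-up arch file content for illustration only.
    sample = "%PROD_FFLAGS   -O3\n%DEBUG_FFLAGS  -O0 -g\n"
    found = optimisation_levels(sample)
    if "-O3" in found:
        print("PROD_FFLAGS uses -O3; consider -O2 for chemistry runs")
```

The model must be recompiled after changing the flag for the new level to take effect.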