Running Mars mesoscale model

From Planets
Revision as of 23:43, 16 March 2025

This is a page where I am starting to collect notes and reminders that could be useful for other users (Jorge).

The user manual can be found at https://gitlab.in2p3.fr/la-communaut-des-mod-les-atmosph-riques-plan-taires/git-trunk/-/blob/master/MESOSCALE/MANUAL/SRC/user_manual.pdf?ref_type=heads

Regarding NaNs and parallel running

The mesoscale model cannot really run in parallel at the moment: it runs, but it outputs NaNs everywhere.

When running on a single core, at least in my case, the model produces NaNs with a low output frequency (an interval of 37), but increasing the output frequency (an interval of 6 works for me) solves it. This is weird, but it seems to be the case.

Increasing the output frequency in parallel runs does not help; even an interval of 1 does not work.
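For reference, the page does not name the option that controls the output frequency; assuming it is the standard WRF history_interval entry in namelist.input (an assumption, to be checked against your namelist), the single-core workaround above would be a sketch like:

```fortran
&time_control
 history_interval = 6    ! assumed option name; low output interval that avoided NaNs on one core
/
```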

Notes on clear input and output files (TBD)

Notes on configuration files (TBD)

namelist.input

Restarting simulations

In the file namelist.input there is the option "restart". The comment for this option says "Output restart files?", but it actually controls whether or not the simulation should try to start from a previous simulation. Restart files are saved periodically; this can be configured with restart_interval, whose value is an integer number of "minutes" (with a Martian hour having 37 minutes) between restart files. By default this is very long (8880, corresponding to 10 sols). A value of 37 gives a restart file every hour, 37*6 = 222 gives one every 6 hours, etc.

To restart from an existing restart file, we must set the "restart" option to .true., and start_month, start_day, and start_hour need to be updated to coincide with the time of the restart file. Since the start of the simulation can only be tuned with a resolution of hours (not minutes), the restart files need to be saved at integer hours, and therefore restart_interval should take values of 37*N, with N the number of hours between restart files.
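Putting these settings together, a sketch of the relevant namelist.input entries might look like the following (restart and restart_interval are named in the text above; the start_* values are purely illustrative, and the rest of the &time_control block is omitted):

```fortran
&time_control
 start_month      = 10,      ! illustrative values: update to match the chosen restart file
 start_day        = 01,
 start_hour       = 06,
 restart          = .true.,  ! start from an existing restart file
 restart_interval = 222,     ! 37*6: one restart file every 6 Martian hours
/
```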

Troubleshooting restart files in big simulations

In big simulations, restart files fail to be saved. This does not interrupt anything and the simulation keeps running, but no restart files are produced. When increasing the verbosity to one, the rsl.error file contains these error messages (note that for this debug execution we set restart_interval to 1 to obtain the error quickly):

 d01 2024-10-01_06:01:00 med_restart_out: opening wrfrst_d01_2024-10-01_06:01:00 for writing
 d01 2024-10-01_06:01:00  NetCDF error: NetCDF: One or more variable sizes violate format constraints
 d01 2024-10-01_06:01:00  NetCDF error in ext_ncd_open_for_write_commit wrf_io.F90, line 1279
 med_restart_out: opening wrfrst_d01_2024-10-01_06:01:00 for writing
 d01 2024-10-01_06:01:00  Warning 2 DRYRUNS 1 VARIABLE in wrf_io.F90, line 2303

[This line repeats many times]

 d01 2024-10-01_06:01:00  Warning 2 DRYRUNS 1 VARIABLE in wrf_io.F90, line 2303
 Timing for Writing restart for domain 1: 0.06920 elapsed seconds.
 d01 2024-10-01_06:01:00  Warning TRY TO CLOSE DRYRUN in ext_ncd_ioclose wrf_io.F90, line 1311
 d01 2024-10-01_06:01:00  NetCDF error: NetCDF: One or more variable sizes violate format constraints
 d01 2024-10-01_06:01:00  NetCDF error in ext_ncd_ioclose wrf_io.F90, line 1329

The problem is in the file wrf_io.F90. We have to update two files:

code/MESOSCALE/LMD_MM_MARS/SRC/WRFV2/external/io_netcdf/wrf_io.F90: line 1188
code/MESOSCALE/LMD_MM_MARS/SRC/WRFV2/external/io_pnetcdf/wrf_io.F90: line 1195

The problem is caused by the large size of the restart files; it happens when they are bigger than 2 GB.

Apparently, the default creation flag only works when writing files smaller than 2 GB (the classic netCDF format). We can fix that by replacing:

NF_CLOBBER

by

IOR(NF_CLOBBER,NF_64BIT_OFFSET)

Then the resulting lines look like:

code/MESOSCALE/LMD_MM_MARS/SRC/WRFV2/external/io_netcdf/wrf_io.F90: line 1188
stat = NF_CREATE(FileName, IOR(NF_CLOBBER,NF_64BIT_OFFSET), DH%NCID)

code/MESOSCALE/LMD_MM_MARS/SRC/WRFV2/external/io_pnetcdf/wrf_io.F90: line 1195
stat = NFMPI_CREATE(Comm, FileName, IOR(NF_CLOBBER,NF_64BIT_OFFSET), MPI_INFO_NULL, DH%NCID)

This is documented in https://docs.unidata.ucar.edu/netcdf-fortran/current/nc_f77_interface_guide.html
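The substitution itself can be scripted. The snippet below is only a demonstration on a scratch copy of the affected line (wrf_io_line.f90 is a hypothetical file name); in the real tree you would run the same sed expression on the two wrf_io.F90 files listed above, checking the quoted line numbers first since they may differ between versions.

```shell
# Demonstrate the flag change on a scratch copy of the netCDF creation line.
printf '%s\n' '  stat = NF_CREATE(FileName, NF_CLOBBER, DH%NCID)' > wrf_io_line.f90

# Replace the classic-format flag with the 64-bit-offset combination.
sed -i 's/NF_CLOBBER/IOR(NF_CLOBBER,NF_64BIT_OFFSET)/' wrf_io_line.f90

cat wrf_io_line.f90
# →   stat = NF_CREATE(FileName, IOR(NF_CLOBBER,NF_64BIT_OFFSET), DH%NCID)
```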