Guidelines for Environmentally Sustainable Simulations

Latest revision as of 07:46, 2 April 2024

This guide, largely inspired by the Earth IPSL climate models wiki page, presents good practices for reducing the environmental footprint of simulations performed with our GCMs as much as possible. We encourage you to go through the whole list of good practices below:

Which model is needed to answer your research question?

Is a full 3D climate model needed to answer your research question? Could a simpler, less computationally expensive model (such as the 1D versions of the model, namely rcm1d or kcm1d) provide the answer?

Are there simulations that already exist to answer your research question?

We encourage you to use the IPSL Planeto Slack, or to contact experienced developers/users of the model directly, to ask whether such simulations exist. Even if the available simulations are not exactly what you need, a preliminary analysis of them can speed up and narrow down the design of your new set of simulations.

Is your experimental design well thought through?

Which effect, mechanism or signal are you looking for in your simulations? Have you done a literature review to ensure that your numerical experiments are well designed to capture it?

Talk with experienced developers/users

We encourage you to talk with experienced developers/users before running new model configurations. This will help ensure that your study is well thought through and prepared.

Consider the necessary diagnostics

Check that you have all the diagnostics you expect to need to analyze the results later on. Because high-frequency diagnostics slow down the model and use a lot of mass storage, limit them to what is required. It may be appropriate to output diagnostics at high frequency only for a sub-period of the simulation.
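To get a feel for how much storage high-frequency output can cost, a rough estimate can be made before the run. The sketch below is only an illustration: the function name, the grid size, the number of variables and the 4-byte value size are all assumptions chosen for the example, not values taken from the model, and compression is ignored.

```python
def output_volume_gb(n_lon, n_lat, n_lev, n_vars, outputs_per_day, n_days,
                     bytes_per_value=4):
    """Rough uncompressed size, in GB, of 3D diagnostics written at a fixed frequency.

    All arguments are illustrative assumptions; bytes_per_value=4 corresponds
    to single-precision floats.
    """
    values_per_output = n_lon * n_lat * n_lev * n_vars
    n_outputs = outputs_per_day * n_days
    return values_per_output * n_outputs * bytes_per_value / 1e9


# Hypothetical setup: 64 x 48 x 32 grid, 10 three-dimensional variables,
# 1000 simulated days.
grid = dict(n_lon=64, n_lat=48, n_lev=32, n_vars=10)

# Option A: hourly output for the whole run.
hourly_everywhere = output_volume_gb(**grid, outputs_per_day=24, n_days=1000)

# Option B: daily output for the whole run, hourly output for a 30-day window only.
daily_plus_window = (
    output_volume_gb(**grid, outputs_per_day=1, n_days=1000)
    + output_volume_gb(**grid, outputs_per_day=24, n_days=30)
)

print(f"hourly for the whole run: {hourly_everywhere:.1f} GB")
print(f"daily + 30-day hourly window: {daily_plus_window:.1f} GB")
```

With these assumed numbers, restricting hourly output to a short window reduces the volume by more than a factor of ten. The exact ratio depends entirely on your own grid, variables and output frequencies.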

Check everything before launching the production run

First, run short simulations to check that you have all the diagnostics you need and that the model is doing what you expect it to do.

If you plan to launch an ensemble of simulations, start with only one member, wait for its outputs and check them before starting all the other members. It is easier and less resource-consuming to clean up and redo a single short simulation than a full ensemble.

Two pairs of eyes are better than one

If in doubt, ask a colleague to double-check your experimental setup with you.

Don't wait for the end of the simulation before checking

Your model experiments are now running.

While the simulations are running, check that they are doing what you expect them to do, and that their computational cost is in line with what you expect. If the computational cost is higher than expected (i.e., the simulations run much slower than expected) and you did not add many diagnostics, you may have introduced a poorly optimized new piece of code, or you may not have optimized the choice of timesteps well enough.

For example, radiative transfer is usually the slowest part of the model, so you can potentially accelerate simulations significantly by reducing the frequency at which radiative transfer is called (by increasing the value of "iradia" in callphys.def).
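The potential gain from calling radiative transfer less often can be estimated with a simple cost model: if radiative transfer takes a fraction f of the run time, and its call frequency is divided by some factor, only that fraction shrinks. The sketch below assumes exactly this (the function name and the 60% fraction are hypothetical, for illustration only); the real fraction has to be measured on your own configuration.

```python
def estimated_speedup(radiative_fraction, iradia_old, iradia_new):
    """Estimated wall-clock speedup when radiative transfer is called every
    iradia_new physics steps instead of every iradia_old physics steps.

    radiative_fraction is the share of the run time spent in radiative
    transfer with the old setting. It is an assumption of this sketch and
    must be measured on your own runs.
    """
    if not 0.0 <= radiative_fraction <= 1.0:
        raise ValueError("radiative_fraction must be between 0 and 1")
    if iradia_old < 1 or iradia_new < 1:
        raise ValueError("iradia values must be >= 1")
    other_cost = 1.0 - radiative_fraction
    radiative_cost = radiative_fraction * iradia_old / iradia_new
    return 1.0 / (other_cost + radiative_cost)


# Hypothetical case: radiative transfer takes 60% of the run time with iradia = 1.
for new_iradia in (1, 2, 4, 8):
    speedup = estimated_speedup(0.6, 1, new_iradia)
    print(f"iradia = {new_iradia}: estimated speedup x{speedup:.2f}")
```

The estimate saturates quickly: once radiative transfer is no longer the dominant cost, increasing "iradia" further brings little additional gain. Updating the radiative fluxes less often can also change the simulated climate, so it is worth comparing a short test run against a reference before adopting a larger value.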

Note: if, despite all your care, your experiments turn out to be buggy, the experimental design was inappropriate, or some diagnostics are missing, do not blame yourself too much. These things happen. Analyze your errors and rerun the simulations with the corrected experimental design or setup. High research quality remains what we aim for.

Share your results with other researchers

Your model experiments are done and it is time to analyze them.

When analyzing your results, try to avoid duplicating the output.

When you have finished the analysis, it is a good idea to keep only the part of the model outputs that has been useful or may be useful in the future. You may delete test simulations that are obsolete and archive the rest (using cc_pack, tar, zip) on the storedir to reduce the environmental impact of mass storage.
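As an illustration of the archiving step, the sketch below packs a simulation directory into a single compressed tar file using Python's standard tarfile module, and counts the files in the archive so that the result can be checked before anything is deleted. The function names and the directory layout are hypothetical; cc_pack or the tar command line achieve the same goal.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir, archive_path):
    """Pack src_dir into a gzip-compressed tar file.

    Returns the number of regular files found in src_dir, so that it can be
    compared with the content of the archive afterwards.
    """
    src = Path(src_dir)
    n_files = sum(1 for p in src.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory name, not the full absolute path.
        tar.add(src, arcname=src.name)
    return n_files


def count_archived_files(archive_path):
    """Count the regular files stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())


# Example use (hypothetical paths):
#   n = archive_directory("my_simulation", "my_simulation.tar.gz")
#   assert n == count_archived_files("my_simulation.tar.gz")
```

Packing many small files into one archive also reduces the number of files stored on the file system. Delete the original directory only after checking that the archive is complete and readable.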

Finally, there are many online repositories (Zenodo, IPSL, etc.) that you can use to publicly share the final outputs of your GCM simulations for further use by the community.