Tool Box
Latest revision as of 11:52, 7 December 2023
Pre-processing Tools
newstart: a fortran program to modify start files
Newstart is an interactive tool to modify the start files (start.nc and startfi.nc).
To be usable, newstart must be compiled in the LMDZ.COMMON directory using the following command line:
./makelmdz_fcm -arch my_arch_file -p std -d 64x48x30 newstart
In this example, my_arch_file is the name of the arch file (see arch) and 64x48x30 is the resolution of the physical grid. Then copy the executable from the LMDZ.COMMON/bin directory to your bench directory.
When you execute newstart, you can use either a start_archive file (produced by start2archive) or the start files (start.nc and startfi.nc). The interactive interface then offers to modify several physical quantities such as the gravity, the surface pressure or the rotation of the planet. At the end of the procedure, two files are created: restart.nc and restartfi.nc. They can be renamed and used as start files to initialize a new simulation.
We have prepared a simple tutorial to learn how to modify start.nc and startfi.nc files (i.e. the initial conditions) for the Generic PCM: https://lmdz-forge.lmd.jussieu.fr/mediawiki/Planets/index.php/Modify_start_Files
start2archive
The start2archive tool is similar to newstart in the sense that it can be used to modify the start files. But start2archive can modify the resolution of the physical grid, the topography and the surface thermal inertia, while newstart cannot. It is also useful to create an archive of different starting states, which can later be extracted as start files. The command line to compile start2archive is similar to the one used for newstart:
./makelmdz_fcm -arch my_arch_file -p std -d 64x48x30 start2archive
To modify the resolution, first create a start_archive file (using start2archive) at the current resolution, then compile newstart at the new resolution. Newstart will then interpolate all the physical quantities onto the new grid.
other third party scripts and tools
You can easily modify start.nc and startfi.nc netCDF files with the xarray Python library. Below you can find a simple example where we modify the surface temperature field.
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
#FIRST WE GET THE DATA FROM THE GCM SIMULATION
nc = xr.open_dataset('startfi.nc', decode_times=False)  # can be any netCDF file (e.g. start.nc or startfi.nc)
# BELOW PHYSICAL VARIABLES (not used below, but handy to build position-dependent fields)
physical_points = nc['physical_points']
lat = nc['latitude']
lon = nc['longitude']
aire_GCM = nc['area']
# BELOW THE VARIABLE WE WANT TO UPDATE
# .values.copy() keeps an independent copy of the original field: a plain nc['tsurf']
# shares its data with the dataset and would also be overwritten below
tsurf = nc['tsurf'].values.copy()
new_tsurf = np.empty(len(physical_points))
# LOOP TO MODIFY THE VARIABLE
for i in range(len(physical_points)):
    new_tsurf[i] = 300.  # here you put whatever you want; in this example, we assume an isothermal temperature distribution
nc['tsurf'].values = new_tsurf
nc.to_netcdf('restartfi.nc')  # write the modified fields to a new file (rename it startfi.nc to use it)
# SANITY CHECK PLOTS
fig = plt.figure(1)
plt.plot(physical_points, tsurf, label='original tsurf')
plt.plot(physical_points, new_tsurf, label='new tsurf')
plt.xlabel('GCM Physical Points')
plt.ylabel('Tsurf (K)')
plt.legend()
plt.show()
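The loop above sets the same value at every physical point. If you want a field that depends on position, you can compute it in one vectorized NumPy operation from the latitude of each physical point. The sketch below is only an illustration: the helper name `set_tsurf_profile`, the profile shape (a decrease proportional to sin²(latitude)) and its default parameters are hypothetical choices, and it assumes that `tsurf` and `latitude` are both 1-D variables along `physical_points`, as in the example above. Because the latitude unit may differ between files, the function reads the `units` attribute and only converts from degrees when the file does not say radians.

```python
import numpy as np
import xarray as xr


def set_tsurf_profile(ds, t_equator=300.0, delta_t=60.0):
    """Return a copy of ds where tsurf = t_equator - delta_t * sin(lat)**2.

    Hypothetical helper: assumes 'tsurf' and 'latitude' are 1-D variables
    along the 'physical_points' dimension.
    """
    lat = ds['latitude']
    units = str(lat.attrs.get('units', 'degrees')).lower()
    # Convert to radians unless the file states that latitudes are already in radians
    lat_rad = lat.values if units.startswith('rad') else np.deg2rad(lat.values)

    new_tsurf = t_equator - delta_t * np.sin(lat_rad) ** 2

    out = ds.copy(deep=True)  # deep copy: the input dataset is left untouched
    out['tsurf'].values = new_tsurf.astype(out['tsurf'].dtype)
    return out


# Small synthetic dataset standing in for startfi.nc (3 physical points)
demo = xr.Dataset(
    {
        'tsurf': ('physical_points', np.array([250.0, 250.0, 250.0])),
        'latitude': ('physical_points', np.array([90.0, 0.0, -90.0]),
                     {'units': 'degrees_north'}),
    }
)
modified = set_tsurf_profile(demo)
print(modified['tsurf'].values)  # [240. 300. 240.]
```

With a real file you would call it on `xr.open_dataset('startfi.nc', decode_times=False)` and write the result with `.to_netcdf('restartfi.nc')`, as in the example above.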
Post-processing tools
zrecast
With this program you can recast atmospheric (i.e. 4-dimensional longitude-latitude-altitude-time) data from GCM outputs (e.g. as given in diagfi.nc files) onto either pressure or altitude above areoid vertical coordinates. Since integrating the hydrostatic equation is required to recast the data, the input file must contain surface pressure and atmospheric temperature, as well as the ground geopotential. If recasting data onto pressure coordinates, the output file name is the input file name with _P.nc appended. If recasting data onto altitude above areoid coordinates, _A.nc is appended instead.
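zrecast works on full 4-D fields and, for altitude coordinates, integrates the hydrostatic equation. The core of a recast onto pressure levels is a vertical interpolation of each column, and the sketch below shows only that step for a single temperature column. It is not the zrecast algorithm: the helper name is hypothetical, and interpolating linearly in log(pressure), with NaN outside the pressure range of the column, is our own assumption.

```python
import numpy as np


def column_to_pressure_levels(p_column, t_column, p_target):
    """Interpolate one temperature column onto target pressures (hypothetical helper).

    Interpolation is linear in log(pressure); targets outside the column's
    pressure range are returned as NaN.
    """
    p_column = np.asarray(p_column, dtype=float)
    t_column = np.asarray(t_column, dtype=float)
    logp = np.log(p_column)
    # np.interp needs increasing abscissae, so sort the column by log(p)
    order = np.argsort(logp)
    return np.interp(np.log(np.asarray(p_target, dtype=float)),
                     logp[order], t_column[order],
                     left=np.nan, right=np.nan)


# Example column: pressures in Pa (surface first) and temperatures in K
p_col = [1.0e5, 5.0e4, 1.0e4, 1.0e3]
t_col = [290.0, 260.0, 220.0, 200.0]
print(column_to_pressure_levels(p_col, t_col, [5.0e4, 2.0e5]))
```

In a full recast this would be applied to every (time, latitude, longitude) column, with the pressure of each model level computed from the surface pressure.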
mass stream function
The mass stream function (and the total angular momentum) can be computed from a diagfi.nc or a stats.nc, using the streamfunction.F90 script. The script is located at
trunk/LMDZ.GENERIC/utilities
To compile the script, open the compile file in the same directory and do the following:
- Replace "pgf90" with your favorite Fortran compiler.
- Replace "/distrib/local/netcdf/pgi_7.1-6_32/lib" with the path of the directory that contains your NetCDF library (file libnetcdf.a).
- Replace "/distrib/local/netcdf/pgi_7.1-6_32/include" with the path of the directory that contains the NetCDF include file (netcdf.inc).
- You can also adjust the compilation options, but this is not mandatory.
Once the script is compiled, copy it into the same directory as your .nc file and run
./streamfunction.e
The script will ask you for the name of your .nc file, then run and produce a new nameofyourfile_stream.nc file.
Be careful: in this new file, all fields are temporally and zonally averaged.
If you want to use Python instead of Fortran, you can take a look at this repo. It hosts a tool to perform dynamical analysis of GCM simulations (it computes the mass stream function and many other diagnostics), but it is tailored to Dynamico only. This repo also takes care of recasting (it does the job of both zrecast.F90 and streamfunction.F90).
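For reference, the zonal-mean meridional mass stream function is commonly defined as psi(lat, p) = 2*pi*a*cos(lat)/g * (integral from 0 to p of [v] dp'), where a is the planetary radius, g the gravity and [v] the zonal- and time-mean meridional wind. The NumPy sketch below evaluates this definition on pressure levels with a trapezoidal rule. It is not a port of streamfunction.F90, whose sign convention, vertical coordinate and integration direction may differ; the function name and the assumption that psi = 0 at the top level are ours.

```python
import numpy as np


def mass_streamfunction(v_zm, p_levels, lat_deg, radius, gravity):
    """Meridional mass stream function psi(p, lat) in kg/s (hypothetical helper).

    psi = 2*pi*a*cos(lat)/g * integral of [v] dp from the top level down to p
    v_zm     : zonal- and time-mean meridional wind (m/s), shape (nlev, nlat)
    p_levels : pressures in Pa, ordered from the top to the bottom, shape (nlev,)
    lat_deg  : latitudes in degrees, shape (nlat,)
    psi is assumed to be zero at p_levels[0].
    """
    v_zm = np.asarray(v_zm, dtype=float)
    p = np.asarray(p_levels, dtype=float)
    dp = np.diff(p)[:, None]  # layer thicknesses, shape (nlev - 1, 1)
    # Trapezoidal integral of v over each layer, then cumulative sum from the top
    layers = 0.5 * (v_zm[1:] + v_zm[:-1]) * dp
    integral = np.vstack([np.zeros((1, v_zm.shape[1])), np.cumsum(layers, axis=0)])
    coslat = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return 2.0 * np.pi * radius * coslat / gravity * integral


# Toy example: uniform 1 m/s wind, 3 levels, 2 latitudes, unit radius and gravity
psi = mass_streamfunction(np.ones((3, 2)), [0.0, 100.0, 200.0], [0.0, 60.0],
                          radius=1.0, gravity=1.0)
print(psi)
```

With real data, v_zm would be the time and longitude average of the meridional wind after recasting onto pressure levels.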
Continuing Simulations
manually
At the end of a simulation, the model generates restart files (files 'restart.nc' and 'restartfi.nc') which contain the final state of the model. The 'restart.nc' and 'restartfi.nc' files have the same format as the 'start.nc' and 'startfi.nc' files, respectively.
These files can in fact be used as initial states to continue the simulation, using the following renaming command lines:
mv restart.nc start.nc
mv restartfi.nc startfi.nc
Running a simulation with these start files will in fact resume the simulation from where the previous run ended.
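If you prefer to script this step in Python (for instance inside a larger post-processing workflow), a minimal sketch could look like the one below. `promote_restart_files` is our own hypothetical helper; unlike two plain `mv` commands, it checks that both restart files exist before renaming anything, so a crashed run does not leave you with a mismatched pair of start files.

```python
import os
from pathlib import Path


def promote_restart_files(run_dir='.'):
    """Rename restart.nc -> start.nc and restartfi.nc -> startfi.nc in run_dir.

    Hypothetical helper mirroring the two 'mv' commands above. Both restart
    files are checked first, so nothing is renamed if one of them is missing.
    """
    run_dir = Path(run_dir)
    pairs = [('restart.nc', 'start.nc'), ('restartfi.nc', 'startfi.nc')]
    missing = [src for src, _ in pairs if not (run_dir / src).is_file()]
    if missing:
        raise FileNotFoundError(f"missing restart file(s): {', '.join(missing)}")
    for src, dst in pairs:
        # os.replace overwrites an existing destination file, like 'mv'
        os.replace(run_dir / src, run_dir / dst)
```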
with bash scripts
We have set up very simple bash scripts to automate the launching of chained simulations. Here is an example of a bash script that does the job:
#!/bin/bash
###########################################################################
# Script to perform several chained LMD Mars GCM simulations
# SET HERE the maximum total number of simulations
nummax=100
###########################################################################
echo "---------------------------------------------------------"
echo "STARTING LOOP RUN"
echo "---------------------------------------------------------"
dir=`pwd`
machine=`hostname`
address=`whoami`
# Look for file "num_run" which should contain
# the value of the previously computed season
# (defaults to 0 if file "num_run" does not exist)
if [[ -r num_run ]] ; then
  echo "found file num_run"
  numold=`cat num_run`
else
  numold=0
fi
echo "numold is set to" ${numold}
# Set value of current season
(( numnew = ${numold} + 1 ))
echo "numnew is set to" ${numnew}
# Look for initialization data files (exit if none found)
if [[ ( -r start${numold}.nc && -r startfi${numold}.nc ) ]] ; then
  \cp -f start${numold}.nc start.nc
  \cp -f startfi${numold}.nc startfi.nc
else
  if (( ${numold} == 99999 )) ; then
    echo "No run because previous run crashed ! (99999 in num_run)"
    exit
  else
    echo "Where is file start"${numold}".nc??"
    exit
  fi
fi
# Run GCM -- THIS LINE NEEDS TO BE MODIFIED WITH THE CORRECT GCM EXECUTION COMMAND
mpirun -np 8 gcm_64x48x26_phystd_para.e < diagfi.def > lrun${numnew}
# Check if run ended normally and copy datafiles
if [[ ( -r restartfi.nc && -r restart.nc ) ]] ; then
  echo "Run seems to have ended normally"
  \mv -f restart.nc start${numnew}.nc
  \mv -f restartfi.nc startfi${numnew}.nc
else
  if [[ -r num_run ]] ; then
    \mv -f num_run num_run.crash
  else
    echo "No file num_run to build num_run.crash from !!"
    # Impose a default value of 0 for num_run
    echo 0 > num_run.crash
  fi
  echo 99999 > num_run
  exit
fi
# Copy other datafiles that may have been generated
if [[ -r diagfi.nc ]] ; then
  \mv -f diagfi.nc diagfi${numnew}.nc
fi
if [[ -r diagsoil.nc ]] ; then
  \mv -f diagsoil.nc diagsoil${numnew}.nc
fi
if [[ -r stats.nc ]] ; then
  \mv -f stats.nc stats${numnew}.nc
fi
if [[ -f profiles.dat ]] ; then
  \mv -f profiles.dat profiles${numnew}.dat
  \mv -f profiles.hdr profiles${numnew}.hdr
fi
# Prepare things for upcoming runs by writing
# value of computed season in file num_run
echo ${numnew} > num_run
# If we are over nummax : stop
if (( $numnew + 1 > $nummax )) ; then
  exit
# If not : restart the loop (copy the executable and run the copy)
else
  \cp -f run_gnome exe_mars
  ./exe_mars
fi
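Summary of what this bash script does:
- It reads the file 'num_run', which contains the step of the simulation (if num_run does not exist, the step defaults to 0 and the script expects start0.nc and startfi0.nc).
If num_run is
5
then the script expects to read start5.nc and startfi5.nc.
- It copies start5.nc and startfi5.nc to start.nc and startfi.nc, respectively.
- It runs the GCM.
- It renames restart.nc and restartfi.nc to start6.nc and startfi6.nc (and the other output files, e.g. diagfi.nc to diagfi6.nc). If the restart files are missing, the run is considered crashed: num_run is saved as num_run.crash and set to 99999, which stops the chain.
- It rewrites num_run as follows:
6
- It restarts the loop until num_run reaches the value defined in nummax:
100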
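The same bookkeeping can be sketched in Python, for example if you want to drive chained runs from a Python workflow instead of a self-relaunching bash script. This is only an illustration under our own assumptions: `chain_step` is a hypothetical name, and `run_model` stands for a function you would write yourself to launch the GCM (e.g. with `subprocess.run` and your own `mpirun` command). Unlike the bash script, it does not rename the other output files (diagfi.nc, diagsoil.nc, stats.nc, profiles.dat).

```python
import shutil
from pathlib import Path

CRASH_FLAG = 99999  # value written to num_run after a crash, as in the bash script


def chain_step(run_dir, run_model):
    """One step of the num_run bookkeeping done by the bash script (hypothetical helper).

    run_model(run_dir) must launch the GCM; it is not part of the original script.
    Returns the number of the run that was just completed.
    """
    run_dir = Path(run_dir)
    num_file = run_dir / 'num_run'
    numold = int(num_file.read_text().split()[0]) if num_file.is_file() else 0
    if numold == CRASH_FLAG:
        raise RuntimeError('no run because previous run crashed (99999 in num_run)')
    numnew = numold + 1

    # startN.nc / startfiN.nc -> start.nc / startfi.nc
    for stem in ('start', 'startfi'):
        src = run_dir / f'{stem}{numold}.nc'
        if not src.is_file():
            raise FileNotFoundError(f'where is file {src.name}?')
        shutil.copyfile(src, run_dir / f'{stem}.nc')

    run_model(run_dir)

    restart, restartfi = run_dir / 'restart.nc', run_dir / 'restartfi.nc'
    if not (restart.is_file() and restartfi.is_file()):
        # Same crash handling as the script: save the counter, then flag the crash
        if num_file.is_file():
            num_file.replace(run_dir / 'num_run.crash')
        else:
            (run_dir / 'num_run.crash').write_text('0\n')
        num_file.write_text(f'{CRASH_FLAG}\n')
        raise RuntimeError('run did not produce restart files')

    restart.replace(run_dir / f'start{numnew}.nc')
    restartfi.replace(run_dir / f'startfi{numnew}.nc')
    num_file.write_text(f'{numnew}\n')
    return numnew
```

To chain runs up to nummax, you could call it in a loop such as `while chain_step('.', my_launcher) < 100: pass`, where `my_launcher` is your own (hypothetical) launch function.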
Processing Output Files with NCOs
NCOs (netCDF Operators) are a set of powerful command-line utilities (available on Linux, Mac and PC) that perform useful (and very fast!) post-processing operations on netCDF GCM output files. Full documentation can be found at http://research.jisao.washington.edu/data_sets/nco/, but we provide below a few example command lines. In these examples, the -F option selects Fortran-style (1-based) indices.
- How to calculate a time mean of a netCDF 'diagfi.nc' file
ncra -F -d Time,1,,1 diagfi.nc diagfi_MEAN.nc # format is "-d dimension,minimum,maximum,stride"
- Subsetting time in a netCDF 'diagfi.nc' file
ncea -F -d Time,first,last diagfi.nc diagfi_subset.nc # format is "-d dimension,minimum,maximum" ; recall that you can type "ncdump -v Time diagfi.nc" to see the Time values in the netCDF file.
- Decimating a netCDF 'diagfi.nc' file in time
ncks -F -d Time,1,,8 diagfi.nc diagfi_decimated.nc # format is "-d dimension,minimum,maximum,stride" ; in this example, data are extracted once every 8 time steps, starting from the first time step (number 1) and ending at the last time step.
- Extract a variable from a netCDF 'diagfi.nc' file
ncks -v tsurf,temp,p diagfi.nc diagfi_out.nc # Here we created a new file named 'diagfi_out.nc' in which we only kept the variables named 'tsurf' (surface temperatures), 'temp' (atmospheric temperatures) and 'p' (atmospheric pressures).
Again, more examples can be found at http://research.jisao.washington.edu/data_sets/nco/.
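If you already work in Python, the same four operations can be done with xarray. The sketch below applies them to a small synthetic dataset (the variables, sizes and values are made up for illustration) so that it runs as is; with real data you would start from `xr.open_dataset('diagfi.nc', decode_times=False)`. Note that NCO's -F option uses 1-based, inclusive indices, whereas xarray's `.isel` uses 0-based indices with an exclusive stop.

```python
import numpy as np
import xarray as xr

# Small synthetic dataset standing in for diagfi.nc: 16 time steps on a 3x4 grid
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        'tsurf': (('Time', 'latitude', 'longitude'), 250.0 + 10.0 * rng.random((16, 3, 4))),
        'ps': (('Time', 'latitude', 'longitude'), 1.0e5 + rng.random((16, 3, 4))),
    },
    coords={'Time': np.arange(16.0), 'latitude': [-45.0, 0.0, 45.0],
            'longitude': [0.0, 90.0, 180.0, 270.0]},
)

# ncra -F -d Time,1,,1 : time mean
# (xarray drops the Time dimension, whereas ncra keeps it with length 1)
ds_mean = ds.mean('Time')

# ncea -F -d Time,first,last : time subset (1-based inclusive -> 0-based exclusive stop)
first, last = 3, 10
ds_subset = ds.isel(Time=slice(first - 1, last))

# ncks -F -d Time,1,,8 : one time step every 8, starting from the first one
ds_decimated = ds.isel(Time=slice(0, None, 8))

# ncks -v tsurf : keep only some variables (coordinates are kept as well)
ds_out = ds[['tsurf']]

# Any of these can then be written to disk, e.g. ds_mean.to_netcdf('diagfi_MEAN.nc')
```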
Data Handling and Visualization Software
There are several data handling and visualization tools that can be used to analyse and plot the results from GCM simulations (using the diagfi.nc NetCDF files). We provide below an overview of the most widely used solutions.
panoply
Panoply is a user-friendly tool for viewing raw NetCDF data, available here: https://www.giss.nasa.gov/tools/panoply/ . It is very convenient to make pretty visuals (see an example for the exoplanet TRAPPIST-1e). There are many options that can be used (map projections, masks, colorbars, shadows, etc.) to make your plots fancy. However, the tool is not very well suited for manipulating data (compute averages, statistics, etc.).
- Installation on Linux:
You simply need to download and untar the package from the Panoply website. Note that Panoply requires Java and a Java Runtime Environment (JRE) to be installed on your system (otherwise it will simply look as if "nothing is happening" when you try to launch Panoply via the "panoply.sh" script). On Ubuntu, this simply requires something like:
sudo apt install default-jre
- Run on Linux (assuming the panoply.sh script is in a directory included in your PATH environment variable):
panoply.sh
ncview
ncview is another useful user-friendly tool for viewing raw NetCDF data. It is much more basic than Panoply, but it is convenient for taking a very quick first look at netCDF data files.
Command line tool to visualize NetCDF data:
- Installation on Linux-Ubuntu:
sudo apt install ncview
- Run on Linux:
ncview diagfi.nc
python scripts
Python scripts provide a very useful means to analyse and visualize netCDF files.
NETCDF4 python library (old school)
You can use the netCDF4 python library to open a netCDF file and put data in tables that can then be manipulated and plotted.
Here is an example of how to open and read a netCDF file with Python:
from netCDF4 import Dataset

# HERE WE OPEN THE NETCDF FILE
nc = Dataset('diagfi.nc')

# HERE WE READ THE VARIABLES (1D OUTPUT)
Time = nc.variables['Time'][:]
lat = nc.variables['latitude'][:]
lon = nc.variables['longitude'][:]
al = nc.variables['altitude'][:]

# HERE WE READ THE AREA (2D OUTPUT: latitude, longitude)
aire_GCM = nc.variables['aire'][:]

# HERE WE READ 3D OUTPUTS
tsurf = nc.variables['tsurf'][:]  # this is the surface temperature 3D field (time, latitude, longitude)

# HERE WE READ 4D OUTPUTS
temp = nc.variables['temp'][:]  # this is the atmospheric temperature 4D field (time, altitude, latitude, longitude)
And here is an example of how to manipulate the netCDF data (here to compute the time-averaged surface temperatures):
import numpy as np

mean_tsurf = np.zeros((len(lat), len(lon)), dtype='f')

for i in range(0, len(Time)):
    for j in range(0, len(lat)):
        for k in range(0, len(lon)):
            mean_tsurf[j, k] = mean_tsurf[j, k] + tsurf[i, j, k] * (1. / len(Time))
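The triple loop above is easy to read but becomes slow for long simulations. NumPy can average directly along the time axis (axis 0). The sketch below checks, on a small synthetic array standing in for tsurf, that both approaches give the same result. One caveat: netCDF4 returns masked arrays by default, and their .mean() skips masked (missing) values.

```python
import numpy as np

# Synthetic stand-in for tsurf(Time, latitude, longitude) read from diagfi.nc
rng = np.random.default_rng(42)
tsurf = 200.0 + 100.0 * rng.random((5, 4, 8))
ntime, nlat, nlon = tsurf.shape

# Triple loop, as in the example above
mean_loop = np.zeros((nlat, nlon))
for i in range(ntime):
    for j in range(nlat):
        for k in range(nlon):
            mean_loop[j, k] += tsurf[i, j, k] / ntime

# Vectorized equivalent: average over the first (Time) axis in one call
mean_tsurf = tsurf.mean(axis=0)
print(np.allclose(mean_loop, mean_tsurf))  # True
```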
And here is a last example of how to plot the data (using matplotlib):
import matplotlib.pyplot as plt

plt.figure(1)
plt.contourf(lon, lat, mean_tsurf)
plt.colorbar(label='Surface Temperature (K)')
plt.xlabel(r'Longitude ($^{\circ}$)')
plt.ylabel(r'Latitude ($^{\circ}$)')
plt.show()
XARRAY python library (more modern)
Another useful library to deal with netCDF files is xarray. We provide a code snippet below, doing the same thing as the snippets above.
import xarray as xr
import matplotlib.pyplot as plt

# HERE WE OPEN THE NETCDF FILE
data = xr.open_dataset('diagfi.nc',
                       decode_times=False)

# HERE WE READ THE VARIABLES (1D OUTPUT)
Time = data['Time']
lat = data['latitude']
lon = data['longitude']
al = data['altitude']

# HERE WE READ THE AREA (2D OUTPUT: latitude, longitude)
aire_GCM = data['aire']

# HERE WE READ 3D OUTPUTS
tsurf = data['tsurf']  # this is the surface temperature 3D field (time, latitude, longitude)

# HERE WE READ 4D OUTPUTS
temp = data['temp']  # this is the atmospheric temperature 4D field (time, altitude, latitude, longitude)

## let's take the time-averaged surface temperature
mean_tsurf = tsurf.mean(dim='Time')

##Let's plot a lon-lat map
fig = plt.figure()
plt.contourf(lon, lat, mean_tsurf)
plt.colorbar(label='Surface Temperature (K)')
plt.xlabel(r'Longitude ($^{\circ}$)')
plt.ylabel(r'Latitude ($^{\circ}$)')
plt.show()
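The snippet above reads the cell areas (aire_GCM) but does not use them. A common use is an area-weighted global mean, which xarray provides through `.weighted()`. The sketch below uses small synthetic DataArrays in place of data['tsurf'] and data['aire'] and assumes, as in the snippet above, that the area is a 2-D (latitude, longitude) field; with real data you would write `data['tsurf'].weighted(data['aire']).mean(dim=['latitude', 'longitude'])`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for data['aire'] (latitude, longitude) and data['tsurf'] (Time, latitude, longitude)
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 180.0])
aire = xr.DataArray(np.cos(np.deg2rad(lat))[:, None] * np.ones((1, lon.size)),
                    dims=('latitude', 'longitude'),
                    coords={'latitude': lat, 'longitude': lon})
tsurf = xr.DataArray(np.array([[[250.0, 250.0], [300.0, 300.0], [250.0, 250.0]]]),
                     dims=('Time', 'latitude', 'longitude'),
                     coords={'Time': [0.0], 'latitude': lat, 'longitude': lon})

# Area-weighted mean over the horizontal grid: one value per time step
global_mean = tsurf.weighted(aire).mean(dim=['latitude', 'longitude'])
print(global_mean.values)
```

Here the equatorial cells carry twice the weight of the cells at 60 degrees, so the global mean (275 K) is closer to the equatorial value than a plain average (about 266.7 K) would be.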
Don't hesitate to use the .values attribute to convert any xarray DataArray into a NumPy array, especially if your calculations (e.g. explicit loops) are slow. For more examples on how to use xarray, take a look at the documentation. Here is another example of how one can use xarray with multiple netCDF files.
import os
import xarray as xr

# your folder where output files are stored
FOLDER = './your_folder_with_output_files/'

# list the files in your FOLDER
list_files_folder = os.listdir(FOLDER)

# If there are several files,
# sort your simulation files by name (e.g. when the date is in the file name),
# so the beginning of the simulation is at the top of the list
# and the end of the simulation at the end of the list.
list_files_folder.sort()

files = [os.path.join(FOLDER, f) for f in list_files_folder]
# if you want to keep only the files of special_year, you can use this instead:
# files = [os.path.join(FOLDER, f) for f in list_files_folder if f.startswith("special_year")]

# xarray will concatenate your output files along 'Time' (or any other 'concat_dim' you want)
nc = xr.open_mfdataset(files, decode_times=False, concat_dim='Time', combine='nested')

# to check your keys
for key in nc.keys():
    print(key)

# to load keys (example here with keys for a mesoscale simulation)
Times = nc['Times'][:]
PTOT = nc['PTOT'][:]
T = nc['T'][:]
W = nc['W'][:]

# you can use some functions to make averages etc.

T_moy = T.mean(dim=['Time', 'south_north', 'west_east'])

# other functions
# .cumsum (cumulative sum)
# .rename (change the name of the object)
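open_mfdataset relies on the dask library. If your outputs are already loaded in memory (or dask is not installed), the same nested concatenation along 'Time' can be done with `xr.concat`. The sketch below uses two small synthetic datasets built by a hypothetical helper `make_chunk`, standing in for two consecutive simulation outputs.

```python
import numpy as np
import xarray as xr


def make_chunk(t0, nt):
    """Hypothetical helper: a small dataset with nt time steps starting at t0."""
    time = np.arange(t0, t0 + nt, dtype=float)
    return xr.Dataset({'T': ('Time', 200.0 + time)}, coords={'Time': time})


# Datasets already in memory (e.g. two consecutive simulation outputs)
chunks = [make_chunk(0, 3), make_chunk(3, 2)]

# Concatenate along 'Time', in list order (so sort the list first, as above)
combined = xr.concat(chunks, dim='Time')
print(combined['T'].values)  # [200. 201. 202. 203. 204.]
```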
One nice thing about xarray is that it is well optimized and takes care of much of the plotting work for you. See for instance the example below, which plots a temperature lon-lat map: xarray handles it in about 5 lines of code, where you would need a lot more to set up your plot in plain matplotlib. And the result is almost good enough for a paper figure.
import xarray as xr
import matplotlib.pyplot as plt
##Load your data and print it
file = 'diagfi.nc'  # path to your netCDF file
data = xr.open_dataset(file, decode_times=False)
print(data)
##extract altitude level #20 (0-based index) for the whole file
data = data.isel(altitude=20)
##let's assume that data holds a time series of the temperature. Let's average it in time
temp = data['temp'].mean("Time", keep_attrs=True)  # we keep the attributes (e.g. long_name, units) so that xarray can label the plot
##now we plot
cs = temp.plot.contourf(cmap='gnuplot', levels=50)  # choose the colormap and the number of contourf levels
plt.title("P = {:.2e} {}".format(float(data['p'].mean()), data['p'].attrs.get('units', '')))  ##set up your title. If you don't change it, the default title shows the altitude of your atmospheric level
plt.show()
This is only a fraction of what Xarray can do. Check the documentation for more.
Python tutorials to make pretty visuals
We provide a tutorial on how to make pretty visuals using Generic PCM 3-D simulations here.
Planetoplot
Planetoplot is an in-house, Python-based library developed to visualize Generic PCM data.
The code and documentation can be found at: https://nbviewer.org/github/aymeric-spiga/planetoplot/blob/master/tutorial/planetoplot_tutorial.ipynb