Difference between revisions of "Tool Box"

From Planets

Revision as of 23:41, 27 October 2023

Pre-processing Tools

newstart: a fortran program to modify start files

Newstart is an interactive tool to modify the start files (start.nc and startfi.nc).

To be usable, newstart must be compiled in the LMDZ.COMMON directory using the following command line:

./makelmdz_fcm -arch my_arch_file -p std -d 64x48x30 newstart

In this example, my_arch_file is the name of the arch file (see arch) and 64x48x30 is the resolution of the physical grid. Then copy the executable from the LMDZ.COMMON/bin directory to your bench directory.

When you execute newstart, you can use either a start_archive file (created with start2archive) or the start files (start.nc and startfi.nc). The interactive interface then proposes to modify several physical quantities, such as the gravity, the surface pressure or the rotation of the planet. At the end of the procedure, two files are created: restart.nc and restartfi.nc. They can be renamed and used as start files to initialize a new simulation.

start2archive

The start2archive tool is similar to newstart in the sense that it can be used to modify the start files. But start2archive can modify the resolution of the physical grid, the topography and the surface thermal inertia, while newstart cannot. It is also useful for creating an archive of different starting states, which can later be extracted as start files. The command line to compile start2archive is similar to the one used for newstart:

./makelmdz_fcm -arch my_arch_file -p std -d 64x48x30 start2archive

To modify the resolution, first create a start_archive file (using start2archive) at the current resolution, then compile newstart at the new resolution. Newstart will interpolate all the physical quantities onto the new grid.

other third party scripts and tools

You can easily modify netCDF files with the xarray Python library. A simple example is given below.

import numpy as np
import matplotlib.pyplot as mpl
import xarray as xr

#FIRST WE GET THE DATA FROM THE GCM SIMULATION
nc = xr.open_dataset('startfi.nc',decode_times=False) # can be any netcdf file (e.g. start/startfi.nc files)

# BELOW PHYSICAL VARIABLES
physical_points=nc['physical_points']
lat=nc['latitude']
lon=nc['longitude']
aire_GCM=nc['area']

# BELOW THE VARIABLE WE WANT TO UPDATE
tsurf=nc['tsurf']
new_tsurf = np.empty(len(physical_points))

# LOOP TO MODIFY THE VARIABLE
for i in range(0,len(physical_points),1):
    new_tsurf[i]=200.+100.*np.cos(lat[i]) # here you put whatever you want

# SANITY CHECK PLOTS

fig = mpl.figure()
mpl.plot(physical_points,tsurf)

nc['tsurf'].values = new_tsurf

nc.to_netcdf('restartfi.nc')

mpl.plot(physical_points,new_tsurf)
mpl.xlabel('GCM Physical Points')
mpl.ylabel('Tsurf (K)')
mpl.show()
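
The loop in the example above can also be written as a single vectorized expression. The sketch below is only an illustration: the helper name cosine_tsurf and its parameters are hypothetical, and it assumes that latitudes are given in radians unless stated otherwise (np.cos expects radians, so it is worth checking the units of the latitude variable in your own file).

```python
import numpy as np

def cosine_tsurf(lat, lat_in_degrees=False, t_pole=200.0, amplitude=100.0):
    """Return t_pole + amplitude * cos(latitude) for every physical point.

    lat            : sequence of latitudes, one per physical point
    lat_in_degrees : set to True if the latitudes are stored in degrees
    """
    lat = np.asarray(lat, dtype=float)
    if lat_in_degrees:
        lat = np.deg2rad(lat)  # np.cos expects radians
    return t_pole + amplitude * np.cos(lat)

# Example with three points: equator, mid-latitude and pole (in radians)
new_tsurf = cosine_tsurf([0.0, np.pi / 4, np.pi / 2])
```

The resulting array can then be assigned with nc['tsurf'].values = new_tsurf, exactly as in the loop version.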

Post-processing tools

zrecast

With this program you can recast atmospheric (i.e. 4-dimensional longitude-latitude-altitude-time) data from GCM outputs (e.g. as given in diagfi.nc files) onto either pressure or altitude above areoid vertical coordinates. Since integrating the hydrostatic equation is required to recast the data, the input file must contain surface pressure and atmospheric temperature, as well as the ground geopotential. If recasting data onto pressure coordinates, the output file name is the input file name to which _P.nc is appended. If recasting data onto altitude above areoid coordinates, _A.nc is appended instead.
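
To illustrate why surface pressure, temperature and ground geopotential are all needed, here is a minimal sketch of a hydrostatic integration for a single atmospheric column. It is not the algorithm implemented in zrecast: the function name, the layer-mean temperature approximation and the default constants (illustrative Mars-like values) are all assumptions made for this example.

```python
import math

def altitudes_from_pressure(ps, p_levels, temps, phis=0.0, r_gas=190.0, g=3.72):
    """Integrate dz = -(R*T/g) dln(p) upward from the surface.

    ps       : surface pressure (Pa)
    p_levels : pressures of the levels, decreasing upward (Pa)
    temps    : temperatures at those levels (K)
    phis     : ground geopotential (m2/s2); the surface altitude is phis/g
    r_gas, g : specific gas constant (J/kg/K) and gravity (m/s2);
               the defaults are illustrative values, not read from the model
    """
    z = phis / g
    # Assumption: the air at the surface has the temperature of the first level
    p_below, t_below = ps, temps[0]
    altitudes = []
    for p, t in zip(p_levels, temps):
        t_mean = 0.5 * (t + t_below)  # mean temperature of the layer
        z += r_gas * t_mean / g * math.log(p_below / p)
        altitudes.append(z)
        p_below, t_below = p, t
    return altitudes
```

For an isothermal column, each e-folding decrease in pressure raises the altitude by one scale height R*T/g, which is a convenient sanity check.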

mass stream function

The mass stream function (and the total angular momentum) can be computed from a diagfi.nc or a stats.nc, using the streamfunction.F90 script. The script is located at

trunk/LMDZ.GENERIC/utilities

To compile the script, open the compile file in the same directory and do the following:

  • Replace "pgf90" with your favorite Fortran compiler.
  • Replace "/distrib/local/netcdf/pgi_7.1-6_32/lib" with the path of the directory that contains your NetCDF library (file libnetcdf.a).
  • Replace "/distrib/local/netcdf/pgi_7.1-6_32/include" with the path of the directory that contains the NetCDF include file (netcdf.inc).
  • You may adjust the compiler options, but this is not mandatory.

Once the script is compiled, copy the executable into the same directory as your .nc file and run

./streamfunction.e

The script will ask you for the name of your .nc file, and will run and produce a new nameofyourfile_stream.nc file.

Be careful: in this new file, all fields are temporally and zonally averaged.

If you want to use Python instead of Fortran, you can take a look at this repo. It hosts a tool to perform dynamical analysis of GCM simulations (it computes the mass stream function, among many other diagnostics), but it is tailored for Dynamico only. This repo also takes care of recasting (it does the job of both zrecast.F90 and streamfunction.F90).
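
For readers who want to see what such a computation involves, here is a minimal sketch based on the usual textbook definition of the meridional mass stream function, psi(lat, p) = (2 pi a cos(lat) / g) * integral from 0 to p of [v] dp, where [v] is the zonal- and time-mean meridional wind. It is not a transcription of streamfunction.F90 or of the repo mentioned above; the function name, the trapezoidal integration and the sign convention are assumptions.

```python
import math

def mass_streamfunction(v_zonal_mean, pressures, lat_deg, radius, g):
    """Meridional mass stream function (kg/s) at one latitude.

    v_zonal_mean : zonal- and time-mean meridional wind (m/s), from the model top down
    pressures    : matching pressures (Pa), increasing downward
    lat_deg      : latitude (degrees)
    radius, g    : planetary radius (m) and gravity (m/s2)
    """
    factor = 2.0 * math.pi * radius * math.cos(math.radians(lat_deg)) / g
    # Contribution between p = 0 and the top level, assuming constant wind there
    integral = v_zonal_mean[0] * pressures[0]
    psi = [factor * integral]
    for k in range(1, len(pressures)):
        dp = pressures[k] - pressures[k - 1]
        # Trapezoidal rule between two consecutive levels
        integral += 0.5 * (v_zonal_mean[k] + v_zonal_mean[k - 1]) * dp
        psi.append(factor * integral)
    return psi
```

Other conventions (integration from the surface upward, opposite sign) are also in use, so compare with the output of your tool of choice before interpreting the sign of a circulation cell.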

Continuing Simulations

manually

At the end of a simulation, the model generates restart files (files 'restart.nc' and 'restartfi.nc') which contain the final state of the model. The 'restart.nc' and 'restartfi.nc' files have the same format as the 'start.nc' and 'startfi.nc' files, respectively.

These files can be used as initial states to continue the simulation, using the following renaming command lines:

mv restart.nc start.nc
mv restartfi.nc startfi.nc

Running a simulation with these start files resumes the simulation from where the previous run ended.
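
If you drive your runs from Python rather than from the shell, the same renaming step can be sketched as follows. The function name promote_restart_files is hypothetical; the sketch refuses to do anything if one of the two restart files is missing, so that a start.nc / startfi.nc pair is never left half-updated.

```python
from pathlib import Path

def promote_restart_files(run_dir="."):
    """Rename restart.nc and restartfi.nc to start.nc and startfi.nc."""
    run_dir = Path(run_dir)
    pairs = [("restart.nc", "start.nc"), ("restartfi.nc", "startfi.nc")]
    # Check both files first, so that we never rename only one of them
    missing = [src for src, _ in pairs if not (run_dir / src).is_file()]
    if missing:
        raise FileNotFoundError("missing restart file(s): " + ", ".join(missing))
    for src, dst in pairs:
        # Like "mv", this overwrites an existing destination file
        (run_dir / src).replace(run_dir / dst)

# Usage (in the directory of the finished run):
# promote_restart_files(".")
```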

with bash scripts

We have set up very simple bash scripts to automate the launching of chained simulations. Here is an example of a bash script that does the job:

#!/bin/bash
###########################################################################
# Script to perform several chained LMD Mars GCM simulations
# SET HERE the maximum total number of simulations

nummax=100

###########################################################################


echo "---------------------------------------------------------"
echo "STARTING LOOP RUN"
echo "---------------------------------------------------------"

dir=`pwd`
machine=`hostname`
address=`whoami`

# Look for file "num_run" which should contain 
# the value of the previously computed season
# (defaults to 0 if file "num_run" does not exist)
if [[ -r num_run ]] ; then
  echo "found file num_run"
  numold=`cat num_run`
else
  numold=0
fi
echo "numold is set to" ${numold}


# Set value of current season 
(( numnew = ${numold} + 1 ))
echo "numnew is set to" ${numnew}

# Look for initialization data files (exit if none found)
if [[ ( -r start${numold}.nc  &&  -r startfi${numold}.nc ) ]] ; then
   \cp -f start${numold}.nc start.nc
   \cp -f startfi${numold}.nc startfi.nc
else
   if (( ${numold} == 99999 )) ; then
    echo "No run because previous run crashed ! (99999 in num_run)"
    exit
   else
   echo "Where is file start"${numold}".nc??"
   exit
   fi
fi

# Run GCM -- THIS LINE NEEDS TO BE MODIFIED WITH THE CORRECT GCM EXECUTION COMMAND
mpirun -np 8 gcm_64x48x26_phystd_para.e < diagfi.def > lrun${numnew}


# Check if run ended normally and copy datafiles
if [[ ( -r restartfi.nc  &&  -r restart.nc ) ]] ; then
  echo "Run seems to have ended normally"


  \mv -f restart.nc start${numnew}.nc
  \mv -f restartfi.nc startfi${numnew}.nc  
    
else
  if [[ -r num_run ]] ; then
    \mv -f num_run num_run.crash
  else
    echo "No file num_run to build num_run.crash from !!"
    # Impose a default value of 0 for num_run
    echo 0 > num_run.crash
  fi
 echo 99999 > num_run
 exit
fi

# Copy other datafiles that may have been generated
if [[ -r diagfi.nc ]] ; then
  \mv -f diagfi.nc diagfi${numnew}.nc
fi
if [[ -r diagsoil.nc ]] ; then
  \mv -f diagsoil.nc diagsoil${numnew}.nc
fi
if [[ -r stats.nc ]] ; then
  \mv -f stats.nc stats${numnew}.nc
fi
if [[ -f profiles.dat ]] ; then
  \mv -f profiles.dat profiles${numnew}.dat
  \mv -f profiles.hdr profiles${numnew}.hdr
fi

# Prepare things for upcoming runs by writing
# value of computed season in file num_run
echo ${numnew} > num_run

# If we are over nummax : stop
if (( $numnew + 1 > $nummax )) ; then
   exit
# If not : restart the loop (copy the executable and run the copy)
else
   \cp -f run_gnome exe_mars
   ./exe_mars
fi

Summary of what this bash script does:

  • It reads the file 'num_run', which contains the step of the simulation. If num_run contains 5, the script expects to read start5.nc and startfi5.nc.
  • It copies start5.nc and startfi5.nc to start.nc and startfi.nc, respectively.
  • It runs the GCM.
  • It renames restart.nc and restartfi.nc to start6.nc and startfi6.nc, respectively.
  • It rewrites num_run so that it contains 6.
  • It restarts the loop until num_run reaches the value defined in nummax (100 in this example).
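
The bookkeeping performed by the script (reading num_run, locating the matching start files, deciding whether to stop) can be sketched in Python as follows. This is only an illustration of the logic, not a replacement for the bash script: the function names are hypothetical, and the stopping rule is slightly simplified (the sketch stops as soon as num_run has reached nummax or holds the crash flag).

```python
from pathlib import Path

CRASH_FLAG = 99999  # value the bash script writes to num_run after a failed run

def read_run_number(run_dir="."):
    """Return the integer stored in num_run, or 0 if the file does not exist."""
    num_file = Path(run_dir) / "num_run"
    if not num_file.is_file():
        return 0
    return int(num_file.read_text().strip())

def plan_next_run(run_dir=".", nummax=100):
    """Return (numold, numnew, start files), or None if no run should be launched."""
    numold = read_run_number(run_dir)
    if numold == CRASH_FLAG or numold >= nummax:
        return None
    starts = [Path(run_dir) / f"start{numold}.nc",
              Path(run_dir) / f"startfi{numold}.nc"]
    if not all(p.is_file() for p in starts):
        raise FileNotFoundError(f"start{numold}.nc and/or startfi{numold}.nc not found")
    return numold, numold + 1, starts
```

With num_run containing 5, plan_next_run returns 5, 6 and the paths of start5.nc and startfi5.nc, which matches the first step of the summary above.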


Processing Output Files with NCOs

NCOs (NetCDF Operators) are a set of powerful command-line utilities, available on Linux, Mac and PC, that allow you to perform useful (and very fast!) post-processing operations on netCDF GCM output files. Full documentation can be found at http://research.jisao.washington.edu/data_sets/nco/, but we provide a few example command lines below.

  • How to calculate a time mean of a netCDF 'diagfi.nc' file
ncra -F -d Time,1,,1 diagfi.nc diagfi_MEAN.nc # format is "-d dimension,minimum,maximum,stride"
  • Subsetting time in a netCDF 'diagfi.nc' file
ncea -F -d Time,first,last diagfi.nc diagfi_subset.nc # format is "-d dimension,minimum,maximum" ; recall that you can type "ncdump -v Time diagfi.nc" to see the Time values in the netCDF file.
  • Decimating a netCDF 'diagfi.nc' file in time
ncks -F -d Time,1,,8 diagfi.nc diagfi_decimated.nc # format is "-d dimension,minimum,maximum,stride" ; in this example, data is extracted once every 8 time steps, starting from the first time step (number 1) and ending at the last time step.
  • Extract a variable from a netCDF 'diagfi.nc' file
ncks -v tsurf,temp,p diagfi.nc diagfi_out.nc # Here we create a new file named 'diagfi_out.nc' in which we only keep the variables named 'tsurf' (surface temperatures), 'temp' (atmospheric temperatures) and 'p' (atmospheric pressures).

Again, more examples can be found on http://research.jisao.washington.edu/data_sets/nco/ .
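
The "-d dimension,minimum,maximum,stride" syntax can be confusing for Python users, because with the -F option the indices are 1-based and the maximum is inclusive. The short sketch below translates such a specification into an equivalent Python slice; the helper name hyperslab_to_slice is hypothetical and the sketch only covers integer indices (not coordinate values).

```python
def hyperslab_to_slice(minimum, maximum=None, stride=1):
    """Translate "-F -d dim,minimum,maximum,stride" into a Python slice.

    With -F, indices are 1-based and the maximum is inclusive; an omitted
    maximum (as in "Time,1,,8") means "up to the last index".
    """
    return slice(minimum - 1, maximum, stride)

time_steps = list(range(1, 25))  # 24 time steps numbered 1..24

# Decimation, as in "-d Time,1,,8"
print(time_steps[hyperslab_to_slice(1, None, 8)])  # [1, 9, 17]

# Subsetting, as in "-d Time,5,10"
print(time_steps[hyperslab_to_slice(5, 10)])       # [5, 6, 7, 8, 9, 10]
```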

Data Handling and Visualization Software

There are several data handling and visualization tools that can be used to analyse and plot the results from GCM simulations (using the diagfi.nc NetCDF files). We provide below an overview of the most widely used solutions.

panoply

Panoply is a user-friendly tool for viewing raw NetCDF data, available here: https://www.giss.nasa.gov/tools/panoply/ . It is very convenient for making pretty visuals (see an example for the exoplanet TRAPPIST-1e). Many options (map projections, masks, colorbars, shadows, etc.) can be used to make your plots fancy. However, the tool is not well suited for manipulating data (computing averages, statistics, etc.).

  • Installation on Linux:

You simply need to download and untar the package from the Panoply website. Note that Panoply requires a Java Runtime Environment (JRE) to be installed on your system (otherwise it will simply look as if "nothing is happening" when you try to launch Panoply via the "panoply.sh" script). On Ubuntu, this can be installed with something like:

sudo apt install default-jre
  • Run on Linux (assuming the panoply.sh script is in a directory included in your PATH environment variable):
panoply.sh
Screenshot of panoply showing here Generic PCM results for the exoplanet TRAPPIST-1e (surface temperatures)

ncview

ncview is another useful, user-friendly tool for viewing raw NetCDF data. It is a much more basic tool than Panoply, but it is convenient because it allows you to take a very quick first look at netCDF data files.

Command line tool to visualize NetCDF data:

  • Installation on Linux-Ubuntu:
sudo apt install ncview
  • Run on Linux:
ncview diagfi.nc
Screenshot of ncview showing here Generic PCM results for the exoplanet Proxima b (OLR - Thermal Emission)

python scripts

Python scripts provide a very useful means to analyse and visualize netCDF files.

NETCDF4 python library (old school)

You can use the netCDF4 python library to open a netCDF file and put data in tables that can then be manipulated and plotted.

Here is an example of how to open and read a netCDF file with Python:

import numpy
from netCDF4 import Dataset

# HERE WE OPEN THE NETCDF FILE
nc = Dataset('diagfi.nc')

# HERE WE READ THE VARIABLES (1D OUTPUT)
Time=nc.variables['Time'][:]
lat=nc.variables['latitude'][:]
lon=nc.variables['longitude'][:]
al=nc.variables['altitude'][:]

# HERE WE READ THE AREA (2D OUTPUT)
aire_GCM=nc.variables['aire'][:]

# HERE WE READ 3D OUTPUTS
tsurf=nc.variables['tsurf'][:] # this is the surface temperature 3D field (time, latitude, longitude)

# HERE WE READ 4D OUTPUTS
temp=nc.variables['temp'][:] # this is the atmospheric temperature 4D field (time, altitude, latitude, longitude)

And here is an example of how to manipulate the netCDF data (here to compute the time-averaged surface temperatures):

import numpy as np

mean_tsurf=np.zeros((len(lat),len(lon)),dtype='f')

# accumulate the contribution of each time step to the mean
for i in range(0,len(Time)):
    for j in range(0,len(lat)):
        for k in range(0,len(lon)):
            mean_tsurf[j,k]=mean_tsurf[j,k]+tsurf[i,j,k]*(1./len(Time))
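
The triple loop above can be replaced by a single NumPy reduction, which is much faster on large files. The sketch below uses small synthetic arrays instead of a real diagfi.nc (the shapes and values are purely illustrative) and also shows one possible use of the cell areas: an area-weighted global mean. The variable names are assumptions made for this example.

```python
import numpy as np

# Synthetic stand-ins for the arrays read from diagfi.nc (illustrative shapes)
ntime, nlat, nlon = 4, 3, 5
rng = np.random.default_rng(0)
tsurf = 200.0 + 50.0 * rng.random((ntime, nlat, nlon))  # (Time, latitude, longitude)
aire = np.ones((nlat, nlon))                            # cell areas, uniform here

# Same result as the triple loop: average over the first (time) axis
mean_tsurf = tsurf.mean(axis=0)

# Area-weighted global mean of the time-averaged field
global_mean = np.average(mean_tsurf, weights=aire)
```

With real GCM output the areas are not uniform, so the weighted mean generally differs from a plain mean_tsurf.mean().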

And here is a last example of how to plot the data (using matplotlib):

import matplotlib.pyplot as plt

plt.figure(1)
plt.contourf(lon,lat,mean_tsurf)
plt.colorbar(label='Surface Temperature (K)')
plt.xlabel(r'Longitude ($^{\circ}$)')
plt.ylabel(r'Latitude ($^{\circ}$)')
plt.show()

XARRAY python library (more modern)

Another useful library to deal with netcdf files is xarray. We provide a code snippet below, doing the same thing as the snippets above.

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

# HERE WE OPEN THE NETCDF FILE
data = xr.open_dataset('diagfi.nc',
                       decode_times=False)

# HERE WE READ THE VARIABLES (1D OUTPUT)
Time=data['Time']
lat=data['latitude']
lon=data['longitude']
al=data['altitude']

# HERE WE READ THE AREA (2D OUTPUT)
aire_GCM=data['aire']

# HERE WE READ 3D OUTPUTS
tsurf=data['tsurf'] # this is the surface temperature 3D field (time, latitude, longitude)

# HERE WE READ 4D OUTPUTS
temp=data['temp'] # this is the atmospheric temperature 4D field (time, altitude, latitude, longitude)

## let's take the time-averaged surface temperature
mean_tsurf = np.mean(tsurf,axis=0)

## Let's plot a lon-lat map
fig = plt.figure()
plt.contourf(lon,lat,mean_tsurf)
plt.colorbar(label='Surface Temperature (K)')
plt.xlabel(r'Longitude ($^{\circ}$)')
plt.ylabel(r'Latitude ($^{\circ}$)')
plt.show()

Don't hesitate to use the .values attribute to convert any xarray object into a numpy array, especially if computation time becomes a problem. For more examples on how to use xarray, take a look at the documentation. Here is another example of how one can use xarray with multiple netCDF files.

import xarray as xr
import os

# your folder where output files are stored
FOLDER = './your_folder_with_output_files/'

# list the files in your FOLDER
list_files_folder=os.listdir(FOLDER)

# If there are several files.
# Sort your simulation files by date,
# so beginning of simulation will be top of the list
# and end of simulation will be end of the list.
list_files_folder.sort()

files = [FOLDER+str(f) for f in list_files_folder]
# if you want to keep only files of special_year you can add this option :
# files = [FOLDER+str(f) for f in list_files_folder if f.startswith("special_year")]

# xarray will magically concatenate your output files by 'Time' (or any other 'concat_dim' you want)
nc=xr.open_mfdataset(files,decode_times=False, concat_dim='Time', combine='nested')

# to check your keys
for key in nc.keys():
    print(key)

# to load keys (example here with keys for a mesoscale simulation)
Times = nc['Times'][:]
PTOT = nc['PTOT'][:]
T = nc['T'][:]
W = nc['W'][:]

# you can use some functions to make averages etc

T_moy = T.mean(dim=['Time','south_north','west_east'])

# other functions
# .cumsum (cumulative sum)
# .rename (change the name of the object)
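
One caveat about the sorting step: list.sort() orders names alphabetically, so files numbered without zero padding (for instance diagfi1.nc, diagfi2.nc, ..., diagfi10.nc, as produced by the chained-run script) end up out of chronological order. The sketch below shows one common workaround, a "natural" sort key; the helper name natural_key is hypothetical.

```python
import re

def natural_key(filename):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", filename)]

names = ["diagfi10.nc", "diagfi2.nc", "diagfi1.nc"]

# Plain alphabetical order puts run 10 before run 2
print(sorted(names))                   # ['diagfi1.nc', 'diagfi10.nc', 'diagfi2.nc']

# Natural order follows the run numbers
print(sorted(names, key=natural_key))  # ['diagfi1.nc', 'diagfi2.nc', 'diagfi10.nc']
```

In the example above, this would amount to calling list_files_folder.sort(key=natural_key) before building the list of files.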

Python tutorials to make pretty visuals

We provide a tutorial on how to make pretty visuals using Generic PCM 3-D simulations here.

Planetoplot

Planetoplot is an in-house, Python-based library developed to visualize Generic PCM data.

The code and documentation can be found at: https://nbviewer.org/github/aymeric-spiga/planetoplot/blob/master/tutorial/planetoplot_tutorial.ipynb