Some pointers about running LMDZ in parallel

Some pointers and advice for those who want to run LMDZ in parallel on a Linux PC

This note is mainly oriented towards using LMDZ on a "personal" computer, as opposed to a cluster or national computing center (such as IDRIS, CCRT, or CINES) where many tools and libraries are already available.

The first step is of course to have the model installed on your Linux machine, which should be relatively easy using the install_lmdz.sh script launched with the appropriate option, e.g.

./install_lmdz.sh -parallel mpi_omp

If the script unfortunately fails, you might need to make some adjustments and generate your own NetCDF or IOIPSL libraries and/or your own arch files. What follows assumes that the necessary libraries have been correctly built and are available. The focus here is on generating appropriate arch.fcm files and then compiling and running the GCM.
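
Note that in addition to the arch-local.fcm file discussed below, the arch mechanism typically also relies on a companion arch-local.path file giving the locations of the NetCDF and IOIPSL libraries. As an illustrative sketch (the variable names below are those commonly found in LMDZ arch.path files; the paths are placeholders to adapt to your installation):

NETCDF_LIBDIR="-L/my/netcdf/directory/lib"
NETCDF_LIB="-lnetcdff -lnetcdf"
NETCDF_INCDIR="-I/my/netcdf/directory/include"
IOIPSL_LIBDIR="-L/my/ioipsl/directory/lib"
IOIPSL_LIB="-lioipsl"
IOIPSL_INCDIR="-I/my/ioipsl/directory/inc"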

Compiling and running in MPI only

A prerequisite is (obviously) to have an MPI library at hand (e.g. MPICH, OpenMPI, etc.). Having a BLAS library available, although not mandatory, is also recommended. If the install_lmdz.sh script ran fine (with the -parallel mpi_omp option), then this is already the case.
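
To quickly check what is available, the MPI wrappers and launcher can report their identity (these commands work with both OpenMPI and MPICH):

which mpif90       # locate the MPI Fortran wrapper
mpif90 --version   # reports the underlying compiler (here gfortran)
mpirun --version   # reports the MPI implementation and its version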

Assuming that mpif90 points to the MPI wrapper (to gfortran), that the MPI library is installed in /my/mpi/directory/lib, that the related include files (mpi.h) are in directory /my/mpi/directory/include, and that the BLAS library is installed in /my/blas/directory/lib, an example of an adequate arch-local.fcm file would be:

%COMPILER            /my/mpi/directory/bin/mpif90
%LINK                /my/mpi/directory/bin/mpif90
%AR                  ar
%MAKE                make
%FPP_FLAGS           -P -traditional
%FPP_DEF             NC_DOUBLE BLAS SGEMV=DGEMV SGEMM=DGEMM
%BASE_FFLAGS         -cpp -ffree-line-length-0 -fdefault-real-8 
%PROD_FFLAGS         -O3 -funroll-loops
%DEV_FFLAGS          -g -O1 -Wall
%DEBUG_FFLAGS        -g3 -Wall -fbounds-check -ffpe-trap=invalid,zero,overflow -O0 -fstack-protector-all -fbacktrace -finit-real=snan
%MPI_FFLAGS          -fcray-pointer -I/my/mpi/directory/include
%OMP_FFLAGS          
%BASE_LD             -L/my/blas/directory/lib -lblas
%MPI_LD              -L/my/mpi/directory/lib -lmpi
%OMP_LD              

And compiling the GCM would be done via

makelmdz_fcm -arch local -parallel mpi ......
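
The elided options select the grid size, optimization level, etc. As an illustration only (assuming the usual makelmdz_fcm options -d for the grid, -mem for the memory-distributed physics and -prod for production optimization; run makelmdz_fcm -h for the full list), a complete command matching the executable name used below could look like:

makelmdz_fcm -arch local -parallel mpi -d 32x32x39 -mem -prod gcm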

To then run LMDZ with N=4 MPI processes would require running the command:

/my/mpi/directory/bin/mpirun -np 4 gcm_32x32x39_phylmd_para_mem.e

Note that prior to running you might need to update your LD_LIBRARY_PATH environment variable to include the paths to your MPI/NetCDF/BLAS libraries, especially if these are located in non-standard places. Note also that if you are using the IOIPSL library, output files will be split into as many files as there were processes, and should be recombined into single files using the IOIPSL rebuild utility.
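
For instance (paths are placeholders to adapt to your installation):

export LD_LIBRARY_PATH=/my/mpi/directory/lib:/my/netcdf/directory/lib:/my/blas/directory/lib:$LD_LIBRARY_PATH

The recombination step would then look something like the following (assuming the rebuild executable built along with IOIPSL is in your PATH, and taking the daily history file histday as an illustrative example; the -o option names the recombined file):

rebuild -o histday.nc histday_0*.nc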

Compiling and running in OpenMP only

Assuming that the BLAS library is installed in /my/blas/directory/lib, then an example of an adequate arch-local.fcm file would be:

%COMPILER            gfortran
%LINK                gfortran
%AR                  ar
%MAKE                make
%FPP_FLAGS           -P -traditional
%FPP_DEF             NC_DOUBLE BLAS SGEMV=DGEMV SGEMM=DGEMM
%BASE_FFLAGS         -cpp -ffree-line-length-0 -fdefault-real-8 
%PROD_FFLAGS         -O3 -funroll-loops
%DEV_FFLAGS          -g -O1 -Wall
%DEBUG_FFLAGS        -g3 -Wall -fbounds-check -ffpe-trap=invalid,zero,overflow -O0 -fstack-protector-all -fbacktrace -finit-real=snan
%MPI_FFLAGS          -fcray-pointer -I/my/mpi/directory/include
%OMP_FFLAGS          -fopenmp -fcray-pointer
%BASE_LD             -L/my/blas/directory/lib -lblas
%MPI_LD              
%OMP_LD              -fopenmp

And compiling the GCM would be done via

makelmdz_fcm -arch local -parallel omp ......

To then run LMDZ using X=4 OpenMP threads would require running the commands:

export OMP_NUM_THREADS=4
export OMP_STACKSIZE=200M
./gcm_32x32x39_phylmd_para_mem.e

Note that to avoid memory issues (which can easily occur with OpenMP, since each thread needs some private memory to store its variables, the amount of which is set by the OMP_STACKSIZE environment variable), it is strongly recommended to make as much stack available as possible. In practice this means setting:

ulimit -s unlimited

prior to running (in practice this line can be put in your .bashrc or .bash_profile file so that it is always set).
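
For example, assuming a bash shell:

ulimit -s                                 # display the current stack limit
echo 'ulimit -s unlimited' >> ~/.bashrc   # make the setting permanent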

Compiling and running in mixed MPI/OpenMP

All that is mentioned above about MPI and OpenMP applies here as well. An example of an adequate arch-local.fcm file would be:

%COMPILER            /my/mpi/directory/bin/mpif90
%LINK                /my/mpi/directory/bin/mpif90
%AR                  ar
%MAKE                make
%FPP_FLAGS           -P -traditional
%FPP_DEF             NC_DOUBLE BLAS SGEMV=DGEMV SGEMM=DGEMM
%BASE_FFLAGS         -cpp -ffree-line-length-0 -fdefault-real-8 
%PROD_FFLAGS         -O3 -funroll-loops
%DEV_FFLAGS          -g -O1 -Wall
%DEBUG_FFLAGS        -g3 -Wall -fbounds-check -ffpe-trap=invalid,zero,overflow -O0 -fstack-protector-all -fbacktrace -finit-real=snan
%MPI_FFLAGS          -fcray-pointer -I/my/mpi/directory/include
%OMP_FFLAGS          -fopenmp -fcray-pointer
%BASE_LD             -L/my/blas/directory/lib -lblas
%MPI_LD              -L/my/mpi/directory/lib -lmpi
%OMP_LD              -fopenmp

And compiling the GCM would be done via

makelmdz_fcm -arch local -parallel mpi_omp ......

To then run LMDZ using N=4 MPI processes with X=2 OpenMP threads each (i.e. using N*X=8 cores overall) would require running the commands:

export OMP_NUM_THREADS=2
export OMP_STACKSIZE=200M
/my/mpi/directory/bin/mpirun -np 4 gcm_32x32x39_phylmd_para_mem.e
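
One caveat: depending on the MPI implementation, environment variables set in the launching shell are not necessarily propagated to all MPI processes. With OpenMPI, for instance, they can be forwarded explicitly via the -x option of mpirun (MPICH provides -genv for the same purpose):

/my/mpi/directory/bin/mpirun -np 4 -x OMP_NUM_THREADS -x OMP_STACKSIZE gcm_32x32x39_phylmd_para_mem.e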

12 January 2023