HowTo: Profiling LMDZ

De LMDZPedia
Aller à : navigation, rechercher

Profiling with gprof

The legacy way to profile the model is using gprof.

  • Pros: available everywhere, easy to use
  • Cons: old, bad handling of multithreaded applications, requires instrumentation

gprof is a rudimentary but simple to use tool to profile a sequential executable.

Note: a new tool gprofng is currently being developed with several notable improvements including multithreading support. As of 06/24, it doesn't support Fortran yet.

Instrumenting the code

In the .fcm arch file used, add -pg to BASE_FFLAGS and BASE_LD. Recompile the model.

Collecting statistics

Run the executable gcm.e. This will generate a gmon.out file locally.

Reading results

Run gprof gcm.e gmon.out > profiling.txt to get a textual view of the profiling.

For a graphical representation, we recommend using gprof2dot, via gprof gcm.e | gprof2dot | dot -Tpng -o output.png. A typical example is shown below.

Gprof.png


Profiling with scorep & scalasca

Scalasca (home, manual) is a profiling suite developed by Technische Universität Darmstadt Laboratory for Parallel Programming. It is much more capable than gprof, but also requires a heavier setup.

  • Pros: much better overall: excellent multithreaded support with separation between USR/OMP/MPI, filtering, trace profiling, great visualisation, highly customizable
  • Cons: requires a trickier installation, more complex

Installing scalasca

Some supercomputers have scalasca installed already, but for completeness here's a short guide to installing it locally (written 06/24, for v2.6.1):

All dependencies can be downloaded from here.

Compiling CubeBundle / CubeLib

Note: If you're on your own computer, we recommend installing CubeBundle, which includes CubeGUI. This requires Qt5 and OpenGL, which are a pain to install on supercomputers without sudo. For supercomputer local install, we recommand installing CubeW+CubeLib instead, and visualizing the results on your own computer.

No special instruction here - simply get the sources on the link above, extract them locally, and run ./configure --prefix=... && make -j 8 && make install.

Compiling scorep

Scalasca relies on scorep to instrument your code for profiling.

On clusters like Adastra, you need to make sure that clang in your path points to the system install: LD_LIBRARY_PATH="/usr/lib64:$LD_LIBRARY_PATH" ./configure --prefix=... && LD_LIBRARY_PATH="/usr/lib64:$LD_LIBRARY_PATH" make -j 8 && make install.

Note: make sure to load the right environment before compiling ! scorep can only instrument compilers it has detected when configured.

Compiling scalasca

Usual process: ./configure --prefix=... && make -j 8 && make install.

Note: Scalasca doesn't seem to like gcc-13 very much, but it doesn't need to be compiled with the same compiler as scorep.

Note: On Adastra, you need module load craype-x86-trento and we recommend module load gcc/12.2.0.

Note: To perform traces, Scalasca requires executables in $install_dir/bin/backend/ to be in the path. You can safely simply copy them to $install_dir/bin/ and add that to the path instead.

Using scalasca

Note: in doubt, read the manual, it's short and to the point.

Instrumenting the code

In the .fcm arch file used, replace the compiler (e.g. mpif90,mpicc) by scorep $compiler (e.g. scorep mpif90). Recompile the model.

Note: make sure to replace both the compiler and the linker. Do not replace the preprocessors.

Collecting statistics

Run scalasca -analyze $my_cmd, e.g. OMP_NUM_THREADS=2 scalasca -analyze mpirun -n 4 gcm.e. This will generate a scorep-gcm_4x2_sum folder locally.

Note: if the folder already exists, the analysis will halt.

Reading results

Note: if you are running on a cluster with CubeLib+CubeW as explained above, copy the generated folder scorep-* to the machine where you installed CubeBundle for the visualization.


Run scalasca -examine scorep-*. The first time ran it will first compute some statistics, then it will open the results in CubeGUI.

Scalsca-cubeGUI.png