Profiling Method

From Planets
Jump to: navigation, search

If you want to do some profiling of the GCM (like knowing the time passed in each subroutine when running a simulation), here is a simple methodology to follow:

   1) add the option -pg in the lines %BASE_FFLAGS and %BASE_LD of your arch.fcm file
   2) compile the GCM
   3) run your simulation ; after it is completed, you should have a binary file named gmon.out
   4) use the unix command gprof on your gcm executable, and write the output in a text file, for example:
1 gprof gcm.e > profiling.txt

gprof will read the gmon.out file, and output profiling information. For more information on how to use gprof, type man gprof

You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"):

Flat profile: Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
10.32      0.16     0.16      360     0.00     0.00  aerave_
 7.74      0.28     0.12       60     0.00     0.00  lwxd_

...

   and second, the "Call graph", which displays the parents-children links between the subroutines, and sorts them by their "total" time (self+children).

Call graph (explanation follows) granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds index % time self children called name

               0.00    1.55       1/1           main [2]

[1] 100.0 0.00 1.55 1 MAIN__ [1]

               0.00    1.49       1/1           leapfrog_p_ [3]
               0.00    0.04       1/1           filtreg_mod_mp_inifilr_ [30]
               0.00    0.01       1/1           inidissip_ [75]
               0.00    0.01       1/1           iniphysiq_mod_mp_iniphysiq_ [77]
               0.00    0.00       1/1           conf_gcm_ [272]
               0.00    0.00       1/1           parallel_lmdz_mp_init_parallel_ [355]
               0.00    0.00       1/1           mod_const_mpi_mp_init_const_mpi_ [317]
               0.00    0.00       1/1           bands_mp_read_distrib_ [250]
               0.00    0.00       1/1           parallel_lmdz_mp_barrier_ [353]
               0.00    0.00       1/1           bands_mp_set_bands_ [251]
               0.00    0.00       1/1           bands_mp_writebands_ [252]
               0.00    0.00       1/1           mod_hallo_mp_init_mod_hallo_ [321]
               0.00    0.00       1/30          parallel_lmdz_mp_setdistrib_ [172]
               0.00    0.00       1/1           cpdet_mod_mp_ini_cpdet_ [275]
               0.00    0.00       1/1           infotrac_mp_infotrac_init_ [299]
               0.00    0.00       1/1           dynetat0_ [287]
               0.00    0.00       1/1           dynredem0_p_ [290]
               0.00    0.00       1/1           iniconst_ [301]
               0.00    0.00       1/1           inigeom_ [302]

...

       5) for nice profiling visualizations, like call trees, you can use the python script from ​https://github.com/jrfonseca/gprof2dot