Difference between revisions of "Profiling Method"

From Planets
Jump to: navigation, search
Line 15: Line 15:
 
You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"):
 
You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"):
  
<syntaxhighlight>
+
<pre>
 
Flat profile:
 
Flat profile:
 
Each sample counts as 0.01 seconds.
 
Each sample counts as 0.01 seconds.
Line 23: Line 23:
 
   7.74      0.28    0.12      60    0.00    0.00  lwxd_
 
   7.74      0.28    0.12      60    0.00    0.00  lwxd_
 
...
 
...
</syntaxhighlight>
+
</pre>
  
 
and second, the "Call graph", which displays the parents-children links between the subroutines, and sorts them by their "total" time (self+children).
 
and second, the "Call graph", which displays the parents-children links between the subroutines, and sorts them by their "total" time (self+children).
  
<syntaxhighlight>
+
<pre>
 
    Call graph (explanation follows)
 
    Call graph (explanation follows)
 
granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds
 
granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds
Line 54: Line 54:
 
-----------------------------------------------
 
-----------------------------------------------
 
...
 
...
</syntaxhighlight>
+
</pre>
  
 
Finally, for nice profiling visualizations like call trees, you can use the python script from ​https://github.com/jrfonseca/gprof2dot
 
Finally, for nice profiling visualizations like call trees, you can use the python script from ​https://github.com/jrfonseca/gprof2dot

Revision as of 13:48, 25 July 2022

If you want to do some profiling of the GCM (like knowing the time passed in each subroutine when running a simulation), here is a simple methodology to follow:

  1. add the option -pg in the lines %BASE_FFLAGS and %BASE_LD of your arch.fcm file
  2. compile the GCM
  3. run your simulation ; after it is completed, you should have a binary file named gmon.out
  4. use the unix command gprof on your gcm executable, and write the output in a text file, for example:
    gprof gcm.e > profiling.txt
    

    gprof will read the gmon.out file, and output profiling information. For more information on how to use gprof, type man gprof

    You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"):

    Flat profile:
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
     10.32      0.16     0.16      360     0.00     0.00  aerave_
      7.74      0.28     0.12       60     0.00     0.00  lwxd_
    ...
    

    and second, the "Call graph", which displays the parents-children links between the subroutines, and sorts them by their "total" time (self+children).

    		     Call graph (explanation follows)
    granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds
    index % time    self  children    called     name
                    0.00    1.55       1/1           main [2]
    [1]    100.0    0.00    1.55       1         MAIN__ [1]
                    0.00    1.49       1/1           leapfrog_p_ [3]
                    0.00    0.04       1/1           filtreg_mod_mp_inifilr_ [30]
                    0.00    0.01       1/1           inidissip_ [75]
                    0.00    0.01       1/1           iniphysiq_mod_mp_iniphysiq_ [77]
                    0.00    0.00       1/1           conf_gcm_ [272]
                    0.00    0.00       1/1           parallel_lmdz_mp_init_parallel_ [355]
                    0.00    0.00       1/1           mod_const_mpi_mp_init_const_mpi_ [317]
                    0.00    0.00       1/1           bands_mp_read_distrib_ [250]
                    0.00    0.00       1/1           parallel_lmdz_mp_barrier_ [353]
                    0.00    0.00       1/1           bands_mp_set_bands_ [251]
                    0.00    0.00       1/1           bands_mp_writebands_ [252]
                    0.00    0.00       1/1           mod_hallo_mp_init_mod_hallo_ [321]
                    0.00    0.00       1/30          parallel_lmdz_mp_setdistrib_ [172]
                    0.00    0.00       1/1           cpdet_mod_mp_ini_cpdet_ [275]
                    0.00    0.00       1/1           infotrac_mp_infotrac_init_ [299]
                    0.00    0.00       1/1           dynetat0_ [287]
                    0.00    0.00       1/1           dynredem0_p_ [290]
                    0.00    0.00       1/1           iniconst_ [301]
                    0.00    0.00       1/1           inigeom_ [302]
    -----------------------------------------------
    ...
    

    Finally, for nice profiling visualizations like call trees, you can use the python script from ​https://github.com/jrfonseca/gprof2dot


    From https://trac.lmd.jussieu.fr/Planeto/wiki/ProfilingMethod