Difference between revisions of "Profiling Method"
(2 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
If you want to do some profiling of the GCM (like knowing the time passed in each subroutine when running a simulation), here is a simple methodology to follow: | If you want to do some profiling of the GCM (like knowing the time passed in each subroutine when running a simulation), here is a simple methodology to follow: | ||
− | + | <ol> | |
+ | <li> add the option -pg in the lines %BASE_FFLAGS and %BASE_LD of your arch.fcm file | ||
+ | <li> compile the GCM | ||
+ | <li> run your simulation ; after it is completed, you should have a binary file named gmon.out | ||
+ | <li> use the unix command gprof on your gcm executable, and write the output in a text file, for example: | ||
− | + | <syntaxhighlight lang="bash"> | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | <syntaxhighlight lang="bash" | ||
gprof gcm.e > profiling.txt | gprof gcm.e > profiling.txt | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Line 17: | Line 15: | ||
You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"): | You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"): | ||
+ | <pre> | ||
Flat profile: | Flat profile: | ||
Each sample counts as 0.01 seconds. | Each sample counts as 0.01 seconds. | ||
Line 24: | Line 23: | ||
7.74 0.28 0.12 60 0.00 0.00 lwxd_ | 7.74 0.28 0.12 60 0.00 0.00 lwxd_ | ||
... | ... | ||
+ | </pre> | ||
− | + | and second, the "Call graph", which displays the parents-children links between the subroutines, and sorts them by their "total" time (self+children). | |
+ | <pre> | ||
Call graph (explanation follows) | Call graph (explanation follows) | ||
granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds | granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds | ||
Line 53: | Line 54: | ||
----------------------------------------------- | ----------------------------------------------- | ||
... | ... | ||
+ | </pre> | ||
+ | |||
+ | Finally, for nice profiling visualizations like call trees, you can use the python script from https://github.com/jrfonseca/gprof2dot | ||
+ | |||
+ | |||
+ | ''From https://trac.lmd.jussieu.fr/Planeto/wiki/ProfilingMethod'' | ||
− | + | [[Category:Generic-Model]] | |
+ | [[Category:Mars-Model]] | ||
+ | [[Category:Venus-Model]] | ||
+ | [[Category:Titan-Model]] |
Latest revision as of 11:16, 9 March 2023
If you want to do some profiling of the GCM (like knowing the time passed in each subroutine when running a simulation), here is a simple methodology to follow:
- add the option -pg in the lines %BASE_FFLAGS and %BASE_LD of your arch.fcm file
- compile the GCM
- run your simulation ; after it is completed, you should have a binary file named gmon.out
- use the unix command gprof on your gcm executable, and write the output in a text file, for example:
gprof gcm.e > profiling.txt
gprof will read the gmon.out file, and output profiling information. For more information on how to use gprof, type man gprof
You now have in your text file a lot of information on time used by the subroutines of the GCM code and their proportion to the whole run's duration. Especially, you have first the "Flat profile", which sorts the subroutines by their own time consumption ("self"):
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 10.32 0.16 0.16 360 0.00 0.00 aerave_ 7.74 0.28 0.12 60 0.00 0.00 lwxd_ ...
and second, the "Call graph", which displays the parents-children links between the subroutines, and sorts them by their "total" time (self+children).
Call graph (explanation follows) granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds index % time self children called name 0.00 1.55 1/1 main [2] [1] 100.0 0.00 1.55 1 MAIN__ [1] 0.00 1.49 1/1 leapfrog_p_ [3] 0.00 0.04 1/1 filtreg_mod_mp_inifilr_ [30] 0.00 0.01 1/1 inidissip_ [75] 0.00 0.01 1/1 iniphysiq_mod_mp_iniphysiq_ [77] 0.00 0.00 1/1 conf_gcm_ [272] 0.00 0.00 1/1 parallel_lmdz_mp_init_parallel_ [355] 0.00 0.00 1/1 mod_const_mpi_mp_init_const_mpi_ [317] 0.00 0.00 1/1 bands_mp_read_distrib_ [250] 0.00 0.00 1/1 parallel_lmdz_mp_barrier_ [353] 0.00 0.00 1/1 bands_mp_set_bands_ [251] 0.00 0.00 1/1 bands_mp_writebands_ [252] 0.00 0.00 1/1 mod_hallo_mp_init_mod_hallo_ [321] 0.00 0.00 1/30 parallel_lmdz_mp_setdistrib_ [172] 0.00 0.00 1/1 cpdet_mod_mp_ini_cpdet_ [275] 0.00 0.00 1/1 infotrac_mp_infotrac_init_ [299] 0.00 0.00 1/1 dynetat0_ [287] 0.00 0.00 1/1 dynredem0_p_ [290] 0.00 0.00 1/1 iniconst_ [301] 0.00 0.00 1/1 inigeom_ [302] ----------------------------------------------- ...
Finally, for nice profiling visualizations like call trees, you can use the python script from https://github.com/jrfonseca/gprof2dot
From https://trac.lmd.jussieu.fr/Planeto/wiki/ProfilingMethod