Profiling Method
If you want to profile the GCM (for example, to know how much time is spent in each subroutine during a simulation), here is a simple method to follow:

1) add the -pg option to the %BASE_FFLAGS and %BASE_LD lines of your arch.fcm file, as sketched below
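For instance, the two lines might end up looking like this (the <...> placeholders stand for whatever flags are already there; this is only a sketch):

<syntaxhighlight lang="text">
%BASE_FFLAGS <your existing compilation flags> -pg
%BASE_LD <your existing link flags> -pg
</syntaxhighlight>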
2) compile the GCM with your usual build command (see the sketch below)
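As an illustration only: if you build with the makelmdz_fcm script, the command could look like the following (architecture name and resolution are placeholders; any build route is fine as long as the -pg flags from step 1 are used for both compiling and linking):

<syntaxhighlight lang="bash" line>
./makelmdz_fcm -arch local -d 48x36x19 gcm
</syntaxhighlight>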
3) run your simulation; once it has completed, you should find a binary file named gmon.out in the run directory, as illustrated below
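For instance (assuming a serial run of the gcm.e executable in the current directory; adapt to the way you usually launch your runs):

<syntaxhighlight lang="bash" line>
./gcm.e          # run the simulation as usual
ls -l gmon.out   # the profiling data should now be here
</syntaxhighlight>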
4) use the Unix command gprof on your GCM executable and write the output to a text file, for example:
<syntaxhighlight lang="bash" line>
gprof gcm.e > profiling.txt
</syntaxhighlight>
gprof reads the gmon.out file and outputs the profiling information. For more details on how to use gprof, type man gprof.
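If you are interested in only one of the two sections described below, gprof can restrict its output with standard options (the gmon.out file can also be passed explicitly):

<syntaxhighlight lang="bash" line>
gprof -p gcm.e gmon.out > flat_profile.txt   # flat profile only
gprof -q gcm.e gmon.out > call_graph.txt     # call graph only
</syntaxhighlight>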
Your text file now contains detailed information on the time used by the subroutines of the GCM code and on their share of the whole run's duration. In particular, it starts with the "Flat profile", which sorts the subroutines by their own time consumption ("self"):
 Flat profile:
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  10.32      0.16     0.16      360     0.00     0.00  aerave_
   7.74      0.28     0.12       60     0.00     0.00  lwxd_
  ...
followed by the "Call graph", which displays the parent-child links between subroutines and sorts them by their "total" time (self + children):
 Call graph (explanation follows)
 granularity: each sample hit covers 2 byte(s) for 0.65% of 1.55 seconds
 index % time    self  children    called     name
                 0.00    1.55       1/1           main [2]
 [1]    100.0    0.00    1.55       1         MAIN__ [1]
                 0.00    1.49       1/1           leapfrog_p_ [3]
                 0.00    0.04       1/1           filtreg_mod_mp_inifilr_ [30]
                 0.00    0.01       1/1           inidissip_ [75]
                 0.00    0.01       1/1           iniphysiq_mod_mp_iniphysiq_ [77]
                 0.00    0.00       1/1           conf_gcm_ [272]
                 0.00    0.00       1/1           parallel_lmdz_mp_init_parallel_ [355]
                 0.00    0.00       1/1           mod_const_mpi_mp_init_const_mpi_ [317]
                 0.00    0.00       1/1           bands_mp_read_distrib_ [250]
                 0.00    0.00       1/1           parallel_lmdz_mp_barrier_ [353]
                 0.00    0.00       1/1           bands_mp_set_bands_ [251]
                 0.00    0.00       1/1           bands_mp_writebands_ [252]
                 0.00    0.00       1/1           mod_hallo_mp_init_mod_hallo_ [321]
                 0.00    0.00       1/30          parallel_lmdz_mp_setdistrib_ [172]
                 0.00    0.00       1/1           cpdet_mod_mp_ini_cpdet_ [275]
                 0.00    0.00       1/1           infotrac_mp_infotrac_init_ [299]
                 0.00    0.00       1/1           dynetat0_ [287]
                 0.00    0.00       1/1           dynredem0_p_ [290]
                 0.00    0.00       1/1           iniconst_ [301]
                 0.00    0.00       1/1           inigeom_ [302]
 -----------------------------------------------
 ...
5) for nice profiling visualizations, such as call trees, you can use the Python script from https://github.com/jrfonseca/gprof2dot, as in the example below
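For example, a typical pipeline (adapted from the gprof2dot documentation; the dot command requires Graphviz) is:

<syntaxhighlight lang="bash" line>
gprof gcm.e | gprof2dot.py | dot -Tpng -o call_graph.png
</syntaxhighlight>

If gprof2dot was installed with pip (pip install gprof2dot), the command is gprof2dot instead of gprof2dot.py.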