HowTo: LMDZ code guidelines : Différence entre versions

De LMDZPedia
Aller à : navigation, rechercher
m
 
(4 révisions intermédiaires par le même utilisateur non affichées)
Ligne 1 : Ligne 1 :
 
   /!\ This page is a WORK IN PROGRESS. Nothing in this page should be considered correct or consensual for now. /!\
 
   /!\ This page is a WORK IN PROGRESS. Nothing in this page should be considered correct or consensual for now. /!\
  
''Note: this documents draws from various sources, including [http://broken.link.todo.replace the OPA guidelines].
+
''Note: this documents draws from various sources, including [http://broken.link.todo.replace the OPA guidelines], [https://fortranwiki.org/fortran/show/Modernizing+Old+Fortran Modernizing Old Fortran], [http://pages.swcp.com/~walt/F/F_bnf.html F syntax rules], [https://github.com/fortran-lang/stdlib/blob/master/STYLE_GUIDE.md the stdlib style guide], [https://github.com/codee-com/fortran-modernization CODEE Fortran recommendations].
  
 
== Example code ==
 
== Example code ==
Ligne 36 : Ligne 36 :
 
CONTAINS
 
CONTAINS
 
    
 
    
  !! Put at least a blank line before/after each function/subroutine
+
  !! Put at least a blank line before/after each function/subroutine2
 +
!! A routine should have a clear, unambiguous name, describing what the routine does.
 
   SUBROUTINE my_subroutine(arg1, arg2, arg3) ! handle event xxxx
 
   SUBROUTINE my_subroutine(arg1, arg2, arg3) ! handle event xxxx
 
   !!                            ^ comments should always be right before, or on the same line as what they're commenting
 
   !!                            ^ comments should always be right before, or on the same line as what they're commenting
Ligne 62 : Ligne 63 :
 
</syntaxhighlight>
 
</syntaxhighlight>
  
 +
== Comments and Documentation ==
  
Some comments:
+
Documentation consists of putting information both inside and outside the source code. On large, formal projects, most of the documentation is outside the source code. However because the internal documentation is most closely associated with the code, it is the part most likely to remain correct as the code is modified. The thing to be really careful with is that misleading comments are even worse than having no comments at all. So, please report any misleading comments you found.
* Limit '''to the bare minimum''' the use of preprocessor <code>#ifdef</code> keys. Those decrease lisibility, disable automatic code analysis, and generally make code a pain to manage and read. Ideally, a given preprocessor key should be used only once, or if not possible, in a single module for the entire codebase.
+
 
* Limit the use of <code>include</code> headers to the bare minimum. Whenever not possible, wrap those include in a module, and import that module instead. This provides much more flexibility, e.g. when using <code>ONLY: ...</code>.
+
Try to avoid having superfluous comments such as: <code>A = A + 1 ! Increment A by one.</code> This is not adding anything that is not immediately obvious. A person looking through your code should only have to scan your comments to be able to get a good idea of what the code does and where to look for any special activity.
 +
 
 +
Good comments do not just repeat verbally what is happening in the code or just explain it. They clarify its intent. They should explain at a higher level of abstraction what the code or what you are trying to do. In addition, you should comment:
 +
* Everything that gets around an error or an undocumented feature in a language or environment.
 +
* Any violations of good programming style should be justified. This will ensure that any well-intentioned programmer does not break your code by changing the source to implement a “better” style.
 +
* The lines before a control structure, e.g. IF, CASE, loop, etc. are a natural spot to comment and explain what these constructs are about to do.
 +
* The start of '''any''' subroutine or function whose usage is not downright obvious.
  
 
== Misc ==
 
== Misc ==
Ligne 113 : Ligne 121 :
 
* <code>DATA</code> and <code>BLOCK DATA</code> should be replaced by initializers.
 
* <code>DATA</code> and <code>BLOCK DATA</code> should be replaced by initializers.
  
 +
=== Preprocessor ===
 +
 +
Limit '''to the bare minimum''' the use of preprocessor <code>#ifdef</code> keys. Those decrease lisibility, disable automatic code analysis, and generally make code a pain to manage and read. Ideally, a given preprocessor key should be used only once, or if not possible, in a single module for the entire codebase.
 +
 +
=== INCLUDE ===
 +
 +
Limit the use of <code>include</code> headers to the bare minimum. Whenever not possible, wrap those include in a module, and import that module instead. This provides much more flexibility, e.g. when using <code>ONLY: ...</code>.
 +
 +
== Recent examples of improper LMDZ code ==
 +
 +
''Note: The purpose of this section is '''obviously not''' to blame anyone, but to stress that "bad"/perfectible code in LMDZ isn't limited to old legacy code, and that even current code is being written in a way that can be greatly improved.''
 +
 +
==== 07/24: Unhelpful comment ====
 +
 +
<syntaxhighlight lang="fortran" line>
 +
IF (iflag_ice_thermo == 0) THEN 
 +
          !pas necessaire a priori
 +
</syntaxhighlight>
 +
 +
This comment fails to convey a clear indication. What is necessary or not ? In which condition ? "A priori" further conveys a sense of uncertainty that may make sense to the writer, but certainly not to an external reader.
 +
 +
==== 07/24: Unclear conventions, no PURE, useless RETURN ====
 +
 +
<syntaxhighlight lang="fortran" line>
 +
function grid_index(lon_deg,lat_deg)
 +
!--------------------------------------------------------------------------------
 +
! Get local index of grid point of longitude,latitude lon_deg,lat_deg
 +
! Author : XXX 2024/07/XX
 +
! Please do not put this function in a m*odule not to complexify the replay script
 +
!--------------------------------------------------------------------------------
 +
USE dimphy, only : klon
 +
USE geometry_mod, ONLY: latitude_deg, longitude_deg
 +
implicit none
 +
real, intent(in) :: lon_deg,lat_deg
 +
integer :: grid_index
 +
integer i
 +
grid_index=0
 +
do i=1,klon
 +
  if ( abs(lon_deg-longitude_deg(i)) < 0.02 .and.  abs(lat_deg-latitude_deg(i)) < 0.02 ) then
 +
        grid_index=i
 +
        exit
 +
  endif
 +
enddo
 +
return
 +
end
 +
</syntaxhighlight>
 +
 +
This function is actually mostly great ! It is documented, the author put a clear statement to indicate a deviation from standards (no module), <code>USE</code> is used in conjuction with <code>ONLY</code>, <code>INTENT</code> is declared.
 +
However, it is also lacking in some areas, notably:
 +
* This function should be declared as <code>PURE</code>
 +
* This function has inconsistent syntax: mix of uppercase and lowercase for Fortran keywords, sometimes uses <code>::</code> in declarations and sometimes not
 +
* The function ends with a useless <code>RETURN</code>
 +
 +
==== 07/24: Duplicated, uncommented code ====
 +
 +
<syntaxhighlight lang="fortran" line>
 +
          zx_tmp_fi3d=0.
 +
          IF (vars_defined) THEN
 +
            DO nsrf=1,nbsrf
 +
                DO k=1,klev
 +
                  zx_tmp_fi3d(:,k)=zx_tmp_fi3d(:,k) &
 +
                        +pctsrf(:,nsrf)*tke_shear(:,k,nsrf)
 +
                ENDDO
 +
            ENDDO
 +
          ENDIF
 +
 +
          CALL histwrite_phy(o_tke_shear, zx_tmp_fi3d)
 +
 +
          zx_tmp_fi3d=0.
 +
          IF (vars_defined) THEN
 +
            DO nsrf=1,nbsrf
 +
                DO k=1,klev
 +
                  zx_tmp_fi3d(:,k)=zx_tmp_fi3d(:,k) &
 +
                        +pctsrf(:,nsrf)*tke_buoy(:,k,nsrf)
 +
                ENDDO
 +
            ENDDO
 +
          ENDIF
 +
 +
          CALL histwrite_phy(o_tke_buoy, zx_tmp_fi3d)
 +
 +
 +
          zx_tmp_fi3d=0.
 +
          IF (vars_defined) THEN
 +
            DO nsrf=1,nbsrf
 +
                DO k=1,klev
 +
                  zx_tmp_fi3d(:,k)=zx_tmp_fi3d(:,k) &
 +
                        +pctsrf(:,nsrf)*tke_trans(:,k,nsrf)
 +
                ENDDO
 +
            ENDDO
 +
          ENDIF
 +
 +
          CALL histwrite_phy(o_tke_trans, zx_tmp_fi3d)
 +
</syntaxhighlight>
 +
 +
This code:
 +
* Duplicates three times essentially the same code. Refactoring into a routine would be much more readable and reduce the potential of typos.
 +
* Contains no comment justifying such behavior, or what is even being done here, which to an external reader is fairly obscure.
 +
 +
==== 06/24: Undocumented variable ====
 +
 +
<syntaxhighlight lang="fortran" line>
 +
INTEGER                              :: nsrf, k, iq, iff, i, j, ilev, itr, itrb
 +
!!                                                                          ^ Added this
 +
!! (...)
 +
itrb = 0
 +
!! (...)
 +
if(tracers(iq)%name(1:3)=='BIN') then
 +
              itrb = itrb + 1
 +
</syntaxhighlight>
 +
 +
Just because a variable is a "dumb" counter doesn't mean it shouldn't be documented. In this particular example, the variable name is devoid of any obvious meaning.
 +
Properly documenting this variable would also make it clear to others whether it would make sense to reuse it later on in the code.
 +
 +
==== 06/24: Comments as code storage ====
 +
 +
<syntaxhighlight lang="fortran" line>
 +
!CALL writefield_phy('latitude',latitude_deg,1)
 +
!CALL writefield_phy('pressure_hl',pressure_hl,klev+1)
 +
!CALL writefield_phy('Ldecorel',PDECORR_LEN_EDGES_M,klev)
 +
</syntaxhighlight>
 +
 +
'''Comments are ''NOT'' for code storage'''. Unfortunately, the LMDZ code is ''littered'' with bits of commented code without any indication of when or why the code was commented.
 +
As a result, there are in a lot of places such comments, many dating from 20+ years ago, for which we have no clue whether we can remove them or not. This greatly hinders readability and maintainability.
  
 +
Of course, in ''some'' cases, it is warranted to keep some code at hand, e.g. for diagnostic purpose. In such case, please '''make sure to always clearly document ''why, when and by whom'' the code was commented out''', so that it can be cleaned up at a later date when it becomes irrelevant.
  
 
== Questions à élucider ==
 
== Questions à élucider ==
Ligne 119 : Ligne 251 :
 
* Prescrit-on une langue (Français / Anglais) ?
 
* Prescrit-on une langue (Français / Anglais) ?
 
* Longueur des lignes: dans la limite du raisonable, pourquoi se limiter à 100-140 chars ? Le wrapping fonctionne très bien.
 
* Longueur des lignes: dans la limite du raisonable, pourquoi se limiter à 100-140 chars ? Le wrapping fonctionne très bien.
*
 

Version actuelle en date du 1 août 2024 à 09:17

 /!\ This page is a WORK IN PROGRESS. Nothing in this page should be considered correct or consensual for now. /!\

Note: this documents draws from various sources, including the OPA guidelines, Modernizing Old Fortran, F syntax rules, the stdlib style guide, CODEE Fortran recommendations.

Example code

Note: in this example, we added "meta-comments" indicated by !!

Note: of course, there's always exceptions to "Always", but they should systematically documented by a comment, with a proper explanation.

 1 !! File: lmdz_mymodule.f90
 2 !!         ^ all module files should start with their component name, (e.g. "lmdz")
 3 !!                      ^ use ".f90" for all free-form files, unless they contain preprocessor directives (then use ".F90").
 4 
 5 !! At the top of the file, we provide a comment to explain what this modules does
 6 ! implementation of my_useful_stuff.
 7 ! ----------------------------------
 8 
 9 !! Unless specified otherwise, all files should contain a single module with the same name which encapsulates all their content
10 MODULE lmdz_mymodule
11 !!  ^ all Fortran keywords are in CAPS, everything else is in lowercase.
12   USE my_other_module, ONLY: function_a, function_b
13   !!                     ^ Always specify explicitely which functions you use from a module.
14 !!^ use indentation (2 spaces) to provide visual clarity
15   IMPLICIT NONE; PRIVATE
16   !!                ^ Always make your module private, and expose explicitely which functions are public
17   !!    ^ Always use IMPLICIT NONE at the module-level
18   PUBLIC my_var, my_subroutine
19 
20   REAL, PARAMETER :: my_var
21   REAL            :: a, b, c, d, e, f
22   !!               ^ use "::" even if it's not strictly necessary
23   !!       ^ Alignment of "::" is not necessary, but should be attempted whenever it provides readability and is relevant (e.g. there's some link between the aligned variables)
24         
25 CONTAINS
26   
27  !! Put at least a blank line before/after each function/subroutine2
28  !! A routine should have a clear, unambiguous name, describing what the routine does.
29   SUBROUTINE my_subroutine(arg1, arg2, arg3) ! handle event xxxx
30   !!                            ^ comments should always be right before, or on the same line as what they're commenting
31   !!                            ^ as for modules, all functions should be documented in a comment
32     INTEGER, INTENT(IN) :: arg1
33     !!          ^ Always provide intent 
34     REAL, INTENT(IN)    :: arg2
35 
36     INTEGER, INTENT(INOUT) :: arg3
37 
38     REAl, INTENT(OUT) :: out1
39     !!^ Group together IN, then INOUT, then OUT
40 
41     IF (arg2 > arg1) THEN
42     !!       ^ .ge., .le., etc are deprecated. Use >, <, ==, etc instead
43       ! ...
44     END IF
45     !! ^ Use END IF, END DO rather than ENDIF, ENDDO (coherence with fortran-lang.org)
46   END SUBROUTINE my_subroutine
47 !!         ^ always use named blocks for END
48   
49 
50 END MODULE my_module
51 !!       ^ always use named blocks for END

Comments and Documentation

Documentation consists of putting information both inside and outside the source code. On large, formal projects, most of the documentation is outside the source code. However because the internal documentation is most closely associated with the code, it is the part most likely to remain correct as the code is modified. The thing to be really careful with is that misleading comments are even worse than having no comments at all. So, please report any misleading comments you found.

Try to avoid having superfluous comments such as: A = A + 1 ! Increment A by one. This is not adding anything that is not immediately obvious. A person looking through your code should only have to scan your comments to be able to get a good idea of what the code does and where to look for any special activity.

Good comments do not just repeat verbally what is happening in the code or just explain it. They clarify its intent. They should explain at a higher level of abstraction what the code or what you are trying to do. In addition, you should comment:

  • Everything that gets around an error or an undocumented feature in a language or environment.
  • Any violations of good programming style should be justified. This will ensure that any well-intentioned programmer does not break your code by changing the source to implement a “better” style.
  • The lines before a control structure, e.g. IF, CASE, loop, etc. are a natural spot to comment and explain what these constructs are about to do.
  • The start of any subroutine or function whose usage is not downright obvious.

Misc

Fortran standard

Fortran 2008 is now widely supported on all major compilers, and should be taken as a reference.

Allocatable arrays

Allocatable arrays can be relevant, but should be substituted by automatic arrays (= passing the array size as an argument) whenever possible, for performance and debugging reasons.

Compiler warnings

Compiler warnings are useful - unless there's too many of them. Effort should be made such that during the compiling phase, no warning appears, so that user can be easily alerted of potential bugs. when some appear in their configuration.

In particular, all component must be able to run when a compile-time and/or run-time array bounds checking option is enabled. Use of the (*) construct in array dimensioning to circumvent this problem is forbidden because it effectively disables array bounds checking.

Side effects

Whenever possible, write functions without side-effects, and label them accordingly as PURE or ELEMENTAL.

SAVE

Since all files contain a MODULE, attribute SAVE is banned. Variables whose value have to be preserved between two calls should be declared at the module level.

Deprecated features

In terms of keeping code up to date and easier to maintain the code should always follow the current standards of FORTRAN and ANSI C. We decide to restrict the languages use to the elements, which are not obsolete or deleted -- even if they are still available with almost all compilers.

Note: just because a feature is deprecated doesn't mean it must be converted in old, working code - this can be tricky and bug-prone in complex instances.

Some notable deprecated features:

  • COMMON blocks. Instead, use MODULE initialisation.
  • EQUIVALENCE should be replaced by pointers or derived data types.
  • Any kind of GOTO has absolutely no place in modern programming. Use CASE whenever relevant rather than IF.
  • Arithmetic IF statements. Use block IF or CASE instead.
  • CONTINUE statement: use IF, CASE, DO WHILE, EXIT, CYCLE statements or a contained SUBROUTINE instead.
  • I/O routines END and ERR - use IOSTAT instead
  • FORMAT statements: use character parameters or explicit format- specifiers inside the READ or WRITE statement instead.
  • ENTRY statements: a subprogram must only have one entry point
  • Alternate RETURN is obsolescent, and RETURN at the end of a function is useless clutter.
  • Statement functions have been replaced by corresponding functions or subroutines in the CONTAIN block of the current scope.
  • DATA and BLOCK DATA should be replaced by initializers.

Preprocessor

Limit to the bare minimum the use of preprocessor #ifdef keys. Those decrease lisibility, disable automatic code analysis, and generally make code a pain to manage and read. Ideally, a given preprocessor key should be used only once, or if not possible, in a single module for the entire codebase.

INCLUDE

Limit the use of include headers to the bare minimum. Whenever not possible, wrap those include in a module, and import that module instead. This provides much more flexibility, e.g. when using ONLY: ....

Recent examples of improper LMDZ code

Note: The purpose of this section is obviously not to blame anyone, but to stress that "bad"/perfectible code in LMDZ isn't limited to old legacy code, and that even current code is being written in a way that can be greatly improved.

07/24: Unhelpful comment

1 IF (iflag_ice_thermo == 0) THEN   
2            !pas necessaire a priori

This comment fails to convey a clear indication. What is necessary or not ? In which condition ? "A priori" further conveys a sense of uncertainty that may make sense to the writer, but certainly not to an external reader.

07/24: Unclear conventions, no PURE, useless RETURN

 1 function grid_index(lon_deg,lat_deg)
 2 !--------------------------------------------------------------------------------
 3 ! Get local index of grid point of longitude,latitude lon_deg,lat_deg
 4 ! Author : XXX 2024/07/XX
 5 ! Please do not put this function in a m*odule not to complexify the replay script
 6 !--------------------------------------------------------------------------------
 7 USE dimphy, only : klon
 8 USE geometry_mod, ONLY: latitude_deg, longitude_deg
 9 implicit none
10 real, intent(in) :: lon_deg,lat_deg
11 integer :: grid_index
12 integer i
13 grid_index=0
14 do i=1,klon
15    if ( abs(lon_deg-longitude_deg(i)) < 0.02 .and.  abs(lat_deg-latitude_deg(i)) < 0.02 ) then
16         grid_index=i
17         exit
18    endif
19 enddo
20 return
21 end

This function is actually mostly great ! It is documented, the author put a clear statement to indicate a deviation from standards (no module), USE is used in conjuction with ONLY, INTENT is declared. However, it is also lacking in some areas, notably:

  • This function should be declared as PURE
  • This function has inconsistent syntax: mix of uppercase and lowercase for Fortran keywords, sometimes uses :: in declarations and sometimes not
  • The function ends with a useless RETURN

07/24: Duplicated, uncommented code

 1           zx_tmp_fi3d=0.
 2           IF (vars_defined) THEN
 3              DO nsrf=1,nbsrf
 4                 DO k=1,klev
 5                    zx_tmp_fi3d(:,k)=zx_tmp_fi3d(:,k) &
 6                         +pctsrf(:,nsrf)*tke_shear(:,k,nsrf)
 7                 ENDDO
 8              ENDDO
 9           ENDIF
10 
11           CALL histwrite_phy(o_tke_shear, zx_tmp_fi3d)
12 
13           zx_tmp_fi3d=0.
14           IF (vars_defined) THEN
15              DO nsrf=1,nbsrf
16                 DO k=1,klev
17                    zx_tmp_fi3d(:,k)=zx_tmp_fi3d(:,k) &
18                         +pctsrf(:,nsrf)*tke_buoy(:,k,nsrf)
19                 ENDDO
20              ENDDO
21           ENDIF
22 
23           CALL histwrite_phy(o_tke_buoy, zx_tmp_fi3d)
24 
25 
26           zx_tmp_fi3d=0.
27           IF (vars_defined) THEN
28              DO nsrf=1,nbsrf
29                 DO k=1,klev
30                    zx_tmp_fi3d(:,k)=zx_tmp_fi3d(:,k) &
31                         +pctsrf(:,nsrf)*tke_trans(:,k,nsrf)
32                 ENDDO
33              ENDDO
34           ENDIF
35 
36           CALL histwrite_phy(o_tke_trans, zx_tmp_fi3d)

This code:

  • Duplicates three times essentially the same code. Refactoring into a routine would be much more readable and reduce the potential of typos.
  • Contains no comment justifying such behavior, or what is even being done here, which to an external reader is fairly obscure.

06/24: Undocumented variable

1 INTEGER                               :: nsrf, k, iq, iff, i, j, ilev, itr, itrb
2 !!                                                                           ^ Added this
3 !! (...)
4 itrb = 0
5 !! (...)
6 if(tracers(iq)%name(1:3)=='BIN') then
7                itrb = itrb + 1

Just because a variable is a "dumb" counter doesn't mean it shouldn't be documented. In this particular example, the variable name is devoid of any obvious meaning. Properly documenting this variable would also make it clear to others whether it would make sense to reuse it later on in the code.

06/24: Comments as code storage

1 !CALL writefield_phy('latitude',latitude_deg,1)
2 !CALL writefield_phy('pressure_hl',pressure_hl,klev+1)
3 !CALL writefield_phy('Ldecorel',PDECORR_LEN_EDGES_M,klev)

Comments are NOT for code storage. Unfortunately, the LMDZ code is littered with bits of commented code without any indication of when or why the code was commented. As a result, there are in a lot of places such comments, many dating from 20+ years ago, for which we have no clue whether we can remove them or not. This greatly hinders readability and maintainability.

Of course, in some cases, it is warranted to keep some code at hand, e.g. for diagnostic purpose. In such case, please make sure to always clearly document why, when and by whom the code was commented out, so that it can be cleaned up at a later date when it becomes irrelevant.

Questions à élucider

  • Prescrit-on une langue (Français / Anglais) ?
  • Longueur des lignes: dans la limite du raisonable, pourquoi se limiter à 100-140 chars ? Le wrapping fonctionne très bien.