HowTo: Bind MPI & pin OMP threads on clusters

We highly recommend reading the Adastra documentation on the matter for more details.

Note: As LMDZ currently runs only on CPUs, this page addresses CPU binding only.

CPU Binding & Pinning

Binding & Pinning: What and Why

Binding / pinning refers to specifying where each MPI process and each OpenMP (OMP) thread runs on a compute node. If left unspecified, the following can happen:

  • MPI processes or OMP threads that communicate closely can be assigned to hardware threads that are "far apart", inducing significant communication overhead,
  • MPI processes and OMP threads can be assigned to the same hardware thread, significantly reducing performance,
  • OMP threads can "move" to another hardware thread during execution, further exacerbating the two points above.

On Intel architectures (e.g. on JeanZay), the default scheduler does a fairly good job of binding/pinning automatically. However, on AMD architectures (e.g. on Adastra), this is not the case, and users should provide their own explicit binding/pinning.
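Before changing anything, it can help to see which CPUs a process is currently allowed to run on. The following is a minimal sketch, assuming a Linux node: it reads the `Cpus_allowed_list` field that the kernel exposes in `/proc/<pid>/status`. The function name `allowed_cpus` is hypothetical and is not part of LMDZ Setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of LMDZ Setup): print the list of CPUs a
# process may run on, as reported by the Linux kernel.
# Usage: allowed_cpus [status_file]   (defaults to the calling process)
allowed_cpus() {
    local status_file="${1:-/proc/self/status}"
    # The line looks like "Cpus_allowed_list:<TAB>0-7"; print the second field.
    awk '/^Cpus_allowed_list:/ {print $2; exit}' "$status_file"
}
```

Calling this function from within each MPI task, with and without explicit binding, shows whether tasks overlap or are spread across the whole node.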

Example: on Adastra, specifying proper bindings for LMDZ Setup results in a 4x to 6x speedup.
  • Note: although not strictly required on JeanZay, explicit binding is still good practice.

Binding & Pinning: How

Note: this section assumes you are running on exclusive nodes on a SLURM cluster.

The easiest way to "automatically" set a good-enough binding/pinning is to use slurm_set_cpu_binding.sh from LMDZ Setup. This script reads the requested number of MPI tasks per node and OMP threads per task from the environment variables $SLURM_NTASKS_PER_NODE and $OMP_NUM_THREADS, and builds a binding table from the system information. It then uses $OMP_PLACES and numactl to apply the binding.

  • Note: the script automatically uses hyperthreading (SMT) if you request more MPI x OMP threads than there are physical (non-SMT) cores on the system.
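To illustrate what a binding table is, the sketch below gives each node-local MPI rank a contiguous block of core IDs, one core per OMP thread. This is not the logic of slurm_set_cpu_binding.sh, which also takes the system topology into account. The function name `cores_for_rank` and the contiguous layout are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helper: contiguous block of core IDs for one node-local MPI rank.
# Usage: cores_for_rank <local_rank> <omp_threads_per_rank>
cores_for_rank() {
    local rank=$1 nthreads=$2
    local first=$(( rank * nthreads ))
    local last=$(( first + nthreads - 1 ))
    echo "${first}-${last}"
}

# Print a toy binding table for 4 MPI ranks x 8 OMP threads on one node.
for rank in 0 1 2 3; do
    echo "rank ${rank} -> cores $(cores_for_rank "$rank" 8)"
done
```

In a real launch, the node-local rank is available as $SLURM_LOCALID, and a range such as 8-15 is the kind of value one could pass to numactl --physcpubind.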

Since we set the binding manually, we must disable SLURM's automatic binding with --cpu-bind=none --mem-bind=none.

Typical use: srun --cpu-bind=none --mem-bind=none -- ./slurm_set_cpu_binding.sh ./gcm.e

  • Note: Here we don't need to specify a number of tasks for srun, as it will be automatically inferred from $SLURM_NTASKS_PER_NODE.
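For context, here is a sketch of the top of a batch script that sets the two variables the binding script reads. The values (1 node, 8 MPI tasks, 4 OMP threads per task) are illustrative assumptions, and site-specific options such as account or partition are omitted. SLURM only sets $SLURM_NTASKS_PER_NODE and $SLURM_CPUS_PER_TASK when the corresponding options are given explicitly.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8      # makes SLURM set $SLURM_NTASKS_PER_NODE
#SBATCH --cpus-per-task=4        # makes SLURM set $SLURM_CPUS_PER_TASK
#SBATCH --exclusive              # this page assumes exclusive nodes

# One OMP thread per CPU reserved for each MPI task.
# Falls back to 1 when the script is run outside a SLURM allocation.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

The srun line shown above would then follow as the last command of the script.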

A note on SMT (hyperthreading)

Thanks to this binding, we can properly investigate the effects of using SMT (hyperthreading). As of 06/24, a benchmark using LMDZ Setup on Adastra shows barely any performance gain from using SMT.
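When comparing runs with and without SMT, it is useful to know in advance whether a given configuration spills onto SMT threads, that is, whether MPI x OMP exceeds the number of physical cores. The helper below is a minimal sketch of that check. The name `needs_smt` and the 64-core node are hypothetical and do not describe Adastra.

```shell
#!/bin/bash
# Hypothetical helper: does a request exceed the physical (non-SMT) core count?
# Usage: needs_smt <mpi_tasks_per_node> <omp_threads_per_task> <physical_cores>
needs_smt() {
    local requested=$(( $1 * $2 ))
    if [ "$requested" -gt "$3" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

needs_smt 16 4 64   # 64 threads requested on 64 physical cores
needs_smt 16 8 64   # 128 threads requested on 64 physical cores
```

The first call prints "no" and the second prints "yes", so only the second configuration would exercise SMT on such a node.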