HowTo: Bind MPI & pin OMP threads on clusters
We highly recommend reading the Adastra documentation on the matter for more details.
Note: As LMDZ only supports CPU for now, this page only addresses CPU binding.
CPU Binding & Pinning
Binding & Pinning: What and Why
Binding / pinning refers to specifying how MPI processes and OMP threads are placed on the cores of a computing node. If left unspecified, the following can happen:
- MPI processes or OMP threads that communicate closely can get assigned to cores that are "far apart", inducing significant communication overhead,
- several MPI processes or OMP threads can get assigned to the same hardware thread, significantly reducing performance,
- OMP threads can "move" between cores during the execution, further exacerbating the two points above.
On Intel architectures (e.g. on JeanZay), the default scheduler does a fairly good job of binding/pinning automatically. However, on AMD architectures (e.g. on Adastra), this is not the case, and users should provide their own explicit binding/pinning.
Example: on Adastra, specifying proper bindings for LMDZ Setup results in a 4x to 6x speedup.
- Note: although not "required" on JeanZay, explicit binding is still good practice there.
Binding & Pinning: How
Note: this section assumes you are running on exclusive nodes, on a SLURM cluster
The easiest way to "automatically" set good-enough binding/pinning is to use slurm_set_cpu_binding.sh from LMDZ Setup. This script reads the requested number of MPI tasks per node and OMP threads per task from the environment variables $SLURM_NTASKS_PER_NODE and $OMP_NUM_THREADS, and builds a binding table from the system information. It then uses $OMP_PLACES and numactl to apply the binding.
- Note: the script will automatically use hyperthreading (SMT) if you request more MPI x OMP threads than there are physical (non-SMT) cores on the node.
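To make the mechanism more concrete, here is a minimal sketch of the kind of logic such a wrapper could apply. It is not the actual slurm_set_cpu_binding.sh: the function names are hypothetical, and it assumes that each task on a node simply owns a contiguous block of cores numbered from 0, whereas the real script builds its table from the system information. $SLURM_LOCALID is the node-local task index set by SLURM; setting OMP_PROC_BIND=close is an additional assumption of this sketch.

```shell
#!/bin/bash
# Illustrative sketch only, NOT the real slurm_set_cpu_binding.sh.
# Assumption: local task i owns the contiguous cores
# [i*nthreads, (i+1)*nthreads - 1].

# Print the CPU list (numactl syntax) for a given local task index.
cpu_range() {
    local local_id=$1 nthreads=$2
    local first=$(( local_id * nthreads ))
    local last=$(( first + nthreads - 1 ))
    if (( first == last )); then
        echo "${first}"
    else
        echo "${first}-${last}"
    fi
}

# Print an OMP_PLACES value: nthreads single-CPU places, stride 1.
omp_places() {
    local local_id=$1 nthreads=$2
    echo "{$(( local_id * nthreads ))}:${nthreads}:1"
}

# Bind the current task, then replace the shell with the target command.
bind_and_exec() {
    local local_id=${SLURM_LOCALID:-0} nthreads=${OMP_NUM_THREADS:-1}
    export OMP_PLACES=$(omp_places "$local_id" "$nthreads")
    export OMP_PROC_BIND=close   # assumption: keep threads on their places
    exec numactl --physcpubind="$(cpu_range "$local_id" "$nthreads")" \
         --localalloc -- "$@"
}
```

With 4 OMP threads per task, the second task on a node (local index 1) would get cores 4-7 and OMP_PLACES={4}:4:1 under this simplified layout. A wrapper built this way would end with a call such as bind_and_exec "$@".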
Since we manually set the binding, we must disable SLURM's auto-binding using --cpu-bind=none --mem-bind=none.
Typical use: srun --cpu-bind=none --mem-bind=none -- ./slurm_set_cpu_binding.sh ./gcm.e
- Note: here we don't need to specify a number of tasks for srun, as it will be automatically inferred from $SLURM_NTASKS_PER_NODE.
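Before launching a long run, it can be worth checking which CPUs each task actually ended up on. The helper below is a small sketch for that purpose (the name report_binding is hypothetical, and it relies on the Linux-specific /proc/self/status file); $SLURM_PROCID is the global task rank set by srun.

```shell
#!/bin/bash
# Print the CPUs the calling process is allowed to run on (Linux only).
# SLURM_PROCID defaults to 0 when run outside of SLURM.
report_binding() {
    local cpus="unknown"
    if [ -r /proc/self/status ]; then
        # awk inherits the affinity mask of the calling shell
        cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    fi
    echo "rank ${SLURM_PROCID:-0} on $(uname -n): cpus=${cpus}"
}
```

Assuming the function is saved in a small script (say, a hypothetical check_binding.sh that calls report_binding), it can be launched in place of ./gcm.e in the srun line above. If the wrapper restricts the process affinity through numactl, each task should then report its own, non-overlapping CPU list.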
A note on SMT (hyperthreading)
Thanks to this binding, we can properly investigate the effects of using SMT (hyperthreading). As of 06/24, a benchmark using LMDZ Setup on Adastra shows barely any performance gain from using SMT.
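When comparing runs with and without SMT, it helps to know in advance which regime a given configuration falls in. The sketch below applies the rule stated earlier (SMT is needed once MPI x OMP exceeds the number of physical cores); the function names are hypothetical, and physical_cores relies on lscpu being available on the node.

```shell
#!/bin/bash
# Count physical cores by de-duplicating the (core, socket) pairs from lscpu.
physical_cores() {
    lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
}

# Succeed (exit status 0) if tasks_per_node * omp_threads exceeds the given
# number of physical cores, i.e. if the job only fits by using SMT.
needs_smt() {
    local tasks=$1 nthreads=$2 cores=$3
    (( tasks * nthreads > cores ))
}
```

For instance, in a job script one could write: if needs_smt "$SLURM_NTASKS_PER_NODE" "$OMP_NUM_THREADS" "$(physical_cores)"; then echo "this configuration will use SMT"; fi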