HowTo: Bind MPI & pin OMP threads on clusters

We highly recommend reading the Adastra documentation on the matter for more details.

Note: As LMDZ currently runs on CPUs only, this page addresses CPU binding only.

CPU Binding & Pinning

Binding & Pinning: What and Why

Binding / pinning refers to specifying where each MPI process and each OMP thread runs on the cores of a compute node. If left unspecified, the following can happen:

MPI processes and OMP threads that communicate closely can be placed on cores that are "far apart" (for example on different sockets or NUMA domains), inducing significant communication overhead,
MPI processes and OMP threads can be assigned to the same hardware thread, significantly reducing performance,
OMP threads can migrate between cores during execution, further exacerbating the two points above.

On Intel architectures (e.g. on JeanZay), the default scheduler does a fairly good job of binding/pinning automatically. On AMD architectures (e.g. on Adastra), however, this is not the case, and users should provide their own explicit binding/pinning.

Example: on Adastra, specifying proper bindings for LMDZ Setup results in a 4x to 6x speedup.

Note: although not "required" on JeanZay, it is still good practice.
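
To see whether a process is currently restricted to specific cores, one option is to print its allowed CPU list. The sketch below is Linux-specific and assumes /proc is available; the helper name allowed_cpus is hypothetical and is not part of LMDZ Setup.

```shell
# Print the list of CPUs a process may run on (default: the current shell).
# Linux-specific: reads the Cpus_allowed_list field of /proc/<pid>/status.
# "allowed_cpus" is a hypothetical helper name used for illustration only.
allowed_cpus() {
    pid="${1:-$$}"
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/${pid}/status"
}

# Show the CPUs available to this shell, e.g. a range or comma-separated list.
allowed_cpus
```

Printing this once per MPI rank shows whether ranks overlap. A list that covers the whole node typically means that no binding is in effect for that process.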

Binding & Pinning: How

Note: this section assumes you are running on exclusive nodes of a SLURM cluster.

The easiest way to "automatically" set good-enough binding/pinning is to use slurm_set_cpu_binding.sh from LMDZ Setup. This script reads the number of MPI tasks per node and the number of OMP threads per task from the environment variables $SLURM_NTASKS_PER_NODE and $OMP_NUM_THREADS, and builds a binding table from the system information. It then applies the binding using $OMP_PLACES and numactl.
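
To illustrate the idea of a binding table, here is a minimal sketch. It is not the code of slurm_set_cpu_binding.sh: the function names are hypothetical, and it assumes that tasks are packed contiguously and that logical CPU ids 0, 1, 2, ... map to distinct physical cores in order, which should be checked (e.g. with lscpu) on the target machine.

```shell
# cpu_list_for_rank LOCAL_RANK THREADS_PER_TASK
# Prints the comma-separated CPU ids for one MPI task, assuming contiguous
# packing: rank 0 gets CPUs 0..T-1, rank 1 gets T..2T-1, and so on.
cpu_list_for_rank() {
    rank=$1; threads=$2
    cpu=$(( rank * threads ))
    end=$(( cpu + threads ))
    list=""
    while [ "$cpu" -lt "$end" ]; do
        list="${list:+$list,}$cpu"
        cpu=$(( cpu + 1 ))
    done
    echo "$list"
}

# omp_places_for CPU_LIST
# Turns "8,9,10,11" into "{8},{9},{10},{11}": one OpenMP place per CPU.
omp_places_for() {
    echo "$1" | sed 's/,/},{/g; s/^/{/; s/$/}/'
}

# Possible use inside a wrapper started by srun (assumption, shown as comments):
#   cpus=$(cpu_list_for_rank "$SLURM_LOCALID" "$OMP_NUM_THREADS")
#   export OMP_PLACES=$(omp_places_for "$cpus") OMP_PROC_BIND=close
#   exec numactl --physcpubind="$cpus" --localalloc -- "$@"
```

With one place per CPU, each OMP thread stays on its own core, while numactl restricts the whole MPI task to its slice of the node.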

Note: the script automatically uses hyperthreading (SMT) if the requested number of MPI tasks x OMP threads exceeds the number of physical (non-SMT) cores on the node.
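
The rule in this note can be sketched as a simple comparison. The function name smt_needed and the 192-core node used in the example are assumptions for illustration; the actual script may decide differently.

```shell
# smt_needed TASKS_PER_NODE THREADS_PER_TASK PHYSICAL_CORES
# Prints "yes" when tasks x threads exceeds the physical core count, else "no".
# Hypothetical helper; the physical core count could be obtained on Linux with
#   lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
smt_needed() {
    tasks=$1; threads=$2; cores=$3
    if [ $(( tasks * threads )) -gt "$cores" ]; then
        echo yes
    else
        echo no
    fi
}

# Example for an assumed node with 192 physical cores:
smt_needed 48 4 192   # 192 threads fit on 192 cores
smt_needed 48 8 192   # 384 threads need SMT
```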

Since we set the binding manually, we must disable SLURM's automatic binding using --cpu-bind=none --mem-bind=none.

Typical use: srun --cpu-bind=none --mem-bind=none -- ./slurm_set_cpu_binding.sh ./gcm.e

Note: Here we don't need to specify a number of tasks for srun, as it will be automatically inferred from $SLURM_NTASKS_PER_NODE.
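
For context, a batch script around this command might look like the sketch below. The job name, task and thread counts, and walltime are assumed values, and site-specific options (account, partition, constraint) are deliberately left out. Note that SLURM only sets $SLURM_NTASKS_PER_NODE when --ntasks-per-node is given.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=lmdz_bind
#SBATCH --nodes=1
# Assumed value; this option is what sets $SLURM_NTASKS_PER_NODE.
#SBATCH --ntasks-per-node=24
# Assumed value; this option sets $SLURM_CPUS_PER_TASK.
#SBATCH --cpus-per-task=8
# The binding script assumes exclusive nodes.
#SBATCH --exclusive
# Assumed walltime.
#SBATCH --time=01:00:00

# One OMP thread per CPU reserved for each MPI task.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# SLURM's own binding is disabled; the wrapper script does the pinning.
srun --cpu-bind=none --mem-bind=none -- ./slurm_set_cpu_binding.sh ./gcm.e
```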

A note on SMT (hyperthreading)

With this binding in place, we can properly investigate the effect of SMT (hyperthreading). As of 06/24, a benchmark using LMDZ Setup on Adastra shows barely any performance gain from using SMT.
