Using Adastra

Latest revision as of 09:20, 14 October 2024

This page provides a summary of examples and tools designed to help you get used to the Adastra environment.

Getting access to the cluster

For people on the "Atmosphères Planétaires" GENCI project who need to open an account on Adastra, here is the procedure:

  1. Go to https://www.edari.fr/utilisateur and log in via Janus, or create an account if you don't have a Janus login. If this doesn't work, you can create a new eDARI account. (Make sure your profile is fully up to date, including nationality.)
  2. Beware! If you belong to two labs (LMD and LATMOS for example), you must register with the email address that corresponds to your Janus account.
  3. Click on "se rattacher à un dossier ayant obtenu des resources" or "Attach yourself to an application file that has obtained resources"
  4. "Atmosphères Planétaires" project number to provide: A0160110391
  5. Ehouarn then receives an email to allow you to join the project. Once he has validated it, you receive a confirmation mail.
  6. Once approved, you have to request an account: click on "CINES: créer une demande d'ouverture de compte"
  7. Fill in the forms: name, contract end date, CINES, your lab information (LMD is the default)
  8. Access IP address 134.157.47.46 , FQDN (Fully Qualified Domain Name): ssh-out.lmd.jussieu.fr
  9. Add a second address : 134.157.176.129 , FQDN: spirit2.ipsl.fr
  10. Click on the option to have access to CCFR (only important if you have access to other GENCI machines)
  11. The security officer is Julien Lenseigne for LMD (his information is all pre-filled, except phone: +33169335172)
  12. YOU MUST THEN VALIDATE THE REQUEST: click on "Valider la saisie des informations"
  13. You then receive an automatic mail, but it's only to tell you to go to the next step: You must now download the pre-filled form from e-dari: find "télécharger la demande" and download the pdf. Sign it, and upload it on e-dari "déposer la demande de création de compte".
  14. Wait for your application to be preprocessed by the system...

A couple of pointers

  • Connecting to Adastra: for those who had an account on Occigen, the group and login credentials from that time have been retained. To connect to Adastra you first need to go through the LMD gateway (hakim) or the IPSL gateway (Ciclad/Spirit), and then run:
ssh your_cines_login@adastra.cines.fr

And then you will probably want to switch project using the myproject command, e.g. to switch to "lmd1167" (the old "Atmosphères Planétaires" GENCI project)

myproject -a lmd1167

and to switch to "cin0391" (the 2023-2024 "Atmosphères Planétaires" GENCI project)

myproject -a cin0391

WARNING: when you switch projects, you also switch HOME directory etc.

To get all the info about dedicated environment variables (e.g. paths to SCRATCH, STORE, etc.) you can use

myproject -c
  • To get all the information about project accounting (number of hours available and used by each member of the project) you need to connect to https://reser.cines.fr/ using your Adastra login and password
  • Changing the password of your CINES account

When your password is close to expiring, CINES asks you to change it on this website: https://rosetta.cines.fr

Please note that you can access this website only from a machine that you declared as a gateway for Adastra. At LMD, we have generally declared hakim.lmd.jussieu.fr (aka ssh-out) and spirit2.ipsl.fr as gateway machines. Hakim doesn't have any browser installed, but you can launch firefox on Spirit and connect to the rosetta website. If that doesn't work, check out the page on "How to launch your local browser through a gateway machine" or contact svp@cines.fr

  • Link to the webpage where you can find out how many hours we have left on the project, along with details about everyone's use of Adastra (login and password are those of your Adastra account): https://reser.cines.fr
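
The two-hop connection described in the first pointer above (gateway first, then Adastra) can usually be done in one step with the ProxyJump option of OpenSSH. The sketch below only builds and prints the command; both login names are placeholders, and it assumes that the gateway allows forwarding and that your local OpenSSH is version 7.3 or later.

```shell
# Placeholders (assumptions): replace with your own logins.
gateway_login="your_lmd_login"
cines_login="your_cines_login"
# hakim, under the name declared in the account request
gateway_host="ssh-out.lmd.jussieu.fr"

# Build the one-step command; -J is the command-line form of ProxyJump.
jump_cmd="ssh -J ${gateway_login}@${gateway_host} ${cines_login}@adastra.cines.fr"

# Print the command for review before running it by hand.
echo "$jump_cmd"
```

Since the connection still reaches Adastra from the gateway's IP address, this should be compatible with the declared access addresses, but that has not been verified here.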

Disks and workspaces

myproject -s cin0391
  • In a nutshell: we have lots of space on the WORKDIR (250 TB), which is "permanent" (unlike the SCRATCHDIR, where files older than 30 days are purged), so use it! And when you want to archive things, make some large tar files and put them on the STOREDIR
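
As an illustration of the archiving advice above, here is a minimal sketch of a helper that packs one directory into a single tar file and checks that the archive is readable. The function name archive_to_store is hypothetical, and the sketch assumes that the STOREDIR environment variable is set by the CINES environment (see "myproject -c").

```shell
# Pack a directory into one tar file in the store area and print the archive path.
# $1 = directory to archive, $2 = destination directory (defaults to $STOREDIR).
archive_to_store() {
    src_dir="$1"
    dest_dir="${2:-$STOREDIR}"
    if [ -z "$dest_dir" ]; then
        echo "archive_to_store: no destination given and STOREDIR is not set" >&2
        return 1
    fi
    archive="${dest_dir}/$(basename "$src_dir").tar"
    # -C keeps only the last path component inside the archive.
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" &&
    # Listing the archive is a cheap check that it can be read back.
    tar -tf "$archive" > /dev/null &&
    echo "$archive"
}
```

A typical call would be archive_to_store "$WORKDIR/my_simulation", where my_simulation is a placeholder directory name. Remove the original only after checking the archive.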

Transferring data from Irene

You can use the ccfr "speedway" between National computing centers to copy data from Irene to Adastra (it is all explained here: https://dci.dci-gitlab.cines.fr/webextranet/data_storage_and_transfers/index.html#between-computing-site-ccfr ). To summarize:

  1. First check that you did ask for access to ccfr when you created your account: just run the "id" command on Adastra and check that you are a registered member of the "22011(cinesccfr)" group. If not, ask the CINES helpdesk at svp@cines.fr
  2. Connect to Adastra the usual way and, once there, run "ssh adastra-ccfr.cines.fr". This should land you on "login1", which is the node enabled to use the ccfr connection
  3. Once on login1 you can transfer data from Irene via scp or rsync using the appropriate gateway machine (on the Irene side), which is "irene-fr-ccfr-gw.ccc.cea", e.g.:
rsync -avz irenelogin@irene-fr-ccfr-gw.ccc.cea:irene_path_to_your_data adastra_path_to_your_data
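
The group check from step 1 can be scripted. This is a minimal sketch with a hypothetical helper name (has_ccfr_access); it simply searches the output of "id" for the cinesccfr group name quoted above.

```shell
# Return 0 if the given "id" output lists the cinesccfr group, 1 otherwise.
has_ccfr_access() {
    printf '%s\n' "$1" | grep -q '(cinesccfr)'
}

# Check the current account (meaningful only when run on Adastra).
if has_ccfr_access "$(id)"; then
    echo "ccfr access: OK"
else
    echo "ccfr access: missing, contact svp@cines.fr"
fi
```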

Submitting jobs

It's done using SLURM; you need to write up a job script and submit it using sbatch

sbatch myjob

You must specify in the header of the job which project resources you are using ("cin0391" in our case):

#SBATCH --account=cin0391

Example of an MPI job to launch a simulation

#!/bin/bash
#SBATCH --job-name=job_mpi
#SBATCH --account=cin0391
### GENOA nodes accommodate 2 processors of 96 cores each, i.e. 192 cores overall
#SBATCH --constraint=GENOA
### Number of Nodes to use
#SBATCH --nodes=1
### Number of MPI tasks per node
#SBATCH --ntasks-per-node=48 
### Number of OpenMP threads per MPI task
#SBATCH --cpus-per-task=1
#SBATCH --threads-per-core=1
###SBATCH --exclusive
#SBATCH --output=job_mpi_%A.out
#SBATCH --time=00:45:00 

#source env modules:
source ../trunk/LMDZ.COMMON/arch.env 
ulimit -s unlimited

srun --cpu-bind=threads --label gcm_96x96x78_phyvenus_para.e > gcm.out 2>&1

Example of a mixed MPI/OpenMP job to launch a simulation

#!/bin/bash
#SBATCH --job-name=job_mpi_omp
#SBATCH --account=cin0391
### GENOA nodes accommodate 2 processors of 96 cores each, i.e. 192 cores overall
#SBATCH --constraint=GENOA
### Number of Nodes to use
#SBATCH --nodes=1
### Number of MPI tasks per node
#SBATCH --ntasks-per-node=24 
### Number of OpenMP threads per MPI task
#SBATCH --cpus-per-task=4
#SBATCH --threads-per-core=1
###SBATCH --exclusive
#SBATCH --output=job_mpi_omp_%A.out
#SBATCH --time=00:30:00 

#source env modules:
source ../trunk/LMDZ.COMMON/arch.env 
ulimit -s unlimited

### OMP_NUM_THREADS value must match "#SBATCH --cpus-per-task"
export OMP_NUM_THREADS=4
export OMP_STACKSIZE=400M

srun --cpu-bind=threads --label gcm_64x48x54_phymars_para.e > gcm.out 2>&1
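
Two points from the job scripts above lend themselves to a quick check: the number of MPI tasks times the number of OpenMP threads must fit within the 192 cores of a GENOA node, and OMP_NUM_THREADS must match "--cpus-per-task". The sketch below uses a hypothetical helper (fits_on_node) for the first point. For the second, it relies on the SLURM_CPUS_PER_TASK variable that Slurm sets inside a job when "--cpus-per-task" is given; the fallback value of 1 is an assumption for runs outside a job.

```shell
# Cores available on one GENOA node (2 processors of 96 cores each).
cores_per_node=192

# Return 0 if tasks x threads fits on one node.
# $1 = value of --ntasks-per-node, $2 = value of --cpus-per-task
fits_on_node() {
    [ $(( $1 * $2 )) -le "$cores_per_node" ]
}

fits_on_node 24 4 && echo "24 tasks x 4 threads = $((24 * 4)) cores: fits on one node"

# Follow the job header instead of typing the thread count twice.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```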


Using python

If you want to use Python on Adastra for quick analysis, you'll see that some basic packages (e.g. matplotlib) are unavailable. To solve this issue, you can create a Python virtual environment. Note that Adastra lets you maintain your own environment on the /work and /scratch partitions: you should not put it in your /home!

python3 -m venv virtual_environment

Then activate the environment with:

source path/virtual_environment/bin/activate

You can tell that the environment is active from the (virtual_environment) prefix at the beginning of your input line. From there, you can install any desired package with "pip". For example, here are the command lines I had to use to get matplotlib to work.

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow

pip install matplotlib

You may see that some packages are required beforehand; in some cases, you will need to install them manually. Once all packages are installed, you can use Python as you please, as long as the virtual environment is active in your terminal!
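
To avoid creating the environment under /home by accident, the location can be checked first. This is a minimal sketch: the helper name outside_home is hypothetical, and it assumes that the WORKDIR variable is set by the CINES environment (falling back to the current directory otherwise).

```shell
# Candidate location for the environment, outside HOME if WORKDIR is set.
venv_dir="${WORKDIR:-$PWD}/virtual_environment"

# Return 0 when the given path is NOT inside HOME.
outside_home() {
    case "$1" in
        "$HOME"/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Only report what would be done; the creation itself stays a manual step.
if outside_home "$venv_dir"; then
    echo "OK to create: python3 -m venv $venv_dir"
else
    echo "Refusing: $venv_dir is under HOME, choose a work or scratch path"
fi
```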

Using Ferret

Ferret is installed on Adastra, but not (yet) as a standard module to load... To be able to use Ferret you need to do the following:

module load develop
module load GCC-CPU-2.1.0
module load ferret/7.6.0

Using gdb4hpc

This is the default (only) debugger available... to use it you need to:

  1. Launch a request for an allocation on a compute node:
     salloc --account=cin0391 --constraint=GENOA --job-name="debug" --nodes=1 --time=1:00:00 --exclusive
    
  2. Identify which node it is linked to and directly ssh (from login node) to it, e.g. if it is node "c1516"
     ssh c1516
    
  3. Source your usual environment and then load the gdb4hpc module
     module load gdb4hpc/4.16.0.1
  4. Go to your work directory and launch gdb4hpc
  5. Within gdb4hpc:
     dbg all> launch $a{1} --launcher-args="--mpi=cray_shasta -A cin0391 --constraint=GENOA -t 00:30:00 -N 1 --cpu-bind=verbose,cores --exclusive"  ./executable.exe

Once everything is running, the first thing you have to do is set a breakpoint at the beginning of the program, e.g.:

break icosa_lmdz.f90:1

And then "continue" to that point
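
For step 2 above, the node name does not have to be read off the screen. This is a small sketch, assuming that salloc opens a shell in which Slurm exports SLURM_JOB_NODELIST (its usual behaviour, not verified on Adastra). The fallback "c1516" is only the placeholder node name from the example, and the command is printed rather than executed.

```shell
# With --nodes=1 the node list is a single node name.
node="${SLURM_JOB_NODELIST:-c1516}"

# Print the command to run from the login node.
echo "ssh ${node}"
```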

Are you being disconnected when inactive?

If you are regularly being disconnected after a short period of inactivity on the supercomputer, adding these few lines to the config file in the .ssh/ directory of the machine you log in from (e.g. ssh-out/spirit) may help:

Host *
...
KeepAlive yes
TCPKeepAlive yes
ServerAliveInterval 15