Using Adastra
Latest revision as of 09:20, 14 October 2024
This page provides a summary of examples and tools designed to help you get used to the Adastra environment.
Getting access to the cluster
For people on the "Atmosphères Planétaires" GENCI project who need to open an account on Adastra, here is the procedure:
- Go to https://www.edari.fr/utilisateur and log in via Janus. If you don't have a Janus login, or if logging in this way doesn't work, create a new eDARI account. (Make sure your profile is fully up to date, including nationality.)
- Beware! If you are affiliated with two labs (LMD and LATMOS, for example), you must register with the email address that corresponds to your Janus account.
- Click on "se rattacher à un dossier ayant obtenu des resources" or "Attach yourself to an application file that has obtained resources"
- "Atmosphères Planétaires" project number to provide: A0160110391
- Ehouarn then receives an email to allow you to join the project. Once he has validated it, you receive a confirmation mail.
- Once approved, you have to request an account: click on "CINES: créer une demande d'ouverture de compte".
- Fill in the forms: name, contract end date, CINES, your lab information (LMD is the default).
- Access IP address: 134.157.47.46, FQDN (Fully Qualified Domain Name): ssh-out.lmd.jussieu.fr
- Add a second address: 134.157.176.129, FQDN: spirit2.ipsl.fr
- Click on the option to have access to CCFR (only important if you have access to other GENCI machines).
- The security officer for LMD is Julien Lenseigne (his details are all pre-filled, except his phone number: +33169335172).
- YOU MUST THEN VALIDATE THE REQUEST: click on "Valider la saisie des informations".
- You then receive an automatic email, but it only tells you to go to the next step: download the pre-filled form from eDARI (find "télécharger la demande" and download the PDF), sign it, and upload it on eDARI under "déposer la demande de création de compte".
- Wait for your application to be preprocessed by the system...
A couple of pointers
- Connecting to Adastra: for those who had an account on Occigen, group and login credentials have been retained. To connect to Adastra you first need to go through the LMD gateway (hakim) or the IPSL gateway (Ciclad/Spirit) and then run
ssh your_cines_login@adastra.cines.fr
And then you will probably want to switch project using the myproject command, e.g. to switch to "lmd1167" (the old "Atmosphères Planétaires" GENCI project)
myproject -a lmd1167
and to switch to "cin0391" (the 2023-2024 "Atmosphères Planétaires" GENCI project)
myproject -a cin0391
WARNING: when you switch projects, you also switch HOME directory etc.
To get all the info about dedicated environment variables (e.g. paths to SCRATCH, STORE, etc.) you can use
myproject -c
- To get all the information about project accounting (number of hours available and used by each member of the project) you need to connect to https://reser.cines.fr/ using your Adastra login and password
- Changing the password of your CINES account
When your password is close to expiring, CINES asks you to change it on this website: https://rosetta.cines.fr
Please note that you can access this website only from a machine that you declared as a gateway for Adastra. At LMD, we have generally declared hakim.lmd.jussieu.fr (aka ssh-out) and spirit2.ipsl.fr as gateway machines. Hakim doesn't have any browser installed, but you can launch firefox on Spirit and connect to the rosetta website.
If that doesn't work, check out the page on How to launch your local browser through a gateway machine or email svp@cines.fr
- Link to the Adastra technical documentation: https://dci.dci-gitlab.cines.fr/webextranet/
- Link to the project accounting webpage, where you can find out how many hours we have left on the project and details about everyone's use of Adastra (login and password are those of your Adastra account): https://reser.cines.fr
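The two-step connection described at the top of this section (gateway first, then Adastra) can also be written as a single command with the OpenSSH `-J` (ProxyJump) option. This is only a minimal sketch: it assumes your gateway account allows jump connections, and the function name `adastra_ssh_command` and both login names are hypothetical placeholders.

```shell
#!/bin/bash
# Build (but do not run) the one-hop ssh command to Adastra through a gateway.
# Usage: adastra_ssh_command GATEWAY_LOGIN CINES_LOGIN [GATEWAY_HOST]
adastra_ssh_command() {
    local gateway_login="$1"
    local cines_login="$2"
    # Default gateway: the LMD one declared in the account request.
    local gateway_host="${3:-ssh-out.lmd.jussieu.fr}"
    printf 'ssh -J %s@%s %s@adastra.cines.fr\n' \
        "$gateway_login" "$gateway_host" "$cines_login"
}

# Print the command for inspection; the two logins are placeholders.
adastra_ssh_command my_lmd_login my_cines_login
```

Printing the command rather than running it makes it easy to check the logins before connecting; pass `spirit2.ipsl.fr` as third argument to go through the IPSL gateway instead.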
Disks and workspaces
- All the details are in the Adastra documentation: https://dci.dci-gitlab.cines.fr/webextranet/data_storage_and_transfers/index.html
- If you want to know the current quota (in HOMEDIR, WORKDIR and SCRATCHDIR) allocated to the project (yes, quotas are for the whole group):
myproject -s cin0391
- In a nutshell: we have lots of space on the WORKDIR (250 TB), which is "permanent" (unlike the SCRATCHDIR, where files older than 30 days are purged), so use it! And when you want to archive things, make some large tar files and put them on the STOREDIR.
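The archiving advice above (a few large tar files rather than many small files) could look like the sketch below. It assumes that `$WORKDIR` and `$STOREDIR` are set as environment variables in your session (check with `myproject -c`); the function name `archive_run` and the directory `my_simulation` are hypothetical.

```shell
#!/bin/bash
# Pack one directory into a single compressed tar file in an archive area.
# Usage: archive_run SOURCE_DIR DEST_DIR
archive_run() {
    local source_dir="$1"
    local dest_dir="$2"
    local name
    name="$(basename "$source_dir")"
    # -C makes the paths stored in the archive relative to the parent directory.
    tar -czf "$dest_dir/$name.tar.gz" -C "$(dirname "$source_dir")" "$name"
}

# Example (commented out): archive a finished run from WORKDIR to STOREDIR.
# archive_run "$WORKDIR/my_simulation" "$STOREDIR"
```

You can list the content of the resulting file with `tar -tzf` before deleting the original directory.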
Transferring data from Irene
You can use the ccfr "speedway" between national computing centers to copy data from Irene to Adastra (it is all explained here: https://dci.dci-gitlab.cines.fr/webextranet/data_storage_and_transfers/index.html#between-computing-site-ccfr ). To summarize:
- First check that you did ask for ccfr access when you created your account: run the "id" command on Adastra and check that you are a registered member of the "22011(cinesccfr)" group. If not, ask the CINES helpdesk: svp@cines.fr
- Connect to Adastra the usual way and, once on Adastra, run "ssh adastra-ccfr.cines.fr", which should land you on "login1", the node enabled to use the ccfr connection.
- Once on login1 you can transfer data from Irene via scp or rsync using the appropriate gateway machine (on the Irene side), which is "irene-fr-ccfr-gw.ccc.cea", e.g.:
rsync -avz irenelogin@irene-fr-ccfr-gw.ccc.cea:irene_path_to_your_data adastra_path_to_your_data
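The group check from the first step can be scripted instead of reading the output of `id` by eye. A minimal sketch, where the helper name `in_group` is hypothetical and the group name `cinesccfr` is the one quoted above:

```shell
#!/bin/bash
# Return 0 if the current user belongs to the named group, 1 otherwise.
in_group() {
    local wanted="$1"
    local group
    # "id -Gn" prints the names of all the groups of the current user.
    for group in $(id -Gn); do
        [ "$group" = "$wanted" ] && return 0
    done
    return 1
}

if in_group cinesccfr; then
    echo "ccfr access: OK"
else
    echo "ccfr access: missing, contact svp@cines.fr"
fi
```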
Submitting jobs
Jobs are managed by SLURM: you need to write a job script and submit it using sbatch
sbatch myjob
You must specify in the header of the job which project resources you are using ("cin0391" in our case):
#SBATCH --account=cin0391
Example of an MPI job to launch a simulation
#!/bin/bash
#SBATCH --job-name=job_mpi
#SBATCH --account=cin0391
### GENOA nodes accommodate 2 processors of 96 cores each, i.e. 192 cores overall
#SBATCH --constraint=GENOA
### Number of Nodes to use
#SBATCH --nodes=1
### Number of MPI tasks per node
#SBATCH --ntasks-per-node=48
### Number of OpenMP threads per MPI task
#SBATCH --cpus-per-task=1
#SBATCH --threads-per-core=1
###SBATCH --exclusive
#SBATCH --output=job_mpi_%A.out
#SBATCH --time=00:45:00
#source env modules:
source ../trunk/LMDZ.COMMON/arch.env
ulimit -s unlimited
srun --cpu-bind=threads --label gcm_96x96x78_phyvenus_para.e > gcm.out 2>&1
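Before submitting, it can help to check that the number of tasks per node times the number of cores per task fits on one GENOA node (192 cores, as stated in the job header). The sketch below is only an illustration: it assumes one hardware thread per core (as with `--threads-per-core=1`), and the names `GENOA_CORES_PER_NODE` and `cores_per_node` are hypothetical.

```shell
#!/bin/bash
# Number of cores on a GENOA node (2 processors of 96 cores each).
GENOA_CORES_PER_NODE=192

# Print the number of cores one node would use; fail if it exceeds the node size.
# Usage: cores_per_node NTASKS_PER_NODE CPUS_PER_TASK
cores_per_node() {
    local ntasks_per_node="$1"
    local cpus_per_task="$2"
    local used=$(( ntasks_per_node * cpus_per_task ))
    echo "$used"
    [ "$used" -le "$GENOA_CORES_PER_NODE" ]
}

# The MPI job above: 48 tasks per node, 1 core per task.
cores_per_node 48 1
```

The function returns a non-zero status when the request does not fit, so it can be used as a guard in a submission wrapper.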
Example of a mixed MPI/OpenMP job to launch a simulation
#!/bin/bash
#SBATCH --job-name=job_mpi_omp
#SBATCH --account=cin0391
### GENOA nodes accommodate 2 processors of 96 cores each, i.e. 192 cores overall
#SBATCH --constraint=GENOA
### Number of Nodes to use
#SBATCH --nodes=1
### Number of MPI tasks per node
#SBATCH --ntasks-per-node=24
### Number of OpenMP threads per MPI task
#SBATCH --cpus-per-task=4
#SBATCH --threads-per-core=1
###SBATCH --exclusive
#SBATCH --output=job_mpi_omp_%A.out
#SBATCH --time=00:30:00
#source env modules:
source ../trunk/LMDZ.COMMON/arch.env
ulimit -s unlimited
### OMP_NUM_THREADS value must match "#SBATCH --cpus-per-task"
export OMP_NUM_THREADS=4
export OMP_STACKSIZE=400M
srun --cpu-bind=threads --label gcm_64x48x54_phymars_para.e > gcm.out 2>&1
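Since OMP_NUM_THREADS must match "--cpus-per-task", one way to avoid a mismatch is to derive it from the SLURM_CPUS_PER_TASK variable that SLURM sets inside a job when "--cpus-per-task" is given. This is a sketch of an alternative, not the method used in the script above; the function name `set_omp_threads` is hypothetical.

```shell
#!/bin/bash
# Set OMP_NUM_THREADS from the SLURM allocation.
# Falls back to 1 thread when SLURM_CPUS_PER_TASK is unset (e.g. outside a job).
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With this in the job script, changing "--cpus-per-task" in the header is enough; the export line no longer needs to be edited by hand.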
Using python
If you want to use python on Adastra for quick analysis, you'll see that some basic packages are unavailable (e.g. matplotlib). To solve this issue, you may create a Python virtual environment. Note that Adastra lets you maintain your own environment on the /work and /scratch partitions: you should not put it in your /home!
python3 -m venv virtual_environment
Then activate the environment with:
source path/virtual_environment/bin/activate
You will see that the environment is active in your terminal with a (virtual_environment) at the beginning of your input line. From there, you can install any desired package with "pip". For example, here are the command lines I had to use to get matplotlib to work:
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow
pip install matplotlib
You may see that some packages are required beforehand: in some cases, you will need to install them manually. Once all packages are installed, you may use python as you please, as long as the virtual environment is active in your terminal!
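To avoid creating the environment in your home directory by mistake, the creation step can be wrapped in a small guard. A minimal sketch, assuming `$WORKDIR` points to your work space (check with `myproject -c`); the function names `outside_home` and `make_venv` are hypothetical.

```shell
#!/bin/bash
# Return 0 if the given absolute path lies outside $HOME, 1 otherwise.
outside_home() {
    case "$1" in
        "$HOME"|"$HOME"/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Create the virtual environment only if the target is not under $HOME.
# Usage: make_venv /absolute/path/to/virtual_environment
make_venv() {
    if outside_home "$1"; then
        python3 -m venv "$1"
    else
        echo "Refusing to create a venv under \$HOME: $1" >&2
        return 1
    fi
}

# Example (commented out): create and activate an environment on the work space.
# make_venv "$WORKDIR/virtual_environment" && . "$WORKDIR/virtual_environment/bin/activate"
```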
Using Ferret
Ferret is installed on Adastra, but not (yet) as a standard module to load. To be able to use Ferret you need to do the following:
module load develop
module load GCC-CPU-2.1.0
module load ferret/7.6.0
Using gdb4hpc
This is the default (and only) debugger available. To use it you need to:
- Launch a request for an allocation on a compute node:
salloc --account=cin0391 --constraint=GENOA --job-name="debug" --nodes=1 --time=1:00:00 --exclusive
- Identify which node it is linked to and directly ssh (from login node) to it, e.g. if it is node "c1516"
ssh c1516
- Source your usual environment and then load the gdb4hpc module
module load gdb4hpc/4.16.0.1
- Go to your work directory and launch gdb4hpc
- Within gdb4hpc:
dbg all> launch $a{1} --launcher-args="--mpi=cray_shasta -A cin0391 --constraint=GENOA -t 00:30:00 -N 1 --cpu-bind=verbose,cores --exclusive" ./executable.exe
Once everything is running, the first thing you have to do is set a breakpoint at the beginning of the program, e.g.:
break icosa_lmdz.f90:1
And then "continue" to that point.
Are you being disconnected when inactive?
If you are regularly being disconnected after a short period of inactivity on the supercomputer, adding these few lines to the config file in the .ssh/ directory of your login machine (e.g. ssh-out/spirit) may help:
Host *
...
KeepAlive yes
TCPKeepAlive yes
ServerAliveInterval 15