Using Irene Rome
Revision as of 23:03, 14 September 2022
This page provides a summary of examples and tools designed to help you get familiar with the Irene Rome environment (as of July 2022).
How to access the cluster
For people on the "Atmosphères Planétaires" GENCI project who need to open an account on Irene-Rome, here is the procedure:
- Log on to https://www-dcc.extra.cea.fr/CCFR/ and provide various information about yourself
A few tips:
- choose TGCC
- give your PROFESSIONAL phone number (and not your personal cell phone number)
- name of the project: Atmosphères Planétaires Numéro du Dossier: A0120110391
- Responsable scientifique du projet: M. Ehouarn MILLOUR , ehouarn.millour@lmd.ipsl.fr, 0144275286, Nationalité: Fr
- Responsable sécurité: M. Franck Guyon, franck.guyon@lmd.ipsl.fr, 0144275277, Nationalité: Fr
- IPs & machine names to connect to Irene: 134.157.47.46 (ssh-out.lmd.jussieu.fr) and 134.157.176.129 (ciclad.ipsl.upmc.fr)
- Choose anything you want for the 8-character password
- And then get Ehouarn to sign the form and forward it to Franck for him to sign as well.
- Send the signed form to hotline.tgcc@cea.fr
Some useful commands
- To access the disks of our project on Irene ("Atmosphères Planétaires" GENCI project), add the following line in your .bashrc file:
module switch dfldatadir/gen10391
- To access your work directory (to run your simulations)
cd /ccc/work/cont003/gen10391/
you can also access the work directory with:
cd $CCCWORKDIR
- To access your store directory (to store big data files; we are limited in inode number, not in file size! It is recommended to store files of at least 50M, preferably more, e.g. big tar files of 10G or more)
cd /ccc/store/cont003/gen10391/
you can also access the store directory with:
cd $CCCSTOREDIR
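Since the store quota counts inodes (number of files) rather than bytes, it helps to bundle many small outputs into one large tar archive before moving it there. A minimal, self-contained sketch (directory and file names are hypothetical; on Irene you would move the archive to $CCCSTOREDIR at the end):

```shell
# Pack many small output files into one tar archive (names hypothetical).
RUN_DIR=$(mktemp -d)/my_simulation_outputs
mkdir -p "$RUN_DIR"
touch "$RUN_DIR/diagfi1.nc" "$RUN_DIR/diagfi2.nc"   # stand-ins for real output
tar -cf "$RUN_DIR.tar" -C "$(dirname "$RUN_DIR")" "$(basename "$RUN_DIR")"
tar -tf "$RUN_DIR.tar" | wc -l    # 3 entries: the directory plus two files
# on Irene you would then do: mv "$RUN_DIR.tar" $CCCSTOREDIR/
```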
- To access the scratch directory
cd $CCCSCRATCHDIR
IMPORTANT: the scratch dir offers fast access and is very big, BUT it is regularly and automatically purged! If you use it, do remember to back up your files to the WORKDIR or STOREDIR.
The scratch purge policy (from machine.info):
* Files not accessed for 60 days are automatically purged
* Symbolic links are not purged
* Directories that have been empty for more than 30 days are removed
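Given that purge policy, a one-line sketch (assuming $CCCSCRATCHDIR is set; it falls back to the current directory otherwise) to spot purge candidates before they vanish:

```shell
# List files not accessed for 60 days, i.e. candidates for the next
# automatic purge, so they can be backed up to WORKDIR or STOREDIR in time.
find "${CCCSCRATCHDIR:-.}" -type f -atime +60 -print
```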
- To access Irene Interactive Documentation:
machine.info
NB: you can also access the online documentation here: http://www-hpc.cea.fr/tgcc-public/en/html/toc/fulldoc/Introduction.html
- To display information about project accounting:
ccc_myproject
- To check user and group disk quotas:
ccc_quota -a
- To check how long your password will remain active:
ccc_password_expiration
- To change your password:
passwd
Example of a job to run a GCM simulation
Mixed OpenMP / MPI
#!/bin/bash
# Partition to run on:
#MSUB -q rome
# project to run on
#MSUB -A gen10391
# disks to use
#MSUB -m scratch,work,store
# Job name
#MSUB -r run_gcm
# Job standard output:
#MSUB -o run_gcm.%I
# Job standard error:
#MSUB -e run_gcm.%I
# number of OpenMP threads (-c):
#MSUB -c 2
# number of MPI tasks (-n):
#MSUB -n 16
# number of nodes to use (-N):
#MSUB -N 1
# max job run time (-T, in seconds):
#MSUB -T 3600
# request exclusive use of the node (128 cores)
##MSUB -x
source ../trunk/LMDZ.COMMON/arch.env
export OMP_STACKSIZE=400M
export OMP_NUM_THREADS=2
ccc_mprun -l gcm_32x32x15_phystd_para.e > gcm.out 2>&1
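A quick sanity check of the geometry requested above can be sketched as plain shell arithmetic (the variable names are mine; the values mirror the #MSUB directives, and 128 is the number of cores per Rome node):

```shell
# Check that n MPI tasks x c OpenMP threads fit on N nodes of 128 cores.
CORES_PER_NODE=128
n=16   # MPI tasks            (#MSUB -n)
c=2    # OpenMP threads/task  (#MSUB -c)
N=1    # nodes                (#MSUB -N)
needed=$(( n * c ))
available=$(( N * CORES_PER_NODE ))
if [ "$needed" -le "$available" ]; then
  echo "OK: $needed cores requested, $available available"
else
  echo "ERROR: $needed cores requested but only $available available" >&2
fi
```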
Pure MPI + long run
The most important parameter for MPI-only runs is -n, which sets the total number of CPUs (ideally a multiple of 128, the number of cores per node).
Runs longer than 1 day are not possible unless you select the "long" quality-of-service (QoS).
#!/bin/bash
# Partition to run on:
#MSUB -q rome
# project to run on
#MSUB -A gen10391
# disks to use
#MSUB -m scratch,work,store
# Job name
#MSUB -r run_wrf
# Job standard output:
#MSUB -o run_wrf.%I
# Job standard error:
#MSUB -e run_wrf.%I
# number of MPI tasks n (total)
#MSUB -n 256
# max job run time T (in seconds)
#MSUB -T 259200
# select quality-of-service
# - test < 30min
# - normal < 1d (default)
# - long < 3d
#MSUB -Q long
#############################
## WRF 257x257 with 1024 proc
## leads to 8x8 tiles
## 32 tasks over X
## 32 tasks over Y
#############################
## WRF 129x129 with 256 proc
## leads to 8x8 tiles
## 16 tasks over X
## 16 tasks over Y
#############################
# load the modules used to compile
source arch.env
# clean the logs
rm -rf rsl.*
# create initial state
# -- this is done on a single proc
ideal.exe
mv rsl.error.0000 ideal_rsl.error.0000
mv rsl.out.0000 ideal_rsl.out.0000
# main launch
ccc_mprun -l wrf.exe
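The decomposition arithmetic from the comments above can be sketched in shell (variable names are mine; WRF itself chooses a near-square factorization of the task count, here 16x16 for 256 tasks):

```shell
# WRF splits the (nx-1) x (ny-1) computational grid over a
# tasks_x x tasks_y grid of MPI tasks; each task then owns one tile.
n=256          # total MPI tasks (#MSUB -n)
nx=129         # grid points in X
ny=129         # grid points in Y
tasks_x=16     # sqrt(256) for a square task layout
tasks_y=$(( n / tasks_x ))
tile_x=$(( (nx - 1) / tasks_x ))
tile_y=$(( (ny - 1) / tasks_y ))
echo "$tasks_x tasks over X, $tasks_y tasks over Y, ${tile_x}x${tile_y} tiles"
```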
Main submission commands
- To launch the job script run_gcm.job:
ccc_msub run_gcm.job
- To display information about your jobs:
ccc_mpp -u $USER
- To kill job number jobid
ccc_mdel jobid
- To display infos about project accounting:
ccc_myproject
- To display infos about limits:
ccc_mqinfo
Extra Tips
- If you encounter a quota issue on Irene, first check:
ccc_quota
if you get a "disk quota exceeded" error message, it might be because your files/scripts do not have the correct access rights. To solve this, use the following command on all your directories (before transferring them to Irene):
chmod -R g+s NAME_OF_DIR
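A self-contained demo (on a throwaway directory, so the paths here are not yours) of what that command achieves: with the setgid bit set on directories, new files inherit the directory's group, so they count against the right group quota.

```shell
# Set the setgid bit recursively, then verify no directory is missing it.
demo=$(mktemp -d)
mkdir -p "$demo/results/plots"
chmod -R g+s "$demo"
missing=$(find "$demo" -type d ! -perm -g+s)
echo "dirs missing setgid: ${missing:-none}"
```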
File transfer from Occigen
To transfer files, use the ccfr tools:
module load ccfr
A list of available machines is given by
ccfr_ssh -v
To log on to Occigen (from Irene):
ccfr_ssh occigenlogin@cines
To copy a file:
ccfr_cp occigenlogin@cines:remote_dir local_dir
Worth knowing about
- The command wget is disabled on Irene; scripts using it will fail.
- Only "https" is allowed (for svn co, git clone, etc.)
In case you run into a quota issue, use the following command on your directory before sending the data:
chmod -R g+s NAME_OF_DIR