icpl

Partiitons on ICPL Cluster

seker/survey
Partition Name: seker
By core (limited for 120 minutes)
socket
Partition Name: socket
One or more cpu (= socket) per job on one or multiple nodes for unlimited time.
core
Partition Name: core
One or more core on one or multiple nodes for unlimited time.

Slurm Commands

sbatch

Job scripts are submitted with the sbatch command, e.g.:

% sbatch hello.slurm

The job identification number is returned when you submit the job, e.g.:

% sbatch hello.slurm
Submitted batch job 18341

For further information about the sbatch command, type man squeue on the gate server.

squeue

Displaying job status

The squeue command is used to obtain status information about all jobs submitted to all queues.
Without any specified options, the squeue command provides a display which is similar to the following:

For further information about the squeue command, type man squeue on the gate server.

scancel

SLURM provides the scancel command for deleting jobs from the system using the job identification number:

% scancel <JOBID>

If you did not note the job identification number (JOBID) when it was submitted, you can use squeue to retrieve it.
For further information about the scancel command, type man squeue on the gate server.

srun

run a job on slurm cluster directly from the shell, e.g.:

% hostname
newgate1.phys.huji.ac.il
% srun hostname
ibm12g10

note the second command ran on a remote host.

run a 24 cpu's program.
srun -n 24 --partition=parallel mpirun ~/hello_world.

the prompt will return only after the command is finished. output will be directed to STDOU and STDERR.

smq

alias to:
\squeue -o "%.10i %20j %10u %3t %10m %12M %15l %6D %5C %9P %9q %R"

alias to:
squeue -u $USER

Running in interactive mode

srsh

The srsh command is an alias used to open interactive shell on another node.
srun --qos=serial --partition=serial -J srsh.$USER --pty /bin/tcsh -l

Job Submit Example

Submit file examle

create submit file named myjob.sbatch with the following contents as example:
( Note: #SBATCH is a slurm command not a remark, ##SBATCH is a remark )

#!/bin/sh
#
#SBATCH --job-name=myjob
#SBATCH --output=out.%x.out_%j
#SBATCH --error=out.%x.err_%j
## for mpi program
#SBATCH --partition=core
#SBATCH --ntasks=64
## for serial program
##SBATCH --partition=socket
##SBATCH --ntasks=1

### here you can load modules. example:
## ml purge
## ml load gcc/10.1.0 openmpi/4.0.4/gcc/10.1.0 slurm/19.05.2 python3/3.6.3

exec=myprogram
echo "limits: "
ulimit -a
echo ""
echo "= submitted from: `hostname`"
echo "= submitted from: $SLURM_SUBMIT_HOST"
echo "= Current working directory is `pwd`"
echo "= Running on hosts: $SLURM_NODELIST"
echo "= Running on $SLURM_NNODES nodes."
echo "= Running $SLURM_NTASKS tasks."
echo "= ldd of ${exec} "
ldd ${exec}
echo "= mpi: `which mpirun` "

## for mpi program use
mpirun --display-map --display-allocation --report-bindings ${exec}
## for non mpi program use
## ${exec}

submit the job by running: sbatch myjob.sbatch


Faculty of Science Racah Institute of Physics ICPL site

Running Jobs On ICPL Cluster
Accessing ICPL Cluster In order to run jobs you must first access the gate servers. The gate servers connect between the internet and the university network on one side and the ICPL Cluster network the other. Accessing the gate server from outside the university network is permitted only using ssh protocol on port 22345. The gate servers are : newgate1.phys.huji.ac.il newgate2.phys.huji.ac.il newgate3.phys.huji.ac.il newgate4.phys.huji.ac.il Currently installed operation system is Linux CentOS For more information about Linux operating system check CentOS. For more information about linux shell commands check explainshell. Unix, Linux & MAC OS users can access by open a terminal and run: ssh <USERNAME>@<SERVERNAME> Or : ssh <USERNAME>@newgate1.phys.huji.ac.il Windows users should install ssh client such as putty. X-server for windows ( to open X11 windows ) : xming Change Port 22 to 22345 <username> should be replaced with the username you received. Password will be sent seperatly (sms,phone). After initial login you should change your password using the command yppasswd. Working with Environment Modules User environment is managed using Lmod, Lua based module system that provides a convenient way to dynamically change the users' environment through modulefiles. For more information about - Lmod View loaded modules ml module list View available modules module avail module avail openmpi Add module ml <MODULENAME> module add <MODULENAME> Del module ml -<MODULENAME> module add <MODULENAME> Add a compiler ml intel_parallel_studio_xe module add intel_parallel_studio_xe Add mpi ml openmpi module add openmpi Remove mpi ml -openmpi Replace mpi with a newer version module swap openmpi/1.6.3 openmpi/1.10.2 Submitting a job using slurm resource manager Jobs on ICPL Cluster are run as batch jobs in an unattended manner or as a shell terminal in interactive mode. Jobs are scripts with instructions how and where to execute your work. Typically a user logs in to the gate servers, prepares a job and submits it to the job queue. The user can then disconnect from the system without interupting the job. the job will continue to run on a designated node and the user can collect the data, read the output files etc. Further information about slurm and slurm commands can be found here. Jobs are managed by Slurm, which is in charge of allocating the computer resources requested for the job running the job and reporting the outcome of the execution back to the user. Running a job involves, at the minimum, the following steps Preparing a submission script and Submitting the job for execution. Partiitons on ICPL Cluster seker/survey Partition Name: seker By core (limited for 120 minutes) socket Partition Name: socket One or more cpu (= socket) per job on one or multiple nodes for unlimited time. core Partition Name: core One or more core on one or multiple nodes for unlimited time. Slurm Commands sbatch Job scripts are submitted with the sbatch command, e.g.: % sbatch hello.slurm The job identification number is returned when you submit the job, e.g.: % sbatch hello.slurm Submitted batch job 18341 For further information about the sbatch command, type man squeue on the gate server. squeue Displaying job status The squeue command is used to obtain status information about all jobs submitted to all queues. Without any specified options, the squeue command provides a display which is similar to the following: For further information about the squeue command, type man squeue on the gate server. scancel SLURM provides the scancel command for deleting jobs from the system using the job identification number: % scancel <JOBID> If you did not note the job identification number (JOBID) when it was submitted, you can use squeue to retrieve it. For further information about the scancel command, type man squeue on the gate server. srun run a job on slurm cluster directly from the shell, e.g.: % hostname newgate1.phys.huji.ac.il % srun hostname ibm12g10 note the second command ran on a remote host. run a 24 cpu's program. srun -n 24 --partition=parallel mpirun ~/hello_world. the prompt will return only after the command is finished. output will be directed to STDOU and STDERR. smq alias to: \squeue -o "%.10i %20j %10u %3t %10m %12M %15l %6D %5C %9P %9q %R" sq alias to: squeue -u $USER Running in interactive mode srsh The srsh command is an alias used to open interactive shell on another node. srun --qos=serial --partition=serial -J srsh.$USER --pty /bin/tcsh -l Job Submit Example Submit file examle create submit file named myjob.sbatch with the following contents as example: ( Note: #SBATCH is a slurm command not a remark, ##SBATCH is a remark ) #!/bin/sh # #SBATCH --job-name=myjob #SBATCH --output=out.%x.out_%j #SBATCH --error=out.%x.err_%j ## for mpi program #SBATCH --partition=core #SBATCH --ntasks=64 ## for serial program ##SBATCH --partition=socket ##SBATCH --ntasks=1 ### here you can load modules. example: ## ml purge ## ml load gcc/10.1.0 openmpi/4.0.4/gcc/10.1.0 slurm/19.05.2 python3/3.6.3 exec=myprogram echo "limits: " ulimit -a echo "" echo "= submitted from: `hostname`" echo "= submitted from: $SLURM_SUBMIT_HOST" echo "= Current working directory is `pwd`" echo "= Running on hosts: $SLURM_NODELIST" echo "= Running on $SLURM_NNODES nodes." echo "= Running $SLURM_NTASKS tasks." echo "= ldd of ${exec} " ldd ${exec} echo "= mpi: `which mpirun` " ## for mpi program use mpirun --display-map --display-allocation --report-bindings ${exec} ## for non mpi program use ## ${exec} submit the job by running: sbatch myjob.sbatch