
Running Jobs On ICPL Cluster  

Accessing ICPL Cluster

To run jobs you must first access the gate servers. The gate servers connect the internet and
the university network on one side with the ICPL Cluster network on the other.
Access to the gate servers from outside the university network is permitted only over the SSH protocol on port 22.

The gate servers are :

The currently installed operating system is Linux CentOS 6.6.
For more information about the Linux operating system, see CentOS.
For more information about Linux shell commands, see explainshell.

Unix, Linux and macOS users can connect by opening a terminal and running:

ssh <USERNAME>@newgate1.phys.huji.ac.il

Windows users should install an SSH client such as PuTTY.
To open X11 windows, an X server for Windows such as Xming is also needed.

<USERNAME> should be replaced with the username you received.
The password will be sent separately (SMS, phone).

After the initial login, change your password with the yppasswd command.
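
X11 windows also require X11 forwarding on the ssh side. The following is a minimal sketch, assuming the OpenSSH client, where -X requests X11 forwarding and -p selects the port. The helper name gate_login_cmd and the user alice are hypothetical; the host name is the one given above.

```shell
# Gate server named in this document.
GATE_HOST="newgate1.phys.huji.ac.il"

# Hypothetical helper: print the ssh command line for a given username.
# -X requests X11 forwarding, -p 22 states the only permitted port explicitly.
gate_login_cmd() {
    printf 'ssh -X -p 22 %s@%s\n' "$1" "$GATE_HOST"
}

# "alice" is a placeholder; use the username you received.
gate_login_cmd alice
```

The printed command can be pasted into a terminal. Port 22 is the ssh default, so -p 22 may be omitted.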

Working with Environment Modules

The user environment is managed using Lmod, a Lua-based module system that provides a convenient way
to dynamically change the user's environment through modulefiles.
For more information, see Lmod.

View loaded modules
module list

View available modules
module avail
module avail openmpi

Add module
module add <MODULENAME>

Remove module
module del <MODULENAME>

Add a compiler
ml intel_parallel_studio_xe
module add intel_parallel_studio_xe

Add mpi
ml openmpi
module add openmpi

Remove mpi
ml -openmpi

Replace mpi with a newer version
module swap openmpi/1.6.3 openmpi/1.10.2
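
The individual commands above can be combined into one session setup. This is only a sketch: the helper name load_toolchain is hypothetical, and it assumes that the module names shown above are the ones installed and that starting from a clean environment (module purge) is what you want.

```shell
# Hypothetical helper: start from a clean environment, then load a compiler and MPI.
load_toolchain() {
    # "module" is a shell function provided by Lmod; it exists only on the cluster.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this on the gate server" >&2
        return 1
    fi
    module purge                          # unload everything currently loaded
    module add intel_parallel_studio_xe   # compiler, as named above
    module add openmpi                    # default Open MPI version
    module list                           # show the result
}

load_toolchain || echo "toolchain not loaded"
```

Call the function directly in your login shell (not inside a subshell) so that the environment changes persist.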

Submitting a job using the Slurm resource manager

Jobs on ICPL Cluster run either as batch jobs in an unattended manner or as a shell terminal in interactive mode.
Jobs are scripts with instructions on how and where to execute your work.
Typically a user logs in to the gate servers, prepares a job and submits it to the job queue.
The user can then disconnect from the system without interrupting the job. The job continues to run on
a designated node, and the user can later collect the data, read the output files, etc.
Further information about Slurm and its commands can be found in the Slurm documentation.
Jobs are managed by Slurm, which is in charge of
  • allocating the computer resources requested for the job.
    Each server is an x86_64 machine with 2 CPUs and 4 to 8 cores per CPU.
  • running the job and reporting the outcome of the execution back to the user.
Running a job involves, at a minimum, the following steps
  • Preparing a submission script and
  • Submitting the job for execution.
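
The two steps above can be sketched as follows. The file name hello.slurm matches the example used later on this page; the job name, output file name and the /bin/bash interpreter are assumptions made for illustration. The sbatch call is guarded so that the sketch also runs on a machine without Slurm.

```shell
# Step 1: prepare a submission script (all values here are illustrative).
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=serial
#SBATCH --ntasks=1
#SBATCH --output=hello.%j.out
# %j in the file name is replaced by the job ID

hostname
EOF

# Check the script for shell syntax errors before submitting.
bash -n hello.slurm && echo "hello.slurm: syntax OK"

# Step 2: submit the job (only possible where sbatch is installed).
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello.slurm
else
    echo "sbatch not found; submit from the gate server"
fi
```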
Job types on ICPL Cluster
  • serial
    Partition name: serial
    One CPU per job for unlimited time.
    The number of CPUs for serial jobs per user is limited to ~96.
  • shared memory
    Partition name: shmem
    One or more sockets (CPUs) on a single execution node.
    The number of CPUs for shared memory jobs per user is limited to ~320.
  • survey
    Partition name: survey
    One CPU per job for a short, limited time (usually 1:00:00, i.e. one hour).
    The number of CPUs for survey jobs per user is limited to ~640.
  • parallel
    Partition name: parallel
    One or more sockets (CPUs) on multiple nodes with a common network such as InfiniBand or Ethernet.
    The number of CPUs for parallel jobs per user is limited to ~384.
  • interactive shell
    Partitions: serial, shmem, parallel
    One or more CPUs. Provides an exclusive environment in which users can work in interactive mode, e.g.
    running a shell application or script without interrupting other processes.
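
To see the current state of the partitions listed above, Slurm's sinfo command can be used. The sketch below is an assumption about how one might wrap it; the helper name show_partitions is hypothetical, and the guard lets the snippet run on a machine without Slurm.

```shell
# Hypothetical helper: summarize the four batch partitions described above.
show_partitions() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; run this on the gate server" >&2
        return 1
    fi
    # --summarize prints a per-partition state summary without node state details.
    sinfo --summarize --partition=serial,shmem,survey,parallel
}

show_partitions || echo "partition summary unavailable"
```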

Slurm Commands


Job scripts are submitted with the sbatch command, e.g.:

% sbatch hello.slurm

The job identification number is returned when you submit the job, e.g.:

% sbatch hello.slurm
Submitted batch job 18341

For further information about the sbatch command, type man sbatch on the gate server.
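
When a job is submitted from a script, it is often convenient to keep only the job identification number. A minimal sketch, assuming the "Submitted batch job <JOBID>" message format shown above; the helper name job_id_from_sbatch is hypothetical.

```shell
# Hypothetical helper: read sbatch output on stdin and print only the job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstration on the sample output shown above.
echo "Submitted batch job 18341" | job_id_from_sbatch
```

In a bash or sh session on the gate server this could be used as JOBID=$(sbatch hello.slurm | job_id_from_sbatch).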


Displaying job status

The squeue command is used to obtain status information about all jobs submitted to all queues.
Without any options, squeue lists the jobs of all users, one job per line.

For further information about the squeue command, type man squeue on the gate server.
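
squeue output can also be post-processed. The sketch below counts jobs per state; the helper name count_states is hypothetical, and the sample input is made up so that the snippet runs without Slurm. It assumes the standard squeue options --noheader, -u and -o, with %t printing the compact job state.

```shell
# Hypothetical helper: read one job state per line on stdin, print "STATE COUNT".
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# On the gate server the input would come from squeue, for example:
#   squeue --noheader -u "$USER" -o "%t" | count_states
# Demonstration with made-up states (R = running, PD = pending):
printf 'R\nPD\nR\n' | count_states
```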


SLURM provides the scancel command for deleting jobs from the system using the job identification number:

% scancel <JOBID>

If you did not note the job identification number (JOBID) when it was submitted, you can use squeue to retrieve it.
For further information about the scancel command, type man scancel on the gate server.
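
Because scancel acts immediately, a small guard against typing mistakes can help. This is a sketch only: the wrapper name cancel_job is hypothetical, and it deliberately accepts plain numeric job IDs and nothing else.

```shell
# Hypothetical wrapper: refuse anything that is not a plain numeric job ID.
cancel_job() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: cancel_job <JOBID>" >&2
            return 2
            ;;
    esac
    scancel "$1"
}

# Rejected before scancel is ever called:
cancel_job "not-a-job-id" || echo "rejected"
```

To cancel all of your own jobs at once, scancel also accepts a user name: scancel -u $USER.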


Run a job on the Slurm cluster directly from the shell, e.g.:

% hostname
% srun hostname

Note that the second command runs on a remote host.

Run a program on 24 CPUs:
srun -n 24 --partition=parallel mpirun ~/hello_world

The prompt returns only after the command has finished. Output is directed to STDOUT and STDERR.


alias to:
\squeue -o "%.10i %20j %10u %3t %10m %12M %15l %6D %5C %9P %9q %R"


alias to:
squeue -u $USER

Running in interactive mode


The srsh command is an alias that opens an interactive shell on another node. It expands to:
srun --qos=serial --partition=serial -J srsh.$USER --pty /bin/tcsh -l

Script Examples

sbatch requires the first line of a job script to name an interpreter. The /bin/bash line in the
examples below is an assumption; use the shell you normally work with.

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --output=/path/to/output/file
#SBATCH --error=/path/to/error/file
#SBATCH --ntasks=1

# run whatever you need here



#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --output=/path/to/output/file
#SBATCH --error=/path/to/error/file

#SBATCH --ntasks=1

# run whatever you need here



#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --output=/path/to/output/file
#SBATCH --error=/path/to/error/file

#SBATCH --sockets-per-node=1

# run whatever you need here



#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --output=/path/to/output/file
#SBATCH --error=/path/to/error/file

#SBATCH --partition=parallel
#SBATCH --ntasks=144

# run whatever you need here
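
As an illustration of what might replace the "run whatever you need here" line, the sketch below writes a complete MPI job script. Everything in it is an assumption: the file name mpi_hello.slurm, the job name, the task count, the /bin/bash interpreter, that the module command is available inside the batch shell, and that the installed Open MPI takes its task count from the Slurm allocation. The program ~/hello_world is the one used in the srun example above.

```shell
# Write a hypothetical MPI job script; every value is illustrative.
cat > mpi_hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi_hello
#SBATCH --partition=parallel
#SBATCH --ntasks=24
#SBATCH --output=mpi_hello.%j.out
#SBATCH --error=mpi_hello.%j.err

# Load the MPI runtime inside the job, then start the program.
module add openmpi
mpirun ~/hello_world
EOF

# Syntax check only; submit with "sbatch mpi_hello.slurm" on the gate server.
bash -n mpi_hello.slurm && echo "mpi_hello.slurm: syntax OK"
```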


© All rights reserved to The Hebrew University of Jerusalem