How to Run a Job on HPC Using Simple Linux Utility for Resource Management

This is a minimal working example of how to submit a job on an HPC system using Slurm. Slurm is shorthand for "Simple Linux Utility for Resource Management", an open-source job scheduler that allocates compute resources on clusters to queued, researcher-defined jobs. Slurm has been deployed at various national and international computing centers and on approximately 60% of the TOP500 supercomputers in the world. See the detailed instructions. Submitting a job, to be executed immediately or at a later time depending on the availability of the requested resources, involves scheduling and specifying cluster resources such as the number of Slurm compute nodes to be used, the number of tasks per node, memory, and time.
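Before writing a job script, it helps to see what the cluster offers. As a minimal sketch, assuming a standard Slurm installation, the sinfo command summarizes the available partitions and node states:

  #List partitions, node counts and node states (standard Slurm command)
  $ sinfo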

We provide a simple script that can be used to submit a job to a CentOS Linux cluster with the specifications below:

  LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
  Distributor ID: CentOS
  Description:    CentOS release 6.10 (Final)
  Release:        6.10
  Codename:       Final
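
The specifications above are the output of the lsb_release utility; assuming it is installed on the cluster's login node, they can be reproduced with:

  #Print LSB and distribution information for the current system
  $ lsb_release -a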


Here, we show one way of running a job on the HPC cluster: setting up a job script. This script requests cluster resources and lists, in sequence, the commands that we want to execute.

Let's call our job script "job_script". It is a plain text file that we can edit with a UNIX editor such as vi/vim, nano, or emacs. For example, we can create an empty script with the vim editor as

  $ vim job_script


and copy and paste the following contents into it:

  #!/bin/bash
  #SBATCH --nodes=3
  #SBATCH --ntasks-per-node=8
  #SBATCH --mem=64000
  #SBATCH --time=01:00:00
  
  #Mail alerts at the start, end and abortion of execution
  #SBATCH --mail-type=ALL
  #Send mail to this address
  #SBATCH --mail-user=name@tssfl.com
  
  #If your job requires Python, load Python modules here
  source /home/user/setup.sh
  
  #Commands that you actually want to run go here
  cd /home/tssfl
  ipython code.py


Save and close the script (press Esc, then type :wq or :x and hit Enter).
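
Two other directives that are commonly added to the header, though not required for this example, are --job-name and --output; as a sketch (the file name pattern is illustrative, and %j expands to the job ID):

  #Give the job a recognizable name in the queue
  #SBATCH --job-name=my_job
  #Write standard output and error to this file
  #SBATCH --output=my_job_%j.out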

The first four #SBATCH lines in the script specify, respectively, the number of compute nodes, the number of tasks per node, the memory (64000 MB) and the time limit (1 hour). Do not remove the leading #: although these lines look like comments, Slurm reads them as directives, and it stops processing #SBATCH directives once it reaches the first executable command, which is why the mail directives are also placed in the header. They send a mail alert at the start, end and abortion of execution so that we can monitor the progress of our computations. The line source /home/user/setup.sh loads modules (in this case Python modules) set up by another HPC cluster user; that is, there is no need to re-install resources in your home directory if you can load them from another user. We then move to the home directory /home/tssfl, where we assume the code "code.py" we want to run is placed, and issue the command ipython code.py. All these instructions are defined in the script.
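
If the cluster provides an environment modules system, a Python environment can often be loaded directly instead of sourcing another user's setup script; the module name below is hypothetical and will differ between clusters:

  #Load a Python environment via the modules system (module name is illustrative)
  module load python/3.8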

Finally, we can run our job by submitting it via the terminal as

  $ sbatch job_script
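
On a typical Slurm installation, sbatch acknowledges the submission by printing the new job ID, for example (the number is illustrative):

  Submitted batch job 123456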


To see the job in the queue or to cancel it, you can issue the commands squeue and scancel Job_ID, respectively.
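
As a quick sketch, assuming the job ID reported by sbatch above:

  #Show only your own jobs in the queue
  $ squeue -u $USER
  #Cancel the job by its ID
  $ scancel 123456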