How to Run a Job on HPC Using Simple Linux Utility for Resource Management

This is a minimal working example of how to submit a job on an HPC system using Slurm. Slurm is shorthand for "Simple Linux Utility for Resource Management", an open-source job scheduler that allocates compute resources on clusters to queued, researcher-defined jobs. Slurm has been deployed at various national and international computing centers and on approximately 60% of the TOP500 supercomputers in the world. See the detailed instructions. Submitting a job, to be executed immediately or at a later time depending on the availability of the requested resources, involves scheduling and specifying cluster resources such as the number of Slurm compute nodes to be used, the number of tasks per node, memory, and time.
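Before writing a job script, it helps to see what the cluster offers. As a minimal sketch, assuming a standard Slurm installation, the sinfo command summarizes the available partitions and node states:

  #List partitions, node counts and node states (standard Slurm command)
  $ sinfo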

We provide a simple script that can be used to submit a job to a CentOS Linux cluster with the specifications below:

  LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
  Distributor ID: CentOS
  Description:    CentOS release 6.10 (Final)
  Release:        6.10
  Codename:       Final
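
The specifications above are the output of the lsb_release utility; assuming it is installed on the cluster's login node, they can be reproduced with:

  #Print LSB and distribution information for the current system
  $ lsb_release -a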


Here, we show one way of running a job on the HPC cluster: setting up a job script. This script requests cluster resources and lists, in sequence, the commands that we want to execute.

Let's call our job script "job_script". It is a plain text file that we can edit with a UNIX editor such as vi/vim, nano, or emacs. For example, we can create an empty script with the vim editor as

  $ vim job_script


and copy and paste the following contents into it:

  #!/bin/bash
  #SBATCH --nodes=3
  #SBATCH --ntasks-per-node=8
  #SBATCH --mem=64000
  #SBATCH --time=01:00:00
  
  #Mail alerts at the start, end and abortion of execution
  #SBATCH --mail-type=ALL
  #Send mail to this address
  #SBATCH --mail-user=name@tssfl.com
  
  #If your job requires Python, load Python modules here
  source /home/user/setup.sh
  
  #Commands that you actually want to run go here
  cd /home/tssfl
  ipython code.py


Save and close the script (press Esc, then type :wq or :x and hit Enter).
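
Two other directives that are commonly added to the header, though not required for this example, are --job-name and --output; as a sketch (the file name pattern is illustrative, and %j expands to the job ID):

  #Give the job a recognizable name in the queue
  #SBATCH --job-name=my_job
  #Write standard output and error to this file
  #SBATCH --output=my_job_%j.out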

The first four #SBATCH lines in the script specify, respectively, the number of compute nodes, the number of tasks per node, the memory (64000 MB) and the time limit (1 hour). Do not remove the leading #: although these lines look like comments, Slurm reads them as directives, and it stops processing #SBATCH directives once it reaches the first executable command, which is why the mail directives are also placed in the header. They send a mail alert at the start, end and abortion of execution so that we can monitor the progress of our computations. The line source /home/user/setup.sh loads modules (in this case Python modules) set up by another HPC cluster user; that is, there is no need to re-install resources in your home directory if you can load them from another user. We then move to the home directory /home/tssfl, where we assume the code "code.py" we want to run is placed, and issue the command ipython code.py. All these instructions are defined in the script.
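
If the cluster provides an environment modules system, a Python environment can often be loaded directly instead of sourcing another user's setup script; the module name below is hypothetical and will differ between clusters:

  #Load a Python environment via the modules system (module name is illustrative)
  module load python/3.8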

Finally, we can run our job by submitting it via the terminal as

  $ sbatch job_script
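
On a typical Slurm installation, sbatch acknowledges the submission by printing the new job ID, for example (the number is illustrative):

  Submitted batch job 123456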


To see the job in the queue or to cancel it, you can issue the commands squeue and scancel Job_ID, respectively.
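
As a quick sketch, assuming the job ID reported by sbatch above:

  #Show only your own jobs in the queue
  $ squeue -u $USER
  #Cancel the job by its ID
  $ scancel 123456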