Job Optimization
As Euler's resources are limited, it is important to ensure that your jobs run efficiently. Before running a batch of jobs, run a few tests to find out:
- How many CPUs can you use?
Many tools offer multi-threading options that do not always scale well, for example bcftools or angsd. In these cases, manual parallelization is more effective, e.g. running bcftools per chromosome. (A quick thread-scaling check is sketched after this list.)
- How much memory do you need?
Memory is a rather expensive resource. Please make sure that you actually use the memory you request.
- How long do your jobs take?
Avoid using a 120-hour queue for a job that only runs for a few hours. Short jobs generally get a higher scheduling priority.
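A quick way to answer the CPU and run-time questions is to time a small, representative test case with different thread counts before launching the full batch. Below is a minimal sketch, assuming hypothetical inputs REF.fa and bam_list.txt and a small test region; adapt it to your own tool and data.
# Time the same bcftools command with 1, 2 and 4 threads on a small test
# region and compare the wall-clock times (hypothetical input files).
for t in 1 2 4; do
    /usr/bin/time -v bcftools mpileup --threads "$t" -f REF.fa \
        -r chr1:1-1000000 -b bam_list.txt -o /dev/null \
        2> "threads_${t}.log"
done
grep -H "Elapsed (wall clock)" threads_*.log
If the wall-clock time barely drops from 1 to 4 threads, the additional CPUs are wasted.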
Before you submit a batch of jobs, you have to optimize the resource request. This can be difficult, but tools like myjobs and the WebGUI can help you.
Learning Objectives
◇ Knowing how to get a summary of all finished jobs.
◇ Knowing how to optimize a specific script.
Quote
"myjobs is your best friend." (Nik Zemp)
An overview of the jobs that finished within the last 6 hours:
module load reportseff/2.7.6
reportseff --user $USER --since h=6
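For each job, reportseff reports the state, the elapsed time, and the time, CPU, and memory efficiencies (TimeEff, CPUEff, MemEff), as in the example under step (4) below.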
In this section we will discuss how you can optimize a specific job. Let's use bcftools again.
We want to run a SNP call on each chromosome separately; in this case we have 48 chromosomes, and we don't know whether the tool can really make use of multiple CPUs (--threads 4).
(1) Run a test with 4 representative chunks (do not just take the first ones) with 4 CPUs, 4 × 4 GB RAM (16 GB in total), and 24 hours run time.
#!/bin/bash
#SBATCH --job-name=bcf          # Name of the job
#SBATCH --ntasks=1              # Requesting 1 task (always 1 here)
#SBATCH --cpus-per-task=4       # Requesting 4 CPUs
#SBATCH --array=4,12,17,47%10   # 4 representative chunks, max. 10 at a time
#SBATCH --mem-per-cpu=4G        # 4 GB per CPU = 16 GB per job
#SBATCH --time=24:00:00         # Requesting 24 hours run time
#SBATCH --output=bcf_%a.log     # Log file per array task
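The header above omits the actual command. A minimal payload sketch, assuming hypothetical inputs (chromosomes.txt with one chromosome name per line, an indexed reference REF.fa, and a BAM list bam_list.txt), might look as follows:
# Map the array index to a chromosome name: line N of chromosomes.txt
# corresponds to array task N (hypothetical file).
CHR=$(sed -n "${SLURM_ARRAY_TASK_ID}p" chromosomes.txt)

# Per-chromosome SNP call; --threads matches --cpus-per-task.
bcftools mpileup --threads 4 -f REF.fa -r "$CHR" -b bam_list.txt \
    | bcftools call --threads 4 -m -v -Oz -o "snps_${CHR}.vcf.gz"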
(2) After 30 minutes, both the CPU and the memory usage are quite low according to myjobs:
CPU utilization: 24%
Resident memory utilization: 10%
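These numbers come from Euler's myjobs utility, which you can point at a running job by its ID:
# Show the current CPU and memory utilization of a running job.
myjobs -j <Job-ID>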
(3) Let's restart the 4 representative chunks with 2 CPUs, 2 × 4 GB memory (8 GB in total), and 24 hours run time.
#!/bin/bash
#SBATCH --job-name=bcf          # Name of the job
#SBATCH --ntasks=1              # Requesting 1 task (always 1 here)
#SBATCH --array=4,12,17,47%10   # Same 4 representative chunks
#SBATCH --cpus-per-task=2       # Requesting 2 CPUs
#SBATCH --mem-per-cpu=4G        # 4 GB per CPU = 8 GB per job
#SBATCH --time=24:00:00         # Requesting 24 hours run time
#SBATCH --output=bcf_%a.log     # Log file per array task
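If your payload follows the sketch under step (1), also lower --threads there so that it matches --cpus-per-task (here: --threads 2).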
(4) After your runs have finished, check their efficiency with reportseff:
module load reportseff/2.7.6
reportseff <Job-ID>
| JobID | State | Elapsed | TimeEff | CPUEff | MemEff |
|---|---|---|---|---|---|
| 24765510_4 | COMPLETED | 00:49:10 | 0.09% | 50% | 43% |
| 24765510_12 | COMPLETED | 00:49:10 | 0.09% | 51% | 41% |
| 24765510_17 | COMPLETED | 00:49:10 | 0.10% | 51% | 42% |
| 24765510_47 | COMPLETED | 00:58:10 | 0.10% | 49% | 32% |
The CPU efficiency is still only about 50%, i.e. with 2 CPUs the tool effectively uses a single CPU, and the memory efficiency is around 40%. The jobs finish in about an hour.
(5) Let's use the following settings for the entire dataset:
#!/bin/bash
#SBATCH --job-name=bcf          # Name of the job
#SBATCH --ntasks=1              # Requesting 1 task (always 1 here)
#SBATCH --array=1-48%10         # All 48 chromosomes, max. 10 at a time
#SBATCH --cpus-per-task=1       # Requesting 1 CPU
#SBATCH --mem-per-cpu=1G        # Requesting 1 GB memory per job
#SBATCH --time=4:00:00          # Requesting 4 hours run time
#SBATCH --output=bcf_%a.log     # Log file per array task
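After submitting, you can verify the final settings with reportseff once more. A minimal sketch, assuming the script above is saved under the hypothetical name bcf_all.sh:
sbatch bcf_all.sh   # submit the full array (hypothetical file name)

# Once everything has completed, summarize the efficiency of the last day's jobs:
module load reportseff/2.7.6
reportseff --user $USER --since d=1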
We have learned that even when a tool provides multi-threading options, they do not always scale. For this SNP-calling workload, bcftools effectively runs on a single CPU.