Basics

In contrast to a server (VM), a high-performance computing (HPC) cluster (e.g. Euler) contains many thousands of interconnected compute nodes. You can thus use many more resources, but you need to consider a few aspects when planning your jobs.

Key resources to consider when running jobs on an HPC cluster

📁 File system
🔋 Memory (RAM)
⚙️ CPU cores
⏱️ Run time
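Three of these resources (memory, CPU cores and run time) are typically requested explicitly when a job is submitted. The following is a minimal sketch of how such a request could be assembled, assuming the standard Slurm options `--cpus-per-task`, `--mem-per-cpu` and `--time`. The helper `sbatch_header` and the example values are hypothetical and are not recommended settings for Euler.

```python
def sbatch_header(job_name, cpus_per_task, mem_per_cpu_mb, time_limit):
    """Return the #SBATCH lines requesting CPU cores, memory and run time.

    All values are illustrative; choose them to match your own job.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",  # CPU cores per task
        f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}M",  # memory per core, in megabytes
        f"#SBATCH --time={time_limit}",              # run time as hours:minutes:seconds
    ]
    return "\n".join(lines)


# Hypothetical request: 4 cores, 2048 MB per core, 4 hours.
header = sbatch_header("demo", 4, 2048, "04:00:00")
print(header)
```

The printed lines would go at the top of a job script, followed by the commands to run.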

Some of the admonitions are expandable. Click on them for more information, as in the case of the GDC Euler tutorial.

GDC Euler tutorial

We are working on a tutorial for users who are working on an HPC cluster for the first time. Use this link here.

AI systems

AI tools have also become very useful in bioinformatics (e.g. scripting). ETH has released useful guidelines on how to use tools such as ChatGPT and Claude.

Important Tech terms

(source: Wikipedia, TechTerms)

HPC cluster: A relatively tightly coupled collection of compute nodes. Access to the cluster is provided through a login node. A resource manager and scheduler (Slurm on Euler) provide the logic to schedule jobs efficiently on the cluster.

Login node: Serves as the access point for users who want to run jobs on the HPC cluster.
Compute node: Currently, most compute nodes have two sockets, each with a single CPU, volatile working memory (RAM), a hard drive (typically small and only used to store temporary files), and a network card.
CPU: Central Processing Unit, the chip that performs the actual computation in a compute node. A modern CPU is composed of numerous cores, typically 8 or 10. It also has several cache levels that help with data reuse.

Core: Part of a modern CPU. A core is capable of running processes and has its own processing logic and floating point unit. Each core has its own level 1 and level 2 cache for data and instructions. Cores share the last-level cache.
Threads: A process can perform multiple computations, i.e. program flows, concurrently. In scientific applications, threads typically process their own subset of data or a subset of loop iterations.
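The idea that each thread works on its own subset of the data can be sketched as follows. This is only an illustration: the reads, the helper names `chunked` and `gc_count`, and the thread count are all made up. Note that in standard CPython the global interpreter lock limits the speed-up for pure-Python, CPU-bound work, so the sketch shows the partitioning pattern rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def gc_count(sequence):
    """Count the G and C bases in one sequence."""
    return sum(1 for base in sequence if base in "GC")


def gc_count_chunk(chunk):
    """Work done by a single thread: process only its own subset."""
    return sum(gc_count(read) for read in chunk)


# Hypothetical input data and thread count.
reads = ["ACGT", "GGCC", "ATAT", "CGCG", "TTTT", "GCGC", "AAAC"]
n_threads = 3

with ThreadPoolExecutor(max_workers=n_threads) as pool:
    partial_counts = list(pool.map(gc_count_chunk, chunked(reads, n_threads)))

# Combine the per-thread results into the final answer.
total_gc = sum(partial_counts)
print(partial_counts, total_gc)
```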

CPU or threads

On Euler, the number of CPUs is specified using the --cpus-per-task option, which defines how many CPU cores are allocated to each task. This is particularly important for multi-threaded applications, where each thread typically runs on a separate CPU core. In many practical scenarios, the terms threads and CPUs (cores) are used interchangeably, though they are technically distinct.
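To keep the number of threads in line with the number of allocated cores, a script can read the `SLURM_CPUS_PER_TASK` environment variable, which Slurm sets inside a job when `--cpus-per-task` is given. The sketch below assumes this variable is the only source of truth; the helper name `allocated_cpus` and the fallback of one core are illustrative choices.

```python
import os


def allocated_cpus(environ=os.environ, default=1):
    """Return the number of CPU cores Slurm allocated to this task.

    Falls back to `default` when the variable is not set, for example
    when the script runs outside of a Slurm job.
    """
    value = environ.get("SLURM_CPUS_PER_TASK")
    if value is None:
        return default
    return max(1, int(value))


n_threads = allocated_cpus()
print(f"Using {n_threads} thread(s)")
```

The resulting value can then be passed to the thread option of whatever tool the job runs, so that the tool does not start more threads than cores were requested.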

Memory: Each processor needs memory associated with it to provide a place for the processor to do its work. Some applications (e.g. genome assemblies) need a lot of memory. On Euler, memory is limited because it is expensive compared to CPUs, so you need to optimise your jobs.
Cache memory: An extremely fast memory type that acts as a buffer between RAM and the CPU. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed. Cache memory is used to reduce the average time to access data from the main memory. As schedulers report both memory and cache memory, the memory effectively used can be much lower than reported if you read a lot of data. Therefore, you can often reduce the amount of requested memory.
Inodes: An inode (short for "index node") is a data structure that stores information about a file. Each inode has a unique ID that identifies an individual file or other object (folder, soft link) in the Linux file system. The number of inodes is limited on our volume; thus, many small files need to be archived in order to keep the file system fast.
SSD: Compared with electromechanical drives, solid-state drives (SSDs) are typically more resistant to physical shock, run silently, and have higher IOPS and lower latency. Your home and your scratch are stored on such drives.
IO/IOPS: Input/output operations per second (IOPS) is a performance measurement used to characterise computer storage devices. SSDs have much higher performance, so reading and writing on the scratch speeds up your analysis massively and reduces your memory consumption.
FTP Clients: FTP clients are used to upload, download and manage files on a (remote) server. FTP clients include, for example, Cyberduck or FileZilla.
RAID: RAID stands for “Redundant Array of Independent Disks” and is a method of storing data on multiple hard disks. When hard disks are arranged in a RAID, the computer sees them all as one big hard disk. However, they work much more efficiently than a single hard disk. The benefits of RAID come from a technique called striping, which splits up the stored data among the drives.
Shell: The shell is a purely text-based command-line interface. The user can enter commands to perform functions such as running programmes, managing directories, and displaying processes. Because the shell is only one layer above the operating system, you can perform operations that are not always possible via the graphical user interface (GUI).
VPN: VPN stands for "Virtual Private Network" and describes the possibility of establishing a protected network connection when using public networks. VPNs encrypt your internet traffic and disguise your online identity. This makes it more difficult for third parties to track your online activities and steal data. The encryption takes place in real time.
Wrapper: A wrapper function is a function in a software library whose main purpose is to call other tools. Wrapper functions are easy to use, but are often a black box. It is generally difficult, if not impossible, to optimise such tools on an HPC cluster.
Scheduler: The scheduler distributes jobs across the cluster. On Euler we use Slurm.
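The Inodes entry above recommends archiving many small files. A minimal sketch of that idea, using only the Python standard library, is shown below: it counts the entries in a directory (each one occupies an inode) and packs them into a single compressed tar file. The directory name `results`, the file names and the helper functions are hypothetical, and the example works in a temporary directory so that it does not touch real data.

```python
import os
import tarfile
import tempfile


def count_entries(root):
    """Count files and directories below root; each one occupies an inode."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def archive_directory(root, archive_path):
    """Pack root into a single gzip-compressed tar file."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(root, arcname=os.path.basename(root))
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    # Create 100 small, made-up result files.
    results = os.path.join(tmp, "results")
    os.makedirs(results)
    for i in range(100):
        with open(os.path.join(results, f"sample_{i}.txt"), "w") as handle:
            handle.write("x\n")

    n_before = count_entries(results)
    archive = archive_directory(results, os.path.join(tmp, "results.tar.gz"))

    # The archive is one file on disk but still lists every member.
    with tarfile.open(archive) as tar:
        n_members = len(tar.getnames())

print(f"{n_before} entries packed into one archive with {n_members} members")
```

Once the archive has been verified, the original small files can be removed, which frees their inodes.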