TechTerms

This section contains a list of important tech terms.

Learning Objectives

◇ Know important tech terms.
◇ Be able to use them in the context of new software.

  • HPC cluster: relatively tightly coupled collection of compute nodes. Access to the cluster is provided through a login node. A resource manager and scheduler provide the logic to schedule jobs efficiently on the cluster.

  • Login node: Serves as an access point for users wishing to run jobs on the HPC cluster. Do not run demanding jobs on the login nodes.

  • Compute node: Currently, most compute nodes have two sockets, each holding a single CPU. A node also has volatile working memory (RAM), a hard drive (typically small and used only to store temporary files), and a network card.

  • CPU: Central Processing Unit, the chip that performs the actual computation in a compute node. A modern CPU is composed of numerous cores, typically 8 or 10. It also has several cache levels that help with data reuse.

  • Core: Part of a modern CPU. A core is capable of running processes and has its own processing logic and floating-point unit. Each core has its own level 1 and level 2 cache for data and instructions. Cores share the last-level cache.

  • Threads: A process can perform multiple computations, i.e. program flows, concurrently; each such flow is a thread. In scientific applications, threads typically process their own subset of data or a subset of loop iterations.
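The idea that each thread works on its own subset of the data can be sketched in Python as follows. This is only an illustration: the function names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) and the sum-of-squares workload are hypothetical and not part of any tool used on the cluster. Note also that in standard CPython the global interpreter lock limits the speed-up for pure-Python arithmetic, so the sketch shows the partitioning pattern rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal size."""
    chunk_size, remainder = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks receive one extra element.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one thread on its own subset of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_threads=4):
    """Give each thread one chunk and combine the partial results."""
    chunks = split_into_chunks(items, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)
```

Calling parallel_sum_of_squares(list(range(10)), n_threads=4) splits the ten numbers into four chunks, lets each thread sum the squares of its chunk, and adds up the four partial results.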

CPU versus threads

How the terms "thread" and "CPU" are used often varies depending on the tool. On Euler, we request cpus-per-task, which is equivalent to the number of CPUs.
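A tool started inside a job often needs to be told how many threads it may use, and that number should match what was requested. The following minimal sketch assumes the scheduler is Slurm, which exports the environment variable SLURM_CPUS_PER_TASK when cpus-per-task is set. The helper name worker_count is hypothetical.

```python
import os


def worker_count(default=1):
    """Return the number of CPUs requested for this task.

    Assumption: the job runs under Slurm, which sets SLURM_CPUS_PER_TASK
    when cpus-per-task was requested. If the variable is missing or not a
    positive integer, fall back to `default`.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    return default
```

The returned value can then be passed to whatever thread or worker option the tool provides, so that the tool does not start more threads than the job was granted.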

Quote

What is the difference between a core and a thread? Think of the core as a person’s mouth and the threads as the hands. The mouth does all of the eating, while the hands just help organize the ‘workload’. The thread helps deliver the workload to the CPU more efficiently. More threads translates into a better-organized work queue, hence more efficiency in processing the information (https://www.techsiting.com/cores-vs-threads/).

  • Memory: Each processor needs memory associated with it to provide a place for the processor to do its work. Some applications (e.g. genome assemblies) need a lot of memory. On Euler, memory is limited because it is expensive compared to CPUs, so you need to optimize the memory requirements of your jobs.

  • Inodes: An inode (short for "index node") is a data structure that stores information about a file. Each inode has a unique ID that identifies an individual file or other object (folder, soft link) in the Linux file system. The number of inodes is limited on HPC volumes, so many small files need to be archived (with tar or zip) in order to keep the file system responsive.
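Packing many small files into one archive, as described for inodes above, can be sketched with Python's standard tarfile module. The function names count_entries and archive_directory are hypothetical helpers chosen for this illustration. The entry count is only an approximation of inode usage, since hard links to the same file share one inode.

```python
import tarfile
from pathlib import Path


def count_entries(directory):
    """Count files, folders and links below directory (roughly one inode each)."""
    return sum(1 for _ in Path(directory).rglob("*"))


def archive_directory(directory, archive_path):
    """Pack directory into a gzip-compressed tar file and return the archive path."""
    directory = Path(directory)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(directory, arcname=directory.name)
    return Path(archive_path)
```

The archive itself occupies a single inode, but inodes are only freed once the original directory has been deleted, which should happen only after checking that the archive is complete.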