
File System

Several file systems exist on Euler, each with specific limits and geared towards specific needs. The table below gives an overview.

Overview


Your personal scratch is your working directory. It is not allowed to output or unzip large amounts of data directly to GDC home or GDC projects. I/O is faster on the SSD scratch drives, so keeping the data (e.g. fastq or bam files) on the scratch often speeds up the analysis and significantly reduces memory usage. The scheduler constantly writes information to the log file, so make sure that you also use the scratch for the log files.

The GDC needs to manage the available disk space to keep costs low, and expansions have to be requested. Therefore, GDC users who expect to receive or produce a lot of data (> 1 Tb) should notify the bioinformaticians as early as possible.

To keep GDC home and GDC projects responsive, do not use more than 13,000 inodes (files, soft links, directories, hidden files) per 1 Tb of disk space.
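To check a folder against this limit, something like the following could be used. It is a minimal sketch: `count_inodes` and the `TARGET` variable are hypothetical names introduced here, not existing GDC tools, and the count is approximate if file names contain newlines.

```shell
#!/usr/bin/env bash
# Minimal sketch: count the inodes below a folder and show its disk usage.
# count_inodes is a hypothetical helper name, not an existing GDC tool.

count_inodes() {
    # find prints one line per entry (files, soft links, directories,
    # hidden files), including the top-level folder itself.
    find "$1" | wc -l
}

# TARGET is an assumed variable; it defaults to the current directory.
target="${TARGET:-.}"
echo "inodes:     $(count_inodes "$target")"
echo "disk usage: $(du -sh "$target" | cut -f1)"
```

Run it with, for example, `TARGET=/cluster/work/gdc/people/<USER>` and compare the inode count with the disk usage.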

| Name | Path | Variable | Information | Backup | Type of data |
|---|---|---|---|---|---|
| Home | /cluster/home/<USER> | $HOME | Private directory, max. 50 Gb, free of charge, NetApp | Yes | Scripts, own installed tools. Only you have access |
| Scratch | /cluster/scratch/<USER> | $SCRATCH | Private scratch, working directory, 2.5 Tb, files get deleted after 2 weeks automatically, free of charge, fast SSD drives, Lustre | No | Working directory, only you have access |
| GDC home | /cluster/work/gdc/people/<USER> | | Personal folder, no size limit, storage fee, Lustre | Yes | Personal or project data that you keep longer. Only you and GDC bioinformaticians have access |
| GDC projects | /cluster/work/gdc/shared/<p999> | | Project folder, no size limit, storage fee, Lustre | Yes | Project data that you keep longer. Your project group and GDC bioinformaticians have access |
| GDC source | /cluster/project/gdc/shared | | Only read access, NetApp | - | GDC software stack and scripts |
| Node scratch | | $TMPDIR | Fast access to the data during a job | No | Intermediate files that need to be as close to the CPU as possible |

Your personal Euler home directory (/cluster/home/<USER> or ${HOME}) is where you can keep scripts and self-installed tools. Your personal scratch (/cluster/scratch/<USER> or ${SCRATCH}) is your working directory. Data there is automatically deleted after two weeks, so important files have to be copied to GDC home or GDC projects for safekeeping. It is not allowed to directly output data to GDC home or GDC projects. Your GDC home is located at /cluster/work/gdc/people/<USER>. You might also have access to one or more GDC project folders (/cluster/work/gdc/shared/<p999>). Be aware that these folders are shared with other users. GDC source contains our own software stack and scripts.
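The workflow described above (work on the scratch, then copy the results for safekeeping) could look roughly like this. It is a sketch under assumptions: `GDC_HOME` is a hypothetical variable standing in for /cluster/work/gdc/people/<USER> (Euler does not define it), the folder name `my_analysis` is made up, and the `mktemp` fallbacks only exist so the example also runs outside the cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# On Euler, SCRATCH is set for you. GDC_HOME is a hypothetical variable;
# the mktemp fallbacks only make the sketch runnable elsewhere.
SCRATCH="${SCRATCH:-$(mktemp -d)}"
GDC_HOME="${GDC_HOME:-$(mktemp -d)}"

# Do all the work, including log files, in a run folder on the scratch.
run_dir="$SCRATCH/my_analysis"
mkdir -p "$run_dir/results" "$run_dir/logs"
cd "$run_dir"

# Placeholder for the real analysis; it writes its output to the scratch.
echo "placeholder result" > results/summary.txt

# Copy only the final results to the backed-up storage for safekeeping.
mkdir -p "$GDC_HOME/my_analysis"
cp -a results "$GDC_HOME/my_analysis/"
```

Intermediate files and logs stay on the scratch and are cleaned up automatically after two weeks.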

Best practices for working on our file system

  • Your working directory is the scratch.
  • Reading and writing files on the scratch SSD drives can massively shorten the turnaround times of your jobs.
  • Self-installed tools or scripts should be kept in your home (${HOME}).
  • Do not output or unzip large amounts of data (> 100 Gb) to GDC home or GDC projects.
  • Many small files (> 100; each file uses an inode) need to be archived (tar, zip).
  • Use subdirectories instead of storing all files in a single directory.
  • Avoid unnecessary I/O operations.
  • For directory listings use ls instead of ls -l.
  • Avoid reading the same region of a file from many processes at the same time.
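
Archiving many small files, as recommended in the list above, can be sketched as follows. The folder name `small_files` and the 150 generated example files are made up for illustration, and the `mktemp` fallback is only there so the example runs outside Euler.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a demo folder on the scratch (temporary folder if SCRATCH is unset).
workdir="${SCRATCH:-$(mktemp -d)}/archive_demo"
mkdir -p "$workdir/small_files"
cd "$workdir"

# Create 150 small example files; these stand in for real output.
for i in $(seq 1 150); do
    echo "$i" > "small_files/part_$i.txt"
done

# Pack the folder into one compressed archive: one inode instead of 151.
tar -czf small_files.tar.gz small_files

# Remove the originals only after the archive has been read back successfully.
tar -tzf small_files.tar.gz > /dev/null && rm -r small_files
```

The single archive can then be copied to GDC home or a GDC project folder and unpacked on the scratch again with `tar -xzf` when needed.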
Your scratch folder is auto-mounted, so you only see the folder once you access it.

Access it with cd ${SCRATCH}, or create a soft link to it in your home.
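
A soft link to the scratch could be created like this. The link name `scratch` is an arbitrary choice for this sketch, and the `mktemp` fallback only makes it runnable where `SCRATCH` is not defined.

```shell
#!/usr/bin/env bash
set -euo pipefail

# On Euler, SCRATCH is already set; the fallback is for testing elsewhere.
SCRATCH="${SCRATCH:-$(mktemp -d)}"

# Hypothetical link name inside your home.
link="$HOME/scratch"

# Create the soft link unless something with that name already exists.
[ -e "$link" ] || ln -s "$SCRATCH" "$link"

# Accessing the link target triggers the auto-mount.
ls "$link/" > /dev/null
```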

File permissions

If you have multiple users working on the same files or folders in one of your project folders, make sure that you have set the permissions correctly.

We do not recommend running chmod -R 777 folder, because other members of your group may then accidentally delete files.
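
As one possible alternative, a symbolic mode with a capital `X` sets read-only group access recursively without making data files executable. This is a sketch, not the GDC's prescribed method; the folder and file names are made up, and `stat -c` assumes GNU coreutils as found on Linux.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a small example folder in a temporary location (hypothetical names).
folder="$(mktemp -d)/folder"
mkdir -p "$folder/sub"
touch "$folder/sub/reads.fq.gz"

# Owner: read/write, group: read only, others: no access.
# Capital X adds the execute bit only to directories (and to files that are
# already executable), so folders can be entered but data files stay
# non-executable.
chmod -R u=rwX,g=rX,o= "$folder"

# Show the resulting octal modes: 750 for folders, 640 for plain files.
stat -c '%a %n' "$folder" "$folder/sub" "$folder/sub/reads.fq.gz"
```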

Have a look at this tutorial if you want to learn more about file permissions.

Let's have a look at an example: we have a folder raw with fastq files.

├── raw
    ├── C1_05_F.fq.gz
    └── C1_04_F.fq.gz

Get an overview of the access rights of your files or subfolders.

ls -l

Now we can see the file permissions:

drwxr-x--- 13 MAX USYS-IBZ-GDC-EULER-<p9999> 4096 Mar  5 07:45 raw
-rw-r-----  1 MAX USYS-IBZ-GDC-EULER-<p9999> 10 Mar 12 16:13 C1_05_F.fq.gz
-rw-r-----  1 MAX USYS-IBZ-GDC-EULER-<p9999> 10 Mar 12 16:13 C1_04_F.fq.gz
  • raw is a folder (drwxr-x---)

  • MAX can read, write and enter raw (drwxr-x---).

  • MAX can read and write the two fastq files (-rw-r-----).

  • The group USYS-IBZ-GDC-EULER-<p9999> can access the folder raw (drwxr-x---) and can only read the fastq files (-rw-r-----). Other members of the group can't delete the fastq files or create new files in the folder raw.
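
The symbolic permissions above correspond to the octal values used with chmod: drwxr-x--- is 750 and -rw-r----- is 640. A sketch to display both forms side by side, assuming GNU `stat` (as on Linux) and rebuilding the example in a temporary folder:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rebuild the example in a temporary folder (hypothetical location).
demo="$(mktemp -d)"
mkdir "$demo/raw"
touch "$demo/raw/C1_04_F.fq.gz" "$demo/raw/C1_05_F.fq.gz"
chmod 750 "$demo/raw"
chmod 640 "$demo/raw/"*.fq.gz

# %A symbolic mode, %a octal mode, %U owner, %G group, %n name (GNU stat).
stat -c '%A %a %U %G %n' "$demo/raw" "$demo/raw/"*.fq.gz
```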

How can I set the file permissions?

Everybody in the group can read and write files in raw. Keep in mind that every group member can then delete the contents of the folder.

chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw
chmod 770 raw

Everybody in the group can read in raw but can't create new files.

chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw
chmod 750 raw

Everybody in the group can read the fastq files; only the owner can write them.

chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw/*.fq.gz
chmod 640 raw/*.fq.gz

Everybody in the group can read and write the fastq files, e.g. overwrite them. Whether files can be deleted depends on the permissions of the folder raw.

chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw/*.fq.gz
chmod 660 raw/*.fq.gz