File System
Several file systems exist on Euler, each with specific limits and geared towards specific needs. See the table below for an overview.
Overview
Your personal scratch is your working directory. It is not allowed to output or unzip large amounts of data directly to GDC home or GDC projects. I/O is faster on the SSD scratch drives, so keeping the data (e.g. fastq or bam files) on the scratch often speeds up the analysis and can significantly reduce memory usage. The scheduler constantly writes information to the log file, so make sure that your log files are also written to the scratch.
The GDC needs to manage the available disk space to keep costs low, and expansions have to be requested. Therefore, GDC users who expect to receive or produce a lot of data (> 1 TB) should notify the bioinformaticians as early as possible.
To keep GDC home and GDC projects responsive, do not use more than 13,000 inodes (files, soft links, directories, hidden files) per 1 TB of disk space.
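To check how close a folder is to this limit, you can count its entries. The following is a minimal sketch, assuming a POSIX shell with find and wc; the helper names count_inodes and inode_budget are hypothetical, and inode_budget only handles whole terabytes.

```shell
# Hypothetical helper: count the entries (files, soft links, directories, hidden files)
# below a directory. find prints one line per entry, including the directory itself.
# File names that contain a newline would be counted more than once.
count_inodes() {
  n=$(find "$1" | wc -l)
  # Arithmetic expansion strips the padding some wc implementations add
  echo $((n))
}

# Hypothetical helper: inode budget for an allocation of N whole TB (13,000 inodes per TB)
inode_budget() {
  echo $(( $1 * 13000 ))
}

# Example (folder name is illustrative):
# count_inodes "/cluster/work/gdc/people/${USER}/myproject"
# inode_budget 2
```

Compare the output of count_inodes with inode_budget for the size of the folder's allocation.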
Name | Path | Variable | Information | Backup | Type of data |
---|---|---|---|---|---|
Home | /cluster/home/<USER> | $HOME | Private directory, max. 50 GB, free of charge, NetApp | Yes | Scripts, own installed tools. Only you have access |
Scratch | /cluster/scratch/<USER> | $SCRATCH | Private scratch, working directory, 2.5 TB, files are deleted automatically after 2 weeks, free of charge, fast SSD drives, Lustre | No | Working directory. Only you have access |
GDC home | /cluster/work/gdc/people/<USER> | - | Personal folder, no size limit, storage fee, Lustre | Yes | Personal or project data that you keep longer. Only you and the GDC bioinformaticians have access |
GDC projects | /cluster/work/gdc/shared/<p999> | - | Project folder, no size limit, storage fee, Lustre | Yes | Project data that you keep longer. Your project group and the GDC bioinformaticians have access |
GDC source | /cluster/project/gdc/shared | - | Read-only access, NetApp | - | GDC software stack and scripts |
Node scratch | - | $TMPDIR | Fast access to the data during a job | No | Intermediate files that need to be as close to the CPU as possible |
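The node scratch is typically used by copying the input to $TMPDIR at the start of a job, working there, and copying only the results back. The sketch below assumes that $TMPDIR is set by the batch system inside a job; stage_and_run is a hypothetical helper, and the line count is only a placeholder for the real analysis step.

```shell
# Hypothetical helper: stage one input file on the node-local scratch, process it there,
# and copy only the result back to an output directory (e.g. in ${SCRATCH}).
stage_and_run() {
  input=$1
  outdir=$2
  # Abort if TMPDIR is not set (it is expected to be provided inside a job)
  work="${TMPDIR:?TMPDIR is not set}/stage.$$"
  mkdir -p "$work" || return 1
  cp "$input" "$work/" || return 1

  # Placeholder analysis: count the lines of the staged copy
  wc -l < "$work/$(basename "$input")" > "$work/line_count.txt" || return 1

  # Copy back only the result, then clean up the node scratch
  cp "$work/line_count.txt" "$outdir/" || return 1
  rm -rf "$work"
}

# Example inside a job (paths are illustrative):
# stage_and_run "${SCRATCH}/data/input.txt" "${SCRATCH}/results"
```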
Your personal Euler home directory (/cluster/home/<USER> or ${HOME}) is where you keep scripts and your own installed tools. Your personal scratch (/cluster/scratch/<USER> or ${SCRATCH}) is your working directory. Data on the scratch are deleted automatically after two weeks, so important files have to be copied to GDC home or GDC projects for safekeeping. It is not allowed to output data directly to GDC home or GDC projects. Your GDC home is located at /cluster/work/gdc/people/<USER>. You might also have access to one or more GDC project folders (/cluster/work/gdc/shared/<p999>); be aware that these folders are shared with other users. GDC source contains our own software stack and scripts.
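Copying finished results from the scratch to GDC home could, for example, look like the sketch below. keep_results and the folder names in the usage line are hypothetical; the diff -r check simply confirms that the copy matches the source before you rely on it, since the scratch copy will eventually be deleted.

```shell
# Hypothetical helper: copy a finished results folder to a long-term location
# (e.g. GDC home) and check that the copy matches the original.
keep_results() {
  src=${1%/}
  dest=$2
  mkdir -p "$dest" || return 1
  # -R: copy recursively; -p: keep modification times and permissions
  cp -Rp "$src" "$dest/" || return 1
  # diff -r returns 0 only if both trees have identical contents
  diff -r "$src" "$dest/$(basename "$src")" > /dev/null
}

# Example (folder names are illustrative):
# keep_results "${SCRATCH}/myproject/results" "/cluster/work/gdc/people/${USER}/myproject"
```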
Best practices for working on our file system
- Your working directory is the scratch.
- Reading and writing files on the scratch SSD drives can massively reduce the turnaround times of your jobs.
- Keep your own installed tools and scripts in your home (${HOME}).
- Do not output or unzip large amounts of data (> 100 GB) to GDC home or GDC projects.
- Large numbers of small files (> 100) use up many inodes and need to be archived (tar, zip).
- Use subdirectories instead of storing all files in a single directory.
- Avoid unnecessary I/O operations.
- For directory listings, use ls instead of ls -l, which has to query the metadata of every file.
- Avoid reading the same region of a file from many processes at the same time.
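Archiving a folder of many small files into a single tar.gz file lets one inode replace all of theirs. A minimal sketch, assuming tar with gzip support; archive_dir is a hypothetical helper, and it deliberately does not delete the original files, so that you can check the archive first.

```shell
# Hypothetical helper: pack a directory into <dir>.tar.gz next to it and verify the archive.
archive_dir() {
  dir=${1%/}
  archive="$dir.tar.gz"
  # -C changes to the parent directory so the archive stores relative paths (e.g. many/f1.txt)
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
  # Read the archive back once to make sure it is not corrupt
  tar -tzf "$archive" > /dev/null || return 1
  printf '%s\n' "$archive"
}

# Example (folder name is illustrative):
# archive_dir "${SCRATCH}/myproject/small_files"
```

After checking the archive, the original folder can be removed with rm -r.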
Your scratch folder is auto-mounted, so you only see it once you access it, for example with:
cd ${SCRATCH}
Alternatively, you can create a soft link to it in your home directory.
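A soft link to the scratch could be created as follows. This is a minimal sketch; make_scratch_link and the link name ${HOME}/scratch are hypothetical choices. The link points to the auto-mounted path, so accessing it should trigger the mount in the same way as cd ${SCRATCH}.

```shell
# Hypothetical helper: create (or replace) a soft link pointing to a target directory.
make_scratch_link() {
  target=$1
  link=$2
  # -s: symbolic link; -f: replace an existing link;
  # -n: treat an existing link to a directory as a file instead of descending into it
  ln -sfn "$target" "$link"
}

# Example on Euler:
# make_scratch_link "${SCRATCH}" "${HOME}/scratch"
```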
File permissions
If multiple users work on the same files or folders in one of your project folders, make sure that the permissions are set correctly.
We do not recommend running chmod -R 777 folder, because it gives write access to everyone and other members of your group may accidentally delete files.
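If a whole folder tree needs to be shared read-only with the group, a narrower alternative to chmod -R 777 is to use symbolic modes. In the sketch below (share_readonly is a hypothetical helper), g+rX gives the group read access everywhere and execute (open) permission only on directories and on files that are already executable, g-w removes group write access, and o-rwx removes all access for other users.

```shell
# Hypothetical helper: share a folder tree read-only with its group.
share_readonly() {
  # u+rwX: the owner keeps full access to directories and read/write access to files
  # g+rX,g-w: the group can read files and open directories, but not modify or delete
  # o-rwx: all other users get no access
  chmod -R u+rwX,g+rX,g-w,o-rwx "$1"
}

# Example (folder name is illustrative):
# share_readonly /cluster/work/gdc/shared/<p999>/raw
```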
Have a look at this tutorial if you want to learn more about file permissions.
Let's look at an example: we have a folder raw with two fastq files.
raw
├── C1_05_F.fq.gz
└── C1_04_F.fq.gz
To get an overview of the access rights of your files and subfolders, run:
ls -l
The output shows the file permissions:
drwxr-x--- 13 MAX USYS-IBZ-GDC-EULER-<p999> 4096 Mar 5 07:45 raw
-rw-r----- 1 MAX USYS-IBZ-GDC-EULER-<p999> 10 Mar 12 16:13 C1_05_F.fq.gz
-rw-r----- 1 MAX USYS-IBZ-GDC-EULER-<p999> 10 Mar 12 16:13 C1_04_F.fq.gz
- raw is a folder (the leading d in drwxr-x---).
- MAX can read, write and open raw (drwxr-x---).
- MAX can read and write the two fastq files (-rw-r-----).
- Members of the group USYS-IBZ-GDC-EULER-<p999> can list and open the folder raw (drwxr-x---) but can only read the fastq files (-rw-r-----). Because the group has no write permission on raw, other group members can't delete the fastq files or create new files in it.
- All other users have no access (the last three characters are ---).
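The chmod commands below use octal modes, where each digit is the sum of r = 4, w = 2 and x = 1 for the owner, the group and all other users, in that order. The hypothetical helper perm_to_octal sketches this conversion for the symbolic strings shown by ls -l; it does not handle special bits such as setuid, setgid (s) or the sticky bit (t).

```shell
# Hypothetical helper: convert a symbolic permission string (as shown by ls -l)
# into the octal mode used by chmod, e.g. drwxr-x--- -> 750.
perm_to_octal() {
  p=$1
  # Drop the leading file-type character (d, -, l, ...) if present
  if [ "${#p}" -eq 10 ]; then p=$(printf '%s' "$p" | cut -c2-); fi
  if [ "${#p}" -ne 9 ]; then
    echo "perm_to_octal: expected 9 or 10 characters" >&2
    return 1
  fi
  result=""
  # Owner, group and others each occupy three characters: r, w, x
  for start in 1 4 7; do
    triplet=$(printf '%s' "$p" | cut -c"$start"-"$((start + 2))")
    digit=0
    case $triplet in r??) digit=$((digit + 4)) ;; esac
    case $triplet in ?w?) digit=$((digit + 2)) ;; esac
    case $triplet in ??x) digit=$((digit + 1)) ;; esac
    result="$result$digit"
  done
  printf '%s\n' "$result"
}

# Example: perm_to_octal drwxr-x---   (the folder raw above)
```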
How can I set the file permissions?
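Everybody in the group can list and open the folder raw and create or delete files in it:
chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw
chmod 770 raw
Everybody in the group can list and open the folder raw, but cannot create or delete files in it:
chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw
chmod 750 raw
Everybody in the group can read the fastq files:
chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw/*.fq.gz
chmod 640 raw/*.fq.gz
Everybody in the group can read and modify the fastq files:
chown <USER>:USYS-IBZ-GDC-EULER-<p999> raw/*.fq.gz
chmod 660 raw/*.fq.gz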