Software Stack
There are two main sources of tools on Euler. Standard tools can be loaded via lmod (module load). Make sure you source the GDC software stack (GDCstack.sh) to access the GDC-specific tools. On the other hand, for more complex tools or pipelines, we offer container solutions. If you wish to use these, please contact the GDC for support.
Own Installation
For your own installation, use your home ($HOME) and not GDC projects or GDC home.
Conda
Installing conda environments on GDC home or GDC projects is not allowed, because they cause performance issues and use a lot of inodes. More information can be found here. Use either your scratch or your home (fast SSD drives) instead. Conda environments can also be packed for archiving (info).
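Since the inode count is one of the reasons given above, it can help to check how many entries an environment contains before deciding where to keep it. A minimal sketch; the function name `count_inodes` and the example path are hypothetical and not part of any Euler or GDC tooling:

```shell
# Count the entries (files, directories, links) below a directory.
# Each entry normally occupies one inode, so this approximates inode usage.
count_inodes() {
    find "$1" -xdev | wc -l
}

# Example (hypothetical path): inode usage of an environment on scratch
# count_inodes "${SCRATCH}/envs/myenv"
```

The count includes the directory itself, so an empty directory reports 1.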
Software wrappers
Software wrappers (e.g. snpArcher, ATLAS-Pipeline, DeepARG, QIIME, R workflows like dada2) are easy to use but act as a black box. We do not recommend such wrappers, as they are usually extremely inefficient and sometimes impossible to optimise, which would be essential for use on Euler. If the jobs cannot be optimised to reach an average CPU and memory usage above 50%, use the "sustained" usage mode. We are happy to assist you in setting up your own workflow step by step.
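To judge whether a job reaches the 50% mark, the CPU efficiency can be computed as the CPU time used divided by the CPU time reserved. The sketch below only does the arithmetic; the function name `cpu_efficiency` is hypothetical. The input values could, for instance, come from Slurm accounting (`sacct` reports `Elapsed`, `TotalCPU` and `AllocCPUS`), but how accounting is exposed on Euler is an assumption here.

```shell
# CPU efficiency in percent: CPU time actually used divided by
# the CPU time that was reserved (elapsed time x allocated cores).
# All arguments are plain integers; times are given in seconds.
cpu_efficiency() {
    total_cpu_s=$1
    elapsed_s=$2
    cores=$3
    echo $(( 100 * total_cpu_s / (elapsed_s * cores) ))
}

# Example: 4 cores reserved for 1 h (3600 s), 2 h (7200 s) of CPU time used
cpu_efficiency 7200 3600 4    # prints 50
```

The result is rounded down to a whole percent because shell arithmetic is integer-only.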
Workflow managers
Workflow managers such as Snakemake or Nextflow are tools for creating repeatable workflows. The problem is that their languages are rather complex and it takes time to optimise all the sub-jobs. Both tools are available on Euler, but the GDC doesn't support them. If you want to use them yourself, you need to make sure that the jobs run efficiently. We recommend using simple bash scripts instead.
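A simple bash script can cover one of the most useful features of a workflow manager, namely not repeating steps that already finished. A minimal sketch, assuming each step is a single command; `run_step`, the `.done` marker directory and the step names are hypothetical:

```shell
#!/bin/bash
# Directory holding one marker file per finished step (hypothetical name).
DONE_DIR="${DONE_DIR:-.done}"
mkdir -p "${DONE_DIR}"

# run_step NAME COMMAND [ARGS...]
# Runs COMMAND unless NAME already finished successfully in an earlier run.
run_step() {
    name=$1
    shift
    if [ -e "${DONE_DIR}/${name}" ]; then
        echo "skip ${name}"
        return 0
    fi
    echo "run ${name}"
    "$@" || return 1
    touch "${DONE_DIR}/${name}"
}

# Example with placeholder commands; replace them with real tool calls.
run_step trimming echo "trimming reads" &&
run_step mapping echo "mapping reads"
```

A step is only marked as done when its command succeeds, so a failed step is repeated the next time the script runs.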
Euler stack
Load the current software stack.
module load stack
Get an overview of the installed tools.
module --show-hidden avail
Or you can look for a specific tool, for example samtools.
module --show-hidden avail samtools
Then you get a list of the available versions.
samtools/1.16.1-dyq3w4e (H)
samtools/1.16.1-wvduumz (H)
samtools/1.16.1
samtools/1.17 (D)
You can load samtools version 1.16.1 with the following command.
module load samtools/1.16.1
An overview of the application can be found here.
Java tools
For Java tools such as fastqc, picard or gatk, you need to load the openjdk module as well in many cases.
Available openjdk versions:
module load openjdk/11.0.20.1_1
module load openjdk/17.0.8.1_1
module load openjdk/21.0.3_9
GDC stack
In addition to the general Euler stack, there is also the GDC stack. To load the GDC software stack, you need to run the following command.
source /cluster/project/gdc/shared/stack/GDCstack.sh
There is no need to load any stack or gcc version once you have sourced the GDC stack.
Don't put the command directly into your bashrc file; you can create an alias for it instead.
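One possible alias is sketched below. The alias name `gdcstack` is an arbitrary choice; the alias definition (rather than the source command itself) could be placed in the bashrc file, so that the stack is only loaded when you ask for it.

```shell
# Define a short alias instead of sourcing the stack at every login.
# The alias name "gdcstack" is hypothetical; pick any name you like.
alias gdcstack='source /cluster/project/gdc/shared/stack/GDCstack.sh'
```

Typing `gdcstack` in an interactive shell then loads the GDC stack for that session.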
The following command will give you an overview of all the tools in the GDC stack.
module --show-hidden avail
Let's look again for samtools.
module --show-hidden avail samtools
As you can see there is now a newer version of samtools available.
samtools/1.16.1
samtools/1.17-yhme7vv
samtools/1.20
Let's load the latest version.
module load samtools/1.20
GDC containers
For more complex tools or pipelines, we recommend using container solutions that can be run via Apptainer.
Container rules
- Access to Apptainer needs to be requested.
- Containers should only be used when installation is impossible or very time consuming.
- Make sure that you use reliable container sources (e.g. Galaxy Depot Software Stack, BioContainers, Sylabs, Docker).
- Containers cannot be modified/set-up on Euler.
- If you set up or modify your own container, it is your responsibility to ensure that your container meets all security requirements.
- We do not provide support for tools inside a container. -> Contact the author(s)!
- Apptainer must be run on the Scratch (e.g. sif-file needs to be on the $SCRATCH).
- Sometimes jobs are not killed by slurm even if the job is no longer running. -> Monitor the job more regularly.
- The efficiency can be lower compared to a compiled tool, please be aware of this and adjust the requested resources accordingly.
- Writing your own scripts is generally more complex with Apptainer but an alias is useful.
Not all containers are constructed in the same way, but in many cases you can use the following recipe. If you have any problems, please contact the GDC for assistance.
Let's create an alias for our toolX.
alias "toolX=apptainer exec \
--bind ${SCRATCH} \
container.sif \
command_to_call_toolX"
Now you can call the container like any other tool.
toolX -h
Let's use a rather simple tool like Samtools for educational purposes.
cd ${SCRATCH}
# Download the container from the Galaxy depot to the Scratch.
wget https://depot.galaxyproject.org/singularity/samtools:0.1.19--3
# Rename the image; the file name reflects the downloaded version (0.1.19).
mv samtools:0.1.19--3 samtools-0.1.19.sif
# Create the alias
alias "sm=apptainer exec \
--bind ${SCRATCH} \
${SCRATCH}/samtools-0.1.19.sif \
samtools"
# Run samtools view
sm view -h
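Bash only expands aliases in non-interactive shells when `expand_aliases` is enabled, so inside a submission script a shell function can be the more robust choice. A sketch under the same assumptions as the recipe above (image stored on `$SCRATCH`); the variable `SM_SIF` and the file name `samtools.sif` are placeholders:

```shell
# Path to the container image; adjust it to the name you gave the file.
SM_SIF="${SCRATCH}/samtools.sif"

# Wrapper function: forwards all arguments to samtools inside the container.
sm() {
    apptainer exec --bind "${SCRATCH}" "${SM_SIF}" samtools "$@"
}

# Usage inside a script (needs Apptainer access and the image in place):
# sm view -h input.bam
```

Unlike the alias, the function passes arguments through `"$@"`, so file names containing spaces are preserved.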
R
Available versions
module load r/4.3.2
module load stack/2024-06 r/4.4.0
Package installation
The commands for installing R packages depend on the repository. Always use the default path settings.
#### CRAN Repository
install.packages("package")
install.packages(c("packageA", "packageB"))
#### Bioconductor Repository
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("package")
#### GitHub
library(devtools)
devtools::install_github("link/to/package")
Python Tools
If you want to install your own Python tools, we strongly recommend using virtual environments, so that you have full control over the versions.
Installation
Let's install the tool cutadapt.
Go to your home
cd ${HOME}
Now we need to load the python module.
module load python/3.11.6
Now you can create a virtual environment called cutadapt.
python -m venv cutadapt
Now let's source the environment.
source ${HOME}/cutadapt/bin/activate
Now you can install the tool via pip.
pip install cutadapt
Use
In your submission script, you would add the following lines.
module load python/3.11.6
source ${HOME}/cutadapt/bin/activate
cutadapt -h
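Put together, a complete submission script might look like the sketch below, which writes the script to a file. The `#SBATCH` resource values, the adapter sequence and the file names are placeholders chosen for illustration, not GDC recommendations.

```shell
# Write a minimal Slurm submission script for cutadapt to a file.
# Resource values, adapter sequence and file names are placeholders.
cat > cutadapt_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cutadapt
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=04:00:00

module load python/3.11.6
source ${HOME}/cutadapt/bin/activate

# -j sets the number of cores; it should match --cpus-per-task.
cutadapt -j "${SLURM_CPUS_PER_TASK}" -a AGATCGGAAGAGC \
    -o trimmed.fastq.gz reads.fastq.gz
EOF

# Submit it on the cluster with:
# sbatch cutadapt_job.sh
```

Tying `-j` to `SLURM_CPUS_PER_TASK` keeps the number of cores used by cutadapt in line with the number requested from Slurm.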
Databases
Databases are difficult to maintain when many users are involved. Maintaining them is more expensive than simply downloading them again, especially if they are pre-built. Each user should download the databases directly to the Scratch, run the analysis and then delete them again.
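The download, analyse, delete cycle can be wrapped in a small helper so that the clean-up is not forgotten. A minimal sketch; the function name `with_scratch_db` is hypothetical, and it falls back to `/tmp` when `$SCRATCH` is not set:

```shell
# with_scratch_db COMMAND [ARGS...]
# Creates a temporary database directory, runs COMMAND with the directory
# appended as last argument, then deletes the directory again.
with_scratch_db() {
    db_dir=$(mktemp -d "${SCRATCH:-/tmp}/db.XXXXXX") || return 1
    "$@" "${db_dir}"
    status=$?
    rm -rf "${db_dir}"
    return "${status}"
}

# Example with a placeholder command that only reports the directory;
# a real analysis script would download the database into it and run the tool.
with_scratch_db echo "database directory:"
```

The helper returns the exit status of the wrapped command, so a failed analysis is still visible to the calling script after the clean-up.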
Only the large NCBI Blast databases are stored centrally and maintained by the Cluster Support.
/cluster/project/clcgenomics/CLC_BLAST_DB
With the following command you can blast against the nt database.
blastn -task blastn -query query.fasta \
-db /cluster/project/clcgenomics/CLC_BLAST_DB/nt -out query_nt.tab \
-outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore stitle sscinames sskingdoms'