Sequencing is finished

Check-List¶

Download data if possible via terminal (e.g. ftp)
Verify file integrity (md5sum)
Verify data - N(files) = 2 x N(samples) for paired-end reads
Get a few random reads and blast them
Check fastq headers - how many runs?
Run a quality control (e.g. FastQC)
Read size distribution
Check for PhiX contamination
Archive a copy of your raw data
Upload the raw data (e.g. ENA)
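
For the "random reads" item in the check-list, the following is a minimal sketch of how a few reads could be sampled and converted to FASTA for a BLAST search. It assumes GNU `shuf` is available; the demo directory, file name and read headers are hypothetical stand-ins for a real fastq file.

```shell
# Hypothetical demo file; replace with a real fastq.gz file in practice.
demo=$(mktemp -d)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGGCCAA\n+\nIIIIIIII\n@r3\nGGGGAAAA\n+\nIIIIIIII\n' \
    | gzip > "$demo/S1_R1_001.fastq.gz"

# Join each 4-line fastq record into one tab-separated line,
# pick 2 records at random, and print them as FASTA (header + sequence).
sample_fasta=$(gzip -dc "$demo/S1_R1_001.fastq.gz" \
    | paste - - - - \
    | shuf -n 2 \
    | awk -F '\t' '{ print ">" substr($1, 2); print $2 }')
echo "$sample_fasta"
```

The resulting FASTA records can then be pasted into a BLAST search. Increase the value after `shuf -n` to sample more reads.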

FTP Download¶

## FTP-Server
lftp sftp://jwalser@openbis-dsu.ethz.ch:2222
mirror -c -v -p --log=p840_16S_181119.log --parallel=10 P840_16S

## SSH
scp -r P840_16S/*_R[12]_*fastq.gz jwalser@euler.ethz.ch:${P}

md5sum¶

find . -name "*_R[12]*.fastq.gz" | while IFS= read -r file ; do md5sum "$file"; done > p840_run181119_16S_md5sum.txt
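
Recording checksums is only half of the integrity check; the files also need to be compared against them later, for example after copying to another machine. A minimal sketch, assuming GNU `md5sum` and using a hypothetical demo directory and checksum file name:

```shell
# Hypothetical demo directory with two tiny fastq files.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$demo/S1_R1_001.fastq.gz"
printf '@r1\nTGCA\n+\nIIII\n' | gzip > "$demo/S1_R2_001.fastq.gz"

# Record the checksums with the same file pattern as above.
( cd "$demo" && find . -name "*_R[12]*.fastq.gz" -exec md5sum {} + > md5sum.txt )

# Verify: md5sum -c re-computes each checksum and reports OK or FAILED per file.
md5_report=$( cd "$demo" && md5sum -c md5sum.txt )
echo "$md5_report"
```

If the sequencing facility provides its own checksum file, the same `md5sum -c` call can be run against that file instead.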

Count samples and files¶

## Count R1 and R2 files
ls -al a_data/gz/*_R1*.f*q.gz | wc -l
ls -al a_data/gz/*_R2*.f*q.gz | wc -l
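
Equal R1 and R2 counts do not guarantee that every sample has both files. The sketch below checks each R1 file for its R2 mate. It assumes the mate's name differs only by `_R1` versus `_R2`; the demo directory and file names are hypothetical.

```shell
# Hypothetical demo directory: S2 is missing its R2 file on purpose.
demo=$(mktemp -d)
touch "$demo/S1_R1_001.fastq.gz" "$demo/S1_R2_001.fastq.gz" "$demo/S2_R1_001.fastq.gz"

n_r1=0
n_missing=0
for r1 in "$demo"/*_R1*.f*q.gz; do
    n_r1=$((n_r1 + 1))
    name=$(basename "$r1")
    # Derive the expected mate name by swapping _R1 for _R2.
    r2="$demo/$(printf '%s' "$name" | sed 's/_R1/_R2/')"
    if [ ! -e "$r2" ]; then
        echo "missing mate for $name"
        n_missing=$((n_missing + 1))
    fi
done
echo "R1 files: $n_r1, without R2 mate: $n_missing"
```
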
## Get the fastq header of the first read
zcat a_data/gz/*.f*q.gz | head -n 1
# Example @M01761:234:000000000-B32NW:1:2107:10522:1813 2:Y:0:CCTAAGAC+TAGCCTTA
#         @M01761:234 <- instrument ID and run number
## Count total number of reads 
zcat a_data/gz/*_R1*.f*q.gz | grep -c "^@M01761"
## Check if all reads are from the same sequencing platform (instrument ID and run number)
zcat a_data/gz/*_R1*.f*q.gz | awk 'NR % 4 == 1' | cut -d ":" -f 1,2 | sort -u
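
For the "read size distribution" item in the check-list, a rough table of read lengths can be produced with the same tools. This is a minimal sketch on a hypothetical demo file; `gzip -dc` is used here as an equivalent of `zcat`.

```shell
# Hypothetical demo file with three reads of length 4, 6 and 4.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nACGT\n+\nIIII\n' \
    | gzip > "$demo/S1_R1_001.fastq.gz"

# The sequence is the 2nd line of every 4-line record; count reads per length.
length_table=$(gzip -dc "$demo"/*_R1*.f*q.gz \
    | awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' \
    | sort -n)
echo "$length_table"
```

Each output line gives a read length followed by the number of reads of that length. On real data, point the glob at `a_data/gz/` instead of the demo directory.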