
Sequencing is finished

Check-List

  • Download data if possible via terminal (e.g. ftp)
  • Verify file integrity (md5sum)
  • Verify data - N(files) = 2 x N(samples) for paired-end reads
  • Get a few random reads and blast them
  • Check fastq headers - how many runs?
  • Run a quality control (e.g. FastQC)
  • Read size distribution
  • Check for PhiX contamination
  • Archive a copy of your raw data
  • Upload the raw data (e.g. ENA)
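
The "random reads" item above can be sketched as follows. This is a minimal example, not a fixed procedure: it assumes GNU coreutils (`shuf`, `paste`) and a standard 4-line FASTQ layout, and the function name `sample_reads_fasta` is hypothetical. It prints a handful of randomly chosen reads in FASTA format, ready to paste into a BLAST search.

```shell
# Hypothetical helper (bash): draw N random reads from a gzipped FASTQ file as FASTA.
# Assumes each record is exactly 4 lines (header, sequence, "+", quality).
sample_reads_fasta() {
    local fastq="$1" n="${2:-5}"
    gzip -dc "$fastq" |                 # same as zcat
        paste - - - - |                 # join each 4-line record into one tab-separated line
        shuf -n "$n" |                  # pick n records at random
        awk -F'\t' '{ print ">" substr($1, 2); print $2 }'   # drop leading "@", emit FASTA
}
```

Usage, with a placeholder file name: `sample_reads_fasta a_data/gz/sample_R1.fastq.gz 10 > random_reads.fasta`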

FTP Download

## FTP-Server
lftp sftp://jwalser@openbis-dsu.ethz.ch:2222
mirror -c -v -p --log=p840_16S_181119.log --parallel=10 P840_16S

## SSH
scp -r P840_16S/*_R[12]_*fastq.gz jwalser@euler.ethz.ch:${P}

md5sum

find . -name "*_R[12]*.fastq.gz" | while read -r file; do md5sum "$file"; done > p840_run181119_16S_md5sum.txt
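
Once a checksum list exists, a copy of the data can be checked against it with `md5sum -c`. A minimal sketch, assuming GNU `md5sum` and that the command is run from the directory in which the list was created, since the recorded paths are relative; the function name `verify_md5` is hypothetical.

```shell
# Hypothetical helper (bash): verify files against a checksum list written by md5sum.
# Returns non-zero if any listed file is missing or its checksum differs.
verify_md5() {
    local list="$1"
    # --quiet suppresses the "OK" lines, so only failures are printed
    md5sum -c --quiet "$list"
}
```

Usage: `verify_md5 p840_run181119_16S_md5sum.txt && echo "all files OK"`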

Count samples and files

## Count R1 and R2 files
ls -al a_data/gz/*_R1*.f*q.gz | wc -l
ls -al a_data/gz/*_R2*.f*q.gz | wc -l
## Get fastq header of first reads
zcat a_data/gz/*.f*q.gz | head -n 1
# Example @M01761:234:000000000-B32NW:1:2107:10522:1813 2:Y:0:CCTAAGAC+TAGCCTTA
#         @M01761:234 <- ID
## Count total number of reads 
zcat a_data/gz/*_R1*.f*q.gz | grep -c "^@M01761"
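## Check if all reads are from the same sequencing platform
## (keep only the instrument and run ID, e.g. @M01761:234, from each header line)
zcat a_data/gz/*_R1*.f*q.gz | awk 'NR % 4 == 1' | cut -d: -f1,2 | sort -u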
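
Equal R1 and R2 counts do not show that every sample has both mates. A minimal sketch of a per-file check, assuming mate files differ only in `_R1` versus `_R2` and that `_R1` does not occur earlier in the file name; the function name `check_pairs` is hypothetical.

```shell
# Hypothetical helper (bash): report R1 files in a directory that lack a matching R2 file.
# Assumes the mate's name is obtained by replacing the first "_R1" in the file name with "_R2".
check_pairs() {
    local dir="$1" r1 r2 base missing=0
    for r1 in "$dir"/*_R1*.f*q.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        base=$(basename "$r1")
        r2="$dir/${base/_R1/_R2}"             # expected mate file
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]                      # succeed only if every R1 has its R2
}
```

Usage: `check_pairs a_data/gz && echo "all pairs complete"`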