Sequencing is finished
Check-List
- Download data if possible via terminal (e.g. ftp)
- Verify file integrity (md5sum)
- Verify data - N(files) = 2 x N(samples) for paired-end reads (R1 + R2)
- Get a few random reads and blast them
- Check fastq headers - how many runs?
- Run a quality control (e.g. FastQC)
- Read size distribution
- Check for PhiX contamination
- Archive a copy of your raw data
- Upload the raw data (e.g. ENA)
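Two of the checklist items, sampling reads for BLAST and the read size distribution, can be sketched with standard tools. This is a minimal sketch and not a replacement for FastQC: the function names `sample_reads_as_fasta` and `read_length_distribution` are hypothetical, `shuf` is assumed to be available (GNU coreutils), and the FASTQ files are assumed to use plain 4-line records.

```shell
# Hypothetical helpers; assume gzipped FASTQ with exactly 4 lines per record.

# Print N randomly chosen reads as FASTA (ready to paste into BLAST).
# $1 = gzipped FASTQ file, $2 = number of reads to sample
sample_reads_as_fasta() {
    gzip -dc "$1" |
        paste - - - - |
        shuf -n "$2" |
        awk -F'\t' '{ print ">" substr($1, 2); print $2 }'
}

# Tabulate read lengths as "length<TAB>count", sorted by length.
# $@ = one or more gzipped FASTQ files
read_length_distribution() {
    gzip -dc "$@" |
        awk 'NR % 4 == 2 { n[length($0)]++ }
             END { for (l in n) print l "\t" n[l] }' |
        sort -n
}
```

For example, `sample_reads_as_fasta a_data/gz/sample_R1.fastq.gz 5` (the file name is a placeholder) would print five reads in FASTA format, and `read_length_distribution a_data/gz/*_R1*.f*q.gz` would summarise all forward reads.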
FTP Download
## FTP-Server
lftp sftp://jwalser@openbis-dsu.ethz.ch:2222
mirror -c -v -p --log=p840_16S_181119.log --parallel=10 P840_16S
## SSH
scp -r P840_16S/*_R[12]_*fastq.gz jwalser@euler.ethz.ch:${P}
md5sum
find . -name "*_R[12]*.fastq.gz" | while IFS= read -r file ; do md5sum "$file"; done > p840_run181119_16S_md5sum.txt
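The checksum list is only useful if it is checked again later, for example after copying the files to another machine. A minimal sketch using the `--check` mode of GNU `md5sum` follows; the function name `verify_checksums` is hypothetical, and it assumes the command is run from the same relative location as the `find` call, because the list stores paths starting with `./`.

```shell
# Hypothetical helper: re-verify files against a checksum list made by md5sum.
# $1 = checksum list; paths inside it are resolved relative to the current directory.
# --quiet suppresses the "OK" lines, so only failures are reported.
# The exit status is non-zero if any file is missing or does not match.
verify_checksums() {
    md5sum --check --quiet "$1"
}
```

A call such as `verify_checksums p840_run181119_16S_md5sum.txt` prints nothing when every file matches.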
Count samples and files
## Count R1 and R2 files
ls -al a_data/gz/*_R1*.f*q.gz | wc -l
ls -al a_data/gz/*_R2*.f*q.gz | wc -l
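Equal R1 and R2 counts do not prove that every sample has both mates. As a rough sketch, each R1 file can be checked for a matching R2 file; the function name `check_pairs` is hypothetical, and it assumes that mates differ only in the last `_R1` / `_R2` part of the file name.

```shell
# Hypothetical helper: report R1 files that have no matching R2 file.
# $1 = directory holding the gzipped FASTQ files
# Returns 0 if every R1 file has a mate, non-zero otherwise.
check_pairs() {
    missing=0
    for r1 in "$1"/*_R1*.f*q.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        # Swap the last "_R1" in the path for "_R2" to get the expected mate
        r2=$(printf '%s\n' "$r1" | sed 's/\(.*\)_R1/\1_R2/')
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Running `check_pairs a_data/gz` lists any unpaired forward files on standard error.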
## Get the fastq header of the first read
zcat a_data/gz/*.f*q.gz | head -n 1
# Example @M01761:234:000000000-B32NW:1:2107:10522:1813 2:Y:0:CCTAAGAC+TAGCCTTA
# @M01761:234 <- ID (instrument name : run number)
## Count total number of reads
zcat a_data/gz/*_R1*.f*q.gz | grep -c "^@M01761"
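Counting by instrument ID requires knowing the ID in advance, and a pattern as loose as `^@` would also match quality lines, because `@` is a valid quality character. An alternative sketch that avoids both issues divides the line count by four; the function name `count_reads` is hypothetical, and it assumes 4-line FASTQ records.

```shell
# Hypothetical helper: count reads in gzipped FASTQ files as (number of lines) / 4.
# $@ = one or more gzipped FASTQ files
# gzip -dc is equivalent to zcat for .gz files.
count_reads() {
    echo $(( $(gzip -dc "$@" | wc -l) / 4 ))
}
```

For example, `count_reads a_data/gz/*_R1*.f*q.gz` gives the total number of forward reads, which should equal the result for the R2 files.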
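## Check if all reads are from the same instrument and sequencing run
## (header lines only; keep instrument ID and run number, count reads per combination)
zcat a_data/gz/*_R1*.f*q.gz | awk 'NR % 4 == 1' | cut -d: -f1,2 | sort | uniq -c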