
Data Summary

Project Information

Project   : p677
Run       : run200720
Platform  : Illumina MiSeq
Provider  : GDC
Data Type : PE300 (paired-end, 600 cycles)

MiSeq Run Information

Cluster density: 598 K/mm2 (optimal 500–1200 K/mm2)
Reads Total    : 14.76 M (goal 25 M)
Reads PF       : 14.27 M (passed filter)
PhiX conc      : 18.34 % (loaded 10 %)
%>=Q30         : Total 79.31 % (should be at least 70 %)

Samples

N(Samples)  : 368 (water samples)
N(PCP)      :   4 (positive controls)
N(FCN)      :   8 (filter negative controls)
N(PCN)      :   4 (PCR negative controls)
-----------------
N(total)    : 384
=================

Sample Encoding

e.g. T002-BSL1

T   : NAWA Trend
002 : BAFU ID
-
BS  : Basel-Stadt (Swiss Cantons)
L   : left river bank [L/R]
1   : replicate [1/2]
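The encoding above can be checked mechanically. The following is a minimal Python sketch, assuming that every sample label follows the layout of the single example T002-BSL1 (one programme letter, a three-digit BAFU ID, a hyphen, a two-letter canton code, the river bank, and the replicate). The pattern, the `SampleId` type, and the `parse_sample_id` function are hypothetical names introduced for illustration; they are not part of the project workflow.

```python
import re
from typing import NamedTuple

# Pattern inferred from the single example "T002-BSL1".
# The field widths are an assumption, not a documented specification.
SAMPLE_ID = re.compile(
    r"^(?P<program>[A-Z])"    # e.g. T = NAWA Trend
    r"(?P<bafu_id>\d{3})"     # BAFU ID, kept as text to preserve leading zeros
    r"-"
    r"(?P<canton>[A-Z]{2})"   # Swiss canton code, e.g. BS = Basel-Stadt
    r"(?P<bank>[LR])"         # left or right river bank
    r"(?P<replicate>[12])$"   # replicate 1 or 2
)


class SampleId(NamedTuple):
    program: str
    bafu_id: str
    canton: str
    bank: str
    replicate: int


def parse_sample_id(label: str) -> SampleId:
    """Split a sample label into its fields, or raise ValueError."""
    match = SAMPLE_ID.match(label)
    if match is None:
        raise ValueError(f"not a valid sample label: {label!r}")
    return SampleId(
        match["program"],
        match["bafu_id"],
        match["canton"],
        match["bank"],
        int(match["replicate"]),
    )


print(parse_sample_id("T002-BSL1"))
```

Labels that do not follow this layout are rejected with a `ValueError`, so they would need separate handling.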

Controls

  • PCP are positive controls: 2 µl of diluted cod DNA extract (from sustainable catch, of course) was amplified.
  • FCN are filter negative controls: filters that were taken into the field, where 500 ml of distilled water was filtered through them. They were extracted together with all other samples, in more or less random order.
  • PCN are PCR negative controls: 2 µl of Sigma water was amplified instead of DNA extract.

MiSeq Raw-Data

Quality Report (FastQC & MultiQC)

Data Preparation

To process the raw data, I use a standardized but parameter-optimised workflow. The raw reads are first filtered to remove reads derived from PhiX (an internal standard) and low-complexity reads. Next, the low-quality 3'-ends of the reads are trimmed to improve read merging. I then use an in silico PCR approach to remove the primer sites from the merged reads (amplicons). The last step is a quality and size filter. Data is lost at each processing step; detailed statistics are available (links below). The workflow also provides detailed report files listing the application name, version, and parameters for each step.
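To illustrate the in silico PCR step described above, here is a minimal Python sketch. It is not the tool used in this workflow: it only accepts exact primer matches, whereas real in silico PCR tools usually tolerate mismatches. The primer sequences and the read in the example are invented placeholders, and the function names are hypothetical.

```python
from typing import Optional

# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(read: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the sequence between the two primer sites.

    The forward primer is searched as given; the reverse primer is
    searched as its reverse complement, because it binds the opposite
    strand. Returns None if either primer site is missing.
    """
    start = read.find(fwd_primer)
    if start == -1:
        return None
    insert_start = start + len(fwd_primer)

    end = read.find(reverse_complement(rev_primer), insert_start)
    if end == -1:
        return None
    return read[insert_start:end]


# Placeholder primers and read, for illustration only.
merged_read = "ACGTAC" + "TTTTGGGG" + "CTAAGG"
print(in_silico_pcr(merged_read, "ACGTAC", "CCTTAG"))
```

A read in which either primer site cannot be found is dropped (`None`), which is one of the reasons data is lost at this step.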

Data Preparation Overview

Data Preparation Summary Table (Excel Spreadsheet)
Data Preparation Summary Table (Text File)

Problems

The expected amplicon length (219 nt) is shorter than the read length (300 nt), so each read runs past the end of the amplicon. This is the reason for the drop in the mean quality score after 200 nt in the QC report. We could either use a v2 500-cycle kit, reduce the number of cycles on a v3 600-cycle kit, or scale up and run the samples on, e.g., a NextSeq using paired-end 150 nt reads. In addition, about 10% of the amplicons failed and are shorter than 100 nt.
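The trade-off between the sequencing options can be made concrete with a little arithmetic. The Python sketch below assumes that 219 nt is the full length of the sequenced fragment and ignores adapters; the function names are hypothetical. The read lengths correspond to the options named above: 2 x 300 nt (v3 600-cycle kit), 2 x 250 nt (v2 500-cycle kit), and 2 x 150 nt (NextSeq).

```python
def read_through(amplicon_len: int, read_len: int) -> int:
    """Bases by which a single read extends past the end of the amplicon."""
    return max(0, read_len - amplicon_len)


def pair_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases of the amplicon that are covered by both reads of a pair."""
    return max(0, min(amplicon_len, 2 * read_len - amplicon_len))


AMPLICON_LEN = 219  # expected amplicon length in nt

for read_len in (300, 250, 150):
    print(
        f"PE{read_len}: "
        f"read-through {read_through(AMPLICON_LEN, read_len)} nt, "
        f"overlap {pair_overlap(AMPLICON_LEN, read_len)} nt"
    )
```

Under these assumptions, PE300 reads run 81 nt past the amplicon and PE250 reads still run 31 nt past it, while PE150 reads stay within the amplicon and the two reads of a pair still overlap by 81 nt for merging.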