OTU/Count Table (Matrix)
Schematic Layout
Sample A | Sample B | Sample C | Mock | Negative | Taxa | |
---|---|---|---|---|---|---|
ZOTU1 | 5000 | 24 | 2 | 0 | 3 | KCOFGS |
ZOTU2 | 1202 | 546 | 34 | 734 | 0 | KCOFGS |
ZOTU3 | 560 | 899 | 124 | 0 | 0 | KCOFGS |
ZOTU4 | 12 | 356 | 5461 | 0 | 0 | KCOFGS |
ZOTU5 | 0 | 0 | 65 | 0 | 0 | KCOFGS |
ZOTU6 | 0 | 5 | 0 | 0 | 1 | KCOFGS |
ZOTU7 | 0 | 34 | 0 | 0 | 2 | KCOFGS |
ZOTU8 | 0 | 0 | 453 | 1432 | 0 | KCOFGS |
ZOTU9 | 1 | 0 | 0 | 243 | 12 | KCOFGS |
ZOTU10 | 0 | 0 | 0 | 546 | 0 | KCOFGS |
... |
Table Characteristics
- contains raw (read) counts ( = non-negative integers)
- has many or lots of zeros
- showes differences in sequencing depth (sequencing artefact)
- showes differences in library composition between samples (biological artefact)
Table Summary
cat e_OTU/*_ZOTU_Count.summary
Data Characteristics
- compositional (multiple parts of non-negativ numbers)
- high dimensional (few data points and many features)
- underdetermined (the number of OTUs is much greater than the number of samples)
- overdispersed (variance of the counts of reads is larger than expected)
- zero-inflated (many zeros)
Count "Normalization" Approaches
There are various approaches to correcting for one or more of the data structure difficulties described above.
- Rarefying
- Total count scaling/proportions
- Scaling with size factors (effective library sizes)
- Data transformation
Literature
- Schloss (2023) Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. bioRxiv 2023.06.23.546313.