Skip to content

OTU/Count Table (Matrix)

Schematic Layout

Sample A Sample B Sample C Mock Negative Taxa
ZOTU1 5000 24 2 0 3 KCOFGS
ZOTU2 1202 546 34 734 0 KCOFGS
ZOTU3 560 899 124 0 0 KCOFGS
ZOTU4 12 356 5461 0 0 KCOFGS
ZOTU5 0 0 65 0 0 KCOFGS
ZOTU6 0 5 0 0 1 KCOFGS
ZOTU7 0 34 0 0 2 KCOFGS
ZOTU8 0 0 453 1432 0 KCOFGS
ZOTU9 1 0 0 243 12 KCOFGS
ZOTU10 0 0 0 546 0 KCOFGS
...

Table Characteristics

  • contains raw (read) counts ( = non-negative integers)
  • has many or lots of zeros
  • showes differences in sequencing depth (sequencing artefact)
  • showes differences in library composition between samples (biological artefact)

Table Summary

cat e_OTU/*_ZOTU_Count.summary

Data Characteristics

  • compositional (multiple parts of non-negativ numbers)
  • high dimensional (few data points and many features)
  • underdetermined (the number of OTUs is much greater than the number of samples)
  • overdispersed (variance of the counts of reads is larger than expected)
  • zero-inflated (many zeros)

Count "Normalization" Approaches

There are various approaches to correcting for one or more of the data structure difficulties described above.

  • Rarefying
  • Total count scaling/proportions
  • Scaling with size factors (effective library sizes)
  • Data transformation

Literature

  • Schloss (2023) Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. bioRxiv 2023.06.23.546313.