References for ZOTU Annotation
In the next step, I used USEARCH::SINTAX to predict the taxonomic association of the (Z)OTUs. A perfect reference (diverse and accurate) for this step does not exist, but several options are available:
- MitoFish is a comprehensive fish mitochondrial genome database.
- [MitoFish+] is a combination of MitoFish plus NCBI BLAST hits.
- MIDORI is a reference database of DNA sequences, which can be used for taxonomic assignments of all eukaryote mitochondrial DNA sequences.
The two available databases (MitoFish and MIDORI) are limited in species diversity, accuracy, and completeness, but they are extremely helpful for the annotation. We wanted to increase the annotation range of the MitoFish reference by adding non-fish sequences. For this reason, we blasted the ZOTUs against the NCBI nt database. The BLAST hits were filtered to include only properly annotated sequences and then combined with the MitoFish reference. By adding more diversity to the existing reference, we are also able to distinguish more taxonomic levels. For example, with the addition, we can now distinguish between Bacteria- and Eukaryota-related ZOTUs. We should also not forget that the amplicon itself has resolution limits (see MiFish Reference) and that co-amplified targets are more difficult to annotate.
Annotation Summary
Percentage of ZOTUs (n=7,309) with annotations; the corresponding percentage of the data (n=8,756,988) is given in parentheses. All samples and controls are considered in this summary.
| Taxa Level | MitoFish | MitoFish+ | MIDORI |
|---|---|---|---|
| Family | 1.3% (87.8%) | 6.4% (90.8%) | 2.7% (90.2%) |
| Genus | 0.8% (75.9%) | 4.1% (78.3%) | 2.1% (78.0%) |
| Species | 0.6% (35.0%) | 2.8% (37.3%) | 1.5% (37.0%) |
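The two percentages in each cell can differ widely because a few abundant ZOTUs carry most of the data. A minimal sketch of how such a pair of values could be computed, assuming a per-ZOTU table of total counts and rank labels. The function name, ZOTU ids, counts, and labels below are made up for illustration and are not taken from the data set:

```python
def annotation_coverage(counts, annotations, rank):
    """Return (% of ZOTUs, % of counts) that carry a label at `rank`.

    counts:      dict ZOTU id -> total count over all samples
    annotations: dict ZOTU id -> dict rank -> label (missing or "" = unannotated)
    """
    annotated = [z for z in counts if annotations.get(z, {}).get(rank)]
    pct_zotus = 100.0 * len(annotated) / len(counts)
    pct_data = 100.0 * sum(counts[z] for z in annotated) / sum(counts.values())
    return pct_zotus, pct_data


# Hypothetical toy input: one abundant annotated ZOTU, three rare unannotated ones.
toy_counts = {"Z1": 9000, "Z2": 400, "Z3": 350, "Z4": 250}
toy_annotations = {"Z1": {"family": "Salmonidae"}, "Z2": {}, "Z3": {"family": ""}}

pct_zotus, pct_data = annotation_coverage(toy_counts, toy_annotations, "family")
print(f"{pct_zotus:.1f}% of ZOTUs, {pct_data:.1f}% of data")
```

In this toy case only a quarter of the ZOTUs are annotated at family level, yet they account for 90% of the counts, which is the same pattern the table shows.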
Positive Controls (PCP)
The positive control samples contain cod DNA (☛ c:Actinopterygii,o:Gadiformes,f:Gadidae,g:Gadus)
➜ We have 4 positive samples with 74 ZOTUs in total. All four samples have sufficient counts:
| PCP-1 | PCP-2 | PCP-3 | PCP-4 |
|---|---|---|---|
| 12,908 | 10,259 | 14,311 | 18,203 |
The annotations for the top ZOTUs of the positive samples agree across the three annotation references. The dominant ZOTU in all four controls (ZOTU19) is absent from all other samples with one exception: it is also present in one of the replicates of sample T002-BSL.

The taxonomic prediction is correct for all three references but the resolution stops at family level.
MitoFish : ZOTU19 k__Eukaryota; p__Chordata; c__Actinopteri; o__Gadiformes; f__Gadidae; g__; s__
MitoFish+ : ZOTU19 k__Eukaryota; p__Chordata; c__Actinopteri; o__Gadiformes; f__Gadidae; g__; s__
MIDORI : ZOTU19 k__Eukaryota; p__Chordata; c__Actinopteri; o__Gadiformes; f__Gadidae; g__; s__
>ZOTU19
CACCGCGGTTATACGAGAGGCCCAAATTGATGAAAAACGGCGTAAAGCGTGGTTAAGAAA
AAAGAGAAAATATGGCCGAACAGCTTCAAAGCAGTTATACGCATCCGAAGTCACGAAGAA
CAATCACGAAAGTTGCCCTAAAACCTCCGATTCCACGAAAGCCATAAAA
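To make the resolution limit explicit, the lowest rank that actually carries a label can be read from the taxonomy strings listed above. This is a small sketch that only assumes the `k__…; p__…` layout shown for ZOTU19; the helper name `deepest_rank` is hypothetical:

```python
RANK_NAMES = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
              "f": "family", "g": "genus", "s": "species"}


def deepest_rank(taxonomy):
    """Return (rank name, label) of the lowest rank that has a non-empty label."""
    deepest = None
    for field in taxonomy.split(";"):
        prefix, _, label = field.strip().partition("__")
        if label:  # "g__" and "s__" have empty labels and are skipped
            deepest = (RANK_NAMES[prefix], label)
    return deepest


zotu19 = "k__Eukaryota; p__Chordata; c__Actinopteri; o__Gadiformes; f__Gadidae; g__; s__"
print(deepest_rank(zotu19))  # ('family', 'Gadidae')
```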
Filter Control Negative (FCN)
➜ We have 8 negative samples with a total of 43 ZOTUs.
| FCN-1A | FCN-1B | FCN-1C | FCN-1D | FCN-2A | FCN-2B | FCN-3A | FCN-3B |
|---|---|---|---|---|---|---|---|
| 40,804 | 374 | 36,170 | 12 | 21 | 12 | 12 | 15 |
Only two samples (i.e. FCN-1A and FCN-1C) have enough (>0.005%) total counts to be considered.
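The cutoff can be checked with a few lines of arithmetic. The sketch below assumes that the 0.005% refers to the total data (n=8,756,988); the text does not state the denominator explicitly, so treat this as one plausible reading:

```python
TOTAL_COUNTS = 8_756_988     # total data size from the annotation summary
MIN_FRACTION = 0.005 / 100   # 0.005 %, assumed to be relative to TOTAL_COUNTS

fcn_counts = {
    "FCN-1A": 40_804, "FCN-1B": 374, "FCN-1C": 36_170, "FCN-1D": 12,
    "FCN-2A": 21, "FCN-2B": 12, "FCN-3A": 12, "FCN-3B": 15,
}

min_counts = TOTAL_COUNTS * MIN_FRACTION
kept = [name for name, n in fcn_counts.items() if n > min_counts]

print(f"threshold: {min_counts:.0f} counts")
print("considered:", kept)
```

Under this assumption the threshold is about 438 counts, which keeps FCN-1A and FCN-1C and excludes FCN-1B (374 counts), consistent with the statement above.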

PCR Control Negatives (PCN)
All four PCR negative controls have a very low number of counts.
| PCN-1 | PCN-2 | PCN-3 | PCN-4 |
|---|---|---|---|
| 13 | 13 | 11 | 17 |
Sample Annotation
Annotation with MitoFish
The most common families are Salmonidae, Leuciscidae, Cottidae, Cyprinidae, Nemacheilidae, and Gasterosteidae. Because the MitoFish reference is restricted to fish while the primers are not fish-specific and co-amplify other targets, we expect some missing taxa labels. ZOTU13 and ZOTU24 are two examples.

Annotation with MitoFish+
As in the annotation with MitoFish, the most abundant families are Salmonidae, Leuciscidae, Cottidae, Cyprinidae, Nemacheilidae, and Gasterosteidae. In comparison, ZOTU13 received a new annotation as Bovidae (cattle) and ZOTU24 was identified as human contamination.
The reference includes Bacteria as a possible outgroup. 71% (5,184) of the ZOTUs are assigned to Bacteria and not Eukaryota. Although the majority of ZOTUs are bacteria-related, they represent only 7% of the data. This finding indicates that bacteria-related ZOTUs are numerous but individually rare.
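A back-of-the-envelope calculation illustrates the contrast between the number of bacterial ZOTUs and their share of the data. It uses only the figures quoted in this report; because the 7% is a rounded value, the per-ZOTU means are rough estimates:

```python
n_zotus = 7_309            # all ZOTUs
n_bacterial = 5_184        # ZOTUs assigned to Bacteria
total_counts = 8_756_988   # all data
bacterial_share = 0.07     # rounded share of the data reported in the text

share_of_zotus = n_bacterial / n_zotus
mean_bacterial = bacterial_share * total_counts / n_bacterial
mean_other = (1 - bacterial_share) * total_counts / (n_zotus - n_bacterial)

print(f"bacterial ZOTUs: {share_of_zotus:.0%} of all ZOTUs")
print(f"mean counts per bacterial ZOTU:  ~{mean_bacterial:.0f}")
print(f"mean counts per remaining ZOTU:  ~{mean_other:.0f}")
```

On these numbers a bacterial ZOTU holds roughly 118 counts on average, compared with roughly 3,832 for the remaining ZOTUs.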

Annotation with MIDORI
The annotation predictions with MIDORI are identical to those with MitoFish+ for the most abundant ZOTUs. The non-fish ZOTUs (ZOTU13-cattle and ZOTU24-human) could be confirmed.

Missing Annotations
Most of the less abundant or rare ZOTUs could not be annotated well with any of the references. ZOTUs without proper annotations could be unknown species or sequencing errors (e.g., ZOTU54 or ZOTU63).
>ZOTU54 (NCBI BLAST - Uncultured bacterium?)
AGCAGCGGTAATACGAGTGCCCCAAGCGTTATCCGGAATTACTGGGCGTAAAGGGTGTGTAGGCGGCTTAGTTAGTCAGT
GGTTAAAGCTCCCGGCTTAACCGGGAAAGTGCCACTGAAACGGCTAAACTAGAGAGGGTGAGGGGTCTATGGAACTCATG
GTGTAGGAGTGAAATCCGTTGATATCATGGGGAACACCAAAAGCGAAGGCAATAGACTAGCACCTTACTGACGCTGAAAC
ACGAAAGCGTGGGGAT
>ZOTU63 (NCBI BLAST - Uncultured bacterium?)
AGCCGCGGTAAGACGGGGGATGCAAGTGTTATCCGGAATCACTGGGCGTAAAGCGTCTGTAGGTGGTTAAATAAGTCAAC
TGTTAAATCTTGAGGCTCAACCTCAAAATCGCAGTCGAAACTGTTTGACTAGAGTATAGTAGGGGTAAAGGGAATTTCCA
GTGGAGCGGTGAAATGCGTAGAGATTGGAAAGAACACCAATGGCGAAGGCACTTTACTGGGCTATTACTAACACTGAGAG
ACGAAAGCTAGGGTAG
Inconsistent Annotations
Most inconsistent annotations are caused by missing taxa labels; ZOTU100 is an example. The sequence is not fish-related and therefore does not get a meaningful annotation with MitoFish. The annotation "Actinopteri" is misleading because the reference does not contain any alternatives (outgroups) to Actinopteri. The annotations with MitoFish+ and MIDORI are identical, but MitoFish+ resolves down to species.
MitoFish - ZOTU100 k:Eukaryota p:Chordata c:Actinopteri
MitoFish+ - ZOTU100 k:Eukaryota p:Chordata c:Mammalia o:Artiodactyla f:Bovidae g:Capra s:Capra_hircus (goat)
MIDORI - ZOTU100 k:Eukaryota p:Chordata c:Mammalia o:Artiodactyla f:Bovidae g:Capra
Another example is ZOTU27, where we have annotations down to species level for both MitoFish references but only a family-level annotation for MIDORI.
MitoFish - ZOTU27 k:Eukaryota p:Chordata c:Actinopteri o:Cypriniformes f:Leuciscidae g:Phoxinus s:Phoxinus_phoxinus
MitoFish+ - ZOTU27 k:Eukaryota p:Chordata c:Actinopteri o:Cypriniformes f:Leuciscidae g:Phoxinus s:Phoxinus_phoxinus
MIDORI - ZOTU27 k:Eukaryota p:Chordata c:Actinopteri o:Cypriniformes f:Leuciscidae
Among the few "truly" inconsistent annotations is ZOTU152. From the annotations, it is not clear whether it is related to bacteria or to eukaryotes. Additional NCBI BLAST searches lean mostly towards Enterobacterales and do not indicate any similarity with crab.
MitoFish - ZOTU152 k:Eukaryota p:Chordata c:Actinopteri
MitoFish+ - ZOTU152 k:Bacteria p:Proteobacteria c:Gammaproteobacteria o:Enterobacterales
MIDORI - ZOTU152 k:Eukaryota p:Arthropoda c:Malacostraca o:Decapoda f:Varunidae g:Eriocheir s:Eriocheir_sinensis (Chinese mitten crab)
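The difference between an annotation that merely stops earlier and one that contradicts another can be checked rank by rank. The following is a minimal sketch, assuming the `k:… p:…` notation used in the listings above; the function names `parse` and `compare` are hypothetical helpers and not part of USEARCH:

```python
RANKS = ["k", "p", "c", "o", "f", "g", "s"]


def parse(annotation):
    """Turn 'k:Eukaryota p:Chordata ...' into {'k': 'Eukaryota', 'p': 'Chordata', ...}.

    Free-text remarks without a colon, such as '(goat)', are ignored.
    """
    return dict(f.split(":", 1) for f in annotation.split() if ":" in f)


def compare(a, b):
    """Classify two annotations of the same ZOTU.

    Returns ('conflict', rank) at the first rank where both have different labels,
    ('truncated', rank) at the first rank that only one of them resolves,
    or ('identical', None).
    """
    ta, tb = parse(a), parse(b)
    for rank in RANKS:
        la, lb = ta.get(rank), tb.get(rank)
        if la and lb and la != lb:
            return ("conflict", rank)
        if bool(la) != bool(lb):
            return ("truncated", rank)
    return ("identical", None)


zotu27_mitofish = ("k:Eukaryota p:Chordata c:Actinopteri o:Cypriniformes "
                   "f:Leuciscidae g:Phoxinus s:Phoxinus_phoxinus")
zotu27_midori = "k:Eukaryota p:Chordata c:Actinopteri o:Cypriniformes f:Leuciscidae"
zotu152_mitofish_plus = "k:Bacteria p:Proteobacteria c:Gammaproteobacteria o:Enterobacterales"
zotu152_midori = ("k:Eukaryota p:Arthropoda c:Malacostraca o:Decapoda f:Varunidae "
                  "g:Eriocheir s:Eriocheir_sinensis (Chinese mitten crab)")

print(compare(zotu27_mitofish, zotu27_midori))           # ('truncated', 'g')
print(compare(zotu152_mitofish_plus, zotu152_midori))    # ('conflict', 'k')
```

A check like this cannot tell a forced assignment from a genuine disagreement: the MitoFish result for ZOTU100 would be reported as a conflict at class level, although it is caused by the missing outgroup in the reference.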