MIDORI Reference
Reference Info
MIDORI is a reference database of mitochondrial DNA sequences that can be used for the taxonomic assignment of eukaryotic mitochondrial sequences.
There are two references available:
refMDu="MIDORI_UNIQ_GB240_srRNA_SINTAX.fasta"
refMDl="MIDORI_LONGEST_GB240_srRNA_SINTAX.fasta"
# Number of Records
# n(MIDORIunique) = 132,578
# n(MIDORIlong) = 56,052
# Number of Primer Sites (unique/long)
# n(Teleo-R ): 20 / 2
# n(Teleo-Rrc): 5,750 / 3,409
# n(MiFish-F) : 103 / 64
# => We use the MIDORI_UNIQ reference, which contains more records and more primer sites.
# => The MIDORI_LONGEST reference is not ideal for our purposes.
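The notes do not state how the record and primer-site counts above were obtained. As a minimal sketch, assuming exact string matching and counting each record at most once, counts of this kind could be produced with two small helper functions. The names `count_records` and `count_primer_sites` are hypothetical, and the primer sequence has to be supplied by the user.

```shell
# Count FASTA records: every record starts with a ">" header line.
count_records() {
    grep -c '^>' "$1"
}

# Count records whose sequence contains an exact match of the primer.
# Sequence lines are joined first, so line-wrapped FASTA files also work.
# Assumption: exact matching only, degenerate (IUPAC) bases are not expanded.
count_primer_sites() {
    local fasta="$1" primer="$2"
    awk -v p="$primer" '
        function check() {
            if (seq != "" && index(toupper(seq), toupper(p)) > 0) n++
        }
        /^>/ { check(); seq = ""; next }
        { seq = seq $0 }
        END { check(); print n + 0 }
    ' "$fasta"
}
```

Usage would look like `count_records "${refMDu}"` and `count_primer_sites "${refMDu}" "${primer}"`, where `primer` is a hypothetical variable holding the primer sequence (or its reverse complement).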
Fix Formatting
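# Collapse double underscores, drop the stray underscore after "f:",
# and remove the "_Rhiniformes group" suffix from the taxonomy labels
sed -i 's/__/_/g' "${refMDu}"
sed -i 's/f:_/f:/g' "${refMDu}"
sed -i 's/_Rhiniformes group//g' "${refMDu}"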
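Because `sed -i` edits the reference in place, it can help to preview the substitutions first. The sketch below applies the same three expressions to standard input without touching any file. The function name `clean_headers` and the sample header are made up for illustration and are not taken from the MIDORI file.

```shell
# Apply the three header clean-ups to stdin and write the result to stdout.
# No -i is used, so the input is left untouched.
clean_headers() {
    sed -e 's/__/_/g' \
        -e 's/f:_/f:/g' \
        -e 's/_Rhiniformes group//g'
}

# Made-up header for illustration only
printf '%s\n' '>X1;tax=o:Order__A,f:_Fam_Rhiniformes group,g:Genus' | clean_headers
# -> >X1;tax=o:Order_A,f:Fam,g:Genus
```

To preview the real file, something like `grep '^>' "${refMDu}" | clean_headers | head` would show the first cleaned headers.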
First PCR
for E in 3 # max. 3 primer mismatches
do
    # In-silico PCR, restricted to amplicons of the expected size (±10%)
    ${u} -search_pcr "${refMDu}" \
        -db primer_fragment.fa \
        -strand both \
        -maxdiffs ${E} \
        -minamp 650 \
        -maxamp 800 \
        -pcrout MIDORI-Fragment.txt \
        -ampout MIDORI-Fragment.fa
done
CorrectOrientation.sh MIDORI-Fragment.fa MIDORI-Fragment.txt MIDORI-Fragment_FRC.fa
PCR amplicon length distribution:
# 650 6
# 660 15
# 670 * 60
# 680 ** 111
# 690 ******** 473
# 700 ************************************************************ 3616
# 710 ********************************************************** 3483
# 720 ************************* 1484
# 730 ****************** 1088
# 740 29
# 750 2
# 760 0
# 770 2
# 780 0
# 790 0
# 800 1
# Total hits: 10,370
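The notes do not say which tool produced the distribution above. A minimal sketch of how a comparable table could be computed from the amplicon FASTA file is shown below, assuming each bin is labelled by its lower bound (so 700 covers 700 to 709 bp). The function name `length_histogram` is hypothetical.

```shell
# Print a sequence-length histogram for a FASTA file.
# Output: one line per bin (bin start, count), then the total record count.
# Assumption: bins are labelled by their lower bound; default width is 10 bp.
length_histogram() {
    local fasta="$1" width="${2:-10}"
    awk -v w="$width" '
        # Close the current record: add its length to the matching bin.
        function tally(   b) {
            if (!in_rec) return
            b = int(len / w) * w
            bin[b]++
            if (total == 0 || b < min) min = b
            if (total == 0 || b > max) max = b
            total++
        }
        /^>/ { tally(); in_rec = 1; len = 0; next }
        { len += length($0) }
        END {
            tally()
            # Walk from the smallest to the largest bin, printing empty bins as 0
            for (b = min; total > 0 && b <= max; b += w) print b, bin[b] + 0
            print "total", total + 0
        }
    ' "$fasta"
}
```

Calling `length_histogram MIDORI-Fragment_FRC.fa` would print one line per 10 bp bin followed by the total, which can be compared against the table above.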
Count Fish-Hits
## Class-level fish hits (maxdiffs = 3)
grep -c "Actinopteri" MIDORI-Fragment_FRC.fa
# N = 5,975
grep -c "Cladistia" MIDORI-Fragment_FRC.fa
# N = 44
grep -c "Chondrichthyes" MIDORI-Fragment_FRC.fa
# N = 374
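The three counts can also be collected in one pass per class with a small loop. This is a sketch rather than the original procedure: the function name `count_classes` is hypothetical, and matching is restricted to header lines so that only labels are counted.

```shell
# Count FASTA headers that contain each given class name.
# Output: one tab-separated line per class (name, count).
count_classes() {
    local fasta="$1"; shift
    local cls
    for cls in "$@"; do
        # -F treats the class name as a fixed string, not a regex
        printf '%s\t%s\n' "$cls" "$(grep '^>' "$fasta" | grep -c -F "$cls")"
    done
}
```

For example, `count_classes MIDORI-Fragment_FRC.fa Actinopteri Cladistia Chondrichthyes` prints the three class counts together.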
Remove Non-Fish Hits
cat search.list
# Actinopteri
# Cladistia
# Chondrichthyes
${u} -fastx_getseqs MIDORI-Fragment_FRC.fa -label_words search.list -fastaout MIDORI_FishHits.fa
# N = 6,393 (61.6% of the 10,370 hits)
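The retained fraction can be re-derived from the two FASTA files as a sanity check. The sketch below counts header lines before and after filtering. The function name `retained_fraction` is hypothetical.

```shell
# Report how many records of the input file survive filtering.
# Arguments: FASTA before filtering, FASTA after filtering.
retained_fraction() {
    local before after
    before=$(grep -c '^>' "$1")
    after=$(grep -c '^>' "$2")
    awk -v a="$after" -v b="$before" 'BEGIN {
        if (b == 0) { print "no input records"; exit 1 }
        printf "%d of %d (%.1f%%)\n", a, b, 100 * a / b
    }'
}
```

With the files above, `retained_fraction MIDORI-Fragment_FRC.fa MIDORI_FishHits.fa` should reproduce the 61.6% figure if the counts are 6,393 and 10,370.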