
MIDORI Reference

Reference Info

MIDORI is a reference database of DNA sequences that can be used for taxonomic assignment of eukaryotic mitochondrial DNA sequences.

There are two references available:

refMDu="MIDORI_UNIQ_GB240_srRNA_SINTAX.fasta"
refMDl="MIDORI_LONGEST_GB240_srRNA_SINTAX.fasta"

# Number of Records
# n(MIDORIunique) = 132,578
# n(MIDORIlong)  =  56,052

# Number of Primer Sites (unique/long)
# n(Teleo-R  ):    20 /     2 
# n(Teleo-Rrc): 5,750 / 3,409 
# n(MiFish-F) :   103 /    64

# => We use the MIDORI_UNIQ reference.
# => MIDORI_LONGEST reference is not ideal for our purposes.
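
Counts like the ones above could be reproduced roughly as follows. This is a minimal sketch and not necessarily how the numbers in this section were obtained: `count_records`, `revcomp`, and `count_motif` are hypothetical helper names, matching is exact (no mismatches, no IUPAC ambiguity codes), and any primer sequence passed in is a placeholder that is not taken from this page.

```shell
# Count FASTA records by counting header lines (lines starting with ">").
count_records() {
  grep -c '^>' "$1"
}

# Reverse-complement a plain A/C/G/T string (ambiguity codes are left as-is).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Count records whose sequence contains an exact, uppercase motif.
# Sequence lines are joined first so a motif spanning a line break is found.
count_motif() {
  awk -v m="$2" '
    /^>/ { if (seq != "" && index(toupper(seq), m)) n++; seq = ""; next }
         { seq = seq $0 }
    END  { if (seq != "" && index(toupper(seq), m)) n++; print n + 0 }
  ' "$1"
}
```

For example, `count_records "${refMDu}"` prints the number of records, and `count_motif "${refMDu}" "$(revcomp "$primer")"` counts exact reverse-complement sites for a primer held in a (hypothetical) `primer` variable.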

Fix Formatting

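# Clean up taxonomy labels in place (GNU sed -i)
sed -i 's/__/_/g' "${refMDu}"                 # collapse double underscores
sed -i 's/f:_/f:/g' "${refMDu}"               # drop the stray underscore after the family prefix
sed -i 's/_Rhiniformes group//g' "${refMDu}"  # remove the "_Rhiniformes group" suffix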
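
Because `sed -i` overwrites the file, it can help to preview the substitutions on a stream first. The sketch below applies the same three expressions to standard input; `preview_fixes` is a hypothetical helper name, and the sample header is made up for illustration and is not taken from the MIDORI file.

```shell
# Apply the same three substitutions to stdin instead of editing a file in place.
preview_fixes() {
  sed -e 's/__/_/g' \
      -e 's/f:_/f:/g' \
      -e 's/_Rhiniformes group//g'
}

# Illustrative header (invented) showing all three fixes at once.
printf '%s\n' '>X1;tax=o:Rhinopristiformes_Rhiniformes group,f:_Rhinidae,s:Rhina__ancylostoma' \
  | preview_fixes
```

To inspect the real headers before committing, something like `grep '^>' "${refMDu}" | preview_fixes | head` would show the first few rewritten labels.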

First PCR

for E in 3 # max 3 primer mismatches
do
  # In-silico PCR with size restriction (±10% of expected size)
  ${u} -search_pcr ${refMDu} \
       -db primer_fragment.fa \
       -strand both \
       -maxdiffs ${E} \
       -minamp 650 \
       -maxamp 800 \
       -pcrout MIDORI-Fragment.txt \
       -ampout MIDORI-Fragment.fa
done
CorrectOrientation.sh MIDORI-Fragment.fa MIDORI-Fragment.txt MIDORI-Fragment_FRC.fa

PCR amplicon length distribution:

# 650  6
# 660  15
# 670 * 60
# 680 ** 111
# 690 ******** 473
# 700 ************************************************************ 3616
# 710 ********************************************************** 3483
# 720 ************************* 1484
# 730 ****************** 1088
# 740  29
# 750  2
# 760  0
# 770  2
# 780  0
# 790  0
# 800  1

# Total hits: 10,370
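
A length distribution like the one above can be tabulated from the amplicon FASTA with a short awk script. This is a sketch under assumptions: `length_histogram` is a hypothetical helper name, the bin width defaults to 10 to match the table, and unlike the table it prints neither the star bars nor empty bins.

```shell
# Print "bin_start count" for sequence lengths in a FASTA file.
# $1 = FASTA file, $2 = bin width (optional, default 10).
length_histogram() {
  awk -v bin="${2:-10}" '
    # Add the finished record to its bin, then reset the running length.
    function tally() { if (len > 0) count[int(len / bin) * bin]++; len = 0 }
    /^>/ { tally(); next }       # a header line closes the previous record
         { len += length($0) }   # sequence may be wrapped over several lines
    END  { tally(); for (b in count) print b, count[b] }
  ' "$1" | sort -n
}
```

Running `length_histogram MIDORI-Fragment_FRC.fa` would list one line per occupied 10 bp bin, sorted by length.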

Count Fish-Hits

## Class-Level - Fish-Hits (e3)
grep -c "Actinopteri" MIDORI-Fragment_FRC.fa
# N = 5,975
grep -c "Cladistia" MIDORI-Fragment_FRC.fa
# N = 44
grep -c "Chondrichthyes" MIDORI-Fragment_FRC.fa
# N = 374

Remove Non-Fish Hits

cat search.list
# Actinopteri
# Cladistia
# Chondrichthyes
${u} -fastx_getseqs MIDORI-Fragment_FRC.fa -label_words search.list -fastaout MIDORI_FishHits.fa
# N = 6,393 (61.6%)
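
The retained fraction can be cross-checked without USEARCH. The sketch below counts headers that contain any word from the list and reports the share of all records; `fish_fraction` is a hypothetical helper name. It uses plain substring matching with `grep -F`, which may differ from how `-label_words` matches, and it assumes the word list has no empty lines (an empty pattern would match every header).

```shell
# Report "hits / total (percent)" for headers containing any listed word.
# $1 = FASTA file, $2 = file with one search word per line.
fish_fraction() {
  total=$(grep -c '^>' "$1")
  hits=$(grep '^>' "$1" | grep -c -F -f "$2")
  awk -v h="$hits" -v t="$total" \
    'BEGIN { printf "%d / %d (%.1f%%)\n", h, t, (t ? 100 * h / t : 0) }'
}
```

With the numbers reported above, 6,393 of 10,370 hits corresponds to 61.6%, and 5,975 + 44 + 374 = 6,393 agrees with the per-class counts.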

MIDORI: MiFish-U-F / Teleo-R Fragment