Alignments

Lecture notes

Dotplot

Open Dotlet JS (beta) in your browser.

(A) Click on the button labeled [SEQUENCE 1] and copy and paste the following sequence into the field below.

>Seq1
GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA
TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA
GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT
AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA
TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG
GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA
TCCCCGTACATGTTGTTATAAATAATCAGTAGAAACTCTGTGTTAGAGGGTGGAGTGACC
TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA
CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG
CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG
CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC
AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA
TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG
ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT
TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT
ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC
ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC

(B) Click on the button labeled [SEQUENCE 2], copy and paste each of Seq2 to Seq5 in turn, and compare it with Seq1. Explore the graphic parameters (C) and play with the window size (D).

>Seq2
GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA
TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA
GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT
AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA
TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG
GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA
CCAGTGAGGTGGGAGATTGTGTCTCAAAGATGACTAATAAATATTGTTGTACATGCCCCT
AGTTGGGACGCTGAAGTAGTGCAACTTATGAGGAAGGCTAATTATAGCAGGAACTAAATT
GAACCCCCTGCCAAGGCCGCAATATGGGATTGGACAAACCTGTCTTCGTGGTAAGGGTGC
CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG
CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC
AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA
TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG
ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT
TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT
ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC
ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC
>Seq3
GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA
TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA
GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT
TCCCCGTACATGTTGTTATAAATAATCAGTAGAAACTCTGTGTTAGAGGGTGGAGTGACC
TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA
CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG
CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG
CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC
AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA
AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA
TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG
GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA
TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG
ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT
TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT
ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC
ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC
>Seq4
GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA
TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA
GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT
AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA
TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG
GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA
GGCTTATGCCCAAGATCGTAGCAAGCAGACTCAAACAAGATATATTTTGCCCGCCTTACA
TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA
CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG
CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG
CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC
AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA
TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG
GACGAAACTAGTTGGAGGTTATGGAGCATACTATCACGTGGGCGGCCACTGGTGAGTTAC
TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT
ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC
ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC
>Seq5
GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA
TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA
GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT
AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA
TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG
TTTATATTATTTTATAATTATTTATAATTTAAAATTTAATTTATATTAAATTATTATTTT
TCCCCGTACATGTTGTTATAAATAATCAGTAGAAACTCTGTGTTAGAGGGTGGAGTGACC
TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA
CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG
CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG
CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC
AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA
TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG
TTTATATTATTTTATAATTATTTATAATTTAAAATTTAATTTATATTAAATTATTATTTT
ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT
TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT
ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC
ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC

The multi-FASTA sequences can also be downloaded if you prefer.
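To get a feeling for what the window size does, here is a minimal sketch of a text-based dot plot. It is a simplification and not how Dotlet works internally: the function name `dotplot` is hypothetical, and a dot is drawn only when the two windows are exactly identical, whereas Dotlet is understood to compute a graded score per window.

```shell
# dotplot SEQ1 SEQ2 WINDOW
# Prints one row per window start in SEQ1 and one column per window start in SEQ2.
# '*' means the WINDOW bases starting at both positions are identical, '.' otherwise.
# Assumption: exact matching only, no scoring matrix and no gray scale.
dotplot() {
    awk -v a="$1" -v b="$2" -v w="$3" 'BEGIN {
        for (i = 1; i + w - 1 <= length(a); i++) {
            row = ""
            for (j = 1; j + w - 1 <= length(b); j++)
                row = row (substr(a, i, w) == substr(b, j, w) ? "*" : ".")
            print row
        }
    }'
}

# Compare a short repeated sequence with itself using two window sizes.
dotplot ACGTACGT ACGTACGT 1
dotplot ACGTACGT ACGTACGT 4
```

With a window of 1 every single matching base is marked, so chance matches clutter the plot. A larger window keeps only longer identical runs, which is why the diagonals become cleaner as the window grows.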

BLAST

Go to the NCBI BLAST website and use the blastn algorithm to blast the following randomly generated sequences:

>Random_Sequence 1 (GC content 0.5, length = 10nt)
ATCCAACACT
>Random_Sequence 2 (GC content 0.5, length = 20nt)
TGGGTAGGGGCCTCCAATTC
>Random_Sequence 3 (GC content 0.5, length = 30nt)
GGCTCCCGTCTTTGAATAGGGGTAAACATA
>Random_Sequence 4 (GC content 0.05, length = 30nt)
TTTACAATTAAAAAAAAAATTATTTTATAA

What did you observe? Did you get good hits and if so can you explain why?

Blast the following FASTA sequence and try to figure out what it could be. It is not a random sequence, but it has the same length as Random_Sequence 3 and a similar GC content.

>NOT_A_Random_Sequence (GC content 0.5, length = 30nt)
TCGGCAGCGTCAGATGTGTATAAGAGACAG

Any idea what it could be?
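The length and GC content given in the FASTA headers can be checked with standard Unix tools. The following is a small sketch; the function name `gc_content` is hypothetical. The header values may be the targets used when the sequences were generated, so the measured fraction can differ slightly from them.

```shell
# gc_content SEQ
# Prints the sequence length and the fraction of G/C bases (two decimals).
gc_content() {
    seq=$1
    len=${#seq}
    # Delete everything except G and C (either case), then count what is left.
    gc=$(printf '%s' "$seq" | tr -cd 'GCgc' | wc -c)
    awk -v gc="$gc" -v len="$len" 'BEGIN { printf "%d %.2f\n", len, gc / len }'
}

gc_content GGCTCCCGTCTTTGAATAGGGGTAAACATA   # Random_Sequence 3
gc_content TCGGCAGCGTCAGATGTGTATAAGAGACAG   # NOT_A_Random_Sequence
```

Two sequences with nearly the same length and base composition can still give very different BLAST results, which is the point of this exercise.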

Now, let's blast a set of Illumina-specific adapter sequences. You can blast multiple sequences at once, either by copying and pasting the (unaligned) sequences into the Enter Query Sequences window or by uploading a text file. The FASTA sequences were obtained from an official Illumina Adapter Sequences document.

>Adapter Nextera Kits
CTGTCTCTTATACACATCT
>Adapter Nextera Transposase Read1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Adapter Nextera Transposase Read2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Adapter Index 1 (i7)
CAAGCAGAAGACGGCATACGAGATNNNNNGTCTCGTGGGCTCGG
>Adapter Index 2 (i5) 
AATGATACGGCGACCACCGAGATCTACACNNNNNTCGTCGGCAGCGTC
>Adapter TruSeq Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>Adapter TruSeq Read 2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>Adapter TruSeq Index 1 (i7) 
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNATCTCGTATGCCGTCTTCTGCTTG
>Adapter TruSeq Index 2 (i5)
AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Universal Adapter TruSight RNA
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Index Adapter TruSight RNA
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNATCTCGTATGCCGTCTTCTGCTTG 

What do you find and, more importantly, do you have an idea what it means?

Blast Terminal

As an alternative to the online BLAST service provided by NCBI or ENA, you could also install the BLAST command-line applications locally, create your own databases, and modify the outputs. For more details, please consult the BLAST Command Line Applications User Manual.

Before you can search (i.e. blast) your FASTA sequence collection (i.e. your database), you have to create an index using the makeblastdb application, which produces BLAST databases from FASTA files.

# Build a BLAST database from local sequences (-out sets the database name that -db refers to below):
makeblastdb -in database.fa -dbtype nucl -title MyTestDB -out MyTestDB
# Run BLAST against that database:
blastn -db MyTestDB -evalue 0.001 -query Input.fa -num_threads 5 -out Results.blast
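One way to modify the output is the tabular format (`-outfmt 6`), which writes one tab-separated line per hit and is easy to post-process. The sketch below assumes the default column order of that format (query ID in column 1, bit score in column 12). The file name `Results.tsv` and the helper `best_hits` are hypothetical.

```shell
# Produce tabular output instead of the default report (requires a local BLAST install):
#   blastn -db MyTestDB -evalue 0.001 -query Input.fa -outfmt 6 -out Results.tsv

# best_hits FILE
# Prints, for each query, the hit line with the highest bit score (column 12).
# If several hits tie, the first one in the file is kept.
best_hits() {
    awk -F'\t' '
        !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
        END { for (q in line) print line[q] }
    ' "$1" | sort
}

# Usage, once Results.tsv exists:
#   best_hits Results.tsv
```

Keeping the filtering in a separate step means the BLAST search has to run only once, while the selection rule can be changed freely afterwards.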

BLAST Server: