Alignments¶
Lecture notes¶
Dotplot¶
Open Dotlet JS (beta) in your browser.
(A) Click on the button labeled [SEQUENCE 1] and copy and paste the following sequence into the field below.
>Seq1 GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA TCCCCGTACATGTTGTTATAAATAATCAGTAGAAACTCTGTGTTAGAGGGTGGAGTGACC TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC
(B) Click on the button labeled [SEQUENCE 2], copy and paste each of Seq2 to Seq5 in turn, and compare it with Seq1. Explore the graphic parameters (C) and play with the window size (D).
>Seq2 GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA CCAGTGAGGTGGGAGATTGTGTCTCAAAGATGACTAATAAATATTGTTGTACATGCCCCT AGTTGGGACGCTGAAGTAGTGCAACTTATGAGGAAGGCTAATTATAGCAGGAACTAAATT GAACCCCCTGCCAAGGCCGCAATATGGGATTGGACAAACCTGTCTTCGTGGTAAGGGTGC CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC >Seq3 GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT TCCCCGTACATGTTGTTATAAATAATCAGTAGAAACTCTGTGTTAGAGGGTGGAGTGACC TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT 
ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC >Seq4 GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG GCTTGGGTCGAGATAAAATCTCCAGTGCCCAAGACCACGGGCGCTCGGCGTCTTGGCTAA GGCTTATGCCCAAGATCGTAGCAAGCAGACTCAAACAAGATATATTTTGCCCGCCTTACA TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG GACGAAACTAGTTGGAGGTTATGGAGCATACTATCACGTGGGCGGCCACTGGTGAGTTAC TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC >Seq5 GTGCAAACCAAAAATTCTAGGTTACTAGAAGTTTTGCGACGTTCTAAGAGTTGGACGAAA TGTTTCGCGACCTAGGATGAGGTCGCCCTAGAAAATAGATTTCTGCTACTCTCCTCATAA GCAGTCCGGTGTATCGAAAGTACAAGACTAGCCTTGCTAGCAACCGCGGGCTGGGAGCCT AAGGCATCACTCAAGATACAGGCTCGGTAACGTACGCTCTAGCCATCTAACTATCCCCTA TGTCTTATAGGGACCTACGTTATCTGCCTGTCGAACCATAGGATTCGCATCAGCGCGCAG TTTATATTATTTTATAATTATTTATAATTTAAAATTTAATTTATATTAAATTATTATTTT TCCCCGTACATGTTGTTATAAATAATCAGTAGAAACTCTGTGTTAGAGGGTGGAGTGACC TTAAATCAAGGACGATATTAATCGGAAGGAGTATTCAACGTGATGAAGTCGCAGGGTTGA CGTGGGAATGGTGCTTCTGTCCAAACAGGTTAGGGTATAACGCCGGAACCGTCCCCCAAG CGTACAGGGTGGGCTTTGCAACGACTTCCGAGTCCAAAGACTCGCTGTTTTCGAAATTTG CGCTCAAGGGCGAGTATTGAACCAGGCTTACGCCCAAGTACGTAGCAAGGTGACTCAAAC AGAGTACATCCTGCCCGCGTTTCGTATGAATCAAGTTAGAAGTTATGGAACATAATAACA TGTGGATGGCCAGTGGTCGGTTGTTACACGCCTGCCGCAACGTTGAAAGTCCCGGATTAG TTTATATTATTTTATAATTATTTATAATTTAAAATTTAATTTATATTAAATTATTATTTT 
ACTGGCAGGATCTATGCCGTGAGACCCGTTATGCTCCATTACCGTCAGTGGGTCACAGCT TGTTGTGGACTGGATTGCCATTCTCTGAGTGTATTACGGAGGCGGCCGCACGGGTCCCAT ATAAACCAGTCATAGCTTACCTGACTGTACTTAGAAATGTGGCTTCGCCTTTGCCCACGC ACCTGATCGCTCCTCGTTTGCTTTTAAGGACCGGACGAAC
The multi-FASTA sequences can also be downloaded if you prefer.
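To get a feeling for what the window size does, here is a minimal Python sketch of an identity-based dotplot. It is a simplification and not how Dotlet itself is implemented (Dotlet scores windows and draws grey levels); the function names `dotplot` and `render` and the `min_matches` threshold are assumptions made for this illustration.

```python
def dotplot(seq1, seq2, window=3, min_matches=3):
    """Return a boolean matrix for a simple windowed dotplot.

    Cell [i][j] is True when the windows starting at seq1[i] and seq2[j]
    have at least `min_matches` identical positions.
    """
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical positions between the two windows.
            matches = sum(
                a == b
                for a, b in zip(seq1[i:i + window], seq2[j:j + window])
            )
            row.append(matches >= min_matches)
        matrix.append(row)
    return matrix


def render(matrix):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # First 20 nt of Seq1 compared with itself.
    fragment = "GTGCAAACCAAAAATTCTAG"
    print(render(dotplot(fragment, fragment, window=3, min_matches=3)))
```

A larger window with a strict threshold suppresses short chance matches and leaves mainly the long diagonals, while a small window produces a noisier plot.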
BLAST¶
Go to the NCBI BLAST website and use the blastn algorithm to BLAST the following randomly generated sequences:
>Random_Sequence 1 (GC content 0.5, length = 10nt)
ATCCAACACT
>Random_Sequence 2 (GC content 0.5, length = 20nt)
TGGGTAGGGGCCTCCAATTC
>Random_Sequence 3 (GC content 0.5, length = 30nt)
GGCTCCCGTCTTTGAATAGGGGTAAACATA
>Random_Sequence 4 (GC content 0.05, length = 30nt)
TTTACAATTAAAAAAAAAATTATTTTATAA
What did you observe? Did you get good hits and if so can you explain why?
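When thinking about this question, it can help to estimate how often a short sequence is expected to occur purely by chance. The sketch below is a back-of-the-envelope calculation, not what BLAST computes internally: it assumes all four bases are equally likely and independent, looks at one strand only, and uses a hypothetical database size of one billion nucleotides.

```python
def expected_chance_matches(query_length, database_length):
    """Expected number of exact chance occurrences of a query.

    Assumes a random database with equal base frequencies (0.25 each),
    independent positions, and a single strand.
    """
    # Number of positions where the query could start in the database.
    positions = max(database_length - query_length + 1, 0)
    # Probability that all bases match at one given start position.
    return positions * 0.25 ** query_length


if __name__ == "__main__":
    database_length = 1_000_000_000  # hypothetical size, for illustration only
    for query_length in (10, 20, 30):
        expected = expected_chance_matches(query_length, database_length)
        print(f"length {query_length}: {expected:.3g} expected exact matches")
```

Compare how quickly the expected number drops as the query gets longer with what you saw for the 10, 20 and 30 nt sequences.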
BLAST the following FASTA sequence and try to figure out what it could be. It is not a random sequence, but its GC content and length are the same as those of Random_Sequence 3.
>NOT_A_Random_Sequence (GC content 0.5, length = 30nt)
TCGGCAGCGTCAGATGTGTATAAGAGACAG
Any idea what it could be?
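If you want to check the length and GC content of the sequences above yourself, a small Python sketch such as the following will do. The helper name `gc_content` and the dictionary layout are assumptions for this example. The values in the FASTA headers presumably describe the settings of the random generator, so the fraction observed in a short sequence may differ slightly.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sequences copied from the exercises above.
sequences = {
    "Random_Sequence 1": "ATCCAACACT",
    "Random_Sequence 3": "GGCTCCCGTCTTTGAATAGGGGTAAACATA",
    "NOT_A_Random_Sequence": "TCGGCAGCGTCAGATGTGTATAAGAGACAG",
}

if __name__ == "__main__":
    for name, seq in sequences.items():
        print(f"{name}\tlength={len(seq)}\tGC={gc_content(seq):.2f}")
```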
Now let's BLAST a set of Illumina-specific adapter sequences. You can BLAST multiple sequences at once, either by copying and pasting the (unaligned) sequences into the Enter Query Sequences window or by uploading a text file. The FASTA sequences were obtained from an official Illumina Adapter Sequences document.
>Adapter Nextera Kits
CTGTCTCTTATACACATCT
>Adapter Nextera Transposase Read1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Adapter Nextera Transposase Read2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Adapter Index 1 (i7)
CAAGCAGAAGACGGCATACGAGATNNNNNGTCTCGTGGGCTCGG
>Adapter Index 2 (i5)
AATGATACGGCGACCACCGAGATCTACACNNNNNTCGTCGGCAGCGTC
>Adapter TruSeq Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>Adapter TruSeq Read 2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>Adapter TruSeq Index 1 (i7)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNATCTCGTATGCCGTCTTCTGCTTG
>Adapter TruSeq Index 2 (i5)
AATGATACGGCGACCACCGAGATCTACACNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Universal Adapter TruSight RNA
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Index Adapter TruSight RNA
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNATCTCGTATGCCGTCTTCTGCTTG
What do you find and, more importantly, do you have an idea what it means?
Blast Terminal¶
As an alternative to the online BLAST service provided by NCBI or ENA, you can install the BLAST command-line applications locally, create your own databases, and customize the output. For more details, please consult the BLAST Command Line Applications User Manual.
Before you can search (i.e. BLAST) your FASTA sequence collection (i.e. the database), you have to build an index with the makeblastdb application, which produces BLAST databases from FASTA files.
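# Build a BLAST database from local sequences
# (-out sets the database name that -db refers to below; -title is only a descriptive label):
makeblastdb -in database.fa -dbtype nucl -title MyTestDB -out MyTestDB

# Run BLAST:
blastn -db MyTestDB -evalue 0.001 -query Input.fa -num_threads 5 -out Results.blast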
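The blastn command above writes the default pairwise report. If you add `-outfmt 6` to it, BLAST writes a tab-separated table with twelve standard columns instead, which is easier to process with a script. The following Python sketch reads such a table; the file name `Results.tsv`, the `Hit` type, and the function `parse_tabular` are assumptions made for this example.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One row of BLAST tabular output (-outfmt 6, default columns)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse lines of -outfmt 6 output, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(
            f[0], f[1], float(f[2]),
            int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return hits


if __name__ == "__main__":
    # Hypothetical file produced with: blastn ... -outfmt 6 -out Results.tsv
    with open("Results.tsv") as handle:
        for hit in parse_tabular(handle):
            # Report only full-identity hits as a simple example filter.
            if hit.pident == 100.0:
                print(hit.qseqid, hit.sseqid, hit.length, hit.evalue)
```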