=============== DATA SUBMISSION ===============

In these exercises you will familiarise yourself with the steps taken to submit a binned metagenome. Please use the same account you used for the previous ENA submission exercise.

Some of these exercises require the use of command-line programs. Where there is a command to run, it will be provided for you mostly complete, on an indented line beginning with a dollar ('$') sign. Do not enter the $, just the text following it. Some commands have been left incomplete for you to fill in, so read them carefully before you enter them. Feel free to ask for help if anything is unclear!

If you need to review anything covered in the presentation on metagenome assembly submission, you can follow the documentation here:
https://ena-docs.readthedocs.io/en/latest/assembly/metagenome.html

THE SCENARIO
____________

Keen to learn more about the metagenome you sequenced previously, you have assembled and binned the raw reads. You now have a set of contigs that you have identified as originating from species of the genus Aeromonas. The result is a FASTA file containing an assembly with a completeness score of 91.12% and a contamination score of 7.80%.

THE STUDY
_________

A study has already been registered. Further data related to the previously submitted reads should be submitted to the same study, so there is no need to register a new one.

THE SAMPLE
__________

The first step is to register your binned sample. This provides context on how your assembly was derived and describes measures of its quality. You should submit the sample through the programmatic route.

1. If you closed your terminal window after the first exercise, re-open it now, connect to the server, and return to the ENA directory:

      $ ssh student@gdcsrv2.ethz.ch    # replace ?? with your student number
      $ cd ~/ENA

2. Start by looking at the sample XML we have prepared for you:

      $ cat binned_metagenome_sample.xml

3. Have a look at the taxonomy used to describe the sample. "uncultured Aeromonas sp." is an 'uncultured' taxon, used specifically for samples derived from an environmental source. In this case it describes a bacterium of the genus Aeromonas.

4. Note the sample description field. It is good practice to describe your binned metagenome or MAG sample very clearly, so that it is obvious it is not a physical, biological sample but instead represents the bacteria from the environment that was sampled. It is also good practice to reference either the original raw reads or the environmental sample in the description, to give your binned sample as much context as possible.

   Note also the 'sample derived from' field. This attribute points this sample back to the sample from which it was derived. In this case, the values have been filled in for you.

5. Next, look again at the submission XML:

      $ cat 'submission.xml'

6. You now have the three essential components of a programmatic submission:
   - An account name and password
   - An object to submit, in XML format
   - An instruction, in XML format

7. Enter the submission command below, editing it as you go to include your USERNAME, PASSWORD and the file names (marked TODO). Note that the '@' symbols are required.

      $ curl -u USERNAME:PASSWORD -F "SUBMISSION=@TODO.xml" -F "SAMPLE=@TODO.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"

8. You will see an XML document printed to your terminal. In the 'RECEIPT' tag, find the 'success' attribute, which will read true if your submission succeeded.

9. Find the two sample accessions (SAMEAxxxxx and ERSxxxxxx). Be careful not to confuse these with the submission accession (ERAxxxxxx).

10. Also in the submission receipt, note the 'INFO' tags. One of these should indicate that your submission is a test and will be discarded within 24 hours.

THE BINNED METAGENOME
_____________________

The final step is to register the assembly data produced by assembling and binning your raw batter reads.
This will be done using the Webin-CLI program.

1. First, let's look at the FASTA file you'll be submitting. There is one FASTA file available to you:
   - binned_metagenome.fasta.gz

   Ensure you are in the correct directory:

      $ cd ~/ENA/

   Take a look at the contents of the file using the 'zcat' command:

      $ zcat binned_metagenome.fasta.gz | head

2. You may need to rerun the command below to make sure Webin-CLI is easily available to you:

      $ alias webin-cli='java -jar ~/ENA/webin-cli-2.1.0.jar'

   You can check this at any time by running:

      $ webin-cli -help

3. Next, view the manifest file. This file performs three functions:
   - Indicates the names of the files to submit
   - Describes experimental metadata
   - States the study and sample the data belong to

   From the directory containing the files, view the manifest now:

      $ cat binned_metagenome_manifest.txt

4. The manifest file is a list of tag-value pairs. Look carefully over the information entered here. Study and sample accessions have been provided for you, but you would normally use your own accessions. If you are confident with Linux, feel free to edit the manifest to use the accessions you registered earlier.

5. Now that you have everything you need to submit your sequence data, it's time to run Webin-CLI to check that everything is valid. Again, fill in the three incomplete sections (marked TODO):

      $ webin-cli -context genome -manifest TODO.txt -userName "Webin-TODO" -password TODO -test -validate

6. Finally, it's time to submit the sequence file. Complete the following command and run it:

      $ webin-cli -context genome -manifest TODO.txt -userName "Webin-TODO" -password TODO -test -submit

7. The program will notify you as it validates your submission and uploads the FASTA file to ENA. You will then be given an accession for your analysis object (ERZxxxxxx). Note this down; your submission is then complete.

8. Return to the interactive Webin interface and view the sample and analysis tabs to see the objects you submitted today:
   https://wwwdev.ebi.ac.uk/ena/submit/sra/#home
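If you repeat these steps with your own data, two small shell functions can help you check your work. Both are hypothetical sketches written for this handout, not part of curl, Webin-CLI or any ENA tool. 'summarise_receipt' reads a receipt saved from the sample submission (for example by adding '> receipt.xml' to the end of the curl command) and prints the fields checked in steps 8 to 10 of the sample section. It assumes 'success' is an attribute of the RECEIPT tag and relies on the accession prefixes named there. 'preflight_check' runs a few local checks on a manifest before Webin-CLI validation. It assumes the mandatory fields for this genome-context FASTA submission are STUDY, SAMPLE, ASSEMBLYNAME, ASSEMBLY_TYPE, COVERAGE, PROGRAM, PLATFORM and FASTA, and that the FASTA path is relative to the directory you run it from. Check the ENA documentation linked at the top of these exercises for the authoritative field list.

```shell
# Hypothetical helper functions, not part of the ENA tools.
# Paste them into your terminal to define them for the current session.

# summarise_receipt FILE
# Summarises a receipt saved from the curl sample submission.
summarise_receipt() {
  local receipt="$1" success
  # Assumed: 'success' is an attribute of the RECEIPT tag ("true" or "false")
  success=$(grep -o 'success="[a-z]*"' "$receipt" | head -n 1 | cut -d'"' -f2)
  echo "success: ${success:-unknown}"
  # Accession prefixes as named in step 9 of the sample section
  echo "ENA sample: $(grep -oE 'ERS[0-9]+' "$receipt" | head -n 1)"
  echo "BioSample: $(grep -oE 'SAMEA[0-9]+' "$receipt" | head -n 1)"
  echo "submission: $(grep -oE 'ERA[0-9]+' "$receipt" | head -n 1)"
  # INFO messages, e.g. the notice that a test submission will be discarded
  grep -o '<INFO>[^<]*</INFO>' "$receipt" | sed -e 's/<[^>]*>//g' -e 's/^/info: /'
  # Exit status 0 only if the receipt reports success="true"
  [ "$success" = "true" ]
}

# preflight_check MANIFEST
# Quick local checks on a Webin-CLI genome manifest before running -validate.
preflight_check() {
  local manifest="$1" status=0 field fasta
  # Assumed mandatory fields for this submission (see the ENA documentation)
  for field in STUDY SAMPLE ASSEMBLYNAME ASSEMBLY_TYPE COVERAGE PROGRAM PLATFORM FASTA; do
    if ! grep -qiE "^${field}[[:space:]]" "$manifest"; then
      echo "missing field: $field"
      status=1
    fi
  done
  # FASTA path: second column of the FASTA line, relative to the current directory
  fasta=$(grep -iE '^FASTA[[:space:]]' "$manifest" | head -n 1 | awk '{print $2}')
  if [ -n "$fasta" ]; then
    if gzip -t "$fasta" 2>/dev/null; then
      # Each FASTA record starts with a '>' header line
      echo "sequences in $fasta: $(gzip -dc "$fasta" | grep -c '^>')"
    else
      echo "not a readable gzip file: $fasta"
      status=1
    fi
  fi
  return "$status"
}
```

Once the functions are defined, from ~/ENA you could run, for example:

      $ summarise_receipt receipt.xml
      $ preflight_check binned_metagenome_manifest.txt

These helpers only catch simple mistakes early; the Webin-CLI -validate step is still the check that decides whether a submission is accepted.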