=============== DATA SUBMISSION ===============

As we have discussed, there are three routes of submission:

  - Interactive
  - Programmatic
  - Webin-CLI

In these exercises you will use all three in turn to submit an RNA-Seq dataset to the ENA's test submission service.

Choose an account to use for these exercises by putting your name next to it in the list available at this address:

https://docs.google.com/document/d/1D4PUsTa2zZOs19iFzHhsU05QOTFqD9cDDxqyKrbiMls/edit

Make a note of your account ID. The password for all accounts is 'ws2017'.

Some of these exercises require the use of command-line programs. Where there is a command to run, it will be provided for you mostly complete, indicated by an indented line beginning with a dollar ('$') sign. Do not enter the $, just the text following it. Some commands have been left incomplete for you to fill in, so read them carefully before you enter them. Feel free to ask for help if anything is unclear!

If you need to review how the metadata model works, you can find an overview of it here:

https://ena-docs.readthedocs.io/en/latest/general-guide/metadata.html


THE SCENARIO
____________

Interested in learning more about food metatranscriptomes, you have sampled cake batter from a bakery in Zurich, Switzerland. You have prepared a shotgun library for paired-end sequencing on an Illumina MiSeq. The result is a set of paired FASTQ files.


THE STUDY
_________

To begin, you should register a study object using the interactive submission service. Recall that a study describes the purpose of the work you have done, groups other objects beneath it, and controls when the data becomes public.

1. Log in to the Webin test submission service with your account ID and the password 'ws2017':

   https://wwwdev.ebi.ac.uk/ena/submit/sra/#home

   Please use the above link to ensure you are using the TEST server.

2. Click the 'New Submission' tab and select the 'Register study (project)' radio button. Click 'Next'.

3. Fill in the 'Short Name' field with something brief and meaningful, as you will need it later. Use the phrase 'cake_study' along with the date, e.g. 'cake_study_24_01_2020'.

4. You should take time to provide a descriptive title and informative abstract for your own studies, but these can be edited later if needed. For now, "Metatranscriptome sequencing of cake batter from Zurich, Switzerland" will be an appropriate title.

5. When you have completed all required fields, click 'Submit' and then confirm.

6. Now navigate to the 'Studies' tab to see the study you just registered. You might need to refresh the page! Make a note of its accession numbers (ERPxxxxxx and PRJEBxxxxxx).


THE SAMPLE
__________

The next step is to register the sample, which will give users essential context for the sequence data you are submitting. You will submit the sample through the programmatic route.

1. Start by opening a terminal window and use the following command to log in to the server:

      $ ssh student??@gdcsrv2.ethz.ch    # replace ?? with your student number

   Enter your password when prompted.

2. Move to the folder containing the files using the 'cd' command:

      $ cd ~/ENA

   List the folder contents with the 'ls' command:

      $ ls -l

   Take a look at the sample XML we have prepared for you in the file batter_sample.xml:

      $ cat batter_sample.xml

   Look at the different attributes it contains. Do you feel this is a well-annotated sample?

3. Next, look at the submission XML:

      $ cat submission.xml

   What action is specified for this submission?

4. You now have the three essential components of a programmatic submission:

   - An account name and password
   - An object to submit in XML format
   - An instruction in XML format

5. Enter the submission command below, editing it as you go to fill in USERNAME, PASSWORD and the file names (TODO). Note that the '@' symbols are required.

      $ curl -u USERNAME:PASSWORD -F "SAMPLE=@TODO.xml" -F "SUBMISSION=@TODO.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"

6. You will see an XML document (the submission receipt) printed to your terminal. In the 'RECEIPT' tag, find the 'success' attribute, which will read 'true' if your submission succeeded.

7. Find the two sample accessions (SAMEAxxxxx and ERSxxxxxx). Be careful not to confuse these with the submission accession (ERAxxxxxx).

8. Also in the submission receipt, note the 'INFO' tags. One of these should indicate that your submission is a test and will be discarded in 24 hours.


THE READ DATA
_____________

The final step is to register the read data produced by sequencing the RNA extracted from the batter. This will be done using the Webin-CLI program.

1. Note the Webin-CLI .jar file in the ~/ENA folder; this single file is all you need to run the program. Enter the following command in your terminal, making sure the .jar file name matches the one shown by 'ls -l':

      $ alias webin-cli='java -jar ~/ENA/webin-cli-2.1.0.jar'

2. This makes the program available to run simply by typing 'webin-cli'. Try it now with the help option:

      $ webin-cli -help

   This should output the list of options that can be used with this program. If you close your terminal, you will need to rerun the alias command above before you can use webin-cli this way again.

3. Now that you have set up the program, let's look at the FASTQ files you'll be submitting. There are two FASTQ files available to you:

   - batter_1.fastq.gz
   - batter_2.fastq.gz

   Ensure you are in the correct directory:

      $ cd ~/ENA/

   Take a look at the contents of one of the files using the 'zcat' command:

      $ zcat batter_1.fastq.gz | head

4. Next, view the manifest file. This file performs three functions:

   - Indicates the names of the files to submit
   - Describes the experimental metadata
   - States the study and sample the data belong to

   From the same directory, view the file now:

      $ cat batter_manifest.txt

5. The manifest file is a list of tag-value pairs. Look carefully over the information entered here. Study and sample accessions have been provided for you, but you would normally use your own accessions.

6. Now that you have everything you need to submit your sequence data, it's time to check whether your submission is valid. Use the command below, filling in the three incomplete 'TODO' sections:

      $ webin-cli -context reads -manifest TODO.txt -userName "Webin-TODO" -password TODO -test -validate

7. Finally, it's time to submit the sequence files, so complete the following command and run it:

      $ webin-cli -context reads -manifest TODO.txt -userName "Webin-TODO" -password TODO -test -submit

8. The program will notify you as it successfully validates the files and uploads them to ENA. You will then be given accessions for your experiment and run objects. Note these down, and your submission is complete.

9. Return to the interactive Webin interface and view the sample and run tabs to see the objects you submitted today:

   https://wwwdev.ebi.ac.uk/ena/submit/sra/#home
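Webin-CLI's -validate step is the authoritative check on your reads. As an optional extra, the two paired FASTQ files looked at earlier (batter_1.fastq.gz and batter_2.fastq.gz) can be sanity-checked locally to confirm that both mates contain the same number of reads. The sketch below assumes GNU zcat and standard four-line FASTQ records (no wrapped sequence lines); the function names count_reads and check_pair are hypothetical helpers, not part of Webin-CLI.

```shell
#!/usr/bin/env bash
# Optional local sanity check for paired FASTQ files (hypothetical helpers,
# not part of Webin-CLI). Assumes each read is exactly four lines.

# Print the number of reads in a gzipped FASTQ file.
count_reads() {
  local lines
  lines=$(zcat -- "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Print the read counts of two mate files; return 1 if they differ.
check_pair() {
  local r1 r2
  r1=$(count_reads "$1")
  r2=$(count_reads "$2")
  echo "$1: $r1 reads"
  echo "$2: $r2 reads"
  if [ "$r1" -ne "$r2" ]; then
    echo "Read counts differ; check the files before submitting." >&2
    return 1
  fi
}
```

After pasting the two functions into your terminal, run the check from ~/ENA:

      $ check_pair batter_1.fastq.gz batter_2.fastq.gz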