===============
DATA SUBMISSION
===============

In these exercises you will familiarise yourself with the steps taken to submit
a binned metagenome.

Please use the same account you used for the previous ENA submission exercise.

Some of these exercises require use of command-line programs. Where there is a
command to run, it will be provided for you mostly complete, indicated by an
indented line beginning with a dollar ('$') sign. Do not enter the $, just the
text following it. Some commands have been left incomplete for you to fill in,
so read them carefully before you enter them.

Feel free to ask for help if anything is unclear!

If you need to review anything that was covered in the presentation regarding
metagenome assembly submission, you can follow the documentation here:

	https://ena-docs.readthedocs.io/en/latest/assembly/metagenome.html


THE SCENARIO
____________

To learn more about the metagenome you sequenced previously, you have assembled
and binned the raw reads. This produced a set of contigs that you have
identified as originating from a species of the genus Aeromonas.

The result is a fasta file containing an assembly with a completeness score of
91.12% and a contamination score of 7.80%.

THE STUDY
_________

A study has already been registered. Further data related to the previously
submitted reads should be submitted to the same study, so there is no need to
register a new one.


THE SAMPLE
__________

The first step is to register your binned sample. This will provide context as to
how your assembly was derived and will describe measures of its quality.

You should submit the sample through the programmatic route.

1. If you closed your terminal window after the first exercise, re-open it now,
   connect to the server, and return to the ENA directory:

    $ ssh student<??>@gdcsrv2.ethz.ch # replace ?? with your student number
    $ cd ~/ENA

2. Start by looking at the sample XML we have prepared for you in the file:

    $ cat binned_metagenome_sample.xml

3. Have a look at the taxonomy that is used to describe the sample.

   "uncultured Aeromonas sp." is an 'uncultured' taxon, a type of taxonomy used
   specifically for samples derived from an environmental source. In this case
   it describes a bacterium of the genus Aeromonas.

4. Note the sample description field. It is good practice to describe your
   binned metagenome or MAG sample explicitly, making clear that it is not a
   physical, biological sample but instead represents the bacteria from the
   environment that was sampled.

   It is also good practice to reference either the original raw reads or
   environmental sample from within the description to give your binned sample
   as much context as possible.

   Note the 'sample derived from' field. This attribute points this sample
   back to the sample from which it was derived.

   In this case, the values have been filled in for you.

5. Next, look again at the submission XML:

    $ cat 'submission.xml'
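
   If you want to confirm which action the submission XML requests, you can
   search it for the action element. This is a minimal sketch, assuming the
   file follows the usual ENA layout with an ADD action inside ACTIONS. The
   mock file example_submission.xml is a stand-in written only for this
   illustration; it is not the submission.xml prepared for you.

```shell
# Mock submission XML for illustration only (assumed layout, hypothetical file name).
cat > example_submission.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<SUBMISSION>
  <ACTIONS>
    <ACTION>
      <ADD/>
    </ACTION>
  </ACTIONS>
</SUBMISSION>
EOF

# Print the name of the requested action by matching the known action tags.
action=$(grep -oE '<(ADD|MODIFY|VALIDATE)' example_submission.xml | tr -d '<')
echo "Requested action: $action"
```

   On the real file, replace example_submission.xml with submission.xml and
   skip the step that writes the mock file.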

6. You now have the three essential components of a programmatic submission:
   - An account name and password
   - An object to submit in XML format
   - An instruction in XML format

7. Enter the submission command below, editing it as you go to include the
   USERNAME, PASSWORD and file names (TODO). Note that the '@' symbols are
   required.

    $ curl -u USERNAME:PASSWORD -F "SUBMISSION=@TODO.xml" -F "SAMPLE=@TODO.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"


8. You will see an XML document printed to your terminal. In the 'RECEIPT' tag,
   find the 'success' entry, which will read true if your submission was a
   success.

9. Find the two sample accessions (SAMEAxxxxx and ERSxxxxxx). Be careful not
   to confuse these with the submission accession (ERAxxxxxx).

10. Also in the submission receipt, note the 'INFO' tags. One of these should
    indicate that your submission is a test, to be discarded in 24 hours.
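
    The checks in steps 8 to 10 can also be done with grep once the receipt is
    saved to a file. The sketch below is illustrative only: it writes a small
    mock receipt to example_receipt.xml first, so the file name, the layout and
    every accession in it are hypothetical placeholders rather than real ENA
    output. For a real submission you could append '> receipt.xml' to the curl
    command and run the grep lines on that file instead.

```shell
# Mock receipt for illustration only: layout and accessions are placeholders.
cat > example_receipt.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<RECEIPT receiptDate="2024-01-01T00:00:00.000Z" success="true">
  <SAMPLE accession="ERS0000001" alias="example_bin" status="PRIVATE">
    <EXT_ID accession="SAMEA0000001" type="biosample"/>
  </SAMPLE>
  <SUBMISSION accession="ERA0000001" alias="example_submission"/>
  <MESSAGES>
    <INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
  </MESSAGES>
</RECEIPT>
EOF

# Step 8: extract the success attribute from the RECEIPT tag.
success=$(grep -o 'success="[a-z]*"' example_receipt.xml)
echo "$success"

# Step 9: list the two sample accessions, leaving out the ERA submission accession.
sample_accessions=$(grep -oE '(ERS|SAMEA)[0-9]+' example_receipt.xml)
echo "$sample_accessions"

# Step 10: show the INFO messages, including the test-submission notice.
info_lines=$(grep '<INFO>' example_receipt.xml)
echo "$info_lines"
```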


THE BINNED METAGENOME
_____________________

The final step is to register the assembly data which was produced by
assembling and binning your raw reads. This will be done using the Webin-CLI
program.

1. Firstly, let's look at the FASTA file you'll be submitting. There is one FASTA
   file available to you:
   - binned_metagenome.fasta.gz
   Ensure you are in the correct directory:

    $ cd ~/ENA/


   Take a look at the contents of the file using the 'zcat' command:

    $ zcat binned_metagenome.fasta.gz | head
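
   Beyond viewing the first lines, it can be useful to count the contigs and
   bases in the file before submitting. This is a minimal sketch that builds a
   tiny gzipped FASTA called example_bin.fasta.gz so it can be run anywhere;
   that file name and its two sequences are made up for illustration. It uses
   'gzip -dc', which does the same job as 'zcat' here.

```shell
# Build a tiny gzipped FASTA for illustration; name and sequences are made up.
printf '>contig_1\nACGTACGT\n>contig_2\nGGCCTTAA\n' | gzip > example_bin.fasta.gz

# Count the sequences: every FASTA record starts with a '>' header line.
n_contigs=$(gzip -dc example_bin.fasta.gz | grep -c '^>')
echo "Contigs: $n_contigs"

# Count the bases: drop header lines, strip newlines, count what is left.
n_bases=$(gzip -dc example_bin.fasta.gz | grep -v '^>' | tr -d '\n' | wc -c)
echo "Bases: $n_bases"
```

   To run the same counts on the exercise data, use binned_metagenome.fasta.gz
   in place of example_bin.fasta.gz and skip the printf line.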


2. You may need to rerun the below command to make sure Webin-CLI is easily
   available to you:

    $ alias webin-cli='java -jar ~/ENA/webin-cli-2.1.0.jar'


   You can always check by running the command:

    $ webin-cli -help


3. Next, view the manifest file. This file performs three functions:
   - Indicates the names of files to submit
   - Describes experimental metadata
   - States the study and sample the data belong to
   Navigate to the directory containing the files and view the file now:

    $ cat binned_metagenome_manifest.txt


4. The manifest file is a list of tag-value pairs. Look carefully over the
   information entered here. Study and sample accessions have been provided
   but you would normally use your own accessions.

   If you are confident with Linux, feel free to edit the manifest to use the
   accessions you registered earlier.
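
   One way to make that edit without opening an editor is with sed. This is a
   sketch under stated assumptions: the mock manifest below and all of its
   values are placeholders, not the contents of the file prepared for you, and
   ERS0000001 stands in for the sample accession from your own receipt.

```shell
# Mock manifest for illustration only; every value is a placeholder.
cat > example_manifest.txt <<'EOF'
STUDY   PRJEB00000
SAMPLE   ERS0000000
ASSEMBLYNAME   example_bin
ASSEMBLY_TYPE   binned metagenome
FASTA   binned_metagenome.fasta.gz
EOF

# Hypothetical accession: use the ERS accession from your own receipt.
my_sample="ERS0000001"

# Keep a backup, then rewrite only the line that starts with the SAMPLE tag.
cp example_manifest.txt example_manifest.txt.bak
sed "s/^SAMPLE[[:space:]].*/SAMPLE   ${my_sample}/" example_manifest.txt.bak > example_manifest.txt

# Confirm the change.
grep '^SAMPLE' example_manifest.txt
```

   Writing to a new file from a backup avoids 'sed -i', whose syntax differs
   between systems, and leaves you a copy to restore if the edit goes wrong.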

5. Now that you have everything you need to submit your sequence data, run the
   Webin-CLI command with the -validate option to check that everything is
   valid. Again, fill in the three incomplete parts (TODO):

    $ webin-cli -context genome -manifest TODO.txt -userName "Webin-TODO" -password TODO -test -validate


6. Finally, it's time to submit the sequence files, so complete the following
   command and run it:

    $ webin-cli -context genome -manifest TODO.txt -userName "Webin-TODO" -password TODO -test -submit


7. The program will notify you as it validates your submission and uploads the
   FASTA file to ENA. You will then be provided with an accession for your
   analysis object (ERZxxxxxx). Note this down; your submission is then
   complete.

8. Return to the interactive Webin interface and view the sample and analysis tabs
   to see the objects you submitted today:
   https://wwwdev.ebi.ac.uk/ena/submit/sra/#home