Data Sets

These are the data sets needed during the workshop.

Download Files

You need an R function to download files from the internet. You can use either download.file from the package utils or curl_download from the package curl. You only need one; the choice is yours.

## Your current (working) directory
getwd()
## Change working directory (if needed)
# setwd("my/folder")
## Create new working directory (if needed)
# dir.create("MDA"); setwd("MDA")
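
When the download steps are re-run, it can be convenient to skip files that are already present. The following is a minimal sketch and not part of the workshop material: fetch_file is a hypothetical helper name, and mode = "wb" is set on the assumption that binary files such as .RData and .zip should be written in binary mode (this matters on Windows).

```r
## Hypothetical helper (not part of the workshop scripts):
## download a file only if it is not already present.
fetch_file <- function(url, destfile = basename(url), overwrite = FALSE) {
  if (file.exists(destfile) && !overwrite) {
    message("Skipping download, file exists: ", destfile)
    return(invisible(destfile))
  }
  ## mode = "wb" keeps binary files (.RData, .zip) intact on Windows
  status <- utils::download.file(url, destfile = destfile, mode = "wb")
  if (status != 0) stop("Download failed: ", url)
  invisible(destfile)
}
```

Called twice with the same destination, the helper downloads on the first call and only prints a message on the second, unless overwrite = TRUE is given.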

An Apple A Day

Wassermann, Müller and Berg (2019) An Apple a Day: Which Bacteria Do We Eat With Organic and Conventional Apples? Front. Microbiol. 10:1629.

## Create new working directory 
dir.create("Apple"); setwd("Apple")

## Download data (R data image file)
apple.url <- "https://www.gdc-docs.ethz.ch/MDA/data/Wassermann_2019.RData"
utils::download.file(apple.url, destfile = "Wassermann_2019.RData")

## Alternative download (curl has more and better options)
#  curl::curl_download(apple.url, "Wassermann_2019.RData")

## Verify loaded data
load("Wassermann_2019.RData")
apple <- d
apple
# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 3128 taxa and 48 samples ]
# sample_data() Sample Data:       [ 48 samples by 7 sample variables ]
# tax_table()   Taxonomy Table:    [ 3128 taxa by 7 taxonomic ranks ]
# phy_tree()    Phylogenetic Tree: [ 3128 tips and 3127 internal nodes ]
setwd("../")
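
Because load() restores objects under the names they were saved with, it silently overwrites any object of the same name in the workspace. A possible way to inspect an .RData file first is to load it into a separate environment. This is only a sketch; peek_rdata is a hypothetical helper name, not a function from the workshop.

```r
## Hypothetical helper: load an .RData file into its own environment
## so that nothing in the global workspace is overwritten.
peek_rdata <- function(file) {
  env <- new.env()
  loaded <- load(file, envir = env)   # load() returns the object names
  message("Objects in ", file, ": ", paste(loaded, collapse = ", "))
  env
}

## Usage (assuming the file has been downloaded as described above):
# e <- peek_rdata("Wassermann_2019.RData")
# apple <- e$d
```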

Food-Specific Bacterial Communities

Chaillou et al. (2015) Origin and ecological selection of core and food-specific bacterial communities associated with meat and seafood spoilage. ISME J. 9:1105-1118.

## Create new working directory
dir.create("Food"); setwd("Food")

## Download data (zip archive) and extract it
food.url <- "https://www.gdc-docs.ethz.ch/MDA/data/Chaillou2015.zip"
utils::download.file(food.url, destfile = "Chaillou2015.zip")
unzip("Chaillou2015.zip")
file.remove("Chaillou2015.zip")
list.files()
# "chaillou.biom"
# "Chaillou2015b.pdf"
# "otu_table.tsv"
# "sample_data.tsv"
# "sequences.fasta"
# "tax_table.tsv"    
# "tree.nwk"  
setwd("../")
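
Before moving on, it may help to confirm that the archive produced the files used in the later import steps. The sketch below assumes the file names shown in the listing above; missing_files is a hypothetical helper name.

```r
## Files the later steps rely on (names taken from the listing above)
expected <- c("chaillou.biom", "otu_table.tsv", "sample_data.tsv",
              "sequences.fasta", "tax_table.tsv", "tree.nwk")

## Hypothetical helper: return the expected files not found in a directory
missing_files <- function(dir, expected) {
  setdiff(expected, list.files(dir))
}

## Usage, from the parent of the "Food" directory:
# missing_files("Food", expected)   # empty if nothing is missing
```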

Let us check that we have what we need.

## Function twee - a plain text listing of directories
## similar to tree in Linux
dir.create("Scripts"); setwd("Scripts")
twee.url <- "https://www.gdc-docs.ethz.ch/MDA/scripts/twee.R"
utils::download.file(twee.url, destfile = "twee.R")
source("twee.R")
setwd("../")

## Show directories
twee("MDA/")

If everything worked, you should see the following directories and files:

MDA/
├── Apple
│   └── Wassermann_2019.RData
└── Food
    ├── Chaillou2015b.pdf
    ├── chaillou.biom
    ├── otu_table.tsv
    ├── sample_data.tsv
    ├── sequences.fasta
    ├── tax_table.tsv
    └── tree.nwk
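
If twee.R cannot be downloaded, base R can give a flat version of the same overview. This is a sketch of an alternative, not the twee function itself; list_tree is a hypothetical helper name.

```r
## Hypothetical helper: relative paths of all files below `dir`,
## a flat base R substitute for the tree-style listing.
list_tree <- function(dir = ".") {
  list.files(dir, recursive = TRUE)
}

## Usage, from the parent of the "MDA" directory:
# list_tree("MDA")
```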

Load and Adjust Food Data

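## Import from biom (requires the phyloseq package;
## run from within the "Food" directory, where the files were extracted)
library(phyloseq)
biomfile <- "chaillou.biom"
treefile <- "tree.nwk"
food     <- import_biom(biomfile, treefile, parseFunction = parse_taxonomy_greengenes)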

## Verify Import
food
# otu_table()   OTU Table:         [ 508 taxa and 64 samples ]
# sample_data() Sample Data:       [ 64 samples by 3 sample variables ]
# tax_table()   Taxonomy Table:    [ 508 taxa by 7 taxonomic ranks ]
# phy_tree()    Phylogenetic Tree: [ 508 tips and 507 internal nodes ]

## Summary
microbiome::summarize_phyloseq(food)

## Variables
sample_data(food)

## French to English
dictionary <- c("BoeufHache"      = "Ground_Beef",
                "VeauHache"       = "Ground_Veal",
                "MerguezVolaille" = "Poultry_Sausage",
                "DesLardons"      = "Bacon_Dice",
                "SaumonFume"      = "Smoked_Salmon",
                "FiletSaumon"     = "Salmon_Fillet",
                "FiletCabillaud"  = "Cod_Fillet",
                "Crevette"        = "Shrimp")
## as.character() ensures lookup by name, even if EnvType is stored as a factor
env_type <- as.character(sample_data(food)$EnvType)
sample_data(food)$EnvType <- factor(unname(dictionary[env_type]), levels = unname(dictionary))

## Add Sample ID
sample_data(food)$SID <- sample_names(food)

## Save / Load
# save.image("food.Rdata")
# load("food.Rdata")
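
The translation step relies on looking up values in a named vector, which returns NA for any label without an entry. The sketch below shows the same lookup with an explicit check for untranslated labels. It is an illustration under assumptions: recode_labels is a hypothetical helper name, and the example uses only two of the labels from the dictionary above.

```r
## Hypothetical helper: translate labels with a named vector and
## stop if any label has no translation.
recode_labels <- function(x, dictionary) {
  x <- as.character(x)   # a factor would index by integer code, not by name
  unmapped <- setdiff(unique(x), names(dictionary))
  if (length(unmapped) > 0) {
    stop("No translation for: ", paste(unmapped, collapse = ", "))
  }
  factor(unname(dictionary[x]), levels = unname(dictionary))
}

## Small example with two of the labels from the dictionary above
dict <- c("BoeufHache" = "Ground_Beef", "Crevette" = "Shrimp")
recode_labels(factor(c("Crevette", "BoeufHache", "Crevette")), dict)
```

Keeping levels in dictionary order means plots and tables list the food types in the order in which they were defined rather than alphabetically.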