EdenMicrobes

The goal of EdenMicrobes is to document and track the progress of the datasets and analyses related to microbes at the Eden Project.

Outline

Using the enclosed controlled biomes of the Eden Project botanic garden as mesocosms, we aim to determine the effects of biotic (above ground plant) and abiotic (soil physicochemistry, microclimate) controls on soil microbial community (SMC) composition, after accounting for spatial scale effects.

  • Soil Sampling Methods

Environmental DNA (eDNA) was collected, extracted and sequenced from 128 soil samples taken across the site in October 2019. Samples came from two temperature-controlled "biomes" and covered 10 distinct "ecosystems" (plant assemblage habitats), each characterised by either high or low plant diversity (see Table 1).

Table 1: Sample design

| Biome | Habitat | Plant diversity | Plots | Samples | Extractions | PCRs |
|---|---|---|---|---|---|---|
| Mediterranean | South Africa | High | 4 | 16 | 32 | 96 |
| Mediterranean | Australia | High | 4 | 16 | 32 | 96 |
| Mediterranean | Citrus | Low | 2 | 8 | 16 | 48 |
| Mediterranean | Vines | Low | 2 | 8 | 16 | 48 |
| Mediterranean | Med Basin | High | 4 | 16 | 32 | 96 |
| Rainforest | Bamboo | Low | 2 | 8 | 16 | 48 |
| Rainforest | Cocoa | Low | 2 | 8 | 16 | 48 |
| Rainforest | Malaysia | High | 4 | 16 | 32 | 96 |
| Rainforest | West Africa | High | 4 | 16 | 32 | 96 |
| Rainforest | Amazon | High | 4 | 16 | 32 | 96 |
| Totals | 10 | 2 | 32 | 128 | 256 | 768 |
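The totals in Table 1 follow from the design itself: four soil samples per plot, two extractions per sample and three PCRs per extraction. The short Python sketch below recomputes them as a consistency check. The constant and function names are illustrative only and are not part of the project's scripts.

```python
# Sampling design from Table 1: (biome, habitat, plant diversity, plots).
DESIGN = [
    ("Mediterranean", "South Africa", "High", 4),
    ("Mediterranean", "Australia", "High", 4),
    ("Mediterranean", "Citrus", "Low", 2),
    ("Mediterranean", "Vines", "Low", 2),
    ("Mediterranean", "Med Basin", "High", 4),
    ("Rainforest", "Bamboo", "Low", 2),
    ("Rainforest", "Cocoa", "Low", 2),
    ("Rainforest", "Malaysia", "High", 4),
    ("Rainforest", "West Africa", "High", 4),
    ("Rainforest", "Amazon", "High", 4),
]

SAMPLES_PER_PLOT = 4        # one soil sample per quadrat corner
EXTRACTIONS_PER_SAMPLE = 2  # two eDNA extractions per sample
PCRS_PER_EXTRACTION = 3     # PCRs performed in triplicate


def design_totals(design):
    """Return the plot, sample, extraction and PCR totals implied by the design."""
    plots = sum(row[3] for row in design)
    samples = plots * SAMPLES_PER_PLOT
    extractions = samples * EXTRACTIONS_PER_SAMPLE
    pcrs = extractions * PCRS_PER_EXTRACTION
    return {
        "plots": plots,
        "samples": samples,
        "extractions": extractions,
        "pcrs": pcrs,
    }


print(design_totals(DESIGN))
```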

Figure 1: Sampling INSERT IMAGE

https://github.com/padpadpadpad/EdenMicrobes/blob/main/plots/

At each sample plot, four soil samples were collected using sterile soil augers, one from each corner of a 2 x 2 m quadrat. Any leaf litter was removed from the surface before approximately 200 g of soil was collected from within the first 10 cm of the topsoil, typically representing three auger cores' worth of material. Auger blades were “cleaned” by immersion in soil adjacent to the collection point prior to each sampling event. Blades were changed between sampling of the humid and dry biomes. The soil cores were sealed in sterile plastic bags and transported to the onsite laboratory for immediate processing.

  • Sequencing Methods

Two eDNA extractions were performed from a 15 g subsample of each of the 128 samples following methods developed by Taberlet et al. (2012b) and modified by Zinger et al. (2016). DNA extractions were conducted in the field lab less than 2 hours after collection using a NucleoSpin® Soil kit (Macherey-Nagel, Düren, Germany). Eight negative controls were included for a total of 256 DNA extractions. The last elution step of the DNA extraction protocol was not carried out on site. Instead, the DNA was stored on the column with silica gel and transported back to the EDB lab in Toulouse for subsequent steps.

PCRs were performed in triplicate: each sample was extracted twice and each extract was amplified three times, giving six replicates per sample in total. Each PCR reaction was performed in a total volume of 20 μl and comprised 10 μl of AmpliTaq Gold Master Mix (Life Technologies, Carlsbad, CA, USA), 5.84 μl of Nuclease-Free Ambion Water (Thermo Fisher Scientific, Massachusetts, USA), 0.25 μM of each primer, 3.2 μg of BSA (Roche Diagnostic, Basel, Switzerland), and 2 μl of DNA template that was 10-fold diluted to reduce PCR inhibition. PCRs targeted barcode regions for Eukaryotes, Bacteria and Fungi under the following conditions:

  • the V7 region of the 18S rRNA gene as a diagnostic marker for Eukaryotes, with the following universal primers: forward (5’-TCACAGACCTGTTATTGC-3’) and reverse (5’-TTTGTCTGCTTAATTSCG-3’) (Guardiola et al. 2015). PCR was conducted with 35 cycles of denaturation at 95°C for 30 s, annealing at 45°C for 30 s and elongation at 72°C for 60 s, followed by a final elongation for 7 min.

  • the V5-V6 regions of the 16S rRNA gene as a diagnostic marker for Bacteria, with the following universal primers: forward (5’-GGATTAGATACCCTGGTAGT-3’) and reverse (5’-CACGACACGAGCTGACG-3’) (Fliegerova et al. 2014). PCR was conducted with 30 cycles of denaturation at 95°C for 30 s, annealing at 57°C for 30 s and elongation at 72°C for 90 s, followed by a final elongation for 7 min.

  • the ITS1 region of the nuclear ribosomal RNA genes as a diagnostic marker for Fungi, with the following universal primers: forward (5’-CAAGAGATCCGTTGTTGAAAGTK-3’) and reverse (5’-GGAAGTAAAAGTCGTAACAAGG-3’) (Epp et al. 2012; Taberlet et al. 2018). PCR was conducted with 35 cycles of denaturation at 95°C for 30 s, annealing at 55°C for 30 s and elongation at 72°C for 60 s, followed by a final elongation for 7 min.
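Two of the primers above contain IUPAC degenerate bases (S = C or G, K = G or T), so each stands for more than one concrete oligonucleotide. As an illustration only, the Python sketch below expands a degenerate primer into the explicit sequences it represents. The `PRIMERS` labels and the `expand_primer` helper are hypothetical names, not part of the project's pipeline.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as listed above (5' to 3'); the labels are illustrative.
PRIMERS = {
    "18S_V7_forward": "TCACAGACCTGTTATTGC",
    "18S_V7_reverse": "TTTGTCTGCTTAATTSCG",
    "16S_forward": "GGATTAGATACCCTGGTAGT",
    "16S_reverse": "CACGACACGAGCTGACG",
    "ITS1_forward": "CAAGAGATCCGTTGTTGAAAGTK",
    "ITS1_reverse": "GGAAGTAAAAGTCGTAACAAGG",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    options = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*options)]


for name, seq in PRIMERS.items():
    print(name, expand_primer(seq))
```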

Table 2: PCR Amplification specifications

| | Sper01 | Bact01 | Fung02 | Euka02 |
|---|---|---|---|---|
| Taxa | Plants (Spermatophyta) | Bacteria | Fungi | Eukaryotes |
| Target region | P6 loop of the chloroplastic trnL intron | V3-V4 regions of the 16S rRNA gene | ITS1 region of the nuclear ribosomal RNA genes | V7 region of the 18S rRNA gene |
| Forward sequence | GGGCAATCCTGAGCCAA | GGATTAGATACCCTGGTAGT | CAAGAGATCCGTTGTTGAAAGTK | TCACAGACCTGTTATTGC |
| Reverse sequence | CCATTGAGTCTCTGCACCTATC | CACGACACGAGCTGACG | GGAAGTAAAAGTCGTAACAAGG | TTTGTCTGCTTAATTSCG |
| Reference | Taberlet et al. 2007 | Parada et al. 2016; Apprill et al. 2015 | Epp et al. 2012; Taberlet et al. 2018 | Guardiola et al. 2015 |
| Thermocycling (number of cycles, denaturation, annealing, elongation, final elongation) | [35, 95°C (30s), 50°C (30s), 72°C (60s) - 72°C (7 min)] | [30, 95°C (30s), 57°C (30s), 72°C (90s) - 72°C (7 min)] | [35, 95°C (30s), 55°C (30s), 72°C (60s) - 72°C (7 min)] | [35, 95°C (30s), 45°C (30s), 72°C (60s) - 72°C (7 min)] |
| Sequencing technology | HiSeq | MiSeq | MiSeq | HiSeq |
| Sequence length (l = min, L = max) | l=10, L=220 | l=30, L=400 | l=30, L=900 | l=90, L=200 |
| Taxonomic reference database & threshold | EMBLr141 | SILVAngs v1.3 | UNITE | SILVAngs v1.3 |

Three negative PCR controls per plate were amplified and sequenced in parallel with the regular samples. Three positive controls were also included and consisted of XX TO CONFIRM XX. Six wells per PCR plate were left empty (unused tag combinations) to control for tag jumps, which can occur during amplification and sequencing (see below for downstream data curation). All PCR products were pooled and the library was constructed using the Illumina TruSeq NanoPCRFree kit following the supplier’s instructions (Illumina Inc., San Diego, California, USA). Sequencing was performed on a HiSeq run (Illumina platform, San Diego, CA, USA) at the GeT-PlaGe platform (Toulouse, France).

  • Bioinformatics Methods

Bioinformatic analyses were performed using the GenoToul bioinformatics platform, with sequence reads processed using the OBITOOLS package (Boyer et al. 2016), and initial compilation and filtering of contaminants carried out using R scripts (R Core Team, 2020) following the procedure described in Zinger et al. (2019). To consolidate ASVs, further processing was performed using the University of Exeter RStudio Server...tools from Phyloseq to merge files were deployed..., before DADA2 (..) was used to assign taxonomy from UNITE () and SILVA () datasets.

Bioinformatic analyses were performed on the GenoToul bioinformatics platform (Toulouse, France), with the OBITOOLS package (Boyer et al. 2016). PCR replicates were prepared and processed in three separate libraries, with initial processing run separately for each library prior to merging. First, ‘illuminapairedend’ was used to assemble paired-end reads. This algorithm is based on an exact alignment algorithm that considers the quality scores at all positions during the assembly process. Subsequently, we used the ‘ngsfilter’ command to identify and remove the primers and tags on each read, and to assign reads to their respective samples. This program was used with its default parameters, tolerating two mismatches for each of the two primers and no mismatch for the tags. Following this, sequencing reads were dereplicated using the ‘obiuniq’ command. Sequences of low quality (containing Ns or with paired-end alignment scores below 50) were excluded using the ‘obigrep’ command. The same command was used to exclude sequences represented by only one read (singletons), as they are more likely to be molecular artefacts (Taberlet et al. 2018). Sequences outside of the preset length range were also discarded (90-200 bases for Eukaryotes; 30-400 for Bacteria; 30-900 for Fungi).
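The quality, singleton and length criteria described above can be summarised as a single predicate. The Python sketch below is an illustration of that logic only. It is not the OBITOOLS implementation, and the function name, argument names and the inclusive treatment of the length bounds are assumptions.

```python
# Accepted sequence length range per marker, as stated in the text above.
LENGTH_RANGE = {
    "eukaryotes": (90, 200),
    "bacteria": (30, 400),
    "fungi": (30, 900),
}
MIN_ALIGNMENT_SCORE = 50  # paired-end alignment scores below this are excluded


def keep_sequence(sequence, read_count, alignment_score, marker):
    """Return True if a dereplicated sequence passes all filtering criteria."""
    shortest, longest = LENGTH_RANGE[marker]
    if "N" in sequence.upper():
        return False  # ambiguous bases indicate low quality
    if alignment_score < MIN_ALIGNMENT_SCORE:
        return False  # poor paired-end assembly
    if read_count <= 1:
        return False  # singleton, likely a molecular artefact
    # Assumption: both length bounds are inclusive.
    return shortest <= len(sequence) <= longest


print(keep_sequence("ACGT" * 25, read_count=12, alignment_score=75, marker="eukaryotes"))
```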

Datasets were subsequently filtered to remove contaminants as well as artefacts such as PCR chimeras and remaining sequencing errors, following Zinger et al. (2019) and using the metabaR R package (Zinger et al. 2020), in R version 3.6.1 (R Development Core Team, 2013). The filtering process consisted of three steps. (i) Negative control-based filtering: ASVs whose maximum abundance was found in extraction/PCR negative controls were removed from the dataset, as they were likely to be reagent/aerosol contaminants, which amplify better in the absence of the competing DNA fragments present in biological samples. (ii) Abundance-based filtering: this procedure targets the incorrect assignment of a small number of sequences from true ASVs to the wrong sample, a phenomenon called “tag-switching” (Esling et al. 2015), “tag jumps” (Schnell et al. 2015) or “cross-talk” (Edgar 2018). It consists of setting an ASV's abundance to 0 in samples where it represents < 0.03% of that ASV's total abundance in the entire dataset. (iii) PCR-based filtering: any PCR reaction that yielded fewer than 1000 reads for fungi, bacteria or eukaryotes was considered non-functional and removed from the dataset. The number of reads, ASVs and PCRs removed at each stage for each marker are detailed in Table 3.
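To make the three filtering steps concrete, here is a minimal Python sketch applied to a toy read table. It does not use metabaR's functions. The data layout (`{pcr_id: {asv_id: count}}`), the function name, the use of raw counts rather than relative frequencies in step (i), and the handling of ties are all assumptions made for illustration.

```python
def filter_reads(reads, controls, tagjump_threshold=0.0003, min_reads=1000):
    """Apply control-, abundance- and PCR-based filtering to a read table.

    reads:    {pcr_id: {asv_id: read_count}}
    controls: ids of the PCRs that are extraction/PCR negative controls
    """
    control_ids = set(controls) & set(reads)
    sample_ids = set(reads) - control_ids
    asvs = {asv for counts in reads.values() for asv in counts}

    def max_in(pcr_ids, asv):
        return max((reads[p].get(asv, 0) for p in pcr_ids), default=0)

    # (i) Remove ASVs whose maximum abundance occurs in a negative control.
    # Assumption: on a tie the ASV is kept.
    contaminants = {a for a in asvs if max_in(control_ids, a) > max_in(sample_ids, a)}
    kept = {
        pcr: {a: n for a, n in counts.items() if a not in contaminants}
        for pcr, counts in reads.items()
    }

    # (ii) Zero out counts below 0.03% of the ASV's total abundance (tag jumps).
    totals = {}
    for counts in kept.values():
        for asv, n in counts.items():
            totals[asv] = totals.get(asv, 0) + n
    kept = {
        pcr: {a: (0 if n < tagjump_threshold * totals[a] else n) for a, n in counts.items()}
        for pcr, counts in kept.items()
    }

    # (iii) Drop PCRs that yielded fewer than min_reads reads in total.
    return {pcr: counts for pcr, counts in kept.items() if sum(counts.values()) >= min_reads}


toy = {
    "s1": {"A": 5000, "B": 1, "C": 10},
    "s2": {"A": 3000, "B": 4999, "C": 5},
    "s3": {"A": 500, "C": 2},
    "neg": {"C": 50, "A": 2},
}
print(filter_reads(toy, controls={"neg"}))
```

In this toy table, ASV "C" is removed as a contaminant, the single read of "B" in "s1" is zeroed as a likely tag jump, and "s3" is dropped for having too few reads. The negative control is also dropped in step (iii), simply because it holds too few reads.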

Following initial curation, the three separate libraries were merged to create a single phyloseq object per marker, containing all of the ASV x PCR read counts. Taxonomy was then assigned using the DADA2 pipeline (), from the SILVA taxonomic database for Bacteria and Eukaryotes (version 1.3; release 132; Quast et al., 2012) and the UNITE database () for Fungi....

A further round of curation was then performed: (i) PCRs with a read count below a set threshold XXXX were removed; (ii) using the DADA2 taxonomic assignment, with a bootstrapping score set at 50, ASVs were removed from the dataset if they could not be assigned to the Phylum level.
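The phylum-level criterion can be expressed as a simple check on each ASV's assignment. The Python sketch below is illustrative only and does not call DADA2. It assumes each ASV carries a mapping from rank to assigned name and a matching mapping from rank to bootstrap score, and the rank label "Phylum" and both function names are hypothetical.

```python
MIN_BOOTSTRAP = 50  # bootstrap score threshold stated in the text


def assigned_to_phylum(taxonomy, bootstraps, min_bootstrap=MIN_BOOTSTRAP):
    """Return True if an ASV has a phylum assignment with sufficient support.

    taxonomy:   {rank: assigned name or None}
    bootstraps: {rank: bootstrap score between 0 and 100}
    """
    phylum = taxonomy.get("Phylum")
    return phylum is not None and bootstraps.get("Phylum", 0) >= min_bootstrap


def drop_unassigned(asv_table):
    """Keep only ASVs that pass the phylum-level criterion.

    asv_table: {asv_id: (taxonomy, bootstraps)}
    """
    return {
        asv: (tax, boot)
        for asv, (tax, boot) in asv_table.items()
        if assigned_to_phylum(tax, boot)
    }
```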

Table 3: ASV number and total readcount evolution following bioinformatic curation and contaminant removal

| Marker-replicate | paired (ASV) | paired (Reads) | ngs_filtered (ASV) | ngs_filtered (Reads) | curated (ASV) | curated (Reads) | uniq (ASV) | uniq (Reads) | no_singletons (ASV) | no_singletons (Reads) | PCRs above threshold (>1000) (ASV) | Extraction Contaminant Removal (ASV) | PCR Contaminant Removal (ASV) | Sequencing Contaminant Removal (ASV) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ITS1-Rep1 | 3205177 | 3205177 | 2255445 | 2255445 | 2243828 | 2243828 | 469368 | 2243828 | 76499 | 1850959 | 176 | 14 | 51 | 12 |
| ITS1-Rep2 | 2754254 | 2754254 | 1932598 | 1932598 | 1919187 | 1919187 | 416437 | 1919187 | 63812 | 1566562 | 128 | 21 | 3 | 20 |
| ITS1-Rep3 | 4054254 | 4054254 | 2800343 | 2800343 | 2786030 | 2786030 | 582040 | 2786030 | 94062 | 2298052 | 244 | 15 | 88 | 21 |
| 16S-Rep1 | 4539219 | 4539219 | 2697244 | 2697244 | 2682047 | 2682047 | 1504427 | 2682047 | 139331 | 1316951 | 237 | 287 | 71 | 244 |
| 16S-Rep2 | 4109968 | 4109968 | 2421281 | 2421281 | 2407686 | 2407686 | 1370039 | 2407686 | 125006 | 1162653 | 231 | 260 | 38 | 229 |
| 16S-Rep3 | 8175582 | 8175582 | 4813103 | 4813103 | 4786542 | 4786542 | 2575508 | 4786542 | 246502 | 2457536 | 244 | 618 | 109 | 470 |
| 18S-Rep1 | 16837584 | 16837584 | 14364998 | 14364998 | 14281333 | 14281333 | 677100 | 14281333 | 152495 | 13756728 | 250 | 67 | 37 | 270 |
| 18S-Rep2 | 12133280 | 12133280 | 10412433 | 10412433 | 10345662 | 10345662 | 503842 | 10345662 | 120786 | 9962606 | 249 | 61 | 20 | 172 |
| 18S-Rep3 | 9110898 | 9110898 | 7732534 | 7732534 | 7694307 | 7694307 | 408830 | 7694307 | 95583 | 7381060 | 244 | 65 | 19 | 150 |

Once filtering was complete, the remaining PCRs for each technical replicate were summed, and the read counts of technical replicates were normalised to reduce potential bias caused by PCR stochasticity and differential sequencing effort. Standardisation consisted of randomly resampling (with replacement) a number of reads equal to the first quartile of the per-sample total read counts. This gives every sample the same read count whilst maintaining sample-specific ASV relative abundances. To do so, each ASV in each sample was resampled with replacement a thousand times, following an approach detailed by Veresoglou et al. (2019). Finally, to reduce stochastic variation in taxa from one soil core to another, and to match the DNA sequencing data with the soil chemistry data, the four replicate samples within each subplot were aggregated by summing reads (after normalisation).
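The normalisation step can be sketched in a few lines of Python. This is one possible reading of the procedure rather than the project's actual code. It assumes that the thousand resamplings are averaged per ASV, that the first quartile is computed with the "inclusive" method of `statistics.quantiles`, and that every sample has at least one read. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter


def first_quartile_depth(samples):
    """Return the first quartile of per-sample total read counts, as an integer."""
    totals = [sum(counts.values()) for counts in samples.values()]
    return int(statistics.quantiles(totals, n=4, method="inclusive")[0])


def resample(counts, depth, rng):
    """Draw `depth` reads with replacement, weighted by ASV abundance."""
    asvs = list(counts)
    weights = [counts[asv] for asv in asvs]
    return Counter(rng.choices(asvs, weights=weights, k=depth))


def normalise(samples, n_iter=1000, seed=0):
    """Resample every sample to a common depth and average over n_iter draws.

    samples: {sample_id: {asv_id: read_count}}
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    depth = first_quartile_depth(samples)
    normalised = {}
    for name, counts in samples.items():
        summed = Counter()
        for _ in range(n_iter):
            summed.update(resample(counts, depth, rng))
        normalised[name] = {asv: summed[asv] / n_iter for asv in counts}
    return normalised
```

Because the draws are made with replacement, samples with fewer reads than the target depth are brought up to it, and samples with more reads are brought down, so every sample ends with the same total.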

  • Next Steps

The influence of plant community and abiotic conditions on local alpha-diversity and regional beta-diversity of the SMC will then be assessed using Principal Coordinates Analysis (PCoA) ordination and PERMANOVA. Structural Equation Models (SEMs) and variance partitioning will be used to explore the explanatory power of abiotic conditions, the identity of individual plants and plant community characteristics, whilst accounting for spatial autocorrelation.

It is hypothesised that abiotic conditions will explain the greatest proportion of community dissimilarity, with microclimate (soil temperature and humidity) having greater effects than soil chemistry. However, after accounting for abiotic variation, plant and microbial diversity are likely to be positively correlated.

Scripts

Datasets

We have sequencing datasets of the biomes contained in data/sequencing and data from monitoring of abiotic variables present in climate_data.

  • Eden_GPS_data.csv contains the GPS coordinates of each site. A breakdown of the columns is below:

    • Sample - Sample name that corresponds to a sequencing file and a sampling point.

    • Ecosystem - The ecosystem the sample came from.

    • Diversity - Whether the sample has high or low plant diversity.

    • Biome - The biome the sample came from (Humid = Rainforest, Dry = Mediterranean).

    • GPS - Coordinates corresponding to the point from which soil was sampled

    • X - longitude

    • Y - latitude
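As a starting point for working with this file, the Python sketch below reads the coordinates using only the standard library. It assumes the CSV header uses exactly the column names listed above and that X and Y hold decimal numbers. The `read_gps` helper is hypothetical, and the path should be adjusted to wherever the file sits in the repository.

```python
import csv

EXPECTED_COLUMNS = ["Sample", "Ecosystem", "Diversity", "Biome", "GPS", "X", "Y"]


def read_gps(lines):
    """Parse GPS rows from an open file (or any iterable of CSV lines)."""
    reader = csv.DictReader(lines)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["X"] = float(row["X"])  # longitude
        row["Y"] = float(row["Y"])  # latitude
        rows.append(row)
    return rows


# Usage (path is an assumption):
# with open("Eden_GPS_data.csv", newline="") as handle:
#     sites = read_gps(handle)
```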

Sequencing data (in data/sequencing)

Climate data (in data/climate)

  • Eden_microclimare - To characterise variation in soil temperature and moisture across both biomes, Aranet Substrate Sensors (QH21142) were deployed at each of the sample locations twice for a period of 1 week, between 28 April and 30 June for the Rainforest Biome and between 6 October and 15 December for the Mediterranean Biome. The three metal probe spikes of each logger were buried with their tips at approximately 5 cm depth. The loggers recorded temperature (°C) and moisture (volumetric water content, VMC) every 5 minutes, then connected to the Eden 5G cloud to store the data. Since only 14 probes were available, deployment was rotated across plots, with two probes deployed in any one of the 32 plots at a time. During data processing, the dataset was reduced in size by calculating hourly averages.
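The reduction from 5-minute readings to hourly averages could look like the Python sketch below. The input layout (tuples of timestamp, temperature and moisture) and the function name are assumptions, since the raw probe export format is not described here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(readings):
    """Average 5-minute readings within each clock hour.

    readings: iterable of (timestamp, temperature_c, moisture) tuples
    returns:  {hour_start: (mean_temperature_c, mean_moisture)}
    """
    buckets = defaultdict(list)
    for timestamp, temperature, moisture in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append((temperature, moisture))
    return {
        hour: (mean(t for t, _ in values), mean(m for _, m in values))
        for hour, values in sorted(buckets.items())
    }


example = [
    (datetime(2021, 5, 1, 10, 0), 20.0, 0.25),
    (datetime(2021, 5, 1, 10, 5), 22.0, 0.75),
    (datetime(2021, 5, 1, 11, 0), 23.0, 0.5),
]
print(hourly_means(example))
```

The values in `example` are made up purely to show the input shape.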

  • Eden_soil_chem_data_nov23.csv contains nutrient and chemical composition data for each sample. Bulk soil samples were processed by NRM laboratory (Berkshire, UK) in 2019 to determine pH, available Phosphorus, available Potassium, available Magnesium, total Nitrogen, total Carbon, total Phosphorus, Manganese, Iron, and the relative percentage of Sand, Silt and Clay. Further soil analysis conducted in 2021 was used to characterise sample site soil respiration, soil moisture, soil organic matter content, and total root biomass. For further information see Duley et al. (2023) and its supplementary materials (https://onlinelibrary.wiley.com/doi/abs/10.1111/rec.13831). A breakdown of all the column names and their units is below:

    • Sample - Sample name that corresponds to a sequencing file and a sampling point.

    • Ecosystem - The ecosystem the sample came from.

    • Diversity - Whether the sample has high or low plant diversity.

    • Biome - The biome the sample came from (Humid = Rainforest, Dry = Mediterranean).

    • Soil_OM / GWC - Performed on samples aggregated at the quadrat level. Loss on ignition was used as a proxy for soil moisture and organic matter content (Heiri et al. 2001), following methods developed by Jensen et al. (2018) and Joy et al. (2021). Approximately 40 g of soil from each pooled sample was placed in a foil tray and dried in an oven at 105°C for 24 hours, then reweighed to determine gravimetric moisture content (GWC). From this, subsamples of 5 g were placed in crucibles and transferred to a muffle furnace at 600°C for 4 hours. Samples were left to cool in the switched-off furnace overnight before being reweighed.

    • Root_biomass - Performed on samples aggregated at the quadrat level. Roots were extracted from the remainder of pooled soil samples using methods developed by Frasier et al. (2016). To separate the roots from the soil, the samples were washed through a submerged 250 μm sieve with running tap water, and larger soil aggregates were broken down by hand. Roots were then collected from the surface of the sieve using tweezers, placed in foil trays, weighed and oven dried at 60°C for 12 hours, then reweighed to give a representative root biomass for each plot.

    • pH - Performed on samples aggregated at the quadrat level. Measured in water (15 g fresh weight soil suspended in 20 ml of deionised water) using a Jensen desktop probe, calibrated with standard buffer solutions of pH 7 and pH 4.

    • Soil_respiration - Performed on samples aggregated at the quadrat level. Measured using a TARGAS-1 CO2/H2O infrared gas analyser with soil respiration chamber (PP systems 2016).
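The Soil_OM / GWC measurements reduce to two mass-loss calculations. The Python sketch below shows one common way to compute them. It assumes that masses exclude the tray or crucible, that GWC is expressed as a percentage of oven-dry mass, and that organic matter is expressed as the percentage of oven-dry mass lost on ignition. The dataset may use different conventions, and both function names are hypothetical.

```python
def gravimetric_water_content(fresh_mass_g, dry_mass_g):
    """Water lost on drying at 105 °C, as a percentage of oven-dry mass."""
    return 100.0 * (fresh_mass_g - dry_mass_g) / dry_mass_g


def loss_on_ignition(dry_mass_g, ash_mass_g):
    """Mass lost in the muffle furnace, as a percentage of oven-dry mass."""
    return 100.0 * (dry_mass_g - ash_mass_g) / dry_mass_g


# Made-up example: 40 g of fresh soil dries to 32 g, and a 5 g dry subsample
# leaves 4.5 g of ash after ignition.
print(gravimetric_water_content(40.0, 32.0))
print(loss_on_ignition(5.0, 4.5))
```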

The measurements below were conducted by NRM Cawood Laboratories following their standard methodology, with samples aggregated at the habitat level.

  • pH2 - Measured in water (15 g fresh weight soil suspended in 20 ml of deionised water) using a Jensen desktop probe, calibrated with standard buffer solutions of pH 7 and pH 4.

  • Phosphorus - Sodium Bicarbonate Extractable (Olsen's) - reported as mg/l dry basis.

  • Potassium - Ammonium Nitrate Extractable - reported as mg/l dry basis.

  • Magnesium - Ammonium Nitrate Extractable - reported as mg/l dry basis.

  • Sand - Textural Classification - reported as % w/w dry matter basis.

  • Silt - Textural Classification - reported as % w/w dry matter basis.

  • Clay - Textural Classification - reported as % w/w dry matter basis.

  • Nitrogen - Dumas - reported as % w/w dry basis.

  • Manganese - DTPA Extractable - reported as mg/l dry basis.

  • Iron - DTPA Extractable - reported as mg/l dry basis.

  • Phosphorus2 - reported as mg/kg or % w/w dry basis.

  • Carbon - Dumas - reported as % w/w dry basis.

  • C:N - Dumas - reported as % w/w dry basis.

Contact

This project is primarily a collaboration between Daniel Padfield at the University of Exeter and Julian Donald (formerly of the Eden Project).
