differential-abundance-theory's Introduction

Manuscript on taxonomic bias and differential-abundance analysis

This repository contains an in-progress manuscript on the effects of taxonomic bias in microbiome measurements on microbial differential-abundance analysis. The manuscript is structured as a bookdown article. The latest rendered version can be viewed here. Data analyses and simulations supporting this work can be seen here. This repository and the rendered manuscript are licensed under a CC BY 4.0 License. See the Zenodo record for how to cite the latest version.

differential-abundance-theory's People

Contributors

adw96, mikemc

differential-abundance-theory's Issues

Create figure illustrating the fundamental problem

Normalization imposes competition, which causes the abundance of a focal taxon to depend on the relative abundances of all taxa. Perhaps have a panel that illustrates the experimental workflow, showing different types of normalization that can occur (saturating extraction yield; deliberate library normalization; computational normalization to proportions). Then have another panel (or two) showing that the error in the proportion (or abundance?) of a particular taxon depends on the composition of the rest of the sample, similar to Figure 2 of https://elifesciences.org/articles/46923
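
A minimal sketch of the effect such a figure would show, assuming a simple multiplicative bias model in which each taxon's reads are proportional to its abundance times a taxon-specific efficiency. The efficiencies and sample compositions are hypothetical values chosen for illustration, not data from the manuscript. The focal taxon has the same actual proportion in both samples, yet the fold error in its observed proportion differs because the rest of the community differs.

```python
def observed_proportions(actual, efficiencies):
    """Proportions after weighting each taxon by its efficiency and renormalizing."""
    weighted = [a * e for a, e in zip(actual, efficiencies)]
    total = sum(weighted)
    return [w / total for w in weighted]


def fold_error(actual, efficiencies, focal=0):
    """Observed / actual proportion of the focal taxon."""
    return observed_proportions(actual, efficiencies)[focal] / actual[focal]


# Hypothetical efficiencies: focal taxon, a low-efficiency taxon, a high-efficiency taxon
efficiencies = [1.0, 0.2, 5.0]

# The focal taxon makes up 20% of both samples; only the background differs
sample_a = [0.2, 0.7, 0.1]  # background dominated by the low-efficiency taxon
sample_b = [0.2, 0.1, 0.7]  # background dominated by the high-efficiency taxon

for name, sample in [("A", sample_a), ("B", sample_b)]:
    mean_eff = sum(a * e for a, e in zip(sample, efficiencies))
    print(f"sample {name}: mean efficiency = {mean_eff:.2f}, "
          f"fold error in focal proportion = {fold_error(sample, efficiencies):.3f}")
```

In this sketch the fold error equals the focal taxon's efficiency divided by the sample's mean efficiency, so the focal taxon is over-represented in sample A and under-represented in sample B.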

Add derivation of relationship between diversity and variance in mean efficiency under IID assumption

Some relevant notes below; I may also have the full derivation from GoodNotes typed up elsewhere already.

Notes from 2020-12-16 Wednesday

Intuition for why bias might be less problematic in diverse ecosystems

To get some intuition as to why bias might be less problematic in diverse ecosystems, consider assembling a community by adding random species, whose efficiencies are independently chosen from a distribution with mean $\mu$ and variance $\sigma^2$. The variance in the mean efficiency in the community after adding $I$ species, conditional on the community proportions $\tilde A_i$, is $\sigma^2 \sum_{i=1}^I \tilde A_i^2 = \sigma^2 / ^2D$, where $^2D = 1/ \sum_{i=1}^I \tilde A_i^2$ is the diversity of order 2, also known as the Inverse Simpson index. Thus, if species are added in such a way that the Inverse Simpson index keeps increasing, and the efficiencies are IID, then this variance shrinks toward 0 and the sample mean efficiency will tend to $\mu$.

See GoodNotes file for the math. Something to keep in mind is that this result is for the variance, but the geometric variance is most relevant. (Though it should remain true that the geometric variance tends to 1 if the variance tends to 0.)
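
The variance formula can be checked numerically. The sketch below is only an illustration under stated assumptions: efficiencies are drawn IID from a gamma distribution (an arbitrary choice that keeps them positive), the community proportions are held fixed, and the function names are hypothetical. It compares the simulated variance of the mean efficiency with $\sigma^2 / ^2D$ for a few communities.

```python
import random
import statistics


def inverse_simpson(proportions):
    """Diversity of order 2: 1 / sum of squared proportions."""
    return 1.0 / sum(p * p for p in proportions)


def simulate_mean_efficiency_variance(proportions, alpha, beta, n_reps, rng):
    """Variance across replicates of sum_i p_i * E_i, with E_i ~ Gamma(alpha, beta) IID."""
    means = []
    for _ in range(n_reps):
        effs = [rng.gammavariate(alpha, beta) for _ in proportions]
        means.append(sum(p * e for p, e in zip(proportions, effs)))
    return statistics.pvariance(means)


# Assumed efficiency distribution: mean = alpha * beta = 1, variance = alpha * beta**2 = 0.25
alpha, beta = 4.0, 0.25
sigma2 = alpha * beta ** 2

communities = {
    "uneven, 4 species": [0.85, 0.05, 0.05, 0.05],
    "even, 4 species": [0.25] * 4,
    "even, 40 species": [1 / 40] * 40,
}

rng = random.Random(1)
for name, props in communities.items():
    d2 = inverse_simpson(props)
    simulated = simulate_mean_efficiency_variance(props, alpha, beta, 5000, rng)
    print(f"{name}: 2D = {d2:.2f}, predicted variance = {sigma2 / d2:.4f}, "
          f"simulated variance = {simulated:.4f}")
```

The check concerns the arithmetic variance only; it says nothing directly about the geometric variance mentioned above.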

Synthesize references and evidence on the use of host- and diet-derived reads as natural constant references

Theoretical points

  • Relevant theory for how these reads might be used for differential absolute abundance analysis is in the appendix section Differential absolute abundance.
  • Can be thought of as a natural spike-in, or a reference taxon with assumed fixed abundance.
  • Can be used for proportion- or ratio-based AA inference.
  • Current studies seem to use these reads in proportion mode: they use the ratio (bacterial reads) / (host + diet reads) to infer bacterial biomass; if AA of individual microbes is needed, they multiply this biomass estimate by MGS proportions.
  • Use in ratio-mode is more robust to bias under the MWC model for the purposes of inferring fold changes across samples; however, it may be important not to aggregate reads from different taxa (e.g. host and different plants) for this method to remain robust to bias; see appendix section Multiple reference taxa.
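
A small sketch of why the ratio to a reference is robust to bias when inferring fold changes, assuming multiplicative taxon-specific efficiencies that are constant across samples and a host abundance that is the same in both samples. All abundances, efficiencies, and names here are hypothetical. The per-taxon ratio to host reads recovers the true fold change exactly, while the total bacterial-to-host read ratio is distorted when the mean bacterial efficiency shifts between samples.

```python
def expected_reads(abundances, efficiencies, depth=1.0):
    """Expected reads per taxon: abundance * efficiency, times a sample-specific depth factor."""
    return {t: depth * abundances[t] * efficiencies[t] for t in abundances}


# Hypothetical efficiencies and actual abundances (arbitrary units)
efficiencies = {"host": 0.5, "taxon_1": 1.0, "taxon_2": 4.0}
sample_1 = {"host": 10.0, "taxon_1": 100.0, "taxon_2": 10.0}
sample_2 = {"host": 10.0, "taxon_1": 50.0, "taxon_2": 200.0}  # host abundance unchanged

reads_1 = expected_reads(sample_1, efficiencies, depth=3.0)
reads_2 = expected_reads(sample_2, efficiencies, depth=0.7)  # depth cancels in ratios

bacteria = ["taxon_1", "taxon_2"]


def ratio_mode_fold_change(taxon):
    """Fold change in (taxon reads / host reads) from sample 1 to sample 2."""
    return (reads_2[taxon] / reads_2["host"]) / (reads_1[taxon] / reads_1["host"])


def total_ratio_fold_change():
    """Fold change in (total bacterial reads / host reads) from sample 1 to sample 2."""
    r1 = sum(reads_1[t] for t in bacteria) / reads_1["host"]
    r2 = sum(reads_2[t] for t in bacteria) / reads_2["host"]
    return r2 / r1


for t in bacteria:
    print(f"{t}: true fold change = {sample_2[t] / sample_1[t]:.2f}, "
          f"ratio-mode estimate = {ratio_mode_fold_change(t):.2f}")

true_total = sum(sample_2[t] for t in bacteria) / sum(sample_1[t] for t in bacteria)
print(f"total bacteria: true fold change = {true_total:.2f}, "
      f"estimate from bacterial:host read ratio = {total_ratio_fold_change():.2f}")
```

The sketch uses a single reference taxon; it does not cover the case of aggregating reads from several reference taxa.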

Key empirical refs

Human or mouse gut

Plants

Other

Other notes and links

Twitter discussions

Fix theorem rendering

make pdf now fails with the error message below. The issue may be related to updating to R 4.1.0.

> bookdown::render_book('.', 'bookdown::pdf_book', quiet = TRUE)
Rendering book in directory '.'
! Extra \fi.
l.1084 ...-102-105-99-105-101-110-116-115-93-\}\fi
                                                  {} 

Error: LaTeX failed to compile _main.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See _main.log for more info.
Execution halted
make: *** [Makefile:5: pdf] Error 1

DAA Critique Discussion

Hey Mike,

This popped up on my GitHub feed. It's so cool to see a manuscript being written in the open! I am a big fan of your "Consistent and Correctable Bias" paper from a couple years back, and appreciate your commitment to truly open science.

Like you, I've come to doubt the utility of differential abundance analysis. We recently wrote up our perspective, which challenges DAA and offers some possible (ratio-based) alternatives. I wanted to link you to it because I think there may be some synergy between our perspectives, and it could be fun to start a discussion.

Settle notation regarding spike-ins, targeted measurements, and reference taxa

Currently the Models appendix defines separate notation for the abundance of spike-in taxa and of taxa measured by targeted measurement methods. Both approaches can be seen as special cases of having estimated abundances for a set of reference taxa. For this reason the later sections analyze spike-ins using the targeted-measurement results, and thus make no real use of the notation $S$ for spike-in abundances. I should consider dropping this unnecessary notation, and perhaps using just the $T$ notation for both. In this case the text needs to be updated to make this fundamental unity clearer early on and to establish that all "targeted" results also apply to spike-ins (and so perhaps should really be considered "reference taxa" results).

Other thoughts

  • Certain computational normalization approaches also fit into this category.
  • The Zemb2020 spike-in + qPCR method (#6) is fundamentally different; it uses a spike-in to improve the bulk abundance estimate and then applies the method I describe for bulk-abundance estimation, throwing a wrinkle into my categorization scheme.

Different flavors of CoDa regression

Hi Mike,

This is a great resource/reading material. Thanks for making it public!

I was wondering, have you looked into linear models on CLR-transformed relative abundances vs the multinomial regression suggested by Morton et al. (2019)?

I think the former is becoming commonplace as it's more straightforward to run. But the latter has the benefit that the ranks of the coefficients are identical on both relative and absolute data. I use Justin Silverman's fido package to run it, but it would be useful if there were also a frequentist way of running it.

Thanks
Johannes

Synthesize spike-in approach of Zemb et al 2020

Zemb O, Achard CS, Hamelin J, De Almeida M, Gabinaud B, Cauquil L, Verschuren LMG, Godon J. 2020. Absolute quantitation of microbes using 16S rRNA gene metabarcoding: A rapid normalization of relative abundances by quantitative PCR targeting a 16S rRNA gene spike‐in standard. Microbiologyopen 9:1–21. doi:10.1002/mbo3.977 https://onlinelibrary.wiley.com/doi/abs/10.1002/mbo3.977

At least two interesting things to consider in this study:

  1. They apply the synthetic DNA standard prior to DNA extraction.
  2. They use qPCR of the synthetic standard and use this info somehow.

The earlier experiment of Tkacz et al (2018) is also relevant.
