Code Monkey home page Code Monkey logo

p2c2m2's Introduction

P2C2M2

P2C2M2 provides functions to read default output from BEAST 2 (Bouckaert et al. 2019) and StarBEAST2 (Ogilvie et al. 2017) and conduct posterior predictive checks of coalescent models (Reid et al. 2014) based on the functions originally provided in the P2C2M package (Gruenstaeudl et al. 2016) to parse tree and xml files from BEAST and *BEAST. It also implements estimation of type-I (false-positive) error rates using pseudo-observed datasets (“pods”) sampled from the posterior predictive distribution.

Installation

You can install the development version of P2C2M2 from GitHub with:

# install.packages("devtools")
devtools::install_github("arleyc/P2C2M2")

Input files

In order to generate the tree files annotated with the required branch rate estimates, the XML file has to be modified by hand to include the following code before running BEAST 2:

#For strict clocks:

Insert the following:
substitutions=“true” branchratemodel=“@StrictClock.c:Locus_Name”
after id=“TreeWithMetaDataLogger.t:Locus_Name” and before spec=“beast.base.evolution.TreeWithMetaDataLogger”

#For UCLN or RLC clocks:

Insert the following:
substitutions=“true”
after id=“TreeWithMetaDataLogger.t:Locus_Name” and before spec=“beast.base.evolution.TreeWithMetaDataLogger”

Example

Basic example showing how to run the full pipeline with a single function using the provided dataset to test the multispecies coalescent model and to estimate false positive error rates per locus and across loci:

library(P2C2M2)

# The absolute path to the input directory is set
inPath <- system.file("extdata", "thomomys/", package="P2C2M2")

# The name of the xml-file generated by BEAUTi and located in 
# "inPath" is set
inFile <- "thomomys.xml"

# Posterior predictive simulations with a setting of 2 simulation 
# replicates are preformed
suppressWarnings(thomomys <- p2c2m2.complete(inPath, inFile, num.reps=2, error.rate = TRUE))
#> 
#> bindings, loci, and alignments read
#> 
#> locus 26 is done
#> 
#> locus 29 is done
#> 
#> locus 47 is done
#> 
#> locus 53 is done
#> 
#> locus 59 is done
#> 
#> locus 64 is done
#> 
#> locus 72 is done
#> 
#> gene trees done
#> 
#> species trees done
#> 
#> Calculating metrics for 3 trees of the posterior distribution
#> 
#> Calculating metrics for the trees of the post. predictive distribution
#> 
#> Calculating differences btw. the posterior and the post. predictive distribution
#> 
#> Calculating differences btw. 1000 pods and the post. predictive distribution to estimate error rates

thomomys$results$alpha0.01
#> $perGene
#>    LCWT[2]                NDC[2]               
#> 26 "-14.27 (±37.45) n.s." "-2 (±7.21) n.s."    
#> 29 "-20.41 (±38.11) n.s." "-2.33 (±10.41) n.s."
#> 47 "9.85 (±2.45) *"       "2.33 (±4.16) n.s."  
#> 53 "-20.49 (±2.29) *"     "5.33 (±2.08) *"     
#> 59 "-3.14 (±7.53) n.s."   "-2.33 (±3.06) n.s." 
#> 64 "-22.06 (±7.76) *"     "10 (±6) *"          
#> 72 "10.64 (±28.26) n.s."  "-8.67 (±7.23) *"    
#> 
#> $acrGenes
#>        LCWT[2]                NDC[2]               
#> Sum    "-43.29 (±93.14) n.s." "-0.57 (±20.15) n.s."
#> Mean   "-6.18 (±13.31) n.s."  "-0.08 (±2.88) n.s." 
#> Median "-14.21 (±18.06) n.s." "0.86 (±5.52) n.s."  
#> Mode   "8.04 (±26.61) n.s."   "-2.57 (±11.84) n.s."
#> CV     "0.8 (±2.36) n.s."     "-5.77 (±6.53) n.s." 
#> 
#> $perGene.error
#>    LCWT[2] NDC[2]
#> 26   0.657  0.678
#> 29   0.689  0.678
#> 47   0.689  0.665
#> 53   0.689  0.657
#> 59   0.654  0.678
#> 64   0.654  0.665
#> 72   0.689  0.678
#> 
#> $acrGenes.error
#>        LCWT[2] NDC[2]
#> Sum      0.657  0.665
#> Mean     0.657  0.665
#> Median   0.000  0.000
#> Mode     0.000  0.000
#> CV       0.657  0.665

thomomys$results$legend
#> [1] "Differences between the posterior and the posterior predictive distributions per locus and across loci. Each cell contains the following information in said order: mean, standard deviation, significance level. Error rates (if estimated with option error.rate=TRUE) are based on differences between the pods and the posterior predictive distributions. Codes in square brackets indicate the number of tails. Alpha values are automatically adjusted for the number of tails."

References

Bouckaert, R., Vaughan, T.G., Barido-Sottani, J., Duchêne, S., Fourment, M., Gavryushkina, A., Heled, J., Jones, G., Kühnert, D., De Maio, N., Matschiner, M., Mendes, F.K., Müller, N.F., Ogilvie, H.A., du Plessis, L., Popinga, A., Rambaut, A., Rasmussen, D., Siveroni, I., Suchard, M.A., Wu, C.H., Xie, D., Zhang, C., Stadler, T., Drummond, A.J. (2019) BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 15(4), e1006650.

Gruenstaeudl, M., Reid, N.M., Wheeler, G.R., Carstens, B.C. (2016) Posterior predictive checks of coalescent models: P2C2M, an R package. Molecular Ecology Resources, 6, 193–205.

Ogilvie, H.A., Bouckaert, R.R., Drummond, A.J. (2017) StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Molecular Biology and Evolution, 34(8), 2101-–2114.

Reid, N.M., Brown, J.M., Satler, J.D., Pelletier, T.A., McVay, J.D., Hird, S.M., Carstens, B.C. (2014) Poor fit to the multi-species coalescent model is widely detectable in empirical data. Systematic Biology, 63, 322–333.

Citation

Villamil, J., Morando, M., Avila, L.J., Lanna, F.M., Fonseca, E.M., Sites, Jr., J.W., Camargo, A. Revisiting the problem of Multispecies Coalescent Model fit with an example from a complete molecular phylogeny of the Liolaemus wiegmannii species group (Squamata: Liolaemidae), in review.

p2c2m2's People

Contributors

arleyc avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.