P2C2M2

P2C2M2 provides functions to read default output from BEAST 2 (Bouckaert et al. 2019) and StarBEAST2 (Ogilvie et al. 2017) and conduct posterior predictive checks of coalescent models (Reid et al. 2014) based on the functions originally provided in the P2C2M package (Gruenstaeudl et al. 2016) to parse tree and xml files from BEAST and *BEAST. It also implements estimation of type-I (false-positive) error rates using pseudo-observed datasets (“pods”) sampled from the posterior predictive distribution.

Installation

You can install the development version of P2C2M2 from GitHub with:

# install.packages("devtools")
devtools::install_github("arleyc/P2C2M2")

Input files

In order to generate the tree files annotated with the required branch rate estimates, the XML file has to be modified by hand to include the following code before running BEAST 2:

#For strict clocks:

Insert the following:
substitutions=“true” branchratemodel=“@StrictClock.c:Locus_Name”
after id=“TreeWithMetaDataLogger.t:Locus_Name” and before spec=“beast.base.evolution.TreeWithMetaDataLogger”

#For UCLN or RLC clocks:

Insert the following:
substitutions=“true”
after id=“TreeWithMetaDataLogger.t:Locus_Name” and before spec=“beast.base.evolution.TreeWithMetaDataLogger”

Example

Basic example showing how to run the full pipeline with a single function using the provided dataset to test the multispecies coalescent model and to estimate false positive error rates per locus and across loci:

library(P2C2M2)

# The absolute path to the input directory is set
inPath <- system.file("extdata", "thomomys/", package="P2C2M2")

# The name of the xml-file generated by BEAUTi and located in 
# "inPath" is set
inFile <- "thomomys.xml"

# Posterior predictive simulations with a setting of 2 simulation 
# replicates are preformed
suppressWarnings(thomomys <- p2c2m2.complete(inPath, inFile, num.reps=2, error.rate = TRUE))
#> 
#> bindings, loci, and alignments read
#> 
#> locus 26 is done
#> 
#> locus 29 is done
#> 
#> locus 47 is done
#> 
#> locus 53 is done
#> 
#> locus 59 is done
#> 
#> locus 64 is done
#> 
#> locus 72 is done
#> 
#> gene trees done
#> 
#> species trees done
#> 
#> Calculating metrics for 3 trees of the posterior distribution
#> 
#> Calculating metrics for the trees of the post. predictive distribution
#> 
#> Calculating differences btw. the posterior and the post. predictive distribution
#> 
#> Calculating differences btw. 1000 pods and the post. predictive distribution to estimate error rates

thomomys$results$alpha0.01
#> $perGene
#>    LCWT[2]                NDC[2]               
#> 26 "-14.27 (±37.45) n.s." "-2 (±7.21) n.s."    
#> 29 "-20.41 (±38.11) n.s." "-2.33 (±10.41) n.s."
#> 47 "9.85 (±2.45) *"       "2.33 (±4.16) n.s."  
#> 53 "-20.49 (±2.29) *"     "5.33 (±2.08) *"     
#> 59 "-3.14 (±7.53) n.s."   "-2.33 (±3.06) n.s." 
#> 64 "-22.06 (±7.76) *"     "10 (±6) *"          
#> 72 "10.64 (±28.26) n.s."  "-8.67 (±7.23) *"    
#> 
#> $acrGenes
#>        LCWT[2]                NDC[2]               
#> Sum    "-43.29 (±93.14) n.s." "-0.57 (±20.15) n.s."
#> Mean   "-6.18 (±13.31) n.s."  "-0.08 (±2.88) n.s." 
#> Median "-14.21 (±18.06) n.s." "0.86 (±5.52) n.s."  
#> Mode   "8.04 (±26.61) n.s."   "-2.57 (±11.84) n.s."
#> CV     "0.8 (±2.36) n.s."     "-5.77 (±6.53) n.s." 
#> 
#> $perGene.error
#>    LCWT[2] NDC[2]
#> 26   0.657  0.678
#> 29   0.689  0.678
#> 47   0.689  0.665
#> 53   0.689  0.657
#> 59   0.654  0.678
#> 64   0.654  0.665
#> 72   0.689  0.678
#> 
#> $acrGenes.error
#>        LCWT[2] NDC[2]
#> Sum      0.657  0.665
#> Mean     0.657  0.665
#> Median   0.000  0.000
#> Mode     0.000  0.000
#> CV       0.657  0.665

thomomys$results$legend
#> [1] "Differences between the posterior and the posterior predictive distributions per locus and across loci. Each cell contains the following information in said order: mean, standard deviation, significance level. Error rates (if estimated with option error.rate=TRUE) are based on differences between the pods and the posterior predictive distributions. Codes in square brackets indicate the number of tails. Alpha values are automatically adjusted for the number of tails."

References

Bouckaert, R., Vaughan, T.G., Barido-Sottani, J., Duchêne, S., Fourment, M., Gavryushkina, A., Heled, J., Jones, G., Kühnert, D., De Maio, N., Matschiner, M., Mendes, F.K., Müller, N.F., Ogilvie, H.A., du Plessis, L., Popinga, A., Rambaut, A., Rasmussen, D., Siveroni, I., Suchard, M.A., Wu, C.H., Xie, D., Zhang, C., Stadler, T., Drummond, A.J. (2019) BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 15(4), e1006650.

Gruenstaeudl, M., Reid, N.M., Wheeler, G.R., Carstens, B.C. (2016) Posterior predictive checks of coalescent models: P2C2M, an R package. Molecular Ecology Resources, 6, 193–205.

Ogilvie, H.A., Bouckaert, R.R., Drummond, A.J. (2017) StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Molecular Biology and Evolution, 34(8), 2101-–2114.

Reid, N.M., Brown, J.M., Satler, J.D., Pelletier, T.A., McVay, J.D., Hird, S.M., Carstens, B.C. (2014) Poor fit to the multi-species coalescent model is widely detectable in empirical data. Systematic Biology, 63, 322–333.

Citation

Villamil, J., Morando, M., Avila, L.J., Lanna, F.M., Fonseca, E.M., Sites, Jr., J.W., Camargo, A. Revisiting the problem of Multispecies Coalescent Model fit with an example from a complete molecular phylogeny of the Liolaemus wiegmannii species group (Squamata: Liolaemidae), in review.

arleyc / p2c2m2 Goto Github PK

p2c2m2's Introduction

P2C2M2

Installation

Input files

Example

References

Citation

p2c2m2's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent