skembel / picante Goto Github PK
View Code? Open in Web Editor NEWR tools for integrating phylogenies and ecology
R tools for integrating phylogenies and ecology
Hi there
Excellent package!
Its running nicely on my data with the exception of the phylogenetic signal!
When I am using the phylocom data to identify traits following the phylogenetic signal it runs fine - but when I am usingmy data I get the following error:
Error in Initialize.corSymm(X[[i]], ...) :
initial values for "corSymm" must be between -1 and 1
Which I do not get because I think I am doing everything right; my tree is rooted, with tip labels and everything
traits are in a dataframe and species are in the same order as the tree...
Thanks in advance for helping me figure out whats wrong :)
function match.phylo.data fails if the supplied data is a data.frame with a single column
Kcalc and phylosignal fail if the supplied data are a data.frame and not a vector
I don't know if I understand pd()
correctly, but I think I should be able to calculate PD with ultramteric tree. However, when I calculated PD using pd()
function of picante
using my ultrametric tree that I made using upgma()
function from phangorn
, I got the following error message:
Error in ages[n] <- anc.age + phy$edge.length[n] : replacement has length zero
Based on the error message, the problem might be on the internal function node.age
. If I understood the function correctly, it tries to return the age of the nodes based on its branch length, but when it tries to identify the root node using rootN = phy$edge[1, 1]
, I found that my ultrametric tree's root node is not in phy$edge[1, 1]
. Instead, the root node is on the last row of the first column in phy$edge
(maybe something like phy$edge[nrow(phy$edge),1]
). I don't know how to fix this as I am not confident I identify how the function works correctly.
When I tried to calculate the same sample with the same ultrametric tree but do not include root (include.root=FALSE
), the pd()
function works, but it returns the same value of PD that I got when I use pd()
with the same sample but with tree made using neighbor-joining method.
Hi @skembel!
I have had a few requests for this over the years as it was in Vellend et al. 2011 but functionality never made it into picante
for reasons that have been lost to history.
Anyway, recently found the code to do it, just have to discuss how to merge it in?
See pull request #5 - add option to include/exclude intra-specific distances when calculating MPD
Hello
Could you please advise if rarefaction affects the PD (Phylogentic diversity) or not? have you come across that? why yes or no?
Thanks
When using phyEstimateDisc
I am getting the warning: "In xtfrm.data.frame(x) : cannot xtfrm data frames" and the predicted values are the same for all species --- estimated.state.support = 0.5
.
This is happening with the following code with picante version 1.8.2 and R version 4.2.0:
predicted_values = picante::phyEstimateDisc(trait = data,phy=tree)
where tree is a phylo object and data contains the trait information. I have tried two formats of data
(data frame, Named int) and get the same issue every time. When I substitute phyEstimateDisc
for phyEstimate
in the above code the function works without any errors and I get an output that seems reasonable.
Thanks for the great package! I discovered this while trying to write tests for a function that uses randomizeMatrix()
.
5 x 5 seems to be the minimum size for the community when null.model = "independentswap"
.
This works:
library(picante)
data(phylocom)
randomizeMatrix(phylocom$sample[1:5,1:5], null.model="independentswap")
But this doesn't:
library(picante)
data(phylocom)
randomizeMatrix(phylocom$sample[1:4,1:5], null.model="independentswap")
It just hangs without finishing or generating an error message (I have to manually force R to quit).
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] picante_1.8.2 nlme_3.1-152 vegan_2.5-7 lattice_0.20-44 permute_0.9-5 ape_5.5
loaded via a namespace (and not attached):
[1] MASS_7.3-54 compiler_4.1.0 Matrix_1.3-4 parallel_4.1.0 tools_4.1.0 mgcv_1.8-36
[7] Rcpp_1.0.7 splines_4.1.0 grid_4.1.0 cluster_2.1.2
I've been working my way through the picante vignette, and found some lines of code which are too long to fit on the PDF page.
Specifically, Lines 100, 145, and 147 could be wrapped.
The warning from line 178 also has this issue, but that's probably less important to include.
Thanks for continuing to support this software package! My colleagues have used it to great success, and I hope to do the same.
Edit: added another line.
Are you accepting pull requests? This seems like a pretty simple fix, but I'm not sure if you are looking for PRs right now.
Hi everyone,
I write to report a bug in the ses.pd function. As you may know, the arguments of ses.pd are passed through the function pd, which means that one can specify whether to include the distance from the most recent common ancestor of the species and the root of the tree (MRCA – root distance). The default setting (in the pd function) is "include.root = TRUE". However, I have noted that ses.pd computes null PD values without including the MRCA – root distance regardless of the logical value that is specified in the include.root argument. This is, if include.root = TRUE (default = TRUE), the observed PD will include the MRCA – root distance, but the null PD values will not. For example:
require(picante)
data(phylocom)
phylocom$sample -> Sample
phylocom$phylo -> Tree
prune.sample(Sample,Tree)->Tree_F
Sample <- Sample[,Tree_F$tip.label]
set.seed(12345) # fix the seed to make the analyses reproducible
ses.pd(Sample,Tree_F,null.model="taxa.labels",runs=999,include.root=FALSE) ->
No_root_1 # the MRCA – root distance should be excluded from all the computations
set.seed(12345) # back to the same seed
ses.pd(Sample,Tree_F,null.model="taxa.labels",runs=999,include.root=TRUE) ->
root_1 # the MRCA – root distance should be included in all the computations
No_root_1
root_1
A quick look to the results reveals that as expected, the observed PD values between "No_root_1" and "root_1" are different for communities "clump1" and "clump2a" (the only two communities where the root of the tree is not traversed by the minimum spanning path connecting the species in the phylogeny), because in "No_root_1" the include.root argument was set to FALSE. HOWEVER, the descriptors of the null distributions in "No_root_1" and "root_1" are identical, and this is because the null PD values are computed without including the MRCA – root distance in all cases regardless of the logical value that is specified in the include.root argument.
The good news are that this anomaly only significantly affects communities with low species richness (less than four species). For more details, you may take a look to the following link, which includes a preprint with detailed information on the issue (a step by step tutorial) along with a simulation exercise (which informs about the communities that may be affected) and two alternatives to fix the problem (all the R code is fully available in the Supplementary Material of the preprint): https://www.biorxiv.org/content/10.1101/579300v1
Rafael Molina Venegas
Hi Steve,
I have a very basic question about the phylogeny. After reviewing some literature, I've seen four types of trees that people uses as their phylogeny input files:
Do you have a recommendation about the phylogeny approach?
I'm working with a genus of ectomycorrhizal fungi, the diversity of this genus is ~150 species. In my study, I found 24 OTUS across all sampled sites.
Unless I'm missing something here, the trialswap
and independentswap
algorithms in randomizeMatrix()
don't seem to "maintain species occurrence frequency and sample species richness" as described in the docs (or original publications Gotelli 2000 and Miklos & Podani 2004, for that matter). In either case, only species occurrence frequency (total of each column) is preserved, not sample species richness (total of each row).
library(picante)
#> Loading required package: ape
#> Loading required package: vegan
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-7
#> Loading required package: nlme
data(phylocom)
comm <- phylocom$sample
set.seed(12345)
# Randomize community data matrix abundances within species (maintains species occurrence frequency)
comm_rand_f <- randomizeMatrix(comm, null.model = "frequency")
isTRUE(all.equal(rowSums(comm_rand_f), rowSums(comm))) # total abundance per site: expect different
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_f), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE
# Randomize community data matrix abundances within samples (maintains sample species richness)
comm_rand_r <- randomizeMatrix(comm, null.model = "richness")
isTRUE(all.equal(rowSums(comm_rand_r), rowSums(comm))) # total abundance per site: expect same
#> [1] TRUE
isTRUE(all.equal(colSums(comm_rand_r), colSums(comm))) # total individuals per species: expect different
#> [1] FALSE
# Randomize community data matrix with the trial-swap algorithm (Miklos & Podani 2004) maintaining species occurrence frequency and sample species richness
comm_rand_t <- randomizeMatrix(comm, null.model = "trialswap", iterations = 1000)
isTRUE(all.equal(rowSums(comm_rand_t), rowSums(comm))) # total abundance per site: expect same
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_t), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE
# Randomize community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness
comm_rand_i <- randomizeMatrix(comm, null.model = "independentswap", iterations = 1000)
isTRUE(all.equal(rowSums(comm_rand_i), rowSums(comm))) # total abundance per site: expect same
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_i), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE
Created on 2021-10-21 by the reprex package (v2.0.0)
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.1 (2021-08-10)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Asia/Tokyo
#> date 2021-10-21
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> ape * 5.5 2021-04-25 [1] CRAN (R 4.1.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
#> cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.1)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
#> knitr 1.36 2021-09-29 [1] CRAN (R 4.1.0)
#> lattice * 0.20-44 2021-05-02 [1] CRAN (R 4.1.1)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> MASS 7.3-54 2021-05-03 [1] CRAN (R 4.1.1)
#> Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1)
#> mgcv 1.8-36 2021-06-01 [1] CRAN (R 4.1.1)
#> nlme * 3.1-152 2021-02-04 [1] CRAN (R 4.1.1)
#> permute * 0.9-5 2019-03-12 [1] CRAN (R 4.1.0)
#> picante * 1.8.2 2020-06-10 [1] CRAN (R 4.1.0)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.1.0)
#> Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.0)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
#> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.0)
#> tibble 3.1.5 2021-09-30 [1] CRAN (R 4.1.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> vegan * 2.5-7 2020-11-28 [1] CRAN (R 4.1.0)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> xfun 0.27 2021-10-18 [1] CRAN (R 4.1.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
Dear author,
Thanks for developing the powerful picante, which is really helpful. I am a beginner in using picante. Based my understanding, picante integrates a lot of functions from phylocom, but I found that there seems no direct way to measure beta NRI and NTI in picante? May I know if misunderstand anythings or there is some other ways (instead of using phylocom) in picante?
Thanks for your help.
I wanted to ask two questions-
If not, then can we use log-transformed data or log-ratio transformed data instead of counts?
Thanks!
node.age
assumes the root node in a tree tr
is tr$edge[1, 1]
, but this is not necessarily the case. The documentation for ape::read.tree
describes edge
as:
a two-column matrix of mode numeric where each row represents an edge of the tree; the nodes and the tips are symbolized with numbers; the tips are numbered 1, 2, ..., and the nodes are numbered after the tips. For each row, the first column gives the ancestor.
So I think length(tr$tip.label) + 1
is always the root node, but not tr$edge[1, 1]
. The sample code below encounters the bug for one of the trees used in Jetz et al., 2012 (Nature). If you change the first line of node.age
to rootN = length(phy$tip.label) + 1
you don't get the error.
library(picante)
#> Loading required package: ape
#> Loading required package: vegan
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-6
#> Loading required package: nlme
library(rotl)
tr <- get_study_tree("ot_809", "tree2")
node.age(tr)
#> Error in ages[n] <- anc.age + phy$edge.length[n]: replacement has length zero
Created on 2020-05-31 by the reprex package (v0.3.0)
Hey!
I was wondering if I want to examine if there is a difference in phylogenetic diversity of traits among monophyletic groups within the phylogenetic tree, would it make sense to use these groups as independent categories and measure significant association by Kuskal-Wallis t-test (I have 5 monophyletic groups)?
For example,
kruskal.test(comm.sesmpd$mpd.obs.z ~ as.factor(phy$clades)) #phy$clades will be stored as a column in a dataframe
I'm periodically running into an error when attempting to run ses.pd() on a large community matrix. The error reads: "function node.age only works with a rooted phylo object." However, my tree object is a rooted, ultrametric tree of 'phylo' class (is.rooted() and is.ultrametric() both return TRUE). Also, I'm having no trouble running pd() using the same 'phylo' tree object. Any insights into what might be going on?
The function I'm running is as follows:
passerine.1.ses.pd.subsample <- ses.pd(samp = passerine.1.picCleanComm.subsample,
tree = passerine.picCleanTree,
null.model = "taxa.labels",
runs = 100)
Here, the samp object is a matrix with 25000 rows and 295 columns; the tree object is a phylogenetic tree with 297 tips, which is rooted and has branch lengths retained.
Please let me know also if more information is needed to help. Thank you very much for your help.
I have a dataset that I preprocessed with mothur and also calculated PD with it.
I additionally wanted to calculate different phylogenetic metrics with picante.
I used exactly the same dataset (otu table and the tree). I wanted to compare mothur and picante PD just to be on the safe side.
Out of 239 samples the results were identical for 168. However, for the rest, picante gave very low estimates of PD.
I double-checked the input data. I then extracted only incriminated samples and the OTUs that have non-zero abundances in those samples. When I run picante PD with this subset, the results are concordant with mothur!
Do you have any idea what is going on here? I will be very grateful for any kind of help. Thanks in advance,
G
I'm new at this community. Hope you can help me, I've been trying to run an analysis of null models, I'm using a matrix of 2 places and approx 90 species, is an presence-absence matrix and I'm using null.model= richness, but I've also tried with other models and I keep getting this error:
Error in randomizeMatrix(t(dat2), null.model = "richness", iterations = 100) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In randomizeMatrix(t(dat2), null.model = "richness", iterations = 100) :
NAs introduced by coercion
There's no blank spaces in the matrix, just 0 and 1.
I dont know whats my mistake.
function matrix2sample fails if the supplied matrix/data.frame lacks rownames
Hello,
I need to use ses.mpd
function with minor modifications. To do so, I looked at the function and I have a question about the null model in the switch part of ses.mpd
(I noted it's the same for ses.pd
and ses.mntd
).
What is the difference between the richness
and the sample.pool
null model, as its the same code in both cases ?
mpd.rand <- switch(null.model, taxa.labels = t(replicate(runs,
mpd(samp, taxaShuffle(dis), abundance.weighted = abundance.weighted))),
richness = t(replicate(runs, mpd(randomizeMatrix(samp,
null.model = "richness"), dis, abundance.weighted))),
frequency = t(replicate(runs, mpd(randomizeMatrix(samp,
null.model = "frequency"), dis, abundance.weighted))),
sample.pool = t(replicate(runs, mpd(randomizeMatrix(samp,
null.model = "richness"), dis, abundance.weighted))),
phylogeny.pool = t(replicate(runs, mpd(randomizeMatrix(samp,
null.model = "richness"), taxaShuffle(dis), abundance.weighted))),
independentswap = t(replicate(runs, mpd(randomizeMatrix(samp,
null.model = "independentswap", iterations), dis,
abundance.weighted))), trialswap = t(replicate(runs,
mpd(randomizeMatrix(samp, null.model = "trialswap",
iterations), dis, abundance.weighted))))
It's said to be different in the help.
richness
Randomize community data matrix abundances within samples (maintains sample species richness)
sample.pool
Randomize community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability
After tests, I assume the richness
null.model does what is described in the help, but not the sample.pool
option.
I have the 1.8.1 version of picante installed from CRAN.
Maxime Jaunatre
Hey
Is it possible to plot a biplot for Inter-Community Mean Pairwise Distance for traits data? For example, to see which traits are driving the community difference.
see:
[root@centos8 picante]# ls
build data DESCRIPTION inst man MD5 NAMESPACE R README.md src test_picante_writesample.R vignettes
[root@centos8 picante]# cat test_picante_writesample.R
library('picante')
# all samples of size 4
# from a population of size 10.
w=writesample(4,10)
# the samples are
t(apply(w,1,function(x) (1:ncol(w))[x==1]))
[root@centos8 picante]# Rscript test_picante_writesample.R
Loading required package: ape
Loading required package: vegan
Loading required package: permute
Loading required package: lattice
This is vegan 2.5-7
Loading required package: nlme
Error in dimnames(x) <- dnx : 'dimnames' applied to non-array
Calls: writesample ... matrix2sample -> data.frame -> expand.grid -> provideDimnames
Execution halted
Why does this happen? How can we solve this problem? How to Test xx.R
After perusing the code of the function, I found that on L24 an object V
is used but it is never defined. Maybe this a good starting point ?
There is an error in the abundance weighted MPD calculation, caused by the fact that in the current implementation, distances of the individuals and themselves are also included (the diagonal in the distance matrix). I have implemented a correction for an ongoing project, and can fork it and add a pull request. Additionally, I would like to question why mpd gives NA when there is only one species, if 0 could be meaningfull in this case. Especially given 0 distances are included in the abundance weighted case.
#5 Edit:added the pull request
Hi,
I'm trying to run pd within a command to set up a dataframe (see below) but I'm having issues with pd. A quick breakdown:
So when I run the command:
data.frame("Observed" = estimate_richness(AlphaCombRare, measures = "Observed"),
"Shannon" = estimate_richness(AlphaCombRare, measures = "Shannon"),
"PD" = pd(samp = data.frame(t(data.frame(otu_table(AlphaCombRare)))),
tree = phy_tree(AlphaCombRare),include.root = TRUE)[,1],
"Region" = sample_data(AlphaCombRare)$REGION)
I get the error:
Error in UseMethod("is.rooted") : no applicable method for 'is.rooted' applied to an object of class "NULL"
I also tried with include.root = FALSE
just in case that could help. The dataframe is made however, the pd calculation ends up being 0 for all my isolates:
Observed Shannon PD Region
P25307 4728 7.598848 0 Fife
P25708 4442 7.402825 0 Fife
P25825 4836 7.632431 0 Fife
P27584 4655 7.432215 0 Fife
P35790 4823 7.466226 0 London
P35791 5055 7.596144 0 London
However, I also get these warnings:
In drop.tip(tree, treeabsent) :
drop all tips of the tree: returning NULL
4: In drop.tip(tree, treeabsent) :
drop all tips of the tree: returning NULL
5: In drop.tip(tree, treeabsent) :
drop all tips of the tree: returning NULL
6: In drop.tip(tree, treeabsent) :
drop all tips of the tree: returning NULL
7: In drop.tip(tree, treeabsent) :
drop all tips of the tree: returning NULL
8: In drop.tip(tree, treeabsent) :
drop all tips of the tree: returning NULL
So I dont know what could be causing this. Any suggestions?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.