Code Monkey home page Code Monkey logo

picante's People

Contributors

colinbrislawn avatar jarioksa avatar jmaspons avatar mrhelmus avatar peterdc avatar skembel avatar theussl avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

picante's Issues

Strange PD results

I have a dataset that I preprocessed with mothur and also calculated PD with it.
I additionally wanted to calculate different phylogenetic metrics with picante.

I used exactly the same dataset (otu table and the tree). I wanted to compare mothur and picante PD just to be on the safe side.
Out of 239 samples the results were identical for 168. However, for the rest, picante gave very low estimates of PD.

I double-checked the input data. I then extracted only incriminated samples and the OTUs that have non-zero abundances in those samples. When I run picante PD with this subset, the results are concordant with mothur!

Do you have any idea what is going on here? I will be very grateful for any kind of help. Thanks in advance,
G

Can AITCHISON distance be used in MNTD and MPD calculations?

I wanted to ask two questions-

  1. Similar to Phylogenetic beta-diversity calculations, can we use Euclidean or any other distance matrix for MNTD and MPD measurements?

If not, then can we use log-transformed data or log-ratio transformed data instead of counts?

  1. A follow up to the #issue I opened before this. If my "comm" data frame is actually AITCHISON distance between species rather than a data frame of species and sites. Would it make sense to perform grouping of monophyletic clades on such a dataset for among species comparison?

Thanks!

Can monophyletic taxa in the tree be used as groups for differential testing?

Hey!

I was wondering if I want to examine if there is a difference in phylogenetic diversity of traits among monophyletic groups within the phylogenetic tree, would it make sense to use these groups as independent categories and measure significant association by Kuskal-Wallis t-test (I have 5 monophyletic groups)?

For example,

  1. MNTD or MPD-
    phy.dist<-cophenetic(phy)
    comm.sesmpd<-ses.mpd(comm,phy.dist)

kruskal.test(comm.sesmpd$mpd.obs.z ~ as.factor(phy$clades)) #phy$clades will be stored as a column in a dataframe

randomizeMatrix with null.model = "independentswap" fails on small communities

Thanks for the great package! I discovered this while trying to write tests for a function that uses randomizeMatrix().

5 x 5 seems to be the minimum size for the community when null.model = "independentswap".

This works:

library(picante)
data(phylocom)
randomizeMatrix(phylocom$sample[1:5,1:5], null.model="independentswap")

But this doesn't:

library(picante)
data(phylocom)
randomizeMatrix(phylocom$sample[1:4,1:5], null.model="independentswap")

It just hangs without finishing or generating an error message (I have to manually force R to quit).

sessionInfo()
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] picante_1.8.2   nlme_3.1-152    vegan_2.5-7     lattice_0.20-44 permute_0.9-5   ape_5.5        

loaded via a namespace (and not attached):
 [1] MASS_7.3-54    compiler_4.1.0 Matrix_1.3-4   parallel_4.1.0 tools_4.1.0    mgcv_1.8-36   
 [7] Rcpp_1.0.7     splines_4.1.0  grid_4.1.0     cluster_2.1.2 

trialswap and independentswap algorithms don't preserve sample species richness

Unless I'm missing something here, the trialswap and independentswap algorithms in randomizeMatrix() don't seem to "maintain species occurrence frequency and sample species richness" as described in the docs (or original publications Gotelli 2000 and Miklos & Podani 2004, for that matter). In either case, only species occurrence frequency (total of each column) is preserved, not sample species richness (total of each row).

library(picante)
#> Loading required package: ape
#> Loading required package: vegan
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-7
#> Loading required package: nlme

data(phylocom)
comm <- phylocom$sample

set.seed(12345)

# Randomize community data matrix abundances within species (maintains species occurrence frequency)
comm_rand_f <- randomizeMatrix(comm, null.model = "frequency")

isTRUE(all.equal(rowSums(comm_rand_f), rowSums(comm))) # total abundance per site: expect different
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_f), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE

# Randomize community data matrix abundances within samples (maintains sample species richness)
comm_rand_r <- randomizeMatrix(comm, null.model = "richness")

isTRUE(all.equal(rowSums(comm_rand_r), rowSums(comm))) # total abundance per site: expect same
#> [1] TRUE
isTRUE(all.equal(colSums(comm_rand_r), colSums(comm))) # total individuals per species: expect different
#> [1] FALSE

# Randomize community data matrix with the trial-swap algorithm (Miklos & Podani 2004) maintaining species occurrence frequency and sample species richness
comm_rand_t <- randomizeMatrix(comm, null.model = "trialswap", iterations = 1000)

isTRUE(all.equal(rowSums(comm_rand_t), rowSums(comm))) # total abundance per site: expect same
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_t), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE

# Randomize community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness
comm_rand_i <- randomizeMatrix(comm, null.model = "independentswap", iterations = 1000)

isTRUE(all.equal(rowSums(comm_rand_i), rowSums(comm))) # total abundance per site: expect same
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_i), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE

Created on 2021-10-21 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Asia/Tokyo                  
#>  date     2021-10-21                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  ape         * 5.5     2021-04-25 [1] CRAN (R 4.1.0)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
#>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
#>  cluster       2.1.2   2021-04-17 [1] CRAN (R 4.1.1)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
#>  digest        0.6.28  2021-09-23 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.36    2021-09-29 [1] CRAN (R 4.1.0)
#>  lattice     * 0.20-44 2021-05-02 [1] CRAN (R 4.1.1)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS          7.3-54  2021-05-03 [1] CRAN (R 4.1.1)
#>  Matrix        1.3-4   2021-06-01 [1] CRAN (R 4.1.1)
#>  mgcv          1.8-36  2021-06-01 [1] CRAN (R 4.1.1)
#>  nlme        * 3.1-152 2021-02-04 [1] CRAN (R 4.1.1)
#>  permute     * 0.9-5   2019-03-12 [1] CRAN (R 4.1.0)
#>  picante     * 1.8.2   2020-06-10 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.10.1  2020-08-26 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.0)
#>  tibble        3.1.5   2021-09-30 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  vegan       * 2.5-7   2020-11-28 [1] CRAN (R 4.1.0)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.27    2021-10-18 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

PD rooted/unrooted NULL error

Hi,
I'm trying to run pd within a command to set up a dataframe (see below) but I'm having issues with pd. A quick breakdown:

  1. My phylogenetic tree was generated using a R package called 'mia' and then added to my data object (with otu etc)
  2. The combined phylo data was then converted into a phyloseq object for analysis
  3. The phylogenetic tree was rooted using ape and a function to determine an outgroup

So when I run the command:

data.frame("Observed" = estimate_richness(AlphaCombRare, measures = "Observed"),
                  "Shannon" = estimate_richness(AlphaCombRare, measures = "Shannon"),
                  "PD" = pd(samp = data.frame(t(data.frame(otu_table(AlphaCombRare)))), 
                  tree = phy_tree(AlphaCombRare),include.root = TRUE)[,1],
                  "Region" = sample_data(AlphaCombRare)$REGION)

I get the error:
Error in UseMethod("is.rooted") : no applicable method for 'is.rooted' applied to an object of class "NULL"

I also tried with include.root = FALSE just in case that could help. The dataframe is made however, the pd calculation ends up being 0 for all my isolates:

       Observed  Shannon PD Region
P25307     4728 7.598848  0   Fife
P25708     4442 7.402825  0   Fife
P25825     4836 7.632431  0   Fife
P27584     4655 7.432215  0   Fife
P35790     4823 7.466226  0 London
P35791     5055 7.596144  0 London

However, I also get these warnings:

In drop.tip(tree, treeabsent) :
  drop all tips of the tree: returning NULL
4: In drop.tip(tree, treeabsent) :
  drop all tips of the tree: returning NULL
5: In drop.tip(tree, treeabsent) :
  drop all tips of the tree: returning NULL
6: In drop.tip(tree, treeabsent) :
  drop all tips of the tree: returning NULL
7: In drop.tip(tree, treeabsent) :
  drop all tips of the tree: returning NULL
8: In drop.tip(tree, treeabsent) :
  drop all tips of the tree: returning NULL

So I dont know what could be causing this. Any suggestions?

pd cannot use ultrametric tree(?)

I don't know if I understand pd() correctly, but I think I should be able to calculate PD with ultramteric tree. However, when I calculated PD using pd() function of picante using my ultrametric tree that I made using upgma() function from phangorn, I got the following error message:

Error in ages[n] <- anc.age + phy$edge.length[n] : replacement has length zero

Based on the error message, the problem might be on the internal function node.age. If I understood the function correctly, it tries to return the age of the nodes based on its branch length, but when it tries to identify the root node using rootN = phy$edge[1, 1], I found that my ultrametric tree's root node is not in phy$edge[1, 1]. Instead, the root node is on the last row of the first column in phy$edge (maybe something like phy$edge[nrow(phy$edge),1]). I don't know how to fix this as I am not confident I identify how the function works correctly.

When I tried to calculate the same sample with the same ultrametric tree but do not include root (include.root=FALSE), the pd() function works, but it returns the same value of PD that I got when I use pd() with the same sample but with tree made using neighbor-joining method.

Faulty assumption in node.age picks wrong root

node.age assumes the root node in a tree tr is tr$edge[1, 1], but this is not necessarily the case. The documentation for ape::read.tree describes edge as:

a two-column matrix of mode numeric where each row represents an edge of the tree; the nodes and the tips are symbolized with numbers; the tips are numbered 1, 2, ..., and the nodes are numbered after the tips. For each row, the first column gives the ancestor.

So I think length(tr$tip.label) + 1 is always the root node, but not tr$edge[1, 1]. The sample code below encounters the bug for one of the trees used in Jetz et al., 2012 (Nature). If you change the first line of node.age to rootN = length(phy$tip.label) + 1 you don't get the error.

library(picante)
#> Loading required package: ape
#> Loading required package: vegan
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-6
#> Loading required package: nlme
library(rotl)

tr <- get_study_tree("ot_809", "tree2")
node.age(tr)
#> Error in ages[n] <- anc.age + phy$edge.length[n]: replacement has length zero

Created on 2020-05-31 by the reprex package (v0.3.0)

rarefaction and PD?

Hello

Could you please advise if rarefaction affects the PD (Phylogentic diversity) or not? have you come across that? why yes or no?

Thanks

Error in dimnames(x) <- dnx : 'dimnames' applied to non-array

see:

[root@centos8 picante]# ls
build  data  DESCRIPTION  inst  man  MD5  NAMESPACE  R  README.md  src  test_picante_writesample.R  vignettes
[root@centos8 picante]# cat test_picante_writesample.R
library('picante')
# all samples of size 4
# from a population of size 10.
w=writesample(4,10)
# the samples are
t(apply(w,1,function(x) (1:ncol(w))[x==1]))
[root@centos8 picante]# Rscript test_picante_writesample.R
Loading required package: ape
Loading required package: vegan
Loading required package: permute
Loading required package: lattice
This is vegan 2.5-7
Loading required package: nlme
Error in dimnames(x) <- dnx : 'dimnames' applied to non-array
Calls: writesample ... matrix2sample -> data.frame -> expand.grid -> provideDimnames
Execution halted

Why does this happen? How can we solve this problem? How to Test xx.R

abundance weighted MPD error

There is an error in the abundance weighted MPD calculation, caused by the fact that in the current implementation, distances of the individuals and themselves are also included (the diagonal in the distance matrix). I have implemented a correction for an ongoing project, and can fork it and add a pull request. Additionally, I would like to question why mpd gives NA when there is only one species, if 0 could be meaningfull in this case. Especially given 0 distances are included in the abundance weighted case.
#5 Edit:added the pull request

Phylogeny input file

Hi Steve,

I have a very basic question about the phylogeny. After reviewing some literature, I've seen four types of trees that people uses as their phylogeny input files:

  1. Pruned tree from an already published phylogeny (only contain species present in their site of study).
  2. A published phylogeny where they added their species if not present (this can be a huge phylogeny, however, the species present in their city of study represent a small fraction of the big phylogeny).
  3. Simple tree containing only one OTU/ species.
  4. Tree containing one OTU/site, therefore, the tree will contain more than one sequence per OTU.

Do you have a recommendation about the phylogeny approach?

I'm working with a genus of ectomycorrhizal fungi, the diversity of this genus is ~150 species. In my study, I found 24 OTUS across all sampled sites.

Difference between richness and sample.pool null.model in ses.mpd/ses.pd/ses.mntd ?

Hello,

I need to use ses.mpd function with minor modifications. To do so, I looked at the function and I have a question about the null model in the switch part of ses.mpd (I noted it's the same for ses.pd and ses.mntd).

What is the difference between the richness and the sample.pool null model, as its the same code in both cases ?

mpd.rand <- switch(null.model, taxa.labels = t(replicate(runs, 
        mpd(samp, taxaShuffle(dis), abundance.weighted = abundance.weighted))), 
        richness = t(replicate(runs, mpd(randomizeMatrix(samp, 
            null.model = "richness"), dis, abundance.weighted))),
        frequency = t(replicate(runs, mpd(randomizeMatrix(samp, 
            null.model = "frequency"), dis, abundance.weighted))), 
        sample.pool = t(replicate(runs, mpd(randomizeMatrix(samp, 
            null.model = "richness"), dis, abundance.weighted))), 
        phylogeny.pool = t(replicate(runs, mpd(randomizeMatrix(samp, 
            null.model = "richness"), taxaShuffle(dis), abundance.weighted))), 
        independentswap = t(replicate(runs, mpd(randomizeMatrix(samp, 
            null.model = "independentswap", iterations), dis, 
            abundance.weighted))), trialswap = t(replicate(runs, 
            mpd(randomizeMatrix(samp, null.model = "trialswap", 
                iterations), dis, abundance.weighted))))

It's said to be different in the help.

richness
Randomize community data matrix abundances within samples (maintains sample species richness)
sample.pool
Randomize community data matrix by drawing species from pool of species occurring in at least one community (sample pool) with equal probability

After tests, I assume the richness null.model does what is described in the help, but not the sample.pool option.

I have the 1.8.1 version of picante installed from CRAN.

Maxime Jaunatre

Bug in ses.pd function

Hi everyone,

I write to report a bug in the ses.pd function. As you may know, the arguments of ses.pd are passed through the function pd, which means that one can specify whether to include the distance from the most recent common ancestor of the species and the root of the tree (MRCA – root distance). The default setting (in the pd function) is "include.root = TRUE". However, I have noted that ses.pd computes null PD values without including the MRCA – root distance regardless of the logical value that is specified in the include.root argument. This is, if include.root = TRUE (default = TRUE), the observed PD will include the MRCA – root distance, but the null PD values will not. For example:

require(picante)
data(phylocom)
phylocom$sample -> Sample
phylocom$phylo -> Tree
prune.sample(Sample,Tree)->Tree_F
Sample <- Sample[,Tree_F$tip.label]
set.seed(12345) # fix the seed to make the analyses reproducible
ses.pd(Sample,Tree_F,null.model="taxa.labels",runs=999,include.root=FALSE) ->
No_root_1 # the MRCA – root distance should be excluded from all the computations
set.seed(12345) # back to the same seed
ses.pd(Sample,Tree_F,null.model="taxa.labels",runs=999,include.root=TRUE) ->
root_1 # the MRCA – root distance should be included in all the computations

No_root_1
root_1

A quick look to the results reveals that as expected, the observed PD values between "No_root_1" and "root_1" are different for communities "clump1" and "clump2a" (the only two communities where the root of the tree is not traversed by the minimum spanning path connecting the species in the phylogeny), because in "No_root_1" the include.root argument was set to FALSE. HOWEVER, the descriptors of the null distributions in "No_root_1" and "root_1" are identical, and this is because the null PD values are computed without including the MRCA – root distance in all cases regardless of the logical value that is specified in the include.root argument.

The good news are that this anomaly only significantly affects communities with low species richness (less than four species). For more details, you may take a look to the following link, which includes a preprint with detailed information on the issue (a step by step tutorial) along with a simulation exercise (which informs about the communities that may be affected) and two alternatives to fix the problem (all the R code is fully available in the Supplementary Material of the preprint): https://www.biorxiv.org/content/10.1101/579300v1

Rafael Molina Venegas

abundance weighted PD

Hi @skembel!

I have had a few requests for this over the years as it was in Vellend et al. 2011 but functionality never made it into picante for reasons that have been lost to history.

Anyway, recently found the code to do it, just have to discuss how to merge it in?

Long lines are cut off in vignette PDF

I've been working my way through the picante vignette, and found some lines of code which are too long to fit on the PDF page.

screen shot 2017-09-19 at 3 52 52 pm

Specifically, Lines 100, 145, and 147 could be wrapped.

The warning from line 178 also has this issue, but that's probably less important to include.

Thanks for continuing to support this software package! My colleagues have used it to great success, and I hope to do the same.

Edit: added another line.
Are you accepting pull requests? This seems like a pretty simple fix, but I'm not sure if you are looking for PRs right now.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.