reptalex / phylofactor

R 98.17% C 1.83%

phylofactor's People

dombennett jsilve24 leffj mortonjt tankmermaid erikaganda chrismitbiz nermin-ghith khalidtab

phylofactor's Issues

cmultrepl and Zeros

Hi @reptalex !

I started using cmultrepl (CZM method) to deal with zeros, based on Gregory Gloor's compositional biplot tutorial, and have kept using it with other packages to stay consistent. I downloaded your R scripts from the first phylofactor paper and noticed you cautioned against it because "it can lead to negative-valued compositions." I was hoping you could expand a little more on what this means and why it's inappropriate. Thanks for any insight!
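
For comparison, a warning quoted in a later issue on this page describes PhyloFactor's default zero handling: zeros are replaced column-wise with delta*min(x[x>0]), with default delta=0.65. The sketch below applies that rule in base R. The helper name `replace_zeros` and the toy matrix are hypothetical, and this is not the package's own code.

```r
# Hypothetical helper: column-wise zero replacement following the rule
# quoted in the warning (delta * smallest positive value in the column).
# A column that is entirely zero has no positive minimum and is not handled here.
replace_zeros <- function(Data, delta = 0.65) {
  apply(Data, 2, function(x) {
    x[x == 0] <- delta * min(x[x > 0])
    x
  })
}

# Toy data: rows are taxa, columns are samples
Data <- matrix(c(0, 2, 4,
                 10, 0, 20), nrow = 3)
filled <- replace_zeros(Data)
print(filled)
```

Every replaced entry is strictly positive under this rule, so a subsequent log transform is defined for the whole table.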

parameter "glms"

Teacher, I can't find the parameter "glms" of Phylofactor. When I access it as described in the tutorial, it is NULL. How can I check the regression of each factor?

Null Simulations for GPF

that would be sweet!

log-ratios are negative

Hey @reptalex

I am comparing the log-ratios of two genes from metagenomes. Some of the values are negative, because the gene in the denominator can be more abundant than the one in the numerator.

When I run these log-ratios in PhyloFactor glm analysis, I get an error-
#my code-

pf_PhyloFactor<-PhyloFactor(wood1$Current_Log_Ratio,pruned.trees_diet_continuous,wood1$N15,frmla = Data~N15,nfactors=2,choice = "F")

#error-
some tips in tree are not found in dataset - output PF$tree will contain a trimmed tree
drop all tips of the tree: returning NULL
Error in PhyloFactor(wood1$Current_Log_Ratio, pruned.trees_diet_continuous, :
  For log-transformed data analysis, all entries of Data must be greater than or equal to 0

Is there a way around it? Log-ratios between gene families can be positive or negative. That's not a sequencing artifact.
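
The error text suggests the function expects untransformed, non-negative data that it log-transforms itself. A small base-R illustration of why an already-computed log-ratio trips that check, using made-up abundances:

```r
numerator   <- 5    # toy abundance of the gene in the numerator
denominator <- 20   # toy abundance of the gene in the denominator

log_ratio <- log(numerator / denominator)
print(log_ratio)                           # negative: denominator > numerator
print(log(numerator) - log(denominator))   # the same quantity, written as a difference

# Log-transforming a negative value a second time is undefined
second_log <- suppressWarnings(log(log_ratio))
print(second_log)
```

A call in another issue on this page passes `transform.fcn=log`, which hints that the transform is configurable. Whether supplying a different transform avoids the non-negativity check has not been verified here.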

pf.tree error

A big thanks also from me for making this package available! I am currently analysing data (my second paper which will reference your awesome package ;0) and have been using the pf.tree function frequently when suddenly it spit out an unexpected error. It seems the mapping of factors does not link to nodes anymore. Details and session info below.

I am not sure if this is a ggtree or phylofactor issue, nor if I am missing anything...?
Thanks for any advice.
Cheers, Chris

Phylofactor object

> PF_ALL_var
      phylofactor object from function PhyloFactor
       --------------------------------------------       
Method                    : glm
Choice                    : var
Formula                   : Data ~ X
Number of species         : 4834
Number of factors         : 50
Frac Explained Variance   : 0.0191
Largest non-remainder bin : 3382
Number of singletons      : 8
Paraphyletic Remainder    : 1025 species
                  
-------------------------------------------------------------
Factor Table:
                                  Group1                         Group2     ExpVar       F     Pr(>F)
Factor 1     5 member Monophyletic clade 4829 member Monophyletic clade 0.00114330  7.7276 5.2887e-08
Factor 2    31 member Monophyletic clade 4798 member Paraphyletic clade 0.00097505  2.7940 7.8446e-03
Factor 3  3382 member Paraphyletic clade 1416 member Paraphyletic clade 0.00105700  7.4302 1.0322e-07
Factor 4     9 member Monophyletic clade 1407 member Paraphyletic clade 0.00078515 17.2670 7.7716e-16

pf.tree function error

gtree <- phylofactor::pf.tree(PF_ALL_var)

Found more than one class "phylo" in cache; using the first, from namespace 'phyloseq'
Also defined by ‘tidytree’

Error: mapping and node can't be NULL simultaneously, we can't get the 
              data to be displayed in this layer, please provide a data or subset 
              (we will extract the data from tree data.), or provide node!

Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
mapping and node can't be NULL simultaneously, we can't get the 
              data to be displayed in this layer, please provide a data or subset 
              (we will extract the data from tree data.), or provide node!
Backtrace:
 1. phylofactor::pf.tree(PF_ALL_var)
 2. ggplot2:::`+.gg`(...)
 3. ggplot2:::add_ggplot(e1, e2, e2name)
 5. ggtree:::ggplot_add.hilight(object, p, objectname)
Run `rlang::last_trace()` to see the full context.

> rlang::last_trace()
<error/rlang_error>
mapping and node can't be NULL simultaneously, we can't get the 
              data to be displayed in this layer, please provide a data or subset 
              (we will extract the data from tree data.), or provide node!
Backtrace:
    █
 1. └─phylofactor::pf.tree(PF_ALL_var)
 2.   └─ggplot2:::`+.gg`(...)
 3.     └─ggplot2:::add_ggplot(e1, e2, e2name)
 4.       ├─ggplot2::ggplot_add(object, p, objectname)
 5.       └─ggtree:::ggplot_add.hilight(object, p, objectname)

Session info


> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggtree_2.4.1      phylofactor_0.0.1 Matrix_1.3-0      data.table_1.13.6 magrittr_2.0.1    ape_5.4-1         microbiome_1.12.0
 [8] phyloseq_1.34.0   vegan_2.5-7       lattice_0.20-41   permute_0.9-5     ggpubr_0.4.0      forcats_0.5.0     stringr_1.4.0    
[15] dplyr_1.0.2       purrr_0.3.4       readr_1.4.0       tidyr_1.1.2       tibble_3.0.4      ggplot2_3.3.3     tidyverse_1.3.0  
[22] qiime2R_0.99.23  

loaded via a namespace (and not attached):
  [1] readxl_1.3.1        backports_1.2.1     Hmisc_4.4-2         plyr_1.8.6          igraph_1.2.6        lazyeval_0.2.2     
  [7] splines_4.0.3       digest_0.6.27       foreach_1.5.1       htmltools_0.5.0     viridis_0.5.1       fansi_0.4.1        
 [13] checkmate_2.0.0     cluster_2.1.0       openxlsx_4.2.3      Biostrings_2.58.0   graphlayouts_0.7.1  modelr_0.1.8       
 [19] prettyunits_1.1.1   jpeg_0.1-8.1        colorspace_2.0-0    rvest_0.3.6         ggrepel_0.9.0       haven_2.3.1        
 [25] xfun_0.19           crayon_1.3.4        phylosmith_1.0.5    jsonlite_1.7.2      survival_3.2-7      iterators_1.0.13   
 [31] glue_1.4.2          polyclip_1.10-0     gtable_0.3.0        zlibbioc_1.36.0     XVector_0.30.0      car_3.0-10         
 [37] Rhdf5lib_1.12.0     BiocGenerics_0.36.0 abind_1.4-5         scales_1.1.1        DBI_1.1.0           rstatix_0.6.0      
 [43] Rcpp_1.0.5          viridisLite_0.3.0   progress_1.2.2      htmlTable_2.1.0     units_0.6-7         tidytree_0.3.3     
 [49] foreign_0.8-81      Formula_1.2-4       stats4_4.0.3        DT_0.16             htmlwidgets_1.5.3   httr_1.4.2         
 [55] RColorBrewer_1.1-2  ellipsis_0.3.1      pkgconfig_2.0.3     farver_2.0.3        nnet_7.3-14         dbplyr_2.0.0       
 [61] labeling_0.4.2      tidyselect_1.1.0    rlang_0.4.10        reshape2_1.4.4      munsell_0.5.0       cellranger_1.1.0   
 [67] tools_4.0.3         cli_2.2.0           generics_0.1.0      ade4_1.7-16         broom_0.7.3         evaluate_0.14      
 [73] biomformat_1.18.0   yaml_2.2.1          fuzzyjoin_0.1.6     fs_1.5.0            knitr_1.30          tidygraph_1.2.0    
 [79] zip_2.1.1           ggraph_2.0.4        nlme_3.1-151        aplot_0.0.6         xml2_1.3.2          compiler_4.0.3     
 [85] rstudioapi_0.13     curl_4.3            png_0.1-7           e1071_1.7-4         ggsignif_0.6.0      reprex_0.3.0       
 [91] treeio_1.14.3       tweenr_1.0.1        stringi_1.5.3       classInt_0.4-3      multtest_2.46.0     vctrs_0.3.6        
 [97] pillar_1.4.7        lifecycle_0.2.0     rhdf5filters_1.2.0  BiocManager_1.30.10 patchwork_1.1.1     R6_2.5.0           
[103] latticeExtra_0.6-29 KernSmooth_2.23-18  gridExtra_2.3       rio_0.5.16          IRanges_2.24.1      codetools_0.2-18   
[109] MASS_7.3-53         assertthat_0.2.1    rhdf5_2.34.0        withr_2.3.0         S4Vectors_0.28.1    mgcv_1.8-33        
[115] parallel_4.0.3      hms_0.5.3           grid_4.0.3          rpart_4.1-15        class_7.3-17        rmarkdown_2.6      
[121] rvcheck_0.1.8       carData_3.0-4       Rtsne_0.15          sf_0.9-6            ggforce_0.3.2       Biobase_2.50.0     
[127] lubridate_1.7.9.2   base64enc_0.1-3

Error in GLMs[[winner]] : attempt to select less than one element in get1index

Hey @reptalex ,

I keep getting the above error with 2 different 16S rarefied data sets (no error with raw data) and hoped you could help me understand why.

pf <- PhyloFactor(table,tree,timestamp,choice='F',stop.fcn='KS',stop.early=TRUE)

Thanks for any help!
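
The message itself can be reproduced in base R whenever a list is indexed with a zero-length value, for example when `which.max()` is given only NAs. The sketch below shows that mechanism only. That every candidate statistic was NA for the rarefied data is an assumption, not something established in this thread, and `GLMs`, `objective` and `winner` are toy stand-ins.

```r
GLMs <- list("model 1", "model 2")     # toy stand-in for a list of fitted models
objective <- c(NA_real_, NA_real_)     # assumed: no candidate produced a usable statistic

winner <- which.max(objective)         # integer(0) when every value is NA
msg <- tryCatch(GLMs[[winner]], error = function(e) conditionMessage(e))
print(msg)
```

Checking the input for rows that are constant or all zero after rarefaction may be a reasonable first step, since those can yield undefined test statistics.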

FTmicrobiome$PF$choice not existing

When I try to follow the tutorial

pf <- FTmicrobiome$PF

produces the error Error in if (PF$choice == "var") { : argument is of length zero. I think it is because PF does not have choice.
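
That reading matches how R behaves: comparing a missing list element (NULL) with a string gives a zero-length logical, which `if` rejects. A minimal sketch with a hypothetical stand-in object:

```r
PF <- list(tree = "placeholder")   # hypothetical stand-in with no 'choice' element

msg <- tryCatch(
  if (PF$choice == "var") "variance" else "other",
  error = function(e) conditionMessage(e)
)
print(msg)

# A quick check before calling functions that read PF$choice
has_choice <- !is.null(PF$choice)
print(has_choice)
```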

package release on CRAN/bioconductor

As far as I can tell, phylofactor is not currently available on CRAN or Bioconductor. This makes it difficult to install phylofactor (e.g., via CRAN or anaconda) and to integrate it into pipelines and reproducible research environments. Do you have plans to release phylofactor on CRAN or Bioconductor anytime soon?

Problem with example code for pf.heatmap() using predict()

Thanks for this very useful package. In experimenting I noticed that the documentation for pf.heatmap() gives the example

pred <- predict(PF,factors=1:3)
predicted <- pf.heatmap(PF,Data=pred,factors=1:3,width=3)

But running this example code shows all factors on the heatmap. The problem seems to be that predict() wants factor = max_factor (as described in its documentation), rather than factors = list_of_factors. This works as expected:

pred <- predict(PF,factor=3)
predicted <- pf.heatmap(PF,Data=pred,factors=1:3,width=3)
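
One plausible reason the misspelled argument fails silently is R's argument matching: a supplied name that is longer than the formal name does not match it and falls into `...`. The toy function below is a hypothetical stand-in, not the package's predict method.

```r
# Hypothetical stand-in for a predict-style method with a 'factor' argument
toy_predict <- function(object, factor = NULL, ...) {
  if (is.null(factor)) "all factors" else paste("factors 1 to", factor)
}

with_typo    <- toy_predict(NULL, factors = 1:3)  # 'factors' is swallowed by ...
with_correct <- toy_predict(NULL, factor = 3)

print(with_typo)
print(with_correct)
```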

Application questions?

summary of Phylofactor with method='max.var' throws error.

Just to let you know that I get the following error.

It works when I use the PhyCA function.

pf= PhyloFactor(as.matrix(data),tree,method='max.var' )
summary(pf)

Error in paste("       phylofactor object from function ", PF$phylofactor.fcn,  : 
  cannot coerce type 'closure' to vector of type 'character'

I don't think the warnings are relevant:

1: In PhyloFactor(as.matrix(data), tree, method = "max.var", ncores = 4,  :
  some tips in tree are not found in dataset - output PF$tree will contain a trimmed tree
2: In PhyloFactor(as.matrix(data), tree, method = "max.var", ncores = 4,  :
  rows of data are in different order of tree tip-labels - use output$data for downstream analysis, or set Data <- Data[output$tree$tip.label,]
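
The error text indicates that `paste()` received a function where it expected a string. This can be reproduced in base R; the list below is a hypothetical stand-in for the returned object.

```r
# Hypothetical stand-in: the element holds a function instead of a function name
PF <- list(phylofactor.fcn = function(x) x)

msg <- tryCatch(
  paste("phylofactor object from function", PF$phylofactor.fcn),
  error = function(e) conditionMessage(e)
)
print(msg)

# With a character value the same call succeeds
PF$phylofactor.fcn <- "PhyloFactor"
header <- paste("phylofactor object from function", PF$phylofactor.fcn)
print(header)
```

Inspecting `class(pf$phylofactor.fcn)` on the real object would show whether this is what happens with method='max.var'.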

CLR transformed Abundances

Hi @reptalex!

Sorry to hit you with two questions at once. I have several identical taxa that are partitioned into different factors. Instead of showing multiple plots for the same taxon, I was hoping to aggregate the data so I can show it in a single plot, but I'm not sure my approach is appropriate. Essentially, I took same-taxon sOTUs that phylofactor identified as changing in the same direction and combined their CLR-transformed values into a single plot. I understand that showing how ratios change with respect to the geometric mean is fundamentally different from how phylofactor works, so I was hoping you might have some advice on how to proceed. Thanks for any help!

Getting eigenvalues for ILR ordination

Hi, thanks so much for creating PhyloFactor - it's a great package which has helped enormously with my data analysis!

I wonder if I could ask whether it is possible to retrieve the eigenvalues for ILR ordinations created using the pf.ILRprojection command, e.g., to create a scree plot or to add them to the axes of an ordination plot?

Many thanks,

David

conda recipe for phylofactor

It would be awesome if there could be a conda recipe on bioconda for phylofactor.
https://github.com/bioconda/bioconda-recipes

It'll make installation super easy.

error in pf.ILRprojection

Is this deprecated?

> pf.ILRprojection(pf)
Error in match.fun(FUN) : object 'amalg.ILR' not found

Error when defining colors for pf.tree

I assume this is also a small internal error

pf.tree(pf,colors = factor_colors)
Error in structure(list(node = node, fill = fill, alpha = alpha, extend = extend,  : 
  object 'cols' not found

Raw data input to phylofactor

Hey @reptalex

I started to experiment with your package, I really like it.
I wanted to ask whether it is necessary to pass relative abundance data (data that sums to 1) into PhyloFactor, given that you use a log transformation anyway.

As I understand from "Microbiome Datasets Are Compositional: And This Is Not Optional", a centered log-ratio transformation of the raw data would be more appropriate. What do you think?
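
As a side note on the arithmetic, log-ratios do not change when a sample is rescaled, so raw counts and relative abundances give the same centered log-ratio values. A base-R sketch with made-up counts (zeros would need to be handled before taking logs):

```r
# Centered log-ratio: log of each part minus the mean of the logs
clr <- function(x) log(x) - mean(log(x))

counts    <- c(10, 40, 50, 900)      # toy counts for one sample
rel_abund <- counts / sum(counts)    # the same sample, rescaled to sum to 1

same_clr <- isTRUE(all.equal(clr(counts), clr(rel_abund)))
print(same_clr)
```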

bigglm does not work for X~Data with family=binomial(link='logit')

WTF?!

Extract OTUs for each factor

Hey @reptalex,

I am continuing to try out phylofactor for 16S data. I built a tree by aligning the sequences for each OTU.

I ran phylofactor on the raw counts, which gives me a number of interesting factors.
I would now like to find out which OTUs belong to which factor (the smaller group, Group1?).

How can I extract which subtrees are selected by phylofactor and which OTUs are on each subtree?
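
If the returned object stores, for each factor, two integer vectors that index into the tree's tip labels, the OTUs can be looked up with plain indexing. That layout is an assumption here (check `str()` of the real object), and all names and values below are hypothetical.

```r
# Assumed layout (hypothetical): one entry per factor, each holding two
# integer vectors that index into the tree's tip labels.
tip_labels <- c("otu_a", "otu_b", "otu_c", "otu_d", "otu_e")
groups <- list(
  list(c(1L, 2L), c(3L, 4L, 5L)),   # factor 1: Group1, Group2
  list(3L, c(4L, 5L))               # factor 2: Group1, Group2
)

# Tip labels in Group1 of every factor
group1_otus <- lapply(groups, function(g) tip_labels[g[[1]]])
names(group1_otus) <- paste0("Factor", seq_along(group1_otus))
print(group1_otus)
```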

Phylofactorization with random variables

Hello,
I have an OTU count table. I want to perform phylofactorization on a two-level independent variable, Control and Sick (AIA), that I extracted from my metadata. However, my data has another structure, the age of the subject (4, 11 and 28), that I want to take into account as a random variable.
How can I set up the analysis to do that?

glm is not working

Hello
I was following the tutorial, and R keeps saying that "no glm was found".
I attached screenshots of both my tutorial results and the website results below, for the construction of the PF for factor 1. The "glm" does not appear in my results, so I cannot run any function that requires the glm.
Could you please help me with this issue?

Thanks!

My results

Web site results

phylofactor will not install using R studio on Mac

Dear reptalex/Alex:

Do you know what tools are necessary to build/install phylofactor on a Mac?

Installation of phylofactor fails no matter how I try to perform it. You can see below that the error involves a lack of tools necessary to compile the package. I am running R version 4.0.1 and have installed the latest versions of Xcode and gfortran. Calling pkgbuild to diagnose the problem just brings up some dumb website (https://www.cnet.com/how-to/install-command-line-developer-tools-in-os-x/). All other recommended packages are running.

I followed installation issue #18, whose author had a similar difficulty. Your recommendation to go to rstudio/tensorflow#133 led to a dead end.

Please suggest a way around this issue because I do not have access to any other computer, etc. to run phylofactor.

Thank you, Howard

BiocManager::install('reptalex/phylofactor')
Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.1 (2020-06-06)
Installing github package(s) 'reptalex/phylofactor'
Downloading GitHub repo reptalex/phylofactor@master
Error: Failed to install 'phylofactor' from GitHub:
Could not find tools necessary to compile a package
Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem.

devtools::install_github('reptalex/phylofactor')
Downloading GitHub repo reptalex/phylofactor@master
Error: Failed to install 'phylofactor' from GitHub:
Could not find tools necessary to compile a package
Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem.

Unequal Number of Samples

Hi @reptalex ,

I've really enjoyed using this package, but I was wondering how Phylofactor deals with uneven sampling. For example, if I had 50 samples from one community and 25 from another, should I attempt to normalize the number of samples before comparing them? Thanks!

Predictions using new data

Hi @reptalex ,

I have a new issue related to predicting with new data:

First, I created this object:

pf_PhyloFactor <- PhyloFactor(train_OTUs, filtered_tree, train_MetaData, frmla = PHENOTYPE~Data, nfactors=factors, choice='F' ,ncores=ncores, family='binomial')

After running phylofactor and generating the object "pf_PhyloFactor" (in this case, from using an OTU table named train_OTUs), I can perform predictions without problems if I use this command:

predict(pf_PhyloFactor, newdata = NULL, type = "response")

In this case, I get predictions based on the data used for generating pf_PhyloFactor (i.e. train_OTUs). However, when I try to use new data (e.g. test_OTUs) with the same structure as train_OTUs (same OTUs or Species but different samples) and I use:

predict(pf_PhyloFactor, newdata = test_OTUs, type = "response")

I have the following error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): invalid type (list) for variable 'Data'
Traceback:

1. predict(pf_PhyloFactor, newdata = test_OTUs, type = "response")
2. predict.phylofactor(pf_PhyloFactor, newdata = test_OTUs, type = "response")
3. do.call(stats::glm, args) %>% stats::predict(...)
4. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
5. eval(quote(`_fseq`(`_lhs`)), env, env)
6. eval(quote(`_fseq`(`_lhs`)), env, env)
7. `_fseq`(`_lhs`)
8. freduce(value, `_function_list`)
9. withVisible(function_list[[k]](value))
10. function_list[[k]](value)
11. stats::predict(., ...)
12. predict.glm(., ...)
13. predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == 
  .     "link", "response", type), terms = terms, na.action = na.action)
14. model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
15. model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)

I also read in the tutorial that I have to create a new column with 'Species'.

I tried this:

sp_ <- test_OTUs
sp_['Species'] <- rownames(test_OTUs)
predict(pf_PhyloFactor, newdata = sp_, type = "response")

And I got exactly the same error as before.

Do I need any additional step to perform predictions with the new data?

Thank you!

Error in h(simpleError(msg, call)

The following error occurred right after it computed all phylofactors:

Error in h(simpleError(msg, call)) :
s. Estimated time of completion: 2023-07-16 08:57:21.786461
  error in evaluating the argument 'x' in selecting a method for function 'which': 'length = 524' in coercion to 'logical(1)'
In addition: Warning message:
In phylofactor::PhyloFactor(Data, tree_ur, X, frmla = Data ~ X, :
  Data has zeros and will receive default modification of zeros. Zeros will be replaced column wise with delta*min(x[x>0]), default delta=0.65

Using function:
PF <- phylofactor::PhyloFactor(Data,tree_ur,X, frmla=Data~ X, ncores = 2, stop.early = NULL, transform.fcn=log, KS.Pthreshold = 0.01, nfactors = 13, choice = "var")

The issue started to appear after updating R and R studio. It ran fine beforehand.
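
The wording "'length = 524' in coercion to 'logical(1)'" is what R 4.3.0 and later report when `&&` or `||` is given a vector longer than one; older versions warned or silently used the first element. That would be consistent with the failure appearing after an R update, although where in the package this happens is not established here. A version-tolerant illustration:

```r
flags <- c(TRUE, FALSE)   # a length-2 logical where a single value is expected

# Depending on the R version this returns a value, warns, or errors
res <- tryCatch(
  flags && TRUE,
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(res)

# Reducing to a single logical first behaves the same in every version
reduced <- any(flags) && TRUE
print(reduced)
```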

Issue can be replicated using code from chapter 9.2.2 – 9.2.4 here, using the phyloseq object ps_ProjectX_2022July.

session info:
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ape_5.7-1 ggnewscale_0.4.9 ggtree_3.8.0 ggtreeExtra_1.10.0 colorspace_2.1-0 lubridate_1.9.2
[7] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[13] tibble_3.2.1 tidyverse_2.0.0 ggpubr_0.6.0 ggplot2_3.4.2 phyloseq_1.44.0

loaded via a namespace (and not attached):
[1] bitops_1.0-7 gridExtra_2.3 permute_0.9-7 rlang_1.1.1 magrittr_2.0.3
[6] ade4_1.7-22 compiler_4.3.0 mgcv_1.9-0 systemfonts_1.0.4 vctrs_0.6.3
[11] reshape2_1.4.4 rvest_1.0.3 pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1
[16] backports_1.4.1 XVector_0.40.0 utf8_1.2.3 rmarkdown_2.23 tzdb_0.4.0
[21] xfun_0.39 zlibbioc_1.46.0 aplot_0.1.10 GenomeInfoDb_1.36.1 jsonlite_1.8.7
[26] biomformat_1.28.0 rhdf5filters_1.12.1 Rhdf5lib_1.22.0 broom_1.0.5 parallel_4.3.0
[31] cluster_2.1.4 R6_2.5.1 stringi_1.7.12 car_3.1-2 Rcpp_1.0.11
[36] bookdown_0.34 iterators_1.0.14 knitr_1.43 IRanges_2.34.0 Matrix_1.6-0
[41] splines_4.3.0 igraph_1.5.0 timechange_0.2.0 tidyselect_1.2.0 rstudioapi_0.15.0
[46] abind_1.4-5 yaml_2.3.7 viridis_0.6.3 vegan_2.6-4 codetools_0.2-19
[51] lattice_0.21-8 plyr_1.8.8 treeio_1.24.1 Biobase_2.60.0 withr_2.5.0
[56] evaluate_0.21 gridGraphics_0.5-1 survival_3.5-5 phylofactor_0.0.1 xml2_1.3.5
[61] Biostrings_2.68.1 pillar_1.9.0 carData_3.0-5 foreach_1.5.2 stats4_4.3.0
[66] ggfun_0.1.1 generics_0.1.3 RCurl_1.98-1.12 S4Vectors_0.38.1 hms_1.1.3
[71] tidytree_0.4.4 munsell_0.5.0 scales_1.2.1 glue_1.6.2 lazyeval_0.2.2
[76] tools_4.3.0 data.table_1.14.8 webshot_0.5.5 ggsignif_0.6.4 rhdf5_2.44.0
[81] grid_4.3.0 patchwork_1.1.2 nlme_3.1-162 GenomeInfoDbData_1.2.10 cli_3.6.1
[86] kableExtra_1.3.4 fansi_1.0.4 viridisLite_0.4.2 svglite_2.1.1 gtable_0.3.3
[91] yulab.utils_0.0.6 rstatix_0.7.2 digest_0.6.33 BiocGenerics_0.46.0 ggplotify_0.1.1
[96] htmltools_0.5.5 multtest_2.56.0 lifecycle_1.0.3 httr_1.4.6 MASS_7.3-60

Error when running phylofactor with 40+ factors

When I ran an analysis with more than 40 factors, this error showed up and the analysis did not complete. When I ran it with fewer than 40 factors, I had no errors.

My dataset has ~7k samples and 6.5k OTUs

Error in unserialize(node$con) : error reading from connection
Calls: PhyloFactor ... FUN -> recvData -> recvData.SOCKnode -> unserialize

The specs of where I run the analysis are:
16 cores
8 GB of ram
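
An "error reading from connection" from a socket worker often means the worker process died, and running out of memory is one common cause. A rough back-of-the-envelope estimate for the sizes quoted above, assuming a dense double-precision matrix and one full copy per worker (both are assumptions, not confirmed behaviour of the package):

```r
n_samples <- 7000
n_otus    <- 6500
n_workers <- 16

bytes_per_copy <- n_samples * n_otus * 8      # 8 bytes per double
gb_per_copy    <- bytes_per_copy / 1e9
gb_all_workers <- gb_per_copy * n_workers

print(gb_per_copy)
print(gb_all_workers)
```

Under those assumptions the data copies alone approach the 8 GB available, before any intermediate results, so trying fewer cores would be a cheap experiment.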

pf.summary doesn't work on twoSampleFactor

I used twoSampleFactor to construct a partition and would like to see the summary per factor.

Is there a reason why this doesn't work?

> pf.summary(pf,taxonomy ,1)
Error in `[.data.frame`(taxG, , 2) : undefined columns selected


Comparing two microbiomes

Hey @reptalex,

I have another special application for phylofactor.
My goal is to compare two microbiomes from two different host organisms. I have count data for both microbiomes.

The problem is that there are almost no shared species; the two microbiomes are comparable only at the genus level.

I assume it is difficult to work with missing data in phylofactor, and filling in 0 for all the missing species would introduce a bias.

Do you have an idea how to summarize the counts at the genus level?
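
For the count table itself, one base-R option is `rowsum()`, which adds up rows that share a grouping label. The OTU names, genus labels and counts below are made up, and the tree would separately need to be reduced to one tip per genus, which this sketch does not cover.

```r
# Toy OTU table: rows are OTUs, columns are samples
counts <- matrix(c(1, 2, 3,
                   4, 5, 6),
                 nrow = 3,
                 dimnames = list(c("otu1", "otu2", "otu3"),
                                 c("sample1", "sample2")))

# Genus assignment for each row of 'counts' (hypothetical labels)
genus <- c("Bacteroides", "Bacteroides", "Prevotella")

# Sum the counts of all OTUs belonging to the same genus
genus_counts <- rowsum(counts, group = genus)
print(genus_counts)
```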

Installation issue

Thanks for this great tool. I was trying to install the package but failed on both Mac and Windows. I first installed Biostrings (v.2.50.2), ggtree (v.1.14.6), BiocManager (v.1.30.4) and devtools (v.2.0.1), and was ready to install phylofactor. But it only shows a popup saying 'building R package from source requires installation of additional build tools.' And when I click yes, there's an error message "Error: Could not find tools necessary to compile a package."
What else do I need to install?

cast taxonomies to character or warn?

pf.taxa requires the taxonomy column of taxonomies to be a character vector, but R loves to represent things as factors.

If you try and run pf.taxa with a factor, you get this error:

Error in base::endsWith(output, ";"): non-character object(s)
Traceback:

1. pf.taxa(pf_ExPlate_var, taxonomy, factor = i)
2. uniqueTaxa(t1, t2) %>% unique
3. eval(lhs, parent, parent)
4. eval(lhs, parent, parent)
5. uniqueTaxa(t1, t2)
6. base::endsWith(output, ";")

It would be nice to have a more explicit test to warn (or coerce).
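
Until such a check exists, coercing the column on the user side avoids the error. A minimal sketch with a made-up two-column taxonomy table; the column names are hypothetical.

```r
# Toy taxonomy table whose second column was read in as a factor
taxonomy <- data.frame(
  OTU      = c("otu1", "otu2"),
  taxonomy = c("k__Bacteria; p__Firmicutes;", "k__Bacteria; p__Proteobacteria"),
  stringsAsFactors = TRUE
)

# endsWith() rejects factors, which reproduces the reported message
msg <- tryCatch(endsWith(taxonomy[, 2], ";"),
                error = function(e) conditionMessage(e))
print(msg)

# Coerce to character before passing the table on
taxonomy[, 2] <- as.character(taxonomy[, 2])
ends <- endsWith(taxonomy[, 2], ";")
print(ends)
```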

Multiple Coefficients?

Hi @reptalex,

I am using Phylofactor to look at 16S balances associated with viral loads. I am using a phyloseq object to do this, and my code looks something like this:

pf <- PhyloFactor(phyloseq@otu_table, phyloseq@phy_tree, phyloseq@sam_data$ViralLoad)

When I look at the model, though, there is a coefficient for each value in the viral load metadata. For one metadata category, shouldn't there be only one coefficient for viral load? That is the case with your tutorial regarding latitude, but I can't seem to replicate that with my own data even if I try to format mine similar to how you had yours in the tutorial. Thanks for any help!
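
One common cause of a coefficient per value is a predictor stored as a factor or character rather than as a number. Whether that applies here is a guess; `class(phyloseq@sam_data$ViralLoad)` would tell. The toy regression below uses made-up values to show the difference.

```r
viral_load_chr <- c("1.5", "2.0", "3.5", "4.0", "5.5", "6.0")  # numbers stored as text
balance        <- c(0.2, 0.5, 0.9, 1.1, 1.6, 1.8)              # made-up response

# Treated as categories: an intercept plus one coefficient per extra level
fit_factor <- lm(balance ~ factor(viral_load_chr))

# Treated as a number: an intercept plus a single slope
fit_numeric <- lm(balance ~ as.numeric(viral_load_chr))

n_coef_factor  <- length(coef(fit_factor))
n_coef_numeric <- length(coef(fit_numeric))
print(c(factor = n_coef_factor, numeric = n_coef_numeric))
```

If the real column is a factor, `as.numeric(as.character(x))` recovers the values; calling `as.numeric()` directly on a factor returns the level codes instead.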

Choice between var and F

Hi Alex,
I really enjoy your package and even created a tutorial for students or anyone interested. https://chrismitbiz.github.io/ABlab-workflows/phylofactor.html#phylofactor

This is not an issue but more of a methods question. My stats background is limited, although I understand the concepts and have done a lot of regression modelling. One of the questions I sometimes get is how to decide between choice = "var" or "F". In the tutorial you state: "The two default options for regression-based phylofactorization are choice='var', which maximizes the explained variance, and choice='F', which maximizes the F-statistic from regression (the ratio of explained to unexplained variance)."

Can you help to explain in simple terms, under what circumstances it is appropriate to choose either "var" or "F"?

Thank you !
Cheers, Chris
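
The two criteria in the quoted tutorial text can rank candidates differently. The toy regression below, on two made-up responses, illustrates this: one response varies a lot and is noisy, the other varies little but follows the predictor closely. It is not phylofactor's internal computation, and the helper `fit_stats` is hypothetical.

```r
# Explained sum of squares and F-statistic from a simple regression
fit_stats <- function(y, x) {
  tab <- anova(lm(y ~ x))
  c(explained = tab[["Sum Sq"]][1], F = tab[["F value"]][1])
}

x      <- 1:10
wiggle <- rep(c(1, -1), 5)           # deterministic stand-in for noise

loud  <- 10 * x + 40 * wiggle        # large variance, noisy relationship
quiet <- 0.5 * x + 0.05 * wiggle     # small variance, tight relationship

stats <- rbind(loud = fit_stats(loud, x), quiet = fit_stats(quiet, x))
print(stats)
```

Here the "loud" response has the larger explained sum of squares, while the "quiet" response has the larger F-statistic. By analogy, maximizing explained variance favours splits that account for a lot of total variation, and maximizing F favours splits whose relationship with the predictor is most consistent relative to noise.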

How to deal with nested data?

Hi Alex,

Phylofactor is a great tool to incorporate phylogeny into the analysis of amplicon data in a CoDA manner.

I want to apply Phylofactor to identify balances that distinguish the gut microbiota of wild tilapia from that of farmed tilapia. The farmed fish were taken from different ponds and the wild fish were taken from different locations in a lake. Thus, the experiment has a nested design. Is there a way to deal with this kind of dependency structure in the metadata when running Phylofactor for the variables of interest?

Regards,
Yanxian

Display of PhyloFactor summary object fails when computed with `output.signal=FALSE`

It looks like this is due to a line that expects a 'signal' column to be present:

ERROR while rich displaying an object: Error in `[.data.frame`(s$signal.table$Group1, 1:(min(3, n1)), c("Taxon", : undefined columns selected

Traceback:
1. FUN(X[[i]], ...)
2. tryCatch(withCallingHandlers({
 .     rpr <- mime2repr[[mime]](obj)
 .     if (is.null(rpr)) 
 .         return(NULL)
 .     prepare_content(is.raw(rpr), rpr)
 . }, error = error_handler), error = outer_handler)
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. doTryCatch(return(expr), name, parentenv, handler)
6. withCallingHandlers({
 .     rpr <- mime2repr[[mime]](obj)
 .     if (is.null(rpr)) 
 .         return(NULL)
 .     prepare_content(is.raw(rpr), rpr)
 . }, error = error_handler)
7. mime2repr[[mime]](obj)
8. repr_text.default(obj)
9. paste(capture.output(print(obj)), collapse = "\n")
10. capture.output(print(obj))
11. evalVis(expr)
12. withVisible(eval(expr, pf))
13. eval(expr, pf)
14. eval(expr, pf)
15. print(obj)
16. print.phylofactor.summary(obj)
17. paste(capture.output(print.data.frame(s$signal.table$Group1[1:(min(3, 
  .     n1)), c("Taxon", "nSpecies", "signal")])), collapse = "\n")
18. capture.output(print.data.frame(s$signal.table$Group1[1:(min(3, 
  .     n1)), c("Taxon", "nSpecies", "signal")]))
19. evalVis(expr)
20. withVisible(eval(expr, pf))
21. eval(expr, pf)
22. eval(expr, pf)
23. print.data.frame(s$signal.table$Group1[1:(min(3, n1)), c("Taxon", 
  .     "nSpecies", "signal")])
24. row.names(x)
25. s$signal.table$Group1[1:(min(3, n1)), c("Taxon", "nSpecies", 
  .     "signal")]
26. `[.data.frame`(s$signal.table$Group1, 1:(min(3, n1)), c("Taxon", 
  .     "nSpecies", "signal"))
27. stop("undefined columns selected")