Code Monkey home page Code Monkey logo

microviz's Introduction

microViz

R-CMD-check codecov GitHub R package version GitHub release (including pre-releases) Docker Image Version (latest by date) r-universe microViz status JOSS article Citations Zenodo DOI

Overview

📦 microViz is an R package for analysis and visualization of microbiome sequencing data.

🔨 microViz functions are intended to be beginner-friendly but flexible.

🔬 microViz extends or complements popular microbial ecology packages, including phyloseq, vegan, & microbiome.

Learn more

📎 This website is the best place for documentation and examples: https://david-barnett.github.io/microViz/

Installation

microViz is not (yet) available from CRAN. You can install microViz from R Universe, or from GitHub.

I recommend you first install the Bioconductor dependencies using the code below.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("phyloseq", "microbiome", "ComplexHeatmap"), update = FALSE)

Installation of microViz from R Universe

install.packages(
  "microViz",
  repos = c(davidbarnett = "https://david-barnett.r-universe.dev", getOption("repos"))
)

I also recommend you install the following suggested CRAN packages.

install.packages("ggtext") # for rotated labels on ord_plot() 
install.packages("ggraph") # for taxatree_plots()
install.packages("DT") # for tax_fix_interactive()
install.packages("corncob") # for beta binomial models in tax_model()

Installation of microViz from GitHub

# Installing from GitHub requires the remotes package
install.packages("remotes")
# Windows users will also need to have RTools installed! http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/

# To install the latest version:
remotes::install_github("david-barnett/microViz")

# To install a specific "release" version of this package, e.g. an old version 
remotes::install_github("david-barnett/[email protected]") 

Installation notes

🍎 macOS users - might need to install xquartz to make the heatmaps work (to do this with homebrew, run the following command in your mac’s Terminal: brew install --cask xquartz

📦 I highly recommend using renv for managing your R package installations across multiple projects.

🐳 For Docker users an image with the main branch installed is available at: https://hub.docker.com/r/barnettdavid/microviz-rocker-verse

📅 microViz is tested to work with R version 4.* on Windows, MacOS, and Ubuntu 20. R version 3.6.* should probably work, but I don’t fully test this.

Interactive ordination exploration

library(microViz)
#> microViz version 0.12.1 - Copyright (C) 2021-2024 David Barnett
#> ! Website: https://david-barnett.github.io/microViz
#> ✔ Useful?  For citation details, run: `citation("microViz")`
#> ✖ Silence? `suppressPackageStartupMessages(library(microViz))`

microViz provides a Shiny app for an easy way to start exploring your microbiome data: all you need is a phyloseq object.

# example data from corncob package
pseq <- microViz::ibd %>%
  tax_fix() %>%
  phyloseq_validate()
ord_explore(pseq) # gif generated with microViz version 0.7.4 (plays at 1.75x speed)

Example analyses (on HITChip data)

library(phyloseq)
library(dplyr)
library(ggplot2)
# get some example data
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints or conditions
dietswap <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
# add a couple of missing values to show how microViz handles missing data
sample_data(dietswap)$african[c(3, 4)] <- NA

Looking at your data

You have quite a few samples in your phyloseq object, and would like to visualize their compositions. Perhaps these example data differ by participant nationality?

dietswap %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15, other_name = "Other",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    palette = distinct_palette(n = 15, add = "grey90"),
    merge_other = FALSE, bar_outline_colour = "darkgrey"
  ) +
  coord_flip() +
  facet_wrap("nationality", nrow = 1, scales = "free") +
  labs(x = NULL, y = NULL) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
#> Registered S3 method overwritten by 'seriation':
#>   method         from 
#>   reorder.hclust vegan

htmp <- dietswap %>%
  ps_mutate(nationality = as.character(nationality)) %>%
  tax_transform("log2", add = 1, chain = TRUE) %>%
  comp_heatmap(
    taxa = tax_top(dietswap, n = 30), grid_col = NA, name = "Log2p",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    colors = heat_palette(palette = viridis::turbo(11)),
    row_names_side = "left", row_dend_side = "right", sample_side = "bottom",
    sample_anno = sampleAnnotation(
      Nationality = anno_sample_cat(
        var = "nationality", col = c(AAM = "grey35", AFR = "grey85"),
        box_col = NA, legend_title = "Nationality", size = grid::unit(4, "mm")
      )
    )
  )

ComplexHeatmap::draw(
  object = htmp, annotation_legend_list = attr(htmp, "AnnoLegends"),
  merge_legends = TRUE
)

Example ordination plot workflow

Ordination methods can also help you to visualize if overall microbial ecosystem composition differs markedly between groups, e.g. BMI.

Here is one option as an example:

  1. Aggregate the taxa into bacterial families (for example) - use tax_agg()
  2. Transform the microbial data with the centered-log-ratio transformation - use tax_transform()
  3. Perform PCA with the clr-transformed features (equivalent to Aitchison distance PCoA) - use ord_calc()
  4. Plot the first 2 axes of this PCA ordination, colouring samples by group and adding taxon loading arrows to visualize which taxa generally differ across your samples - use ord_plot()
  5. Customise the theme of the ggplot as you like and/or add features like ellipses or annotations
# perform ordination
unconstrained_aitchison_pca <- dietswap %>%
  tax_agg("Family") %>%
  tax_transform("clr") %>%
  ord_calc()
# ord_calc will automatically infer you want a "PCA" here
# specify explicitly with method = "PCA", or you can pick another method

# create plot
pca_plot <- unconstrained_aitchison_pca %>%
  ord_plot(
    plot_taxa = 1:6, colour = "bmi_group", size = 1.5,
    tax_vec_length = 0.325,
    tax_lab_style = tax_lab_style(max_angle = 90, aspect_ratio = 1),
    auto_caption = 8
  )

# customise plot
customised_plot <- pca_plot +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group), linewidth = 0.3) + # linewidth not size, since ggplot 3.4.0
  scale_colour_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  coord_fixed(ratio = 1, clip = "off") # makes rotated labels align correctly

# show plot
customised_plot

PERMANOVA

You visualised your ordinated data in the plot above. Below you can see how to perform a PERMANOVA to test the significance of BMI’s association with overall microbial composition. This example uses the Family-level Aitchison distance to correspond with the plot above.

# calculate distances
aitchison_dists <- dietswap %>%
  tax_transform("identity", rank = "Family") %>%
  dist_calc("aitchison")

# the more permutations you request, the longer it takes
# but also the more stable and precise your p-values become
aitchison_perm <- aitchison_dists %>%
  dist_permanova(
    seed = 1234, # for set.seed to ensure reproducibility of random process
    n_processes = 1, n_perms = 99, # you should use at least 999!
    variables = "bmi_group"
  )
#> 2024-04-03 14:06:14.500859 - Starting PERMANOVA with 99 perms with 1 processes
#> 2024-04-03 14:06:14.55802 - Finished PERMANOVA

# view the permanova results
perm_get(aitchison_perm) %>% as.data.frame()
#>            Df SumOfSqs         R2        F Pr(>F)
#> bmi_group   2  109.170 0.04104336 4.686602   0.01
#> Residual  219 2550.700 0.95895664       NA     NA
#> Total     221 2659.869 1.00000000       NA     NA

# view the info stored about the distance calculation
info_get(aitchison_perm)
#> psExtra info:
#> tax_agg = "Family" tax_trans = "identity" dist_method = "aitchison"

Constrained partial ordination

You could visualise the effect of the (numeric/logical) variables in your permanova directly using the ord_plot function with constraints (and conditions).

perm2 <- aitchison_dists %>%
  dist_permanova(variables = c("weight", "african", "sex"), seed = 321)
#> Dropping samples with missings: 2
#> 2024-04-03 14:06:14.574135 - Starting PERMANOVA with 999 perms with 1 processes
#> 2024-04-03 14:06:16.223648 - Finished PERMANOVA

We’ll visualise the effect of nationality and bodyweight on sample composition, after first removing the effect of sex.

perm2 %>%
  ord_calc(constraints = c("weight", "african"), conditions = "female") %>%
  ord_plot(
    colour = "nationality", size = 2.5, alpha = 0.35,
    auto_caption = 7,
    constraint_vec_length = 1,
    constraint_vec_style = vec_constraint(1.5, colour = "grey15"),
    constraint_lab_style = constraint_lab_style(
      max_angle = 90, size = 3, aspect_ratio = 0.8, colour = "black"
    )
  ) +
  stat_ellipse(aes(colour = nationality), linewidth = 0.2) + # linewidth not size since ggplot 3.4.0
  scale_color_brewer(palette = "Set1") +
  coord_fixed(ratio = 0.8, clip = "off", xlim = c(-4, 4)) +
  theme(legend.position = c(0.9, 0.1), legend.background = element_rect())
#> 
#> Centering (mean) and scaling (sd) the constraints and/or conditions:
#>  weight
#>  african
#>  female
#> Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
#> 3.5.0.
#> ℹ Please use the `legend.position.inside` argument of `theme()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Correlation Heatmaps

microViz heatmaps are powered by ComplexHeatmap and annotated with taxa prevalence and/or abundance.

# set up the data with numerical variables and filter to top taxa
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  ) %>%
  tax_filter(
    tax_level = "Genus", min_prevalence = 1 / 10, min_sample_abundance = 1 / 10
  ) %>%
  tax_transform("identity", rank = "Genus")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.

# randomly select 30 taxa from the 50 most abundant taxa (just for an example)
set.seed(123)
taxa <- sample(tax_top(psq, n = 50), size = 30)
# actually draw the heatmap
cor_heatmap(
  data = psq, taxa = taxa,
  taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
  tax_anno = taxAnnotation(
    Prev. = anno_tax_prev(undetected = 50),
    Log2 = anno_tax_box(undetected = 50, trans = "log2", zero_replace = 1)
  )
)

Citation

😇 If you find any part of microViz useful to your work, please consider citing the JOSS article:

Barnett et al., (2021). microViz: an R package for microbiome data visualization and statistics. Journal of Open Source Software, 6(63), 3201, https://doi.org/10.21105/joss.03201

Contributing

Bug reports, questions, suggestions for new features, and other contributions are all welcome. Feel free to create a GitHub Issue or write on the Discussions page.

This project is released with a Contributor Code of Conduct and by participating in this project you agree to abide by its terms.

Session info

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Amsterdam
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.0   dplyr_1.1.4     phyloseq_1.46.0 microViz_0.12.1
#> [5] testthat_3.2.1  devtools_2.4.5  usethis_2.2.3  
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3      shape_1.4.6.1           rstudioapi_0.16.0      
#>   [4] jsonlite_1.8.8          magrittr_2.0.3          farver_2.1.1           
#>   [7] rmarkdown_2.26          GlobalOptions_0.1.2     fs_1.6.3               
#>  [10] zlibbioc_1.48.0         vctrs_0.6.5             multtest_2.58.0        
#>  [13] memoise_2.0.1           Cairo_1.6-2             RCurl_1.98-1.14        
#>  [16] htmltools_0.5.8         curl_5.2.1              Rhdf5lib_1.24.2        
#>  [19] rhdf5_2.46.1            htmlwidgets_1.6.4       plyr_1.8.9             
#>  [22] cachem_1.0.8            commonmark_1.9.1        igraph_2.0.3           
#>  [25] mime_0.12               lifecycle_1.0.4         iterators_1.0.14       
#>  [28] pkgconfig_2.0.3         Matrix_1.6-5            R6_2.5.1               
#>  [31] fastmap_1.1.1           clue_0.3-65             GenomeInfoDbData_1.2.11
#>  [34] shiny_1.8.1.1           digest_0.6.35           selectr_0.4-2          
#>  [37] colorspace_2.1-0        S4Vectors_0.40.2        ps_1.7.6               
#>  [40] pkgload_1.3.4           seriation_1.5.4         vegan_2.6-4            
#>  [43] labeling_0.4.3          fansi_1.0.6             httr_1.4.7             
#>  [46] mgcv_1.9-1              compiler_4.3.2          remotes_2.5.0          
#>  [49] doParallel_1.0.17       withr_3.0.0             viridis_0.6.5          
#>  [52] pkgbuild_1.4.4          highr_0.10              MASS_7.3-60.0.1        
#>  [55] sessioninfo_1.2.2       rjson_0.2.21            biomformat_1.30.0      
#>  [58] permute_0.9-7           tools_4.3.2             chromote_0.2.0         
#>  [61] ape_5.7-1               httpuv_1.6.15           glue_1.7.0             
#>  [64] nlme_3.1-164            rhdf5filters_1.14.1     promises_1.2.1         
#>  [67] gridtext_0.1.5          grid_4.3.2              Rtsne_0.17             
#>  [70] cluster_2.1.6           reshape2_1.4.4          ade4_1.7-22            
#>  [73] generics_0.1.3          gtable_0.3.4            microbiome_1.24.0      
#>  [76] ca_0.71.1               tidyr_1.3.1             websocket_1.4.1        
#>  [79] data.table_1.15.4       xml2_1.3.6              utf8_1.2.4             
#>  [82] XVector_0.42.0          BiocGenerics_0.48.1     markdown_1.12          
#>  [85] foreach_1.5.2           pillar_1.9.0            stringr_1.5.1          
#>  [88] later_1.3.2             circlize_0.4.16         splines_4.3.2          
#>  [91] ggtext_0.1.2            lattice_0.22-6          survival_3.5-8         
#>  [94] tidyselect_1.2.1        registry_0.5-1          ComplexHeatmap_2.18.0  
#>  [97] Biostrings_2.70.2       miniUI_0.1.1.1          knitr_1.45             
#> [100] gridExtra_2.3           IRanges_2.36.0          stats4_4.3.2           
#> [103] xfun_0.43               Biobase_2.62.0          matrixStats_1.2.0      
#> [106] brio_1.1.4              stringi_1.8.3           yaml_2.3.8             
#> [109] evaluate_0.23           codetools_0.2-20        tibble_3.2.1           
#> [112] cli_3.6.2               xtable_1.8-4            munsell_0.5.1          
#> [115] processx_3.8.4          Rcpp_1.0.12             GenomeInfoDb_1.38.7    
#> [118] png_0.1-8               parallel_4.3.2          ellipsis_0.3.2         
#> [121] profvis_0.3.8           urlchecker_1.0.1        bitops_1.0-7           
#> [124] viridisLite_0.4.2       scales_1.3.0            purrr_1.0.2            
#> [127] crayon_1.5.2            GetoptLong_1.0.5        rlang_1.1.3            
#> [130] TSP_1.2-4               rvest_1.0.4

microviz's People

Contributors

david-barnett avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

microviz's Issues

girafe_html

Hello - can't wait to get to work using this package. Thank you for offering it!

I'm able to initiate a new browser window (in firefox or safari) but when the new window comes up, clicking the edit menu produces no options. My meta data are available in drop down boxes, but there are no images in the view window.

When I stop running the command in Rstudio it provides an error with traceback as follows:

Listening on http://127...
object 'girafe_html' not found

  1. | execCallbacks(timeoutSecs, all, loop$id)

  2. | run_now(timeoutMs/1000, all = FALSE)

  3. | service(timeout)

  4. | serviceApp()

  5. | ..stacktracefloor..(serviceApp())

  6. | withCallingHandlers(expr, error = doCaptureStack)

  7. | domain$wrapSync(expr)

  8. | promises::with_promise_domain(createStackTracePromiseDomain(), expr)

  9. | captureStackTraces({ while (!.globals$stopped) { ..stacktracefloor..(serviceApp()) } ...

  10. | ..stacktraceoff..(captureStackTraces({ while (!.globals$stopped) { ..stacktracefloor..(serviceApp()) } ...

  11. | runApp(x)

  12. | print.shiny.appobj(x)

  13. | (function (x, ...) UseMethod("print"))(x)

Thank you for any insights you may be able to provide!

Andrew Gustin

Error in (function (edges, n = max(edges), directed = TRUE) : At type_indexededgelist.c:117 : cannot create empty graph with negative number of vertices, Invalid value

Hi,

I encountered this error when running taxatree_plots() function on my lm model. I checked around and it seems that the it is looking for numeric edge format but being provided with other format. But I'm not sure where to adjust my edge format to numeric? And I didn't encounter this error when I run this function in other datasets.

Thanks!
Leran

Error in if (!all(x >= 0)) stop("Negative distances not supported!") :

I am getting this issue am I missing something?

> clean_Bacteria %>% 
+   tax_fix() %>% 
+   #ps_arrange("Treatments") %>% 
+   tax_agg("Class") %>% 
+   cor_heatmap(vars = c("p_h", "om", "t_o_c", "c_n_ratio", "nitrate_n_ppm", "clay"),
+               cor = "spearman")


Error in if (!all(x >= 0)) stop("Negative distances not supported!") : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In stats::cor(x = meta_mat, y = otu_mat, use = cor_use, method = cor) :
  the standard deviation is zero

dist_calc() Error

Hi,

I was trying to run below command:

ps0.UC.CD.HHC.fix %>% 
  tax_transform(trans = "clr", rank = "Genus") %>% 
  dist_calc("wunifrac")

And I got this error message:

unifrac distances require un-aggregated taxa and a phylogenetic tree.
 Show Traceback
Error in access(physeq, "phy_tree", errorIfNULL) : phy_tree slot is empty.

I'm sure I have a tree in the phyloseq object:

ps0.UC.CD.HHC.fix
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 5284 taxa and 518 samples ]
sample_data() Sample Data:       [ 518 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 5284 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 5284 tips and 5282 internal nodes ]

Could you let me know how to fix this issue?

Thanks so much!
Leran

ps_dedupe removes all rows when using multiple variables

Hi David,

Thank you for sharing this package, it has a lot of really helpful functions for working with phyloseq objects! It's made my life a lot easier in the past couple of weeks.

I've come across an issue while using ps_dedupe to prune my dataset. When I dedup by Lab ID (single variable), it seems to work as expected, leaving me with one entry per Lab ID. I then try to dedup by multiple variables (Individual ID and date, i.e. vars = c("Individual.ID", "year", "month", "day")), and it seems to be removing ALL rows from groups with >1 row, rather than leaving me just one row per Individual/date. I'm not sure if this is user error or something in the code, but I wondered if anyone else has had the same issue in the past.

#dedup LabID
ps.sp <- ps_dedupe(ps.sp, "LabID", method = "readcount")
nsamples(ps.sp)
test1 <- sample_data(ps.sp)

#dedup by Ind and date
ps.sp2 <- ps_dedupe(ps.sp, vars = c("Individual.ID", "year", "month", "day"), method = "readcount")
nsamples(ps.sp2)
test2 <- sample_data(ps.sp2)

I can send an .Rdata file with ps.sp separately, in case reproducing the issue will help. Any thoughts on what might be happening would be very appreciated!

Thank you,
Amy

Hierarchical clustering in relative abundances plots

Hello David,

In the correlation heatmaps from the microViz package, hierarchical clustering using euclidean distances is shown on top/next the heatmap.
Is it also possible to include the hierarchical clustering in the relative abundances plots? And would it be possible to get the clusters with the use of the Bray-Curtis dissimilarity (which is already used for the order of the relative abundances plot)?

Kind regards,
Brigitte

Statement of need

Hi David,

Upon reading through your statement of need, I had some questions. I went by each statement to examine the novelty added by your tool, and maybe you can comment on these:

  • Statement of need: microViz extends the set of distance measures (including e.g. Generalized UniFrac and Aitchison) of phyloseq and microbiome
    • Question: phyloseq has 44 supported distance methods 
including UniFrac, what is provided that is not already included, and what additional uses do those distance measures provide?
  • Statement of need: microViz makes use of further ordination calculation and visualization methods more accessible, such as (distance-based) redundancy analysis (RDA) and canonical correspondence analysis  (CCA)
    • Question: the microbiome R package has options for RDA and CCA
, what additional functionality does microViz provide?
  • Statement of need: microViz generates bi-plots and tri-plots with the ggplot2 R package
    • Question: the other standard microbiome packages generate bi-plots, what additional functionality does microViz provide?

Fast Unifrac

Hey,

(great package and awesome tutorial!)

It would be great to have a fast version of calculating unifrac distances...

How to understand the "grouping" parameter in taxatree_stats_p_adjust() function?

Hi,

I have a questions regarding to the "grouping" parameter in taxatree_stats_p_adjust() function. I can find there is explanation in tutorial page:

Define how to group the p values for adjustment with the grouping argument. The default is to adjust the p values in groups at each taxonomic rank, but you could also adjust per "model" / "taxon" or per "term". Or even group by a combination of rank and term with c("rank", "term")

But still feel confused. Does "groups at each taxonomic rank" mean it adjusts p values based on the number of unique taxa in phylum level (for example)? And does "group by taxon" mean that the calculation is based on the number of entire taxonomy?

And is there an instruction on how to choose grouping parameter?

Thanks so much!
Leran

Sorting taxa by abundance per sample in comp_barplot

Hi!

First of all, thank you for creating and sharing this great package, it makes plot generation a lot more easy and convenient.

I have a question regarding the comp_barplot function, more specifically the tax_order argument. Is there any way to sort the lets say 10 taxa by their abundance in each sample, by example, the most abundant taxa would be at the bottom, second most abundant taxa above that and so forth, with optionally the "Other" taxa being at the top, regardless of its abundance.
Unfortunately i could not achieve this with the suggested sorting methods.

Sorry if i have missed or misunderstood something in the documentation.

heat_grid default lwd of 0.5 not respected on png device

?png indicates that lwd parameter of less than 1 will be treated as 1, and that 1 pixel is also a minimum width.
Large heatmaps (many cells) look better with lwd < 1, so this is an annoying issue.

png device is default for R markdown html output, on Windows at least, it seems.
The solution is to use another device, such as CairoPNG or ragg_png.

It seems the default device cannot be changed for rmd notebook inline chunk output.

I should note solutions/workarounds somewhere in the comp_heatmap documentation.

  • knitr::opts_chunk$set(dev = "ragg_png") is a good fix for rmd documents.
  • The default device for rstudio can also be changed in global options

Cannot install from github, gives timeout error.

Following the installation instructions I typed the command:

> devtools::install_github("david-barnett/[email protected]")

Which gives the output:

Error: Failed to install 'unknown package' from GitHub:
  Timeout was reached: [api.github.com] Resolving timed out after 10000 milliseconds

I already tried with libcurl and wininet methods, but doesnt seem to work.

could not find function "sampleAnnotation" in v0.90

Hello,
Thanks for writing such a fantastic package. Gone are the days of manually extracting vegan objects and forcing them into ggplot!!

I'm following the heatmaps vignette and having this issue in v0.90. Is this a bug or user error (always more likely)?

Code:

psf %>%

glom to Order

tax_glom("Order") %>%
tax_transform("compositional") %>%
comp_heatmap(colors = heat_palette(palette = "Blue-Red"),
sample_anno = sampleAnnotation(
State1 = anno_sample("metformin"),
col = list(State1 = cols), border = FALSE)
)

sessionInfo:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] RColorBrewer_1.1-2 ggvegan_0.1-0 ggpubr_0.4.0
[4] rstatix_0.7.0 dada2_1.20.0 Rcpp_1.0.8
[7] UpSetR_1.4.0 ComplexHeatmap_2.10.0 shiny_1.7.1
[10] microViz_0.9.0 phyloseq_1.38.0 forcats_0.5.1
[13] stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
[16] readr_2.1.2 tidyr_1.2.0 tibble_3.1.6
[19] ggplot2_3.3.5 tidyverse_1.3.1 vegan_2.5-7
[22] lattice_0.20-45 permute_0.9-7

loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.1
[3] htmlwidgets_1.5.4 TSP_1.1-11
[5] BiocParallel_1.26.2 Rtsne_0.15
[7] devtools_2.4.3 munsell_0.5.0
[9] codetools_0.2-18 DT_0.20
[11] withr_2.4.3 colorspace_2.0-2
[13] Biobase_2.54.0 rstudioapi_0.13
[15] stats4_4.1.2 ggsignif_0.6.3
[17] MatrixGenerics_1.4.3 labeling_0.4.2
[19] GenomeInfoDbData_1.2.7 hwriter_1.3.2
[21] farver_2.1.0 rhdf5_2.38.0
[23] rprojroot_2.0.2 vctrs_0.3.8
[25] generics_0.1.2 R6_2.5.1
[27] doParallel_1.0.17 GenomeInfoDb_1.30.1
[29] clue_0.3-60 seriation_1.3.1
[31] isoband_0.2.5 bitops_1.0-7
[33] rhdf5filters_1.6.0 microbiome_1.16.0
[35] cachem_1.0.6 DelayedArray_0.18.0
[37] assertthat_0.2.1 promises_1.2.0.1
[39] scales_1.1.1 gtable_0.3.0
[41] Cairo_1.5-14 processx_3.5.2
[43] rlang_1.0.1 GlobalOptions_0.1.2
[45] splines_4.1.2 broom_0.7.12
[47] yaml_2.2.2 reshape2_1.4.4
[49] abind_1.4-5 modelr_0.1.8
[51] crosstalk_1.2.0 backports_1.4.1
[53] httpuv_1.6.5 tools_4.1.2
[55] usethis_2.1.5 ellipsis_0.3.2
[57] jquerylib_0.1.4 biomformat_1.22.0
[59] BiocGenerics_0.40.0 sessioninfo_1.2.2
[61] plyr_1.8.6 zlibbioc_1.40.0
[63] RCurl_1.98-1.6 ps_1.6.0
[65] prettyunits_1.1.1 GetoptLong_1.0.5
[67] S4Vectors_0.32.3 SummarizedExperiment_1.22.0
[69] haven_2.4.3 ggrepel_0.9.1
[71] cluster_2.1.2 fs_1.5.2
[73] magrittr_2.0.2 data.table_1.14.2
[75] circlize_0.4.14 reprex_2.0.1
[77] matrixStats_0.61.0 pkgload_1.2.4
[79] hms_1.1.1 mime_0.12
[81] xtable_1.8-4 jpeg_0.1-9
[83] readxl_1.3.1 IRanges_2.28.0
[85] gridExtra_2.3 shape_1.4.6
[87] testthat_3.1.2 compiler_4.1.2
[89] crayon_1.4.2 htmltools_0.5.2
[91] mgcv_1.8-38 later_1.3.0
[93] tzdb_0.2.0 RcppParallel_5.1.5
[95] lubridate_1.8.0 DBI_1.1.2
[97] dbplyr_2.1.1 MASS_7.3-55
[99] ShortRead_1.50.0 Matrix_1.4-0
[101] ade4_1.7-18 car_3.0-12
[103] brio_1.1.3 cli_3.1.1
[105] parallel_4.1.2 igraph_1.2.11
[107] GenomicRanges_1.44.0 pkgconfig_2.0.3
[109] GenomicAlignments_1.28.0 registry_0.5-1
[111] xml2_1.3.3 foreach_1.5.2
[113] bslib_0.3.1 multtest_2.50.0
[115] XVector_0.34.0 rvest_1.0.2
[117] callr_3.7.0 digest_0.6.29
[119] Biostrings_2.62.0 cellranger_1.1.0
[121] curl_4.3.2 Rsamtools_2.8.0
[123] rjson_0.2.21 lifecycle_1.0.1
[125] nlme_3.1-155 jsonlite_1.7.3
[127] Rhdf5lib_1.16.0 carData_3.0-5
[129] desc_1.4.0 fansi_1.0.2
[131] pillar_1.7.0 fastmap_1.1.0
[133] httr_1.4.2 pkgbuild_1.3.1
[135] survival_3.2-13 glue_1.6.1
[137] remotes_2.4.2 png_0.1-7
[139] iterators_1.0.14 stringi_1.7.6
[141] sass_0.4.0 latticeExtra_0.6-29
[143] memoise_2.0.1 ape_5.6-1

comp_barplot groups in "other" category

  • works when bar outlines drawn only
  • not useful in dense iris plots?
  • another way to indicate diversity?
    • internal diversity heatmap ring could work for iris plot?
    • and just counting observed taxa wouldn't need helper funs

ord_get() extract Eigenvalues for all the samples

Hi,

I'm trying to extract eigenvalues for every sample with ord_get() function, so that I can use PC1 values of each sample to build a lm(diagnosis ~ PC1) model to get a p value for the PCA plot. But when I run ord_get() on ps_extra object, I only get:

Call: rda(formula = OTU ~ 1, data = data)

              Inertia Rank
Total           309.8     
Unconstrained   309.8  116
Inertia is variance 

Eigenvalues for unconstrained axes:
  PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8 
50.56 26.64 20.40 14.73 11.55 10.95  9.00  8.43 
**(Showing 8 of 116 unconstrained eigenvalues)**

So I wondered how can I get PC values of other 115 samples? Or do you have any suggestion on how to get p value for PCA plot?

Thanks!
Leran

labeling issue with comp_barplot() and facet_grid()

I am having trouble getting proper labeling of my x axis when making a bar plot. I am faceting by timepoint, and since I have some subjects who do not have samples for every timepoint, I want there to be a blank space if there is a missing sample, so that you can still read across the faceted plots and easily follow each subject over time.

It seems to me like facet_grid() is the only function that will get me what I need in terms of having blank spaces for empty samples. (facet_wrap(), facet_nested(), and the built in "facet_by=" in comp_barplot() did not work).

However, if I use facet_grid() and specify aes(x=Subject), the subject numbers are not appearing on the axis. If I comment out the aes() and instead use "label=" in comp_barplot(), this results in a hugely messy graph because it repeats each subject on the axis.

Not sure I explained that well. Here is my code, and an example of the plot using aes() specifications and using "label="
specifications.

genusplot <- phygenus_plot %>%
comp_barplot(
tax_level = "Genus",
n_taxa = 24,
label = "Subject",
sample_order = samples
) +
geom_bar(stat = "identity")+
coord_flip() +
facet_grid(~Timepoint, scales = "free") +
#aes(x = Subject) +
guides(fill = guide_legend(reverse = TRUE, keywidth = 0.4, keyheight = 0.4, nrow=5)) +
labs(y = "Relative Abundance", fill = "Genus") +
theme_classic() +
theme(plot.title = element_text(hjust=0.5, size = 15),
axis.text.x = element_text(size = 9, hjust = 0.5, colour=1),
axis.text.y = element_text(size = 9, colour=1),
axis.title.x = element_text(size=13),
axis.title.y = element_text(size=13),
legend.text = element_text(size = 8),
legend.key.size = unit(0.4, "cm"),
legend.key.width = unit(0.4,"cm"),
legend.position = "bottom",
strip.text.x = element_text(size=10),
strip.text.y = element_text(size=10),
panel.border = element_rect(colour = "black", fill=NA),
axis.text = element_text(colour = 1, size = 9)
)
genusplot

image
image

Heatmap error: heat_palette() palette argument must be either a: vector of colours, or....

Hi,

I'm trying the example heatmap code on my study,

cols <- distinct_palette(n = 3, add = NA)
names(cols) <- unique(samdat_tbl(psq)$DiseaseState)

psq %>% 
  tax_transform("compositional", rank = "Class") %>% 
  comp_heatmap(
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    sample_anno = sampleAnnotation(
      State1 = anno_sample("DiseaseState"),
      col = list(State1 = cols), border = FALSE,
      State2 = anno_sample_cat("DiseaseState", col = cols)
    )
  ) 

But I kept getting error says:

Error: 
heat_palette() palette argument must be either a:
- vector of colours, or
- palette name from colorspace::hcl_palettes(type = 'diverging')
- palette name from colorspace::divergingx_palettes()
- palette name from colorspace::hcl_palettes(type = 'sequential')

After I added
colors = heat_palette(sym = TRUE) after the "sampleAnnotation()" part in comp_heatmap(), the error is gone.

Is this line necessary here or did I misunderstood anything?

Thanks!
Leran

ord_explore shape errors

part of JOSS review issue #25

Shape gives cryptic error (from ggiraph) when set to some variables:

  • ggiraph interactive plots cannot handle composite shapes
  • solution is to manually specify palette without composite shapes
    • to include shapes of only one type this is limited to 5 shapes: circle filled, triangle filled, square filled, diamond filled, triangle down filled
    • mapping NA to: circle open (instead of blank)
    • identify variables with only max 5 unique values (and possibly NA)

Installation from official source

Congratulations on this cool package. The only thing I could wish for is to have it installable from an official repository.
Is this planned?

I managed to build it but I want to make sure that others can also build it easily.

Alternatively can I somehow the complied binary share with my collegues?

easy way to generate fixed palette for taxa

  • useful for comp_barplot (and ord_plot_iris)
  • currently too tricky for a user to manually specify a palette, but being able to generate a template vector for further modification should help a lot
  • should be easy enough with tax_top and distinct_palette
  • could be called tax_palette?

Errors making correlation heatmaps

Hi David,

I want to make correlation heatplots for my thesis about the sample data (age, gender, BMI, ect) and microbiome data.
First i made the data numeric (object ps.minzeros.compositional.num).
I tried to make the correlation heatmap according to your tutorial, but I got several errors.

This code:
cor_heatmap(
data = ps.minzeros.compositional.num,
taxa = tax_top(ps.minzeros.compositional.num, 15, by = max, rank = "Genus"),
vars=c('female', 'older', 'highWeight', 'obesitas', 'highRatio', 'highBodyFat', 'highTG', 'highScore'),
cor='spearman')

gives this error:
Error in as(x, "matrix")[i, j, drop = FALSE] : subscript out of bounds
In addition: Warning message:
In ps_counts(data, warn = TRUE) : otu_table of counts is NOT available!
Available otu_table contains non-zero values that are less than 1

And another attempt:
ps.minzeros.compositional.num %>%
tax_agg("Genus") %>%
cor_heatmap(vars = c('female', 'older', 'highWeight', 'obesitas', 'highRatio', 'highBodyFat', 'highTG', 'highScore'))

gives this error:
Registered S3 method overwritten by 'seriation':
method from
reorder.hclust vegan
Error in if (!all(x >= 0)) stop("Negative distances not supported!") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In stats::cor(x = meta_mat, y = otu_mat, use = cor_use, method = cor) :
the standard deviation is zero

Can you help me? I cannot share my data because that is confidental, so I hope you have enough to help me.

Kind regards, Brigitte

p values in correlation heatmaps

Hello David,

I want to ask another question; how can I show the p-values of the spearman correlation in the correlation heatmap?

Mixed models using `taxatree_models()`

Hi @david-barnett,

Thanks again for your effort in developping the package.
How would you recommend to run mixed models thourgh taxatree_models?

Following your tutorial, I am not able specify type = lmer

lm_models <- phylo %>% 
  tax_fix() %>% 
  tax_prepend_ranks() %>% 
  # it makes sense to perform the compositional transformation BEFORE filtering
  tax_transform("compositional", rank = "Genus", keep_counts = TRUE) %>% 
  tax_filter(min_prevalence = 0.1, undetected = 0, use_counts = TRUE) %>% 
  tax_transform(
    trans = "log2", chain = TRUE, zero_replace = "halfmin"
  ) %>% 
  taxatree_models(
    type = lmer, 
    ranks = NULL, # uses every rank available except the first
    # variables = c("UC", "female", "antibiotics", "steroids", "age_scaled")
    formula = ~ UC    + (1| female)
    
  )
Proportional min_prevalence given: 0.1 --> min 7/67 samples.
2022-06-02 10:33:27 - modelling at rank: Phylum
Error in `[[<-`(`*tmp*`, "call", value = f) : 
  [[<- defined for objects of type "S4" only for subclasses of environment

Using the R random effect syntax and type = lm does not generate any issue.

lm_models <- phylo %>% 
  tax_fix() %>% 
  tax_prepend_ranks() %>% 
  # it makes sense to perform the compositional transformation BEFORE filtering
  tax_transform("compositional", rank = "Genus", keep_counts = TRUE) %>% 
  tax_filter(min_prevalence = 0.1, undetected = 0, use_counts = TRUE) %>% 
  tax_transform(
    trans = "log2", chain = TRUE, zero_replace = "halfmin"
  ) %>% 
  taxatree_models(
    type = lm, 
    ranks = NULL, # uses every rank available except the first
    # variables = c("UC", "female", "antibiotics", "steroids", "age_scaled")
    formula = ~ UC    + (1| female)
  )

However, the random effect is not evaluated - which is not surprising.

lm_models %>% 
  taxatree_models2stats() %>% 
  .["taxatree_stats"]


$taxatree_stats
# A tibble: 268 × 7
   term           taxon            rank  estimate std.error statistic  p.value
   <fct>          <chr>            <fct>    <dbl>     <dbl>     <dbl>    <dbl>
 1 UC             P: Firmicutes    Phyl…   -47.4     10.5       -4.50  2.84e-5
 2 1 | femaleTRUE P: Firmicutes    Phyl…    NA       NA         NA    NA      
 3 UC             P: Bacteroidetes Phyl…   -16.7      2.92      -5.72  2.97e-7
 4 1 | femaleTRUE P: Bacteroidetes Phyl…    NA       NA         NA    NA      

Thanks.

Feature request - order taxa in comp_barplot alphabetically

Hello again -
The default behavior of a ggplot2 barplot on melted OTU table is to order the taxa in the legend alphabetically, which (IMO) makes this much easier to read. Can this be added as an option in the comp_barplot, or is it possible to add as a ggplot2 layer? Thanks!

Example of ordered plot in ggplot2 with alphabetically ordered taxa in the legend-
image

Example of default comp_barplot -
image

comp_barplot() option not to scale to 1

Hi David,

Thanks for the awesome package. I was wondering if we could skip the scaling to barplot to 1 using comp_barplot(). The rationale behind would be to be able to plot the proportion of selected ASV/taxa and plot their actual proportions.

I could not find the options from the actual arguments.

Thanks.

tax_info convenience function?

Plan:

tax_info(ps, tax_names = NA, tax_select = NA, tibble = NA, ranks = FALSE, undetected = 0)

provide either

  • tax_names, 1 or vector
  • tax_select, 1 or vector for tax_select regex interface (default of 1 typo probably fine)

returns:

  • (some) ranks
  • prev (using prev_calc, respecting undetected arg)
  • min, mean, max etc.?

return format (for tibble = NA):

  • if one taxon, cat (unless tibble = TRUE)
  • if multiple, return tbl, unless tibble = FALSE

RShiny app problems

When testing out the RShiny app functionality with the corncob dataset, few of the Selections worked with the app or didn't change the plot. For example, for 'shape', the race option (in addition to many others) generated the error: 'Error: ids don't have the same length than str (most often, it occurs because of clipping)'. Also, the lasso it doesn't always allow accurate selection - the data points I encircle are not the only points selected. Not sure why some of the time it works and other times it doesn't.

Beta-binomial regression example error: Error in (function (formula, phi.formula, data, link = "logit", phi.link = "logit", : Model could not be optimized! Try changing initializations or simplifying your model.

Hi,

When I try this example code:

ps0 %>% 
  tax_fix() %>% 
  tax_prepend_ranks() %>% 
  tax_filter(min_prevalence = 0) %>% 
  taxatree_models(
    type = corncob::bbdml, 
    ranks = c("Phylum", "Class", "Order", "Family"),
    variables = c("Treatment","Sample_Site","Time")
  
  )

I got this error message:

Error in (function (formula, phi.formula, data, link = "logit", phi.link = "logit",  : 
  Model could not be optimized! Try changing initializations or simplifying your model.

I'm not sure what does this error mean?

Thanks!
Leran

tax_top function

like top_taxa but with options for how "top" is defined (e.g. prevalence/total/max/median)

  • useful in comp_barplot?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.