
miaviz's Introduction

miaViz


Microbiome Analysis Plotting and Visualization

The scope of this package is the plotting and visualization of microbiome data. The main class for interfacing is the TreeSummarizedExperiment class.

Using the package

Online tutorials and examples are available at:

Contribution

Feel free to contribute by forking and opening a pull request. Please make sure that any required data wrangling is designed to be as reusable as possible; such code may find a better home in the mia package.

Additionally, please make sure that working examples are included and that vignettes make use of added functions in either miaViz or the TreeSummarizedExperiment package.

Technical aspects

Let's use a git-flow-like approach: development should be done against the master branch, which is then merged into the release branch for releases. (https://guides.github.com/introduction/flow/)

Installation

Bioc-release

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("miaViz")

Bioc-devel

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("miaViz")

Code of conduct

Please note that the miaViz project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

miaviz's People

Contributors

antagomir, bananacancer, chouaibb, daenarys8, felixernst, himmil, jwokaty, microsud, nturaga, thpralas, tuomasborman, vivian-ginika


miaviz's Issues

neatsort & neatmap

Convert neatsort.R and neat.R from phyloseq/microbiome to miaverse.

These are related to visualization, so miaViz is a good starting point. We can consider afterwards whether they fit better in mia or another package, or whether to keep them in miaViz.

The idea is explained in Rajaram & Oono (2010), and I have found these quite handy in practice. The sorted matrices can then be visualized with ordinary heatmap techniques (with no additional row/col sorting!), and an example of this should be added as well.

Add citation to the original work, and check if there are recent works that have applied or customized this for microbiome data.

It may be good to first cross-check that no SE wrappers for this are already available.
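A minimal base-R sketch of the idea, assuming a simplified NeatMap-style ordering. Rows, and columns via the transpose, are projected to two dimensions with classical MDS (cmdscale). They are then sorted by their polar angle in that projection. The helper neat_order() is hypothetical and is not the microbiome or miaViz implementation. The data are simulated.

```r
# Hypothetical helper (not the microbiome/miaViz code): order the rows of a
# matrix by their polar angle in a 2D classical MDS projection
neat_order <- function(mat, method = "euclidean") {
  coords <- cmdscale(dist(mat, method = method), k = 2)
  theta <- atan2(coords[, 2], coords[, 1])
  order(theta)
}

set.seed(1)
abund <- matrix(rpois(60, lambda = 20), nrow = 10,
                dimnames = list(paste0("taxon", 1:10), paste0("sample", 1:6)))
row_ord <- neat_order(abund)      # taxa order
col_ord <- neat_order(t(abund))   # sample order
sorted <- abund[row_ord, col_ord]

# Rowv = NA and Colv = NA: no extra clustering, so the neat order is kept
heatmap(sorted, Rowv = NA, Colv = NA, scale = "none")
```

Disabling the dendrograms in heatmap() matches the use case above: the sorting is done once, and the heatmap only displays it.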

Custom sorting to plotAbundanceDensity

The plotAbundanceDensity function works fine but I could not find a way to enforce a custom ordering for the taxa in this plot.

Example plot:

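library(mia)
library(miaViz)
data(GlobalPatterns, package = "mia")
tse <- GlobalPatterns
# GlobalPatterns ships with counts only; add the relabundance assay first
tse <- relAbundanceCounts(tse)
plotAbundanceDensity(tse, abund_values = "relabundance")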

Task: change the order of the taxa in a custom way.

Proposed solution: add a new function argument, e.g. orderTaxaBy, that accepts either a custom vector matching rownames(tse) or the name of a rowData column.
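One possible shape for such an argument is sketched below in base R. It assumes that the plotting data are in long format, with one row per feature/sample pair. Because ggplot2 draws discrete axes in factor-level order, a custom ordering reduces to setting the levels. order_features() is a hypothetical helper. Features that the user does not name are appended in their original order.

```r
# Hypothetical helper: turn a feature column into a factor whose levels follow
# a user-supplied order (names absent from the data are ignored)
order_features <- function(features, custom_order) {
  lv <- custom_order[custom_order %in% features]
  # append features the user did not list, in their original order
  lv <- c(lv, setdiff(unique(features), lv))
  factor(features, levels = lv)
}

long <- data.frame(
  Feature = rep(c("Bacteroides", "Prevotella", "Faecalibacterium"), each = 2),
  Value = c(0.30, 0.25, 0.10, 0.05, 0.12, 0.15)
)
long$Feature <- order_features(long$Feature, c("Prevotella", "Faecalibacterium"))
levels(long$Feature)
```

With a rowData column, custom_order could be derived from that column, for example by taking the rownames ordered by its values.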

plotAbundanceDensity options

Can we consider a case where plotAbundanceDensity could be used to visualize a single feature (e.g. Prevotella) as a function of an external covariate, such as BMI group (discrete) or BMI (continuous)? Or perhaps this requires its own function, or is too simple to implement separately.

As in

library(miaTime)
library(ggplot2)
data(hitchip1006, package="miaTime")
tse <- hitchip1006
df <- as.data.frame(colData(tse))
df$Prevotella <- assay(tse)["Prevotella oralis et rel.",]
ggplot(df, aes(x = bmi_group, y = Prevotella)) + geom_jitter(width=0.2)

Boxplots significance

For examples with boxplots showing significance, the ggsignif package now supports multiple testing correction:
const-ae/ggsignif#28 (comment)

It would be good to integrate this into miaViz so that users can easily draw standard comparisons with corrected p-values, such as alpha diversity or taxon abundance comparisons between groups.
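The correction itself is already available in base R. Below is a sketch on simulated data. It assumes alpha diversity values in three groups, stored in a hypothetical shannon column. It runs pairwise Wilcoxon tests and adjusts the p-values with Benjamini-Hochberg. A plotting wrapper could then pass those p-values on to significance annotations. The uncorrected run is included only for comparison.

```r
set.seed(42)
# Simulated alpha diversity for three groups of ten samples
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  shannon = c(rnorm(10, 3.0), rnorm(10, 3.5), rnorm(10, 3.1))
)
# Pairwise tests with Benjamini-Hochberg adjusted p-values
res <- pairwise.wilcox.test(df$shannon, df$group, p.adjust.method = "BH")
# Same tests without correction, for comparison
raw <- pairwise.wilcox.test(df$shannon, df$group, p.adjust.method = "none")
res$p.value
```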

Add sample names to Abundance Plot

Hi! I would like to ask whether it is possible to add sample names or sample group names to abundance plots generated by the plotAbundance() function. I am using the order_sample_by option to order the samples, but I do not see any option to show the sample names or sample group names on the x axis of the plot. Must this also be done using ggplot options?

Thanks in advance,

Asier

New function: plot_density ?

Can we add a function for quick visualization of abundance profiles as a density plot?

The microbiome pkg equivalent.

library(microbiome)
data(dietswap)
pseq.rel <- transform(dietswap, 'compositional')
library(ggplot2)
plot_density(pseq.rel, variable='Dialister') + scale_x_log10()

And after the basic functionality, it would be possible to add coloring schemes like in this example:

data(atlas1006)
pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r')
pseq <- transform(pseq, 'compositional')
hotplot(pseq, 'Dialister', tipping.point=1.1, log10=TRUE)
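As a rough picture of the basic functionality, here is a base-R sketch on simulated counts. It applies a compositional transform per sample and then estimates the density of one taxon's relative abundance on a log10 scale. The taxon names and data are placeholders, and this is not the microbiome implementation.

```r
set.seed(1)
# Simulated counts: 3 taxa x 100 samples
counts <- matrix(rpois(300, lambda = 20), nrow = 3,
                 dimnames = list(c("Dialister", "Prevotella", "Bacteroides"), NULL))
# Compositional (relative abundance) transform per sample
rel <- sweep(counts, 2, colSums(counts), "/")
# log10 needs strictly positive values; real data with zeros need a pseudocount
d <- density(log10(rel["Dialister", ]))
plot(d, xlab = "log10 relative abundance", main = "Dialister")
```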

sysreqs error

Error in GHA


Run Rscript -e "remotes::install_github('r-hub/sysreqs')"

   File /usr/local/lib/R/etc//Renviron.site contains invalid line(s)
      <html>
      <head><title>301 Moved Permanently</title></head>
      <body>
      <center><h1>301 Moved Permanently</h1></center>
      <hr><center>CloudFront</center>
      </body>
      </html>
   They were ignored

Using github PAT from envvar GITHUB_PAT
Skipping install of 'sysreqs' from a github remote, the SHA1 (f068afa9) has not changed since last install.
  Use `force = TRUE` to force installation
/bin/bash: eval: line 12: syntax error near unexpected token `('
/bin/bash: eval: line 12: `   File /usr/local/lib/R/etc//Renviron.site contains invalid line(s)      <html>      <head><title>301 Moved Permanently</title></head>      <body>      <center><h1>301 Moved Permanently</h1></center>      <hr><center>CloudFront</center>      </body>      </html>   They were ignoredexport DEBIAN_FRONTEND=noninteractive; apt-get -y update && apt-get install -y pandoc make libxml2-dev libicu-dev libgmp-dev libglpk-dev'
Error: Process completed with exit code 2.

plotAbundance returns NULL with order_sample_by

The following returns NULL. The plotting does work without the order_sample_by argument.

library(mia)
library(miaViz)
data(GlobalPatterns)
tse <- GlobalPatterns
tse <- relAbundanceCounts(tse)
p <- plotAbundance(tse,
       abund_values="relabundance",
       rank = "Phylum",
       order_sample_by = "SampleType") 
print(p)

Explained Variance

In several ordination (or dimensionality reduction) methods, the explained variance needs to be calculated and reported.
A TreeSE object stores the reduced-dimensional data along with data related to the ordination method itself (e.g. eigenvalues for PCoA or PCA).
Would a method for calculating the explained variance of the first k dimensions be a useful addition to the miaViz package?
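The sketch below shows what such a function could compute, assuming the eigenvalues are available as a numeric vector. The explained variance of dimension i is its eigenvalue divided by the sum of the eigenvalues. For PCA the eigenvalues are the squared standard deviations, prcomp()$sdev^2. For PCoA, cmdscale(eig = TRUE) returns them. Negative eigenvalues, which can occur with non-Euclidean distances, are simply dropped here; that is only one of several conventions. explained_variance() is a hypothetical name, and iris serves only as stand-in data.

```r
# Hypothetical helper: fraction of variance explained by the first k dimensions
explained_variance <- function(eig, k = length(eig)) {
  eig <- eig[eig > 0]   # drop zero/negative eigenvalues (one possible convention)
  (eig / sum(eig))[seq_len(min(k, length(eig)))]
}

x <- as.matrix(iris[, 1:4])

# PCA: eigenvalues are the squared component standard deviations
pca <- prcomp(x, scale. = TRUE)
ev_pca <- explained_variance(pca$sdev^2, k = 2)

# PCoA (classical MDS): eigenvalues returned with eig = TRUE
pcoa <- cmdscale(dist(x), k = 2, eig = TRUE)
ev_pcoa <- explained_variance(pcoa$eig, k = 2)

round(100 * ev_pca, 1)
```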

Biclustering

Biclustering is a relatively common modeling task in omics data exploration.

Add biclustering as a visualization technique to miaverse. Note that it has at least two uses:

  1. taxa x samples matrix biclustering to identify co-occurring elements on the abundance matrix; might benefit from data transformations but this depends on the specific biclustering algorithm
  • Model-based biclustering for overdispersed count data with application in microbial ecology by Aubert et al. (2021)
  2. correlation matrix biclustering, for instance taxa x metabolites correlations biclustered to reveal co-occurring elements

At the minimum, add some concrete examples in OMA. Can be its own subsection, close to heatmaps.

If it seems feasible, add support to one of the biclustering packages to miaViz. In practice, a wrapper that allows easy calculation of the biclusters, and visualization on heatmap. We may need to consider later if some of the functionality would better go to other packages (mia?) but this is easier once this has been implemented.

This one has been tested and is a good starting point for testing and getting an idea of biclustering:

This is a recent one, and should be specifically tailored to microbiome data. Worth testing as well, and might be nice as a default method if it works well:

It might be worth looking, if these packages have any added value:

If there are clear differences, let us consider carefully which one/s to choose for wrappers and examples.

In principle, there could additionally be a way to obtain simulated biclusters, e.g. with FABIA, as was done in Röttjers & Faust (2020). These can be used to validate methods. This should be moved to its own issue if it seems worth implementing. Not urgent.

In the wrapper function manpages or in OMA, cite at least these publications related to microbiome-related biclustering tasks:

Paired boxplot (e.g. diversity, taxa abundances..)

Add a method that facilitates handy visualization of paired samples on a box/jitter plot, with pairs indicated by (coloured) lines as shown in the figure.

Tentative gists for

Current implementation depends on scater and needs to use colData also for assay data. If this can be simplified for a user, such method might have good use in exploratory data analysis.

Figure:
tmp
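The core of such a figure can be sketched in base R with simulated paired measurements. The sketch assumes one value per subject at each of two time points. It draws a boxplot per time point, and segments() adds one line per subject, coloured by the direction of change. This is an illustration only, not the scater-based gist mentioned above.

```r
set.seed(1)
n <- 10
# Simulated paired values (e.g. Shannon diversity) for the same subjects
before <- rnorm(n, mean = 3)
after <- before + rnorm(n, mean = 0.3, sd = 0.3)

boxplot(list(before = before, after = after), ylab = "Shannon diversity")
points(rep(1, n), before, pch = 16)
points(rep(2, n), after, pch = 16)
# One line per subject: red if the value increased, blue otherwise
line_col <- ifelse(after > before, "red", "blue")
segments(x0 = 1, y0 = before, x1 = 2, y1 = after, col = line_col)
```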

Abundance heatmaps

Can we add a function to facilitate the plotting of abundance heatmaps? This is a need we often encounter in studies.

Starting point could be for instance existing, similar, plotting functionality from microbiome pkg (see below for an example). Or another source.

The heatmap would visualize abundance matrix (assay data) as a heatmap.

Some data scalings/transformations, and sample/taxon sorting could be provided (as is already done in plotAbundance).

Once the basic function is implemented, we can consider adding more advanced sorting schemes (neatsort, neatmap) as these are useful in the context of heatmap visualization (see Rajaram & Oono 2010).

An example with microbiome package:

library(microbiome)
library(ggplot2)  # coord_flip()
data(atlas1006)
pseq <- transform(transform(core(atlas1006, detection = 0.1, prevalence = 0.01), "clr"), "Z", target = "sample")
plot_composition(pseq, sample.sort = "Prevotella melaninogenica et rel.", otu.sort = "abundance", verbose = TRUE, plot.type = "heatmap") + coord_flip()
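The transformation steps can be sketched in base R on simulated counts. The sketch first applies a CLR transform per sample (log counts minus the sample's mean log count, with an assumed pseudocount of 1). It then standardizes each feature across samples and draws a heatmap. The function name clr() and the choice to standardize features rather than samples are assumptions for illustration. Swap the transposes to standardize per sample instead.

```r
# Hypothetical helper: centered log-ratio per sample (columns = samples)
clr <- function(counts, pseudocount = 1) {
  logx <- log(counts + pseudocount)
  sweep(logx, 2, colMeans(logx), "-")
}

set.seed(1)
counts <- matrix(rpois(40, lambda = 50), nrow = 8,
                 dimnames = list(paste0("taxon", 1:8), paste0("S", 1:5)))
clr_mat <- clr(counts)
# Z-score each feature (row): scale() works on columns, hence the transposes
z_mat <- t(scale(t(clr_mat)))
heatmap(z_mat, scale = "none")
```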

argument names

The argument order_rank_by could be replaced with order_feature_by, following similar changes elsewhere in the mia framework.

New function: plot_spread ?

I suggest to add a new function that visualizes distribution of abundances per taxonomic group as a dotplot. We have used this in some publications and found it useful. Usefulness depends on the number of taxa, samples etc, as for any visualization.

Similar functionality has been implemented in microbiome and works as follows:

library(microbiome)
data(dietswap)
spreadplot(transform(dietswap, "compositional"))

plotRDA ellipse's confidence level

The documentation does not mention the confidence level of the ellipse (plotRDA uses stat_ellipse with its default level, 0.95). Add this information to the documentation, and consider adding a parameter to specify the level.

Visualizing merged values

library(mia)
library(miaViz)
# Computing relative abundance
tse <- microbiomeDataSets::dietswap


# Merge columns
tse2 <- mergeCols(tse, colData(tse)$group)

# Calculate relabundances for the merged data;
# first we need to extract the treeSE object from tse2!
tse3 <- assay(tse2, "relabundance")
tse4 <- transformSamples(tse3, abund_values="sum", method="relabundance")

# Try to plot
plotAbundance(tse4, abund_values="relabundance")

Error: 'rank' must be a value from 'taxonomyRanks()'

taxonomyRanks(tse4)
character(0)

Color selection in plotAbundanceDensity()

I am trying to obtain an abundance density plot using the plotAbundanceDensity() function. I want to colour the points according to a variable that is a column of colData. This works with the colour_by argument, but I cannot select specific colours for each of the groups. I have seen that there is an additional argument, point_colour, but it seems to work only when colour_by is not used. Is there a way to select the colours of each of the groups in the abundance density plot? I am specifically referring to plotAbundanceDensity(), but this could be extended to any other function that supports the colour_by argument.

Thanks in advance,

Asier

plotAbundanceDensity colors

The plotAbundanceDensity function currently allows coloring the points by a colData variable:

library(miaViz)
library(miaTime)
data("hitchip1006", package="miaTime")
tse <- hitchip1006
plotAbundanceDensity(tse)

Sometimes we need to map colors of the features (not points) to other external figures that use the same color coding for those features. For this it would be useful if we could also have a custom color for each visualized features (i.e. the rows in the above jitter plot). Something like:

plotAbundanceDensity(tse, assay.type = "relabundance", color_features_by=brewer.pal(n=nrow(tse)))

Sometimes it would be useful to color

plotAssay

Not urgent.

Currently, to visualize the abundance of certain taxa across groups (e.g. a box plot with group on the x axis and taxon abundance on the y axis), we first have to

  1. melt assay
  2. ggplot

We could have a plotAssay function that streamlines this.
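The "melt" step is sketched below in base R. In practice, mat would be an assay matrix and the groups would come from a colData column. Here both are simulated, and assay_to_long() is a hypothetical name. The result has one row per feature/sample pair, which is the shape that box plots per group need.

```r
# Hypothetical helper: long format of an abundance matrix plus a sample group
assay_to_long <- function(mat, groups) {
  long <- as.data.frame(as.table(mat), stringsAsFactors = FALSE)
  names(long) <- c("feature", "sample", "value")
  # look up each sample's group; groups must be a vector named by sample
  long$group <- unname(groups[long$sample])
  long
}

set.seed(1)
mat <- matrix(runif(12), nrow = 3,
              dimnames = list(c("Prevotella", "Bacteroides", "Dialister"),
                              paste0("S", 1:4)))
groups <- c(S1 = "lean", S2 = "lean", S3 = "obese", S4 = "obese")
long <- assay_to_long(mat, groups)
boxplot(value ~ group, data = subset(long, feature == "Prevotella"))
```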

print x-axis label in plotAbundance

Hi,
This might be a very basic question. How can I print x-axis label in plotAbundance?

P_g <- plotAbundance(tse, abund_values="relabundance", rank = "Genus", ncol = 1, layout = "bar", order_rank_by="abund")

Plot by group

The plotSeries function can be used to plot and compare several taxa over e.g. time.

But sometimes we need to plot and compare several individuals / subjects over time w.r.t. one taxa, or another variable like alpha diversity.

Suggestion to implement a function that allows handy comparison of multiple time series from different groups.

Ordination plots with trajectories: add examples

Ordination plots can be complemented with trajectories over points when time series are available for some or all subjects (or other entities). It is not yet clear whether this deserves its own function, but illustrative examples for basic use cases can be useful.

Examples to implement:

  1. Paired data with multiple time points:
  • plot data with some ordination method (PCA, MDS, or other)
  • overlay trajectory for (a) all and (b) one or more selected subjects (based on subject ID/s, based on top10% changes, based on group indicated in colData)
  • indicate the strength of change by color
  • should be shown for a general case where the subjects may have differing number of time points, including singletons (all samples shown on ordination for a default example)
  2. Paired data with two time points. This is already included in (1) but is a simpler case and could potentially come with some extra capabilities that are feasible for general time series.

The most suitable open data set for this could be discussed.

Default ranks

In plotAbundance() the default rank is set to rank = taxonomyRanks(x)[1].

This can be confusing since the data is aggregated to higher than original level without user intervention.

Perhaps better to have no rank specified by default (using the highest resolution data available): rank=NULL?

Strange example

The first example on the plotAbundance reference page asks to plot "counts" assay but the figure is relative abundances. This makes the example confusing.

Either explain the default behavior (if relative abundance plotting is applied automatically by default) or, even better, update the figure so that it plots what is asked by default (here, counts).

plotAbundanceDensity not working when counts assay is missing

Hi! I get an error when I try to use plotAbundanceDensity to plot the relabundance density of an SE that doesn't have the counts assay.

> assayNames(se)
[1] "relabund" "nucdiv"   "clr"

> plotAbundanceDensity(se,
                     layout = "jitter",
                     abund_values = "relabund",
                     n = 100) + 
    scale_x_log10(label = scales::percent) 

Error in assay(x, abund_values) : 
'assay(<SummarizedExperiment>, i="character", ...)' invalid subscript 'i'
'counts' not in names(assays(<SummarizedExperiment>))

Is this a bug? Thanks in advance!

Update the pkg in Bioc devel

For the named package maintainer:

Time to make sure that the latest updates to Bioc packages are pushed to the Bioc devel branch (note the approaching deadlines):
https://www.bioconductor.org/developers/release-schedule/

And checked for builds
https://bioconductor.org/checkResults/devel/bioc-LATEST/miaViz/lconway-buildsrc.html

See "Sync an existing GitHub repository with Bioconductor" in
https://contributions.bioconductor.org/git-version-control.html#sync-existing-repositories

Fix GHA config

Let's drop Windows and macOS from the GitHub Actions config for now. @TuomasBorman: are you interested?

See config in mia for example.

plotAbundance problem

The function plotAbundance is used for handy visualization of community composition.

library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns[1:1000,]
plotAbundance(tse, abund_values="counts", rank = "Phylum")

It only works with the predefined ranks, however. The following leads to an error:

rowData(tse)$Strain <- sample(letters[1:5], nrow(tse), replace=TRUE)
plotAbundance(tse, abund_values="counts", rank = "Strain")

-> Error: 'rank' must be a value from 'taxonomyRanks()'

This is restrictive because it is often useful to visualize also other types of groupings.

Just not providing the rank argument (relying on the default) did not seem to help either.

Is this intended, or is there a way to circumvent this issue?

Related to microbiome/mia#219
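The underlying aggregation does not depend on taxonomy ranks. The base-R sketch below assumes a plain count matrix and an arbitrary grouping vector with one label per row, such as the Strain column above. rowsum() sums the rows within each group. The relative abundances per sample are then drawn as a stacked bar plot. This is an illustration of the idea, not how plotAbundance is implemented internally.

```r
set.seed(1)
counts <- matrix(rpois(24, lambda = 10), nrow = 6,
                 dimnames = list(paste0("otu", 1:6), paste0("S", 1:4)))
# Arbitrary grouping of the rows, e.g. rowData(tse)$Strain in a TreeSE
strain <- c("a", "b", "a", "c", "b", "a")

agg <- rowsum(counts, group = strain)      # sum rows sharing a label
rel <- sweep(agg, 2, colSums(agg), "/")    # relative abundance per sample
barplot(rel, legend.text = rownames(rel))
```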

Include more tests

Currently only a minimal set of tests is included. All internals need to be covered by tests to make this package airtight.

plotAbundance error?

The plotAbundance method provides the features argument.

This is intended to show colData variables below the plot. According to the manpage:

"features: a single ‘character’ value defining a column from ‘colData’
to be plotted below the abundance plot. Continuous numeric
values will be plotted as point, whereas factors and
character will be plotted as colour-code bar. (default:
‘features = NULL’)"

The following commands output a ggplot object p, which is not a single figure with colData shown below the plot. Instead, this outputs two separate figures. My impression is that either the function or the manpage has a mistake. I would expect the behaviour described on the manpage.

library(miaViz)
data(GlobalPatterns)
se <- GlobalPatterns
p <- plotAbundance(se, abund_values = "counts", rank = "Phylum", features = "SampleType")
names(p)

[1] "abundance" "SampleType"

At the same time, I would like to see an example in the manpages showing how to add sample IDs below the plot. This would be a common use case.

RDA influences from microViz

The microViz package has nice capabilities for phyloseq.

Explore the possibilities for providing visualizations similar to this and beyond using miaViz, e.g. ord_plot:

library(mia)
library(phyloseq)  # sample_data()

# Get demo data
data(hitchip1006, package="miaTime")
ps <- makePhyloseqFromTreeSE(hitchip1006)
ps <- microbiome::transform(ps, "clr")

# Move to numeric
sample_data(ps)$sex <- as.numeric(as.factor(sample_data(ps)$sex))
sample_data(ps)$nationality <- as.numeric(as.factor(sample_data(ps)$nationality))

library(microViz)
library(magrittr)  # %>%
p <- ordplot <- ps %>% 
  ord_calc(constraints = c("age","sex")) %>%
  microViz::ord_plot(colour = "diversity", plot_taxa = 1:8)

print(p)

Composition barplot

A typical way to visualize microbiome composition is the "Composition barplot" as shown e.g. in microbiome tutorial (see section with that name).

Consider implementing this as a function, or as an option in an existing function that visualizes the composition (if not readily available).

Add example in miaViz vignette and/or OMA.

vignette

Add at least a simple vignette with some demo examples on this package. Can link to OMA for further examples.

plotRowTree plots an additional rectangular tree when changing layout

When running the following example (OMA, Chapter 5) and changing the layout argument, the layout of the tree changes, but it plots an additional rectangular tree above the correct layout. This happens for all layouts.

library(mia)
library(miaViz)
data("GlobalPatterns")
se <- GlobalPatterns 


altExps(se) <- splitByRanks(se)

altExps(se) <-
  lapply(altExps(se),
         function(y){
           rowData(y)$prevalence <- 
             getPrevalence(y, detection = 1/100, sort = FALSE,
                           abund_values = "counts", as_relative = TRUE)
           y
         })
top_phyla <- getTopTaxa(altExp(se,"Phylum"),
                        method="prevalence",
                        top=10L,
                        abund_values="counts")
top_phyla_mean <- getTopTaxa(altExp(se,"Phylum"),
                             method="mean",
                             top=10L,
                             abund_values="counts")
x <- unsplitByRanks(se, ranks = taxonomyRanks(se)[1:6])
x <- addTaxonomyTree(x)

plotRowTree(x[rowData(x)$Phylum %in% top_phyla,],
            edge_colour_by = "Phylum",
            tip_colour_by = "prevalence",
            node_colour_by = "prevalence",layout="ellipse")

sparseMatrix breaks plotAbundance(Density)

Hi :)

SummarizedExperiment objects are allowed to contain sparseMatrix objects instead of base matrix objects, but this breaks plotAbundance and plotAbundanceDensity:

library(Matrix)
library(miaViz)

data(GlobalPatterns, package = "mia")
plotAbundanceDensity(GlobalPatterns) # works
plotAbundance(GlobalPatterns) # works

assay(GlobalPatterns) <- as(assay(GlobalPatterns), "sparseMatrix")
plotAbundanceDensity(GlobalPatterns)
# Error in t.default(mat) : argument is not a matrix
plotAbundance(GlobalPatterns)
# Error in as.data.frame.default(.) : 
#   cannot coerce class ‘structure("dgCMatrix", package = "Matrix")’ to a data.frame

Similar to microbiome/mia#151

Thank you!

Best,
Bela
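A user-side workaround is presumably to convert the assay back to a base matrix before plotting, e.g. assay(GlobalPatterns) <- as.matrix(assay(GlobalPatterns)). On the package side, the internals could coerce sparse input defensively before calling base-R functions such as t() or as.data.frame(). The sketch below uses the Matrix package only. as_dense_matrix() is a hypothetical helper name, not an existing mia or miaViz function.

```r
library(Matrix)

# Hypothetical helper: return a base matrix, densifying sparse input if needed
as_dense_matrix <- function(mat) {
  if (methods::is(mat, "sparseMatrix")) mat <- as.matrix(mat)
  mat
}

set.seed(1)
sp <- rsparsematrix(5, 4, density = 0.3)   # a dgCMatrix
dense <- as_dense_matrix(sp)
df <- as.data.frame(t(dense))              # works once the matrix is dense
```

Densifying is simple, but it costs memory for large assays. A more scalable fix would keep sparse-aware operations where possible.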
