biocompr's Introduction

BiocompR - Advanced visualizations for data comparison

BiocompR is an R package built upon ggplot2 and using data.table. It improves some visualisations commonly used in biology and genomics for data comparison and dataset exploration, introduces new kinds of plots, provides a toolbox of functions to work with ggplot2 and grid objects, and ultimately allows users to customize the plots produced into publication-ready figures.

Author: PAGEAUD Y.1
How to cite: Pageaud Y. et al., BiocompR - Advanced visualizations for data comparison.

Acknowledgment

I would like to thank everyone who contributed to the development of this package with their code, their test datasets, their advice and their feedback. - Yoann.
Contributors: Dr. Schefzik R.2; Mr. Hruska D.1; Mrs. Bitto V.1; Dr. Kurilov R.1; Mr. Beumer N.1; Mrs. Wursthorn A.3; Mrs. Qadeer R.1; Dr. Feuerbach L.1.
1. DKFZ - Division of Applied Bioinformatics, Germany.
2. Klinik für Anästhesiologie und Operative Intensivmedizin, Medizinische Fakultät Mannheim, Universität Heidelberg, Germany.
3. DKFZ - Clinical Cooperation Unit Translational Radiation Oncology, Germany.

Linux prerequisites (under Ubuntu & Debian)

From your terminal, install the following libraries (these are dependencies of the devtools and magick packages):

sudo apt install libfontconfig1-dev libxml2-dev libharfbuzz-dev libfribidi-dev libcurl4-openssl-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libmagick++-dev

Install BiocompR

In R execute the following command:

devtools::install_github("YoannPa/BiocompR")

Content

Currently the package BiocompR contains 40 exported functions:

  • biopalette() - A color palette advisor for biology plots.
  • build_legends_layout() - Builds legends layout.
  • check_fun() - Checks if a function exists and package of origin.
  • ggasso.all_annot() - Draws association test results between all columns from a data.frame.
  • ggasso.annot_pc() - Plots association tests' results between some annotations and some PCs.
  • ggbipca() - Computes and draws a custom PCA biplot.
  • ggbivar() - Draws boxplots or violins from a variable values against ranges of a 2nd one.
  • ggcirclart() - Circlizes ggplot2 objects.
  • ggcoverage() - Plots an annotated stacked barplot.
  • ggcraviola() - Draws a craviola plot (half-split and percentile-binned violin plot).
  • ggdend() - Creates a dendrogram in ggplot2.
  • ggdensity_map() - Plots a density color map from a matrix or a molten data.frame.
  • ggeigenvector() - Creates an eigenvector plot using ggplot2.
  • ggeva() - Computes eigenvectors, PC scores and correlations from a correlation test.
  • ggfusion.corr() - Draws 2 triangle matrices of computed pairwise correlations' results.
  • ggfusion.free() - Draws 2 triangle matrices fused together in a single plot.
  • ggheatmap() - Creates a custom heatmap with dendrograms and annotations.
  • gghist() - Plots a histogram using ggplot2 from a numeric or character vector.
  • ggpanel.corr() - Plots results of a correlation test between a single variable and multiple others as a jittered scatter plot divided into 4 different panels.
  • ggsidebar.basic() - Draws a ggplot2 of a basic sidebar.
  • ggsidebar.full() - Creates colored side annotation bars in ggplot2.
  • ggstackbar() - Draws stacked barplots from an annotation table.
  • ggsunset() - Draws a sunset plot showing the completeness of a dataset.
  • ggtriangle() - Draws a triangle plot from a basic molten triangle matrix.
  • ggvolcano.corr() - Plots results of a correlation test between a single variable and multiple others as a volcano plot.
  • ggvolcano.free() - Plots any kind of results with P-values that can be displayed as a volcano plot.
  • ggvolcano.test() - Plots results of statistical tests as a volcano plot.
  • ks.plot() - Computes pairwise Kolmogorov-Smirnov tests on a matrix and displays results in a fused plot.
  • manage.na() - Keeps, removes or imputes missing values in a matrix or a data.frame based on sample groups.
  • pairwise.ks() - Computes a Kolmogorov-Smirnov test between all columns of a data.frame.
  • prepare_annot_asso() - Prepares annotations to be tested for associations.
  • prepare_pca_data() - Collects and computes needed metrics for PCA biplot.
  • raster.gg2grob() - Rasterizes a gg plot into a raster grob.
  • raster.ggplot.to.grob() - Rasterizes a gg plot into a raster grob.
  • resize.grob.oneway() - Resizes heights or widths of a grob based on the dimensions of another grob.
  • resize.grobs() - Resizes heights or widths of multiple grobs based on a given grob dimensions.
  • test.annots() - Tests association of an annotation with another one or with a PC.
  • test_asso_all_annot() - Write function description here.
  • test_asso_annot_pc() - Tests associations between a set of annotations and PCs from a prcomp object.
  • warn.handle() - Filters irrelevant warnings matching a regular expression.

Problems ? / I need help !

For any questions not related to bugs or development, please check the section "Known Issues" below. If the issue you experience is not addressed in the known issues, you can write to me at [email protected].

Known Issues

❎ Error in UseMethod("depth")

Error in UseMethod("depth") : 
  no applicable method for 'depth' applied to an object of class "NULL"

This error seems to happen randomly when executing code using the ggplot2 and/or grid packages. Usually, executing the chunk of code one more time solves the error. The current status of this issue can be tracked here.
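
Since re-running the chunk is usually enough, the retry can be automated. The following is a minimal sketch in base R, assuming the error is transient; retry_on_error() and flaky() are hypothetical names used for illustration and are not part of BiocompR.

```r
# Hypothetical helper (not part of BiocompR): re-run a function when it fails
# with an error whose message contains `msg`, up to `max_tries` attempts.
retry_on_error <- function(fun, msg, max_tries = 3) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Any other error is re-thrown immediately instead of being retried
    if (!grepl(msg, conditionMessage(result), fixed = TRUE)) stop(result)
  }
  # All attempts failed with the known error: re-throw the last one
  stop(result)
}

# Stand-in for a plotting call that fails on the first attempt only
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 2) {
    stop("no applicable method for 'depth' applied to an object of class \"NULL\"")
  }
  "plot drawn"
}

out <- retry_on_error(flaky, msg = "no applicable method for 'depth'")
```

In practice, fun would wrap your own plotting code, for example function() print(my_plot).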

❎ Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), : Viewport has zero dimension(s)

Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), :
  Viewport has zero dimension(s)

This error can arise when using the ggbipca() function: if you define a legend with too many values, the plotting area becomes too small to print the plot in the plotting panel of RStudio.
When this happens, you can try to manually increase the size of the plotting panel in your RStudio interface. If this doesn't solve the error, define a legend with fewer values for colors and/or shapes.
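
Another way to give the plot more room, sketched below, is to print it to a file device with explicit dimensions instead of the RStudio panel. This is an assumption rather than a fix documented for ggbipca(): the plot p is a plain ggplot2 stand-in built from the mpg dataset, and the file name and sizes are arbitrary.

```r
library(ggplot2)

# Stand-in plot with a legend holding many values; replace it with your own plot.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = model)) +
  geom_point()

# Open a PDF device with generous dimensions (in inches), print, then close it.
out_file <- file.path(tempdir(), "large_plot.pdf")
pdf(out_file, width = 14, height = 10)
print(p)
invisible(dev.off())
```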

⚠️ Reached elapsed time limit.

Warning message:
In grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto),  :
  reached elapsed time limit.

This warning seems to happen randomly when executing code using the ggplot2 and/or grid packages. Usually, after executing the chunk of code one more time, the warning is no longer displayed. The current status of this issue can be tracked here.

⚠️ Using alpha for a discrete variable is not advised.

Warning message:
Using alpha for a discrete variable is not advised. 

This warning can arise when using the function ggvolcano.corr() with additional ggplot2 components. It doesn't compromise the printing of the plot, but you might find it annoying.
A quick fix to suppress this specific warning is to use the function warn.handle(), which filters out unwanted warnings using pattern matching, as follows:

#Create your correlation volcano plot (this will also print the 'default' volcano plot)
my_volcano <- ggvolcano.corr(
  data = dfrm_my_correlation_res, p.cutoff = 0.01, corr.cutoff = 0.1,
  title.corr.cutoff = "Samples default correlation",
  corr.label.cutoff = c(-0.35,0.40)) +
  scale_color_manual(values = ggsci::pal_npg("nrc", alpha = 1)(10)) +
  xlab("Spearman correlation") + ylab("Spearman P-value") +
  ggtitle("Spearman correlation between multiple variables and my variable of interest")

#Print your volcano plot without displaying the annoying warning
warn.handle(
  pattern = "Using alpha for a discrete variable is not advised.",
  print(my_volcano)) 

Nevertheless, using ggvolcano.corr() without additional ggplot2 components should not raise this warning.
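
For readers curious about how pattern-based warning filtering can work in general, here is a minimal base R sketch. It is not the implementation of warn.handle(); muffle_matching() and noisy() are hypothetical names used only to illustrate the idea.

```r
# Hypothetical helper (not BiocompR's code): evaluate `expr` and silence only
# the warnings whose message matches the regular expression `pattern`.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    }
  )
}

# Stand-in for a plotting call that raises two different warnings
noisy <- function() {
  warning("Using alpha for a discrete variable is not advised.")
  warning("something else worth seeing")
  "done"
}

# Only the second warning is still reported
res <- muffle_matching(noisy(), pattern = "Using alpha for a discrete variable")
```

Warnings that do not match the pattern pass through untouched, so unrelated problems stay visible.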

⚠️ ggrepel: ## unlabeled data points (too many overlaps). Consider increasing max.overlaps.

Warning message:
ggrepel: ## unlabeled data points (too many overlaps). Consider increasing max.overlaps

This warning can arise when using the function ggbipca(). If the scale is too small and you try to display too many loadings labels, the overlapping labels are not displayed and this warning is printed. In practice, this specific warning can persist and be printed seemingly at random afterwards when running other commands. It is unclear why this happens, but it can be fixed by executing the following command after running ggbipca():

assign("last.warning", NULL, envir = baseenv())

The current status of this issue can be tracked here.
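
As the warning itself suggests, the overlap limit of ggrepel can also be raised. ggrepel reads its default max.overlaps value from the global option ggrepel.max.overlaps, so the sketch below may help; whether BiocompR functions honour this option or set max.overlaps themselves is an assumption that has not been verified here.

```r
# Keep the previous option values so they can be restored later
old_opts <- options(ggrepel.max.overlaps = Inf)

# ... create and print your plot here ...

# Restore the previous setting once done:
# options(old_opts)
```

Note that allowing unlimited overlaps can make dense plots hard to read.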

⚠️ In min(x) : no non-missing arguments to min; returning Inf / In max(x) : no non-missing arguments to max; returning -Inf

Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

This warning can arise when using the function ggsunset() when there is only 1 label displayed on the right Y axis. This warning does not compromise the result and can be ignored.
The current status of this issue can be tracked here.
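
For context, these two messages come from base R: min() and max() emit them when they receive no non-missing values. The short sketch below reproduces the behaviour on an empty vector; where exactly the empty input arises inside the plotting code is not stated in this README.

```r
x <- numeric(0)

# Both calls warn when run on their own; the warnings are silenced here
lowest  <- suppressWarnings(min(x))
highest <- suppressWarnings(max(x))

is.infinite(lowest) && lowest > 0    # min() of nothing is Inf
is.infinite(highest) && highest < 0  # max() of nothing is -Inf
```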

Technical questions / Development / Feature request

If you encounter issues, or if a feature you would expect is not available in a BiocompR function, please check whether an existing issue addresses your point here. If not, create a new issue here.

References

⚠️ Work in progress !

  1. Share a legend between two ggplot2 graphs - Mara Averick
  2. Align two plots on a page - Mara Averick
  3. ggfortify: Data Visualization Tools for Statistical Analysis Results
  4. ggfortify: Plotting PCA (Principal Component Analysis)
  5. Loadings vs eigenvectors in PCA: when to use one or another?
  6. What is the proper association measure of a variable with a PCA component?

Licence

BiocompR is currently under the GPL-3.0 licence.

biocompr's Issues

ggcoverage: Bars appear to be plotted on top of each other when log scale is used.

Hi Yoann,
I tried out the ggcoverage function in R 3.6.2.
When I set the log.scaled argument to T, the bars appeared to be placed on top of each other rather than next to each other.
Here is a minimal example:

library(ggplot2)
library(data.table)

# (code of the ggcoverage function copied from GitHub goes here)

data_for_plotting <- as.data.table(data.frame(col1 = c("C", "B", "A", "D", "E"), col2 = 100:104, col3 = 35:39))
plot <- ggcoverage(data_for_plotting, log.scaled = T)
print(plot)

I hope this helps with improving the package :).
ggcoverage_trial_plot.pdf

Best wishes
Niklas

Heatmap on chromosomes

Would it be possible to add a function that can create a heatmap along chromosomes? Thank you :)

Missing Dependency "Hmisc"

When attempting to install the BiocompR package (using R 4.0.2) following the instructions in the Readme,
I get the following error.

devtools::install()
Skipping 1 packages not available: IRanges
✓ checking for file ‘/home/david/Downloads/BiocompR-master/DESCRIPTION’ ...
─ preparing ‘BiocompR’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘BiocompR_0.0.68.tar.gz’
Running /usr/lib/R/bin/R CMD INSTALL /tmp/Rtmpk6eJ9o/BiocompR_0.0.68.tar.gz --install-tests
installing to library ‘/home/david/R/x86_64-pc-linux-gnu-library/4.0’
installing source package ‘BiocompR’ ...
using staged installation
R
byte-compile and prepare package for lazy loading
Error in FUN(X[[i]], ...) : there is no package called ‘Hmisc’
Error: unable to load R code in package ‘BiocompR’
Execution halted
ERROR: lazy loading failed for package ‘BiocompR’
removing ‘/home/david/R/x86_64-pc-linux-gnu-library/4.0/BiocompR’
Error in (function (command = NULL, args = character(), error_on_status = TRUE, :
System command 'R' failed, exit status: 1, stdout & stderr were printed
Type .Last.error.trace to see where the error occured
.Last.error.trace
Stack trace:
devtools::install()
pkgbuild::with_build_tools(required = FALSE, callr::rcmd("INSTALL", ...
callr::rcmd("INSTALL", c(install_path, opts), echo = !quiet, ...
callr:::run_r(options)
base:::with(options, with_envvar(env, do.call(processx::run, ...
base:::with.default(options, with_envvar(env, do.call(processx::run, ...
base:::eval(substitute(expr), data, enclos = parent.frame())
base:::eval(substitute(expr), data, enclos = parent.frame())
callr:::with_envvar(env, do.call(processx::run, c(list(bin, args = real_cmdargs, ...
base:::force(code)
base:::do.call(processx::run, c(list(bin, args = real_cmdargs, ...
(function (command = NULL, args = character(), error_on_status = TRUE, ...
throw(new_process_error(res, call = sys.call(), echo = echo, ...
x System command 'R' failed, exit status: 1, stdout & stderr were printed

Installing package "Hmisc" using install.packages("Hmisc") solved the problem.
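
A quick way to spot this kind of problem before installing is to check which packages are missing from the local library. The sketch below uses base R only; missing_packages() is a hypothetical helper, and the package list passed to it is only an example based on this report, not the full dependency list of BiocompR.

```r
# Hypothetical helper: return the names of packages that cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(c("Hmisc"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
}
```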

ggbipca() gradient color scale

Hi Yoann,

Description
I'm trying to color the points with a gradient color scale and not by groups.
Below, I used the mtcars data set to replicate my issue. After performing the PCA, I wanted to create a biplot with the points colored by weight on a gradient scale. Instead, the separate values are treated as individual groups and the colors do not represent the gradient.

I think it would be beneficial to add an option to color the points on a gradient scale.

To replicate
mtcars.pca <- prcomp(mtcars[,c(1:5,7,10,11)], center = TRUE, scale. = TRUE)
ggbipca(prcomp.res = mtcars.pca, data = mtcars, color.data = "wt")

Thank you!
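
Until such an option exists, a possible workaround is to draw the score plot directly with ggplot2. The sketch below reuses the PCA from the example above and maps wt to a continuous color scale; it does not use ggbipca() and omits the loadings, and the color choices are arbitrary.

```r
library(ggplot2)

# Same PCA as in the example above
mtcars.pca <- prcomp(mtcars[, c(1:5, 7, 10, 11)], center = TRUE, scale. = TRUE)

# Scores on the first two PCs, with the continuous variable to map to color
scores <- data.frame(mtcars.pca$x[, 1:2], wt = mtcars$wt)

p <- ggplot(scores, aes(x = PC1, y = PC2, color = wt)) +
  geom_point(size = 3) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(color = "Weight (wt)")
print(p)
```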

ggvolcano.free(,force.label=c()) shows no labels if overlapping

Hi Yoann,

Description
Just a little feedback on the ggvolcano.free() function. Currently I am trying to display 34 genes within the volcano plot using the force.label=c() argument. As those labels seem to be too close together and ggrepel does not tolerate the overlap, none of the labels are returned.

Warning messages:
1: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
2: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
3: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
4: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
5: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
6: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
7: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
8: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
9: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
.....

To reproduce
volcano.plot<-ggvolcano.free(data=input.data, x.cutoff = 0.01, p.cutoff=0.05, force.label =c('REXO5', 'C18orf54','CCT6P1','CEP72','POLE2','MCM4','HELLS','CCT6P1','ANKRD20A11P','EHMT2','LIN9','PAXIP1' ))

You can find the complete script with input data under the following DKFZ path: /.../.../.../data/rathgeber/scripts/neuroblastoma/Methylation_Analysis/Make_gene_methylation_TelNet_genes_BiocompR_volcano_plot.R

Suggestion how to fix
You could tune the ggrepel argument "force" to increase the labels' distance

Session information
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.0.7 BiocompR_0.0.149 ggplot2_3.3.5 data.table_1.14.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 rstudioapi_0.13 magrittr_2.0.1 tidyselect_1.1.1 munsell_0.5.0
[6] colorspace_2.0-2 R6_2.5.0 rlang_0.4.11 fansi_0.5.0 tools_4.0.0
[11] grid_4.0.0 gtable_0.3.0 utf8_1.2.2 cli_3.0.1 DBI_1.1.1
[16] withr_2.4.2 ellipsis_0.3.2 digest_0.6.27 assertthat_0.2.1 tibble_3.1.3
[21] lifecycle_1.0.0 crayon_1.4.1 farver_2.1.0 purrr_0.3.4 vctrs_0.3.8
[26] ggrepel_0.9.1 glue_1.4.2 labeling_0.4.2 compiler_4.0.0 pillar_1.6.2
[31] generics_0.1.0 scales_1.1.1 pkgconfig_2.0.3
There were 12 warnings (use warnings() to see them)

warnings()
Warning messages:
1: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
2: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
3: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
4: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
5: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
6: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
7: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
8: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
9: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
10: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
11: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps
12: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Error in fancy.hist(): could not find function "mclapply"

library(BiocompR)
fancy.hist(rnorm(100, 5, 1), xmax = 10)

results in

Error in mclapply(seq(length(xbreaks) - 1), mc.cores = ncores, function(i) { :
could not find function "mclapply"

I guess the import of the "parallel" package is missing when executing library(BiocompR).

Executing library(parallel) prior to running the above code solved the problem.
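
For reference, mclapply() lives in the parallel package that ships with R, so it can also be called with an explicit namespace prefix without attaching the package. The sketch below only demonstrates that call on toy data; it is not the code of fancy.hist().

```r
# Namespaced call: works without library(parallel).
# mc.cores = 1 keeps the example portable (forking is unavailable on Windows).
squares <- parallel::mclapply(1:4, function(i) i^2, mc.cores = 1)
unlist(squares)
```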

fill option

Good afternoon,

Could you maybe add a 'fill' option to cross.biplot? With the PCAWG palette it works better if you use shape 21 (and up), but that requires "fill" and not "color".

Gr
Miranda
