Would it be possible to add a function that can create a heatmap along chromosomes?

Heatmap on chromosomes about biocompr HOT 4 CLOSED

yoannpa commented on August 26, 2024

Heatmap on chromosomes

Comments (4)

YoannPa commented on August 26, 2024

Hi Lena,

Yes. Your request will be addressed as soon as possible.
Best.

YoannPa commented on August 26, 2024

Sorry for the long delay! I have had tons of work to do on the PCAWG study.
I am now actively working on your request. This feature should be available within the next 2 weeks.

YoannPa commented on August 26, 2024

Hi @Lena-Vo ,
The latest update of the BiocompR package now allows users to plot heatmaps of genome-scale data using the gg2heatmap() function.
You can do this by providing a chromosome annotation to annot.grps like this:
annot.grps = list("Chromosomes" = my_vector_of_chromosomes)
length(my_vector_of_chromosomes) must equal ncol(matrix).
Then set facet to the name of your annotation (here, "Chromosomes").
More importantly, if you want a chromosome heatmap, dendrograms must be set to FALSE and dist.method set to "none".
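
A minimal base-R sketch of preparing such an annotation, assuming a hypothetical probe annotation table (probe_annot, with probe, chr and pos columns) and a toy matrix whose columns are CG probes. It sorts the columns by chromosome and position so that each chromosome forms a contiguous block of columns (whether gg2heatmap() requires or performs this ordering itself is not stated in this thread), then checks the length requirement before building annot.grps.

```r
# Hypothetical toy data: 4 samples (rows) x 6 CG probes (columns).
set.seed(1)
beta_mat <- matrix(runif(24), nrow = 4,
                   dimnames = list(paste0("S", 1:4), paste0("cg", 1:6)))

# Hypothetical probe annotation: chromosome and genomic position of each probe.
probe_annot <- data.frame(
  probe = paste0("cg", 1:6),
  chr   = c("chr2", "chr1", "chrX", "chr1", "chr2", "chrY"),
  pos   = c(500, 300, 100, 100, 200, 50),
  stringsAsFactors = FALSE
)

# Natural chromosome order: chr1 ... chr22, then chrX and chrY.
chr_levels <- paste0("chr", c(1:22, "X", "Y"))

# Sort probes by chromosome (factor level order), then by position,
# and reorder the matrix columns to match.
probe_annot$chr <- factor(probe_annot$chr, levels = chr_levels)
probe_annot <- probe_annot[order(probe_annot$chr, probe_annot$pos), ]
beta_mat <- beta_mat[, probe_annot$probe, drop = FALSE]

# One chromosome label per column, in column order.
chromosomes <- as.character(probe_annot$chr)
stopifnot(length(chromosomes) == ncol(beta_mat))

annot.grps <- list("Chromosomes" = chromosomes)
```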

Be warned that the function is not yet memory efficient: depending on the amount of data you want to plot, the memory required can be huge, and if it is too much, the function will crash.
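
To get a feel for the numbers, here is a rough lower bound on the memory needed just to hold the input values, assuming a dense matrix of doubles (8 bytes each in R). The helper dense_matrix_gib() is hypothetical, and any intermediate copies made while plotting come on top of this figure.

```r
# Rough lower bound on the RAM needed to hold a dense numeric matrix in R:
# every double occupies 8 bytes (object header overhead is ignored).
# dense_matrix_gib() is a hypothetical helper, not part of BiocompR.
dense_matrix_gib <- function(n_rows, n_cols) {
  n_rows * n_cols * 8 / 1024^3
}

# Example: 1,360 samples x 196,923 CG probes, the dimensions used later in this thread.
dense_matrix_gib(1360, 196923)  # about 2 GiB for the values alone
```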

To limit the risk of a crash when using gg2heatmap() on genome-scale data, I recommend the following:

- set raster to one of the available magick filters (for the full list, run magick::filter_types()) to decrease the size of the final plot. The Lanczos filter is the recommended one.
- set verbose to TRUE: this will help debugging if an error or warning is raised.
- set as many labeling parameters as you find unnecessary to element_blank() (among them: axis.title.x, axis.text.x, axis.ticks.x, axis.title.y.right, axis.text.y.left, ...).
- set na.handle to "remove" to remove all incomplete data and decrease the amount of memory used.
- set rank.fun to "sd" to sort data by decreasing standard deviation.
- use top.rows to display only the subset of the data with the highest standard deviation.
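
Conceptually, the na.handle, rank.fun and top.rows options described above shrink and reorder the matrix before plotting. The base-R sketch below illustrates that idea on a toy matrix; it is not BiocompR's implementation, and treating "incomplete data" as probes (columns) that contain any NA is an assumption.

```r
# Toy matrix: 5 samples (rows) x 8 probes (columns), with two missing values.
set.seed(42)
m <- matrix(runif(40), nrow = 5,
            dimnames = list(paste0("S", 1:5), paste0("cg", 1:8)))
m[2, 3] <- NA
m[4, 7] <- NA

# Like na.handle = "remove" (assumed here: drop every column containing an NA).
m_complete <- m[, colSums(is.na(m)) == 0, drop = FALSE]

# Like rank.fun = "sd": order rows by decreasing standard deviation.
row_sd <- apply(m_complete, 1, sd)
m_ranked <- m_complete[order(row_sd, decreasing = TRUE), , drop = FALSE]

# Like top.rows: keep only the n most variable rows (n = 3 is arbitrary).
top_n <- 3
m_top <- head(m_ranked, top_n)
```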

Upcoming updates should make these options more stable and more memory efficient.
If you experience any problem, feel free to open a new issue.
This issue will remain open until the options to draw this kind of plot are stable.

YoannPa commented on August 26, 2024

New improvements have been made to the gg2heatmap() function:

- The options to facet by chromosome and to rasterize the heatmap are more stable.
- The verbose option is more complete.
- RAM usage has been improved overall (the more data you use, the more RAM and processing time will still be needed).

Here is a small example of what can be achieved with gg2heatmap:

library(BiocompR)
library(ggplot2)  # element_blank(), element_text() and ggsave()

# miss.sesame.cg.beta.mat: samples x CG probes matrix of methylation beta values.
# chromosomes: one chromosome label per column of the matrix, in column order.
facet_meth_heatmap <- gg2heatmap(
  m = miss.sesame.cg.beta.mat, na.handle = "keep", raster = "Lanczos",
  rank.fun = "sd", ncores = 13, dist.method = "none", dendrograms = FALSE,
  lgd.space.width = 2, annot.grps = list("Chromosomes" = chromosomes),
  annot.pal = rainbow(n = 24), annot.size = 3, axis.text.x = element_blank(),
  axis.ticks.x = element_blank(), axis.title.y.left = element_text(size = 12),
  plot.title = "PCAWG missing methylation data distribution",
  row.type = "samples", x.lab = "CG probes", y.lab = "Samples",
  lgd.scale.name = "Methylation", facet = "Chromosomes", verbose = TRUE)

ggsave(filename = "Methylation_heatmap_with_missing_data.pdf",
       plot = facet_meth_heatmap, device = "pdf", width = 20, height = 10)

Above, I used a sparse matrix of methylation beta values with 1,360 samples as rows and 196,923 CG probes as columns, distributed across the 22 autosomes plus the X and Y chromosomes (Y is barely visible, at the very right end of the plot).
Missing data have been kept.
Rasterization used the Lanczos filter from the magick package.
Rows have been sorted by standard deviation along the genome.
I didn't cluster the data and hid all dendrograms (though clustering could have been applied to rows).
Only one annotation was used: "Chromosomes" = chromosomes.
I specified that the facets should be created based on the "Chromosomes" annotation.
And I set the verbose option to TRUE to display step by step progression of the process.

Processing the data and plotting the final heatmap took gg2heatmap() roughly 5 hours because of the rasterization (which is necessary anyway to summarize so much data).
The resulting PDF file is 80 MB (whereas a full vector graphic file could probably exceed a dozen GB). Currently each facet has a default resolution of 1080x1080. If requested, an option could be provided to lower this default resolution in order to decrease processing time and PDF file size.
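
A back-of-the-envelope comparison, using the dimensions given above, of how many cells a full vector PDF would have to draw (roughly one rectangle per cell) versus how many pixels the 24 rasterized facets store at the default 1080x1080 resolution:

```r
n_samples <- 1360
n_probes  <- 196923
n_facets  <- 24            # 22 autosomes + X + Y
facet_px  <- 1080 * 1080   # default raster resolution of one facet

cells  <- n_samples * n_probes  # roughly one rectangle per cell in a vector PDF
pixels <- n_facets * facet_px   # pixels stored after rasterization

cells           # 267815280
pixels          # 27993600
cells / pixels  # about 9.57 cells per pixel on average
```

On average, each stored pixel therefore summarizes almost ten matrix cells.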

(The screenshot states that "1360 CG probes" were used. This is a labeling mistake: 1,360 samples were used as rows, with 196,923 CG probes per sample as columns.)

Tip: once the plot is saved as a PDF file, each facet can be inspected independently in a vector graphics editor such as Adobe Illustrator or Photopea (online). Every facet will appear as a bitmap object.

The feature request is now complete, so I am closing the issue.