Comments (4)

YoannPa commented on July 3, 2024

Hi Lena,

Yes. Your request will be addressed as soon as possible.
Best.

from biocompr.

YoannPa commented on July 3, 2024

Sorry for the long delay! There is a lot of work to do on the PCAWG study.
I am now actively working on your request. This feature should be available within the next 2 weeks.

YoannPa commented on July 3, 2024

Hi @Lena-Vo,
The latest update of the BiocompR package now allows users to plot heatmaps of genomic-scale data with the gg2heatmap() function.
You can do this by providing a chromosome annotation to annot.grps, like this:
annot.grps = list("Chromosomes" = my_vector_of_chromosomes)
length(my_vector_of_chromosomes) must equal ncol(matrix).
Then set facet to the name of your annotation (here, "Chromosomes").
More importantly, if you want a chromosome heatmap, dendrograms must be set to FALSE and dist.method to "none".
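To make the annotation setup above concrete, here is a minimal base R sketch. The toy matrix, its probe names, and the chromosome assignment are made up for illustration; only annot.grps, facet, and the length check come from the description above. The sketch does not call gg2heatmap() itself.

```r
# Toy data (hypothetical): 4 samples as rows, 6 probes as columns.
set.seed(1)
mat <- matrix(runif(24), nrow = 4,
              dimnames = list(paste0("sample", 1:4), paste0("cg", 1:6)))

# One chromosome label per column, in column order (made-up assignment).
my_vector_of_chromosomes <- c("chr1", "chr1", "chr2", "chr2", "chrX", "chrY")

# The annotation must describe every column of the matrix.
stopifnot(length(my_vector_of_chromosomes) == ncol(mat))

# Named list for annot.grps; the name is reused as the facet value.
annot.grps <- list("Chromosomes" = my_vector_of_chromosomes)
facet <- names(annot.grps)[1]
```

These two objects would then be passed to gg2heatmap() together with dendrograms = FALSE and dist.method = "none".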

I must warn you that the function is not yet memory efficient: depending on the amount of data you want to plot, memory usage can be huge (and if it is too high, the function will crash).

To limit the risk of a crash when using gg2heatmap() on genomic-scale data, I recommend that you:

  • set raster to any of the available magick filters (for the full list, run magick::filter_types()) to decrease the size of the final plot. The Lanczos filter is the recommended one.
  • set verbose to TRUE: this helps with debugging if an error or a warning is raised.
  • set as many labeling parameters as you consider unnecessary to element_blank() (among them: axis.title.x, axis.text.x, axis.ticks.x, axis.title.y.right, axis.text.y.left, ...)
  • set na.handle to "remove" to remove all incomplete data and reduce memory usage.
  • set rank.fun to "sd" to sort the data by standard deviation.
  • use top.rows to display only the subset of the data with the highest standard deviation.
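The last three recommendations reduce the data before plotting. The base R sketch below illustrates the idea on a toy matrix; it is not the package's internal code, and the variable names are hypothetical. In particular, it assumes that incomplete data are dropped by row, which the comment above does not specify.

```r
# Toy matrix (hypothetical): 8 rows, 5 columns, one missing value.
set.seed(42)
mat <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("row", 1:8), paste0("col", 1:5)))
mat[2, 3] <- NA

# Analogue of na.handle = "remove" (assumption: rows with any NA are dropped).
complete <- mat[stats::complete.cases(mat), , drop = FALSE]

# Analogue of rank.fun = "sd": order rows by decreasing standard deviation.
row_sd <- apply(complete, 1, sd)
ranked <- complete[order(row_sd, decreasing = TRUE), , drop = FALSE]

# Analogue of top.rows: keep only the 3 most variable rows.
top <- head(ranked, 3)
```

Fewer rows and no missing cells mean fewer tiles to draw, which is where the memory saving comes from.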

Hopefully, the coming updates will make these options more stable and more memory efficient.
If you run into any problem, feel free to open a new issue.
This issue will remain open until the options for drawing this kind of plot are stable.

YoannPa commented on July 3, 2024

The gg2heatmap() function has been improved:

  • The options to facet by chromosome and to rasterize the heatmap are more stable.
  • The verbose option is more complete.
  • RAM usage has been improved overall (although the more data you use, the more RAM and processing time will still be needed).

Here is a small example of what can be achieved with gg2heatmap:

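library(BiocompR)  # provides gg2heatmap()
library(ggplot2)   # provides element_blank(), element_text() and ggsave()

facet_meth_heatmap <- gg2heatmap(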
  m = miss.sesame.cg.beta.mat, na.handle = "keep", raster = "Lanczos",
  rank.fun = "sd", ncores = 13, dist.method = "none", dendrograms = FALSE,
  lgd.space.width = 2, annot.grps = list("Chromosomes" = chromosomes),
  annot.pal = rainbow(n = 24), annot.size = 3, axis.text.x = element_blank(),
  axis.ticks.x = element_blank(), axis.title.y.left = element_text(size = 12),
  plot.title = "PCAWG missing methylation data distribution",
  row.type = "samples", x.lab = "CG probes", y.lab = "Samples",
  lgd.scale.name = "Methylation", facet = "Chromosomes", verbose = TRUE)

ggsave(filename = "Methylation_heatmap_with_missing_data.pdf",
       plot = facet_meth_heatmap, device = "pdf", width = 20, height = 10)

In the example above, I used a sparse matrix of methylation beta values, with 1,360 samples as rows and 196,923 CG probes as columns, distributed across the 22 autosomes plus the X and Y chromosomes (Y is barely visible, but it is at the far right of the plot).
Missing data were kept.
Rasterization used the Lanczos filter from the magick package.
Rows were sorted by standard deviation along the genome.
I did not cluster the data and hid all dendrograms (though one could have been applied to rows).
Only one annotation was used: "Chromosomes" = chromosomes.
I specified that the facets should be created from the "Chromosomes" annotation.
Finally, I set the verbose option to TRUE to display the step-by-step progress of the process.

[Image: gg2heatmap_on_chromosome_with_raster]

Processing the data and plotting the final heatmap took gg2heatmap() roughly 5 hours because of the rasterization (which is necessary anyway to summarize this much data).
The resulting PDF file is 80 MB (a full vector graphics file could probably exceed a dozen GB). Currently, each facet has a resolution of 1080x1080 by default. If requested, an option could be provided to reduce this default resolution in order to decrease both processing time and PDF file size.
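A rough back-of-the-envelope comparison shows why rasterization shrinks the output. It assumes 24 facets (22 autosomes plus X and Y) at the default 1080x1080 resolution mentioned above; the variable names are hypothetical, and the calculation ignores PDF encoding and compression.

```r
# Number of matrix cells a full vector plot would have to draw.
n_samples <- 1360
n_probes  <- 196923
cells <- n_samples * n_probes

# Number of pixels across all rasterized facets (assumed: 24 facets).
n_facets <- 24
pixels <- n_facets * 1080 * 1080

# How many matrix cells each raster pixel summarizes on average.
cells_per_pixel <- cells / pixels
print(c(cells = cells, pixels = pixels))
print(round(cells_per_pixel, 1))
```

Each pixel stands in for several cells on average, and a bitmap pixel is also far cheaper to store than a vector rectangle.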

(The screenshot says that "1360 CG probes" were used. This is a wording error: 1,360 samples were used as rows, with 196,923 CG probes per sample as columns.)

Tip: once the plot is saved as a PDF file, you can inspect each facet independently with a vector graphics editor such as Adobe Illustrator, or Photopea online. Every facet will appear as a bitmap object.

The feature request is now complete, so I am closing the issue.
