Comments (4)
Hi Lena,
Yes. Your request will be addressed as soon as possible.
Best.
from biocompr.
Sorry for the long delay! There is a lot of work to do on the PCAWG study.
I am now actively working on your request. This feature should be available within the next 2 weeks.
from biocompr.
Hi @Lena-Vo,
The latest update of the BiocompR package now allows users to plot heatmaps of genomic-scale data with the gg2heatmap() function.
You can do this by providing a chromosome annotation to annot.grps this way:
annot.grps = list("Chromosomes" = my_vector_of_chromosomes)
length(my_vector_of_chromosomes) must equal ncol(matrix).
Then you need to set facet to the name of your annotation (here it is "Chromosomes").
More importantly, if you want a chromosome heatmap, dendrograms must be set to FALSE and dist.method set to "none".
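As a minimal sketch of the setup described above, the snippet below builds a toy matrix and a matching chromosome vector, and checks the length requirement before plotting. The data, the helper check_chromosome_annotation() and the wrapper plot_chromosome_heatmap() are hypothetical names introduced for illustration; only the gg2heatmap() arguments named in this comment are used, and the function may require further arguments depending on the package version.

```r
# Toy matrix: 4 samples (rows) x 6 probes (columns). Hypothetical data.
set.seed(1)
beta_mat <- matrix(
  runif(24), nrow = 4,
  dimnames = list(paste0("sample", 1:4), paste0("cg", 1:6))
)

# One chromosome label per column of the matrix.
chromosomes <- c("chr1", "chr1", "chr2", "chr2", "chrX", "chrX")

# Hypothetical helper: fail early if the annotation does not match the columns.
check_chromosome_annotation <- function(mat, chromosomes) {
  if (length(chromosomes) != ncol(mat)) {
    stop("Expected ", ncol(mat), " chromosome labels, got ", length(chromosomes))
  }
  invisible(TRUE)
}

# Hypothetical wrapper: it only runs when called and needs BiocompR installed.
plot_chromosome_heatmap <- function(mat, chromosomes) {
  check_chromosome_annotation(mat, chromosomes)
  BiocompR::gg2heatmap(
    m = mat,
    annot.grps = list("Chromosomes" = chromosomes),
    facet = "Chromosomes",   # must match the annotation name above
    dendrograms = FALSE,     # required for a chromosome heatmap
    dist.method = "none"     # no clustering, so genomic order is kept
  )
}

check_chromosome_annotation(beta_mat, chromosomes)
```

Checking the length up front gives a clear error message instead of a failure somewhere inside the plotting call.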
I must warn you that the function is not yet memory efficient: depending on the amount of data you want to plot, memory usage can be huge (if it is too much, the function will crash).
To limit the risk of a crash when using gg2heatmap() on genomic-scale data, I recommend the following:
- set raster to any of the magick filters available (to get the full list, run magick::filter_types()) to decrease the size of the final plot. The Lanczos filter is the recommended one.
- set verbose to TRUE: this will be useful for debugging if an error or warning is raised.
- set as many labeling parameters as you feel are unnecessary to element_blank() (among them: axis.title.x, axis.text.x, axis.ticks.x, axis.title.y.right, axis.text.y.left, ...).
- set na.handle to "remove" to remove all incomplete data and decrease the amount of memory used.
- set rank.fun to "sd" to sort data by decreasing standard deviation.
- use top.rows to display only the subset of the data with the highest standard deviation.
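To get a feel for what the last three recommendations do to the amount of data, here is a base-R sketch that applies the same ideas by hand on a toy matrix. It is an approximation for illustration, not the package's implementation: in particular, it assumes that "incomplete data" means rows with at least one missing value, which gg2heatmap() may handle differently. The matrix and the approx_bytes() helper are hypothetical.

```r
# Toy matrix of 10 rows x 5 columns with a few missing values (hypothetical data).
set.seed(42)
mat <- matrix(runif(50), nrow = 10, dimnames = list(paste0("row", 1:10), NULL))
mat[c(2, 7), 3] <- NA

# 1. Drop incomplete rows (assumed reading of na.handle = "remove").
complete <- mat[stats::complete.cases(mat), , drop = FALSE]

# 2. Rank rows by decreasing standard deviation (the idea behind rank.fun = "sd").
row_sd <- apply(complete, 1, sd)
ranked <- complete[order(row_sd, decreasing = TRUE), , drop = FALSE]

# 3. Keep only the most variable rows (the idea behind top.rows).
top_n <- 5
reduced <- ranked[seq_len(min(top_n, nrow(ranked))), , drop = FALSE]

# Rough size of a dense matrix of doubles: 8 bytes per cell.
approx_bytes <- function(m) nrow(m) * ncol(m) * 8
print(c(before = approx_bytes(mat), after = approx_bytes(reduced)))
```

The same estimate on a real genomic-scale matrix gives a quick idea of whether the data is likely to fit in memory before calling the plotting function.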
Hopefully the coming updates will make these options more stable and more memory efficient.
If you run into any problem, feel free to submit a new issue.
This issue will remain open as long as the options to draw this kind of plot are not stable.
from biocompr.
New improvements have been made to the gg2heatmap() function:
- The options to facet by chromosome and to raster the heatmap are more stable.
- The verbose option is more complete.
- RAM usage has been improved overall (still, the more data you use, the more RAM is needed and the longer processing takes).
Here is a small example of what can be achieved with gg2heatmap():
library(BiocompR)  # provides gg2heatmap()
library(ggplot2)   # provides element_blank(), element_text() and ggsave()
# Faceted methylation heatmap: samples as rows, CG probes as columns.
facet_meth_heatmap <- gg2heatmap(
  m = miss.sesame.cg.beta.mat, na.handle = "keep", raster = "Lanczos",
  rank.fun = "sd", ncores = 13, dist.method = "none", dendrograms = FALSE,
  lgd.space.width = 2, annot.grps = list("Chromosomes" = chromosomes),
  annot.pal = rainbow(n = 24), annot.size = 3, axis.text.x = element_blank(),
  axis.ticks.x = element_blank(), axis.title.y.left = element_text(size = 12),
  plot.title = "PCAWG missing methylation data distribution",
  row.type = "samples", x.lab = "CG probes", y.lab = "Samples",
  lgd.scale.name = "Methylation", facet = "Chromosomes", verbose = TRUE)
# Save the rasterized heatmap as a PDF.
ggsave(filename = "Methylation_heatmap_with_missing_data.pdf",
       plot = facet_meth_heatmap, device = "pdf", width = 20, height = 10)
In the example above, I used a sparse matrix of methylation beta values, with 1 360 samples as rows and 196 923 CG probes as columns, distributed across the 22 autosomes plus the X and Y chromosomes (Y is barely visible, but it is at the far right of the plot).
Missing data have been kept.
Rasterization used the Lanczos filter from the magick package.
Rows have been sorted by standard deviation along the genome.
I didn't cluster the data and hid all dendrograms (though one could have been applied on rows).
Only 1 annotation was used: "Chromosomes" = chromosomes.
I specified that the facets should be created based on the "Chromosomes" annotation.
And I set the verbose option to TRUE to display the step-by-step progress of the process.
Processing the data and plotting the final heatmap took gg2heatmap() roughly 5 hours because of the rasterization (which is necessary to summarize so much data anyway).
The resulting PDF file is 80 MB (a full vector graphics file could probably exceed a dozen GB). Currently each facet has a resolution of 1080x1080 by default. If requested, an option could be provided to reduce this default resolution in order to decrease processing time and PDF file size.
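As a rough back-of-the-envelope check of why rasterization shrinks the output, the numbers quoted above can be compared directly. This sketch assumes that each of the 24 chromosomes is drawn as one facet rasterized at 1080x1080 pixels; that matches the description here but has not been checked against the package internals.

```r
n_samples <- 1360          # rows of the matrix
n_probes  <- 196923        # columns of the matrix
n_facets  <- 24            # 22 autosomes + X + Y (assumed: one facet each)
facet_px  <- 1080 * 1080   # default pixels per facet

n_cells  <- n_samples * n_probes   # cells a full vector graphic would have to draw
n_pixels <- n_facets * facet_px    # pixels stored after rasterization
ratio    <- n_cells / n_pixels     # how many cells are summarized per pixel

cat("Matrix cells:     ", format(n_cells, big.mark = " "), "\n")
cat("Rasterized pixels:", format(n_pixels, big.mark = " "), "\n")
cat("Cells per pixel:  ", round(ratio, 1), "\n")
```

A lower per-facet resolution would reduce the pixel count quadratically, which is why an option to change it could cut both processing time and file size.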
(The screenshot says that "1360 CG probes" were used. This is a wording mistake: 1 360 samples were used as rows, with 196 923 CG probes per sample as columns.)
Tip: once the plot is saved as a PDF file, each facet can be inspected independently using a vector graphics editor such as Adobe Illustrator, or Photopea online. Every facet will appear as a bitmap object.
The feature request is now complete, so I am closing the issue.
from biocompr.