alexisvdb / singlecellhaystack
Finding surprising needles (=genes) in haystacks (=single cell transcriptome data).
Home Page: https://alexisvdb.github.io/singleCellHaystack/
License: Other
show_result_haystack returns log.p.vals and log.p.adj. What is the base of the logarithm? Is this documented somewhere?
Running with 985k cells and 500GB of memory
### calling haystack_highD()...
### converting detection data from lgCMatrix to lgRMatrix
### scaling input data...
### deciding grid points...
### calculating Kullback-Leibler divergences...
|======================================================================| 100%
### performing randomizations...
*** caught segfault ***
address 0x2ab5cecb7044, cause 'memory not mapped'
Traceback:
1: asMethod(object)
2: as(.R.2.C(from), "matrix")
3: asMethod(object)
4: as(x, "matrix")
5: as.matrix.Matrix(X)
6: as.matrix(X)
7: apply(detection, 1, sum)
8: haystack_highD(x, detection = detection, use.advanced.sampling = use.advanced.sampling, dir.randomization = dir.randomization, scale = scale, grid.points = grid.points, grid.method = grid.method, ...)
9: haystack.matrix(x = scvi, detection = detect, use.advanced.sampling = gd)
10: haystack(x = scvi, detection = detect, use.advanced.sampling = gd)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault
I suggest you use Matrix::colSums and Matrix::rowSums instead of apply to do sum operations, as apply transforms the sparse matrix into a full matrix.
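As a rough illustration of that suggestion, the sketch below builds a small toy detection matrix (the sizes and the object name `detection` are made up for the example, not taken from the package) and compares the sparse sums with the dense `apply` route. It assumes the Matrix package is installed.

```r
library(Matrix)

set.seed(1)
n_genes <- 200
n_cells <- 1000

# Toy sparse logical detection matrix (genes x cells) with 5000 TRUE entries.
idx <- sample(n_genes * n_cells, 5000)
detection <- sparseMatrix(
  i = (idx - 1) %% n_genes + 1,
  j = (idx - 1) %/% n_genes + 1,
  x = rep(TRUE, length(idx)),
  dims = c(n_genes, n_cells)
)

# Sparse-aware sums: no dense copy of the matrix is made.
genes_detected <- Matrix::rowSums(detection)  # detections per gene
cells_detected <- Matrix::colSums(detection)  # detections per cell

# Dense route for comparison; only feasible here because the toy matrix is small.
dense_rows <- apply(as.matrix(detection), 1, sum)
dense_cols <- apply(as.matrix(detection), 2, sum)
```

Both routes give the same counts on the toy data; the difference is that the dense conversion needs memory for every entry of the matrix, which is what fails at ~985k cells.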
I'm running Haystack with 985k cells. The "calculating Kullback-Leibler divergences" step took about 8 hours. After another 4 hours, the randomizations step was at around 30 percent; 12 hours after that, it was at 48%. I think it took about 4 hours to go from 46% to 48%.
It seems like the randomization step slows down as it progresses. My understanding was that this step should proceed at a roughly constant rate, since it is essentially just picking (semi-)random genes to compare the DKL results against?
Anyway, I'm going to have to restart the job, as I am running it on an HPC node with a walltime limit of 36 hours. I just wanted to check whether my observation makes sense.
The content of singleCellHaystack-package.Rd is outdated (for example, still says "haystack" instead of "singleCellHaystack"). It should be updated.
This is a nice way for me to give you comments using GitHub tools without polluting the code with them. You can browse the code on GitHub, choose a line, and then copy a permalink or even directly create an issue that you can then review. Once this is "done", you can close it, which helps you keep track of things.
My comment:
When I tried your package, it complained that the function bs was not found. You need to add a dependency on the splines package so that you get access to its NAMESPACE (i.e. to its exported functions). To do so, add a line to DESCRIPTION (maybe on top of the line I referred to in this issue):
Imports: splines
and then in NAMESPACE:
import(splines)
The line in DESCRIPTION states that your package uses splines. The line in NAMESPACE imports all of the functions that splines exports. You may want to import just one function, in which case you use instead:
importFrom(splines, bs)
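For reference, a minimal sketch of what the fix enables: once splines is listed under Imports, package code can also call the function with an explicit namespace prefix, which works even without an import directive. The input vector and the choice of `df = 5` are illustrative only, not taken from the package.

```r
# Illustrative input: 50 evenly spaced points.
x <- seq(0, 1, length.out = 50)

# Explicit namespace call; needs splines to be installed (it ships with R).
basis <- splines::bs(x, df = 5)

# One row per input value, one column per basis function.
dim(basis)
```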
Tried using the sparse branch on ~985k cells and ... it appears that some matrix operations are still occurring.
*** caught segfault ***
address 0x2ab7a9216380, cause 'memory not mapped'
Traceback:
1: asMethod(object)
2: as(x, "matrix")
3: as.matrix.Matrix(X)
4: as.matrix(X)
5: apply(detect, 2, sum)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault
I've installed the sparse branch as follows (I have "0.3.2", but I am not certain whether the version numbers have any significance on this branch). Is there another way you would suggest to check that I've installed the correct version?
> remotes::install_github("alexisvdb/singleCellHaystack", branch = 'sparse')
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] singleCellHaystack_0.3.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 assertthat_0.2.1 dplyr_0.8.5 crayon_1.3.4
[5] plyr_1.8.6 grid_3.6.3 R6_2.4.1 lifecycle_0.2.0
[9] gtable_0.3.0 magrittr_1.5 scales_1.1.1 ggplot2_3.3.0
[13] pillar_1.4.4 stringi_1.4.5 rlang_0.4.6 reshape2_1.4.4
[17] vctrs_0.3.0 ellipsis_0.3.0 splines_3.6.3 tools_3.6.3
[21] stringr_1.4.0 glue_1.4.1 purrr_0.3.4 munsell_0.5.0
[25] compiler_3.6.3 pkgconfig_2.0.3 colorspace_1.4-1 tidyselect_1.1.0
[29] tibble_3.0.1
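One possible way to check which branch an installed copy came from, sketched under an assumption: packages installed with remotes::install_github usually have extra fields such as RemoteRef and RemoteSha recorded in their installed DESCRIPTION. The helper name `installed_source` is hypothetical, and the call below uses the base package stats only so that the snippet runs anywhere; for this issue the argument would be "singleCellHaystack".

```r
# Hypothetical helper: report the version and, if recorded, the Git ref and
# commit that an installed package was built from.
installed_source <- function(pkg) {
  desc <- utils::packageDescription(pkg)
  fields <- c("Version", "RemoteRef", "RemoteSha")
  vapply(fields, function(f) {
    value <- desc[[f]]
    # Fields that are not recorded (e.g. for CRAN or base packages) become NA.
    if (is.null(value)) NA_character_ else as.character(value)
  }, character(1))
}

installed_source("stats")
```

If the assumption holds, RemoteRef would read "sparse" for an install from that branch, and RemoteSha can be compared with the latest commit on the branch.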
Hello!
Thank you for providing such a convenient analysis tool! However, I ran into a problem when using singleCellHaystack.
When I run res.pc20 <- haystack(x = dat.pca, expression = dat.expression) from the tutorial https://alexisvdb.github.io/singleCellHaystack/articles/examples/a02_example_scRNAseq.html using the provided example data, I receive an error: the parameter is not valid. I receive the same error when using my own data.
My running environment is R 4.2.1, and all the dependency packages have been installed.
Looking forward to your reply. Thanks a lot!
Add Travis integration to have automatic testing when we push changes to the repository.