alexisvdb / singlecellhaystack

Finding surprising needles (=genes) in haystacks (=single cell transcriptome data).

Home Page: https://alexisvdb.github.io/singleCellHaystack/

License: Other

R 100.00%

bioinformatics single-cell r transcriptomics spatial-transcriptomics cite-seq pseudotime scatac-seq spatial-proteomics

singlecellhaystack's Issues

base of the logarithm

show_result_haystack returns log.p.vals and log.p.adj. What is the base of the logarithm? Is this documented somewhere?
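The issue does not state which base is used, so the sketch below deliberately avoids assuming one. It shows how reported log p-values map back to p-values under each candidate base, and how to convert between the two scales with log10(p) = ln(p) / ln(10). The values in log_p are made up for illustration.

```r
# Hypothetical log p-values, as they might appear in a log.p.vals column.
log_p <- c(-2, -5, -10)

# If the logarithm is base 10, the p-values are 10^log_p.
p_if_base10 <- 10^log_p

# If it is the natural logarithm, the p-values are exp(log_p).
p_if_natural <- exp(log_p)

# Converting a natural-log value to base 10: log10(p) = ln(p) / ln(10).
ln_p <- log(p_if_base10)
stopifnot(isTRUE(all.equal(ln_p / log(10), log_p)))
```

Both conversions yield values between 0 and 1 for negative inputs, so the base cannot be inferred from the numbers alone; it has to come from the documentation or the source code.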

apply in randomization step

Running with 985k cells and 500GB of memory

### calling haystack_highD()...
### converting detection data from lgCMatrix to lgRMatrix
### scaling input data...
### deciding grid points...
### calculating Kullback-Leibler divergences...
  |======================================================================| 100%
### performing randomizations...

 *** caught segfault ***
address 0x2ab5cecb7044, cause 'memory not mapped'

Traceback:
 1: asMethod(object)
 2: as(.R.2.C(from), "matrix")
 3: asMethod(object)
 4: as(x, "matrix")
 5: as.matrix.Matrix(X)
 6: as.matrix(X)
 7: apply(detection, 1, sum)
 8: haystack_highD(x, detection = detection, use.advanced.sampling = use.advanced.sampling,     dir.randomization = dir.randomization, scale = scale, grid.points = grid.points,     grid.method = grid.method, ...)
 9: haystack.matrix(x = scvi, detection = detect, use.advanced.sampling = gd)
10: haystack(x = scvi, detection = detect, use.advanced.sampling = gd)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault

I suggest using Matrix::colSums() and Matrix::rowSums() instead of apply() for these sums, because apply() converts the sparse matrix into a dense (full) matrix.
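A minimal sketch of the suggested change on a small random sparse matrix. The data are hypothetical, and the name detection only mirrors the traceback. Both approaches give the same sums, but apply() first goes through as.matrix(), which is the dense conversion that crashes in the traceback at 985k cells.

```r
library(Matrix)

# Hypothetical sparse logical detection matrix (genes x cells), ~5% non-zero.
set.seed(1)
detection <- rsparsematrix(nrow = 200, ncol = 1000, density = 0.05) != 0

# apply() coerces the whole object with as.matrix() before summing,
# so memory use grows with nrow * ncol regardless of sparsity.
row_apply <- apply(detection, 1, sum)
col_apply <- apply(detection, 2, sum)

# Matrix::rowSums() / Matrix::colSums() work on the sparse representation.
row_sparse <- Matrix::rowSums(detection)
col_sparse <- Matrix::colSums(detection)

# Same results; only the memory footprint differs.
stopifnot(
  isTRUE(all.equal(as.numeric(row_apply), as.numeric(row_sparse))),
  isTRUE(all.equal(as.numeric(col_apply), as.numeric(col_sparse)))
)
```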

randomization non-linear runtime with >> numbers of input cells?

I'm running Haystack with 985k cells. Computing the Kullback-Leibler divergences took about 8 hours. After another 4 hours, the randomization step was at around 30%. Twelve hours later it was at 48%, and going from 46% to 48% alone took about 4 hours.
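The reported checkpoints make the slowdown concrete. Taking the start of the randomization step as hour 0 (it began after the roughly 8-hour Kullback-Leibler step), the average progress rate over each interval is:

```r
# Checkpoints from the report: hours since the randomization step started,
# and the percentage completed at that time.
hours    <- c(0, 4, 16)
progress <- c(0, 30, 48)

# Average rate (percent per hour) over each interval: 7.5, then 1.5.
rates <- diff(progress) / diff(hours)
rates

# Last reported interval: 46% -> 48% in about 4 hours, i.e. 0.5 percent per hour.
last_rate <- (48 - 46) / 4
last_rate
```

At the initial rate of 7.5% per hour, the whole step would have taken roughly 13 hours, so a 15-fold drop in rate is well beyond run-to-run noise.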

It seems that the randomization step slows down as it progresses. My understanding was that this step should proceed at a roughly constant rate, since it essentially just picks (semi-)random genes to compare the DKL results against. Is that correct?

Anyway, I'll have to restart the job, as I'm running it on an HPC node with a 36-hour walltime limit. I just wanted to check whether my observation makes sense.

singleCellHaystack-package.Rd needs update

The content of singleCellHaystack-package.Rd is outdated (for example, still says "haystack" instead of "singleCellHaystack"). It should be updated.

Missing splines dependency.

This is a convenient way to give you comments using GitHub tools without cluttering the code with them. You can browse the code on GitHub, select a line, and then copy a permalink or directly create an issue that you can review later. Once an issue is resolved, you can close it, which helps you keep track of things.

My comment:

When I tried your package, it complained that the function bs could not be found. You need to declare a dependency on the splines package so that you have access to its NAMESPACE (i.e., its exported functions). To do so, add a line to DESCRIPTION (perhaps above the line I referenced in this issue):

Imports: splines

and then in NAMESPACE:

import(splines)

The DESCRIPTION entry declares that your package uses splines; the NAMESPACE directive imports everything splines exports. If you only need one function, use this instead:

importFrom(splines, bs)
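A related option, sketched here under the assumption that bs() is the only function needed: call it with a fully qualified name, splines::bs(), which works without any import directive in NAMESPACE. The package must still be listed under Imports in DESCRIPTION, otherwise R CMD check flags an undeclared dependency. The input x below is hypothetical.

```r
# Hypothetical input values for a B-spline basis.
x <- seq(0, 1, length.out = 20)

# Fully qualified call: no import(splines) or importFrom(splines, bs) needed,
# but splines must still appear under Imports in DESCRIPTION.
basis <- splines::bs(x, df = 4)

dim(basis)  # 20 rows (one per value of x), 4 columns (df)
```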

https://github.com/alexisvdb/single-cell-haystack/blob/66dfd1d37c9d210eb2f5de0816b11e41f02302e0/DESCRIPTION#L12

https://github.com/alexisvdb/single-cell-haystack/blob/66dfd1d37c9d210eb2f5de0816b11e41f02302e0/NAMESPACE#L2

Sparse matrix branch not sparse?

I tried the sparse branch on ~985k cells, and it appears that some operations still convert the sparse matrix into a dense one:


 *** caught segfault ***
address 0x2ab7a9216380, cause 'memory not mapped'

Traceback:
 1: asMethod(object)
 2: as(x, "matrix")
 3: as.matrix.Matrix(X)
 4: as.matrix(X)
 5: apply(detect, 2, sum)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault

I installed the sparse branch as follows. I have version 0.3.2, but I'm not certain whether version numbers are meaningful on this branch. Is there another way you would suggest to check that I've installed the correct version?

> remotes::install_github("alexisvdb/singleCellHaystack", branch = 'sparse')
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] singleCellHaystack_0.3.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       assertthat_0.2.1 dplyr_0.8.5      crayon_1.3.4
 [5] plyr_1.8.6       grid_3.6.3       R6_2.4.1         lifecycle_0.2.0
 [9] gtable_0.3.0     magrittr_1.5     scales_1.1.1     ggplot2_3.3.0
[13] pillar_1.4.4     stringi_1.4.5    rlang_0.4.6      reshape2_1.4.4
[17] vctrs_0.3.0      ellipsis_0.3.0   splines_3.6.3    tools_3.6.3
[21] stringr_1.4.0    glue_1.4.1       purrr_0.3.4      munsell_0.5.0
[25] compiler_3.6.3   pkgconfig_2.0.3  colorspace_1.4-1 tidyselect_1.1.0
[29] tibble_3.0.1
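One way to check which branch was installed, assuming the package was installed with remotes::install_github(): remotes records the source in the installed DESCRIPTION file, including the RemoteRef (branch) and RemoteSha (commit) fields. The helper install_source() below is hypothetical; packages that did not come from a remote, such as base R's stats, simply lack those fields.

```r
# Hypothetical helper: report the version and remote source of an installed package.
install_source <- function(pkg) {
  desc <- utils::packageDescription(pkg)
  # Fields written by remotes::install_github(); absent (NULL) otherwise.
  field <- function(name) {
    value <- desc[[name]]
    if (is.null(value)) NA_character_ else value
  }
  list(
    version = as.character(utils::packageVersion(pkg)),
    ref     = field("RemoteRef"),
    sha     = field("RemoteSha")
  )
}

# Base R's stats package has a version but no remote fields.
str(install_source("stats"))

# For the installation in this issue, ref would be expected to be "sparse":
# install_source("singleCellHaystack")
```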

problem when running haystack

Hello!
Thank you for providing such a convenient analysis tool! However, I ran into some problems when using singleCellHaystack.
When I run res.pc20 <- haystack(x = dat.pca, expression = dat.expression) from this tutorial (https://alexisvdb.github.io/singleCellHaystack/articles/examples/a02_example_scRNAseq.html) on the provided example data, I get the error "the parameter is not valid". I get the same error with my own data.

My running environment is R 4.2.1, and all the dependency packages have been installed.
Looking forward to your reply. Thanks a lot!

Add GitHub actions integration

Add Travis integration so that tests run automatically whenever we push changes to the repository.
