simonetiberi / distinct

distinct: a method for differential analyses via hierarchical permutation tests

distinct's Introduction

distinct is a statistical method for differential testing between two or more groups of distributions; testing is performed via non-parametric permutation tests on the cumulative distribution functions (cdfs) of each sample. distinct is a general and flexible tool: because it is fully non-parametric and makes no assumptions about how the data were generated, it can be applied to a variety of datasets. It is particularly suitable for differential state analyses on single-cell data (i.e., differential analyses within sub-populations of cells), such as single-cell RNA sequencing (scRNA-seq) and high-dimensional flow or mass cytometry (HDCyto) data. The method also allows for nuisance covariates (such as batch effects).

Simone Tiberi, Helena L Crowell, Pantelis Samartsidis, Lukas M Weber, and Mark D Robinson (2023).

distinct: a novel approach to differential distribution analyses.

The Annals of Applied Statistics.

Bioconductor installation

distinct is available on Bioconductor and can be installed with the command:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("distinct")

Vignette

The vignette illustrating how to use the package can be accessed on Bioconductor or from R via:

vignette("distinct")

browseVignettes("distinct")

distinct's Issues

Replicate sample naming guidelines

Hey Simone-
Giving this package a try now with our scTransform counts as we briefly discussed on Twitter. One thing I wanted to make sure of was whether the naming scheme of my replicates mattered at all. The documentation is not very clear on this. Currently, my samples are named things like "e14-WT9-2" or "p0-mutant1-1". Does that matter at all, or as long as it is paired correctly with my condition information in the colData table is there no important naming convention?

Thanks!

Jeremy

Result data frame is NULL

Hi Simone,

Thank you for this great tool.

I followed your vignette and tried to run a simple analysis comparing tumor versus normal cells from 3 samples.

Everything went smoothly up to this point:


set.seed(61217)
res <-  distinct_test(x = sce_lms_endo, 
                    name_assays_expression = "logcounts",
                    name_cluster = "classification_2",
                    name_sample = "sample",
                    design = design,
                    column_to_test = 2,
                    min_non_zero_cells = 20,
                    n_cores = 4)

2 groups of samples provided
Covariates detected
Data loaded, starting differential testing
Differential testing completed, returning results

Then, when I tried to calculate the log2FC, I got this error:


res = log2_FC(res = res,
               x = sce_lms_endo, 
               name_assays_expression = "cpm",
               name_group = "classification",
               name_cluster = "classification_2")

Error in log2_FC(res = res, x = sce_lms_endo, name_assays_expression = "cpm",  : 
  is.data.frame(res) is not TRUE

It seems like the "res" data.frame was not created when running distinct_test.

Could you provide some insight in why this might be happening?

Thank you,
Stefano

design matrix reference group

Hi Simone,
I am using distinct for my single-cell RNAseq data.
I specified the factors in my design:

print(design)
          (Intercept) grouprefractory_CR_C4D1
P105_CD3-           1                       0
P107_CD3-           1                       1
P138_CD3-           1                       0
P140_CD3-           1                       1

but the logFC result chooses C4D1 as a reference group.

log2FC_refractory_CR_C1D1/refractory_CR_C4D1

I changed the group levels both ways, it always chooses C4D1 as the reference.
Am I missing anything?

Thanks!
Tommy
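
For readers following this question, here is a minimal sketch of how base R's model.matrix() picks the reference level of a factor. It only illustrates how the design column above comes about; it does not show how log2_FC labels its ratio, which is not documented in this thread. The sample table is hypothetical, with group assignments inferred from the 0/1 column printed above.

```r
# Hypothetical sample table mirroring the four samples printed above.
samples <- data.frame(
  sample_id = c("P105_CD3-", "P107_CD3-", "P138_CD3-", "P140_CD3-"),
  group = c("refractory_CR_C1D1", "refractory_CR_C4D1",
            "refractory_CR_C1D1", "refractory_CR_C4D1")
)

# The first factor level is the reference (baseline) of the design matrix.
samples$group <- factor(samples$group,
                        levels = c("refractory_CR_C1D1", "refractory_CR_C4D1"))
design <- model.matrix(~ group, data = samples)
rownames(design) <- samples$sample_id
colnames(design)

# Releveling flips the baseline, and the coefficient column is renamed.
samples$group <- relevel(samples$group, ref = "refractory_CR_C4D1")
design_flipped <- model.matrix(~ group, data = samples)
colnames(design_flipped)
```

With C1D1 as the first level, the second design column is named after C4D1 and equals 1 for the C4D1 samples, which matches the printed design. After relevel(), the column is named after C1D1 instead.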

log2_FC error: Error in pb_2[, i] : subscript out of bounds

I'm testing distinct on a new dataset, and ran the main distinct_test with seemingly no issues. However, when I ran log2_FC, I got this error:

Error in pb_2[, i] : subscript out of bounds

After some testing, I determined that the issue is because two of my clusters are unique to one of my conditions:

Cluster  trt-Rep1  trt-Rep2  trt-Rep3  trt-Rep4  trt-Rep5  ctl-Rep1  ctl-Rep2  ctl-Rep3  ctl-Rep4  ctl-Rep5
38       18        0         1         45        1         0         0         0         0         0
39       15        1         0         38        0         0         0         0         0         0

and thus the dimensions of pb_1 and pb_2 are not equal.

> dim(pb_1)
[1] 39469    43
> dim(pb_2)
[1] 39469    41

I can get around this error by doing something like:
cluster_levels = intersect(colnames(pb_1),colnames(pb_2)) instead of cluster_levels = colnames(pb_1) on line 124 of this function.

Can you make this change, or add in a check/subset earlier on to make sure it doesn't fail here?

Thanks!
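
The workaround described above can be sketched on toy data as follows. The matrices here are made up and much smaller than the real pb_1 and pb_2; only the names pb_1, pb_2 and cluster_levels come from the report, and the message about dropped clusters is an illustrative addition, not part of the package.

```r
# Toy pseudo-bulk matrices (genes x clusters); names and values are made up.
pb_1 <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("gene", 1:3), c("1", "2", "38", "39")))
pb_2 <- matrix(1:6, nrow = 3,
               dimnames = list(paste0("gene", 1:3), c("1", "2")))

# Indexing pb_2 by every cluster of pb_1 would fail for "38" and "39";
# restricting to shared clusters avoids the out-of-bounds subscript.
cluster_levels <- intersect(colnames(pb_1), colnames(pb_2))
dropped <- setdiff(union(colnames(pb_1), colnames(pb_2)), cluster_levels)
if (length(dropped) > 0) {
  message("Skipping clusters missing from one group: ",
          paste(dropped, collapse = ", "))
}

# Subset both matrices to the shared clusters, keeping matrix shape.
pb_1 <- pb_1[, cluster_levels, drop = FALSE]
pb_2 <- pb_2[, cluster_levels, drop = FALSE]
```

After subsetting, both matrices have the same columns in the same order, so a per-cluster loop over cluster_levels can index either one safely.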

Mat::init(): requested size is too large

Just tried running the newest GitHub version (v1.1.3) and am getting the error below:

res_e14_logct = distinct_test(x = e14.sce,
                              name_assays_expression = "logcounts",
                              name_cluster = "ident",
                              design = design,
                              name_sample = "orig.ident",
                              P = 10^3,
                              min_non_zero_cells = 20,
                              n_cores = 4)
2 groups of samples provided
Data loaded, starting differential testing
Error in { : task 1 failed - "Mat::init(): requested size is too large"

Note that I changed environments and this is from an instance on a Mac, running OS X 10.13.6.

Edited to add that the same is true for a fresh Linux install, whether I use 4 cores or just 1.

Also note that I do not get this error for the test dataset, that seems to complete just fine.

Add log-fold-changes to res output table

I don't currently see this as an option, but it would be nice to be able to separate the results table into up- and down-regulated genes. Currently it seems I would need to compute my own LFCs. Can you add this calculation and include it in the output table?
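
As a rough illustration of the manual calculation mentioned above, the following sketch computes a log2 fold change between two groups on a toy expression matrix and splits genes by direction. This is an assumption-laden example, not the package's method: the matrix, the group labels and the pseudocount of 1 are all made up, and it handles a single cluster only.

```r
# Toy normalized expression (genes x samples); all values are made up.
expr <- matrix(c(10, 12, 40, 44,
                  8,  8,  2,  2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"),
                               c("ctl1", "ctl2", "trt1", "trt2")))
group <- factor(c("ctl", "ctl", "trt", "trt"), levels = c("ctl", "trt"))

# Mean expression per group, then log2 ratio of treatment over control.
# The pseudocount of 1 is an arbitrary choice to avoid log2(0).
mean_ctl <- rowMeans(expr[, group == "ctl", drop = FALSE])
mean_trt <- rowMeans(expr[, group == "trt", drop = FALSE])
log2fc <- log2((mean_trt + 1) / (mean_ctl + 1))

# Label each gene by the sign of its fold change.
direction <- ifelse(log2fc > 0, "up", "down")
```

Here geneA has a higher mean in the treatment samples and is labeled "up", while geneB is labeled "down"; the same labels could be joined to a results table by gene name.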

object '.doSnowGlobals' not found

I can run distinct_test() on one core with no issue, but if I specify n_cores=4 I get an error:

object '.doSnowGlobals' not found

To test, I re-installed doParallel to my typical custom Rlibs directory, which also appears at the top of my .libPaths():

.libPaths()
[1] "/proj/jmsimon/Rlibs40_new"            
[2] "/proj/jmsimon/R-4.0.2/lib64/R/library"

but the error persisted.

This thread suggests there's an issue with worker threads not having access to doParallel, but since it's at the top of my .libPaths(), I'm not sure what else I can do to ensure that.
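
One way to check whether worker processes see the same library directories as the main session is sketched below, using only the base parallel package. It exports the main session's library paths through the R_LIBS environment variable before any workers are started, then asks each worker for its .libPaths(). Whether this resolves the error reported here is not confirmed in this thread, and how distinct_test starts its own workers is not shown, so treat this as a diagnostic under those assumptions.

```r
library(parallel)

# Export the main session's library paths through R_LIBS so that worker
# processes started afterwards inherit them at startup.
Sys.setenv(R_LIBS = paste(.libPaths(), collapse = .Platform$path.sep))

# Start a small PSOCK cluster and ask each worker which paths it sees.
# clusterEvalQ evaluates the expression on the worker itself.
cl <- makeCluster(2)
worker_paths <- clusterEvalQ(cl, .libPaths())
stopCluster(cl)

# Check that every path of the main session is visible on each worker.
all_visible <- vapply(worker_paths,
                      function(p) all(.libPaths() %in% p),
                      logical(1))
all_visible
```

If any entry of all_visible is FALSE, the workers are searching a different set of directories than the main session, which would be consistent with a package being found on one side but not the other.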
