distinct's Introduction

distinct: a method for differential analyses via hierarchical permutation tests

distinct is a statistical method for differential testing between two or more groups of distributions; testing is performed via non-parametric permutation tests on the cumulative distribution functions (cdfs) of each sample. distinct is a general and flexible tool: because it is fully non-parametric and makes no assumptions about how the data were generated, it can be applied to a variety of datasets. It is particularly suitable for differential state analyses on single-cell data (i.e., differential analyses within sub-populations of cells), such as single-cell RNA sequencing (scRNA-seq) and high-dimensional flow or mass cytometry (HDCyto) data. The method also allows for nuisance covariates (such as batch effects).
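To make the idea of a permutation test on cdfs concrete, here is a minimal base-R sketch on simulated data. It is only an illustration of the general technique: it compares two groups of values directly, ignores the sample hierarchy and covariates, and uses a test statistic (mean absolute difference between empirical cdfs on a grid) chosen for simplicity. It should not be read as the algorithm implemented in distinct; all object names are hypothetical.

```r
# Toy two-group permutation test on empirical cdfs (base R only).
# NOT the distinct algorithm: no hierarchy over samples, no covariates.
set.seed(1)
group_a <- rnorm(200, mean = 0)    # simulated values, group A
group_b <- rnorm(200, mean = 0.5)  # simulated values, group B

# Test statistic: mean absolute difference between the two ecdfs on a grid.
cdf_distance <- function(x, y, grid) {
  mean(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

pooled <- c(group_a, group_b)
grid <- quantile(pooled, probs = seq(0.05, 0.95, by = 0.05))
observed <- cdf_distance(group_a, group_b, grid)

# Null distribution: shuffle the group labels and recompute the statistic.
n_perm <- 1000
n_a <- length(group_a)
permuted <- replicate(n_perm, {
  idx <- sample(length(pooled), n_a)
  cdf_distance(pooled[idx], pooled[-idx], grid)
})

# Permutation p-value with the usual +1 correction.
p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
print(p_value)
```

The +1 correction keeps the p-value strictly positive, which matters when the observed statistic exceeds every permuted one.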

Simone Tiberi, Helena L Crowell, Pantelis Samartsidis, Lukas M Weber, and Mark D Robinson (2023). distinct: a novel approach to differential distribution analyses. The Annals of Applied Statistics.

Bioconductor installation

distinct is available on Bioconductor and can be installed with the command:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("distinct")

Vignette

The vignette illustrating how to use the package can be accessed on Bioconductor or from R via:

vignette("distinct")

or

browseVignettes("distinct")

distinct's People

Contributors

hpages, nturaga, simonetiberi

distinct's Issues

Replicate sample naming guidelines

Hey Simone-
Giving this package a try now with our scTransform counts as we briefly discussed on Twitter. One thing I wanted to make sure of was whether the naming scheme of my replicates mattered at all. The documentation is not very clear on this. Currently, my samples are named things like "e14-WT9-2" or "p0-mutant1-1". Does that matter at all, or as long as it is paired correctly with my condition information in the colData table is there no important naming convention?

Thanks!

Jeremy
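A minimal sketch of one way to keep sample names and the design matrix consistent, assuming (this thread does not confirm it) that the rows of the design matrix are matched to samples by name rather than by position. The sample names, groups, and the `cell_samples` vector standing in for the per-cell sample column of colData are all made up for illustration.

```r
# Hypothetical sample table; names and groups are invented for this sketch.
sample_info <- data.frame(
  sample_id = c("e14-WT9-1", "e14-WT9-2", "e14-mutant1-1", "e14-mutant1-2"),
  group = factor(c("WT", "WT", "mutant", "mutant"), levels = c("WT", "mutant")),
  stringsAsFactors = FALSE
)

# Build the design matrix and label its rows with the sample names.
design <- model.matrix(~ group, data = sample_info)
rownames(design) <- sample_info$sample_id

# Stand-in for the per-cell sample column of colData(x) (assumption).
cell_samples <- rep(sample_info$sample_id, times = c(3, 2, 4, 1))

# Every sample seen in the cells should have exactly one row in the design.
stopifnot(setequal(unique(cell_samples), rownames(design)))
print(design)
```

Under that assumption the exact spelling of the names is arbitrary; what matters is that the same strings appear in both places.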

Result data frame is NULL

Hi Simone,

Thank you for this great tool.

I followed your vignette and tried to run a simple analysis comparing tumor versus normal cells from 3 samples.

Everything went smoothly up to this point:


set.seed(61217)
res <-  distinct_test(x = sce_lms_endo, 
                    name_assays_expression = "logcounts",
                    name_cluster = "classification_2",
                    name_sample = "sample",
                    design = design,
                    column_to_test = 2,
                    min_non_zero_cells = 20,
                    n_cores = 4)

2 groups of samples provided
Covariates detected
Data loaded, starting differential testing
Differential testing completed, returning results

Then, when I tried to calculate the log2FC, I got this error:


res = log2_FC(res = res,
               x = sce_lms_endo, 
               name_assays_expression = "cpm",
               name_group = "classification",
               name_cluster = "classification_2")

Error in log2_FC(res = res, x = sce_lms_endo, name_assays_expression = "cpm",  : 
  is.data.frame(res) is not TRUE

It seems like the "res" data.frame was not created when running distinct_test.

Could you provide some insight into why this might be happening?

Thank you,
Stefano

design matrix reference group

Hi Simone,
I am using distinct for my single-cell RNAseq data.
I specified the factors in my design:

print(design)
          (Intercept) grouprefractory_CR_C4D1
P105_CD3-           1                       0
P107_CD3-           1                       1
P138_CD3-           1                       0
P140_CD3-           1                       1

but the logFC result chooses C4D1 as a reference group.

log2FC_refractory_CR_C1D1/refractory_CR_C4D1

I changed the group levels both ways, it always chooses C4D1 as the reference.
Am I missing anything?

Thanks!
Tommy
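For reference, a base-R sketch of how factor levels determine the reference group in a design matrix built with model.matrix. It reproduces the design printed above and then re-levels the factor. It only shows what model.matrix does; whether log2_FC follows the design matrix or orders groups some other way is not established in this thread.

```r
# Group labels and sample names taken from the design printed above.
samples <- c("P105_CD3-", "P107_CD3-", "P138_CD3-", "P140_CD3-")
group <- factor(c("refractory_CR_C1D1", "refractory_CR_C4D1",
                  "refractory_CR_C1D1", "refractory_CR_C4D1"))

# Default: levels are sorted alphabetically, so C1D1 is the reference
# and the second column is the indicator for C4D1.
design_default <- model.matrix(~ group)
rownames(design_default) <- samples
print(colnames(design_default))

# Re-level so that C4D1 is the reference; the indicator is now for C1D1.
group <- relevel(group, ref = "refractory_CR_C4D1")
design_releveled <- model.matrix(~ group)
rownames(design_releveled) <- samples
print(colnames(design_releveled))
```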

log2_FC error: Error in pb_2[, i] : subscript out of bounds

I'm testing distinct on a new dataset, and ran the main distinct_test with seemingly no issues. However, when I ran log2_FC, I got this error:

Error in pb_2[, i] : subscript out of bounds

After some testing, I determined that the issue is because two of my clusters are unique to one of my conditions:

Cluster	trt-Rep1	trt-Rep2	trt-Rep3	trt-Rep4	trt-Rep5	ctl-Rep1	ctl-Rep2	ctl-Rep3	ctl-Rep4	ctl-Rep5
38	18	0	1	45	1	0	0	0	0	0
39	15	1	0	38	0	0	0	0	0	0

and thus the dimensions of pb_1 and pb_2 are not equal.

> dim(pb_1)
[1] 39469    43
> dim(pb_2)
[1] 39469    41

I can get around this error by doing something like:
cluster_levels = intersect(colnames(pb_1),colnames(pb_2)) instead of cluster_levels = colnames(pb_1) on line 124 of this function.

Can you make this change, or add in a check/subset earlier on to make sure it doesn't fail here?

Thanks!
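The proposed workaround can be illustrated on toy matrices. This is a sketch only: pb_1 and pb_2 here are small made-up stand-ins for the pseudo-bulk matrices inside log2_FC, with genes in rows and clusters in columns, and the cluster names are hypothetical.

```r
# Toy stand-ins: pb_1 has two clusters that are absent from pb_2.
genes <- paste0("gene", 1:3)
pb_1 <- matrix(1:12, nrow = 3,
               dimnames = list(genes, c("c1", "c2", "c38", "c39")))
pb_2 <- matrix(1:6, nrow = 3,
               dimnames = list(genes, c("c1", "c2")))

# Keep only the clusters present in both conditions.
cluster_levels <- intersect(colnames(pb_1), colnames(pb_2))
pb_1_shared <- pb_1[, cluster_levels, drop = FALSE]
pb_2_shared <- pb_2[, cluster_levels, drop = FALSE]

# Report which clusters were dropped, so the omission is not silent.
dropped <- setdiff(union(colnames(pb_1), colnames(pb_2)), cluster_levels)
print(dropped)
```

Using drop = FALSE keeps the result a matrix even when only one cluster is shared.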

Mat::init(): requested size is too large

Just tried running the newest github version (v1.1.3) and am getting this error below:

res_e14_logct = distinct_test(x = e14.sce,
                              name_assays_expression = "logcounts",
                              name_cluster = "ident",
                              design = design,
                              name_sample = "orig.ident",
                              P = 10^3,
                              min_non_zero_cells = 20,
                              n_cores = 4)
2 groups of samples provided
Data loaded, starting differential testing
Error in { : task 1 failed - "Mat::init(): requested size is too large"

Note that I changed environments and this is from an instance on a Mac, running OS X 10.13.6.

Edited to add that the same is true for a fresh linux install, and true whether I use 4 cores or just 1.

Also note that I do not get this error for the test dataset, that seems to complete just fine.

Add log-fold-changes to res output table

I don't currently see this as an option, but it would be nice to be able to separate the results table into up- and down-regulated genes. Currently it seems I would need to compute my own LFCs. Can you add this calculation and include it in the output table?
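A rough base-R sketch of what a manual calculation might look like, assuming per-gene mean CPM values for each group are already available. The data frame, its column names, and the pseudo-count are hypothetical choices for illustration and are not taken from the package.

```r
# Hypothetical per-gene mean CPM in each of two groups.
mean_cpm <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  group1 = c(10, 200, 50),
  group2 = c(40, 50, 50),
  stringsAsFactors = FALSE
)

# Pseudo-count to avoid taking log2 of zero; the value 1 is arbitrary.
pseudo <- 1
mean_cpm$log2FC <- log2((mean_cpm$group1 + pseudo) / (mean_cpm$group2 + pseudo))

# Split genes by the sign of the log2 fold change (group1 relative to group2).
up <- mean_cpm$gene[mean_cpm$log2FC > 0]
down <- mean_cpm$gene[mean_cpm$log2FC < 0]
print(mean_cpm)
```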

object '.doSnowGlobals' not found

I can run distinct_test() on one core with no issue, but if I specify n_cores=4 I get an error:

object '.doSnowGlobals' not found

To test, I re-installed doParallel to my typical custom Rlibs directory, which also appears at the top of my .libPaths():

.libPaths()
[1] "/proj/jmsimon/Rlibs40_new"            
[2] "/proj/jmsimon/R-4.0.2/lib64/R/library"

but the error persisted.

This thread suggests there's an issue with worker threads not having access to doParallel, but since it's at the top of my .libPaths(), I'm not sure what else I can do to ensure that.
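One way to check whether workers really see the same library paths is to start a small cluster by hand with the base parallel package and compare. This is a diagnostic sketch only: it does not touch the cluster that distinct_test creates internally, so it can confirm or rule out a library-path mismatch but is not a fix on its own.

```r
library(parallel)

# Start two worker processes and ask each for its library paths.
cl <- makeCluster(2)
worker_paths <- clusterEvalQ(cl, .libPaths())
main_paths <- .libPaths()

# TRUE for each worker whose paths already match the main session.
paths_match <- vapply(worker_paths,
                      function(p) identical(p, main_paths),
                      logical(1))
print(paths_match)

# Push the main session's library paths to the workers and re-check.
invisible(clusterCall(cl, function(paths) .libPaths(paths), main_paths))
worker_paths_after <- clusterEvalQ(cl, .libPaths())

stopCluster(cl)
```

If paths_match contains FALSE, the workers are starting with a different library search path than the interactive session, which would be consistent with the error above.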
