scdcc's Introduction

scDCC -- Single Cell Deep Constrained Clustering

Clustering is a critical step in single-cell studies. Most existing methods perform unsupervised clustering without exploiting any prior domain knowledge. When confronted with the high dimensionality and pervasive dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method named scDCC that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets, ranging from thousands to tens of thousands of cells, show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses such as cell type assignment.

Network diagram

[Figure: scDCC network diagram]

Requirements

Python --- 3.6.8
PyTorch --- 1.5.1+cu101 (https://pytorch.org)
Scanpy --- 1.0.4 (https://scanpy.readthedocs.io/en/stable)
Nvidia Tesla P100

Usage

python scDCC_pairwise_CITE_PBMC.py
python scDCC_pairwise_Human_liver.py

Parameters

--n_clusters: number of clusters
--n_pairwise: number of pairwise constraints to generate
--gamma: weight of clustering loss
--ml_weight: weight of must-link loss
--cl_weight: weight of cannot-link loss
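
To make the roles of --n_pairwise, --ml_weight and --cl_weight more concrete, here is a minimal sketch of how pairwise constraints could be sampled from known labels and turned into a weighted penalty. It is an illustration only: the helper names `generate_pairwise_constraints` and `pairwise_loss` are hypothetical, and the loss form (negative log of the agreement between soft assignments for must-link pairs, and of one minus the agreement for cannot-link pairs) is a common formulation in constrained clustering that may differ in detail from what scDCC.py does.

```python
import math
import random


def generate_pairwise_constraints(labels, n_pairwise, seed=0):
    """Sample distinct random cell pairs and split them by label agreement.

    Hypothetical helper: pairs with equal labels become must-link,
    pairs with different labels become cannot-link.
    """
    rng = random.Random(seed)
    n_cells = len(labels)
    max_pairs = n_cells * (n_cells - 1) // 2
    target = min(n_pairwise, max_pairs)  # cannot draw more distinct pairs than exist
    seen = set()
    must_link, cannot_link = [], []
    while len(seen) < target:
        i, j = rng.sample(range(n_cells), 2)
        pair = (min(i, j), max(i, j))
        if pair in seen:
            continue
        seen.add(pair)
        if labels[pair[0]] == labels[pair[1]]:
            must_link.append(pair)
        else:
            cannot_link.append(pair)
    return must_link, cannot_link


def pairwise_loss(q, must_link, cannot_link, ml_weight=1.0, cl_weight=1.0):
    """Weighted must-link plus cannot-link penalty (assumed loss form).

    q is a list of soft cluster assignments, one probability vector per cell.
    """
    eps = 1e-10  # keeps log() finite when the agreement is exactly 0 or 1

    def agreement(i, j):
        # Probability that cells i and j fall in the same cluster.
        return sum(a * b for a, b in zip(q[i], q[j]))

    ml_terms = [-math.log(agreement(i, j) + eps) for i, j in must_link]
    cl_terms = [-math.log(1.0 - agreement(i, j) + eps) for i, j in cannot_link]
    ml_loss = sum(ml_terms) / len(ml_terms) if ml_terms else 0.0
    cl_loss = sum(cl_terms) / len(cl_terms) if cl_terms else 0.0
    return ml_weight * ml_loss + cl_weight * cl_loss


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    ml, cl = generate_pairwise_constraints(labels, n_pairwise=10, seed=1)
    print(len(ml), "must-link pairs,", len(cl), "cannot-link pairs")
```

In this sketch, raising ml_weight or cl_weight makes violations of the corresponding constraint type more costly relative to the other loss terms.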

Files

scDCC.py -- implementation of scDCC algorithm

scDCC_pairwise.py -- the wrapper to run scDCC on the datasets in Figures 2-4

scDCC_pairwise_CITE_PBMC.py -- the wrapper to run scDCC on the 10X CITE PBMC dataset (Figure 5)

scDCC_pairwise_Human_liver.py -- the wrapper to run scDCC on the human liver dataset (Figure 6)

In the folder scDCC_estimating_number_of_clusters, I implement a version of scDCC that can be used on general datasets without knowing the number of clusters.

Datasets

The datasets used in the study are available at: https://figshare.com/articles/dataset/scDCC_data/21563517

Reference

Tian, T., Zhang, J., Lin, X., Wei, Z., & Hakonarson, H. (2021). Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data. Nature Communications, 12(1), 1873. https://doi.org/10.1038/s41467-021-22008-3

Contact

Tian Tian

scdcc's Issues

Hello, this is great work. The following error occurs at runtime. What is the cause?

Traceback (most recent call last):
  File "D:/Code/scDCC/scDeepCluster.py", line 106, in <module>
    y_pred, _, _, _, _ = model.fit(X=adata.X, X_raw=adata.raw.X, sf=adata.obs.size_factors, y=y, batch_size=args.batch_size, num_epochs=args.maxiter,
  File "D:\Code\scDCC\scDCC.py", line 154, in fit
    data = self.encodeBatch(X)
  File "D:\Code\scDCC\scDCC.py", line 92, in encodeBatch
    z, _, _, _, _ = self.forward(inputs)
  File "D:\Code\scDCC\scDCC.py", line 69, in forward
    h = self.encoder(x+torch.randn_like(x) * float(self.sigma))
  File "C:\Users\ThinkPad.conda\envs\DeepL\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ThinkPad.conda\envs\DeepL\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\ThinkPad.conda\envs\DeepL\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ThinkPad.conda\envs\DeepL\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\ThinkPad.conda\envs\DeepL\lib\site-packages\torch\nn\functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: expected scalar type Float but found Double
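
A likely cause, which this thread does not confirm: PyTorch linear layers hold float32 weights by default, while NumPy arrays default to float64, so a float64 expression matrix passed to the encoder would produce this kind of mismatch. The sketch below casts a matrix to float32 before it is handed to the model. The helper name `as_float32` is hypothetical and is not part of scDCC. Applying it to `adata.X` and `adata.raw.X` is an assumption about where the float64 data enters.

```python
import numpy as np


def as_float32(matrix):
    """Return a dense float32 array for a dense or sparse expression matrix.

    Hypothetical helper, not part of the scDCC code base.
    """
    # Sparse matrices (for example scipy.sparse) expose toarray(); ndarrays do not.
    if hasattr(matrix, "toarray"):
        matrix = matrix.toarray()
    return np.asarray(matrix, dtype=np.float32)


X = np.random.rand(4, 3)  # NumPy creates float64 arrays by default
X32 = as_float32(X)
print(X.dtype, "->", X32.dtype)  # float64 -> float32
```

If the data is already a torch tensor, calling `.float()` on it performs the same cast to float32.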
