rescue's Introduction

README

This package provides a bootstrap imputation method for dropout events in scRNAseq data, published in Tracy et al. (2019); see References.

News

Jul. 12, 2020

  • Version 1.0.3.
  • Include k neighbors as a parameter for NN network.
  • Bug fixes.
  • Independent of scRNAseq pipeline.
  • Informative genes may be determined using the computeHVG function (also the default) or any other package (e.g. Seurat, scran) and indicated with select_genes.

Requirements

  • R (>= 3.4)
  • Python (>= 3.0)

The SNN clustering step still uses the Louvain algorithm but now borrows the implementation from the Giotto pipeline.

Installation

Install the rescue package using devtools.

install.packages("devtools", repos="http://cran.rstudio.com/")
library(devtools)
devtools::install_github("seasamgo/rescue")
library(rescue)

Required python modules

  • pandas
  • networkx
  • community (from python-louvain)

Automatic installation

The Python modules are installed automatically in a miniconda environment when installing rescue. You will be asked whether you want to install them, and you can opt out and install them manually instead.

Manual installation

Install with pip

pip install pandas
pip install networkx
pip install python-louvain

If you chose the manual installation and have multiple Python versions installed, you can force the reticulate package to use the desired one by specifying its path. Either pass the path through the python_path parameter of the bootstrapImputation function, or set it directly at the beginning of your script. For example:

reticulate::use_python('~/anaconda2/bin/python', required = TRUE)
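
To confirm that the Python interpreter picked up by reticulate can see the required modules, you can query it from R. This is a minimal sketch, assuming the reticulate package is installed; it relies on reticulate's py_module_available function, and required_modules and module_found are illustrative variable names rather than part of rescue.

```r
# Modules listed under "Required python modules"
required_modules <- c("pandas", "networkx", "community")

if (requireNamespace("reticulate", quietly = TRUE)) {
  # Optionally pin the interpreter first, e.g.
  # reticulate::use_python("~/anaconda2/bin/python", required = TRUE)

  # TRUE/FALSE per module, named by module
  module_found <- vapply(required_modules, reticulate::py_module_available, logical(1))
  print(module_found)
} else {
  message("reticulate is not installed; skipping the module check")
}
```

Any module reported as FALSE would need to be installed into that interpreter, for example with the pip commands above.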

Method

bootstrapImputation takes a log-normalized expression matrix and returns a list containing the imputed and original matrices.

bootstrapImputation(
  expression_matrix,                  # expression matrix
  select_cells = NULL,                # subset cells
  select_genes = NULL,                # informative genes
  proportion_genes = 0.6,             # proportion of genes to sample
  log_transformed = TRUE,             # whether expression matrix is log-transformed
  log_base = exp(1),                  # log base of log-transformation
  bootstrap_samples = 100,            # number of samples
  number_pcs = 8,                     # number of PCs to consider
  k_neighbors = 30,                   # number of neighbors for NN network
  snn_resolution = 0.9,               # clustering resolution
  impute_index = NULL,                # specify counts to impute, defaults to zero values
  use_mclapply = FALSE,               # run in parallel
  cores = 2,                          # number of parallel cores
  return_individual_results = FALSE,  # return sample means
  python_path = NULL,                 # path to the python version to use, defaults to default path
  verbose = FALSE                     # print progress to console
  )
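
To make the roles of proportion_genes and bootstrap_samples more concrete, here is a toy sketch of the general bootstrap idea in base R. It is not the package's implementation and rests on several assumptions: k-means stands in for the PCA and SNN/Louvain clustering, within-cluster means are taken over all cells (zeros included), and toy_bootstrap_impute together with the simulated matrix are hypothetical names and data made up for illustration.

```r
set.seed(1)

# Hypothetical toy data: 20 genes x 30 cells in two groups, with artificial dropout
n_genes <- 20
n_cells <- 30
group <- rep(1:2, each = n_cells / 2)
expr <- matrix(rexp(n_genes * n_cells, rate = 1), nrow = n_genes)
expr[1:10, group == 2] <- expr[1:10, group == 2] + 3  # group-specific genes
dropout <- matrix(runif(n_genes * n_cells) < 0.3, nrow = n_genes)
observed <- expr
observed[dropout] <- 0

# Hypothetical helper, not part of rescue
toy_bootstrap_impute <- function(x, proportion_genes = 0.6,
                                 bootstrap_samples = 20, centers = 2) {
  estimates <- matrix(0, nrow = nrow(x), ncol = ncol(x))
  for (b in seq_len(bootstrap_samples)) {
    # Sample a subset of genes to define cell similarity in this round
    genes <- sample(nrow(x), size = ceiling(proportion_genes * nrow(x)))
    # Assumption: k-means replaces the PCA + SNN/Louvain clustering step
    clusters <- stats::kmeans(t(x[genes, , drop = FALSE]),
                              centers = centers, nstart = 5)$cluster
    # Add each cluster's per-gene mean to the running sum for its cells
    for (cl in unique(clusters)) {
      in_cluster <- clusters == cl
      estimates[, in_cluster] <- estimates[, in_cluster] +
        rowMeans(x[, in_cluster, drop = FALSE])
    }
  }
  estimates <- estimates / bootstrap_samples  # average over bootstrap rounds

  # Only zero entries are replaced; observed values stay untouched
  imputed <- x
  zero_index <- x == 0
  imputed[zero_index] <- estimates[zero_index]
  imputed
}

imputed <- toy_bootstrap_impute(observed)
```

Averaging over many gene subsamples is meant to make the estimate less dependent on any single clustering; the real function exposes further controls such as number_pcs, k_neighbors and snn_resolution that this sketch leaves out.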

Similar cells are determined with shared nearest neighbors clustering upon the principal components of informative gene expression (e.g. highly variable or differentially expressed genes). The names of these informative genes may be indicated with select_genes, which defaults to the most highly variable genes. For more, please view the help files.
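
As an illustration of supplying your own informative genes, the sketch below ranks genes by variance in base R and collects their names. The helper top_variable_genes and the toy matrix are hypothetical; in practice you would more likely use computeHVG, Seurat or scran, as noted under News.

```r
set.seed(2)

# Hypothetical toy log-normalized matrix with gene and cell names
expr <- matrix(rexp(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:12)))

# Hypothetical helper: rank genes by variance across cells, keep the top n names
top_variable_genes <- function(x, n = 10) {
  gene_variance <- apply(x, 1, stats::var)
  names(sort(gene_variance, decreasing = TRUE))[seq_len(min(n, nrow(x)))]
}

informative_genes <- top_variable_genes(expr, n = 10)
print(informative_genes)

# The names could then be passed on to the imputation, e.g.
# impute <- rescue::bootstrapImputation(expression_matrix = expr,
#                                       select_genes = informative_genes)
```

The call to bootstrapImputation is left commented out because a matrix this small is only meant to show the shape of the select_genes argument, not to produce a meaningful imputation.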

Example

To illustrate, we’ll need the Splatter package to simulate some scRNAseq data.

install.packages("BiocManager", repos="http://cran.rstudio.com/")
BiocManager::install("splatter")
library(splatter)

We’ll consider a hypothetical example of 500 cells and 10,000 genes containing five distinct cell types of near equal size, then introduce some dropout events.

params <- splatter::newSplatParams(
  nGenes = 1e4,
  batchCells = 500,
  group.prob = rep(.2, 5),
  de.prob = .05,
  dropout.mid = rep(0, 5),
  dropout.shape = rep(-.5, 5),
  dropout.type = 'group',
  seed = 940
  )
splat <- splatter::splatSimulate(params = params, method = 'groups')
cell_types <- SummarizedExperiment::colData(splat)$Group
cell_types <- as.factor(gsub('Group', 'Cell type ', cell_types))

For visualization purposes we’ll demonstrate using the Seurat pipeline, as in our published work, but now with Seurat v3. However, any pipeline that fits the needs of your downstream analysis will do.

install.packages("Seurat", repos="http://cran.rstudio.com/")
library(Seurat)

First, we should remove genes that lost all counts to dropout.

counts_true <- SummarizedExperiment::assays(splat)$TrueCounts
counts_dropout <- SummarizedExperiment::assays(splat)$counts
comparable_genes <- rowSums(counts_dropout) != 0

Next, we normalize and scale the data.

expression_true <- Seurat::CreateSeuratObject(counts = counts_true)
expression_true <- Seurat::NormalizeData(expression_true)
expression_true <- Seurat::ScaleData(expression_true, features = rownames(expression_true))
expression_dropout <- Seurat::CreateSeuratObject(counts = counts_dropout[comparable_genes, ])
expression_dropout <- Seurat::NormalizeData(expression_dropout)
expression_dropout <- Seurat::ScaleData(expression_dropout, features = rownames(expression_dropout))

The last step is dimension reduction with PCA and then visualization with t-SNE.

expression_true <- Seurat::RunPCA(expression_true, features = rownames(expression_true), verbose = FALSE)
expression_true <- Seurat::SetIdent(expression_true, value = cell_types)
expression_true <- Seurat::RunTSNE(expression_true)
Seurat::DimPlot(expression_true, reduction = "tsne")

expression_dropout <- Seurat::RunPCA(expression_dropout, features = rownames(expression_dropout), verbose = FALSE)
expression_dropout <- Seurat::SetIdent(expression_dropout, value = cell_types)
expression_dropout <- Seurat::RunTSNE(expression_dropout)
Seurat::DimPlot(expression_dropout, reduction = "tsne")

It’s clear that dropout has distorted our evaluation of the data by cell type as compared to what we should see with the full set of counts. Now let’s impute zero counts to recover missing expression values and reevaluate.

impute <- rescue::bootstrapImputation(expression_matrix = expression_dropout@assays$RNA@data) # python_path can be set here
expression_imputed <- Seurat::CreateSeuratObject(counts = impute$final_imputation)
expression_imputed <- Seurat::ScaleData(expression_imputed)
expression_imputed <- Seurat::RunPCA(expression_imputed, features = rownames(expression_imputed), verbose = FALSE)
expression_imputed <- Seurat::SetIdent(expression_imputed, value = cell_types)
expression_imputed <- Seurat::RunTSNE(expression_imputed)
Seurat::DimPlot(expression_imputed, reduction = "tsne")

The recovery of missing expression values due to dropout events allows us to more accurately distinguish cell types with basic data visualization techniques in this simulated example.

References

Dries, R., et al. (2019). Giotto, a pipeline for integrative analysis and visualization of single-cell spatial transcriptomic data. BioRxiv. doi: https://doi.org/10.1101/701680.

Satija, R., et al. (2019). Seurat: Tools for Single Cell Genomics. R package version 3.1.0. https://CRAN.R-project.org/package=Seurat.

Tracy, S., Dries, R., Yuan, G. C. (2019). RESCUE: imputing dropout events in single-cell RNA-sequencing data. BMC Bioinformatics, 20(388). doi: https://doi.org/10.1186/s12859-019-2977-0.

Zappia, L., Phipson, B., Oshlack, A. (2017). Splatter: simulation of single-cell RNA sequencing data. Genome Biology, 18(1):174. doi: https://doi.org/10.1186/s13059-017-1305-0.

rescue's People

Contributors

rubd, seasamgo

Forkers

rubd, gcyuan

rescue's Issues

RESCUE issue

Hello,

I am trying to run your package. Everything goes fine until I get this error:
Error in system("which python", intern = T) : 'which' not found
Any clue?
I think it is a problem related to Windows.

bootstrapImputation function. 'breaks' are not unique

Hello!

I get this weird error. How can I deal with it?

impute <- rescue::bootstrapImputation(expression_matrix = s_obj@assays$RNA@data)
Error in cut.default(x = gene_in_cells_detected$mean_expr, breaks = expr_group_breaks, :
'breaks' are not unique

Thank you!

Can't change number of PCs

While using bootstrapImputation, I would like to change the number of PCs. If I try a large number like 30 or 50, this error occurs:
Warning in irlba::irlba(A = my_small_matrix_scaled, nv = pcs_compute) :
You're computing too large a percentage of total singular values, use a standard svd instead.
Warning in irlba::irlba(A = my_small_matrix_scaled, nv = pcs_compute) :
did not converge--results might be invalid!; try increasing work or maxit
Error in fixupDN.if.valid(value, x@Dim) :
length of Dimnames[[2]] (50) is not equal to Dim[2] (16)

If I try 10 PCs, the warning still appears multiple times:
Warning in irlba::irlba(A = my_small_matrix_scaled, nv = pcs_compute) :
You're computing too large a percentage of total singular values, use a standard svd instead.

I think the default is 8. I wonder why it cannot be changed. Thank you!

doesn't work

newdata=bootstrapImputation(data,snn_resolution = 0.1,bootstrap_samples = 10)

Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In if (class(expression_matrix) != "dgCMatrix") expression_matrix <- methods::as(expression_matrix, :
the condition has length > 1 and only the first element will be used

I tried different parameters but it still doesn't work. Any reply will be appreciated.

impute Error

Hello,

Following the tutorial, I get this error when running the impute portion

impute <- rescue::bootstrapImputation(expression_matrix = expression_dropout@assays$RNA@data)
Error in :=(covariance, (stdev/mean_expr)) :
could not find function ":="

any ideas?

bootstrapImputation function not compatible with Seurat v3

Hi,
I just noticed that the bootstrapImputation function is incompatible with Seurat v3, since it uses FindVariableGenes instead of the FindVariableFeatures function from Seurat. I am not sure whether that is the only incompatibility, since this is the furthest I've got. Do you plan to update rescue to be compatible with Seurat v3? I would like to try it out.
