scRNAseq-BladderCancer

Collaboration with Bishoy Morris Faltas at Weill Cornell and David Mulholland at Mount Sinai on the Invasive Bladder Cancer project.

METHOD

Single-cell RNA-seq data were pre-processed with the scater R package. Data normalization, unsupervised cell clustering, and differential expression analysis were carried out with the Seurat R package. Reference-based cell type annotation was carried out with the SingleR R package.

How to use this Script

Key Software Setup

R version 3.6.0
Seurat_3.0.3
MAST_1.10.0
scater_1.12.0
scran_1.12.1
SingleR_1.0.1

After pulling this repository, create the folders data and output in the top-level working folder. Move the Cell Ranger analysis results into the data folder. Tree structure of directory:

1. scater.R

scater.R for human
scater.R for mouse
Performs initial quality control and removes low-quality cells.

After running these two scripts, sce_list_Human_{date}.Rda and sce_list_Mouse_{date}.Rda files will be generated inside

2. Seurat_setup.R

Seurat_setup.R for human
Seurat_setup.R for mouse

Cells with fewer than 800 detected genes, fewer than 1500 UMIs, or more than 15% mitochondrial gene content were excluded from the analysis. Raw gene expression counts were normalized with a global-scaling method (scale factor of 10,000) followed by natural log transformation, using the Seurat NormalizeData function. The top 2000 highly variable genes were selected based on the expression and dispersion (variance/mean) of each gene, followed by canonical correlation analysis (CCA) to identify common sources of variation between the patient and normal datasets. The first 20 CCA dimensions were used to generate t-Distributed Stochastic Neighbor Embedding (tSNE) plots and to cluster cells with a shared nearest neighbor (SNN) modularity optimization based clustering algorithm.
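The filtering and normalization steps above can be sketched in a few lines. This is a minimal, language-neutral illustration in plain Python, not the repository's R code: the thresholds come from the text, while the function names, the per-cell dict layout, and the mito_genes argument are assumptions made for the example. The normalization formula, log(1 + count / total * 10,000), matches the log-normalization described for Seurat's NormalizeData.

```python
import math

# Thresholds taken from the paragraph above.
MIN_GENES = 800
MIN_UMIS = 1500
MAX_MITO_FRACTION = 0.15
SCALE_FACTOR = 10_000


def passes_qc(counts, mito_genes):
    """Return True if one cell passes all three QC thresholds.

    counts: dict mapping gene name -> raw UMI count for a single cell
            (hypothetical layout, chosen for this sketch).
    mito_genes: set of mitochondrial gene names (hypothetical input).
    """
    total_umis = sum(counts.values())
    if total_umis == 0:
        return False
    genes_detected = sum(1 for c in counts.values() if c > 0)
    mito_umis = sum(c for g, c in counts.items() if g in mito_genes)
    return (
        genes_detected >= MIN_GENES
        and total_umis >= MIN_UMIS
        and mito_umis / total_umis <= MAX_MITO_FRACTION
    )


def log_normalize(counts, scale_factor=SCALE_FACTOR):
    """Scale each count by the cell total, multiply by scale_factor, take log1p."""
    total = sum(counts.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: math.log1p(c / total * scale_factor) for g, c in counts.items()}
```

For example, a cell with 1000 detected genes, 2000 UMIs, and no mitochondrial reads passes the filter, while a cell with only 500 detected genes does not.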

Modify the date in the code before running. After running these two scripts, BladderCancer_H2_{date}.Rda and BladderCancer_M2_{date}.Rda files will be generated inside the data folder. Do not modify any files in the data folder.

3. SingleR.R

SingleR.R for Human
SingleR.R for Mouse
Cell types were identified with the SingleR (Single-cell Recognition) package. SingleR is a computational method for unbiased cell type recognition from scRNA-seq data. It leverages reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.
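The core idea of reference-based annotation can be sketched as follows: each cell is compared with every reference profile and takes the label of the best-correlated one. This is a deliberately simplified illustration in plain Python, assuming Spearman correlation over a shared, pre-selected gene list; the real SingleR package also performs gene selection and iterative fine-tuning, which are omitted here. All function names and the toy reference are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def annotate_cell(cell_expr, reference):
    """Return (label, score) of the reference profile best correlated with the cell.

    cell_expr: expression values for one cell, in the same gene order as
               every profile in `reference` (a dict of label -> list).
    """
    scores = {label: spearman(cell_expr, profile)
              for label, profile in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy reference with four genes per profile (made-up numbers).
reference = {"T cell": [5, 1, 0, 2], "Epithelial": [0, 4, 6, 1]}
label, score = annotate_cell([4, 0, 1, 3], reference)
```

Because every cell is scored independently of the others, the annotation does not depend on the clustering result.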

After running these scripts, singler_BladderCancer_H2_{date}.RData and singler_BladderCancer_M2_{date}.RData files will be generated inside the output folder.

4. Identify_Cell_Types_Manually.R

Identify_Cell_Types_Manually.R for Human
Identify_Cell_Types_Manually.R for Mouse
All clusters are tested against marker genes and gene sets.

Multiple plots and tables will be generated; save them as needed.

5. Differential_analysis.R

Differential_analysis.R for Human
Differential_analysis.R for Mouse
FindAllMarkers.UMI(), a modified version of FindAllMarkers(), generates a similar data frame plus two extra columns, UMI.1 and UMI.2, that record nUMI. UMI.1 is the average nUMI of the current cluster; UMI.2 is the average nUMI of all remaining clusters.
FindAllMarkers(object, test.use = "MAST"): MAST (Model-based Analysis of Single Cell Transcriptomics) is a GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015).
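How the two extra columns relate to the clusters can be sketched as below. This is an illustration in plain Python under an assumption: UMI.1 and UMI.2 are treated as simple per-gene means over the cells inside and outside the current cluster. The function name and input layout are hypothetical and are not taken from FindAllMarkers.UMI() itself.

```python
def mean_umi_in_and_out(values, clusters, target):
    """Return (UMI.1, UMI.2) for one gene and one target cluster.

    values:   per-cell value of a single gene.
    clusters: per-cell cluster label, same order as `values`.
    target:   the cluster currently being tested.
    """
    inside = [v for v, c in zip(values, clusters) if c == target]
    outside = [v for v, c in zip(values, clusters) if c != target]
    # NaN marks a group with no cells, so the mean is undefined.
    umi_1 = sum(inside) / len(inside) if inside else float("nan")
    umi_2 = sum(outside) / len(outside) if outside else float("nan")
    return umi_1, umi_2


# Five cells in three clusters (made-up numbers).
umi_1, umi_2 = mean_umi_in_and_out([4, 2, 0, 1, 1], [0, 0, 1, 1, 2], target=0)
```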

Below is an example of a Differential analysis output file.

gene   p_val  avg_logFC  pct.1  pct.2  p_val_adj  UMI.1   UMI.2   cluster
Psca   0      3.9340     0.939  0.055  0          3.5565  0.0339  0
Ppbp   0      2.9163     0.99   0.161  0          3.0622  0.1834  0
Ltf    0      2.9105     0.959  0.042  0          2.6070  0.0365  0
Ecm1   0      2.7729     0.965  0.072  0          2.6652  0.0931  0
Gsto1  0      2.7221     0.995  0.035  0          2.7625  0.0496  0

The results data frame has the following columns:

gene: gene name.
p_val: p-value calculated using MAST (Model-based Analysis of Single Cell Transcriptomics, Finak et al., Genome Biology, 2015).
avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group.
pct.1: fraction of cells (0 to 1) in which the gene is detected in the first group.
pct.2: fraction of cells (0 to 1) in which the gene is detected in the second group.
p_val_adj: adjusted p-value, based on Bonferroni correction.
UMI.1: average nUMI of the current cluster.
UMI.2: average nUMI of the remaining clusters.
cluster: either the cell type or the corresponding cluster.
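Two of these columns are simple enough to sketch directly. The snippet below, in plain Python, shows the standard Bonferroni adjustment (multiply each p-value by the number of tests and cap at 1) and a detection fraction like pct.1 and pct.2. The function names are hypothetical, and the number of tests is passed in explicitly because it depends on how many genes the analysis considers.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjust p-values: p * n_tests, capped at 1.0.

    n_tests defaults to the number of p-values supplied; pass it
    explicitly when the correction should use a larger gene count.
    """
    n = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in p_values]


def detection_fraction(values):
    """Fraction of cells in which the gene has a non-zero value."""
    return sum(1 for v in values if v > 0) / len(values)


adjusted = bonferroni([0.01, 0.2, 0.0005], n_tests=100)
pct = detection_fraction([0, 1, 3, 0])
```

A raw p-value of exactly 0, as in the example table, stays 0 after adjustment.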

Contributors

nyuhuyang
