A collection of Python3 pipelines and R and Python3 scripts for analysing data generated with the 10x Genomics platform. The pipelines are based on 10x's Cell Ranger pipeline and DropEst for mapping and quantitation.
Downstream analysis currently relies on the R Seurat package and makes use of many excellent community tools, including Scran, DropletUtils, SingleR, Clustree, Destiny (for diffusion maps), PHATE and Scanpy (for PAGA and scVelo). Automatic export of UCSC cell browser instances is also supported.
For gene set over-representation analysis, the pipelines use a bespoke R package called gsfisher, which can also be used interactively to analyse single-cell data.
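To make the idea of over-representation testing concrete, here is a minimal Python sketch of a one-sided Fisher's exact test, computed as the upper tail of the hypergeometric distribution. It is not gsfisher's code or API: the function name `over_representation`, the gene names and the choice of universe are all assumptions made for illustration.

```python
from math import comb


def over_representation(gene_set, foreground, universe):
    """Return (overlap, p-value) for a one-sided Fisher's exact test.

    The p-value is the probability of seeing at least the observed overlap
    when drawing len(foreground) genes at random from the universe.
    Hypothetical helper for illustration; not part of gsfisher.
    """
    universe = set(universe)
    gene_set = set(gene_set) & universe      # ignore genes outside the universe
    foreground = set(foreground) & universe
    n_universe = len(universe)
    n_set = len(gene_set)
    n_foreground = len(foreground)
    n_overlap = len(gene_set & foreground)
    # Sum hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_set, i) * comb(n_universe - n_set, n_foreground - i)
        for i in range(n_overlap, min(n_set, n_foreground) + 1)
    )
    return n_overlap, tail / comb(n_universe, n_foreground)


# Toy example with made-up gene names.
universe = [f"gene{i}" for i in range(10)]
gene_set = ["gene0", "gene1", "gene2"]
markers = ["gene0", "gene1", "gene2", "gene3"]
overlap, p_value = over_representation(gene_set, markers, universe)
print(overlap, round(p_value, 4))
```

In practice the universe would usually be the set of genes detected in the experiment, and p-values would be corrected for multiple testing across gene sets.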
The pipelines are in active development, and should be considered "beta" software - please use at your own risk!
This example shows how the Seurat stimulated and control vignette can be reproduced by the pipeline.
This example uses the scVelo Bastidas-Ponce et al. dataset.
- Summary Report
- Full details to follow.
Here pipeline_seurat.py was run using the Seurat object provided by the Seurat authors in their Guided Clustering of the Microwell-seq Mouse Cell Atlas vignette.
- Summary Report
- Full details to follow.
Perform mapping, quantification, aggregation and down-sampling using pipelines/pipeline_cellranger.py:
- Can be run either from cellranger mkfastq or cellranger aggr outputs.
- Samples are mapped and quantitated with cellranger count.
- Aggregation of sample matrices is performed with cellranger aggr.
- Cells with barcodes shared between samples can be removed (within sequencing batch) to mitigate index hopping.
- Random down-sampling of the UMI-count matrix is supported.
- Arbitrary subsets of the aggregated dataset can be generated.
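As an illustration of what random down-sampling of UMI counts can look like, the sketch below subsamples one cell's UMIs without replacement to a target total. This is only one possible strategy and is not taken from pipeline_cellranger.py: the function name `downsample_cell`, the gene names and the per-cell target are assumptions.

```python
import random
from collections import Counter


def downsample_cell(counts, target, rng):
    """Subsample a cell's UMIs without replacement down to `target` in total.

    `counts` maps gene name -> UMI count. Cells that already have `target`
    UMIs or fewer are returned unchanged. Hypothetical helper for illustration.
    """
    total = sum(counts.values())
    if total <= target:
        return dict(counts)
    # Expand the counts into one entry per UMI, then draw `target` of them.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    kept = Counter(rng.sample(pool, target))
    return {gene: kept.get(gene, 0) for gene in counts}


rng = random.Random(0)  # fixed seed so the example is reproducible
cell = {"GeneA": 50, "GeneB": 30, "GeneC": 20}  # made-up counts
down = downsample_cell(cell, 40, rng)
print(down, sum(down.values()))
```

Sampling without replacement guarantees that no gene ends up with more UMIs than it started with, which keeps the down-sampled matrix a valid subset of the observed data.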
Perform downstream analysis using a Seurat-based workflow (pipelines/pipeline_seurat.py):
- This can be run either from count matrices (e.g. pipeline_cellranger.py output) or from saved Seurat object(s).
- Analysis of multiple samples with different parameter combinations can be executed in parallel.
- Supports testing for genes differentially expressed between conditions.
- Supports finding conserved markers (both between clusters and between conditions).
- Supports basic gene set over-representation analysis (including of arbitrary "gmt" gene sets, e.g. from MSigDB) using gsfisher.
- Supports visualising expression of arbitrary lists of genes on violin and UMAP plots.
- The pipeline includes Clustree, PAGA, scVelo, diffusion maps and SingleR.
- The pipeline can automatically generate UCSC cell browser instances.
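For readers unfamiliar with the "gmt" gene set format mentioned above, the following sketch shows how such a file is laid out and how it could be read. Each line is tab-separated: a set name, a description, then the member genes. The parser `parse_gmt` and the example set names are hypothetical and are not part of the pipeline or gsfisher.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict of set name -> set of genes.

    Each line holds: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields (e.g. blank lines) are skipped.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# In-memory example with made-up set and gene names.
gmt_lines = [
    "SET_A\thypothetical description\tGENE1\tGENE2\tGENE3\n",
    "SET_B\tna\tGENE2\tGENE4\n",
    "\n",
]
gene_sets = parse_gmt(gmt_lines)
for name, genes in sorted(gene_sets.items()):
    print(name, sorted(genes))
```

To read a real file, the same function can be called on an open file handle, for example `parse_gmt(open(path))`, where `path` points to a GMT file downloaded from a source such as MSigDB.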