MAESTRO

MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built using Snakemake. MAESTRO combines several dozen tools and packages into an integrative pipeline that takes scRNA-seq and scATAC-seq data from raw sequencing reads (fastq files) through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, cell-type annotation, and transcription regulation analysis. Currently, MAESTRO supports the Smart-seq2, 10x-genomics, Drop-seq, and SPLiT-seq protocols for scRNA-seq, and the microfluidics-based, 10x-genomics, and sci-ATAC-seq protocols for scATAC-seq.

Change Log

v1.0.0

  • Release MAESTRO.

v1.0.1

  • Provide a Docker image for easy installation. Note that the Docker image does not include cellranger/cellranger ATAC or the corresponding genome indexes. Please install cellranger/cellranger ATAC following the installation instructions.

v1.0.2

  • Fix some bugs and set LISA as the default method for predicting transcription factors in scRNA-seq. Note that the Docker image includes the LISA conda environment but not the required pre-computed genome datasets. Please download the hg38 or mm10 datasets and update the configuration following the installation instructions.

v1.1.0

  • Change the default alignment method of MAESTRO from cellranger to STARsolo and minimap2 to reduce mapping time.
  • Improve the memory efficiency of scATAC gene score calculation.
  • Incorporate the installation of giggle into MAESTRO, and add a web API for the LISA function. All the core MAESTRO functions can now be installed through the conda environment!
  • Provide more documentation for the QC parameters and add flexibility for other parameters in the workflow.

v1.2.0

  • Modify the regulatory potential model by removing interfering peaks from adjacent genes and adjusting the weight of exon peaks. The "enhanced RP model" is now the default gene activity scoring model, with the original "simple RP model" available as an option.
  • Modify the integration function of MAESTRO. The new function outputs more intermediate figures and log files for diagnosing possible failures in integrating rare populations.
  • Add a function for annotating the cell types of scATAC-seq clusters based on public bulk chromatin accessibility data from the Cistrome database (note: please update the giggle index to the latest version).
  • Provide a function for generating cluster-level genome browser tracks for visualizing scATAC-seq datasets.
  • Support peak calling at the cluster level now!
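The regulatory potential (RP) idea behind the gene activity score can be illustrated with a small sketch. It assumes that each accessible peak contributes a weight of 2^(-d/d0), where d is the distance from the peak center to the gene's TSS and d0 is a half-decay distance set here to 10 kb, and that peaks farther than 100 kb are ignored. The "enhanced" behavior is approximated with two hypothetical per-peak flags: peaks assigned to an adjacent gene are dropped, and exonic peaks receive a fixed weight of 1. The `Peak` class, the `rp_score` function, and all constants are illustrative assumptions, not MAESTRO's actual parameters or implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Peak:
    """A peak that is accessible in one cell, described relative to one gene."""
    center: int              # genomic coordinate of the peak center
    in_neighbor_gene: bool   # hypothetical flag: peak is assigned to an adjacent gene
    in_exon: bool            # hypothetical flag: peak overlaps an exon of this gene


def rp_score(tss: int, peaks: list[Peak], half_decay: int = 10_000,
             max_distance: int = 100_000, enhanced: bool = True) -> float:
    """Sum distance-decayed weights of accessible peaks around a TSS.

    A peak at distance d gets weight 2 ** (-d / half_decay), so a peak
    half_decay bp away counts half as much as one sitting on the TSS.
    """
    score = 0.0
    for peak in peaks:
        distance = abs(peak.center - tss)
        if distance > max_distance:
            continue  # outside the window: no contribution
        if enhanced and peak.in_neighbor_gene:
            continue  # enhanced variant: drop peaks that belong to adjacent genes
        if enhanced and peak.in_exon:
            score += 1.0  # enhanced variant (assumed): exonic peaks get full weight
            continue
        score += 2 ** (-distance / half_decay)
    return score


peaks = [
    Peak(center=1_000_000, in_neighbor_gene=False, in_exon=False),  # on the TSS: 1.0
    Peak(center=1_010_000, in_neighbor_gene=False, in_exon=False),  # 10 kb away: 0.5
    Peak(center=1_020_000, in_neighbor_gene=True, in_exon=False),   # 20 kb, neighbor gene
]
print(rp_score(1_000_000, peaks))                  # 1.5
print(rp_score(1_000_000, peaks, enhanced=False))  # 1.75
```

Summing these weights over every accessible peak of a cell gives one entry of a gene-by-cell activity matrix; the exclusion step keeps a strongly accessible neighboring gene from inflating the score of the gene being scored.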

v1.2.1

  • For scATAC-seq, MAESTRO supports fastq, bam, and fragments.tsv.gz files as input to the workflow.
  • For scATAC-seq, MAESTRO provides an option to skip the cell-type annotation step in the pipeline, and an option to choose the cell-type annotation strategy (RP-based or peak-based).
  • Provide small test datasets (sampled from 10x fastq files) for testing the scRNA-seq and scATAC-seq pipelines.
  • Add parameter validation before initializing the pipeline and provide more informative error messages.
  • Update R in MAESTRO conda package from 3.6.3 to 4.0.2, and Seurat from 3.1.2 to 3.1.5.
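Because the scATAC-seq workflow accepts fragments.tsv.gz directly, it can help to inspect such a file before running the pipeline. The following is a minimal sketch, assuming the 10x Genomics layout (tab-separated chrom, start, end, barcode, and read count columns, with optional `#` header lines). It streams the file and counts fragments per barcode, a common per-cell QC metric. The helpers `read_fragments` and `fragments_per_cell` are hypothetical and not part of MAESTRO.

```python
import gzip
import os
import tempfile
from collections import Counter
from typing import Iterator, NamedTuple


class Fragment(NamedTuple):
    chrom: str
    start: int
    end: int
    barcode: str


def read_fragments(path: str) -> Iterator[Fragment]:
    """Yield fragments from a gzipped, 10x-style fragments file.

    Assumes tab-separated columns chrom, start, end, barcode[, count]
    and skips blank lines and '#' header/comment lines.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            yield Fragment(chrom, int(start), int(end), barcode)


def fragments_per_cell(path: str) -> Counter:
    """Count fragments per cell barcode."""
    return Counter(fragment.barcode for fragment in read_fragments(path))


# Usage: write a tiny fragments file, count fragments per barcode, clean up.
rows = (
    "chr1\t100\t300\tAAAC-1\t2\n"
    "chr1\t500\t650\tAAAC-1\t1\n"
    "chr2\t50\t400\tTTTG-1\t1\n"
)
with tempfile.NamedTemporaryFile(suffix=".tsv.gz", delete=False) as tmp:
    tmp.write(gzip.compress(rows.encode()))
counts = fragments_per_cell(tmp.name)
os.remove(tmp.name)
print(counts)  # Counter({'AAAC-1': 2, 'TTTG-1': 1})
```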

System requirements

  • Linux/Unix
  • Python (>= 3.0) for MAESTRO snakemake workflow
  • R (>= 3.6.1) for MAESTRO R package

Installation

Install MAESTRO

There are two ways to install MAESTRO: install the full workflow through Anaconda Cloud, or install only the R package for exploring processed data.

Installing the full solution of MAESTRO workflow through conda

MAESTRO uses the Miniconda3 package management system to harmonize all of the software packages. Users can install the full solution of MAESTRO using the conda environment.

Use the following commands to install Miniconda3:

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh

Users can then create an isolated environment for MAESTRO and install it with the following commands:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
# To make the installation faster, we recommend using mamba
$ conda install mamba -c conda-forge
$ mamba create -n MAESTRO maestro=1.2.1 -c liulab-dfci
# Activate the environment 
$ conda activate MAESTRO
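After activating the environment, a quick sanity check is to confirm that the main executables resolve on PATH. This is a minimal sketch; the list of executable names is an assumption (STARsolo runs through the `STAR` binary, and the exact tool set may differ between MAESTRO releases), and `check_tools` is a hypothetical helper.

```python
import shutil
from typing import Optional

# Assumed executable names; adjust to match your MAESTRO installation.
REQUIRED_TOOLS = ("MAESTRO", "snakemake", "STAR", "minimap2", "giggle")


def check_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


missing = [tool for tool, path in check_tools().items() if path is None]
print("Missing from PATH:", ", ".join(missing) if missing else "none")
```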

Installing the MAESTRO R package from source code

If users already have processed datasets, such as a cell-by-gene or cell-by-peak matrix generated by Cell Ranger, they can install the MAESTRO R package alone to perform the analysis from the processed datasets.

$ R
> library(devtools)
> install_github("liulab-dfci/MAESTRO")

Required annotations for MAESTRO workflow

The full MAESTRO workflow requires extra annotation files and tools:

  • MAESTRO depends on STARsolo and minimap2 for mapping scRNA-seq and scATAC-seq datasets. Users need to generate the reference files for the alignment software and specify the paths of these annotations to MAESTRO through command-line options.

  • MAESTRO uses LISA to evaluate the enrichment of transcription factors based on the marker genes of scRNA-seq clusters. By default, users can choose the "web" option, which uses the API function in MAESTRO to perform the LISA analysis. To use a local version of LISA instead, users need to install LISA locally, build the annotation files according to the LISA documentation, and provide the path of LISA to MAESTRO when calling the RNAAnnotateTranscriptionFactor function.

  • MAESTRO uses giggle to identify the enrichment of transcription factor peaks in scATAC-seq cluster-specific peaks. giggle is installed in the MAESTRO environment by default. The giggle index for the Cistrome database can be downloaded here (note: before v1.2.0, the giggle index giggle.tar.gz could be downloaded from http://cistrome.org/~chenfei/MAESTRO/giggle.tar.gz; since v1.2.0, please download the latest index, giggle.all.tar.gz). Users need to download the file and provide the location of the giggle annotation to MAESTRO when calling the ATACAnnotateTranscriptionFactor function.

Usage

usage: MAESTRO [-h] [-v]
               {scrna-init,scatac-init,integrate-init,mtx-to-h5,count-to-h5,merge-h5,scrna-qc,scatac-qc,scatac-peakcount,scatac-genescore}
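The usage line above has the shape that Python's argparse prints for a program with sub-commands. The following minimal sketch reproduces that layout for illustration only: it reuses the sub-command names and descriptions listed in this README but none of MAESTRO's real options or handlers. Treating `-v` as a version flag and the version string are assumptions, and `build_parser` is a hypothetical helper.

```python
import argparse

# Sub-command names and descriptions as listed in this README.
SUBCOMMANDS = {
    "scrna-init": "Initialize the MAESTRO scRNA-seq workflow.",
    "scatac-init": "Initialize the MAESTRO scATAC-seq workflow.",
    "integrate-init": "Initialize the MAESTRO integration workflow.",
    "mtx-to-h5": "Convert 10x mtx format matrix to HDF5 format.",
    "count-to-h5": "Convert plain text count table to HDF5 format.",
    "merge-h5": "Merge multiple HDF5 files, e.g. different replicates.",
    "scrna-qc": "Perform quality control for scRNA-seq gene-cell count matrix.",
    "scatac-qc": "Perform quality control for scATAC-seq peak-cell count matrix.",
    "scatac-peakcount": "Generate peak-cell binary count matrix.",
    "scatac-genescore": "Calculate gene score based on the binarized scATAC peak count.",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-parser per sub-command (options omitted)."""
    parser = argparse.ArgumentParser(prog="MAESTRO")
    # Assumption: -v prints the version; the version string is a placeholder.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 1.2.1")
    subparsers = parser.add_subparsers(dest="subcommand")
    for name, help_text in SUBCOMMANDS.items():
        # Each sub-parser gets its own -h, which is what "MAESTRO COMMAND -h" prints.
        subparsers.add_parser(name, help=help_text, description=help_text)
    return parser


args = build_parser().parse_args(["scrna-qc"])
print(args.subcommand)  # scrna-qc
```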

There are ten functions available in MAESTRO serving as sub-commands.

Subcommand          Description
scrna-init          Initialize the MAESTRO scRNA-seq workflow.
scatac-init         Initialize the MAESTRO scATAC-seq workflow.
integrate-init      Initialize the MAESTRO integration workflow.
mtx-to-h5           Convert a 10x mtx format matrix to HDF5 format.
count-to-h5         Convert a plain-text count table to HDF5 format.
merge-h5            Merge multiple HDF5 files, e.g. different replicates.
scrna-qc            Perform quality control on the scRNA-seq gene-cell count matrix.
scatac-qc           Perform quality control on the scATAC-seq peak-cell count matrix.
scatac-peakcount    Generate a peak-cell binary count matrix.
scatac-genescore    Calculate gene scores from the binarized scATAC-seq peak counts.
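The idea behind a binary peak-cell count matrix, as produced by scatac-peakcount, can be sketched as follows. The sketch assumes that a cell is marked accessible at a peak when at least one of its fragments overlaps that peak (0-based, half-open intervals on the same chromosome), and it stores the sparse matrix as one barcode set per peak. The naive overlap loop is for clarity only; MAESTRO's own implementation may differ, and `binary_peak_matrix` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical record layouts: coordinates are 0-based, half-open [start, end).
PeakKey = tuple[str, int, int]              # (chrom, start, end)
FragmentRecord = tuple[str, int, int, str]  # (chrom, start, end, barcode)


def binary_peak_matrix(peaks: list[PeakKey],
                       fragments: list[FragmentRecord]) -> dict[PeakKey, set[str]]:
    """Return, for each peak, the set of barcodes with >= 1 overlapping fragment.

    Entry (peak, cell) of the binary matrix is 1 iff the cell's barcode is in
    the peak's set, so repeated hits in the same peak still count once.
    """
    peaks_by_chrom: dict[str, list[PeakKey]] = defaultdict(list)
    for peak in peaks:
        peaks_by_chrom[peak[0]].append(peak)

    matrix: dict[PeakKey, set[str]] = {peak: set() for peak in peaks}
    for chrom, start, end, barcode in fragments:
        for peak in peaks_by_chrom.get(chrom, []):
            _, peak_start, peak_end = peak
            if start < peak_end and peak_start < end:  # half-open overlap test
                matrix[peak].add(barcode)
    return matrix


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
fragments = [
    ("chr1", 150, 400, "AAAC-1"),
    ("chr1", 180, 190, "AAAC-1"),  # second hit in the same peak, still counted once
    ("chr1", 550, 700, "TTTG-1"),
    ("chr2", 60, 90, "TTTG-1"),    # does not overlap the chr2 peak
]
for peak, cells in binary_peak_matrix(peaks, fragments).items():
    print(peak, sorted(cells))
```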

Examples of running MAESTRO can be found in the following galleries. Please run MAESTRO COMMAND -h to see a detailed description of each option of each sub-command.

Galleries & Tutorials


Citation

Wang C, Sun D, Huang X, Wan C, Li Z, Han Y, Qin Q, Fan J, Qiu X, Xie Y, Meyer CA, Brown M, Tang M, Long H, Liu T, Liu XS. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol. 2020 Aug 7;21(1):198. doi: 10.1186/s13059-020-02116-x. PMID: 32767996; PMCID: PMC7412809.
