Meta-scRNA-seq

Author: Michael Wang ([email protected])

Outline

We developed meta-scRNA-seq, a pipeline for unbiased detection of non-host transcriptomic information from scRNA-seq data. To achieve this, meta-scRNA-seq aligns scRNA-seq data against the host-genome reference using standard approaches, collects the single-cell tagged unmapped reads, labels them based on sequence similarity against a large metagenomic database, and demultiplexes the reads to generate a cell-by-metagenome count matrix in parallel with the standard cell-by-gene (host) matrix.

Required Software

This workflow requires the packages listed below. Please ensure that each tool can be called from the command line (i.e. the path to each tool is in your PATH variable).

conda install -c bioconda star

Please also ensure that you have downloaded the following R packages. They will be used throughout the pipeline.

conda install -c bioconda samtools

Please make sure Kraken2 (the tool referenced by the KRAKEN and KRAKEN_DB settings in config.yaml) is available in your working environment. Please also download its reference database.
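Before starting, it can help to confirm that each tool really is reachable from the command line. The following is a minimal sketch, not part of the pipeline itself: missing_tools is a hypothetical helper, and the executable names passed to it (STAR, samtools, kraken2, snakemake, Rscript) are assumptions about a typical installation.

```shell
# Report which of the given command names cannot be found on PATH.
# Prints one missing name per line; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Executable names are assumptions; adjust them to your installation.
missing_tools STAR samtools kraken2 snakemake Rscript
```

An empty result means every listed tool was found.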

Procedure

1. Clone this repository.

Run the following command from the command line.

git clone https://github.com/fw262/Meta-scRNA-seq.git

2. Download the required software listed above.

Please ensure that all required software is installed before starting.

3. Store or link paired-end sequencing files.

Please move or link the raw fastq files for each experiment into one data directory. Please ensure the sequence files end in "{sample}_R1_001.fastq.gz" and "{sample}_R2_001.fastq.gz" in your data directory.
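A quick way to check the naming is to confirm that every read 1 file has a read 2 partner. This is a sketch only: unpaired_samples is a hypothetical helper, and it assumes the paired files follow the usual _R1_001/_R2_001 naming.

```shell
# List R1 files in a data directory that have no matching R2 file.
# Prints one R1 path per unpaired sample; prints nothing if all are paired.
unpaired_samples() {
    datadir=$1
    for r1 in "$datadir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        [ -e "$r2" ] || printf '%s\n' "$r1"
    done
}
```

For example, "unpaired_samples /path/to/data" prints nothing when all samples are complete.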

4. Create the STAR reference of the host genome.
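As a rough illustration, a STAR index is typically built with the genomeGenerate run mode. The sketch below only prints the command so it can be inspected first; star_index_cmd is a hypothetical helper, and the paths and thread count are placeholders rather than values from this repository.

```shell
# Print (not run) a STAR command that builds a genome index.
# Arguments: index directory, genome FASTA, annotation GTF, thread count.
star_index_cmd() {
    printf 'STAR --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s --runThreadN %s\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical paths; replace with your host genome files.
star_index_cmd star_index host_genome.fa host_annotation.gtf 8
```

Once the printed command looks right, run it directly. The index directory is the path to give as STAR_IND in config.yaml.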

5. Edit the config.yaml file for your experiment.

Please change the variable names in the config.yaml as required for your analysis. This includes the following changes:

  • Samples: Sample prefix (the part of the file name before _R1_001.fastq.gz).

  • STAR_IND: Path to your STAR-generated index folder.

  • DATADIR: Path to where the sequencing samples ({sample}_R1_001.fastq.gz) are stored.

  • PIPELINE_MAJOR: Directory where the outputs (expression matrices, plots) are stored.

  • GLOBAL: Define global variables for pipeline including number of mismatches allowed in STAR, cell barcode base pair range in read 1, and UMI base pair range in read 1.

  • STAREXEC: Path to STAR.

  • KRAKEN: Path to Kraken2.

  • KRAKEN_DB: Path to Kraken2 database.

  • CORES: Number of cores used in each step of the pipeline. To run multiple samples in parallel, please specify the total number of cores in the snakemake command (i.e. "snakemake -j {total cores}").
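After editing, a simple check can confirm that none of the settings above was dropped by accident. This is a sketch under one assumption: each key starts at the beginning of a line in config.yaml, as top-level YAML keys normally do. missing_config_keys is a hypothetical helper, not part of the pipeline.

```shell
# Print the top-level keys listed above that are absent from a config file.
# Prints nothing if every key is present.
missing_config_keys() {
    config=$1
    for key in Samples STAR_IND DATADIR PIPELINE_MAJOR GLOBAL \
               STAREXEC KRAKEN KRAKEN_DB CORES; do
        # The trailing colon keeps KRAKEN from matching KRAKEN_DB.
        grep -q "^${key}:" "$config" || printf '%s\n' "$key"
    done
}
```

Running "missing_config_keys config.yaml" should print nothing for a complete file.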

6. Run snakemake with the command "snakemake".

Please ensure the Snakefile and config.yaml files as well as the scripts folder are in the directory where you intend to run the pipeline.
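The layout described above can be verified before launching. The following sketch uses a hypothetical helper, check_workdir, and the core count of 8 in the comments is only an example.

```shell
# Check that the current directory holds the files the pipeline expects.
# Returns 0 if everything is present, 1 otherwise, naming what is missing.
check_workdir() {
    status=0
    for f in Snakefile config.yaml; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; status=1; }
    done
    [ -d scripts ] || { echo "missing directory: scripts" >&2; status=1; }
    return "$status"
}

# Typical sequence once the check passes:
#   check_workdir && snakemake -n      # dry run: list jobs without executing them
#   check_workdir && snakemake -j 8    # run with up to 8 cores in total
```

A dry run first is a cheap way to catch configuration mistakes before any alignment starts.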

Output

  • Merged transcriptome + metagenomic expression matrices are stored in the "[PIPELINE_MAJOR]/[Samples]_solo/Solo.out/merged" folder.
  • The "[PIPELINE_MAJOR]/[Samples]_solo/plots" folder contains several useful plots including UMAP projection of the data, level of unmapped reads for each cell cluster, as well as cell-cluster specific expression of all metagenomic features, differentially expressed genes, and differentially expressed metagenomic features.
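To show how the bracketed placeholders expand, the sketch below prints both output locations for one sample. output_dirs is a hypothetical helper, and the directory and sample names in the example call are made up.

```shell
# Print the expected output folders for one sample.
# Arguments: the PIPELINE_MAJOR directory and the sample prefix.
output_dirs() {
    printf '%s/%s_solo/Solo.out/merged\n' "$1" "$2"
    printf '%s/%s_solo/plots\n' "$1" "$2"
}

# Hypothetical values for PIPELINE_MAJOR and the sample prefix.
output_dirs results sampleA
```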
