baseqDrops

A versatile pipeline for processing datasets from 10X, inDrop and Drop-seq.

The related paper is available at: https://www.sciencedirect.com/science/article/pii/S1097276518308803?via%3Dihub

Install baseqDrops

baseqDrops requires python3. The package, called baseqDrops, can be installed with:

pip install baseqDrops==2.0

After installation, the baseqDrops command will be available.

For efficient processing, a computer or server with at least 30 GB of memory and at least 8 CPU cores is recommended.

Configuration file

The pipeline requires samtools, STAR, a directory of barcode whitelist files and a Cell Ranger reference. When the command runs, their locations are read from a configuration file called config_drops.ini:

[Drops]
samtools = /path/to/samtools
star = /path/to/STAR
whitelistDir = /path/to/whitelist_file_directory
cellranger_ref_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
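
Before launching a long run, it can help to confirm that the configuration file parses and contains the expected entries. The following is a minimal sketch using Python's standard configparser module. The helper name load_drops_config and the decision to treat all four keys as required are assumptions made for illustration; baseqDrops itself may validate its configuration differently.

```python
import configparser

# Keys shown in the example config above; treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = ("samtools", "star", "whitelistDir", "cellranger_ref_hg38")


def load_drops_config(text):
    """Parse config text and return the [Drops] section as a dict."""
    parser = configparser.ConfigParser()
    # Keep the original key case (e.g. whitelistDir) instead of lowercasing.
    parser.optionxform = str
    parser.read_string(text)
    if not parser.has_section("Drops"):
        raise ValueError("missing [Drops] section")
    section = dict(parser["Drops"])
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return section


EXAMPLE = """
[Drops]
samtools = /path/to/samtools
star = /path/to/STAR
whitelistDir = /path/to/whitelist_file_directory
cellranger_ref_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
"""

config = load_drops_config(EXAMPLE)
print(config["star"])
```

To check a file on disk, read its contents first, for example load_drops_config(open("config_drops.ini").read()).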

Getting Help

baseqDrops run-pipe --help

Process Steps

  1. Cell Barcode Counting: Count the barcodes present in the dataset. This generates a file named barcode_count_.csv;
  2. Cell Barcode Correction, Aggregating and Filtering: Correct cell barcodes that are within 1 bp mismatch, then aggregate them and filter by a minimum number of reads (default 5000). This generates a list of valid barcodes named barcode_stats_.csv;
  3. Split the Reads of Valid Cell Barcodes: The raw paired-end reads are split into 16 single-end files, according to the 2 bp prefix of the barcode, for multiprocessing. The barcode_splits folder contains files like split..<AA|AT|AC|AG...|GG>.fq;
  4. Alignment to Genome using STAR: Several STAR processes (the number is set by --parallel) run at the same time, and the results are written to the star_align folder. The BAM files are then sorted by sequence header;
  5. Reads Tagging: Tag each read's alignment position with the corresponding gene name;
  6. Generating Expression Table: Two expression tables are generated, one quantified by UMI (Result.UMIs..txt) and one by raw read count (Result.Reads..txt);
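
The logic of steps 1 to 3 can be illustrated with a small Python sketch. It is not the baseqDrops implementation: all function names are hypothetical, and correcting against a known whitelist while dropping ambiguous barcodes is an assumption of this example (the README only states that barcodes are corrected within 1 bp mismatch).

```python
from collections import Counter

BASES = "ACGT"


def one_mismatch_neighbors(barcode):
    """Yield every sequence differing from barcode at exactly one position."""
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def correct_and_aggregate(counts, whitelist):
    """Add the counts of 1-mismatch barcodes to their whitelist barcode."""
    aggregated = Counter()
    for barcode, n in counts.items():
        if barcode in whitelist:
            aggregated[barcode] += n
            continue
        candidates = [c for c in one_mismatch_neighbors(barcode) if c in whitelist]
        if len(candidates) == 1:  # unmatched or ambiguous barcodes are dropped
            aggregated[candidates[0]] += n
    return aggregated


def filter_barcodes(aggregated, min_reads=5000):
    """Keep barcodes supported by at least min_reads reads."""
    return {bc: n for bc, n in aggregated.items() if n >= min_reads}


# Step 3: the 2 bp prefix gives 4 * 4 = 16 possible split files.
PREFIXES = [a + b for a in BASES for b in BASES]

whitelist = {"AACCGGTT", "GGTTAACC"}
observed = ["AACCGGTT"] * 3 + ["AACCGGTA"] * 2 + ["GGTTAACC", "TTTTTTTT"]

counts = Counter(observed)                      # step 1: counting
aggregated = correct_and_aggregate(counts, whitelist)  # step 2: correction
valid = filter_barcodes(aggregated, min_reads=2)       # step 2: filtering
print(valid)
print({bc: bc[:2] for bc in valid})             # step 3: split key per barcode
```

In this toy input, the two AACCGGTA reads are merged into AACCGGTT, and GGTTAACC is removed by the read-count filter because it has a single read.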

Run Pipeline

These parameters should be provided (or run baseqDrops run-pipe --help for details):

  • --outdir/-d: Output path (default ./);
  • --config: Path to the configuration file;
  • --genome/-g: Genome version [hg38/mm38/hgmm];
  • --protocol/-p: Protocol [10X|indrop|dropseq];
  • --minreads: Minimum number of reads required for a barcode;
  • --name/-n: Name of the sample; a folder of / will be created and used as the main directory;
  • --parallel: Number of STAR and tagging processes run at the same time (default 4; a larger number needs more memory);
  • --fq1/-1: Path to the paired-end read 1 sequencing file;
  • --fq2/-2: Path to the paired-end read 2 sequencing file;
  • --top_million_reads: For a huge dataset, you can process only part of the data for a quick look; reads beyond the first N million are skipped;
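
To make the effect of --top_million_reads concrete, here is a minimal Python sketch of reading only the first reads of a FASTQ stream. It assumes the common 4-line FASTQ record layout; the function names are hypothetical and the README does not describe how baseqDrops implements this option internally.

```python
import io
import itertools


def iter_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ text stream."""
    while True:
        record = list(itertools.islice(handle, 4))
        if len(record) < 4:  # end of file, or a truncated final record
            return
        yield tuple(line.rstrip("\n") for line in record)


def take_first_reads(handle, max_reads):
    """Return an iterator over at most max_reads records; the rest are skipped."""
    return itertools.islice(iter_fastq(handle), max_reads)


fastq_text = (
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nGGCC\n+\nIIII\n"
)
kept = list(take_first_reads(io.StringIO(fastq_text), 2))
print([record[0] for record in kept])
```

For a limit of N million reads, max_reads would be N * 1_000_000, and a gzipped file can be opened as text with gzip.open(path, "rt").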

If your data is of human origin and cellranger_ref_hg38 is defined in the configuration file, you can run:

baseqDrops run-pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 1000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./

Run by Steps

The pipeline can also be run step by step. Provide all the parameters described above plus an extra --step option, for example:

baseqDrops run-pipe --config ./config.ini -g hg38 -p dropseq --minreads 1000 -n dropseq2 --top_million_reads 20 -1 dropseq_1.1.fq.gz -2 dropseq.2.fq.gz -d ./ --step count

The available steps are:

  • Cell Barcode Counting: --step count
  • Cell Barcode Correction, Aggregating and Filtering: --step stats
  • Split the Reads of Valid Cell Barcodes: --step split
  • Alignment to Genome using STAR: --step star
  • Reads Tagging: --step tagging
  • Generating Expression Table: --step table
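
When running step by step, one option is to generate the command for each step from a single set of parameters. The sketch below only builds and prints the commands, using flags that appear in this README. The helper name build_step_commands and the choice to run the steps in the order listed are assumptions of this example.

```python
import shlex

# Step names as listed above, in the order of the process steps.
STEPS = ["count", "stats", "split", "star", "tagging", "table"]


def build_step_commands(config, genome, protocol, name, fq1, fq2,
                        outdir="./", minreads=None):
    """Return one baseqDrops run-pipe argument list per step."""
    base = [
        "baseqDrops", "run-pipe",
        "--config", config,
        "-g", genome,
        "-p", protocol,
        "-n", name,
        "-1", fq1,
        "-2", fq2,
        "-d", outdir,
    ]
    if minreads is not None:
        base += ["--minreads", str(minreads)]
    return [base + ["--step", step] for step in STEPS]


commands = build_step_commands(
    config="./config_drops.ini",
    genome="hg38",
    protocol="10X",
    name="10X_test",
    fq1="10x_1.1.fq.gz",
    fq2="10x.2.fq.gz",
    minreads=1000,
)
for command in commands:
    # Printed only; pass each list to subprocess.run(command, check=True) to execute.
    print(shlex.join(command))
```

Keeping each command as a list of arguments avoids shell quoting problems if a sample name or path contains spaces.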

Contact

For any questions, please email: [email protected]