RNA-Seq-Flow

RNA-Seq Flow is a pipeline written in Python Snakemake format that starts from raw FASTQ files and produces gene- and isoform-level counts using RSEM. To increase mapping efficiency, it performs a 2nd-pass STAR alignment by re-indexing the genome with the merged SJ.out.tab files from the 1st pass. For quality control, it runs FastQC and trims reads with Trim Galore.

[Figure: workflow diagram]

Required Tools

  • FastQC (A quality control tool for high throughput sequence data)

  • Trim Galore (Automates quality control and adapter trimming of FASTQ files)

  • STAR (Splice-aware, ultrafast aligner of RNA-Seq reads to a reference genome)

  • Picard (Set of command-line tools for manipulating high-throughput sequencing data)

  • RNA-SeQC (Quality-control metrics for RNA-Seq data)

  • RSEM (Accurate quantification of gene and isoform expression levels from RNA-Seq data)

Setting up conda environment for tools and their dependencies

  • Install Anaconda, or load it if it is already available on your server

  • conda create --name rnaseq-env

  • source activate rnaseq-env

  • conda install -c bioconda star

  • conda install -c bioconda fastqc

  • conda install -c bioconda rsem
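
After the installs above, it can be useful to confirm that the tools are actually visible on PATH inside the activated environment. The following is a minimal sketch, not part of the pipeline itself; the executable names in `EXPECTED_TOOLS` are assumptions based on how these packages are commonly installed, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names for the three packages installed above.
# Adjust them if your installation exposes different names.
EXPECTED_TOOLS = ["STAR", "fastqc", "rsem-calculate-expression"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected tools were found.")
```

Run it with the environment activated; any name it reports is either not installed or installed under a different executable name.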

Use STAR to index the genome for the 1st-pass alignment. The 2nd-pass alignment uses a new index that the script builds from the merged SJ.out.tab files.

 STAR  --runMode genomeGenerate --runThreadN 24 --genomeDir ./ --genomeFastaFiles hg38.fa   --sjdbGTFfile gencode.v30.annotation.gtf 
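
The 2nd-pass index depends on merging the per-sample SJ.out.tab files from the 1st pass. How the pipeline script does this is not shown here, so the following is only a sketch of one common approach: take the union of junctions across samples, sum their uniquely mapped read counts (column 7 of SJ.out.tab), and keep junctions above a threshold. The function names and the `min_unique_reads` parameter are hypothetical.

```python
import csv
from collections import defaultdict


def merge_junctions(sj_paths, min_unique_reads=1):
    """Merge STAR SJ.out.tab files into a sorted list of unique junctions.

    Each junction is keyed by (chromosome, intron start, intron end, strand),
    which are columns 1-4 of SJ.out.tab. Column 7 (uniquely mapped reads
    crossing the junction) is summed across all input files.
    """
    totals = defaultdict(int)
    for path in sj_paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if len(row) < 7:
                    continue  # skip blank or malformed lines
                key = (row[0], int(row[1]), int(row[2]), row[3])
                totals[key] += int(row[6])
    kept = [key for key, count in totals.items() if count >= min_unique_reads]
    return sorted(kept)


def write_junctions(junctions, out_path):
    """Write junctions as a 4-column, tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(junctions)
```

The resulting 4-column file (chromosome, start, end, strand) is in a form that STAR accepts through its `--sjdbFileChrStartEnd` option when generating the new index.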

To generate a combined FastQC report (.txt) for all samples:

 python3 fastqc-summary -s $INDIR > "QC_Report.txt"

To quantify gene expression levels and for compatibility with RNA-SeQC, the GENCODE GTF needs to be collapsed using the GTEx script collapse_annotation.py:

python3 collapse_annotation.py gencode.v30.annotation.gtf  gencode.v30.GRCh38.genes.gtf

To run the pipeline on a cluster, use this command (modify the cluster.json parameters according to your cluster configuration):

snakemake -j 999 --configfile config.yaml --use-conda --nolock --cluster-config cluster.json --cluster "sbatch -A {cluster.account} -p {cluster.partition}  -N {cluster.N} -n {cluster.n}  -t {cluster.time} --mem {cluster.mem}"
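
To see how the `{cluster.…}` placeholders in the command above are filled in from cluster.json, the sketch below mimics the substitution in plain Python. It assumes the usual cluster-config layout, where a `__default__` entry supplies values for every rule and per-rule entries override them. The rule name `star_pass1` and all values are hypothetical examples, and `render_sbatch` is an illustrative helper, not Snakemake's own code.

```python
from types import SimpleNamespace

# Same submission template as passed to --cluster above.
SBATCH_TEMPLATE = (
    "sbatch -A {cluster.account} -p {cluster.partition} "
    "-N {cluster.N} -n {cluster.n} -t {cluster.time} --mem {cluster.mem}"
)

# Hypothetical cluster.json content, written here as a Python dict.
EXAMPLE_CLUSTER_CONFIG = {
    "__default__": {
        "account": "my_account",
        "partition": "general",
        "N": 1,
        "n": 4,
        "time": "04:00:00",
        "mem": "16G",
    },
    # Hypothetical rule name; overrides only the keys it lists.
    "star_pass1": {"n": 24, "mem": "64G"},
}


def render_sbatch(cluster_config, rule_name, template=SBATCH_TEMPLATE):
    """Fill the sbatch template with defaults plus per-rule overrides."""
    settings = dict(cluster_config.get("__default__", {}))
    settings.update(cluster_config.get(rule_name, {}))
    # str.format resolves {cluster.account} via attribute access.
    return template.format(cluster=SimpleNamespace(**settings))


if __name__ == "__main__":
    print(render_sbatch(EXAMPLE_CLUSTER_CONFIG, "star_pass1"))
```

Printing the rendered command for each rule before submitting is a quick way to catch a missing key in cluster.json, which would otherwise surface only when a job is dispatched.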

Contributors

  • khandaud15
