RNA-Seq_pipe

This repository contains the R and bash scripts used to run an RNA-Seq analysis.

Downloading tools

Some tools must be downloaded and compiled before the pipeline can be run:

STAR: https://github.com/alexdobin/STAR
Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic
FastQC: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
featureCounts: http://bioinf.wehi.edu.au/featureCounts/

Download each tool following the instructions on its website. The resulting folders and binary executables must be saved in the scripts folder.

The scripts in this repository have been tested with the following tool versions:

STAR: 2.7
Trimmomatic: 0.39
FastQC: 4.9.3
featureCounts: Version 1.6.4 of Subread package
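
Before launching the pipeline, it can help to confirm that the tools listed above are reachable from the shell. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the executable names (`STAR`, `fastqc`, `featureCounts`, plus `java` for the Trimmomatic jar) are assumptions about how the tools were installed.

```shell
#!/bin/sh
# missing_tools: hypothetical helper, not part of the repository.
# Prints the name of every command in its argument list that cannot be found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumed executable names; adjust them to match your installation.
# Trimmomatic is distributed as a .jar file, so java is checked instead.
missing=$(missing_tools STAR fastqc featureCounts java)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All tools found."
fi
```

Because the tools are saved inside the scripts folder, the relevant subfolders may first need to be added to `PATH` for this check to find them.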

RNA-Seq pipeline

The RNA-Seq folder contains the tools needed to run an RNA-Seq pipeline from a terminal in a Unix environment. It holds several folders:

data: the fasta and gff files, and a subfolder subdata, must be added to this folder.
- subdata: the fastq files go in this folder.
scripts: all the scripts and tools needed to run the RNA-Seq analysis are found here.
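
The layout described above can be checked before a run. This is a sketch under stated assumptions: `check_data_layout` is a hypothetical helper, and the file extensions (`.fasta`, `.gff`, `.fastq` or `.fastq.gz`) are guesses, since the README does not specify them.

```shell
# check_data_layout DIR: hypothetical helper, not part of the repository.
# Reports each expected input that is absent from DIR and returns 1 if any is missing.
check_data_layout() {
    data_dir=$1
    status=0
    # Assumed file extensions; change the patterns if your files are named differently.
    for pattern in "*.fasta" "*.gff" "subdata/*.fastq*"; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        set -- "$data_dir"/$pattern
        # If nothing matched, the pattern stays literal and the -e test fails.
        if [ ! -e "$1" ]; then
            echo "missing: $data_dir/$pattern"
            status=1
        fi
    done
    return $status
}

# Example call from the repository root:
# check_data_layout data
```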

A folder bin, where all the results will be stored, must be created. Follow these steps before running the analysis:

git clone https://github.com/jumagari14/RNA-Seq_pipe.git
cd RNA-Seq_pipe
mkdir -p -m 755 bin data
mkdir -p -m 755 data/subdata
## Add all the necessary data to the data and data/subdata folders
cd scripts

Once these steps are done, the main shell script can be executed:

./main_rna_seq.sh

After running the pipeline, several folders are found in bin:

genome_ind: Genome indexes, necessary to run the STAR mapping.
STAR_Align: sorted and unsorted BAM files from the STAR mapping.
counts: txt files with the counting results.
trimm_data: trimmed fastq reads. A subfolder quality is also generated, holding the html and zip files produced by the FastQC analysis.
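
To see quickly whether a run produced the folders listed above, something like the following could be used. It is only a sketch: `summarize_outputs` is a hypothetical helper, and it counts files without checking their contents.

```shell
# summarize_outputs BIN_DIR: hypothetical helper, not part of the repository.
# For each expected output folder, prints how many regular files it holds,
# or "not found" if the folder does not exist.
summarize_outputs() {
    bin_dir=$1
    for sub in genome_ind STAR_Align counts trimm_data trimm_data/quality; do
        if [ -d "$bin_dir/$sub" ]; then
            # Count only files directly inside the folder; strip the padding wc may add.
            n=$(find "$bin_dir/$sub" -maxdepth 1 -type f | wc -l | tr -d ' ')
            echo "$sub: $n files"
        else
            echo "$sub: not found"
        fi
    done
}

# Example call from the repository root:
# summarize_outputs bin
```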

In the main directory, a .tabular file containing all the counting results is created. This file is mainly used in the later normalisation step.

The main script uses the parallel command. If this command is not available on the cluster where the analysis is run, the optional script (optional_rna-star.sh) must be run instead. The repository also includes R files to perform the normalisation analysis and to extract extra information from gff files.
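
The choice between the main and the optional script can be automated with a small check. This is a sketch, not part of the repository: `pick_pipeline_script` is a hypothetical helper, and it only tests that some command named `parallel` is on `PATH`, not which implementation it is.

```shell
# pick_pipeline_script: hypothetical helper, not part of the repository.
# Prints the script to run depending on whether a parallel command is available.
pick_pipeline_script() {
    if command -v parallel >/dev/null 2>&1; then
        echo "main_rna_seq.sh"
    else
        echo "optional_rna-star.sh"
    fi
}

# Example call from inside the scripts folder:
# ./"$(pick_pipeline_script)"
```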
