RNA-Seq_pipe

This repository stores all the R and bash files used to run an RNA-Seq analysis.

Downloading tools

Some tools must be downloaded and compiled before the pipeline can be run: STAR, Trimmomatic, FastQC and featureCounts (part of the Subread package).

Download each tool following the instructions on its website, and save the resulting folders and binary executables in the scripts folder.

The tool versions tested with the scripts in this repository are:

  • STAR: 2.7
  • Trimmomatic: 0.39
  • FastQC: 4.9.3
  • featureCounts: Version 1.6.4 of Subread package
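
Before running the pipeline, it may help to confirm that the tools were actually saved under the scripts folder. The following is a minimal sketch, not part of the repository: the helper name missing_tools and the file names passed to it (STAR, fastqc, featureCounts, trimmomatic-*.jar) are assumptions and should be adjusted to match how the tools were unpacked.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print every tool name
# that cannot be found anywhere below the given folder.
# Usage: missing_tools <folder> <name-or-glob>...
missing_tools() {
    local dir=$1; shift
    local name
    for name in "$@"; do
        # find searches recursively; empty output means nothing matched
        if [ -z "$(find "$dir" -name "$name" 2>/dev/null | head -n 1)" ]; then
            echo "$name"
        fi
    done
}

# Example call, run from the repository root (file names are assumptions):
#   missing_tools scripts STAR fastqc featureCounts 'trimmomatic-*.jar'
```

An empty output means every name was found; any printed name points to a tool that still needs to be downloaded or moved.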

RNA-Seq pipeline

The RNA-Seq folder holds the tools needed to run an RNA-Seq pipeline from a terminal in a Unix environment. It contains several folders:

  • data: the fasta and gff files, plus a subdata subfolder, must be added to this folder.

    • subdata: the fastq files go in this folder.
  • scripts: all the scripts and tools needed to run the RNA-Seq analysis are found here.
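
The layout above can be checked with a small shell sketch before launching the analysis. This is not part of the repository: the function names has_file and check_inputs are hypothetical, and the file extensions tested (.fasta/.fa, .gff/.gff3, .fastq/.fastq.gz) are assumptions about how the input files are named.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the repository).

# has_file <dir> <glob>...: succeed if <dir> directly contains a regular
# file matching at least one of the glob patterns.
has_file() {
    local dir=$1; shift
    local pat
    for pat in "$@"; do
        if [ -n "$(find "$dir" -maxdepth 1 -type f -name "$pat" 2>/dev/null | head -n 1)" ]; then
            return 0
        fi
    done
    return 1
}

# check_inputs <data_dir>: report missing inputs on stderr and return
# non-zero if any are missing. The extensions are assumptions.
check_inputs() {
    local data=$1 status=0
    has_file "$data" '*.fasta' '*.fa' \
        || { echo "no fasta file in $data" >&2; status=1; }
    has_file "$data" '*.gff' '*.gff3' \
        || { echo "no gff file in $data" >&2; status=1; }
    has_file "$data/subdata" '*.fastq' '*.fastq.gz' \
        || { echo "no fastq files in $data/subdata" >&2; status=1; }
    return "$status"
}

# Example call, run from the repository root:
#   check_inputs data && echo "inputs look complete"
```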

A bin folder must be created, where all the results will be stored. Follow these steps before running the analysis:

git clone https://github.com/jumagari14/RNA-Seq_pipe.git
cd RNA-Seq_pipe
mkdir -p -m 755 bin data
mkdir -p -m 755 data/subdata
## Include all the necessary data in data and subdata folder
cd scripts 

Once these steps are done, the main shell script can be executed:

./main_rna_seq.sh 

After running the pipeline, several folders are found in bin:

  • genome_ind: Genome indexes, necessary to run the STAR mapping.
  • STAR_Align: Sorted and unsorted Bam files from STAR mapping.
  • counts: txt files with the counting results.
  • trimm_data: trimmed fastq reads. A quality subfolder is also generated, holding the html and zip files produced by the FastQC analysis.
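
A quick way to confirm that a run produced the folders listed above is sketched below. The function name missing_outputs is hypothetical and not part of the repository; the folder names are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the name of every
# expected result folder that is absent from the given bin folder.
missing_outputs() {
    local bin_dir=$1 name
    for name in genome_ind STAR_Align counts trimm_data; do
        [ -d "$bin_dir/$name" ] || echo "$name"
    done
}

# Example call, run from the repository root:
#   missing_outputs bin
```

No output means all four folders exist; it says nothing about whether their contents are complete.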

A .tabular file containing all the counting results is created in the main directory. This file is mainly used in the later normalisation step.

The main script uses the parallel command. If this command is not available on the cluster where the analysis is run, the optional script (optional_rna-star.sh) must be run instead. The repository also includes R files to perform the normalisation analysis and to extract extra information from the gff files.
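
The choice between the main and the optional script could be automated along these lines. This is only a sketch: the function name pick_script is hypothetical, and it assumes that finding parallel on the PATH is enough for the main script to work.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the name of the
# script to run, depending on whether `parallel` is found on the PATH.
pick_script() {
    if command -v parallel >/dev/null 2>&1; then
        echo "main_rna_seq.sh"
    else
        echo "optional_rna-star.sh"
    fi
}

# Example call, run from the scripts folder:
#   ./"$(pick_script)"
```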

