This repository stores all the R and bash files used to run an RNA-Seq analysis.
Several tools must be downloaded and compiled before the pipeline can be run:
- STAR: https://github.com/alexdobin/STAR
- Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic
- FastQC: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- featureCounts: http://bioinf.wehi.edu.au/featureCounts/
Download each tool by following the instructions on its website, and save the resulting folders and binary executables in the scripts folder.
The tool versions that have been tested with the scripts in this repository are:
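Before running the pipeline, it can help to confirm that every tool actually ended up inside the scripts folder. The sketch below assumes the file names STAR, trimmomatic*.jar, fastqc and featureCounts, which is how the downloads are usually named but may differ on your machine. The helper check_tools is hypothetical and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search DIR recursively for each file-name pattern and
# report where it was found. The patterns used in the example below are
# assumptions about how the downloaded tools are named.

# check_tools DIR PATTERN... : prints the first match for each pattern;
# returns 1 if any pattern has no match under DIR.
check_tools() {
  local dir=$1; shift
  local missing=0 name hit
  for name in "$@"; do
    # -print -quit stops at the first regular file matching the pattern
    hit=$(find "$dir" -name "$name" -type f -print -quit 2>/dev/null)
    if [ -n "$hit" ]; then
      printf 'found   %s -> %s\n' "$name" "$hit"
    else
      printf 'MISSING %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, from the repository root:
#   check_tools scripts STAR 'trimmomatic*.jar' fastqc featureCounts
```

The check only tests that the files exist. It does not check their versions or whether they are executable.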
- STAR: 2.7
- Trimmomatic: 0.39
- FastQC: 4.9.3
- featureCounts: version 1.6.4 of the Subread package
The RNA-Seq folder contains what is needed to run an RNA-Seq pipeline from a terminal in a Unix environment. It holds several folders:
- data: the FASTA and GFF files, plus a subfolder named subdata, must be added to this folder.
  - subdata: the FASTQ files go in this folder.
- scripts: all the scripts and tools needed to run the RNA-Seq analysis are stored here.
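You can check the data layout described above before launching a long run. The following is a minimal sketch that assumes the file extensions .fa/.fasta/.fna for FASTA, .gff/.gff3 for GFF, and .fastq/.fq (optionally gzipped) for the reads. The pipeline's own scripts may expect different extensions, and count_files and check_inputs are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check of the data folder. The accepted file
# extensions are assumptions; adjust them to the names your files use.

# count_files DIR PATTERN... : number of regular files directly in DIR
# that match any of the glob patterns.
count_files() {
  local dir=$1; shift
  local n=0 pat f
  for pat in "$@"; do
    for f in "$dir"/$pat; do   # $pat is left unquoted so the glob expands
      if [ -f "$f" ]; then n=$((n + 1)); fi
    done
  done
  echo "$n"
}

# check_inputs DATA_DIR : returns 1 (and reports on stderr) if the FASTA,
# GFF or FASTQ files are missing.
check_inputs() {
  local data=$1 rc=0
  if [ "$(count_files "$data" '*.fa' '*.fasta' '*.fna')" -eq 0 ]; then
    echo "no FASTA file in $data" >&2; rc=1
  fi
  if [ "$(count_files "$data" '*.gff' '*.gff3')" -eq 0 ]; then
    echo "no GFF file in $data" >&2; rc=1
  fi
  if [ "$(count_files "$data/subdata" '*.fastq' '*.fq' '*.fastq.gz' '*.fq.gz')" -eq 0 ]; then
    echo "no FASTQ file in $data/subdata" >&2; rc=1
  fi
  return "$rc"
}

# Example, from the repository root:
#   check_inputs data && echo "inputs look complete"
```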
A bin folder, where all the results will be stored, must also be created. Follow these steps before running the analysis:
git clone https://github.com/jumagari14/RNA-Seq_pipe.git
cd RNA-Seq_pipe
mkdir -p -m 755 bin data
mkdir -p -m 755 data/subdata
## Include all the necessary data in the data and data/subdata folders
cd scripts
Once these steps are done, run the main shell script:
./main_rna_seq.sh
After the pipeline has run, bin contains the following folders:
- genome_ind: genome indexes, needed to run the STAR mapping.
- STAR_Align: sorted and unsorted BAM files from the STAR mapping.
- counts: text files with the counting results.
- trimm_data: trimmed FASTQ reads. It also contains a subfolder, quality, where the HTML and ZIP files produced by the FastQC analysis are stored.
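A quick way to confirm that a run produced the folders listed above is to check each one and count its entries. This is a minimal sketch, and check_outputs is a hypothetical helper. It only checks that the folders exist and reports how many entries each contains. It does not validate the files themselves.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check: verify that the expected result folders exist
# in BIN_DIR and print how many entries each one contains.

check_outputs() {  # check_outputs BIN_DIR
  local bin=$1 rc=0 sub
  for sub in genome_ind STAR_Align counts trimm_data trimm_data/quality; do
    if [ -d "$bin/$sub" ]; then
      # ls -A lists hidden entries too, but not . and ..
      printf '%-20s %s entries\n' "$sub" "$(ls -A "$bin/$sub" | wc -l | tr -d ' ')"
    else
      echo "missing folder: $bin/$sub" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example, from the repository root:
#   check_outputs bin
```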
A .tabular file containing all the counting results is also created in the main directory. This file is mainly used in the later normalisation step.
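If you ever need to rebuild a combined count table from the files in bin/counts, the sketch below shows one way to do it. This is not necessarily how the pipeline builds its .tabular file. The sketch relies on the standard featureCounts output layout: an optional "#" comment line, a header line that starts with Geneid, then Geneid, Chr, Start, End, Strand and Length columns, followed by the counts. It assumes each file comes from a separate featureCounts run on a single BAM file, so the count is in column 7, and it takes each sample name from the file name. The merge_counts helper and the output name merged_counts.tsv are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge per-sample featureCounts tables into one
# tab-separated matrix (Geneid + one count column per input file).
# Assumes one BAM per featureCounts run, so the count is in column 7.
# Genes absent from a file get a count of 0.

merge_counts() {  # merge_counts FILE... > OUT
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                       # first line of a new file: record sample name
      nfile++
      name[nfile] = FILENAME
      sub(/.*\//, "", name[nfile])   # strip the directory part
      sub(/\.txt$/, "", name[nfile]) # strip the .txt extension
    }
    /^#/ || $1 == "Geneid" { next }  # skip the comment and header lines
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
      count[$1, nfile] = $7
    }
    END {
      printf "Geneid"
      for (j = 1; j <= nfile; j++) printf "%s%s", OFS, name[j]
      printf "\n"
      for (g = 1; g <= ngenes; g++) {
        printf "%s", order[g]
        for (j = 1; j <= nfile; j++)
          printf "%s%s", OFS, (((order[g], j) in count) ? count[order[g], j] : 0)
        printf "\n"
      }
    }' "$@"
}

# Example, from the repository root (*.txt skips the .summary files):
#   merge_counts bin/counts/*.txt > merged_counts.tsv
```

Keying the counts by gene ID, rather than pasting columns side by side, keeps the table correct even if two files list their genes in different orders.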
The main script uses the parallel command. If this command is not available on the cluster where the analysis is run, run the optional script (optional_rna-star.sh) instead. The repository also includes R files to perform the normalisation analysis and to extract extra information from GFF files.
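The choice between the two scripts can be automated. The sketch below assumes that both scripts are in the scripts folder and are executable, and that "available" simply means parallel can be found on the PATH. The choose_runner helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pipeline script to run on this machine.
# It picks main_rna_seq.sh when `parallel` is on the PATH and falls back
# to optional_rna-star.sh otherwise.

choose_runner() {
  if command -v parallel >/dev/null 2>&1; then
    echo ./main_rna_seq.sh
  else
    echo ./optional_rna-star.sh
  fi
}

# Usage, from the scripts folder:
#   runner=$(choose_runner) && "$runner"
```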