
fucci-bulk-rna-seq

Bulk RNA-seq analysis pipeline with Snakemake for Broad Institute UGER cluster

Snakemake is a Pythonic workflow description language that is easily configurable to run in all sorts of environments. Since version 4.1, Snakemake has included a feature called 'profiles', which makes it easy to exchange configuration presets for running in a particular environment. This repository contains a Snakemake bulk RNA-seq analysis pipeline that runs on the Broad Institute's UGER cluster. The pipeline uses the tools shown in the graphic below.

Installation

Setting up the folder structure

The pipeline expects the following folder organization. Please use this as a model to set up your working space for the pipeline to work successfully.
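As a quick sanity check before running anything, the sketch below tests whether the handful of paths named elsewhere in this README exist under a project root. It is not the full expected layout: the path list is an assumption pieced together from the commands in the "Using the pipeline" section (for example, that `parameters/` and `tools/` sit next to `scripts/`), and `check_layout` is a hypothetical helper name, not part of the pipeline.

```shell
# Report which of the paths mentioned in this README are missing under a
# project root (default: current directory). The list is an assumption
# based on the README's commands, not the pipeline's full layout.
check_layout() {
  local root="${1:-.}"
  local missing=0
  local path
  for path in \
    scripts/r/de.analysis.R \
    scripts/cluster.json \
    parameters/config.yaml \
    tools/snakemake
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  # Exit status 0 means every listed path was found.
  return "$missing"
}

# Example call from the project root:
#   check_layout . && echo "layout looks complete"
```

Adjust the list to match your own working space if it differs from these assumptions.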

Setting up snakemake profile for Broad Institute UGER cluster

Please follow the instructions on the Broad Institute GitHub page to set up the snakemake profile.

Preparing a conda environment

The recommended way to run the analysis in this repository is to set up a conda environment, in which the versions of the tools used can be controlled. For a Windows machine, please follow the installation instructions below on the Broad cluster.

use Anaconda3

# Create new conda environment with the environment.yml file provided in this repository
dos2unix environment.yml
conda env create -f environment.yml
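The `dos2unix` step matters when `environment.yml` has passed through a Windows machine and picked up CRLF line endings. A minimal sketch for checking this before converting is shown below; `has_crlf` is a hypothetical helper name, and the check simply looks for a carriage-return character anywhere in the file.

```shell
# Succeeds (exit status 0) if the file contains at least one carriage
# return, which is what Windows (CRLF) line endings leave behind.
has_crlf() {
  local cr
  cr=$(printf '\r')   # a literal carriage-return character
  grep -q "$cr" "$1"
}

# Example: convert only when needed.
#   if has_crlf environment.yml; then dos2unix environment.yml; fi
```

Running `dos2unix` on a file that already has Unix line endings is harmless, so the check is optional; it is mainly useful for confirming where a parsing problem comes from.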

Installing RStudio and the DESeq2 package

DESeq2 is an R Bioconductor package used for differential expression analysis. It supports more than two experimental groups and can account for a second experimental factor. It takes a table of raw counts as input.

  1. To install RStudio and R, please follow the instructions at https://uvastatlab.github.io/phdplus/installR.html.
  2. Open RStudio and install DESeq2 using the instructions below:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("DESeq2")
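Because DESeq2 takes raw counts as input, it can help to confirm that a counts table has not been normalized before loading it into R. The sketch below assumes a tab-separated file with one header row and gene identifiers in the first column; that file layout and the helper name `check_raw_counts` are assumptions for illustration, not something this pipeline defines.

```shell
# Succeeds only if every value after the first column of a tab-separated
# counts table (header row skipped) is a non-negative integer.
check_raw_counts() {
  awk -F'\t' '
    NR == 1 { next }                  # skip the header row
    {
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^[0-9]+$/) {
          printf "line %d, column %d: not a raw count: %s\n", NR, i, $i
          bad = 1
        }
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example (counts.tsv is a placeholder file name):
#   check_raw_counts counts.tsv && echo "table contains raw integer counts"
```

Decimal values such as TPM or FPKM will be reported by this check, which is the point: those are normalized measures rather than raw counts.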

Using the pipeline

We're ready to go! To start the analysis:

  1. Update the config file parameters/config.yaml to ensure it has the right paths and sample names.
  2. Connect to the Broad login host
ssh login

# tmux command will keep your code running even if you disconnect
# To disconnect from the session, press CTRL+b, release both keys 
# and then press d. On the original login shell, type tmux a to reconnect 
# to your session, tmux ls to list all sessions, and tmux a -t [number] to 
# connect to session [number].
tmux
cd scripts/
use Anaconda
source activate ../tools/snakemake
# The below command will run all the steps in the pipeline
# up until the alignment by STAR.
snakemake --profile broad-uger --cluster-config cluster.json
  3. Once the snakemake jobs have completed successfully, open RStudio. Load the script: /scripts/r/de.analysis.R
  4. Set the working directory to /scripts/r/ using the R command setwd("/scripts/r/")
  5. Select all lines in the script and click on the 'Run' button at the top corner of your source window in RStudio.

Contributors

ayushi-broadins
