
ScaleBio Seq Suite: RNA Workflow

This is a Nextflow workflow to run analysis of ScaleBio Single Cell RNA Sequencing libraries. It processes data from sequencing reads to alignments, single-cell outputs (gene-expression matrix, etc.), and QC reports.

Getting started

  • First install Nextflow (at least 22.10; 23.10 or newer recommended)
  • Download this workflow to your machine
  • Set up dependencies
  • Launch the small pipeline test run (a condensed command sketch of these first steps follows this list)
  • Download / configure a reference genome for your samples
  • Create a samples.csv table for your samples
  • Create runParams.yml, specifying inputs and analysis options for your run
  • Launch the workflow for your run
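
A condensed sketch of these first steps is shown below. It is illustrative only: /PATH/TO/ScaleRna is a placeholder for wherever you downloaded the workflow, and the profile must match your system (see Dependency Management):

# Install Nextflow (requires Java 11 or later); this is the standard Nextflow installer
curl -s https://get.nextflow.io | bash
# Download / unpack this workflow to a local directory, then launch the small test run
nextflow run /PATH/TO/ScaleRna -profile docker -params-file /PATH/TO/ScaleRna/docs/examples/runParams.yml --outDir output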

Requirements

  • Linux system with GLIBC >= 2.17 (such as CentOS 7 or later)
  • Java 11 or later
  • 64GB of RAM and 12 CPU cores
    • Smaller datasets can be run with 32GB of RAM and 6 CPUs

Required Inputs

  • Sequencing reads
    • Path to the Illumina sequencer RunFolder (.bcl files)
    • To start the workflow from .fastq files generated outside of (before) this workflow instead, see Fastq generation.
  • Sample table
    • A .csv file listing all samples for this analysis run, optionally split by RT barcode. See samples.csv (an illustrative sketch follows this list).
  • Reference genome
    • The workflow requires a reference genome, including a STAR index for alignment and gene annotation. See Reference Genomes.
  • Kit version / Library structure
    • Select the libStructure corresponding to the version of the ScaleBio RNA kit used. See Analysis Parameters.
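
As an illustration only, a minimal samples.csv might look like the sketch below; the column names and RT barcode well ranges here are assumptions for illustration, so consult the samples.csv documentation for the authoritative format:

sample,barcodes
Sample1,1A-6H
Sample2,7A-12H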

Outputs

The workflow produces per-sample and per-library QC reports (HTML), alignments (BAM), a cell-by-gene count matrix (MTX), and more; see Outputs for a full list.

Extended Throughput Kit

To analyze data from multiple final distribution plates in the extended throughput kit, see Extended Throughput.

Workflow Execution

Workflow test

A small test run, with all input data stored online, can be run with the following command:

nextflow run /PATH/TO/ScaleRna -profile PROFILE -params-file /PATH/TO/ScaleRna/docs/examples/runParams.yml --outDir output

-profile docker is the preferred option if the system supports Docker containers; see Dependency Management for alternatives.

With this command, Nextflow will automatically download the input data from the internet (AWS S3), so please ensure that the compute nodes have internet access and sufficient storage space. Alternatively, you can download the data manually first (using the AWS CLI):

aws s3 sync s3://scale.pub/testData/rna/202308_tinyPipelineTest/fastqs/ fastqs --no-sign-request
aws s3 sync s3://scale.pub/testData/rna/GRCh38_chr1_genome GRCh38_chr1_genome --no-sign-request

and then run with

nextflow run /PATH/TO/ScaleRna/ -profile PROFILE --samples /PATH/TO/ScaleRna/docs/examples/samples.csv --genome GRCh38_chr1_genome/grch38.chr1.json --fastqDir fastqs --outDir /PATH/TO/OUTPUT_DIR --libStructure libV1.json

Note that this test run is merely a quick and easy way to verify that the pipeline executes properly and does not represent a real assay.

Nextflow Command-line

Nextflow options are given with a single dash (e.g. -profile), while workflow parameters (e.g. --outDir) are given with a double dash (--). See the Nextflow command-line documentation for further details.

A typical command to run the workflow on new data would then be:

nextflow run /PATH/TO/ScaleRna/ -profile docker --samples samples.csv --genome /PATH/TO/GRCh38/grch38.json --runFolder /PATH/TO/230830_A00525_1087_BHLFF3DSX7/ --splitFastq --outDir output

For large datasets (e.g. NovaSeq runs), setting --splitFastq increases the amount of parallelization, which can significantly reduce analysis time. See Analysis Parameters.

Configuration

Specifying Analysis Parameters

Analysis parameters (inputs, options, etc.) can be defined either in a runParams.yml file or directly on the Nextflow command-line. See Analysis Parameters for details on the options.
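
For example, a runParams.yml equivalent to the command-line flags used elsewhere in this document might look like the following sketch (paths are placeholders; the parameter names mirror the -- options shown above):

samples: /PATH/TO/samples.csv
genome: /PATH/TO/GRCh38/grch38.json
runFolder: /PATH/TO/RUN_FOLDER
splitFastq: true
outDir: output

It is then passed to Nextflow with -params-file runParams.yml, as in the test command above.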

Config File

In addition to the analysis parameters, a user-specific Nextflow configuration file can be used for system settings (compute resources, resource limits, storage paths, etc.):

-c path/to/user.config
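
For example, a minimal user.config for an HPC cluster might look like the sketch below; the executor, queue name, and paths are assumptions, so use the settings appropriate for your scheduler and storage:

// user.config (illustrative sketch)
process {
    executor = 'slurm'               // assumption: jobs are submitted via SLURM; use 'local' to run on one machine
    queue = 'normal'                 // hypothetical queue / partition name
}
workDir = '/scratch/nextflow-work'   // hypothetical scratch location for intermediate files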

See Nextflow configuration for how different configuration files, parameter files, and the command-line interact.

Dependency Management

The Nextflow workflow can automatically use pre-built Docker containers with all dependencies included. Activating the included -profile docker enables the required Nextflow settings. For details and alternatives, see Dependencies.

Running in the cloud

Nextflow natively supports execution on AWS, Azure, and Google Cloud.

In addition, Nextflow Tower offers a simple way to manage and execute Nextflow workflows on AWS.
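
For example, running on AWS Batch typically amounts to a Nextflow configuration along these lines (a minimal sketch, not part of this repository; the queue, region, and bucket are placeholders):

process {
    executor = 'awsbatch'                      // run tasks as AWS Batch jobs
    queue = 'my-batch-job-queue'               // hypothetical AWS Batch job queue
}
aws.region = 'us-east-1'                       // assumption: your AWS region
workDir = 's3://my-bucket/nextflow-work'       // hypothetical S3 bucket for intermediate files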

Versions and Updates

See the change log.

License

By purchasing product(s) and downloading the software product(s) of ScaleBio, You accept all of the terms of the License Agreement. If You do not agree to these terms and conditions, You may not use or download any of the software product(s) of ScaleBio.
