
RNA-Bloom


RNA-Bloom is a fast and memory-efficient de novo transcript sequence assembler for bulk and single-cell paired-end RNA-seq data.

Written by Ka Ming Nip 📧

ยฉ๏ธ 2018 Canada's Michael Smith Genome Sciences Centre, BC Cancer


Dependency 📌

Check your Java version:

java -version

Example:

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
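The example output above reports version 1.8 (Java 8). To perform the same check from a script, the sketch below parses the quoted version string printed by `java -version`. The function names `java_major_version` and `check_java` are hypothetical, and treating Java 8 as the minimum is an assumption inferred from the example output, not a requirement stated here.

```shell
#!/usr/bin/env bash
# Sketch: report the major Java version so a script can compare it to a minimum.
# Assumption: Java 8 is the minimum, inferred from the example output above.

# Map a version string to its major number, e.g. "1.8.0_101" -> 8, "11.0.2" -> 11.
java_major_version() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # pre-Java 9 scheme prefixes the major number with "1."
  esac
  echo "${v%%[._-]*}"     # drop everything from the first '.', '_' or '-'
}

# Compare the installed Java against a minimum major version (default: 8).
check_java() {
  local minimum="${1:-8}" version major
  # `java -version` writes to stderr; take the text between the first pair of quotes.
  version=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
  if [ -z "$version" ]; then
    echo "could not determine the Java version" >&2
    return 1
  fi
  major=$(java_major_version "$version")
  echo "Java $version (major version $major)"
  [ "$major" -ge "$minimum" ]
}
```

For example, `check_java 8 || echo "Java is missing or too old"` prints the detected version and returns a non-zero status if the check fails.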

Installation 🔧

  1. Download the binary tarball rnabloom_vX.X.X.tar.gz from the releases section.
  2. Extract the downloaded tarball with the command:
tar -zxf rnabloom_vX.X.X.tar.gz
  3. RNA-Bloom is ready to use: java -jar /path/to/RNA-Bloom.jar ...

There is nothing to compile/configure/build! 👍

Quick Start ๐Ÿƒ

โš ๏ธ RNA-Bloom only supports paired-end RNA-seq data. Input reads must be in either FASTQ or FASTA format and may be compressed with GZIP.

assemble bulk RNA-seq data:

java -jar RNA-Bloom.jar -left LEFT.fastq.gz -right RIGHT.fastq.gz -revcomp-right -t THREADS -outdir OUTDIR

assemble strand-specific bulk RNA-seq data:

java -jar RNA-Bloom.jar -stranded -left LEFT.fastq.gz -right RIGHT.fastq.gz -revcomp-right -t THREADS -outdir OUTDIR

Note that dUTP protocols produce reads in the F2R1 orientation, where /2 denotes left reads in forward orientation and /1 denotes right reads in reverse orientation. In this case, specify your read paths as -left reads_2.fastq -right reads_1.fastq.

assemble single-cell RNA-seq data (Smart-seq2):

java -jar RNA-Bloom.jar -pool READSLIST.txt -revcomp-right -t THREADS -outdir OUTDIR

example READSLIST.txt for the -pool option:

cell1 /path/to/cell1/left.fastq.gz /path/to/cell1/right.fastq.gz
cell2 /path/to/cell2/left.fastq.gz /path/to/cell2/right.fastq.gz
cell3 /path/to/cell3/left.fastq.gz /path/to/cell3/right.fastq.gz

Columns are separated by space or tab characters.

This file consists of 3 columns:

  1. cell ID
  2. path of left reads
  3. path of right reads
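Because columns are split on spaces and tabs, a missing path or a path containing a space silently changes the column count. The hypothetical helper `validate_readslist` below is a minimal pre-flight check using awk's default field splitting: it reports every line (blank lines included) that does not have exactly 3 columns. It does not check that the read files exist or that cell IDs are unique.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag lines of a -pool list file that do not have exactly
# 3 whitespace-separated columns (cell ID, left reads path, right reads path).
# Exit status is 0 if every line is well-formed, 1 otherwise.
validate_readslist() {
  awk 'BEGIN { bad = 0 }
       NF != 3 {
         printf "line %d: expected 3 columns, found %d\n", NR, NF
         bad = 1
       }
       END { exit bad }' "$1"
}
```

Usage: `validate_readslist READSLIST.txt && java -jar RNA-Bloom.jar -pool READSLIST.txt -revcomp-right -t THREADS -outdir OUTDIR`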

set the Bloom filter sizes based on the maximum allowable false positive rate and the expected number of unique k-mers:

java -jar RNA-Bloom.jar -fpr 0.1 -nk 28077715 ...
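For intuition about how -fpr and -nk interact: the textbook size of a single Bloom filter holding n items at false positive rate p, with the optimal number of hash functions, is m = -n ln(p) / (ln 2)^2 bits, with k = (m / n) ln 2 hash functions. The sketch below evaluates these with awk. It is the generic formula only, not necessarily how RNA-Bloom sizes its filters internally (it may allocate more than one filter), so treat the result as a rough per-filter estimate. `bloom_bits` and `bloom_hashes` are hypothetical names.

```shell
#!/usr/bin/env bash
# Textbook Bloom filter size in bits for n items at false positive rate p,
# assuming the optimal number of hash functions: m = -n * ln(p) / (ln 2)^2.
# Generic estimate only; not RNA-Bloom's own sizing logic.
bloom_bits() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f\n", -n * log(p) / (log(2) ^ 2) }'
}

# Optimal number of hash functions for the same filter: k = (m / n) * ln 2,
# rounded to the nearest integer.
bloom_hashes() {
  awk -v n="$1" -v p="$2" 'BEGIN {
    m = -n * log(p) / (log(2) ^ 2)
    printf "%.0f\n", m / n * log(2)
  }'
}
```

For the example values above (n = 28077715, p = 0.1), `bloom_bits 28077715 0.1` gives roughly 134.6 million bits (about 16.8 MB) and `bloom_hashes 28077715 0.1` gives 3 for one such filter.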

The number of unique k-mers in your dataset can be estimated efficiently with ntCard.

When running ntCard, specify the same k-mer size that will be used in RNA-Bloom (e.g. 25), e.g.:

ntcard -k 25 -c 65535 -p outdir/freq LEFT.fastq.gz RIGHT.fastq.gz

ntCard generates a histogram file, outdir/freq_k25.hist, in which F0 on the 2nd row is the number of unique k-mers, e.g.:

F1	140110302
F0	28077715
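To pass this value to -nk without copying it by hand, the hypothetical helper `ntcard_f0` below prints the second column of the row whose first column is `F0`. It assumes the layout shown above (a label followed by a count, separated by whitespace) and returns a non-zero status if no `F0` row is found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of unique k-mers (the F0 row)
# from an ntCard histogram file such as outdir/freq_k25.hist.
ntcard_f0() {
  awk '$1 == "F0" { print $2; found = 1; exit }
       END { exit !found }' "$1"
}
```

Usage: `java -jar RNA-Bloom.jar -fpr 0.1 -nk "$(ntcard_f0 outdir/freq_k25.hist)" ...`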

Alternatively, you can use the -ntcard option in RNA-Bloom if ntcard is already in your PATH, e.g.:

java -jar RNA-Bloom.jar -fpr 0.05 -ntcard ...

limit the total size of Bloom filters to 3 GB:

java -jar RNA-Bloom.jar -mem 3 ...

Otherwise, the total Bloom filter size is adjusted automatically based on the size of the input read files.

list all available options in RNA-Bloom:

java -jar RNA-Bloom.jar -help

limit the size of the Java heap to 1 GB:

java -Xmx1g -jar RNA-Bloom.jar ...

This option does not need to be set larger than the total Bloom filter size.

Other JVM options may also be used.