RNA-Bloom is a fast and memory-efficient de novo transcript sequence assembler for bulk and single-cell paired-end RNA-seq data.
Written by Ka Ming Nip
© 2018 Canada's Michael Smith Genome Sciences Centre, BC Cancer
Check your Java version:
java -version
Example:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
- Download the binary tarball rnabloom_vX.X.X.tar.gz from the releases section.
- Extract the downloaded tarball with the command:
tar -zxf rnabloom_vX.X.X.tar.gz
- RNA-Bloom is ready to use, i.e.
java -jar /path/to/RNA-Bloom.jar ...
There is nothing to compile/configure/build!
java -jar RNA-Bloom.jar -left LEFT.fastq.gz -right RIGHT.fastq.gz -revcomp-right -t THREADS -outdir OUTDIR
java -jar RNA-Bloom.jar -stranded -left LEFT.fastq.gz -right RIGHT.fastq.gz -revcomp-right -t THREADS -outdir OUTDIR
Note that dUTP protocols produce reads in the F2R1 orientation, where /2 denotes left reads in forward orientation and /1 denotes right reads in reverse orientation. In this case, please specify your read paths as -left reads_2.fastq -right reads_1.fastq.
java -jar RNA-Bloom.jar -pool READSLIST.txt -revcomp-right -t THREADS -outdir OUTDIR
cell1 /path/to/cell1/left.fastq.gz /path/to/cell1/right.fastq.gz
cell2 /path/to/cell2/left.fastq.gz /path/to/cell2/right.fastq.gz
cell3 /path/to/cell3/left.fastq.gz /path/to/cell3/right.fastq.gz
The file passed to -pool (READSLIST.txt above) consists of 3 columns, separated by space/tab characters:
- cell ID
- path of left reads
- path of right reads
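A reads list in this 3-column format can be generated with a short shell function rather than written by hand. The following is a minimal sketch, not part of RNA-Bloom: the function name make_reads_list and the directory layout (one sub-directory per cell, each holding left.fastq.gz and right.fastq.gz) are assumptions made for illustration.

```shell
# Hypothetical helper (not part of RNA-Bloom).
# Assumed layout: CELLS_DIR/<cell ID>/left.fastq.gz and right.fastq.gz
make_reads_list() {
    cells_dir=$1
    for dir in "$cells_dir"/*/; do
        dir=${dir%/}                      # drop the trailing slash
        left="$dir/left.fastq.gz"
        right="$dir/right.fastq.gz"
        # Skip cells that are missing either read file
        [ -f "$left" ] && [ -f "$right" ] || continue
        # Columns: cell ID, path of left reads, path of right reads
        printf '%s\t%s\t%s\n' "$(basename "$dir")" "$left" "$right"
    done
}

# Usage:
#   make_reads_list /path/to/cells > READSLIST.txt
```

Because columns are separated by whitespace, this sketch only produces a valid list when the cell IDs and paths contain no spaces.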
Set the Bloom filter sizes based on the maximum allowable false positive rate and the expected number of unique k-mers:
java -jar RNA-Bloom.jar -fpr 0.1 -nk 28077715 ...
The number of unique k-mers in your dataset can be estimated efficiently with ntCard.
When running ntCard, please specify the same k-mer size that will be used in RNA-Bloom (e.g. 25):
ntcard -k 25 -c 65535 -p outdir/freq LEFT.fastq.gz RIGHT.fastq.gz
ntCard generates a histogram file outdir/freq_k25.hist, where F0 on the 2nd row is the number of unique k-mers, e.g.
F1 140110302
F0 28077715
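The F0 value can be read from the histogram file and passed to -nk without copying it by hand. This is a minimal sketch; the function name unique_kmers is hypothetical, and it assumes the histogram has the two-column layout shown above.

```shell
# Hypothetical helper (not part of RNA-Bloom or ntCard).
# Prints the F0 value (number of unique k-mers) from an ntCard histogram file.
unique_kmers() {
    awk '$1 == "F0" { print $2; exit }' "$1"
}

# Usage, with the histogram path from the ntCard example:
#   NK=$(unique_kmers outdir/freq_k25.hist)
#   java -jar RNA-Bloom.jar -fpr 0.1 -nk "$NK" ...
```

Matching on the F0 label rather than on the row number keeps the lookup working even if the row order differs.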
Alternatively, you can use the -ntcard option in RNA-Bloom if ntcard is already in your PATH, e.g.
java -jar RNA-Bloom.jar -fpr 0.05 -ntcard ...
java -jar RNA-Bloom.jar -mem 3 ...
If -mem is not set, the total Bloom filter size is adjusted automatically based on the size of the input read files.
java -jar RNA-Bloom.jar -help
java -Xmx1g -jar RNA-Bloom.jar ...
The -Xmx option (maximum JVM heap size) does not need to be set larger than the total Bloom filter size.
Other JVM options may also be used.