
REDiscover

Tool for RNA editing discovery from NGS data.

REDiscover reports differences between the transcriptome and the underlying genome; these are putative RNA editing sites. To achieve this, the genome and transcriptome are genotyped simultaneously and their basecalls are compared.
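The genome-versus-transcriptome comparison can be illustrated with a minimal Python sketch for a single position. It assumes per-base read counts have already been collected from the DNA and RNA alignments (e.g. from a pileup). The function `call_editing` is hypothetical, its thresholds only reuse the defaults of `--minDepth`, `--minDNAfreq` and `--minAltfreq` from the options section, and it is not REDiscover's actual implementation. When only RNA-seq and a reference FastA are available, the "DNA base" would simply be the reference base.

```python
from collections import Counter


def call_editing(dna_counts, rna_counts, min_depth=5, min_dna_freq=0.99, min_alt_freq=0.01):
    """Compare genome and transcriptome basecalls at one position (hypothetical helper).

    dna_counts / rna_counts map bases ('A', 'C', 'G', 'T') to read counts.
    Returns (dna_base, {rna_base: frequency}) for putative edits, or None.
    """
    dna_depth = sum(dna_counts.values())
    rna_depth = sum(rna_counts.values())
    # require enough coverage in both genome and transcriptome
    if dna_depth < min_depth or rna_depth < min_depth:
        return None
    # the genome must be (near) homozygous, otherwise a difference may be a SNP
    dna_base, dna_n = max(dna_counts.items(), key=lambda kv: kv[1])
    if dna_n / dna_depth < min_dna_freq:
        return None
    # RNA bases that differ from the genomic base and are frequent enough
    edits = {base: n / rna_depth for base, n in rna_counts.items()
             if base != dna_base and n / rna_depth >= min_alt_freq}
    return (dna_base, edits) if edits else None


if __name__ == "__main__":
    # genome: A only; transcriptome: 30 A + 10 G, i.e. a putative A-to-G edit
    print(call_editing(Counter(A=40), Counter(A=30, G=10)))  # ('A', {'G': 0.25})
```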

REDiscover is:

  • easy to use - the programme auto-detects and estimates all necessary parameters, e.g. the strandness of your library
  • fast & lightweight - multi-core support and optimised memory usage mean it can run even on a laptop
  • flexible - it handles many sequencing technologies and experimental designs, e.g. stranded and unstranded RNA-seq; multiple genomes and/or transcriptomes are accepted as input
  • reliable - the tool was tested extensively on a vertebrate, D. rerio (zebrafish)

Figure: REDiscover flowchart (docs/flowchart.png)

By default, REDiscover filters:

  • QC failed reads
  • duplicates
  • reads with mapping quality (mapQ) below 15
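These read-level filters correspond to standard SAM FLAG bits (0x200 for reads failing QC, 0x400 for duplicates) plus a MAPQ threshold. A minimal sketch, assuming you already have each read's FLAG and MAPQ as integers; `keep_read` is a hypothetical helper, not part of REDiscover, and `min_mapq` is left as a parameter so it can match whatever you pass to `-q/--mapq`. With pysam, the same information is available as `read.is_qcfail`, `read.is_duplicate` and `read.mapping_quality`.

```python
# SAM FLAG bits (SAM specification)
FLAG_QCFAIL = 0x200     # not passing platform/vendor quality checks
FLAG_DUPLICATE = 0x400  # PCR or optical duplicate


def keep_read(flag, mapq, min_mapq):
    """Return True if a read passes the read-level filters (hypothetical helper)."""
    # drop QC-failed reads and duplicates regardless of mapping quality
    if flag & (FLAG_QCFAIL | FLAG_DUPLICATE):
        return False
    # drop reads whose mapping quality is below the threshold
    return mapq >= min_mapq
```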

REDiscover reports only regions fulfilling several stringency criteria:

  • depth-of-coverage
  • mean basecall quality

Finally, reads whose basecall quality at a given position is below 20 (Q20 corresponds to a 0.01 probability of error) are ignored for that position.
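Basecall qualities are on the Phred scale, P(error) = 10^(-Q/10), so Q20 means a 1% chance that the basecall is wrong. The sketch below converts Phred values to error probabilities, decodes SAM quality strings (ASCII offset 33) and drops low-quality basecalls at a position. The helper names are hypothetical; the cut-off only mirrors the `-Q/--bcq` default of 20.

```python
def phred_to_error(q):
    """Phred scale: P(error) = 10 ** (-Q / 10), so Q20 -> 0.01."""
    return 10 ** (-q / 10)


def decode_quals(qual_string):
    """Convert a SAM QUAL string (Phred+33 ASCII) to integer qualities."""
    return [ord(c) - 33 for c in qual_string]


def usable_bases(bases, quals, min_bcq=20):
    """Keep basecalls whose quality is at least min_bcq (cf. -Q/--bcq [20])."""
    return [base for base, q in zip(bases, quals) if q >= min_bcq]
```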

All dependencies can be easily installed with bioconda:

conda install -c conda-forge -c bioconda samtools pysam FastaIndex numpy matplotlib

REDiscover takes as input aligned NGS reads (BAM) from genome(s) and transcriptome(s), and returns a list of putative RNA editing sites together with their depth of coverage and frequency across samples.

You can also run REDiscover with RNA-seq reads alone; in that case you need to provide a reference FastA. REDiscover will detect the strandness of your library if you provide exon annotation (GTF or GFF). Note, mixing stranded and unstranded libraries is not allowed!
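Strandness detection from exon annotation can be sketched as follows: for reads overlapping annotated exons, count how often the read strand (flipped for the second mate of a pair) agrees with the exon strand. This is an assumed approach, similar in spirit to RSeQC's `infer_experiment.py`, and not necessarily what REDiscover does internally. The input is a hypothetical list of `(is_reverse, is_read1, exon_strand)` tuples, and the 0.8/0.2 cut-offs are arbitrary illustrative values. Agreement near 1 means read 1 lies on the transcript strand (fr-secondstrand, e.g. standard SOLiD); agreement near 0 means read 1 is antisense (fr-firststrand, e.g. dUTP, NSR, NNSR).

```python
def infer_strandness(observations, hi=0.8, lo=0.2):
    """Guess library type from reads overlapping annotated exons (hypothetical helper).

    observations: iterable of (is_reverse, is_read1, exon_strand) tuples, where
    exon_strand is '+' or '-' and is_read1 is True for single-end reads.
    """
    agree = total = 0
    for is_reverse, is_read1, exon_strand in observations:
        strand = "-" if is_reverse else "+"
        if not is_read1:
            # mate 2 comes from the opposite strand, so flip it to mate-1 orientation
            strand = "+" if strand == "-" else "-"
        if strand == exon_strand:
            agree += 1
        total += 1
    if total == 0:
        raise ValueError("no reads overlap annotated exons")
    frac = agree / total
    if frac >= hi:
        return "fr-secondstrand"  # read 1 on the transcript strand
    if frac <= lo:
        return "fr-firststrand"   # read 1 antisense, e.g. dUTP
    return "unstranded"
```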

Most REDiscover parameters can be adjusted manually (default values are given in square brackets []):

  • General options

    -h, --help
        show this help message and exit

    -v, --verbose
        verbose

    --version
        show program's version number and exit

    -o OUT, --out OUT
        output file

    -q MAPQ, --mapq MAPQ
        mapping quality [3]

    -Q BCQ, --bcq BCQ
        basecall quality [20]

    -t THREADS, --threads THREADS
        number of cores to use [4]

  • Reference genome (BAM or FastA)

    -d DNA, --dna DNA
        input DNA-Seq BAM file(s)

    -f FASTA, --fasta FASTA
        reference FastA file

  • Aligned RNA-seq reads & strandness information

    -r RNA, --rna RNA
        input RNA-seq BAM file(s)

    -g GTF, --gtf GTF
        GTF/GFF for auto-detection of strandness

    -u, --unstranded
        unstranded RNA-seq libraries

    -s, --stranded, -fr-secondstrand
        stranded RNA-seq libraries, e.g. Illumina or standard SOLiD

    -fr-firststrand
        stranded RNA-seq libraries, e.g. dUTP, NSR, NNSR

  • Analyse only a subset of regions

    -b REGIONS, --regions REGIONS, --bed REGIONS
        BED file with regions to genotype

    -c CHRS, --chrs CHRS
        analyse only a subset of chromosomes [all]

  • Filtering

    --minDepth MINDEPTH
        minimum depth of coverage [5]

    --minDNAfreq MINDNAFREQ
        minimum frequency of the DNA base [0.99]

    --minAltfreq MINALTFREQ
        minimum frequency of the RNA editing base [0.01]

    -m MAXSTRANDBIAS, --maxStrandBias MAXSTRANDBIAS
        maximum allowed strand bias [0.1]

    -a, --advancedFiltering
        enable advanced filtering (slightly more accurate, but much slower)

    --dbSNP
        dbSNP file

    --dist DIST
        distance between SNPs in cluster [300]
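If you drive REDiscover from a Python pipeline, the documented flags can be assembled into an argument list and handed to `subprocess.run`. A minimal sketch: `build_rediscover_cmd` is a hypothetical helper that covers only a handful of the options above, and the executable name is an assumption, so point `exe` at your own installation.

```python
import shlex


def build_rediscover_cmd(rna_bams, out, fasta=None, dna_bams=(), gtf=None,
                         threads=4, min_depth=5, exe="REDiscover.diff"):
    """Assemble a REDiscover.diff argument list from documented options (hypothetical helper)."""
    # a reference is required: FastA (-f) and/or DNA-Seq BAM file(s) (-d)
    if not fasta and not dna_bams:
        raise ValueError("provide a reference FastA (-f) and/or DNA-Seq BAM file(s) (-d)")
    cmd = [exe]
    if fasta:
        cmd += ["-f", fasta]
    if dna_bams:
        cmd += ["-d", *dna_bams]
    cmd += ["-r", *rna_bams, "-o", out,
            "-t", str(threads), "--minDepth", str(min_depth)]
    if gtf:
        cmd += ["-g", gtf]  # lets REDiscover auto-detect strandness
    return cmd


if __name__ == "__main__":
    cmd = build_rediscover_cmd(["test/star/a.bam"], "test/editing.gz", fasta="test/ref.fa")
    # print a shell-ready version of the command
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. Because no shell is involved, a path such as `~/src/REDiscover/REDiscover.diff` must first be expanded with `os.path.expanduser`.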

To run the test example, first download & unpack the test dataset:

wget http://zdglab.iimcb.gov.pl/lpryszcz/REDiscover/test.tgz
tar xpfvz test.tgz

Then execute REDiscover.diff:

# discover editing in RNA-seq samples (*.bam) without genome sequencing data (only ref.fa needed)
~/src/REDiscover/REDiscover.diff -f test/ref.fa -r test/star/*.bam -o test/editing.gz

# discover editing in RNA-seq samples (*.bam) with genome sequencing data (ref*.bam needed)
~/src/REDiscover/REDiscover.diff -d test/ref*.bam -r test/star/*.bam -o test/editing.ref.gz

# to ignore known dbSNP sites, just add `--dbSNP snps.vcf.gz` to the commands above
# or recompute only the last step using `get_enrichment.py`
## you can also alter `--minDepth`, `--minAltfreq` and many more...
~/src/REDiscover/get_enrichment.py -i test/editing.gz --dbSNP snps.vcf.gz

# violin plots for editing sites present in at least 2 samples
~/src/REDiscover/plot_violin.py -i test/editing.gz.n2.gz

# histograms for editing sites present in at least 5 samples
~/src/REDiscover/plot_hist.py -i test/editing.gz.n5.gz
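The `--dbSNP` option excludes known polymorphisms. If you post-process sites yourself, the same idea can be sketched by collecting `(CHROM, POS)` pairs from a VCF and dropping candidate sites at those positions. Assumptions: candidate sites are hypothetical `(chrom, pos, ...)` tuples with 1-based positions (as in VCF), chromosome names match between the VCF and your reference (e.g. `1` vs `chr1`), and all helper names are hypothetical.

```python
import gzip


def parse_vcf_positions(lines):
    """Collect (chrom, pos) pairs from VCF text lines, skipping headers and blanks."""
    positions = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        positions.add((chrom, int(pos)))  # VCF POS is 1-based
    return positions


def load_vcf_positions(path):
    """Read a plain or gzip/bgzip-compressed VCF such as snps.vcf.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_vcf_positions(handle)


def drop_known_snps(sites, snp_positions):
    """Keep candidate sites (hypothetical (chrom, pos, ...) tuples) absent from the VCF."""
    return [site for site in sites if (site[0], site[1]) not in snp_positions]
```

Holding all of dbSNP in a Python set is memory-hungry; for genome-wide runs, a tabix-indexed lookup (e.g. `pysam.TabixFile`) restricted to candidate regions is leaner.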

For more details, have a look in the test directory.

Along with REDiscover, we provide a number of useful tools for the characterisation of RNA editing. More details about these can be found in the tools directory.

Pryszcz LP, Bochtler M, Winata CL. (In preparation) REDiscover: Robust & efficient detection of RNA editing from large NGS datasets.


rediscover's Issues

Dependency (ggplot2)

** libs
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c RcppExports.cpp -o RcppExports.o
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c edit_search.cpp -o edit_search.o
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include"   -fPIC  -Wall -mtune=core2 -g -O2  -c mbym_search.cpp -o mbym_search.o
clang++ -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o editTools.so RcppExports.o edit_search.o mbym_search.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Library/Frameworks/R.framework/Versions/3.3/Resources/library/editTools/libs
** R
** tests
** preparing package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  there is no package called ‘ggplot2’
ERROR: lazy loading failed for package ‘editTools’
* removing ‘/Library/Frameworks/R.framework/Versions/3.3/Resources/library/editTools’
Error: Command failed (1)

I think this package requires ggplot2, right?

Please add this information to the documentation.

Seeking advice on how to get a p-value when using REDiscover

Dear Professor lpryszcz,
Thanks for your excellent program for RNA editing discovery, which is very helpful for our research. However, I am confused about why there is no p-value parameter when we use this software to calculate tomato RNA editing frequency. Could you tell us how to make the final data more reliable, or which parameter shows the reliability of the data? We would also like to know the right way to cite REDiscover when we publish an article.
Yongfang Yang PHD
China Agricultural University
Beijing, China
