Code Monkey home page Code Monkey logo

dream_yara's Introduction

DREAM-Yara: An exact read mapper for very large databases with short update time

Overview

Yara is an exact tool for aligning DNA sequencing reads to reference genomes. DREAM-Yara is an extension of Yara to support distributed read mapping. It works by spliting a given reference database in to smaller manageble partitions and this allows faster indexing and super fast updating time. DREAM-Yara can quickly exclude reads for parts of the databases where they cannot match. This allows us to keep the databases in several indices which can be easily rebuilt if parts are updated while maintaining a fast search time. Both Yara and DREAM-Yara are fully sensitive read mappers.

Main features

  • Exhaustive enumeration of sub-optimal end-to-end alignments under the edit distance.
  • Excellent speed, memory footprint and accuracy.
  • Accurate mapping quality computation.
  • Support for reference genomes consisiting of million of contigs.
  • Direct output in SAM/BAM format.

Supported data

DREAM-Yara has been tested on DNA reads (i.e., Whole Genome, Exome, ChIP-seq, MeDIP-seq) produced by the following sequencing platforms:

  • Illumina GA II, HiSeq and MiSeq (single-end and paired-end).
  • Life Technologies Ion Torrent Proton and PGM.

Quality trimming is necessary for Ion Torrent reads and recommended for Illumina reads.

Unsupported data

  • RNA-seq reads spanning splicing sites.
  • Long noisy reads (e.g., Pacific Biosciences RSII, Oxford Nanopore MinION).

Installation from sources

The following instructions assume Linux or OS X. For more information, including Windows instructions, refer to the SeqAn getting started tutorial.

Software requirements

A modern C++11 compiler with OpenMP 3.0 extensions is required to build Yara. If unsure, use GNU G++ 4.9 or newer.

  • Git.
  • CMake 3.2 or newer.
  • G++ 4.9 or newer.

Download

DREAM-Yara sources downloaded by executing:

$ git clone --recurse-submodules https://github.com/seqan/dream_yara.git

Configuration

Create a build project by executing CMake as follows:

$ mkdir build
$ cd build
$ cmake ../dream_yara

Build

Invoke make as follows:

$ make all

Installation

Copy the binaries to a folder in your PATH, e.g.:

# cp bin/* /usr/local/bin

Usage

Distributed Indexer

Here we rquire a refence database splited in to many bins. This can be achieved (eg.) by using TaxSBP from https://github.com/pirovc/taxsbp

$ git clone https://github.com/pirovc/taxsbp

Create 64 fasta files under GENOMES_DIR/ directory with names 0-63.fasta

$ dream_yara_build_filter --threads 8 --kmer-size 18 --filter-type bloom --bloom-size 16 --num-hash 3 --output-file IBF.filter GENOMES_DIR/*.fasta
$ dream_yara_indexer --threads 8 --output-prefix INDICES_DIR/ GENOMES_DIR/*.fasta

Distributed Mapper

Single-end reads

Map single-end DNA reads on the indexed reference genome by executing:

$ dream_yara_mapper -t 8 -ft bloom -e 0.03 -fi IBF.filter -o READS.bam INDICES_DIR/ READS.fastq.gz

Paired-end reads

Map paired-end reads by providing two DNA read files:

$ dream_yara_mapper -t 8 -ft bloom -e 0.03 -fi IBF.filter -o READS.bam INDICES_DIR/ READS_1.fastq.gz READS2.fastq.gz

Output format

Output files follow the SAM/BAM format specification. In addition, Yara generates the following optional tags:

Tag Meaning
NM Edit distance
X0 Number of co-optimal mapping locations
X1 Number of sub-optimal mapping locations
XA Alternative locations: (chr,begin,end,strand,NM;)*

Contact

For questions or comments, feel free to contact: Temesgen H. Dadi <[email protected]>

References

Dadi, T. H., Siragusa, E., Piro, V. C., Andrusch, A., Seiler, E., Renard, B. Y., & Reinert, K. (2018). DREAM-Yara: an exact read mapper for very large databases with short update time, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i766โ€“i772, https://doi.org/10.1093/bioinformatics/bty567

dream_yara's People

Contributors

eseiler avatar temehi avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.