prophasm2's Introduction

ProphAsm2

Introduction

ProphAsm2 is a versatile tool for computing simplitigs/SPSS from k-mer sets and for k-mer set operations. Compared to the original ProphAsm, the new features include substantial speed and memory optimizations, parallelization, support for k-mer sizes up to 64, and support for minimum abundances.

Various types of sequencing datasets can be used as input for ProphAsm, including genomes, pan-genomes, metagenomes, and sequencing reads. Besides computing simplitigs, ProphAsm can also compute intersections and set differences of k-mer sets (set unions are easy to compute simply by merging the source files).

Upon execution, ProphAsm first loads all specified datasets (see the -i param) and builds the corresponding k-mer sets (see the -k param). If the -x param is provided, ProphAsm then computes their intersection, subtracts the intersection from the individual k-mer sets, and computes simplitigs for the intersection. If output files are specified (see the -o param), it also computes simplitigs for the set differences.
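The set logic described above can be illustrated with a minimal Python sketch. It is not ProphAsm's implementation: the helper names (canonical_kmers, split_by_intersection) are hypothetical, and representing each k-mer by the lexicographically smaller of itself and its reverse complement is an assumption made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def canonical_kmers(seq, k):
    """Collect the canonical k-mers of one sequence (assumed representation)."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing characters other than A, C, G, T.
        if set(kmer) <= set("ACGT"):
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def split_by_intersection(kmer_sets):
    """Return the common k-mers and, per dataset, what remains without them."""
    intersection = set.intersection(*kmer_sets)
    differences = [s - intersection for s in kmer_sets]
    return intersection, differences


# Two toy "datasets" standing in for two input files.
a = canonical_kmers("ACGTTGCA", 3)
b = canonical_kmers("CGTTGGA", 3)
common, (only_a, only_b) = split_by_intersection([a, b])
```

In this toy example, common holds the k-mers shared by both inputs, while only_a and only_b hold the remainders from which the per-dataset simplitigs would be computed.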

Prerequisites

  • GCC 4.8+ or equivalent
  • ZLib

Getting started

Download and compile ProphAsm:

git clone https://github.com/prophyle/prophasm2
cd prophasm2 && make -j

Compute simplitigs:

./prophasm -k 31 -i tests/test1.fa -o simplitigs.fa

How to use

Set operations:

./prophasm -k 31 -i tests/test1.fa -i tests/test2.fa -o _out1.fa -o _out2.fa -x _intersect.fa -s _stats.tsv

Command-line arguments



Algorithm

In its core, ProphAsm2 uses the original algorithm for rapid computation of simplitigs as described in the simplitig paper.

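def reverse_complement(seq):
	# Return the reverse complement of a DNA string over A, C, G, T
	return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def extend_simplitig_forward(K, simplitig, k):
	# Greedily extend the simplitig to the right while an unused k-mer continues it
	extending = True
	while extending:
		extending = False
		# The last k-1 characters of the simplitig
		q = simplitig[len(simplitig) - k + 1:]
		for x in ['A', 'C', 'G', 'T']:
			kmer = q + x
			if kmer in K:
				extending = True
				simplitig = simplitig + x
				K.remove(kmer)
				# discard() tolerates k-mers equal to their own reverse complement
				K.discard(reverse_complement(kmer))
				break
	return K, simplitig

def get_maximal_simplitig(K, initial_kmer):
	# Extend the initial k-mer in both directions
	k = len(initial_kmer)
	simplitig = initial_kmer
	K.remove(initial_kmer)
	K.discard(reverse_complement(initial_kmer))
	K, simplitig = extend_simplitig_forward(K, simplitig, k)
	# A backward extension is a forward extension of the reverse complement
	simplitig = reverse_complement(simplitig)
	K, simplitig = extend_simplitig_forward(K, simplitig, k)
	return K, simplitig

def compute_simplitigs(kmers):
	# Store every k-mer together with its reverse complement
	K = set()
	for kmer in kmers:
		K.add(kmer)
		K.add(reverse_complement(kmer))
	simplitigs = set()
	while len(K) > 0:
		# Any remaining k-mer can seed a new simplitig
		initial_kmer = next(iter(K))
		K, simplitig = get_maximal_simplitig(K, initial_kmer)
		simplitigs.add(simplitig)
	return simplitigs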
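One way to sanity-check the result of such an algorithm is to confirm that the simplitigs contain every input k-mer exactly once, treating a k-mer and its reverse complement as the same k-mer. The checker below is a hedged sketch and not part of ProphAsm2; the function name is_valid_simplitig_set is hypothetical.

```python
from collections import Counter


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def is_valid_simplitig_set(simplitigs, kmers, k):
    """Check that the simplitigs cover each input k-mer exactly once."""
    seen = Counter()
    for s in simplitigs:
        for i in range(len(s) - k + 1):
            seen[canonical(s[i:i + k])] += 1
    expected = {canonical(x) for x in kmers}
    # Same k-mer content, and no k-mer occurs more than once.
    return set(seen) == expected and all(c == 1 for c in seen.values())


kmers = {"ACG", "CGA", "GAT"}
ok = is_valid_simplitig_set(["ACGAT"], kmers, 3)                # one simplitig
duplicated = is_valid_simplitig_set(["ACGA", "CGAT"], kmers, 3)  # CGA twice
incomplete = is_valid_simplitig_set(["ACGA"], kmers, 3)          # GAT missing
```

Here ok is True, while duplicated and incomplete are both False.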

Issues

Please use GitHub issues.

Changelog

See Releases.

Licence

MIT

Contact

Ondrej Sladky <[email protected]>
Karel Brinda <[email protected]>

prophasm2's Issues

2x slower with k=18

I ran a first, somewhat complex benchmark on one of my databases for resistance diagnostics (~661 executions of ProphAsm).

ProphAsm2 is really fast!!!

However, I noticed that, for some reason, it is approx. 2x slower with k=18 than with k=31. Is there a reason for this? I didn't observe this with ProphAsm 1.

ProphAsm, k18

real	7m54.141s
user	43m45.501s
sys	1m28.739s

ProphAsm, k31

real	8m29.515s
user	48m24.024s
sys	1m25.909s


ProphAsm 2, k18

real	4m35.686s
user	21m11.361s
sys	1m44.798s


ProphAsm 2, k31

real	2m51.831s
user	11m32.846s
sys	1m25.608s

Feature request: Support for k-mer frequencies

Each k-mer is paired with the corresponding k-mer count (0..255; 0 is reserved for decrements during computation, which might be faster than deletions)

Then adding a param -m, with default value 1

Saturation: 255+1=255
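The proposal can be sketched as follows. This is a hypothetical Python illustration, not ProphAsm2 code: the names MAX_COUNT, saturating_increment, count_kmers, and filter_by_min_abundance are made up here, and the argument m mirrors the proposed -m parameter.

```python
MAX_COUNT = 255  # largest value of an 8-bit counter


def saturating_increment(count):
    """Increase a count by one, but never beyond MAX_COUNT (255 + 1 = 255)."""
    return min(count + 1, MAX_COUNT)


def count_kmers(kmers):
    """Count k-mer occurrences with saturation at MAX_COUNT."""
    counts = {}
    for kmer in kmers:
        counts[kmer] = saturating_increment(counts.get(kmer, 0))
    return counts


def filter_by_min_abundance(counts, m=1):
    """Keep only the k-mers seen at least m times (default 1 keeps all)."""
    return {kmer for kmer, c in counts.items() if c >= m}


counts = count_kmers(["ACG"] * 300 + ["CGT"] * 2 + ["GTA"])
kept = filter_by_min_abundance(counts, m=2)
```

With this input, the count of ACG saturates at 255 and kept contains ACG and CGT but not GTA.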

Feature request: k-mer length >32 with `-large` version included

The program should choose the corresponding datatype based on the params, it should be the same executable. In particular:

If I remember correctly, __uint128_t is a software emulation, see the following comparison:

$ time ./kmercamel -k 31 -a local -c -p "/Users/karel/github/my/rase-db-spneumoniae-sparc/isolates/ZXPKH.fa" > /dev/null 

real	0m0.507s
user	0m0.475s
sys	0m0.016s

$ time ./kmercamel-large -k 31 -a local -c -p "/Users/karel/github/my/rase-db-spneumoniae-sparc/isolates/ZXPKH.fa" > /dev/null 

real	0m0.713s
user	0m0.653s
sys	0m0.025s
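The idea of picking the datatype from k can be illustrated with a small sketch. It assumes the common 2-bits-per-base encoding (A=0, C=1, G=2, T=3), which is an assumption rather than something stated here, and the function names storage_width and encode_kmer are hypothetical.

```python
def storage_width(k):
    """Return the smallest of the two integer widths (in bits) that fits a k-mer."""
    if k < 1:
        raise ValueError("k must be positive")
    if 2 * k <= 64:
        return 64    # k <= 32 fits a native 64-bit word
    if 2 * k <= 128:
        return 128   # k <= 64 needs a 128-bit type such as __uint128_t
    raise ValueError("k-mers longer than 64 are not supported in this sketch")


def encode_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base (assumed encoding)."""
    code = 0
    for base in kmer:
        code = (code << 2) | "ACGT".index(base)
    return code


width_31 = storage_width(31)
width_33 = storage_width(33)
encoded = encode_kmer("ACGT")
```

Under these assumptions, a single executable could dispatch on storage_width(k) at startup, so that k=31 keeps using the faster 64-bit path.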
