ariesk's People

Contributors

dcdanko

ariesk's Issues

Make DB Build Faster

Building the database is currently quite slow, likely because of the Python interface to SQLite.

Preliminary Benchmarking

Four stages I want to test:

  • Communication
  • Coarse Search
  • Retrieval
  • Alignment

All testing on Hippo, database with 10^7 kmers and r=0.2 in /dev/shm

Convert to Unscaled Ram Distances

Unscaled Ramanujan distances seem preferable to scaled ones. A useful property is that they are good at discriminating very distant sequences, while sub-kmer distance is only useful for discriminating more similar sequences. In combination we should be able to search < 10% of a dataset and still get near-perfect recall.

Profile Search

Figure out what is making the filtering step of search slow

Add Bloom Filters for Cluster Fast Reject

Quickly reject candidate clusters using a bloom filter of sub-kmers.

Approach: given stored k-mers of size k and sub-mers of size m, if two k-mers differ by n_e mismatches/indels they will still have at least k + 1 - m(n_e + 1) matching sub-mers. (A k-mer has k - m + 1 sub-mers, and each mismatch can alter at most m of them.)

Exploit this fact to build a bloom filter that checks if a hit is possible in a cluster. Can also use a Bloom Grid with reads hashed to 1 of N bloom filters
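The fast-reject idea above can be sketched roughly as follows. This is a minimal illustration, not the ariesk implementation: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and the helper names `min_shared_submers`, `submers`, and `cluster_may_contain_hit` are all assumptions made for the example. Because a Bloom filter can only overcount membership, a cluster is rejected only when even the overcount falls below the bound.

```python
import hashlib


def min_shared_submers(k, m, n_e):
    """Lower bound from the issue text: k + 1 - m * (n_e + 1), floored at 0."""
    return max(0, k + 1 - m * (n_e + 1))


def submers(kmer, m):
    """All overlapping sub-mers of length m, in order."""
    return [kmer[i:i + m] for i in range(len(kmer) - m + 1)]


class BloomFilter:
    """Hypothetical, deliberately small Bloom filter for illustration."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by prefixing the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def cluster_may_contain_hit(cluster_filter, query, m, n_e):
    """Return False only if no k-mer in the cluster can be within n_e edits."""
    needed = min_shared_submers(len(query), m, n_e)
    found = sum(1 for s in submers(query, m) if s in cluster_filter)
    return found >= needed
```

A cluster filter would be filled with the sub-mers of every k-mer in that cluster; at query time, clusters for which `cluster_may_contain_hit` returns False can be skipped before any retrieval or alignment. Note that when m(n_e + 1) exceeds k the bound is 0 and nothing can be rejected.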

Benchmark Caching

Using a very simple cache, benchmark performance with the db in /dev/shm and on disk.

All testing on Hippo, database with 10^7 kmers and r=0.2

Test Command
ariesk search-seq -p 5431 --search-mode full --inner-metric none -r 0 -i 0.1 <kmer> | wc -l

Queries

CCCCCCCCCCCCCCGGGGGGGGGGGGGGGGG
ACCGCAGTATTATGATGTTGAAAACATGGAT
AGGGCCAGTTCGAAGCGATGTACTCAAAACT
CAGATGTGCCGACGATTTTGCGCCCCGGAGG 
AATAATCCAATGCACGCTCTACTTCTACTAT

Run each query twice
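The procedure above could be scripted along these lines. This is a sketch under assumptions: it takes the `ariesk search-seq` flags verbatim from the test command, assumes a search server is already listening on port 5431, and counts output lines in Python instead of piping to `wc -l`. The function names and the injectable `run` parameter are hypothetical conveniences, not part of ariesk.

```python
import subprocess
import time

QUERIES = [
    "CCCCCCCCCCCCCCGGGGGGGGGGGGGGGGG",
    "ACCGCAGTATTATGATGTTGAAAACATGGAT",
    "AGGGCCAGTTCGAAGCGATGTACTCAAAACT",
    "CAGATGTGCCGACGATTTTGCGCCCCGGAGG",
    "AATAATCCAATGCACGCTCTACTTCTACTAT",
]


def build_command(kmer, port=5431):
    """Argument list mirroring the test command from the issue."""
    return [
        "ariesk", "search-seq",
        "-p", str(port),
        "--search-mode", "full",
        "--inner-metric", "none",
        "-r", "0",
        "-i", "0.1",
        kmer,
    ]


def time_queries(queries, n_runs=2, run=subprocess.run):
    """Run every query n_runs times in a row.

    Returns a list of (kmer, run_index, seconds, n_output_lines) tuples.
    """
    timings = []
    for kmer in queries:
        for run_index in range(n_runs):
            start = time.perf_counter()
            result = run(build_command(kmer), capture_output=True, text=True)
            elapsed = time.perf_counter() - start
            # Equivalent to `wc -l` when the output ends with a newline.
            n_lines = len(result.stdout.splitlines())
            timings.append((kmer, run_index, elapsed, n_lines))
    return timings
```

Calling `time_queries(QUERIES)` once with the database in /dev/shm and once with it on disk gives two timings per query in each setting; comparing the first and second run of the same query is presumably what shows the effect of the cache.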

Find mystery build bug

chunks/chunk_dz
  [#######-----------------------------]   20%  00:18:32
Traceback (most recent call last):
  File "/home/dcd3001/miniconda3/envs/ariesk/bin/ariesk", line 11, in <module>
    load_entry_point('ariesk', 'console_scripts', 'ariesk')()
  File "/home/dcd3001/miniconda3/envs/ariesk/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/dcd3001/miniconda3/envs/ariesk/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/dcd3001/miniconda3/envs/ariesk/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dcd3001/miniconda3/envs/ariesk/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dcd3001/miniconda3/envs/ariesk/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dcd3001/miniconda3/envs/ariesk/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/pbtech_mounts/homes039/dcd3001/Dev/AriesK/ariesk/cli/cli_build.py", line 111, in build_grid_cover_fasta
    n_added = predb.fast_add_kmers_from_fasta(fasta_filename)
  File "ariesk/pre_db.pyx", line 148, in ariesk.pre_db.PreDB.fast_add_kmers_from_fasta
  File "ariesk/pre_db.pyx", line 49, in ariesk.pre_db.PreDB.c_add_kmer
  File "ariesk/ram.pyx", line 58, in ariesk.ram.RotatingRamifier.c_ramify
  File "ariesk/ram.pyx", line 36, in ariesk.ram.Ramifier.c_ramify
IndexError: Out of bounds on buffer access (axis 1)

This happened in 18 of 106 chunks.
