Code Monkey home page Code Monkey logo

cqf's People

Contributors

asl avatar kuszmaul avatar prashantpandey avatar rtjohnso avatar the-alchemist avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cqf's Issues

Fixed Bits Per Slot

Hi,
I am trying to create cqf having key remainders of size less than 8 bits. From the usage example(main.c), the number of hash bits equals 8+ qbits. Changing the value 8 to other value leads to assertion error("(BITS_PER_SLOT == 0 || BITS_PER_SLOT == qf->bits_per_slot)"). However, Changing BITS_PER_SLOT in gqf.h from 8 to 0 bypass the assertion and the code passes the testcase written in main.c.

I want to make sure is it safe to set BITS_PER_SLOT to 0. I am curious if you did enough tests for this situation.

nslots and maximum value for key

We are trying to make use of this implementation in khmer (specifically dib-lab/khmer#1675). It is pretty early on and the PR takes a lot of shortcuts at the moment.

One thing I've run into and would appreciate your help on is the interplay between qf_init's nslots parameter and the maximum value the key in qf_insert can take. I assumed that there was no limit on key. However I end up with a segfault when using the following setup: nslots=2**8 and K=28 (length of the kmers). For example inserting "TAGCCTGGTTACTACGCCGAATTATGAT" (key = 6827347471803364) leaves me with:

Program received signal SIGSEGV, Segmentation fault.
offset_lower_bound (slot_index=26669326061731, qf=0x7c6e30) at gqf.c:417
417		const uint64_t boffset = b->offset;

this is the stack trace:

(gdb) where
#0  offset_lower_bound (slot_index=26669326061731, qf=0x7c6e30) at gqf.c:417
#1  is_empty (slot_index=26669326061731, qf=0x7c6e30) at gqf.c:429
#2  insert1 (hash=<optimized out>, qf=0x7c6e30) at gqf.c:1038
#3  qf_insert (qf=qf@entry=0x7c6e30, key=key@entry=6827347471803364, value=value@entry=0, count=count@entry=1) at gqf.c:1453
#4  0x00007ffff50b7612 in khmer::QFStorage::add (this=0x7c6e20, khash=6827347471803364) at lib/storage.hh:432

4 and beyond is inside khmer.

Because there are quite a few moving parts in integrating CQF with khmer I wanted to ask if this should work in principle or not.


(K=8 with nslots = 2**24 seems to work)

Checking how full a CQF is

Hi, we are adding CQF support to khmer and one thing we've run into is that after inserting some number of unique kmers we hit this assertion: https://github.com/splatlab/cqf/blob/master/gqf.c#L487

For example when you create a QF with qf_init(cf, (1ULL << size), size+8, 0) and size=7. I then attempt to insert 400 random 20-mers. The assert gets triggered when I try to insert kmer 242.

I think this is because the QF is "full". Is that right? I'm asking because coming from a world of bloom filters (BF) you can keep adding and adding to your BF and while eventually all bits will be set to 1, you can always add more (even if it makes little sense).

For khmer this is nice. We can guarantee to the user that the program will not run out of memory (all allocated up front) and we can warn them if we think that they made their BF too small. But you can leave your program running for hours and get "some kind of result" instead of "ah we used up all the memory and crashed please start from scratch".

So finally my question: is there a way for me to check how full the CQF is and if adding one more kmer will fail or not? Or is there a way to use the CQF so that it will let me keep adding kmers until I go blue in the face regardless of how full it is/overflow the counter?


(I think you can reproduce this with squeakr: ./main 0 10 1 test.fastq though for me that segfaults and doesn't print the assert it hit so I am not 100% sure.)

Expected space/time?

Very exciting work!

In comparing the Counting Quotient Filter to counting bloom filters and other probabilistic data structures, can you provide any numbers?

For example, the Cuckoo Filter paper (Table 1) lists the number of cache misses per lookup and space requirements as compared to the basic Bloom Filter, as well as whether or not deletion is supported. Of course, your work is more comparable to only the counting Bloom filter. (And, perhaps, comparable to HyperLogLog, which, while not optimized for it, provides a AMQ interface with counting.)

Thanks!

Compiling with GCC 6.2

I ran into two problems with GCC 6.2 on our HPC --

cc1plus: error: invalid option argument ‘-Ofast’
cc1plus: error: unrecognized command line option "-std=c++11"

and

gqf.c: In function ‘void qf_multi_merge(QF**, int, QF*)’:
gqf.c:1593: error: ‘UINT64_MAX’ was not declared in this scope

and eventually figured out this set of fixes:

dib-lab/khmer#1775 (comment)

I don't think anything needs to be done here, and this may be for an old version of the CQF anyway (we're on github.com/dib-lab/khmer commit 1c2810cb, and the CQF code is under third-party/cqf/ FYI), but I wanted to link the problem and its fix into this repo.

segfault in shift_remainders() when empty_index < start_index

I've been experiencing segfaults in shift_remainders(), and it seems to be happening because the value of empty_index is less than start_index, when called from line 1372 (from within insert1()).
I'm just getting started with cqf and it's quite possible that I've misunderstood something in the API. The keys I'm inserting to the QFs should all be within the allowable range, and I am not storing values, just keys. I've enabled auto resizing, and these crashes are happening during qf_resize_malloc.
Am I doing something wrong, or is this a bug?

Mac OS X experience (success)

Using homebrew installed openssl,

% make NH=1 CXXFLAGS='-I /usr/local/Cellar/openssl/1.0.2j/include' LDFLAGS='-L/usr/local/Cellar/openssl/1.0.2j/lib/ -lcrypto'
% ./main 24

worked!

I'm running on Mac OS X 10.12.3, Mac OS X Sierra:

% uname -a
Darwin Titus-MacBook-Air-3.local 16.4.0 Darwin Kernel Version 16.4.0: Thu Dec 22 22:53:21 PST 2016; root:xnu-3789.41.3~3/RELEASE_X86_64 x86_6

"./test 24" fails

I followed the instructions

make
./test 24

and got

Please specify the log of the number of slots and the number of remainder bits in the CQF.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.