Code Monkey home page Code Monkey logo

bloomfilter.netcore's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bloomfilter.netcore's Issues

ContainsAsync too slow

I have 50 items to test the ContainsAsync of Redis filter. It took about 1.5 seconds. And 1000 items took 5 seconds. Its configuation is : Expected Elements:100000, errorate: 0.01. Could u pls tell my how to improve my performace?

Unexpectedly high memory usage

Perhaps I'm missing some configuration options, but I would have expected Bloom filter to have static memory usage:

[MemoryDiagnoser]
public class Benchmark
{
    private static readonly int items = 30_000_000;
    [Benchmark]
    public void BloomFilter()
    {
        var bf = FilterBuilder.Build(1000, 0.01);
        for (var i = 0; i < items; i++)
        {
            bf.Add($"property_{i}_name");
        }
        for (var i = 0; i < items; i++)
        {
            if (!bf.Contains($"property_{i}_name"))
            {
                Console.WriteLine($"False negative {i}");
            }
        }
    }

    [Benchmark]
    public void Dictionary()
    {
        var bf = new Dictionary<string, bool>();
        for (var i = 0; i < items; i++)
        {
            bf.Add($"property_{i}_name", true);
        }
        for (var i = 0; i < items; i++)
        {
            if (!bf.ContainsKey($"property_{i}_name"))
            {
                Console.WriteLine($"False negative {i}");
            }
        }
    }
}
BenchmarkDotNet v0.13.7, macOS Big Sur 11.6.5 (20G527) [Darwin 20.6.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2


|      Method |     Mean |    Error |   StdDev |         Gen0 |        Gen1 |       Gen2 | Allocated |
|------------ |---------:|---------:|---------:|-------------:|------------:|-----------:|----------:|
| BloomFilter |  9.538 s | 0.0471 s | 0.0440 s | 3774000.0000 |           - |          - |  11.03 GB |
|  Dictionary | 17.162 s | 0.2386 s | 0.1993 s | 1008000.0000 | 131000.0000 | 11000.0000 |   6.37 GB |

// * Hints *
Outliers
  Benchmark.Dictionary: Default -> 2 outliers were removed (17.85 s, 18.53 s)

Tested with various different options, and Dictionary uses consistently less memory

Bloomfilter bit collision

Hi,
can you explain for me how does bloomfilter deal with bit collision and how we can prevent that?

Tks,
Wilson

Checking for bits set on Add?

I noticed that you check whether the bits were set on Add. Wouldn't it be better to just set the bits and not care about the results?

        lock (_sync)
        {
            for (var i = 0; i < hashes.Count; i++)
            {
                if (!_hashBits.Get(hashes[i]))
                {
                    _hashBits.Set(hashes[i], true);
                    processResults[i] = false;
                }
                else
                {
                    processResults[i] = true;
                }
            }
        }

Number of Hash Functions Design Question

My understanding is that Bloom Filters work on the premise of using multiple hash functions (i.e., k = # of hash functions) (see https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions). To add or test an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1 or check the positions are set to 1. If they are set to 1, the item maybe in the set. If any of the bits are 0, then the item is not in the set.

The implementation of Filter.cs only uses a single HashFunction. Why is that? While you can do this with a single hash function, the implementation should have the ability to use 1 or more hash functions.

To be put another way, the Bloom Filter requires "different" hash functions. There must be k different hash functions defined.

There is a discussion of using a hash function by passing k different initial values to a hash function. I am assuming that is what you are doing. Please help me confirm this.

"Alternatively, one can pass k different initial values (such as 0, 1, ..., k โˆ’ 1) to a hash function that takes an initial value; or add (or append) these values to the key" from WIKIPEDIA below.

*********************************** FROM WIKIPEDIA ************************************
The requirement of designing k different independent hash functions can be prohibitive for large k. For a good hash function with a wide output, there should be little if any correlation between different bit-fields of such a hash, so this type of hash can be used to generate multiple "different" hash functions by slicing its output into multiple bit fields. Alternatively, one can pass k different initial values (such as 0, 1, ..., k โˆ’ 1) to a hash function that takes an initial value; or add (or append) these values to the key. For larger m and/or k, independence among the hash functions can be relaxed with negligible increase in false positive rate.[3] (Specifically, Dillinger & Manolios (2004b) show the effectiveness of deriving the k indices using enhanced double hashing or triple hashing, variants of double hashing that are effectively simple random number generators seeded with the two or three hash values.)
"

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.