Code Monkey home page Code Monkey logo

bitfield-test's Introduction

bitfield Build Status

A Cython implemented fast positive integer set implementation, optimised for sets of sequential numbers.

WARNING : The serialisation mechanism is not currently portable, This will be fixed soon

Installation

$ sudo easy_install bitfield

Usage

>>> import bitfield
>>> field = bitfield.Bitfield()
>>> field.add(100)
>>> print list(field)
[100L]
>>> second = bitfield.Bitfield([2, 100])
>>> list(field | second)
[2L, 100L]

>>> second.add(10000)
>>> second.pickle()
'BZ:x\x9cca@\x00\x01\x86Q0\nF\xc1(\x18N\x80\x11\x00e\xe0\x00\x16'

>>> large=bitfield.Bitfield(random.sample(xrange(1000000), 500000)) # 500,000 items, randomly distributed
>>> len(large)
500000
>>> len(large.pickle())
125049  # 122KB

>>> large=bitfield.Bitfield(xrange(1000000)) # 1 million items, all sequential
>>> len(large)
1000000
>>> len(large.pickle())
36 # <40 bytes

Bitfields support most of the same operations/usage as regular sets, see the tests for examples.

Design

Bitfield was designed to efficiently handle tracking large sets of items.

The main design goals were:

  • Space-efficient serialisation format
  • Fast membership tests and set differences

Internally, bitfield achieves this using a page-compressed 1-d bitmap.

Within a page, a number is recorded as being present in the set by setting the n-th bit to 1. I.e. the set([1]) is recorded as ...00000010b, while set([1,4]) would be ...00010010b.

This works well for small sets, but the size of the bitfield tends towards (highest set member)/8 bytes as the largest number in the set increases.

To counter this, the bit field is split into chunks of 1 page (usually 4k). If a particular page is empty(no set members in that range) or full, then the bitfield is discarded, and represented by an EMPTY or FULL flag.

To improve lookup times and simplify set comparison, the bitfield always indexes items from 0.
Therefore, a set with a single item of 1,000,000,000 isn't going to be as fast as it could be. This was an intentional design decision.

bitfield-test's People

Contributors

stestagg avatar

Watchers

 avatar  avatar

bitfield-test's Issues

Compilation Error

Compiling the library with newer cython versions gives the following errors:

 Error compiling Cython file:
      ------------------------------------------------------------
      ...
              while len(self.pages) > 0 and self.pages[-1].count == 0:
                  self.pages.pop()

          cpdef add(self, usize_t number):
              """Add a positive integer to the bitfield"""
              cdef usize_t page = number / PAGE_FULL_COUNT
                                         ^
      ------------------------------------------------------------

      cimpl/field.pyx:463:35: Cannot assign type 'double' to 'usize_t'

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
              the_page.add(page_index)

          cpdef remove(Bitfield self, usize_t number):
              """Remove a positive integer from the bitfield
              If the integer does not exist in the field, raise a KeyError"""
              cdef usize_t page_no = number / PAGE_FULL_COUNT
                                            ^
      ------------------------------------------------------------

      cimpl/field.pyx:472:38: Cannot assign type 'double' to 'usize_t'

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                  raise KeyError()

          cpdef discard(Bitfield self, usize_t number):
              """Remove a positive integer from the bitfield if it is a member.
              If the element is not a member, do nothing."""
              cdef usize_t page = number / PAGE_FULL_COUNT
                                         ^
      ------------------------------------------------------------

      cimpl/field.pyx:486:35: Cannot assign type 'double' to 'usize_t'

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.