Code Monkey home page Code Monkey logo

rsdic-mmap's Introduction

RSDic

rsdic is a Go package for rank/select dictionary supporting rank/select operations efficiently, and space efficient for both sparse and dense bit arrays. (Such data structures are also called as fully indexable dictionary in CS literatures (FID)).

Conceptually, rsdic represents a bit vector B[0...num), B[i] = 0 or 1, and bits are provided PushBack operation (Thus RSDic supports dynamic addition).

All operations (Bit, Rank, Select, BitAndRank, RunZeros) are supported in O(1) time. For example, rsdic can solve all above operation within 1 micro second for a bit vector of length 10^8 (100M bit).

RSDic combines the idea of on-the-fly decoding of enumrative code, and utilization of rank/select samplings, which is discussed as a future work in the paper [1].

In RSDic, a bit vector is stored in compressed form (Note, we don't need to decode all at operations). To achieve this, a bit vector is divided into small blocks of length 64, and each small block is compressed using enumurative code. For example, a small block contains 10 ones and 54 zeros will be compressed in 38 bits (See enumCode.go for detail). This achieves not only its information theoretic bound, but also achieves more compression if bits are clusterd. rsdic stores information at most 1.3 bit per original bit including its indicies, and compress more if bit vector is "compresible".

This Go version is based on the C++ implementation [2]. But this Go version supports PushBack so that it can support dynamic addition.

Usage

import "github.com/hillbig/rsdic"

rsd := rsdic.New()
rsd.PushBack(true)
rsd.PushBack(false)
rsd.PushBack(true)
rsd.PushBack(true)
// rsd = 1011
fmt.Printf("%d %d %d\n", rsd.Num(), rsd.OneNum(), rsd.ZeroNum()) // 4 3 1

// Bit(pos uint64) returns B[pos]
fmt.Printf("%v\n", rsd.Bit(2)) // true

// Rank(pos uint64, bit bool) returns the number of bit's in B[0...pos)
fmt.Printf("%d %d\n", rsd.Rank(2, false), rsd.Rank(4, true)) // 1 3

// Select(rank uint64, bit bool) returns the position of (rank+1)-th occurence of bit in B.
oneNum := rsd.OneNum()
for i := uint64(0); i < oneNum; i++ {
	fmt.Printf("%d:%d\n", rsd.Select(i, true))
}
// 0:0
// 1:2
// 2:3

rsd.PushBack(false) // You can add anytime

// Use MarshalBinary() and UnmarshalBinary() for serialize/deserialize RSDic.
bytes, err := rsd.MarshalBinary()
newrsd := rsdic.NewRSDic()
err := newrsd.UnmarshalBinary(bytes)

// Enjoy !

Benchmark

  • 1.7 GHz Intel Core i7
  • OS X 10.9.2
  • 8GB 1600 MHz DDR3
  • go version go1.3 darwin/amd64

The results shows that RSDic operations require always (almost) constant time with regard to the length and one's ratio.

go test -bench=.

// RSDic
// A bit vector of length 10^6 with one's ratio = 0.5
// Allocated size: 1.27 bits per original bit.
BenchmarkDenseRSDicBit	20000000	       129 ns/op
BenchmarkDenseRSDicRank	20000000	       127 ns/op
BenchmarkDenseRSDicSelect	 5000000	       315 ns/op

// RSDic
// A bit vector of length 10^8 with one's ratio = 0.5
BenchmarkBit	 5000000	       274 ns/op
BenchmarkDenseRSDicRank	 5000000	       283 ns/op
BenchmarkDenseRSDicSelect	 5000000	       551 ns/op
BenchmarkSparseRSDicBit	10000000	       238 ns/op
BenchmarkSparseRSDicRank	10000000	       279 ns/op
BenchmarkSparseRSDicSelect	 5000000	       593 ns/op

// RSDic
// A bit vector of length 10^6 with one's ratios = 0.01
// Allocated size: 0.32 bits per original bit.
BenchmarkSparseRSDicBit	10000000	       190 ns/op
BenchmarkSparseRSDicRank	10000000	       197 ns/op
BenchmarkSparseRSDicSelect	 5000000	       495 ns/op

// RSDic
// A bit vector of length 10^8 with one's ratios = 0.01
// Allocated size: 0.32 bits per original bit.
BenchmarkSparseRSDicBit	10000000	       247 ns/op
BenchmarkSparseRSDicRank	10000000	       264 ns/op
BenchmarkSparseRSDicSelect	 5000000	       655 ns/op

// []uint8 (for comparison)
// A raw bit vector of length 10^6 with one's ratio = 0.5
// Bit requires O(1) time and Rank, Select require O(n) time
BenchmarkDenseRawBit	2000000000	         1.86 ns/op
BenchmarkDenseRawRank	    5000	    629768 ns/op
BenchmarkDenseRawSelect	     500	   5089415 ns/op

// []uint8 (for comparison)
// A raw bit vector of length 10^8 with one's ratio = 0.5
BenchmarkDenseRawBit	2000000000	         1.82 ns/op
BenchmarkDenseRawRank	      50	  67224404 ns/op
BenchmarkDenseRawSelect	       5	 477187054 ns/op

rsdic-mmap's People

Contributors

hillbig avatar alexwan0 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.