Code Monkey home page Code Monkey logo

two-three-tree's Introduction

Two-Three Tree

Build Status

What is this?

This is an implementation of a immutable key-value store (i.e. a 'Map'). The store is immutable. This means that adding, removing or changing a key-value pair doesn't alter the map but instead creates a copy.

The implementation uses a 2-3-tree and is entirely built out of immutable nodes. Creating a copy is implemented efficiently. Most of the original tree's nodes are re-used to build the copy. Roughly speaking the operation will end up making copies only of nodes along a path from the tree's root down to the inserted/deleted leaf. So the amount of copying is O(log(n)).

How to use it

Maven dependency:

	<dependency>
		<groupId>com.github.kdvolder</groupId>
		<artifactId>two-three-tree</artifactId>
		<version>0.0.1-RELEASE</version>
	</dependency>

Create an empty tree to get started:

	TTTree<String, String> tree = TTTree.empty();

Add entries using .put() method to create an extended copy of a map:

	largerTree = tree.put("Hello", "World!");

Lookup entries using .get():

	System.out.println(tree.get("Hello"));

Remove entries using .remove() to create a reduced copy of a map:

	smallerTree = tree.remove("Hello");

Why not use Google Guava ?

Google's Guava library is great. I love it. It provides ImmutableMap and ImmutableSortedMap implementations.

So why not just use that instead?

The answer is, it depends on what you wanted to use it for.

Guava's immutable sorted map is most similar to TTTree. So let's compare the two. Guava's immutables seem to be optimized towards a compact representation of a single map instance. So it's ideal if all you want is to keep a map containing some 'static' data assigned to a constant forever.

However, it performs rather poorly when you are interested in making modified copies of this map. Essentially making a modified copy with an extra/removed key, implies 'rebuilding' the whole map with just a small change to its contents. So its performance is O(n) in both time and space. If you need to do a lot of copying this becomes problematic.

In contrast, TTTree makes modified copy of a map in O(log(n)) time and space. This remains practical, even for larger maps.

For a quick performance comparison, I measured the time it takes to build up a map, starting from a empty map and adding one entry at a time, creating a modified copy each time.

The table below shows the time taken for executing 1,000,000 inserts in this scenario to build up maps of different sizes.

Size TTTree Guava
1 0.029 0.064
10 0.065 0.313
100 0.168 1.576
1,000 0.263 14.173
10,000 0.380 140.890

The price you pay for this spectacular speedup is that a single TTTree will take significantly more memory than a single ImmutableSortedSet containing the same data. For larger maps, on a 64bit VM with compressed oops ImmutableSortedSet uses only slightly over 8 bytes on average per entry. TTTree needs about 50 bytes per entry.

What about access times? A bit surprising, but access times especially for smaller sized maps upto 10_000 entries are very comparable between Guava and TTTree. Starting at about 10_000 entries my tests showed that Guava is gaining a significant edge over TTTree in raw access speed (better cache locality due to its super-compact array-based representation?). Below 10,000 entries access speeds where comparable or even better with TTTree.

The table below shows time taken in seconds, to perform 100,000,000 .get calls on maps of different sizes.

Size TTTree Guava
1 0.358 0.595
10 1.514 2.445
100 2.381 3.091
1,000 8.031 8.103
10,000 14.725 11.449
100,000 31.194 18.412

two-three-tree's People

Contributors

kdvolder avatar dependabot[bot] avatar

Watchers

James Cloos avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.