Code Monkey home page Code Monkey logo

btree's People

Contributors

borgstad avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

btree's Issues

Hashmap collision resolution

In order to keep track of inserted elements, a simple hashmap is used.
This hashmap is by default initialized to having 10,000,000 entries.
Given the probablity of a hash collision: , when k = 1000, then p < 0.05. At k=2000, p < 0.2.

Thus, it is nescessary to use a hashing scheme that deals with collisions.

Add buffer pool functionality

Whenever a Node is read through diskRead, memory is being allocated. Some structure must keep track of what is in memory.

The in-memory content could be represented by a hashmap structure, mapping Node Ids to their offset on disk. The hashmap should also have a list containing Nodes that has been deleted, and therefore their offset freed. The hashing functionality should be controlled through hash.c, and have the following API:

  • createHashTable: Creating a hash table which amounts to initalizing the struct defined below.
  • hashPut: Hash a Node Id, with the value being the current offset + 1.
  • hashGet: Get offset value from Node Id.

HashTable struct to initialized by createHashTable

typedef struct HashTable HashTable;

struct HashTable
{
  int *table;
  int tableSize;
  int curOffset;
  int unallocatedSize;
  int *unallocated;
  int unallocatedN;
};

Likewise, a structure, possibly the DiskNode struct, should maintain a list of Node Ids currently in memory. When the number of Ids exceed some amount, the Node should be written to disk, and removed from memory. This list should be maintained in btree.c, and totally transparent to any calls to diskRead or diskWrite.

Binary search instead of linear

To minimize the worst case running time of internal node search, use binary search which runs in O(log n), instead of linear search which runs in O(n).

Implement cache with LRU for optimizing read/write

Implement a structure like:

typedef struct 
{
    Node *cache;
    HashTable *nodeMemStatus;
    LinkedList lru;
    int nodesInMem;
} Cache

The cache should be initialized to be of a variable size (also depending on the size of the node, see #10). This also implies that calls to diskRead and diskWrite should look in the cache before reading from disk.

Whenever the nodesInMem exceeds a user-defined threshold, nodes should be written to disk.

Idea 1: It would be interesting to keep simple statistics of how many writes/reads are done in a btree, and optimize for that.
Idea 2: Would it make sense to have a thread which functions as a garbage collector, managing nodes in memory?

Insertion is way too slow

The following code inserts 1000 elements, and measures clocks used for computatoins, and wall clock time.

  time_t start = time(NULL);
  clock_t begin = clock();
  for (int i = 0; i < nInsertions; i++)
  {
    bt = btreeInsert(bt, i);
    bt.root = diskRead(bt.id, maxDegree);
  }
  printf("%f\n", (double)((time(NULL) - start)));
  clock_t end = clock();
  double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
  printf("%f\n", time_spent);
  // inorderTraversal(bt.root, maxDegree);
  for (int i = 0; i < maxDegree + 25; i++)
  {
    // printf("%i\n", i);
    assert(btreeSearch(bt.root, i).ok);
  }
}

Wall clock time: 15 sec.
Clock time: 1.546875 sec.

It takes a whooping 15 sec to insert 1000 elements! Anything over 1 sec is useless.

Perhaps experiment with different reading/writing functions in DiskRead. The OS should cache data before flushing to disk, how to take advantage of that?

Test inserts of random values

The tests only reflects the case where values are inserted in sequential order.

Create test(s) where random values are inserted.

Btree does not need to rely on hashtable for lookup

Instead of using a generated id for each node in the tree and look this up in a hashtable, simply use the offset on disk (perhaps name this blockNumber), as the id of a node:

  • diskRead does not need a lookup in a hashtable, as it can use the id directly.
  • diskWrite needs to ask the storage object for the next blockNumber

Also, a storage structure should keep track of nodes that has been freed due to deletion.

Add unit tests

Add tests, preferably with fixtures.

The tests now relies on C assert. Look into Unity for C.

Restructure build process

The build process (Make) should create a bin dir, containing test files, that are all run.

The object files should land in bin/.obj and bin/.obj/tests for application object files and tests respectively.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.