
jbindex's Introduction


jbindex

The goal is to provide an easy-to-use key-value map for billions of records using just one directory and some disk space.

It's a simple, fast index. Work with the index is split into the following phases:

  • Writing data to the index. All data that should be stored in the index is sent to it.
  • Building the index. In this phase the data is organized for fast access.
  • Searching the index. In this phase it's not possible to alter data in the index.
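The three phases above can be illustrated with a minimal self-contained sketch. This is not the jbindex API; the class and method names are invented for illustration, and the "build" step is just a sort with binary search for lookups:

```java
import java.util.ArrayList;
import java.util.List;

class PhasedIndex {
    enum State { WRITING, BUILT }

    private final List<long[]> pairs = new ArrayList<>();
    private State state = State.WRITING;

    // Phase 1: writing — all data must be sent before building.
    void put(long key, long value) {
        if (state != State.WRITING) throw new IllegalStateException("index already built");
        pairs.add(new long[] { key, value });
    }

    // Phase 2: building — organize the data for fast access (here: sort by key).
    void build() {
        pairs.sort((a, b) -> Long.compare(a[0], b[0]));
        state = State.BUILT;
    }

    // Phase 3: searching — data can no longer be altered.
    Long get(long key) {
        if (state != State.BUILT) throw new IllegalStateException("index not built yet");
        int lo = 0, hi = pairs.size() - 1;
        while (lo <= hi) {                       // binary search over sorted pairs
            int mid = (lo + hi) >>> 1;
            long k = pairs.get(mid)[0];
            if (k == key) return pairs.get(mid)[1];
            if (k < key) lo = mid + 1; else hi = mid - 1;
        }
        return null;                             // key not present
    }
}
```

The point of the phase split is that the search structure never has to support mutation, which keeps it simple and compact.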

The index is not thread safe.

Useful links

Basic work with index

The index can be in the following states:

Index methods

The index should be created with the builder, which makes the index instance. For example:

final Index<Integer, String> index = Index.<Integer, String>builder()
        .withDirectory(directory)
        .withKeyClass(Integer.class)
        .withValueClass(String.class)
        .build();

Index states

Interrupting the process of writing data to the index can corrupt the entire index.

Development

Mockito requires reflective access to non-public parts of a Java module. It can be opened manually by passing the following JVM parameter:

--add-opens=java.base/java.lang=ALL-UNNAMED
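If the project builds with Maven and runs its tests through the Surefire plugin (an assumption about the build setup), the flag can be configured once in `pom.xml` instead of being passed by hand:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Open java.lang to Mockito's reflective access during tests. -->
    <argLine>--add-opens=java.base/java.lang=ALL-UNNAMED</argLine>
  </configuration>
</plugin>
```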

How to get segment disk size

On macOS try:

diskutil info /Volumes/LaCie
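On Linux (and also on macOS), the `du` utility can report how much disk space the index directory occupies; the path below is a placeholder:

```shell
# Report the total on-disk size of a directory tree in human-readable form.
du -sh /path/to/index/directory
```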

jbindex's People

Contributors

jajir

jbindex's Issues

Make the builder easier to use

Add some default values; allow inserting just a data type, without specifying a descriptor. Make it work even without configuring the bloom filter. Add an option to define a type without a descriptor, just with a class. Add default values per data type.

Stream is not closed

In class SstIndexImpl the method getStream() returns a stream that doesn't close the underlying resources.

Clearing the segment cache requires reading it into memory

The segment cache can be cleared in the following way:

final SegmentCache<K, V> sc = new SegmentCache<>(
        segmentFiles.getKeyTypeDescriptor(), segmentFiles,
        segmentPropertiesManager);
sc.clear();

This is not efficient: the cache should not have to be loaded into memory. It happens in SegmentFullWriter.close().

Add recovery

Use a WAL (write-ahead log) for cache puts and deletes. After a successful flush, delete the WAL and start a new one. When the index starts after a Ctrl+C, recover from the WAL.
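The recovery idea can be sketched as follows. This is illustrative only, not jbindex code; the file format and class name are invented:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

class Wal {
    private final Path file;

    Wal(Path file) { this.file = file; }

    // Every put is appended to the log before it touches the in-memory cache.
    void logPut(String key, String value) throws IOException {
        Files.writeString(file, "PUT " + key + " " + value + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Deletes are logged the same way, as tombstone records.
    void logDelete(String key) throws IOException {
        Files.writeString(file, "DEL " + key + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // After a successful flush the log is no longer needed.
    void discard() throws IOException { Files.deleteIfExists(file); }

    // On startup, replay the log in order to rebuild the in-memory cache.
    Map<String, String> recover() throws IOException {
        Map<String, String> cache = new LinkedHashMap<>();
        if (!Files.exists(file)) return cache;
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.split(" ", 3);
            if (parts[0].equals("PUT")) cache.put(parts[1], parts[2]);
            else cache.remove(parts[1]);
        }
        return cache;
    }
}
```

Because every mutation is durable before it is acknowledged, an interrupted process loses at most the operation that was being written, and replaying the log restores the cache to its last consistent state.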

Improve segment search object caching

A segment contains a few objects that help with the search process: the cache, the bloom filter, and the scarce index. All of them should be cached in one place and still be accessible from the segment.

Rewrite segment iteration to use tryAdvance

The hasNext and next methods are not suitable: they require preloading an element, which can fail when the next element is unloaded from the delta cache. Use the tryAdvance method instead.
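The tryAdvance style is part of the standard java.util.Spliterator interface: the element is produced and consumed in a single call, so nothing has to be preloaded ahead of time. A minimal sketch (not jbindex code, backed here by a plain list):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;

class PairSpliterator implements Spliterator<String> {
    private final Iterator<String> source;

    PairSpliterator(List<String> pairs) { this.source = pairs.iterator(); }

    // Returns false when exhausted; otherwise hands exactly one element to
    // the consumer — no separate hasNext/next pair, no lookahead element.
    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!source.hasNext()) return false;
        action.accept(source.next());
        return true;
    }

    @Override public Spliterator<String> trySplit() { return null; }
    @Override public long estimateSize() { return Long.MAX_VALUE; }
    @Override public int characteristics() { return ORDERED; }
}
```

The caller drives the loop with `while (spliterator.tryAdvance(consumer)) { }`, so a segment-backed implementation can load, decode, and release each element inside one call.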

Add flush() to index

Add flush() to the index. It allows "committing" changes in order to work with them further. Document it.

Index iterator is inconsistent

The segment iterator should work after flushing. Because of that it can't iterate through data in the cache; the iterator should use only data from the segment. Even the read part should check its value in the cache. There will be a problem in the case of a tombstone in the cache.

Segment splitting

When there are multiple duplicated write operations, the properties hold an invalid value for the number of keys in a segment. Because of that, split segments are not equally sized.

Segment iterator is not consistent

When the iterator has started reading values and one value is then changed, the old value is returned. For example, with the pairs [a,1],[b,1],[c,1] in a segment, the following commands are executed:

iterator.next() --> [a,1]
put(b,2)
iterator.next() --> [b,1]

The last call should return [b,2].

Flushing to a segment should compact just once

When flushing pairs to a segment, two parameters for the delta cache size should be introduced:

  • maximum size for searching and reading
  • maximum size for flushing, which temporarily allows exceeding the previous value to prevent repeated compaction during flushing

Improve logging format

In the logs, numbers are formatted simply as 3740512. To improve readability this should be 3 740 512.
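One way to produce such space-grouped numbers with the standard library (illustrative; jbindex may format differently):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class LogFormat {
    static String withSpaces(long n) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
        symbols.setGroupingSeparator(' ');           // group digits with a space
        DecimalFormat fmt = new DecimalFormat("#,###", symbols);
        return fmt.format(n);                        // e.g. 3740512 -> "3 740 512"
    }
}
```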

Splitting with a delta cache full of tombstones fails

When the delta cache has a size similar to the main index and is full of delete commands for already stored keys, the splitting function can't correctly estimate where half of the keys lies and can fail.
