
bigbase's People

Contributors

vladrodionov


Forkers

xiashuijun

bigbase's Issues

Block cache: Efficient cache warm up.

How do we pre-cache data efficiently? The current BlockCache implementation does not allow specifying which generation space a block should be placed into (young or tenured).
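One way to support warm-up would be an insert overload that takes a generation hint, so a pre-caching pass can load blocks straight into the tenured space. A minimal sketch, assuming a hypothetical API (the current BlockCache has no such parameter):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a BlockCache insert API extended with a generation hint.
// All names here are hypothetical, for illustration only.
public class GenerationalBlockCache {
    public enum Generation { YOUNG, TENURED }

    private final Map<String, byte[]> young = new LinkedHashMap<>();
    private final Map<String, byte[]> tenured = new LinkedHashMap<>();

    // Existing-style insert: everything lands in the young generation.
    public void cacheBlock(String key, byte[] block) {
        cacheBlock(key, block, Generation.YOUNG);
    }

    // Proposed insert: the caller chooses the target generation,
    // e.g. TENURED during a warm-up pass.
    public void cacheBlock(String key, byte[] block, Generation gen) {
        (gen == Generation.TENURED ? tenured : young).put(key, block);
    }

    public boolean isTenured(String key) { return tenured.containsKey(key); }
    public boolean isYoung(String key)   { return young.containsKey(key); }
}
```

A warm-up tool would then call the three-argument overload for every block it pre-loads, while normal read-path inserts keep the current behavior.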

New compression for block cache

We need to investigate a new, more advanced compression scheme, ideally with decompression speed comparable to LZ4. Compression may be two-stage: first, when a block is inserted, use fast/regular compression; then a background thread re-compresses blocks with a stronger codec.
A large block cache (tens to hundreds of GB) can have hot, warm, and cold zones, and each zone can use a different compression scheme.
Another option is to try a columnar representation of blocks in memory.
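The two-stage idea above can be sketched as follows, using Deflate levels as a stand-in for the real codecs (e.g. LZ4 on the insert path); the class and method names are assumptions, not existing code:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of two-stage block compression: blocks are inserted with a fast,
// low-effort pass; a background thread later re-compresses them harder.
public class TwoStageCompressor {

    public static byte[] compress(byte[] data, int level) {
        Deflater d = new Deflater(level);
        d.setInput(data);
        d.finish();
        byte[] buf = new byte[data.length + 64];  // worst-case slack for small blocks
        int n = d.deflate(buf);
        d.end();
        return Arrays.copyOf(buf, n);
    }

    public static byte[] decompress(byte[] data, int originalLen) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(data);
            byte[] out = new byte[originalLen];
            inf.inflate(out);
            inf.end();
            return out;
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }

    // Stage 1: fast compression on the insert hot path.
    public static byte[] onInsert(byte[] block) {
        return compress(block, Deflater.BEST_SPEED);
    }

    // Stage 2: background re-compression of a cold block at a higher level.
    public static byte[] recompress(byte[] fastCompressed, int originalLen) {
        byte[] raw = decompress(fastCompressed, originalLen);
        return compress(raw, Deflater.BEST_COMPRESSION);
    }
}
```

In the zoned design, stage 2 would run only on blocks that have aged into the warm or cold zone.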

Block cache: add block checksum support for external storage

We need an additional checksum to verify data integrity.
When Linux has a hardware failure, garbage can be written to the data file; that is why we need checksums to verify data integrity on read. When a checksum check fails, the file needs to be truncated to the position of the end of the last valid block.

For checksum generation we have the com.koda.util.Utils class (hash functions). We will probably need some additional code.
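A minimal sketch of the scheme, assuming a simple [length][crc32][payload] record layout (this layout is an assumption, not the actual FileExtStorage format, and CRC32 stands in for whatever hash from com.koda.util.Utils we end up using):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of per-block checksums for the file-based external storage.
// On read, the first CRC mismatch marks the end of the valid region;
// the file would be truncated at that offset.
public class ChecksummedBlocks {

    // Encodes one block as [int length][long crc32][payload bytes].
    public static byte[] writeBlock(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(12 + payload.length);
        buf.putInt(payload.length);
        buf.putLong(crc.getValue());
        buf.put(payload);
        return buf.array();
    }

    // Scans blocks and returns the offset of the end of the last valid block;
    // everything past it (garbage from a corrupt write) should be dropped.
    public static int lastValidOffset(byte[] file) {
        int pos = 0;
        while (pos + 12 <= file.length) {
            ByteBuffer buf = ByteBuffer.wrap(file, pos, file.length - pos);
            int len = buf.getInt();
            long stored = buf.getLong();
            if (len < 0 || pos + 12 + len > file.length) break;  // truncated record
            CRC32 crc = new CRC32();
            crc.update(file, pos + 12, len);
            if (crc.getValue() != stored) break;  // checksum failed: truncate here
            pos += 12 + len;
        }
        return pos;
    }
}
```

On startup (or after a read failure) the storage would call lastValidOffset and truncate the file to the returned position before resuming appends.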

Block cache: L3 FATAL no space left on device

We need special treatment for this error:
DISK STORAGE =137686247146
ITEMS IN STORE=15840571
14/05/25 12:31:08 FATAL storage.FileExtStorage: java.io.IOException: No space left on device
java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
at com.koda.integ.hbase.storage.FileExtStorage$FileFlusher.run(FileExtStorage.java:303)
14/05/25 12:31:08 FATAL storage.FileExtStorage: file-flusher thread died.

A couple of notes:

  1. I have no idea why this happened. The raw device size was 148 GB and I allocated 140 GB for the cache. df -h shows 100% utilization, while a direct total count gives 138 GB and du -s reports 129 GB. A Linux/filesystem issue?
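Whatever the cause, the flusher thread should not die on ENOSPC. A sketch of the special treatment, assuming a hypothetical Storage interface standing in for the FileExtStorage internals (evicting old files to reclaim space is an assumed recovery policy, not existing behavior):

```java
import java.io.IOException;

// Sketch of a flusher write path that survives "No space left on device":
// on an I/O failure it evicts the oldest stored files to reclaim space and
// retries, giving up only after a bounded number of attempts.
public class ResilientFlusher {

    public interface Storage {
        void write(byte[] block) throws IOException;  // may throw ENOSPC
        boolean evictOldestFile();                    // frees space; false if nothing left
    }

    public static boolean flush(Storage storage, byte[] block, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                storage.write(block);
                return true;
            } catch (IOException e) {
                // Reclaim space and retry instead of letting the
                // file-flusher thread die with a FATAL log.
                if (!storage.evictOldestFile()) return false;
            }
        }
        return false;
    }
}
```

This also argues for keeping some slack between the allocated cache size and the raw device size, given the df/du discrepancy observed above.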

Block cache: implement posix_fadvise for file read (SSD) (bypass OS page cache)

We do not need the OS page cache for SSD at all. Some other performance tips:

SSD optimization tips

  • Your data is placed on an XFS filesystem (mounted with the "noatime,nodiratime,nobarrier" options) and your storage device is managed by the "noop" or "deadline" I/O scheduler (see previous tests for details).
  • You're using O_DIRECT in your InnoDB config (it is not yet clear whether a 4K page size will really bring an improvement over 16K, as there will be 4x more pages to manage within the same memory space, which may mean 4x more lock events and other overhead; in terms of potential writes/sec the difference is not large, mostly 80K vs 60K writes/sec in the presented tests, though a real database workload may benefit more).
  • Finally, make sure your write activity is not focused on a single data file: there should be at least 4 data files, so that performance is not limited from the start by the filesystem layer.

Koda: NativeMemory keeps static Malloc

Only a single memory allocator is allowed, and no global memory limit, expansion factor, etc. can be specified. We need a memory allocator per OffHeap instance, a way to set a global limit (to enable compaction), and the ability to specify the expansion factor, minimum slab size, and total number of slabs.
For example, the minimum slab size for the block cache can be 4K, while for the row cache it can be 64 bytes (not 16). We can set the expansion factor to 1.1 instead of the default 1.2, saving ~10% of space.
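A sketch of what per-instance allocator configuration could look like (the class name and the rounding rule are assumptions; only the numbers come from the issue text):

```java
// Sketch of per-allocator configuration for Malloc: the minimum slab size
// and expansion factor become per-instance settings instead of the current
// static globals in NativeMemory.
public class AllocatorConfig {
    private final int minSlabSize;
    private final double expansionFactor;

    public AllocatorConfig(int minSlabSize, double expansionFactor) {
        this.minSlabSize = minSlabSize;
        this.expansionFactor = expansionFactor;
    }

    // Size of the n-th slab class: minSlabSize * factor^n, rounded up.
    public int slabSize(int n) {
        return (int) Math.ceil(minSlabSize * Math.pow(expansionFactor, n));
    }
}
```

With this, the block cache would construct its allocator as new AllocatorConfig(4096, 1.2) and the row cache as new AllocatorConfig(64, 1.1), each OffHeap instance owning its own configuration.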

Koda: small cache size issues

When the cache size is small (less than, say, 20-40 MB) nothing actually works. See CacheDirectCompressionTest. I think we need to impose a minimum-size restriction on the cache.

Row cache: Implement merge cached rows on update

The row cache stores row:col entries entirely in memory and serves as the L1 cache in a multi-level caching architecture. The current limitation is that a cached object gets purged from the cache every time its 'row:col' is updated. We could instead merge the new values with the existing cached ones on update.
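The difference between the two behaviors can be sketched like this, with the row represented as a simple column map (types are simplified to String for illustration; the real cache holds serialized KeyValues):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of merging an update into a cached row instead of purging it:
// the cached row's column map absorbs the new values, so the L1 entry
// stays valid and warm across updates.
public class RowCacheMerge {

    private final Map<String, Map<String, String>> cache = new HashMap<>();

    public void put(String row, Map<String, String> cols) {
        cache.put(row, new HashMap<>(cols));
    }

    // Current behavior: any update purges the cached row.
    public void updatePurging(String row, Map<String, String> newCols) {
        cache.remove(row);
    }

    // Proposed behavior: merge new column values into the cached row.
    public void updateMerging(String row, Map<String, String> newCols) {
        Map<String, String> cached = cache.get(row);
        if (cached != null) cached.putAll(newCols);
    }

    public Map<String, String> get(String row) { return cache.get(row); }
}
```

Merging would need to respect timestamps and deletes in the real implementation; the sketch only shows the happy path of newer values overwriting older ones.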

HBase new compaction (local)

I think something similar has been implemented by FB.
An NN/DN/HDFS enhancement is needed to support file 'registration' (we currently have create, open, and delete; we need 'register' as well). An agent creates all replicas locally on K data nodes (K is the replication factor), then registers the new file with the DNs and the NN. How? This new API would allow us to reduce or even eliminate network traffic during HBase compaction.
Check the HBase JIRA.

Consider using a single FileChannel per file in the file-based L3 cache

FileChannel.read(ByteBuffer, long pos) can be used for parallel reading.

See the FileExtStorage implementation. We keep a list of open RandomAccessFile handles per actual file (the default list size is 5) to avoid threading issues, on the assumption that accessing a RandomAccessFile from multiple threads is not thread safe. It seems, however, that a single FileChannel per file would suffice.
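The key point is that the positional FileChannel.read(ByteBuffer, long) does not use or modify the channel's own position, so multiple reader threads can share one channel. A minimal sketch of the read helper the pool could be replaced with:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of serving reads from a single shared FileChannel. Positional
// reads are safe to issue concurrently from many threads on one channel,
// which would make the per-file pool of RandomAccessFile handles unnecessary.
public class SharedChannelReads {

    public static byte[] readAt(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            if (ch.read(buf, pos + buf.position()) < 0) break;  // EOF
        }
        return buf.array();
    }
}
```

Any number of threads can call readAt on the same channel concurrently; the only shared state to worry about is channel closure.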

Cache autotuning

This is a complex feature: the L1 and L2 RAM caches share the same RAM, and the sizes of both caches are dynamically adjusted to the workload.
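One simple tuning policy would shift capacity toward whichever tier currently has the better hit rate. A sketch under that assumption (the 5%-per-step rebalancing rule is invented for illustration, not a decided design):

```java
// Sketch of dynamic L1/L2 sizing: both caches share a fixed RAM budget,
// and a tuner periodically moves capacity toward the tier with the
// higher observed hit rate.
public class CacheTuner {
    private long l1Bytes;
    private long l2Bytes;

    public CacheTuner(long totalBytes) {
        this.l1Bytes = totalBytes / 2;
        this.l2Bytes = totalBytes - l1Bytes;
    }

    public void rebalance(double l1HitRate, double l2HitRate) {
        long step = (l1Bytes + l2Bytes) / 20;  // move 5% of the budget per step
        if (l1HitRate > l2HitRate && l2Bytes > step) {
            l1Bytes += step; l2Bytes -= step;
        } else if (l2HitRate > l1HitRate && l1Bytes > step) {
            l1Bytes -= step; l2Bytes += step;
        }
    }

    public long l1Bytes() { return l1Bytes; }
    public long l2Bytes() { return l2Bytes; }
}
```

The invariant is that the total never changes; resizing the underlying caches after each step is the hard part the sketch leaves out.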

HBase: proposal for prefix API

Read here: RocksDB Wiki. The Prefix API is similar to the Amazon DynamoDB NoSQL API, where the row key is split into two parts: a hash prefix and a scan suffix. This will require custom split support as well.
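A sketch of the key layout: a fixed-width hash of the partition part, followed by the scan suffix, so rows with the same prefix group together while the suffix preserves scan order within the group. CRC32 as the hash function and the 4-byte prefix width are assumptions for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch of the proposed prefix-key encoding:
// [4-byte hash of prefix][scan suffix bytes].
public class PrefixKey {

    public static byte[] encode(String hashPrefix, String scanSuffix) {
        CRC32 crc = new CRC32();
        crc.update(hashPrefix.getBytes(StandardCharsets.UTF_8));
        byte[] suffix = scanSuffix.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + suffix.length);
        buf.putInt((int) crc.getValue());  // fixed-width hash prefix
        buf.put(suffix);                   // order-preserving scan suffix
        return buf.array();
    }
}
```

Custom split support is needed precisely because region boundaries must fall on hash-prefix boundaries, never inside a group of keys sharing one prefix.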

Block cache: add unit test for save/load

Test added. Observations:

  • The cache does not shut down while it is being written.
  • Reads fail, but that is probably because HBaseTestingUtility creates a new data store every time.
