vladrodionov / bigbase
BigBase - read-optimized, fully HBase-compatible NoSQL Data Store
License: GNU Affero General Public License v3.0
When 50% of the cache population is IMMORTAL (pinned), the default LRU eviction (OffHeapCache.processEvictionFast) does not work reliably; 25% IMMORTAL seems to be OK. Consider switching to a more sophisticated eviction scheme.
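One possible shape for pin-aware eviction, sketched with a plain deque. The Entry type, the evict signature, and the rotate-to-MRU policy are all hypothetical, not OffHeapCache internals; the point is only that the scan must skip pinned entries with a bounded scan length instead of stalling at the LRU tail:

```java
import java.util.Deque;

// Sketch of eviction that tolerates pinned (IMMORTAL) entries: the scan
// skips pinned objects and rotates them to the MRU end instead of
// stopping at the LRU tail. With ~50% of entries pinned, a naive tail
// eviction keeps landing on unevictable objects; maxScan bounds the work.
public class PinAwareEviction {

    public static class Entry {
        final String key;
        final boolean pinned;
        public Entry(String key, boolean pinned) { this.key = key; this.pinned = pinned; }
    }

    // Evicts up to `count` unpinned entries from the LRU end (head of deque).
    // Returns the number actually evicted.
    public static int evict(Deque<Entry> lru, int count, int maxScan) {
        int evicted = 0, scanned = 0;
        while (evicted < count && scanned < maxScan && !lru.isEmpty()) {
            Entry e = lru.pollFirst();
            scanned++;
            if (e.pinned) {
                lru.addLast(e); // pinned: rotate to MRU end, keep scanning
            } else {
                evicted++;      // unpinned: drop it
            }
        }
        return evicted;
    }
}
```

A smarter variant could keep pinned and unpinned entries on separate lists so the scan never touches pinned objects at all.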
How to pre-cache data efficiently? The current BlockCache implementation does not allow specifying which generation space a block should be placed into (young or tenured).
We need to investigate a new, more advanced compression scheme, ideally with decompression speed comparable to LZ4. Compression may be two-stage:
first, when a block is inserted, use fast/regular compression; a background thread then re-compresses blocks
A large block cache (10s-100s of GB) can have hot/warm/cold zones, and each zone can have a different compression scheme.
Another additional option is to try columnar representation of blocks in memory.
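A minimal sketch of the two-stage idea. TwoStageCompression is illustrative, not BigBase code, and java.util.zip compression levels stand in for LZ4/LZ4-HC since no LZ4 codec is assumed here:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Two-stage compression sketch: compress fast on the write path,
// re-compress harder in a background pass. BEST_SPEED / BEST_COMPRESSION
// play the roles of LZ4 / LZ4-HC.
public class TwoStageCompression {

    // Stage 1: fastest level when the block is inserted.
    public static byte[] compressFast(byte[] block) {
        return compress(block, Deflater.BEST_SPEED);
    }

    // Stage 2: a background thread decompresses and re-compresses
    // the block at the highest level.
    public static byte[] recompress(byte[] fastCompressed) {
        return compress(decompress(fastCompressed), Deflater.BEST_COMPRESSION);
    }

    static byte[] compress(byte[] in, int level) {
        Deflater d = new Deflater(level);
        d.setInput(in);
        d.finish();
        byte[] buf = new byte[in.length + 64]; // small headroom for incompressible input
        int n = d.deflate(buf);
        d.end();
        return Arrays.copyOf(buf, n);
    }

    public static byte[] decompress(byte[] in) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(in);
            byte[] buf = new byte[1 << 20]; // sketch assumption: blocks < 1MB
            int n = inf.inflate(buf);
            inf.end();
            return Arrays.copyOf(buf, n);
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The background pass would iterate over warm blocks (e.g. the tenured generation) and swap each fast-compressed block for its re-compressed version under the block lock.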
This feature is becoming more and more important: See this JIRA: https://issues.apache.org/jira/browse/HBASE-10191
StoreConfiguration has a store name, which can be used.
TODO: Write a test where 2 different caches are saved to/loaded from the same directory.
Make the Storage Recycler pluggable. Check the code - it's probably done already.
All cache levels (L1/L2/L3)
It does not take into account the current data file size.
The interim solution: fall back to the non-compacting allocator, if we still have one. Needs verification.
It decreases over time. The suspect is the in-memory L3-RAM reference cache.
We need an additional check-sum to verify data integrity.
When Linux has a hardware failure, garbage can be written to the data file; that is why we need check-sums to verify data integrity on reads. When a check-sum fails, the file must be truncated at the end of the last valid block.
For check-sum generation we have the com.koda.util.Utils class (hash functions). Some additional code will probably be needed.
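A minimal in-memory sketch of the block framing and recovery scan. BlockChecksum and the [length][crc32][payload] layout are illustrative, and java.util.zip.CRC32 is used here instead of the com.koda.util.Utils hash functions:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of block-level check-summing for the L3 data file.
// Each block is stored as [int length][long crc32][payload]; on startup
// the file is scanned and truncated at the end of the last valid block.
public class BlockChecksum {

    public static void writeBlock(ByteBuffer file, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        file.putInt(payload.length);
        file.putLong(crc.getValue());
        file.put(payload);
    }

    // Returns the offset just past the last valid block; the file
    // should be truncated to this offset on recovery.
    public static int lastValidOffset(ByteBuffer file, int fileSize) {
        int pos = 0;
        while (pos + 12 <= fileSize) {             // 12 = 4 (len) + 8 (crc)
            int len = file.getInt(pos);
            long stored = file.getLong(pos + 4);
            if (len < 0 || pos + 12 + len > fileSize) break; // truncated header/body
            byte[] payload = new byte[len];
            file.position(pos + 12);
            file.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            if (crc.getValue() != stored) break;   // corruption: truncate here
            pos += 12 + len;
        }
        return pos;
    }
}
```

The same scan works against a FileChannel; the ByteBuffer is used here only to keep the sketch self-contained.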
BLOCKCACHE-L3 = true means that blocks will be cached only in L3, bypassing L2. This avoids L2 cache thrashing when L2 is too small to hold the majority of hot blocks but L3 is large enough for the purpose.
We need to carefully calculate the required amount for non-DATA blocks. The current recommendation - 4% of the off-heap size - seems too large.
Google "accrual failure detection". Check Cassandra failure detection.
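A back-of-envelope sketch of the accrual ("phi") idea: instead of a boolean alive/dead verdict, compute a suspicion level from the time since the last heartbeat. PhiAccrual is illustrative and assumes exponentially distributed heartbeat intervals with a fixed mean; Cassandra's detector estimates the interval distribution from a sliding window instead:

```java
// Minimal phi accrual failure detection sketch.
// phi = -log10(P(no heartbeat for `elapsed` ms | node still alive));
// under an exponential inter-arrival assumption, that probability is
// exp(-elapsed / meanInterval). phi >= 1 means ~90% confidence the node
// is down, phi >= 2 means ~99%, and so on.
public class PhiAccrual {

    private final double meanIntervalMs;

    public PhiAccrual(double meanIntervalMs) {
        this.meanIntervalMs = meanIntervalMs;
    }

    public double phi(double elapsedMs) {
        double pLater = Math.exp(-elapsedMs / meanIntervalMs);
        return -Math.log10(pLater);
    }
}
```

The caller picks a threshold (Cassandra's default is phi = 8) rather than a fixed timeout, so suspicion adapts to observed heartbeat jitter.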
Under HBase JIRA (to be opened)
Or increase the default minimum to 98%. We need every % of RAM.
They run too long.
The idea is to use the fastest compression when an object is placed into the cache and later re-compress it with a more advanced codec (LZ4 => LZ4-HC).
Some optimizations:
https://groups.google.com/forum/#!topic/lz4c/fdlAfZzSAr4
Major feature. During compaction (minor and major) the block cache invalidates old blocks, but new ones are not added to the cache. Compaction needs to be block-hotness-aware: put a new block into the cache if its 'hotness' index is high. Placement may be into L2/L3 or into L3 only.
The work will be done under an Apache HBase JIRA:
https://issues.apache.org/jira/browse/HBASE-5263
We need special treatment for this error:
DISK STORAGE =137686247146
ITEMS IN STORE=15840571
14/05/25 12:31:08 FATAL storage.FileExtStorage: java.io.IOException: No space left on device
java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
at com.koda.integ.hbase.storage.FileExtStorage$FileFlusher.run(FileExtStorage.java:303)
14/05/25 12:31:08 FATAL storage.FileExtStorage: file-flusher thread died.
A couple of notes:
We do not need the OS page cache for SSD at all. Some other perf tips:
SSD optimization tips
For testing the L3 cache we need relatively fast storage. IO-provisioned EBS can deliver several thousand IOPS on a budget, and a local SSD can serve as the L3 cache.
See instruction here: http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/
Yep, as an option.
Only a single memory allocator is allowed, and no global memory limit, expansion factor, etc. can be specified. We need a memory allocator per OffHeap instance, a way to set a global limit (to enable compaction), and the ability to specify the expansion factor, minimum slab size, and total number of slabs.
For example, the minimum slab size can be 4K for the block cache and 64 bytes for the row cache (not 16). We can set the expansion factor to 1.1 instead of the default 1.2, saving ~10% of space.
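The expansion-factor arithmetic can be sketched as follows (SlabSizing is an illustrative helper, not allocator code): with slab size classes growing geometrically by `factor`, every allocation is rounded up to the next class, so worst-case internal fragmentation is about (factor - 1) - roughly 20% at 1.2 versus 10% at 1.1.

```java
// Geometric slab size classes: each class is `factor` times the
// previous one, starting from minSlab. An allocation of `size` bytes
// lands in the smallest class >= size.
public class SlabSizing {

    public static long roundUp(long size, long minSlab, double factor) {
        double s = minSlab;
        while (s < size) {
            s *= factor;
        }
        return (long) Math.ceil(s);
    }
}
```

For a 100-byte allocation with a 64-byte minimum slab, factor 1.2 rounds up to a 111-byte class while factor 1.1 rounds up to 104 bytes, illustrating where the ~10% savings comes from.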
The code is there but no tests yet have been done.
When the cache size is small (say, < 20-40MB) nothing actually works. See CacheDirectCompressionTest. We probably need to impose a minimum-size restriction on the cache.
The row cache caches row:col entirely in memory and serves as the L1 cache in a multilevel caching architecture. The current limitation: the cached object gets purged from the cache every time 'row:col' is updated. We could instead merge new values with the existing cached one on update.
http://rocksdb.org/ - a flash/SSD-optimized K-V store.
I think something similar has been implemented by FB.
An NN/DN/HDFS enhancement is needed to support file 'registration' (we now have create, open, delete - we need 'register' as well). An agent creates all replicas locally on K data nodes (K is the replication factor), then registers the new file with the DNs and the NN. How? This new API would reduce or even eliminate network traffic during HBase compaction.
Check HBase JIRA.
When the storage recycler deletes a file, all references to that file must be removed from the L3-RAM reference cache.
We need an efficient way to retrieve data from the row cache from inside coprocessors.
FileChannel.read(ByteBuffer, long pos) can be used for parallel reading.
See the FileExtStorage implementation: we keep a list of open RandomAccessFile handles per actual file (default size is 5) to avoid threading issues. The assumption was that accessing a RandomAccessFile from multiple threads is not thread-safe. It seems that a single FileChannel per file would suffice.
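A sketch of the single-channel approach (readAt and demo are illustrative helpers, not FileExtStorage code). FileChannel's positional read neither uses nor updates the channel's own position, which is why one shared channel can serve concurrent readers without a handle pool:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// One shared FileChannel serving readers via positional reads,
// replacing the per-file pool of RandomAccessFile handles.
public class PositionalRead {

    public static byte[] readAt(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // Absolute-position read: safe to call from multiple threads.
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) break; // hit EOF
        }
        return buf.array();
    }

    // Self-contained demo: write a temp file, read a slice back.
    public static String demo() {
        try {
            Path p = Files.createTempFile("pread", ".dat");
            Files.write(p, "hello-world".getBytes());
            String out;
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                out = new String(readAt(ch, 6, 5));
            }
            Files.deleteIfExists(p);
            return out;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that positional reads on one channel may still serialize at the OS level on some platforms, so this should be benchmarked against the handle pool before replacing it.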
The standard ByteArraySerializer takes 4 bytes for array length. We can save 3 bytes for small arrays.
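A sketch of one way to save those bytes, using protobuf-style varint length prefixes; VarIntLength is a hypothetical name, not an existing BigBase class. Lengths under 128 take 1 byte instead of 4:

```java
import java.nio.ByteBuffer;

// Variable-length (varint) length prefixes: 7 bits of the value per
// byte, high bit set on all bytes except the last. Small array
// lengths (< 128) cost 1 byte instead of a fixed 4-byte int.
public class VarIntLength {

    public static void writeLength(ByteBuffer buf, int len) {
        while ((len & ~0x7F) != 0) {
            buf.put((byte) ((len & 0x7F) | 0x80)); // more bytes follow
            len >>>= 7;
        }
        buf.put((byte) len);                       // final byte, high bit clear
    }

    public static int readLength(ByteBuffer buf) {
        int len = 0, shift = 0;
        byte b;
        do {
            b = buf.get();
            len |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return len;
    }

    // Number of bytes the prefix takes for a given length.
    public static int encodedSize(int len) {
        int n = 1;
        while ((len & ~0x7F) != 0) {
            len >>>= 7;
            n++;
        }
        return n;
    }
}
```

The trade-off is that the length is no longer fixed-width, so any code that seeks past a record must decode the prefix instead of skipping 4 bytes.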
This is complex feature: L1 and L2 RAM caches share the same RAM and size of both caches are dynamically adjusted to workload.
Currently, cacheOnWrite puts data into the young-gen space, which is 50% of the overall cache. We cannot bypass this yet.
If codec == null, do not write the 4-byte length of the compressed stream (0) - this saves 4 bytes.
When the optimizer works on non-compressed data: check PerfTest with compression disabled and the optimizer enabled.
File Storage (L3) is already in the 1.0 (1.1) release. This is an experimental feature (3.0).
Read here: RocksDB Wiki. The Prefix API is similar to the Amazon DynamoDB NoSQL API, where the row key is split into two parts: a hash prefix and a scan suffix. This will require custom split support as well.
Currently, it's ~140 bytes per block because we do not have an efficient serializer for StorageHandle.
For testing the L3 cache we need relatively fast storage. IO-provisioned EBS can deliver several thousand IOPS on a budget.
See instruction here: http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/
Actually, OffHeapCache.clear() has serious issues and MUST be rewritten or fixed. This may be a sign of a MORE serious problem in OffHeapCache: a memory alloc/free/alloc sequence can fail.
Rack awareness is good to have, to avoid an inter-rack uplink bottleneck.
If it were configurable per table/CF, that would be another story. One could then have smaller, hot tables still use the block cache and larger, not-so-hot tables use the off-heap cache; thus we would be able to make use of RAM sizes of 128GB or more.
Added test. Observation:
L3 (SSD) needs to be tested standalone and in cluster.