Comments (12)
Yeah, pre-fetching META is something that was added in HTable but never really made it into AsyncHBase due to the lack of a real need for it. That's because AsyncHBase was mostly targeted at long-running application servers, which tend to quickly learn the working set of regions they need to work with. But you're right, this doesn't play so nicely with something like MR jobs.
I'm not a big fan of seeding the client with pre-looked-up data. I feel like we'd be better off telling the client "please prefetch all the regions of this table", and it could just scan the .META. table and learn as it goes. What do you think about that?
from asynchbase.
Pre-fetching meta was actually turned off by default in HTable. It's a pretty big hassle to get it correct all the time (for large clusters it's bad if you only want to target one table). It's much better to have an on-demand pre-fetch (probably per table).
from asynchbase.
A single (hopefully performant) .META. scan per client (and so per map task for me) might be good enough. It's the O(clients * regions) probes that are currently killing me. My proposal lops off one dimension, yours does the other.
from asynchbase.
It would be even nicer if we could limit the prefetch to a certain key range. For map jobs that are over a given input split we know the key range a priori, so we could avoid prefetching regions that are outside of the range of regions we care about. This would save even more network traffic and congestion.
from asynchbase.
Yeah, I suppose we could add a couple of APIs to HBaseClient, something along the lines of:
Deferred<Object> prefetchMeta(final byte[] table);
Deferred<Object> prefetchMeta(final byte[] table, final byte[] start, final byte[] stop);
Deferred<Object> prefetchMeta(final String table);
Deferred<Object> prefetchMeta(final String table, final String start, final String stop);
The callback on the Deferred<Object> would be fired once we're done scanning. The versions without the start/stop arguments would simply call the other version, specifying an empty start key, and an empty stop key but for a table name that is table + "\0".
What do you guys think?
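To make the delegation between those four signatures concrete, here is a minimal sketch. It is not asynchbase code: `PrefetchSketch` and its `last*` fields are hypothetical, `CompletableFuture` stands in for `Deferred` so the snippet has no dependencies, and the charset used for the String overloads is an assumption. How an empty stop key gets turned into an actual .META. stop row (the table + "\0" detail) is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the proposed additions; not asynchbase code. */
final class PrefetchSketch {
  private static final byte[] EMPTY = new byte[0];

  // Arguments of the last ranged call, recorded only so the delegation is observable.
  byte[] lastTable;
  byte[] lastStart;
  byte[] lastStop;

  /** Whole table: delegate to the ranged version with empty start and stop keys. */
  CompletableFuture<Object> prefetchMeta(final byte[] table) {
    return prefetchMeta(table, EMPTY, EMPTY);
  }

  /** Ranged version: the only overload that would actually scan .META. */
  CompletableFuture<Object> prefetchMeta(final byte[] table,
                                         final byte[] start,
                                         final byte[] stop) {
    lastTable = table;
    lastStart = start;
    lastStop = stop;
    // A real client would scan .META. here and complete the future once the scan ends.
    return CompletableFuture.completedFuture(null);
  }

  /** String convenience overload for the whole table. */
  CompletableFuture<Object> prefetchMeta(final String table) {
    return prefetchMeta(toBytes(table));
  }

  /** String convenience overload for a key range. */
  CompletableFuture<Object> prefetchMeta(final String table,
                                         final String start,
                                         final String stop) {
    return prefetchMeta(toBytes(table), toBytes(start), toBytes(stop));
  }

  // The charset is an assumption made for this sketch.
  private static byte[] toBytes(final String s) {
    return s.getBytes(StandardCharsets.ISO_8859_1);
  }
}
```

Funnelling everything into the byte[] range version keeps the scan logic in one place, so a map task could call the ranged overload with its input split's bounds while other callers use the one-argument form.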
from asynchbase.
That sounds lovely!
from asynchbase.
That sounds great to me, too!
from asynchbase.
Anybody up for sending a pull request?
from asynchbase.
I'll see you, and raise you another spelling out exactly how to run your tests.
from asynchbase.
Would you rather the integration tests go in TestIntegration or in a new suite?
from asynchbase.
Putting them in TestIntegration is easier.
from asynchbase.
The tests are in the .test sub-package, meaning I'm unable to make HBaseClient#getRegion package-private in order to see it from the tests. I need a way to observe the cache being populated. In the mocked unit tests, PowerMock's Whitebox is used to invasively inspect state, which I'll do for now.
Have any opinions on how I should inspect the cache state?
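For reference, one option that avoids changing visibility is plain reflection, which is roughly what a Whitebox-style helper does under the hood. The sketch below is only an illustration: `CachingClient`, its `regionsCache` field, and `CacheInspector` are made-up names, not the real HBaseClient internals.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical client with a private region cache; the names are made up. */
final class CachingClient {
  private final ConcurrentSkipListMap<String, String> regionsCache =
      new ConcurrentSkipListMap<>();

  /** Records which host serves a region, as a lookup or prefetch would. */
  void learnRegion(final String regionName, final String host) {
    regionsCache.put(regionName, host);
  }
}

final class CacheInspector {
  /** Reads a private field by name so a test can look at internal state. */
  static Object readField(final Object target, final String name) {
    try {
      final Field f = target.getClass().getDeclaredField(name);
      f.setAccessible(true);  // bypass the private modifier, for tests only
      return f.get(target);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot read field " + name, e);
    }
  }
}
```

A test would then cast the result of `CacheInspector.readField(client, "regionsCache")` to a `Map` and assert on its contents. The trade-off is that the test is coupled to a private field name, so a rename breaks it silently until it runs.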
from asynchbase.