Comments (12)

tsuna commented on July 17, 2024

Yeah, pre-fetching META is something that was added in HTable but never really made it into AsyncHBase due to the lack of a real need for it. That's because AsyncHBase was mostly targeted at long-running application servers, which tend to quickly learn the working set of regions they need to work with. But you're right, this doesn't play so nicely with something like MR jobs.

I'm not a big fan of seeding the client with pre-looked-up data. I feel like we'd be better off telling the client "please prefetch all the regions of this table", and it could just scan the .META. table and learn as it goes. What do you think about that?

from asynchbase.

elliottneilclark commented on July 17, 2024

Pre-fetching meta was actually turned off by default in HTable. It's a pretty big hassle to get it correct all the time (for large clusters it's bad if you only want to target one table). It's much better to have an on-demand pre-fetch (per table probably).

phs commented on July 17, 2024

A single (hopefully performant) .META. scan per client (and so map task for me) might be good enough. It's the O(clients * regions) probes that are currently killing me. My proposal lops off one dimension, yours does the other.

b4hand commented on July 17, 2024

It would be even nicer if we could limit the prefetch to a certain key range. For map jobs that are over a given input split we know the key range a priori, so we could avoid prefetching regions that are outside of the range of regions we care about. This would save even more network traffic and congestion.
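As a rough illustration of the key-range idea, here is a minimal sketch that picks out the regions overlapping a split's key range from a sorted set of region start keys. The class and method names are hypothetical, keys are plain strings for readability, and it assumes start <= stop (or an empty stop meaning "to the end of the table"); it is not asynchbase code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical helper; names and types are illustrative only. */
class RangePrefetchSketch {

  /**
   * Returns the start keys of the regions that overlap [start, stop).
   * An empty stop means "up to the end of the table".
   * Assumes start <= stop whenever stop is non-empty.
   */
  static List<String> regionsInRange(final NavigableSet<String> regionStarts,
                                     final String start, final String stop) {
    // The region containing `start` is the one with the greatest
    // start key that is <= start, so step back to it first.
    final String floor = regionStarts.floor(start);
    final String first = floor != null ? floor : start;
    final NavigableSet<String> fromFirst = regionStarts.tailSet(first, true);
    // A region whose start key is >= stop lies entirely outside the range.
    return new ArrayList<>(stop.isEmpty() ? fromFirst
                                          : fromFirst.headSet(stop, false));
  }

  public static void main(final String[] args) {
    // Three regions: ["", "g"), ["g", "p"), ["p", end of table).
    final NavigableSet<String> starts = new TreeSet<>(Arrays.asList("", "g", "p"));
    System.out.println(regionsInRange(starts, "h", "q"));
  }
}
```

The point of the floor lookup is that an input split rarely begins exactly on a region boundary, so the first region of interest usually starts before the split does.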

tsuna commented on July 17, 2024

Yeah, I suppose we could add a couple of APIs to HBaseClient, something along the lines of:

  Deferred<Object> prefetchMeta(final byte[] table);
  Deferred<Object> prefetchMeta(final byte[] table, final byte[] start, final byte[] stop);
  Deferred<Object> prefetchMeta(final String table);
  Deferred<Object> prefetchMeta(final String table, final String start, final String stop);

The callback on the Deferred<Object> would be fired once we're done scanning. The versions without the start/stop arguments would simply call the other version, specifying an empty start key, and an empty stop key paired with a table name of table + "\0" so that the scan ends right after the last region of the table.
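A minimal sketch of how that delegation might look, under some loud assumptions: `CompletableFuture` from the JDK stands in for asynchbase's `Deferred` so the example is self-contained, `PrefetchSketch` and `MetaRange` are hypothetical names, and the actual META scan is left out. It only shows how the shorter overloads could funnel into the start/stop one and where table + "\0" would come in.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the proposed overloads; this is not asynchbase code. */
class PrefetchSketch {
  private static final byte[] EMPTY = new byte[0];

  /** Bounds of a META scan: a (table, row key) pair at each end. Hypothetical type. */
  static final class MetaRange {
    final byte[] startTable, startKey, stopTable, stopKey;

    MetaRange(final byte[] startTable, final byte[] startKey,
              final byte[] stopTable, final byte[] stopKey) {
      this.startTable = startTable;
      this.startKey = startKey;
      this.stopTable = stopTable;
      this.stopKey = stopKey;
    }
  }

  /** Last range requested, kept only so the sketch can be inspected. */
  MetaRange lastRange;

  CompletableFuture<Object> prefetchMeta(final byte[] table) {
    // No start/stop given: delegate with empty keys.
    return prefetchMeta(table, EMPTY, EMPTY);
  }

  CompletableFuture<Object> prefetchMeta(final byte[] table,
                                         final byte[] start, final byte[] stop) {
    // With an empty stop key, stop at table + "\0", the smallest table name
    // sorting after `table`. Arrays.copyOf pads the extra byte with zero.
    final byte[] stopTable = stop.length == 0
        ? Arrays.copyOf(table, table.length + 1)
        : table;
    lastRange = new MetaRange(table, start, stopTable, stop);
    // A real client would scan META over this range here and cache every
    // region it sees; this sketch completes immediately instead.
    return CompletableFuture.completedFuture(null);
  }

  CompletableFuture<Object> prefetchMeta(final String table) {
    return prefetchMeta(bytes(table));
  }

  CompletableFuture<Object> prefetchMeta(final String table,
                                         final String start, final String stop) {
    return prefetchMeta(bytes(table), bytes(start), bytes(stop));
  }

  private static byte[] bytes(final String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(final String[] args) {
    final PrefetchSketch client = new PrefetchSketch();
    client.prefetchMeta("t1")
        .thenAccept(ignored -> System.out.println("prefetch done"));
  }
}
```

Keeping all the logic in the byte[] start/stop overload means the other three stay one-liners, which matches the "simply call the other version" description.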

What do you guys think?

phs commented on July 17, 2024

That sounds lovely!

b4hand commented on July 17, 2024

That sounds great to me, too!

tsuna commented on July 17, 2024

Anybody up for sending a pull request?

phs commented on July 17, 2024

I'll see you, and raise you another spelling out exactly how to run your tests.

phs commented on July 17, 2024

Would you rather the integration tests go in TestIntegration or in a new suite?

tsuna commented on July 17, 2024

Putting them in TestIntegration is easier.

phs commented on July 17, 2024

The tests are in the .test sub-package, meaning I'm unable to make HBaseClient#getRegion package-private in order to see it from the tests. I need a way to observe the cache being populated. In the mocked unit tests, PowerMock's Whitebox is used to invasively inspect state, which I'll do for now.

Have any opinions on how I should inspect the cache state?
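For comparison, the same kind of invasive inspection can be sketched with plain JDK reflection, which is roughly what PowerMock's `Whitebox.getInternalState` does for you. Everything here is hypothetical: `FakeClient`, its `cache` field, and `learnRegion` are made-up stand-ins, not HBaseClient internals.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of peeking at private cache state from a test. */
class CachePeekSketch {

  /** Made-up client with a private cache; names are illustrative only. */
  static final class FakeClient {
    private final ConcurrentSkipListMap<String, String> cache =
        new ConcurrentSkipListMap<>();

    void learnRegion(final String startKey, final String host) {
      cache.put(startKey, host);
    }
  }

  /** Reads a private field by name and returns it cast to the caller's type. */
  @SuppressWarnings("unchecked")
  static <T> T peek(final Object target, final String fieldName) {
    try {
      final Field field = target.getClass().getDeclaredField(fieldName);
      field.setAccessible(true);  // bypass the `private` modifier
      return (T) field.get(target);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot read field " + fieldName, e);
    }
  }

  public static void main(final String[] args) {
    final FakeClient client = new FakeClient();
    final ConcurrentSkipListMap<String, String> cache = peek(client, "cache");
    client.learnRegion("", "host1");
    // The map returned by peek is the live instance, so it reflects the update.
    System.out.println(cache.size());
  }
}
```

The trade-off is the usual one: the test couples to a field name, so a rename breaks it at run time rather than at compile time.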
