
Comments (6)

sduskis commented on September 26, 2024

I think that this is intended behavior on the server side. We only delete for TTL during major compactions, and reading does not currently skip expired values.

from java-bigtable-hbase.

b-2-83 commented on September 26, 2024

Hello,

For the get, I think I can manage by getting the timestamp of the version.
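
A minimal sketch of that client-side check, assuming cell timestamps are epoch milliseconds (the HBase default) and that the family TTL is known to the client. `TtlExpiryCheck` and `isExpired` are hypothetical names for illustration, not part of the HBase or Bigtable API, and whether a cell aged exactly one TTL counts as expired is an assumption here:

```java
import java.time.Duration;

/** Hypothetical client-side helper; not part of the HBase or Bigtable API. */
final class TtlExpiryCheck {
    private TtlExpiryCheck() {}

    /**
     * Returns true when a cell written at cellTimestampMillis is older than ttl at nowMillis.
     * In HBase client code the timestamp would come from Cell.getTimestamp().
     * Assumption: a cell aged exactly ttl is still treated as live.
     */
    static boolean isExpired(long cellTimestampMillis, Duration ttl, long nowMillis) {
        return nowMillis - cellTimestampMillis > ttl.toMillis();
    }

    public static void main(String[] args) {
        Duration ttl = Duration.ofSeconds(60);
        long writtenAt = 1_000L;
        System.out.println(isExpired(writtenAt, ttl, writtenAt + 60_000L)); // false
        System.out.println(isExpired(writtenAt, ttl, writtenAt + 60_001L)); // true
    }
}
```

After a get, the caller would drop any returned cell for which this check is true.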

But with a filter, it gives wrong results:

@Test
public void testFilterTtl() throws IOException, InterruptedException {
    // Match rows whose cd:counter cell equals 1L.
    Scan scan = new Scan();
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("counter"), CompareOp.EQUAL, Bytes.toBytes(1L)));
    Table table = Bigtable.getInstance().getTable(tableName);
    // Rescan 8 times, 10 seconds apart, so the last scans run after the 60-second TTL.
    for (int i = 0; i < 8; i++) {
        // Close each scanner once its first result has been read.
        try (ResultScanner resultScan = table.getScanner(scan)) {
            Result result = resultScan.next();
            // debut is a field of the test class, presumably the start time in milliseconds.
            System.out.println((System.currentTimeMillis() - debut) + " : " + (result != null ? Bytes.toHex(result.getRow()) : "null"));
        }
        Thread.sleep(10000);
    }
}

Result:

'testBBO', {NAME => 'cd', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '60 SECONDS (1 MINUTE)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
225 : abcdef
10305 : abcdef
20374 : abcdef
30408 : abcdef
40422 : abcdef
50440 : abcdef
60460 : abcdef
70484 : abcdef

benjumanji commented on September 26, 2024

@sduskis, to clarify:

TTLs are more intended to be used like:

  • You have some time-series-like data, and you always query ranges like [T-6months, T).
  • As time marches on, your query will be ignoring lots of data, so you set a TTL of, say, 8 months.
  • As your application doesn't read those records anyway, it doesn't really matter when they actually get reaped.

And they are definitely not to be used as a proxy for filtering.

?

Seems OK to me. That's mostly how I use TTLs in Cassandra even if it actually has slightly more precise GC.

b-2-83 commented on September 26, 2024

From HBase point of view: ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC.

(time is reached)

From the Bigtable point of view: TTL: The time to live, in seconds, for values stored in the column family. Values are deleted after this time.

(after ttl, could be weeks...)

From my point of view, it is not very hard to check isFiltered(cell) && (hasNoTTL(cell) || isTTLOk(cell)).
For Google, that means less bandwidth spent for nothing, at the cost of a little more CPU when a TTL is defined.
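
As a rough illustration of that predicate, here is a self-contained sketch. `StoredCell` and `withTtl` are hypothetical stand-ins, not the HBase `Cell` interface or any server-side API, and the TTL is passed in directly rather than looked up from table metadata:

```java
import java.util.OptionalLong;
import java.util.function.Predicate;

final class TtlAwareFilter {
    private TtlAwareFilter() {}

    /** Minimal stand-in for a stored cell; hypothetical, not the HBase Cell interface. */
    static final class StoredCell {
        final long value;
        final long timestampMillis;

        StoredCell(long value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Wraps a value filter so that it only matches cells still inside the TTL:
     * isFiltered(cell) && (hasNoTTL(cell) || isTTLOk(cell)).
     * An empty ttlMillis means the column family has no TTL.
     */
    static Predicate<StoredCell> withTtl(
            Predicate<StoredCell> isFiltered, OptionalLong ttlMillis, long nowMillis) {
        return cell -> isFiltered.test(cell)
                && (!ttlMillis.isPresent()
                        || nowMillis - cell.timestampMillis <= ttlMillis.getAsLong());
    }

    public static void main(String[] args) {
        Predicate<StoredCell> counterIsOne = cell -> cell.value == 1L;
        long now = 100_000L;
        StoredCell fresh = new StoredCell(1L, now - 10_000L); // 10 s old
        StoredCell stale = new StoredCell(1L, now - 70_000L); // 70 s old

        Predicate<StoredCell> ttl60s = withTtl(counterIsOne, OptionalLong.of(60_000L), now);
        System.out.println(ttl60s.test(fresh)); // true
        System.out.println(ttl60s.test(stale)); // false
    }
}
```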

Here was the main idea behind this proxy shortcut: we wanted to speed up checking whether a given type of event occurred in the last hour, 24 hours, 7 days, or 31 days.
So instead of scanning the whole time series for 7 days, the mere existence of the 7-day event cell means that at least one event happened in the last 7 days.

One value tested versus potentially millions of events fetched and tested on our servers.

So we will fetch the timeseries...

dmmcerlean commented on September 26, 2024

@reagere: It's not clear to me why TTL is helpful for this. Why can't you just fetch the most recent version and check its age? Or is the point that you want to be able to skip the row entirely if the value is too old? I'm not even sure HBase will do that; see my last paragraph for an explanation of why.

@ everyone: In general, if your concern is returning too much data from your requests, it should be easy to use Scan.setTimeRange() to restrict the query to cells within the family TTL. It's true that it's not especially hard to perform this check on our side, but there are both performance and correctness ramifications if we do so:

  1. From a performance standpoint, we now need to look up additional information about the table on every read, which is a non-negligible latency hit even for users who don't set any TTLs.

  2. From a correctness standpoint, lengthening the TTL on a family would cause apparently "deleted" data to reappear, as we would have been hiding some of it without having actually reaped it from the underlying files. This has the potential to cause more serious problems than deleting things more slowly than expected. Worse yet, there could be a period of oscillation between the two cases if we aren't careful to look up the table metadata consistently, which compounds the performance problems mentioned above.
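
The second point can be illustrated with a toy model, assuming expired cells are hidden at read time but remain in the underlying files until a compaction reaps them. `visibleAtReadTime` is a hypothetical name and does not describe the actual server read path:

```java
final class TtlLengtheningDemo {
    private TtlLengtheningDemo() {}

    /** Read-time visibility: the cell is shown only while its age is within the family TTL. */
    static boolean visibleAtReadTime(long cellTimestampMillis, long ttlMillis, long nowMillis) {
        return nowMillis - cellTimestampMillis <= ttlMillis;
    }

    public static void main(String[] args) {
        long now = 200_000L;
        long cellTimestamp = now - 90_000L; // cell is 90 s old and not yet compacted away

        // With a 60 s TTL, a TTL-aware read path would hide the cell.
        System.out.println(visibleAtReadTime(cellTimestamp, 60_000L, now));  // false
        // After the TTL is lengthened to 120 s, the same cell becomes visible again.
        System.out.println(visibleAtReadTime(cellTimestamp, 120_000L, now)); // true
    }
}
```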

The first point can be partially mitigated with a change in our internal architecture, which is non-trivial but will likely be needed for other reasons anyway. However, the second point is trickier, and it's not clear that there's a solution to it that will satisfy everyone.

As a side note, standard HBase may not handle things quite as you might like. For max_versions at least (we haven't tested TTL), HBase actually rejects unreaped but "deletable" cells after running them through filters, not before. So from the perspective of a SingleColumnValueFilter, for example, an apparently "deleted" cell could still "exist."
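
The Scan.setTimeRange() suggestion above could look roughly like the following sketch. The helper name `withinTtl` is hypothetical, and the code assumes cell timestamps are epoch milliseconds and that the client knows the family TTL. The HBase call itself is shown only in a comment so that the example compiles without the client library:

```java
import java.time.Duration;

final class TtlTimeRange {
    private TtlTimeRange() {}

    /** Returns {minStamp, maxStamp} covering cells no older than ttl: [now - ttl, now + 1). */
    static long[] withinTtl(Duration ttl, long nowMillis) {
        long minStamp = Math.max(0L, nowMillis - ttl.toMillis()); // time ranges cannot start below 0
        long maxStamp = nowMillis + 1; // the upper bound of Scan.setTimeRange is exclusive
        return new long[] {minStamp, maxStamp};
    }

    public static void main(String[] args) {
        long[] range = withinTtl(Duration.ofSeconds(60), System.currentTimeMillis());
        // With the HBase client on the classpath, the bounds would be applied as:
        //   Scan scan = new Scan();
        //   scan.setTimeRange(range[0], range[1]);
        System.out.println("minStamp=" + range[0] + " maxStamp=" + range[1]);
    }
}
```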

b-2-83 commented on September 26, 2024

Hello,

I think it is useful because fewer rows and less data are fetched, and you spend less time on an extra filtering step.

If I understand your point, I need to scan the whole table to check whether the latest cell falls within my time range.

If you want to know quickly whether you had an event in the last 7 days, TTL is (per the spec) a solution.

We do filter for custom time periods: we fetch the data with MultipleColumnPrefixFilter (time is in the column qualifier, and we use multiple column families). We just wanted to speed up common requests.

Thank you
