Comments (6)
I think that this is intended behavior on the server side. We only delete for TTL during major compactions, and reading does not currently skip expired values.
from java-bigtable-hbase.
Hello,
For the Get, I think I can manage by checking the timestamp of the version.
But with a filter, the result is wrong:
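```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

// `tableName` and `Bigtable.getInstance()` come from the enclosing test class (not shown).
@Test
public void testFilterTtl() throws IOException, InterruptedException {
  long debut = System.currentTimeMillis(); // start time for the elapsed-ms column (assumed)
  Scan scan = new Scan();
  scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("counter"),
      CompareOp.EQUAL, Bytes.toBytes(1L)));
  Table table = Bigtable.getInstance().getTable(tableName);
  // Scan 8 times, 10 s apart; with TTL => 60 s the row was expected to disappear.
  for (int i = 0; i < 8; i++) {
    try (ResultScanner resultScan = table.getScanner(scan)) { // close each scanner
      Result result = resultScan.next();
      System.out.println((System.currentTimeMillis() - debut) + " : "
          + (result != null ? Bytes.toHex(result.getRow()) : "null"));
    }
    Thread.sleep(10000);
  }
}
```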
Result:
```
'testBBO', {NAME => 'cd', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '60 SECONDS (1 MINUTE)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
225 : abcdef
10305 : abcdef
20374 : abcdef
30408 : abcdef
40422 : abcdef
50440 : abcdef
60460 : abcdef
70484 : abcdef
```
@sduskis, to clarify:
TTLs are intended to be used more like this:
- You have time-series-like data, and you always query ranges like [T-6months, T).
- As time marches on, your queries ignore more and more old data, so you set a TTL of, say, 8 months.
- Since your application doesn't read those records anyway, it doesn't really matter when they actually get reaped.
And they are definitely not meant to be used as a proxy for filtering?
Seems OK to me. That's mostly how I use TTLs in Cassandra, even if Cassandra actually has slightly more precise GC.
From the HBase point of view: "ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC."
(deleted once the expiration time is reached)
From the Bigtable point of view: "TTL: The time to live, in seconds, for values stored in the column family. Values are deleted after this time."
(deleted some time after the TTL, which could be weeks...)
From my point of view, it is not very hard to check isFiltered(cell) && (hasNoTTL(cell) || isTTLOk(cell)).
For Google, that means less bandwidth wasted, at the cost of a little more CPU when a TTL is defined.
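The check proposed above can be sketched as a plain Java predicate. This is a hypothetical model, not HBase or Bigtable code. `CellView`, `isLive`, and `accept` are illustrative names. The family TTL is passed as an `Optional<Long>`, where empty means no TTL. A cell is treated as live while its age is at most the TTL.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical model of the proposed read-time check; not actual HBase or Bigtable code. */
final class TtlAwareFilter {

    /** Hypothetical minimal cell: write timestamp (ms since epoch) and a long value. */
    static final class CellView {
        final long timestampMillis;
        final long value;

        CellView(long timestampMillis, long value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /** hasNoTTL(cell) || isTTLOk(cell): an empty TTL means the cell never expires. */
    static boolean isLive(CellView cell, Optional<Long> ttlSeconds, long nowMillis) {
        return ttlSeconds
                .map(ttl -> nowMillis - cell.timestampMillis <= ttl * 1000L)
                .orElse(true);
    }

    /** isFiltered(cell) && (hasNoTTL(cell) || isTTLOk(cell)) */
    static boolean accept(CellView cell, Predicate<CellView> isFiltered,
                          Optional<Long> ttlSeconds, long nowMillis) {
        return isFiltered.test(cell) && isLive(cell, ttlSeconds, nowMillis);
    }

    public static void main(String[] args) {
        Predicate<CellView> counterIsOne = c -> c.value == 1L; // like cd:counter == 1L in the test
        CellView cell = new CellView(0L, 1L);                  // written at t = 0
        Optional<Long> ttl = Optional.of(60L);                 // TTL => '60 SECONDS'
        System.out.println(accept(cell, counterIsOne, ttl, 30_000L));              // true
        System.out.println(accept(cell, counterIsOne, ttl, 70_000L));              // false
        System.out.println(accept(cell, counterIsOne, Optional.empty(), 70_000L)); // true
    }
}
```

The cost noted above shows up in `accept`: one extra comparison per cell, plus knowing the family's TTL at read time.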
Here was the main idea behind this proxy shortcut: we wanted to speed up checking whether a given type of event occurred in the last 1 hour, 24 hours, 7 days, or 31 days.
So instead of scanning all the time series for 7 days, the mere existence of the 7-day event cell would mean that at least one event happened in the last 7 days.
That is 1 value tested versus potentially millions of events fetched and tested on our servers.
So we will fetch the timeseries...
@reagere: It's not clear to me why TTL is helpful for this. Why can't you just fetch the most recent version and check its age? Or is the point that you want to be able to skip the row entirely if the value is too old? I'm not even sure HBase will do that; see my last paragraph for an explanation of why.
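The alternative suggested here, reading only the most recent event timestamp and checking its age, could look roughly like the sketch below. It assumes the timestamp has already been read, for example from the newest cell version. `RecentEventWindows` and `windowsContaining` are hypothetical names. The window list mirrors the 1 hour, 24 hours, 7 days, and 31 days periods from the previous comment.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Sketch: given the most recent event timestamp, report which look-back windows contain it. */
final class RecentEventWindows {

    // The periods mentioned in the thread: last 1 hour, 24 hours, 7 days, 31 days.
    static final List<Duration> WINDOWS = List.of(
            Duration.ofHours(1), Duration.ofHours(24), Duration.ofDays(7), Duration.ofDays(31));

    /** Returns every window that contains an event at latestEventMillis, as seen at nowMillis. */
    static List<Duration> windowsContaining(long latestEventMillis, long nowMillis) {
        long ageMillis = nowMillis - latestEventMillis;
        List<Duration> hits = new ArrayList<>();
        for (Duration window : WINDOWS) {
            if (ageMillis >= 0 && ageMillis < window.toMillis()) {
                hits.add(window);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long threeDaysAgo = now - Duration.ofDays(3).toMillis();
        // Only one value (the latest timestamp) is inspected, however many events the row holds.
        System.out.println(windowsContaining(threeDaysAgo, now)); // [PT168H, PT744H]
    }
}
```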
@everyone: In general, if your concern is returning too much data from your requests, it should be easy to use Scan.setTimeRange() to restrict the query to cells within the family TTL. It's true that it's not especially hard to perform this check on our side, but there are both performance and correctness ramifications if we do so:
- From a performance standpoint, we now need to look up additional information about the table on every read, which is a non-negligible latency hit even for users who don't set any TTLs.
- From a correctness standpoint, lengthening the TTL on a family would cause apparently "deleted" data to reappear, as we would have been hiding some of it without having actually reaped it from the underlying files. This has the potential to cause more serious problems than deleting things more slowly than expected. Worse yet, there could be a period of oscillation between the two cases if we aren't careful to look up the table metadata consistently, which compounds the performance problems mentioned above.
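The correctness concern in the second point can be illustrated with a toy model. It assumes reads hide cells older than the current TTL, while only compaction removes them physically. `TtlReappearance` and its methods are hypothetical, and this is not how Bigtable is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: read-time TTL hiding versus physical reaping by compaction. Not Bigtable code. */
final class TtlReappearance {

    private final List<Long> storedTimestamps = new ArrayList<>(); // cells still in the files
    private long ttlMillis;

    TtlReappearance(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void write(long timestampMillis) { storedTimestamps.add(timestampMillis); }

    void setTtl(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Read path that hides cells older than the current TTL, as proposed in the thread. */
    long visibleCount(long nowMillis) {
        return storedTimestamps.stream().filter(ts -> nowMillis - ts <= ttlMillis).count();
    }

    /** Compaction: physically drops expired cells; only then is the deletion permanent. */
    void compact(long nowMillis) {
        storedTimestamps.removeIf(ts -> nowMillis - ts > ttlMillis);
    }

    public static void main(String[] args) {
        TtlReappearance family = new TtlReappearance(60_000L); // 60 s TTL
        family.write(0L);
        System.out.println(family.visibleCount(90_000L)); // 0: hidden at t = 90 s, not yet reaped
        family.setTtl(120_000L);                           // TTL lengthened before any compaction
        System.out.println(family.visibleCount(90_000L)); // 1: the "deleted" cell is back
        family.setTtl(60_000L);
        family.compact(90_000L);
        family.setTtl(120_000L);
        System.out.println(family.visibleCount(90_000L)); // 0: a reaped cell cannot come back
    }
}
```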
The first point can be partially mitigated with a change in our internal architecture, which is non-trivial but will likely be needed for other reasons anyway. However, the second point is trickier, and it's not clear that there's a solution to it that will satisfy everyone.
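The client-side alternative mentioned in this comment, restricting reads with `Scan.setTimeRange()`, only needs the family TTL and the current time. The sketch below computes the bounds in plain Java. `TtlTimeRange` and `liveRange` are hypothetical names. The HBase call appears only in a comment, since it needs the HBase client on the classpath.

```java
import java.util.concurrent.TimeUnit;

/** Sketch: time-range bounds that cover only cells younger than a column family's TTL. */
final class TtlTimeRange {

    /** Returns {minStamp, maxStamp}; minStamp is inclusive and maxStamp exclusive. */
    static long[] liveRange(long ttlSeconds, long nowMillis) {
        long minStamp = nowMillis - TimeUnit.SECONDS.toMillis(ttlSeconds);
        return new long[] {minStamp, Long.MAX_VALUE};
    }

    public static void main(String[] args) {
        long[] range = liveRange(60, System.currentTimeMillis()); // TTL => '60 SECONDS'
        // With the HBase client on the classpath, the bounds would be applied as:
        //   scan.setTimeRange(range[0], range[1]);
        System.out.println("minStamp=" + range[0] + ", maxStamp=" + range[1]);
    }
}
```

Because `minStamp` depends on the current time, a `Scan` reused across iterations (as in the test earlier in the thread) would need its range recomputed before each `getScanner` call. The bound is also computed from the client's clock.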
As a side note, standard HBase may not handle things quite as you might like. For max_versions at least (we haven't tested TTL), HBase actually rejects unreaped but "deletable" cells after running them through filters, not before. So from the perspective of a SingleColumnValueFilter, for example, an apparently "deleted" cell could still "exist."
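The ordering issue described here can be illustrated with a simplified model. It is not HBase's actual scanner code, and the names are hypothetical. The comment reports the behavior for max_versions and notes that TTL was not tested. The model uses a TTL as the "deletable" condition only for concreteness. A value filter evaluated before deletable cells are dropped still matches on such a cell. Dropping them first makes the same row disappear.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Simplified model of filter-versus-expiry ordering; not HBase's actual scanner code. */
final class FilterOrder {

    static final class Cell {
        final String qualifier;
        final long value;
        final long timestampMillis;

        Cell(String qualifier, long value, long timestampMillis) {
            this.qualifier = qualifier;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Row-level value check, loosely modelled on SingleColumnValueFilter(cd:counter == 1L). */
    static final Predicate<List<Cell>> COUNTER_IS_ONE =
            row -> row.stream().anyMatch(c -> c.qualifier.equals("counter") && c.value == 1L);

    /** Drops cells older than the TTL, standing in for any "deletable but unreaped" cell. */
    static List<Cell> dropExpired(List<Cell> row, long ttlMillis, long nowMillis) {
        return row.stream()
                .filter(c -> nowMillis - c.timestampMillis <= ttlMillis)
                .collect(Collectors.toList());
    }

    /** Order A: run the filter over all stored cells, then drop the deletable ones. */
    static List<Cell> filterThenExpire(List<Cell> row, long ttlMillis, long nowMillis) {
        return COUNTER_IS_ONE.test(row) ? dropExpired(row, ttlMillis, nowMillis) : List.of();
    }

    /** Order B: drop the deletable cells first, then run the filter over what is left. */
    static List<Cell> expireThenFilter(List<Cell> row, long ttlMillis, long nowMillis) {
        List<Cell> live = dropExpired(row, ttlMillis, nowMillis);
        return COUNTER_IS_ONE.test(live) ? live : List.of();
    }

    public static void main(String[] args) {
        List<Cell> row = List.of(
                new Cell("counter", 1L, 0L),      // written at t = 0 s: expired at t = 70 s
                new Cell("name", 42L, 65_000L));  // written at t = 65 s: still live
        long ttlMillis = 60_000L;
        long nowMillis = 70_000L;
        System.out.println(filterThenExpire(row, ttlMillis, nowMillis).size()); // 1: row still returned
        System.out.println(expireThenFilter(row, ttlMillis, nowMillis).size()); // 0: row filtered out
    }
}
```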
Hello,
I think it is useful because fewer rows are fetched, less data is transferred, and you avoid an extra filtering step.
If I understand your point, I would need to scan the whole table to check whether the last cell falls within my time range.
If you want to know quickly whether an event happened in the last 7 days, TTL is (according to the spec) a solution.
For custom time periods, we filter by fetching the data with MultipleColumnPrefixFilter (the time is in the column qualifier, and we use multiple column families). We just wanted to speed up common requests.
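For the custom-period case, the prefix list for `MultipleColumnPrefixFilter` can be generated from the requested dates. The thread only says that the time is in the column qualifier. The `yyyyMMdd` day prefix used below is a hypothetical encoding, and `DayPrefixes` is an illustrative name. The filter construction is shown only in a comment.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Sketch: one column-qualifier prefix per day, assuming qualifiers start with yyyyMMdd. */
final class DayPrefixes {

    // Hypothetical qualifier layout: the day as yyyyMMdd, followed by anything else.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /** Prefixes for every day in [from, to], inclusive on both ends. */
    static byte[][] prefixes(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        int days = (int) (to.toEpochDay() - from.toEpochDay()) + 1;
        byte[][] result = new byte[days][];
        for (int i = 0; i < days; i++) {
            result[i] = from.plusDays(i).format(DAY).getBytes(StandardCharsets.UTF_8);
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 3, 10);
        byte[][] last7Days = prefixes(today.minusDays(6), today);
        // With the HBase client available: scan.setFilter(new MultipleColumnPrefixFilter(last7Days));
        for (byte[] prefix : last7Days) {
            System.out.println(new String(prefix, StandardCharsets.UTF_8)); // 20160304 ... 20160310
        }
    }
}
```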
Thank you