Comments (12)
The performance results favored switching to the BlockBased table format.
https://gerrit.readyset.name/c/readyset/+/5206
https://readyset-workspace.slack.com/archives/C05471W2M44/p1687374146825119
from readyset.
It's REA-2714 (mentioned above). I'm going to run at least the PlainTable vs. BlockBased performance comparison to validate that it is OK to switch.
from readyset.
We had a task slated to benchmark other table formats against PlainTable for both write and read latency specifically for this reason - I think @Ethan might have been working on it but I'm not sure where it ended up
from readyset.
Thanks. Not much to go by out there I'm afraid.
from readyset.
commit 3e668db2de9a66e9788371de981377fbb19b17ed
Author: Martin Ek <[email protected]>
Date: Wed May 2 23:13:11 2018 +0200
Use rocksdb's plain table SST format
Looks like this change was made back in the Noria days, and there isn't much context as to why it was chosen.
If I had to take a wild guess, the wording in https://github.com/facebook/rocksdb/wiki/PlainTable-Format ("PlainTable is a RocksDB's SST file format optimized for low query latency on pure-memory or really low-latency media.") sounds appealing for a research project focused on low query latency.
They also had the advantage of running on smaller datasets and avoiding the "file is too large" problem that real world scenarios can run into.
from readyset.
@KwilLuke Can you git-blame to see who might have made the choice to use PlainTable instead of the default?
from readyset.
PlainTable Format is the only rocksdb SST format that has this size restriction, and it is a hard limit due to the data structure using 32-bit integers in indexing.
It seems like we should be able to configure rocksdb to avoid making a file greater than the 2GiB limit in compaction, and from what I can tell we set it to make 256MiB files at most, but there must be something that is at least temporarily making one too large. I'm not exactly sure where yet, and it is a bit slow to reproduce this repeatedly.
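To make the numbers above concrete, here is a minimal sketch of the size arithmetic. It assumes the PlainTable limit is 2^31 - 1 bytes, as stated on the RocksDB PlainTable wiki page; the helper names are hypothetical and are not part of RocksDB or ReadySet.

```rust
/// Assumed PlainTable file-size limit: the largest value a signed
/// 32-bit offset can hold (2^31 - 1 bytes, just under 2 GiB).
const PLAIN_TABLE_MAX_FILE_BYTES: u64 = i32::MAX as u64;

/// The per-file compaction target mentioned in this thread (256 MiB).
const TARGET_FILE_BYTES: u64 = 256 * 1024 * 1024;

/// Hypothetical helper: true if a file of `len` bytes stays within the
/// assumed 32-bit offset limit.
fn fits_plain_table(len: u64) -> bool {
    len <= PLAIN_TABLE_MAX_FILE_BYTES
}

/// How many whole multiples of the target size fit under the limit.
fn overshoot_headroom() -> u64 {
    PLAIN_TABLE_MAX_FILE_BYTES / TARGET_FILE_BYTES
}

fn main() {
    println!("limit:    {} bytes", PLAIN_TABLE_MAX_FILE_BYTES);
    println!("target:   {} bytes", TARGET_FILE_BYTES);
    println!("headroom: {}x", overshoot_headroom());
    println!("2 GiB file fits: {}", fits_plain_table(2 * 1024 * 1024 * 1024));
}
```

Under these assumptions, a file would have to grow to roughly eight times the configured target before hitting the limit, which is consistent with the suspicion that something other than ordinary compaction output is temporarily producing an oversized file.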
As we already have a hunch that BlockBased Table Format may be a better choice (it is the default for rocksdb), I will try using that and confirm that it can handle the failure scenario in this ticket, as well as sanity check the performance a bit, which will cover some of the matrix of REA-2714.
from readyset.
cc: @jasobrown Who I know was interested in RocksDB settings as well.
from readyset.
So, are you looking at implementing REA-2900 first?
No, I was going to work on this one first.
I thought we might already be invoking "PrepareForBulkLoad" in our code.
We don't directly call this currently. We do options-tweaking similar to what PrepareForBulkLoad does to improve performance, though. I tried calling it and didn't see any significant performance improvement.
from readyset.
So, are you looking at implementing REA-2900 first? I thought we might already be invoking "PrepareForBulkLoad" in our code. Seems worthwhile for us to try that out first, I agree.
from readyset.
If we implement REA-2900 before this, it may either resolve this or have a different signature, with compaction failing with the same error after opening PersistentState rather than during.
from readyset.
Fixed in 8a32b30
from readyset.