Comments (5)
Using PR #17, I've rerun the same queries, and they show some improvement when reading in qbeast format.
In the last column, I added each runtime as a percentage of the delta format runtime (as before), followed by the difference with respect to the main branch in qbeast format.
Query | delta format, read in delta | qbeast format, read in qbeast (main 15667c2) | qbeast format, read in qbeast (PR #17) |
---|---|---|---|
3 | 7.822s. | 20.635s. (263,80%) | 13.203s. (168,79%) (-95,01%) |
7 | 18.839s. | 35.546s. (188,68%) | 24.319s. (129,09%) (-59,59%) |
15 | 7.201s. | 24.620s. (341,89%) | 16.481s. (228,87%) (-113,02%) |
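As a rough sketch of how the figures in parentheses can be reproduced: the first percentage appears to be the qbeast runtime divided by the delta runtime, and the second appears to be the difference between the two qbeast columns in percentage points. That reading is an assumption based on the numbers in the table, and `relative_runtime` is a hypothetical helper name, not part of any benchmark tool.

```python
def relative_runtime(candidate_s: float, baseline_s: float) -> float:
    """Return the candidate runtime as a percentage of the baseline runtime."""
    return candidate_s / baseline_s * 100.0


# Query 3 timings taken from the table above, in seconds.
delta_s, main_s, pr17_s = 7.822, 20.635, 13.203

main_pct = relative_runtime(main_s, delta_s)   # qbeast on main vs. delta
pr17_pct = relative_runtime(pr17_s, delta_s)   # qbeast with PR #17 vs. delta
diff_points = pr17_pct - main_pct              # change in percentage points

print(f"main: {main_pct:.2f}%  PR #17: {pr17_pct:.2f}%  diff: {diff_points:+.2f}")
```

A value above 100% means the qbeast read is slower than the delta read, so a negative difference indicates an improvement over main.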
Detailed values (AVG, MIN and MAX) for the PR #17 execution:
Query | AVG | MIN | MAX |
---|---|---|---|
3 | 13.203s. | 12.462s. | 16.989s. |
7 | 24.319s. | 23.121s. | 27.730s. |
15 | 16.481s. | 15.152s. | 18.936s. |
from qbeast-spark.
@eavilaes can you provide more info (e.g. a quick guide) on how you run these tests?
from qbeast-spark.
Well, the process is a bit complicated to handle (welcome to the world of benchmarking):
As I mentioned, I'm using Qbeast-io/spark-sql-perf-private together with our automated-deployments tools. The latter contains two directories: one with a Scala app ready to run TPC-DS benchmarks, which uses spark-sql-perf under the hood, and another with shell scripts that make it easier to run the app with all the required dependencies.
Note that these two repositories are currently for internal use, but they are based on the idea of Databricks' spark-sql-perf.
As you mentioned, after the big refactor in #39, which includes the update of Delta to version 1.0.0, and after #51 (thanks to which I can now index large amounts of data), I ran these tests again. You can see the results below:
Query | delta format, read in delta | qbeast format, read in qbeast (after PR #17) | qbeast format, read in qbeast (#51 1d4812a) |
---|---|---|---|
3 | 7.152s. | 13.203s. | 12.094s. |
7 | 16.504s. | 24.319s. | 18.666s. |
15 | 6.880s. | 16.481s. | 15.448s. |
Note that for the last column of the table, all the TPC-DS tables were indexed in qbeast format using the primary key of each table, with a cubeSize of 2,000,000 (~100 MiB per cube). As before, each query was executed 10 times.
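For illustration, an indexing setup like the one described could look roughly as follows in PySpark. This is a sketch under assumptions, not the actual benchmark code: it assumes the `columnsToIndex` and `cubeSize` writer options of qbeast-spark, uses the TPC-DS `store_sales` primary key (`ss_item_sk`, `ss_ticket_number`) as an example, and the helper names and the `overwrite` mode are hypothetical choices.

```python
def qbeast_write_options(index_columns, cube_size=2_000_000):
    """Build writer options for indexing a table in qbeast format."""
    return {
        "columnsToIndex": ",".join(index_columns),  # comma-separated column names
        "cubeSize": str(cube_size),                 # desired number of rows per cube
    }


def write_qbeast(df, path, index_columns, cube_size=2_000_000):
    """Write a Spark DataFrame in qbeast format.

    Assumes `df` is a pyspark.sql.DataFrame and that the qbeast-spark
    package is available to the Spark session.
    """
    writer = df.write.mode("overwrite").format("qbeast")
    for key, value in qbeast_write_options(index_columns, cube_size).items():
        writer = writer.option(key, value)
    writer.save(path)


# Example: options for the TPC-DS store_sales table, indexed by its primary key.
options = qbeast_write_options(["ss_item_sk", "ss_ticket_number"])
print(options)
```

Keeping the option building separate from the write makes it easy to apply the same cubeSize to every TPC-DS table while changing only the indexed columns.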
Detailed values (AVG, MAX and MIN)
Data written in qbeast format, read in qbeast format (#51 1d4812a)
Query | AVG | MAX | MIN |
---|---|---|---|
q3 | 12.094s. | 18.893s. | 10.879s. |
q7 | 18.666s. | 24.063s. | 17.670s. |
q15 | 15.448s. | 19.026s. | 14.584s. |
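The AVG, MAX and MIN columns summarize the repeated runs of each query. A minimal sketch of that aggregation is shown below; the per-run timings are hypothetical placeholder values (the individual run times are not part of this thread), and `summarize` is an illustrative helper name.

```python
from statistics import mean


def summarize(run_times_s):
    """Return average, maximum and minimum of a list of run times in seconds."""
    return {
        "avg": mean(run_times_s),
        "max": max(run_times_s),
        "min": min(run_times_s),
    }


# Hypothetical timings for 10 runs of one query (NOT measured values).
sample_runs_s = [12.5, 11.0, 10.9, 12.1, 18.9, 11.4, 11.8, 12.0, 11.2, 11.6]

summary = summarize(sample_runs_s)
print(f"AVG {summary['avg']:.3f}s  MAX {summary['max']:.3f}s  MIN {summary['min']:.3f}s")
```

A single slow run (often the first one, before caches are warm) can raise both MAX and AVG noticeably, which is why reporting MIN alongside them is useful.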
from qbeast-spark.
I don't think this is relevant, at least as an issue. We should move it to a discussion, probably. Do you agree? @eavilaes @cugni
from qbeast-spark.
> I don't think this is relevant, at least as an issue. We should move it to a discussion, probably. Do you agree? @eavilaes @cugni
Yep! I think it's more of a discussion than a real issue. We can move it.
from qbeast-spark.