Comments (2)
Here are some issues found while working on the tolerance feature.
- Right now the tolerance is defined for the mean (avg) function only. A similar concept for other aggregate functions such as min and max may need a different name and a different range of admissible values (for tolerance it is [0, 1]).
- The tolerance is defined as `sampleDeviation * zScore / mean / sqrt(sampleSize)`. It is not clear whether the tolerance is still meaningful when the mean is 0 or close to 0, since the expression divides by the mean.
- The current implementation just extracts the column used in the avg function and calculates the mean from samples of the whole table. Suppose the user specifies `val df = spark.read.format("qbeast").load("...").where("value > 100").agg(avg("value")).tolerance(0.01)`. The sampling should apply the specified where condition; otherwise the returned average can be wrong.
- zScore is hardcoded. Should it be a parameter specified by the user?
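To make the points above concrete, here is a minimal plain-Scala sketch of the tolerance expression. It is not the qbeast-spark implementation: the object `ToleranceSketch`, the function `tolerance`, the default `zScore = 1.96` (the usual two-sided 95% value), the use of the absolute value of the mean, and the choice to return `None` when the expression is undefined are all assumptions made for illustration.

```scala
object ToleranceSketch {

  /** Relative error estimate for a sample mean, following the expression
    * sampleDeviation * zScore / mean / sqrt(sampleSize).
    *
    * Assumptions (not taken from qbeast-spark): zScore is a caller-supplied
    * parameter defaulting to 1.96, the absolute value of the mean is used,
    * and None is returned when the expression is undefined.
    */
  def tolerance(sample: Seq[Double], zScore: Double = 1.96): Option[Double] = {
    val n = sample.size
    if (n < 2) None // the sample deviation needs at least two values
    else {
      val mean = sample.sum / n
      // Unbiased sample variance (divides by n - 1).
      val variance = sample.map(x => (x - mean) * (x - mean)).sum / (n - 1)
      if (mean == 0.0) None // division by zero: relative error is undefined
      else Some(math.sqrt(variance) * zScore / math.abs(mean) / math.sqrt(n.toDouble))
    }
  }

  def main(args: Array[String]): Unit = {
    // Same spread, very different means: the estimate grows as the mean nears 0.
    println(tolerance(Seq(98.0, 100.0, 102.0, 100.0))) // roughly 0.016
    println(tolerance(Seq(-1.99, 0.01, 2.01, 0.01)))   // roughly 160

    // The filter has to be applied before estimating, otherwise the
    // statistics describe the whole table rather than the queried rows.
    val values = Seq(50.0, 150.0, 250.0, 350.0)
    println(tolerance(values))                 // whole table
    println(tolerance(values.filter(_ > 100))) // rows matching value > 100
  }
}
```

The two samples in `main` have the same deviation, yet the second estimate is far outside [0, 1] because its mean is 0.01. This suggests a relative tolerance may need a fallback, such as an absolute error bound, when the mean is near 0.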
from qbeast-spark.
We should also define more precisely which kinds of queries we want to support, so that users can clearly tell whether a given query is supported. Can we define this in terms of the SQL syntax tree or something similar, even if somewhat informally?
from qbeast-spark.