Comments (7)
FWIW @jayzhan211 has a really nice API suggestion here I think https://github.com/apache/arrow-datafusion/pull/9920/files#r1549905825
Specifically, add these two functions to the `AggregateUDFImpl` trait:
/// Return true if the aggregate supports `IGNORE NULLS`
fn support_ignore_nulls() -> bool { false }
/// Return true if the aggregate supports an `ORDER BY` to specify its ordering:
/// `SELECT first_value(x ORDER BY y) ... `
fn support_ordering() -> bool { false }
I think these APIs are nice and clear. We can add these support functions.
from arrow-datafusion.
@huaxingao this might be interesting
@comphead Thanks for pinging me! I will work on this issue.
I think for `AVG` or `COUNT` type aggregates we can safely ignore the `ORDER BY` expression, since the result is the same for any permutation of the input. However, for `first_value`, `nth_value`, and `last_value` this is not the case. I do agree that `IGNORE NULLS` should raise an error when the aggregate does not support it.
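To make the permutation argument concrete, here is a minimal sketch in plain Rust. The functions `count`, `avg`, and `first_value` are hypothetical stand-ins written for this illustration; they are not DataFusion's implementations.

```rust
/// Count of non-null inputs, mirroring SQL `COUNT(x)`.
fn count(values: &[Option<i64>]) -> usize {
    values.iter().flatten().count()
}

/// Average of non-null inputs, mirroring SQL `AVG(x)`; `None` if there are none.
fn avg(values: &[Option<i64>]) -> Option<f64> {
    let non_null: Vec<i64> = values.iter().flatten().copied().collect();
    if non_null.is_empty() {
        None
    } else {
        Some(non_null.iter().sum::<i64>() as f64 / non_null.len() as f64)
    }
}

/// Value of the first row (nulls respected), so the result depends on input order.
fn first_value(values: &[Option<i64>]) -> Option<i64> {
    values.first().copied().flatten()
}
```

Feeding the same rows in two different orders gives identical `count` and `avg` results, while `first_value` changes with the order, which is why only the latter kind of aggregate needs its `ORDER BY` honored.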
I think for AVG, or COUNT type of aggregates we can safely ignore ORDER BY expression. Since operation result is same for any permutation of the input.
That is true, though applying an ORDER BY to the argument of COUNT may be much slower 🤔 (as DataFusion would have to sort the input)
However, we could remove `ORDER BY` clauses that don't change the output as a follow-on optimization.
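A rough sketch of what such a follow-on rewrite could look like, under stated assumptions: `AggregateCall` is a hypothetical simplified expression type (not DataFusion's `Expr`), and the allowlist contains only the two aggregates named in this thread. Anything not on the list keeps its ordering, so an unknown aggregate is never rewritten.

```rust
/// Hypothetical, simplified representation of an aggregate call such as
/// `COUNT(x ORDER BY y)`. Not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct AggregateCall {
    func: String,
    arg: String,
    order_by: Vec<String>,
}

/// Aggregates whose result is the same for any permutation of the input.
/// Assumption: only the ones mentioned in this discussion are listed.
fn is_order_insensitive(func: &str) -> bool {
    matches!(func.to_ascii_lowercase().as_str(), "count" | "avg")
}

/// Drop the `ORDER BY` when it cannot affect the result, so no sort is needed.
fn strip_redundant_order_by(mut call: AggregateCall) -> AggregateCall {
    if is_order_insensitive(&call.func) {
        call.order_by.clear();
    }
    call
}
```

With this rule, `COUNT(x ORDER BY y)` loses its ordering (and the sort it would force), while `first_value(x ORDER BY y)` is left untouched.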
FWIW @jayzhan211 has a really nice API suggestion here I think https://github.com/apache/arrow-datafusion/pull/9920/files#r1549905825
Specifically, add these two functions to the AggregateUDFImpl
/// Return true if the aggregate supports `IGNORE NULLS`
fn support_ignore_nulls() -> bool { false }
/// Return true if the aggregate supports an `ORDER BY` to specify its ordering:
/// `SELECT first_value(x ORDER BY y) ... `
fn support_ordering() -> bool { false }
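As an illustration of how these two hooks might be consumed, here is a self-contained sketch. The trait `AggregateCapabilities`, the structs `FirstValue` and `Count`, and the helper `check_ignore_nulls` are all hypothetical names invented for this example; the `&self` receiver and `name` method are also assumptions, since the suggestion above shows only the bare signatures.

```rust
/// Hypothetical stand-in for the relevant part of `AggregateUDFImpl`.
trait AggregateCapabilities {
    fn name(&self) -> &str;

    /// Return true if the aggregate supports `IGNORE NULLS`.
    fn support_ignore_nulls(&self) -> bool {
        false
    }

    /// Return true if the aggregate supports an `ORDER BY` on its argument.
    fn support_ordering(&self) -> bool {
        false
    }
}

struct FirstValue;
struct Count;

impl AggregateCapabilities for FirstValue {
    fn name(&self) -> &str {
        "first_value"
    }
    fn support_ignore_nulls(&self) -> bool {
        true
    }
    fn support_ordering(&self) -> bool {
        true
    }
}

// `Count` keeps both defaults, so it opts in to neither feature.
impl AggregateCapabilities for Count {
    fn name(&self) -> &str {
        "count"
    }
}

/// Reject `IGNORE NULLS` on aggregates that have not opted in.
fn check_ignore_nulls(agg: &dyn AggregateCapabilities, ignore_nulls: bool) -> Result<(), String> {
    if ignore_nulls && !agg.support_ignore_nulls() {
        return Err(format!("IGNORE NULLS is not supported for {}", agg.name()));
    }
    Ok(())
}
```

Defaulting both methods to `false` means existing aggregates need no changes, and only those that opt in accept the extra syntax.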
I wrote some tests for this feature in another PR that we decided not to use. I put them in #9953 in case that is helpful, @huaxingao.