
Comments (7)

mustafasrepo commented on July 17, 2024

FWIW @jayzhan211 has a really nice API suggestion here I think https://github.com/apache/arrow-datafusion/pull/9920/files#r1549905825

Specifically, add these two functions to the AggregateUDFImpl

/// Return true if the aggregate supports `IGNORE NULL`s
fn support_ignore_nulls() -> bool { false }

/// Return true if the aggregate supports an `ORDER BY` to specify its ordering:
/// `SELECT first_value(x ORDER BY y) ... `
fn support_ordering() -> bool { false }

I think these APIs are nice and clear. We can add support for them.

from arrow-datafusion.

comphead commented on July 17, 2024

@huaxingao this might be interesting


huaxingao commented on July 17, 2024

@comphead Thanks for pinging me! I will work on this issue.


mustafasrepo commented on July 17, 2024

I think for AVG- or COUNT-type aggregates we can safely ignore the ORDER BY expression, since the result is the same for any permutation of the input. For first_value, nth_value, and last_value, however, this is not the case. I agree, though, that IGNORE NULLS should raise an error.
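To make the permutation argument concrete, here is a minimal sketch in plain Rust. The functions `count`, `avg`, and `first_value` are hypothetical stand-ins written for illustration, not DataFusion's implementations. They show that COUNT and AVG give the same answer for any ordering of the input, while FIRST_VALUE (with or without IGNORE NULLS) does not.

```rust
/// COUNT of non-null values: the result does not depend on input order.
fn count(values: &[Option<i64>]) -> usize {
    values.iter().filter(|v| v.is_some()).count()
}

/// AVG of non-null values: the result does not depend on input order.
fn avg(values: &[Option<i64>]) -> Option<f64> {
    let non_null: Vec<i64> = values.iter().flatten().copied().collect();
    if non_null.is_empty() {
        None
    } else {
        Some(non_null.iter().sum::<i64>() as f64 / non_null.len() as f64)
    }
}

/// FIRST_VALUE: the result depends on input order, and on whether
/// NULLs are skipped (the IGNORE NULLS behavior).
fn first_value(values: &[Option<i64>], ignore_nulls: bool) -> Option<i64> {
    if ignore_nulls {
        // Skip NULLs and take the first remaining value.
        values.iter().flatten().next().copied()
    } else {
        // Take the first row as-is, even if it is NULL.
        values.first().copied().flatten()
    }
}
```

Calling these on two permutations of the same rows, such as `[Some(1), None, Some(3)]` and `[None, Some(3), Some(1)]`, yields identical `count` and `avg` results but different `first_value` results.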


alamb commented on July 17, 2024

I think for AVG- or COUNT-type aggregates we can safely ignore the ORDER BY expression, since the result is the same for any permutation of the input.

That is true, though applying an ORDER BY to the argument of COUNT may be much slower 🤔 (as DataFusion would have to sort the input)

However, we could remove ORDER BY clauses that don't change the output as a follow-on optimization.
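A rough sketch of what such a follow-on rewrite might look like, under stated assumptions: `AggregateCall`, `is_order_insensitive`, and `drop_redundant_order_by` are hypothetical names invented for this example, and the set of order-insensitive functions is limited to the two named in this thread. DataFusion's real plan types and optimizer rules are different.

```rust
/// Hypothetical, simplified aggregate call; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct AggregateCall {
    func: String,
    /// ORDER BY expressions attached to the aggregate argument.
    order_by: Vec<String>,
}

/// Assumption: only the aggregates mentioned in this thread are
/// treated as order-insensitive.
fn is_order_insensitive(func: &str) -> bool {
    matches!(func.to_ascii_lowercase().as_str(), "count" | "avg")
}

/// Drop the ORDER BY when it cannot change the aggregate's result,
/// so the input does not have to be sorted.
fn drop_redundant_order_by(mut call: AggregateCall) -> AggregateCall {
    if is_order_insensitive(&call.func) {
        call.order_by.clear();
    }
    call
}
```

With this sketch, `COUNT(x ORDER BY y)` would lose its ordering and avoid the sort, while `first_value(x ORDER BY y)` would be left unchanged.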


alamb commented on July 17, 2024

FWIW @jayzhan211 has a really nice API suggestion here I think https://github.com/apache/arrow-datafusion/pull/9920/files#r1549905825

Specifically, add these two functions to the AggregateUDFImpl

/// Return true if the aggregate supports `IGNORE NULL`s
fn support_ignore_nulls() -> bool { false }

/// Return true if the aggregate supports an `ORDER BY` to specify its ordering:
/// `SELECT first_value(x ORDER BY y) ... `
fn support_ordering() -> bool { false }
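As a sketch of how these two flags could be used, the example below defines a stand-in trait and a check that a planner might run. Everything here is an assumption for illustration: `AggregateImpl`, `Count`, `FirstValue`, and `plan_call` are hypothetical names, the methods take `&self` (unlike the signatures above) so they can be called through a trait object, and this is not the actual `AggregateUDFImpl` trait. Following the discussion in this thread, an unsupported `IGNORE NULLS` is rejected, while an unsupported `ORDER BY` is simply dropped.

```rust
/// Hypothetical stand-in for the trait discussed above.
trait AggregateImpl {
    fn name(&self) -> &str;

    /// Return true if the aggregate supports `IGNORE NULLS`.
    fn support_ignore_nulls(&self) -> bool {
        false
    }

    /// Return true if the aggregate supports an `ORDER BY` on its argument.
    fn support_ordering(&self) -> bool {
        false
    }
}

struct Count;
struct FirstValue;

impl AggregateImpl for Count {
    fn name(&self) -> &str {
        "count"
    }
    // Keeps both defaults: no IGNORE NULLS, no ordering.
}

impl AggregateImpl for FirstValue {
    fn name(&self) -> &str {
        "first_value"
    }
    fn support_ignore_nulls(&self) -> bool {
        true
    }
    fn support_ordering(&self) -> bool {
        true
    }
}

/// Returns Ok(true) if the ORDER BY should be kept, Ok(false) if it can
/// be dropped, and Err if IGNORE NULLS was requested but is unsupported.
fn plan_call(
    agg: &dyn AggregateImpl,
    ignore_nulls: bool,
    has_order_by: bool,
) -> Result<bool, String> {
    if ignore_nulls && !agg.support_ignore_nulls() {
        return Err(format!("{} does not support IGNORE NULLS", agg.name()));
    }
    Ok(has_order_by && agg.support_ordering())
}
```

Defaulting both methods to `false` means existing aggregates opt in explicitly, which matches the default bodies in the suggestion above.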


alamb commented on July 17, 2024

I wrote some tests for this feature in another PR that we decided not to use. I put them in #9953 in case that is helpful, @huaxingao.

