Comments (2)
It is a good observation that `ScalarUDFImpl` and `AggregateUDFImpl` don't share a common base trait, and thus adding functionality that affects both requires duplicating code.
> I would need to introduce a more general trait for both function

> Another alternative is having a duplicate function for scalar and aggregate for a related function
I agree with your analysis of the tradeoffs: a common base trait would result in less duplication in DataFusion.
However, I personally prefer duplicating `coerce_arguments_for_signature` in each trait rather than introducing a common base trait because:
- It is backwards compatible (not an API change for the existing library of functions)
- Makes it slightly easier to implement ScalarUDF and AggregateUDF (especially when new to Rust): rather than two `impl`s for your function, you only need one
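To make the preference above concrete, here is a minimal sketch of the duplication approach. `DataType` and `Result` are hypothetical stand-ins so the snippet compiles without DataFusion, the two traits are reduced to the methods under discussion, and `AddOne` and `MySum` are made-up functions; this is not the actual DataFusion API.

```rust
// Hypothetical stand-ins for DataFusion's `DataType` and `Result`; not the real API.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

pub type Result<T> = std::result::Result<T, String>;

// Simplified scalar trait carrying its own default `coerce_types`.
pub trait ScalarUDFImpl {
    fn name(&self) -> &str;

    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Err(format!("Function {} does not implement coerce_types", self.name()))
    }
}

// Simplified aggregate trait; the default is duplicated on purpose.
pub trait AggregateUDFImpl {
    fn name(&self) -> &str;

    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Err(format!("Function {} does not implement coerce_types", self.name()))
    }
}

// A made-up scalar function: exactly one `impl` block is needed.
pub struct AddOne;

impl ScalarUDFImpl for AddOne {
    fn name(&self) -> &str {
        "add_one"
    }

    // Optional override: widen every argument to Int64.
    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Ok(vec![DataType::Int64; arg_types.len()])
    }
}

// A made-up aggregate function that keeps the default behaviour.
pub struct MySum;

impl AggregateUDFImpl for MySum {
    fn name(&self) -> &str {
        "my_sum"
    }
}
```

Existing implementations keep compiling because the new method has a default body, which is the backwards-compatibility point in the first bullet.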
from arrow-datafusion.
I found that to have `coerce_types` for both `ScalarUDF` and `AggregateUDF`, I would need to introduce a more general trait for both kinds of function:
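    use arrow::datatypes::DataType;
    use datafusion_common::{not_impl_err, Result};

    // General trait shared by scalar and aggregate functions
    pub trait UDFImpl {
        fn name(&self) -> &str;

        // Default: report that the function does not support coercion
        fn coerce_types(&self, _data_types: &[DataType]) -> Result<Vec<DataType>> {
            not_impl_err!("Function {} does not implement coerce_types", self.name())
        }
    }

    // Both existing traits would then extend the general trait
    pub trait ScalarUDFImpl: UDFImpl { /* existing methods */ }
    pub trait AggregateUDFImpl: UDFImpl { /* existing methods */ }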
Then we can have:

    fn coerce_arguments_for_signature(
        expressions: Vec<Expr>,
        schema: &DFSchema,
        signature: &Signature,
        func: Arc<dyn UDFImpl>,
    ) -> Result<Vec<Expr>> {}
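The base-trait option can be sketched in a self-contained form as follows. `DataType` and `Result` are hypothetical stand-ins rather than the DataFusion types, `invoke` is a made-up scalar-only method, and `coerce_arguments` is a simplified helper that works on types instead of `Expr`s. The sketch mainly illustrates the cost mentioned in this thread: every function needs two `impl` blocks.

```rust
// Hypothetical stand-ins for DataFusion's `DataType` and `Result`; not the real API.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

pub type Result<T> = std::result::Result<T, String>;

// Shared base trait, named `UDFImpl` as in the proposal above.
pub trait UDFImpl {
    fn name(&self) -> &str;

    fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Err(format!("Function {} does not implement coerce_types", self.name()))
    }
}

// The scalar trait extends the base trait (supertrait bound).
pub trait ScalarUDFImpl: UDFImpl {
    // Made-up scalar-only method, for illustration.
    fn invoke(&self, arg: i64) -> i64;
}

// One shared helper serves any implementor of the base trait.
// Simplified: it coerces types rather than rewriting expressions.
pub fn coerce_arguments(arg_types: &[DataType], func: &dyn UDFImpl) -> Result<Vec<DataType>> {
    func.coerce_types(arg_types)
}

pub struct AddOne;

// First impl block: the shared part.
impl UDFImpl for AddOne {
    fn name(&self) -> &str {
        "add_one"
    }

    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Ok(vec![DataType::Int64; arg_types.len()])
    }
}

// Second impl block: the scalar-specific part.
impl ScalarUDFImpl for AddOne {
    fn invoke(&self, arg: i64) -> i64 {
        arg + 1
    }
}
```

Calling `coerce_arguments(&[DataType::Int32], &AddOne)` dispatches through `&dyn UDFImpl`, so the helper never needs to know whether the function is scalar or aggregate.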
Another alternative is to keep a separate copy of the function for scalar and for aggregate functions:

    // Scalar version
    fn coerce_arguments_for_signature(
        expressions: Vec<Expr>,
        schema: &DFSchema,
        signature: &Signature,
        func: &ScalarUDF,
    ) -> Result<Vec<Expr>> {}

    // Aggregate version. Rust has no function overloading, so the two copies
    // would need distinct names or have to live in different modules.
    fn coerce_arguments_for_signature(
        expressions: Vec<Expr>,
        schema: &DFSchema,
        signature: &Signature,
        func: &AggregateUDF,
    ) -> Result<Vec<Expr>> {}
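One possible way to limit the maintenance cost of the second option is to keep two thin entry points and move the common logic into a private helper. The sketch below is only an illustration under several assumptions: `DataType`, `Result`, `ScalarUDF` and `AggregateUDF` are hypothetical stand-ins, the names `coerce_with`, `coerce_arguments_for_scalar` and `coerce_arguments_for_aggregate` are invented here, and the helper works on types instead of `Expr`s.

```rust
// Hypothetical stand-ins; none of these are the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

pub type Result<T> = std::result::Result<T, String>;

pub struct ScalarUDF {
    pub name: String,
}

pub struct AggregateUDF {
    pub name: String,
}

impl ScalarUDF {
    // Illustrative behaviour: widen every argument to Int64.
    pub fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Ok(vec![DataType::Int64; arg_types.len()])
    }
}

impl AggregateUDF {
    // Illustrative behaviour: coercion is not supported.
    pub fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
        Err(format!("Function {} does not implement coerce_types", self.name))
    }
}

// Shared logic lives here once; the closure supplies the function-specific step.
fn coerce_with(
    arg_types: &[DataType],
    coerce: impl Fn(&[DataType]) -> Result<Vec<DataType>>,
) -> Result<Vec<DataType>> {
    let coerced = coerce(arg_types)?;
    // Common validation: one coerced type per argument.
    if coerced.len() != arg_types.len() {
        return Err(format!(
            "expected {} coerced types, got {}",
            arg_types.len(),
            coerced.len()
        ));
    }
    Ok(coerced)
}

// Thin scalar entry point.
pub fn coerce_arguments_for_scalar(
    arg_types: &[DataType],
    func: &ScalarUDF,
) -> Result<Vec<DataType>> {
    coerce_with(arg_types, |t| func.coerce_types(t))
}

// Thin aggregate entry point.
pub fn coerce_arguments_for_aggregate(
    arg_types: &[DataType],
    func: &AggregateUDF,
) -> Result<Vec<DataType>> {
    coerce_with(arg_types, |t| func.coerce_types(t))
}
```

With this layout the duplicated part is reduced to the two one-line wrappers, while user-facing traits stay unchanged.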
I think the first option is potentially beneficial in the long run (?), but the user would then need to implement two traits. The second option only increases the maintenance cost.
What do you think about this, @alamb?
I am also tracking whether there is a Rust solution for this at https://users.rust-lang.org/t/inheritance-like-with-rust-trait/111102