Comments (7)
Hmm not sure how you got the NotImplemented error - maybe somehow running a quite old DataFusion? However with the query
SELECT p1.PS_PARTKEY supp_key, p2.PS_PARTKEY cust_key FROM 'partsupp' p1, 'partsupp' p2
I do get the same error as you originally:
#[tokio::test]
async fn roundtrip_implicit_cross_join() -> Result<()> {
    roundtrip("SELECT p1.a p1_a, p2.a p2_a FROM data p1, data p2").await
}
Error: Plan("Projections require unique expression names but the expression \"data.a\" at position 0 and \"data.a\" at position 1 have the same name. Consider aliasing (\"AS\") one of them.")
This is because Substrait doesn't include aliases for either tables or columns. I'm trying to see whether I can add that to Substrait, which would make these cases easier to support: substrait-io/substrait#648
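To illustrate why the projection is rejected once the aliases are gone: without `p1`/`p2`, both columns end up with the qualified name `data.a`, and a uniqueness check over the expression names trips on them. The following is only a simplified sketch of that kind of check. `find_duplicate_name` is a hypothetical helper, not DataFusion's actual implementation.

```rust
use std::collections::HashMap;

/// Returns the positions of the first two expressions that share a name, if any.
/// Hypothetical, simplified stand-in for a "projection names must be unique" check.
fn find_duplicate_name(names: &[&str]) -> Option<(usize, usize)> {
    // Maps each name to the position where it was first seen.
    let mut seen: HashMap<&str, usize> = HashMap::new();
    for (pos, &name) in names.iter().enumerate() {
        if let Some(&first) = seen.get(name) {
            return Some((first, pos));
        }
        seen.insert(name, pos);
    }
    None
}
```

With the aliases preserved the names would be `p1.a` and `p2.a`, which pass the check. With the aliases dropped both become `data.a`, matching the "position 0 and position 1" wording of the error.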
from arrow-datafusion.
Hmm not sure how you got the NotImplemented error - maybe somehow running a quite old DataFusion? However with the query
Ahh, yeah... I was on an older version.
same column names with different aliases
Isn't the repro trying to alias different column names (PS_PARTKEY, PS_SUPPKEY) to same alias (K1)? Why would you want to do that? 😅
same column names with different aliases
Isn't the repro trying to alias different column names (PS_PARTKEY, PS_SUPPKEY) to same alias (K1)? Why would you want to do that? 😅
Ahh... that was my mistake. One of those should be k2. I was trying to get a simpler repro of a much larger query with multiple joins. However, now that I have a proper query, I am running into a different issue.
This is the query I have now:
SELECT p1.PS_PARTKEY supp_key, p2.PS_PARTKEY cust_key
FROM
'partsupp' p1,
'partsupp' p2
And this is the substrait error from that:
DataFusion error: NotImplemented("Unsupported operator: CrossJoin:\\n SubqueryAlias: p1\\n TableScan: partsupp projection=[ps_partkey]\\n SubqueryAlias: p2\\n TableScan: partsupp projection=[ps_partkey]")')
So the original issue I was hitting came from DataFusion trying to run a Substrait plan generated by DuckDB, and the error from that is the one I put in the description.
I added to the substrait support epic: #5173
Given that names don't matter in Substrait (the final names are provided) is the problem solvable within the Substrait consumer for Datafusion? Shouldn't the consumer be able to rename the columns to whatever it wants?
Stepping further back I wonder if the check is needed at all here -- is it trying to prevent extra work or is it trying to prevent confusion on its part later on? It may be designed for the case where the fields are named the same but are from different sources which isn't happening here. Perhaps the check needs to be made more precise?
Given that names don't matter in Substrait (the final names are provided) is the problem solvable within the Substrait consumer for Datafusion?
As discussed on the Substrait ticket, yes it can be solved, but not in a nice way.
Shouldn't the consumer be able to rename the columns to whatever it wants?
It can, however given the user has named the columns/tables in one way in the original plan, it can be quite confusing to the user if the columns/tables are named much differently in the actually executed plan.
Stepping further back I wonder if the check is needed at all here -- is it trying to prevent extra work or is it trying to prevent confusion on its part later on? It may be designed for the case where the fields are named the same but are from different sources which isn't happening here. Perhaps the check needs to be made more precise?
This plan results in a cross join, so the fields do refer to different sources: the same table, but on different sides of the join, which makes them different columns.
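For reference, the "consumer renames the columns" workaround mentioned above could look roughly like the sketch below. It is an assumption for illustration only: `dedupe_names` and the `__N` suffix scheme are hypothetical and are not taken from DataFusion or Substrait.

```rust
use std::collections::HashMap;

/// Makes names unique by appending `__N` to the second and later occurrences.
/// Hypothetical helper; a real consumer would also need to handle the case
/// where a generated name collides with a name that already exists in the input.
fn dedupe_names(names: &[&str]) -> Vec<String> {
    // Counts how many times each name has been emitted so far.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    names
        .iter()
        .map(|&name| {
            let n = counts.entry(name).or_insert(0);
            let out = if *n == 0 {
                name.to_string()
            } else {
                format!("{}__{}", name, *n)
            };
            *n += 1;
            out
        })
        .collect()
}
```

Applied to `["data.a", "data.a"]` this yields `data.a` and `data.a__1`. The plan becomes valid, but the generated names no longer resemble the `p1`/`p2` aliases the user wrote, which is the confusion described above.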