Comments (6)
I understand that the issue runs a bit deeper as we employ sqlparser for converting SQL queries into statements.
"SELECT c FROM t WHERE c >= '2019-03-27T22:00:00.000Z'::timestamp at time zone 'Europe/Brussels'"
Processing stmts from tokenizer:
[Statement(Query(Query { with: None, body: Select(Select { distinct: None, top: None, projection: [UnnamedExpr(Identifier(Ident { value: "c", quote_style: None }))], into: None, from: [TableWithJoins { relation: Table { name: ObjectName([Ident { value: "t", quote_style: None }]), alias: None, args: None, with_hints: [], version: None, partitions: [] }, joins: [] }], lateral_views: [], selection: Some(AtTimeZone { timestamp: BinaryOp { left: Identifier(Ident { value: "c", quote_style: None }), op: GtEq, right: Cast { expr: Value(SingleQuotedString("2019-03-27T22:00:00.000Z")), data_type: Timestamp(None, None), format: None } }, time_zone: "Europe/Brussels" }), group_by: Expressions([]), cluster_by: [], distribute_by: [], sort_by: [], having: None, named_window: [], qualify: None, value_table_mode: None }), order_by: [], limit: None, limit_by: [], offset: None, fetch: None, locks: [], for_clause: None }))]
datafusion/datafusion/sql/src/parser.rs
Line 315 in dac2a7e
yes, I think we should fix sqlparser to get ('2019-03-27T22:00:00.000Z'::timestamp at time zone 'Europe/Brussels')
as the right hand side of binary operation
from arrow-datafusion.
Thank you for the report -- this definitely looks like a bug to me
from arrow-datafusion.
An example to reproduce.
statement ok
create table t1 as values (
date_bin(interval '1 hour', '2022-08-03 14:38:50Z' at time zone 'Europe/Brussels'),
date_bin(interval '1 hour', '2022-08-03 14:38:50Z' at time zone 'Europe/Brussels'));
query error
select * from t1 where column1 >= '2019-03-27T22:00:00.000Z'::timestamp at time zone 'Europe/Brussels';
----
DataFusion error: Internal error: binary_op should be handled by sql_expr_to_logical_expr..
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
sql we got in sql_expr_to_logical_expr
sql: AtTimeZone { timestamp: BinaryOp { left: Identifier(Ident { value: "column1", quote_style: None }), op: GtEq, right: Cast { expr: Value(SingleQuotedString("2019-03-27T22:00:00.000Z")), data_type: Timestamp(None, None), format: None } }, time_zone: "Europe/Brussels" }
We need to support AtTimeZone
, and maybe timezone casting 🤔 ?
Related code is in
datafusion/datafusion/sql/src/expr/mod.rs
Lines 69 to 98 in 286eb34
from arrow-datafusion.
Thanks for the deep dive @jayzhan211! Could it be that the expression is not getting correctly parsed in this instance? Executing this query for example:
select * from t1 where column1 >= ('2019-03-27T22:00:00.000Z'::timestamp at time zone 'Europe/Brussels');
seems to be working fine with no issues
from arrow-datafusion.
Yes, you are right!
This is one with ()
sql: BinaryOp { left: Identifier(Ident { value: "column1", quote_style: None }), op: GtEq, right: Nested(AtTimeZone { timestamp: Cast { expr: Value(SingleQuotedString("2019-03-27T22:00:00.000Z")), data_type: Timestamp(None, None), format: None }, time_zone: "Europe/Brussels" }) }
sql: AtTimeZone { timestamp: Cast { expr: Value(SingleQuotedString("2019-03-27T22:00:00.000Z")), data_type: Timestamp(None, None), format: None }, time_zone: "Europe/Brussels" }
This one is without ()
sql: AtTimeZone { timestamp: BinaryOp { left: Identifier(Ident { value: "column1", quote_style: None }), op: GtEq, right: Cast { expr: Value(SingleQuotedString("2019-03-27T22:00:00.000Z")), data_type: Timestamp(None, None), format: None } }, time_zone: "Europe/Brussels" }
I think it is parsed as
(column1 >= '2019-03-27T22:00:00.000Z'::timestamp) at time zone 'Europe/Brussels'
which ideally should be parsed as
column1 >= ('2019-03-27T22:00:00.000Z'::timestamp at time zone 'Europe/Brussels')
from arrow-datafusion.
I understand that the issue runs a bit deeper as we employ sqlparser for converting SQL queries into statements.
"SELECT c FROM t WHERE c >= '2019-03-27T22:00:00.000Z'::timestamp at time zone 'Europe/Brussels'"
Processing stmts from tokenizer:
[Statement(Query(Query { with: None, body: Select(Select { distinct: None, top: None, projection: [UnnamedExpr(Identifier(Ident { value: "c", quote_style: None }))], into: None, from: [TableWithJoins { relation: Table { name: ObjectName([Ident { value: "t", quote_style: None }]), alias: None, args: None, with_hints: [], version: None, partitions: [] }, joins: [] }], lateral_views: [], selection: Some(AtTimeZone { timestamp: BinaryOp { left: Identifier(Ident { value: "c", quote_style: None }), op: GtEq, right: Cast { expr: Value(SingleQuotedString("2019-03-27T22:00:00.000Z")), data_type: Timestamp(None, None), format: None } }, time_zone: "Europe/Brussels" }), group_by: Expressions([]), cluster_by: [], distribute_by: [], sort_by: [], having: None, named_window: [], qualify: None, value_table_mode: None }), order_by: [], limit: None, limit_by: [], offset: None, fetch: None, locks: [], for_clause: None }))]
datafusion/datafusion/sql/src/parser.rs
Line 315 in dac2a7e
from arrow-datafusion.
Related Issues (20)
- `strpos` (and `postion` and `instr`) throw a cast error when called with arguments of different types
- Panic in `NTH_VALUE()` window function (SQLancer) HOT 1
- Add min_by and max_by aggregate functions HOT 1
- A simple count() query caused Internal Error in PhysicalOptimizer (SQLancer) HOT 1
- Add additional regexp function `regexp_count()`
- Use GenericStringBuilder to improve performance of UDF split_part
- Document "how to read an explain plan" HOT 11
- Thread panics in SpawnedTask during shutdown. HOT 6
- SMJ full join produces a null join row for a buffered row even the buffered row is joined with streamed row and passes join filter
- Implement GroupsAccumulator for stddev and var aggregaters
- Cannot infer common string type for string concat operation Dictionary(Int32, Utf8) || Dictionary(Int32, Utf8) HOT 2
- Bug detecting datatype in VALUES tuples HOT 1
- Handle downstream impacts to union's behavioral changes. HOT 6
- Aggregation fuzz testing HOT 5
- Systematic fuzz testing for parquet predicate pushdown HOT 1
- Support protobuf serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` HOT 1
- Support substrait serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` HOT 3
- Add config flag to convert `Utf8View`/`BinaryView` --> `Utf8` / `Binary` at output HOT 4
- Add the possibility to work with plan clones directly in `inline_table_scan` HOT 3
- Parquet statistics missing when reading `Utf8` as `Utf8View`
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arrow-datafusion.