Comments (7)
That makes sense. Looks like @viirya has submitted the PR to fix this in sqlparser-rs, so will need that to land, then wait for next sqlparser-rs release to fix this bug
from arrow-datafusion.
I guess this is an unfortunate side effect of having JSON_TABLE be a keyword for a MySQL function.
Indeed.
You can remedy this by escaping it in the SQL via backticks to indicate it shouldn't be a keyword
But since Datafusion is agnostic of any specific database, I think this issue should be circumvent within Datafusion itself.
I actually wasted a huge amount of time, trying to figure out why my queries were failing with ParseError, and it was only by chance I spotted it was related to the name json_table
. So indeed escaping with backticks is a workaround, it would be nicer if this can be handled within Datafusion itself - if possible.
from arrow-datafusion.
I guess this is an unfortunate side effect of having JSON_TABLE
be a keyword for a MySQL function.
You can remedy this by escaping it in the SQL via backticks to indicate it shouldn't be a keyword.
From your reproduction:
pub fn set_up_data_test() -> Result<SessionContext> {
// define a schema.
let schema = Arc::new(Schema::new(vec![
Field::new("index", DataType::UInt8, false),
Field::new("json_data", DataType::Utf8, true),
]));
// define data.
let batch = RecordBatch::try_new(
schema,
vec![
Arc::new(UInt8Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8])),
Arc::new(StringArray::from(vec![
Some(r#" { "this" : "is", "a": [ "test" ] } "#),
Some(r#"{"a":[2,3.5,true,false,null,"x"]}"#),
Some(r#"[ "one", "two" ]"#),
Some(r#"123"#),
Some(r#"12.3"#),
Some(r#"true"#),
Some(r#"false"#),
None,
])),
],
)?;
// declare a new context
let ctx = SessionContext::new();
// declare a table in memory.
ctx.register_batch("json_table", batch)?;
// define a schema.
let schema = Arc::new(Schema::new(vec![
Field::new("index", DataType::UInt8, false),
Field::new("json_data", DataType::Utf8, true),
]));
let batch = RecordBatch::try_new(
schema,
vec![
Arc::new(UInt8Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8])),
Arc::new(StringArray::from(vec![
Some(r#" { "this" : "is", "a": [ "test" ] } "#),
Some(r#"{"a":[2,3.5,true,false,null,"x"]}"#),
Some(r#"[ "one", "two" ]"#),
Some(r#"123"#),
Some(r#"12.3"#),
Some(r#"true"#),
Some(r#"false"#),
None,
])),
],
)?;
// declare a new context
let ctx = SessionContext::new(); // <-------------- HERE: ctx redefined loses above registration of json_table
// declare a table in memory.
ctx.register_batch("json_value_table", batch)?;
Ok(ctx)
}
#[tokio::main]
async fn main() -> Result<()> {
let ctx = set_up_data_test()?;
// Fails with Error: SQL(ParserError("Expected (, found: EOF"), None)
let df = ctx
.sql(
r#"SELECT * FROM `json_table`"#, // <------- HERE: escape with backticks
)
.await?;
df.clone().show().await?;
Ok(())
}
- Technically double quotes could be used as well:
r#"SELECT * FROM "json_table""#
I also noticed something strange in your reproduction code, where you redefine the ctx
in set_up_data_test()
meaning you lose the registration of the json_table
table anyway.
- Taking as of dadepo/df-repro@05a0a2e
- Unless this was an intention redefinition of
ctx
for testing purposes
We could probably update the docs, like here for example https://arrow.apache.org/datafusion/user-guide/sql/select.html
To indicate quoting isn't only for respecting capitalization, but also for using keywords as table/column names
from arrow-datafusion.
I think the JSON_TABLE
function is part of the SQL standard. So SELECT * FROM json_table
is not valid SQL, and datafusion is correct in refusing it.
I think this issue should be closed. It's the old behavior of accepting the query that was incorrect.
from arrow-datafusion.
Maybe we can improve the error message in the sql parser so it is clearer to people that they have unknowingly used a keyword and should use a different table name (or double quotes) 🤔
from arrow-datafusion.
We can improve the error message, I think. I don't think the table function is well known among users as it is not supported by most engines. The current refusing of the query looks very confusing to end users.
from arrow-datafusion.
It should be possible to support both json_table as an unquoted table name and as a table valued function call. I think it would be the most helpful behavior.
from arrow-datafusion.
Related Issues (20)
- For DML plans, `LogicalPlan::schema` returns the input schema instead of output schema
- DataFusion weekly project plan (Andrew Lamb) - May 6, 2024 HOT 3
- Reduce repetition in datafusion::functions using macros HOT 1
- Support custom SchemaAdapter on ParquetExec HOT 2
- Use `min_value` and `max_value` on statistics to avoid `ExecutionPlan.execute` HOT 3
- Make ASF public press release HOT 1
- Substrait integration doesn't recognize typed functions HOT 2
- Incorrect results with expression resolution HOT 4
- DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` HOT 6
- Index out of bounds in `file_stream.rs`. HOT 3
- EnforceDistribution fails, seems to turn all the types of the schema to UInt64 HOT 6
- DataFusion repo got 40MB larger
- Support "User defined coercion" rules HOT 2
- `stride` is not optional for new `array_slice` UDF HOT 17
- `array_slice` panics with `stride=1` HOT 2
- Make `CommonSubexprEliminate` faster by avoiding the use of strings HOT 8
- Add push down sort to the source (table provider) HOT 4
- "Unknown frame descriptor" for ZSTD data. HOT 6
- Type coercion when creating table HOT 1
- Unify schema usage in Datafusion HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arrow-datafusion.