Code Monkey home page Code Monkey logo

Comments (7)

Jefffrey avatar Jefffrey commented on May 29, 2024 2

That makes sense. Looks like @viirya has submitted the PR to fix this in sqlparser-rs, so will need that to land, then wait for next sqlparser-rs release to fix this bug

from arrow-datafusion.

dadepo avatar dadepo commented on May 29, 2024 1

I guess this is an unfortunate side effect of having JSON_TABLE be a keyword for a MySQL function.

Indeed.

You can remedy this by escaping it in the SQL via backticks to indicate it shouldn't be a keyword

But since Datafusion is agnostic of any specific database, I think this issue should be circumvent within Datafusion itself.

I actually wasted a huge amount of time, trying to figure out why my queries were failing with ParseError, and it was only by chance I spotted it was related to the name json_table. So indeed escaping with backticks is a workaround, it would be nicer if this can be handled within Datafusion itself - if possible.

from arrow-datafusion.

Jefffrey avatar Jefffrey commented on May 29, 2024

I guess this is an unfortunate side effect of having JSON_TABLE be a keyword for a MySQL function.

You can remedy this by escaping it in the SQL via backticks to indicate it shouldn't be a keyword.

From your reproduction:

pub fn set_up_data_test() -> Result<SessionContext> {

    // define a schema.
    let schema = Arc::new(Schema::new(vec![
        Field::new("index", DataType::UInt8, false),
        Field::new("json_data", DataType::Utf8, true),
    ]));

    // define data.
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(UInt8Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8])),
            Arc::new(StringArray::from(vec![
                Some(r#" { "this" : "is", "a": [ "test" ] } "#),
                Some(r#"{"a":[2,3.5,true,false,null,"x"]}"#),
                Some(r#"[ "one", "two" ]"#),
                Some(r#"123"#),
                Some(r#"12.3"#),
                Some(r#"true"#),
                Some(r#"false"#),
                None,
            ])),
        ],
    )?;

    // declare a new context
    let ctx = SessionContext::new();
    // declare a table in memory.
    ctx.register_batch("json_table", batch)?;

    // define a schema.
    let schema = Arc::new(Schema::new(vec![
        Field::new("index", DataType::UInt8, false),
        Field::new("json_data", DataType::Utf8, true),
    ]));

    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(UInt8Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8])),
            Arc::new(StringArray::from(vec![
                Some(r#" { "this" : "is", "a": [ "test" ] } "#),
                Some(r#"{"a":[2,3.5,true,false,null,"x"]}"#),
                Some(r#"[ "one", "two" ]"#),
                Some(r#"123"#),
                Some(r#"12.3"#),
                Some(r#"true"#),
                Some(r#"false"#),
                None,
            ])),
        ],
    )?;

    // declare a new context
    let ctx = SessionContext::new(); // <-------------- HERE: ctx redefined loses above registration of json_table
    // declare a table in memory.
    ctx.register_batch("json_value_table", batch)?;

    Ok(ctx)
}

#[tokio::main]
async fn main() -> Result<()> {

    let ctx = set_up_data_test()?;

    // Fails with Error: SQL(ParserError("Expected (, found: EOF"), None)
    let df = ctx
        .sql(
            r#"SELECT * FROM `json_table`"#, // <------- HERE: escape with backticks
        )
        .await?;

    df.clone().show().await?;


    Ok(())
}
  • Technically double quotes could be used as well: r#"SELECT * FROM "json_table""#

I also noticed something strange in your reproduction code, where you redefine the ctx in set_up_data_test() meaning you lose the registration of the json_table table anyway.

We could probably update the docs, like here for example https://arrow.apache.org/datafusion/user-guide/sql/select.html

To indicate quoting isn't only for respecting capitalization, but also for using keywords as table/column names

from arrow-datafusion.

lovasoa avatar lovasoa commented on May 29, 2024

I think the JSON_TABLE function is part of the SQL standard. So SELECT * FROM json_table is not valid SQL, and datafusion is correct in refusing it.

I think this issue should be closed. It's the old behavior of accepting the query that was incorrect.

from arrow-datafusion.

alamb avatar alamb commented on May 29, 2024

Maybe we can improve the error message in the sql parser so it is clearer to people that they have unknowingly used a keyword and should use a different table name (or double quotes) 🤔

from arrow-datafusion.

viirya avatar viirya commented on May 29, 2024

We can improve the error message, I think. I don't think the table function is well known among users as it is not supported by most engines. The current refusing of the query looks very confusing to end users.

from arrow-datafusion.

lovasoa avatar lovasoa commented on May 29, 2024

It should be possible to support both json_table as an unquoted table name and as a table valued function call. I think it would be the most helpful behavior.

from arrow-datafusion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.