Code Monkey home page Code Monkey logo

Comments (10)

Subhabrata1996 avatar Subhabrata1996 commented on June 8, 2024 6

I am facing something similar with UNNEST in bigquery. example SQL :

SELECT * FROM UNNEST([1,2,3,4]) AS NUMBERS

from dbt-checkpoint.

i-zaykov avatar i-zaykov commented on June 8, 2024 2

same issue here with bq

from dbt-checkpoint.

tommyh avatar tommyh commented on June 8, 2024 1

@tomsej @BAntonellini - as a workaround, would it be possible to add an --exclude or --ignore flag, so we can configure dbt-checkpoint to ignore specific edgecases? (While I agree that this isn't optimal and the list will grow over time, the specific number of offending keywords we have are quite low.)

from dbt-checkpoint.

cnolanminich avatar cnolanminich commented on June 8, 2024

I think this is a feature / bug of using regex to parse the SQL, vs. a SQL parser.

sqlfluff has a parser that can extract only the tables (and, crucially, not the CTEs), which I believe could be used as a drop-in replacement for has_table_name()`, defined something like this like this:

import sqlfluff

def has_table_name(
    sql: str, filename: str, dotless: Optional[bool] = False
) -> Tuple[int, Set[str]]:
    # not sure if this status code would still be necessary
    status_code = 0
    parsed_sql = sqlfluff.parse(sql)
    table_names = set(parsed_sql .tree.get_table_references())
    return status_code, table_names

The downside to this is, this would tie this project to the sqlfluff dependencies, and the python API isn't quite set. The upside is that the code would allow for a more robust parser (regex on SQL to understand the SQL parse tree will always be an 80% solution, which is problematic is a CI setting).

https://github.com/sqlfluff/sqlfluff/blob/main/examples/03_extracting_references.py#L19-L20

from dbt-checkpoint.

jmriego avatar jmriego commented on June 8, 2024

I totally agree with that and I understand sqlfluff is not ready yet for something like this. But wouldn't it be possible to modify the regex so it ignores things like extract(month|year|... from)?
That would at least add one extra case covered that seems to be used in several databases

from dbt-checkpoint.

eugenekim-orrum avatar eugenekim-orrum commented on June 8, 2024

I have this same issue with date_spine.

from dbt-checkpoint.

Sebastian-Trustpower avatar Sebastian-Trustpower commented on June 8, 2024

Similar problem

SELECT 'whatever you want to type here from something' AS message

will return an error
does not use source() or ref() macros for tables:
because it use the part of text as a table reference

from dbt-checkpoint.

lidalei avatar lidalei commented on June 8, 2024

In Redshift / Postgres, select substring(col FROM POSITION('a' IN col) + 1 for 10) also gives an error.

from dbt-checkpoint.

wilson-urdaneta avatar wilson-urdaneta commented on June 8, 2024

Same when doing something like trim(device.web_info.hostname::varchar(100))as device_web_info_hostname,

Pretty much any unnesting logic

from dbt-checkpoint.

tommyh avatar tommyh commented on June 8, 2024

We hit an interesting edge case, which can be added here (string parsing vs sql parsing): sql string which contains the keyword from.

example:

select
  CASE
    WHEN 'hello' then 'foo from bar'
    ELSE 'goodbye'
   END as name
from {{ ref('stg_users') }}

will give this error:

models/my_model.sql: does not use source() or ref() macros for tables:
 - bar'

It sees 'foo from bar' and thinks from bar is a table usage. One interesting thing to note: it is parsing the table name as bar', and not bar.

from dbt-checkpoint.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.