Comments (10)
I am facing something similar with UNNEST in bigquery. example SQL :
SELECT * FROM UNNEST([1,2,3,4]) AS NUMBERS
from dbt-checkpoint.
same issue here with bq
from dbt-checkpoint.
@tomsej @BAntonellini - as a workaround, would it be possible to add an --exclude
or --ignore
flag, so we can configure dbt-checkpoint to ignore specific edgecases? (While I agree that this isn't optimal and the list will grow over time, the specific number of offending keywords we have are quite low.)
from dbt-checkpoint.
I think this is a feature / bug of using regex to parse the SQL, vs. a SQL parser.
sqlfluff
has a parser that can extract only the tables (and, crucially, not the CTEs), which I believe could be used as a drop-in replacement for has_table_name()`, defined something like this like this:
import sqlfluff
def has_table_name(
sql: str, filename: str, dotless: Optional[bool] = False
) -> Tuple[int, Set[str]]:
# not sure if this status code would still be necessary
status_code = 0
parsed_sql = sqlfluff.parse(sql)
table_names = set(parsed_sql .tree.get_table_references())
return status_code, table_names
The downside to this is, this would tie this project to the sqlfluff
dependencies, and the python API isn't quite set. The upside is that the code would allow for a more robust parser (regex on SQL to understand the SQL parse tree will always be an 80% solution, which is problematic is a CI setting).
https://github.com/sqlfluff/sqlfluff/blob/main/examples/03_extracting_references.py#L19-L20
from dbt-checkpoint.
I totally agree with that and I understand sqlfluff is not ready yet for something like this. But wouldn't it be possible to modify the regex so it ignores things like extract(month|year|... from)
?
That would at least add one extra case covered that seems to be used in several databases
from dbt-checkpoint.
I have this same issue with date_spine.
from dbt-checkpoint.
Similar problem
SELECT 'whatever you want to type here from something' AS message
will return an error
does not use source() or ref() macros for tables:
because it use the part of text as a table reference
from dbt-checkpoint.
In Redshift / Postgres, select substring(col FROM POSITION('a' IN col) + 1 for 10)
also gives an error.
from dbt-checkpoint.
Same when doing something like trim(device.web_info.hostname::varchar(100))as device_web_info_hostname,
Pretty much any unnesting logic
from dbt-checkpoint.
We hit an interesting edge case, which can be added here (string parsing vs sql parsing): sql string which contains the keyword from
.
example:
select
CASE
WHEN 'hello' then 'foo from bar'
ELSE 'goodbye'
END as name
from {{ ref('stg_users') }}
will give this error:
models/my_model.sql: does not use source() or ref() macros for tables:
- bar'
It sees 'foo from bar'
and thinks from bar
is a table usage. One interesting thing to note: it is parsing the table name as bar'
, and not bar
.
from dbt-checkpoint.
Related Issues (20)
- allow extra keys in check-sources-has-meta-keys HOT 1
- Add skip functionality to "dbt-deps" hook
- Support for multiple data types for check-column-name-contract hook HOT 1
- `check-column-name-contract` doesn't output the name of the files HOT 1
- check-model-name-contract support for snapshots HOT 1
- Typo in docs for how to use --exclude
- Following the instructions for github actions failed (re: profile.yml)
- unable to exclude specific model from pre-commit
- check_source_has_all_columns reports failure due to use of lower() HOT 1
- Support multiple dbt project roots in a single repo HOT 2
- check-script-ref-and-source erroneously checks refs in comments
- Generate docs for only the staged/changed model(s) HOT 3
- `check-source-has-tests` has the wrong argument documented for test count HOT 1
- check-source-loaded-at-field-is-valid
- check-column-name-contract (and maybe other checkpoints based on Catalog) does not work with Versioned models HOT 1
- dbt-docs-generate speedup possibility with --no-compile? HOT 9
- check-script-ref-and-source hook not accounting for versioned models
- check_model_has_tests_by_name being applied to all models HOT 1
- `check-model-has-description` raises `IsADirectoryError` intermittently
- generate-model-properties-file hook "When to use it" is wrong
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbt-checkpoint.