axelthevenot / dbt-assertions Goto Github PK
View Code? Open in Web Editor NEWPackage to assert rows in-line with dbt macros.
License: Apache License 2.0
Package to assert rows in-line with dbt macros.
License: Apache License 2.0
First init of dbt-assert
from_column
argument in the package is set as default to 'errors'
column.
Some teams will have different terminologies and specifying the from_column
argument might be redundant and is avoidable.
create a projet variable taking as example the variable set in dbt-date
.
vars:
'dbt_assertions:from_column': 'errors'
@vvaneeclo if you want to try to add this feature ๐
It would be great if the package included some integration tests, like in dbt_project_evaluator for example: https://github.com/dbt-labs/dbt-project-evaluator/tree/main/integration_tests
While it would be awesome to have those tests run in a CI/CD pipeline, before that, having tests defined would allow more people to contribute, making sure that their implementation is working.
The way it works in dbt_project_evaluator is that the folder is a dbt project itself, that installs the package from ..
and people can then cd
to the project and do a dbt build
, which run models and also tests that the output of those models matches some predefined seeds.
First of all, thanks for releasing this package! I really think that it offers a new way to solve specific problems in dbt.
Now, from an adapter point of view, I am keen to help supporting Snowflake.
Arrays functions are so different between different warehouses that there is a bit of work required to make it work.
For now, I have tested snowflake___assertions_expression
which seems to work
{%- macro snowflake___assertions_expression(from_column, assertions) -%}
ARRAY_CAT(
{%- for assertion_id, assertion_config in assertions.items() %}
{%- set expression = assertion_config.expression if '\n' not in assertion_config.expression else assertion_config.expression | indent(12) -%}
{%- set description = assertion_config.description -%}
{%- set null_as_error = 'FALSE' if (assertion_config.null_as_error is not defined or assertion_config.null_as_error is true) else 'TRUE' %}
/* {{ assertion_id }}: {{ description }} */
{{ 'ARRAY_CAT(' if not loop.last }}
IFF(
COALESCE({{ expression }}, {{ null_as_error }}) = FALSE,
['{{ assertion_id }}'],
[]
){{ ',' if not loop.last }}
{%- endfor %}
{{ '[]' if assertions | length == 0}}
{{ ')' * (assertions | length - 1) ~ ','}}
[]
) AS {{ from_column }}
{%- endmacro %}
I will tackle assertions_filter
next week.
Question:
Would it make sense to rename some of the default__assertions_filter
macros to bigquery__assertions_filter
? The difference is so significant for some macros that I think it makes sense to do so.
Add a script to overide dbt docs generate
results (from manifest.json
?) in order to add documentation about the assertions
in the .yml
definition which are ignored by dbt.
This can be a simple Python script.
I think a resulting Markdown table will be a good idea.
Example
assertion | description | expression | null_as_error |
---|---|---|---|
unique | Rows must be unique over columns (key_1, key_2) | n/a |
false |
open_date_is_past | Open date must be in the past | open_date <= CURRENT_DATE() | false |
... | ... | ... | ... |
I looked at __unique__
and __not_null__
this morning and got a bit confused by the different behaviour when listing multiple columns. The README is accurate but I was surprised by how the 2 macros handle multiple columns.
Example with col1
and col2
checked:
__unique__
will check that the combination of col1
and col2
is unique__not_null__
will check that the each of col1
and col2
is unique, creating 2 different assertionsI feel that there is an opportunity to maybe update __unique__
to align it with __not_null__
and maybe create an additional __unique_combination__
. With the same use case as above:
__unique__
could check the uniqueness of each column, creating 2 assertions__unique_combination__
could check the overall uniqueness of the combination of columns (like __unique__
today)What do you think?
In order to facilitate comprehension for users with different databases compatibility, create a first table in the README to show what feature is supported in which database
I was chatting with some colleagues and I was wondering if the assertions would show in manifest.json
if they are not under meta
.
Putting them under meta
would also make them show in the docs I believe.
__unique__
helper will flag duplicated rows but some usages requires to flag rows without having to remove all of them but keeping one version of each.
__soft_unique__
can be a new helper replacing 1 = COUNT(1) OVER(...)
by 1 < ROW_NUMBER() OVER(...)
But how to discriminate rows with an ordering ?
Maybe rethink the v2 with
__unique__:
by:
- key_1
- key_2
soft: true
order_by: <str> | <List<str>>
__not_null__:
by:
- key_1
- key_2
layered: true
Breaking change
Change the blacklist
and whitelist
arguments to be include_list
and exclude_list
In order to facilitate contributions create a PR template.
In order to facilitate ideas and issue tracking, open issues template for at least new feature request and bugs.
Don't know yet how, but keep it simple for both
Add first README for the first release
The example shows version: [">=1.0.0", "<2.0.0"]
but this should be version: [">=0.1.0", "<0.2.0"]
based on the current version
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.