risingwavelabs / dbt-risingwave Goto Github PK
View Code? Open in Web Editor NEWLicense: Apache License 2.0
License: Apache License 2.0
Reproduction step:
dbt init
dbt run
dev=> CREATE VIEW "dev"."public"."my_first_dbt_model" AS (
dev(> WITH source_data AS (
dev(> SELECT
dev(> 1 AS id
dev(> UNION ALL
dev(> SELECT
dev(> NULL AS id
dev(> )
dev(> SELECT
dev(> *
dev(> FROM
dev(> source_data
dev(> );
ERROR: QueryError: Invalid input syntax: UNION types INT32 of column id is different from types VARCHAR of column id
Essentially, it's because RW is unable to run the following query
select 1 as id union all select null as id;
The query compiler incorrectly recognizes null as varchar.
RisingWave dbt plugin will create an index by constructing his table name and column name and it is easy to exceed the 63 character limitation.
According to the dbt community discussion dbt-labs/dbt-core#6911, we decide to change the dbt run
default behavior to not drop models by default unless --full-refresh
option is provided.
From user:
About the dbt plugin, looks like it’s working pretty well but there is a major problem with it.
It always recreate all the models. So even if I changed just one file the run itself is very slow.
Do you think it’s possible to change this? So files which hasn’t changed since the last run will be skipped
I would like to use a new feature in dbt Core 1.7. Unfortunately the connector is only compatible with dbt 1.6:
Line 26 in 0339c2f
Would be nice if the support for dbt 1.7 can be added.
https://docs.getdbt.com/blog/announcing-materialized-views dbt newly supports the materialized_view model in dbt-postgres.
{{
config(
materialized='materialized_view',
)
}}
We should maintain the compatibility with other systems.
We currently don't support CTAS (CREATE TABLE AS). We should provide an error to the user and ensure that it's not compilable.
Currently, dbt-risingwave doesn't support incremental
materialization and recommends users use materialized_view
instead, because RisingWave could maintain materialized views incrementally in real-time. However, dbt itself is well known for the incremental
materialization, many users know how incremental
works and how to use it. When they fully understand the trade-off between real-time stream processing and incremental batch processing, you want to use incremental
materialization to optimize their workload. E.g. they can run dbt for incremental models every midnight.
From the dbt docs a materialized view
and a incremental
should only fully refresh if it does not already exist or if you supply the ´--full-refresh´ parameter.
The risingwave docs says that it implements dbt's incremental
mode i.e. https://docs.getdbt.com/docs/build/incremental-models but I'm not certain that it is implemented correctly.
In dbt-risingwave every time I run dbt run
, all materialized views {{ config(materialized='materializedview') }}
are dropped and re-created which is not expected when reading the official dbt documentation.
Since the Postgres dbt plugin would define a md5 name for indexes each time run will generate a different name unsuitable for RisingWave. I propose constructing the index name by __dbt_index_
+ table_name + _column_names
instead.
{% macro risingwave__get_columns_in_relation(relation) -%}
{% call statement('get_columns_in_relation', fetch_result=True) %}
select
column_name,
data_type,
0 as character_maximum_length, -- todo
0 as numeric_precision,
0 as numeric_scale
from {{ relation.information_schema('columns') }}
where table_name = '{{ relation.identifier }}'
{% if relation.schema %}
and table_schema = '{{ relation.schema }}'
{% endif %}
order by ordinal_position
{% endcall %}
{% set table = load_result('get_columns_in_relation').table %}
{{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
It's because that in the postgres driver where risingwave's driver derives, character_maximum_length
is a required param.
RisingWave supports COMMENT ON
syntax, as documented here: https://docs.risingwave.com/docs/current/sql-comment-on/
However, if I set the persist_docs
config like this, as documented at https://docs.getdbt.com/reference/resource-configs/persist_docs:
+persist_docs:
relation: true
columns: true
Then dbt will fail with an error like this:
10:11:19 SQL status: CREATE_MATERIALIZED_VIEW in 8.0 seconds
10:11:19 Using risingwave connection "model.tt_rw_staging.stg_my_view.v1"
10:11:19 On model.tt_rw_staging.stg_my_view.v1: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "thumbtack_rw", "target_name": "local", "node_id": "model.tt_rw_staging.stg_my_view.v1"} */
comment on materialized_view "thumbtack"."dbt_jamesjohnston_stg_business"."stg_my_view_v1" is $dbt_comment_literal_block$This is the description of my table
$dbt_comment_literal_block$;
10:11:19 Postgres adapter: Postgres error: Failed to run the query
Caused by:
sql parser error: Expected comment object_type, found: materialized_view at line:5, column:31
Near "
comment on materialized_view"
Obviously this is invalid SQL.... I'm guessing the feature is not supported yet.
Definitely a low-priority nice-to-have for me, since I'll also be looking at using the dbt documentation generator. But in the future it would be nice if anyone directly browsing the database can have these descriptions close at hand.
dbt_utils.get_relations_by_pattern is a commonly used function that can list the relations that match the pattern.
SELECT
DISTINCT table_schema AS "table_schema",
table_name AS "table_name",
CASE
table_type
WHEN 'BASE TABLE' THEN 'table'
WHEN 'FOREIGN' THEN 'external'
WHEN 'MATERIALIZED VIEW' THEN 'materializedview'
ELSE LOWER(table_type)
END AS "table_type"
FROM
dev.information_schema.tables
WHERE
table_schema ILIKE 'public'
AND table_name ILIKE 'test_table%'
AND table_name NOT ILIKE ''
Now the issue is that ILIKE is unsupported by RisingWave.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.