risingwavelabs / dbt-risingwave Goto Github PK

View Code? Open in Web Editor NEW

13.0 13.0 2.0 124 KB

License: Apache License 2.0

Python 100.00%

dbt-risingwave's People

Contributors

Stargazers

Watchers

Forkers

siddyp northvolt

dbt-risingwave's Issues

failed to run the scaffold project generated by `dbt init`

Reproduction step:

dbt init
dbt run

dev=> CREATE VIEW "dev"."public"."my_first_dbt_model" AS (
dev(>     WITH source_data AS (
dev(>         SELECT
dev(>             1 AS id
dev(>         UNION ALL
dev(>         SELECT
dev(>             NULL AS id
dev(>     )
dev(>     SELECT
dev(>         *
dev(>     FROM
dev(>         source_data
dev(> );

ERROR:  QueryError: Invalid input syntax: UNION types INT32 of column id is different from types VARCHAR of column id

Essentially, it's because RW is unable to run the following query

select 1 as id union all select null as id;

The query compiler incorrectly recognizes null as varchar.

remove the restriction of relation name max length is 63 character.

RisingWave dbt plugin will create an index by constructing his table name and column name and it is easy to exceed the 63 character limitation.

Change dbt run to not drop models by default unless full refresh option is provided

According to the dbt community discussion dbt-labs/dbt-core#6911, we decide to change the dbt run default behavior to not drop models by default unless --full-refresh option is provided.

feat: Rerun only files which changed

From user:

About the dbt plugin, looks like it’s working pretty well but there is a major problem with it.
It always recreate all the models. So even if I changed just one file the run itself is very slow.
Do you think it’s possible to change this? So files which hasn’t changed since the last run will be skipped

Support dbt Core 1.7

I would like to use a new feature in dbt Core 1.7. Unfortunately the connector is only compatible with dbt 1.6:

dbt-risingwave/setup.py

Line 26 in 0339c2f

install_requires=["dbt-postgres~=1.6.0"],

Would be nice if the support for dbt 1.7 can be added.

use 'materialized_view' instead of 'materializedview'

https://docs.getdbt.com/blog/announcing-materialized-views dbt newly supports the materialized_view model in dbt-postgres.

{{
config(
     materialized='materialized_view',
)
}}

We should maintain the compatibility with other systems.

set up the CI workflow of the driver testing

I tried to add a test today:

pytest

But I encountered this issue:
risingwavelabs/risingwave#6773

disallow creating table model

We currently don't support CTAS (CREATE TABLE AS). We should provide an error to the user and ensure that it's not compilable.

Discussion: Support incremental materialization

Currently, dbt-risingwave doesn't support incremental materialization and recommends users use materialized_view instead, because RisingWave could maintain materialized views incrementally in real-time. However, dbt itself is well known for the incremental materialization, many users know how incremental works and how to use it. When they fully understand the trade-off between real-time stream processing and incremental batch processing, you want to use incremental materialization to optimize their workload. E.g. they can run dbt for incremental models every midnight.

Materialized views are fully refreshed

From the dbt docs a materialized view and a incremental should only fully refresh if it does not already exist or if you supply the ´--full-refresh´ parameter.

The risingwave docs says that it implements dbt's incremental mode i.e. https://docs.getdbt.com/docs/build/incremental-models but I'm not certain that it is implemented correctly.

In dbt-risingwave every time I run dbt run, all materialized views {{ config(materialized='materializedview') }} are dropped and re-created which is not expected when reading the official dbt documentation.

feat: create index use a readable and deterministic name instead of a md5 name

Since the Postgres dbt plugin would define a md5 name for indexes each time run will generate a different name unsuitable for RisingWave. I propose constructing the index name by __dbt_index_ + table_name + _column_names instead.

feat: support create table_with_connector

risingwave's driver always maps varchar to varchar(0)


{% macro risingwave__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          column_name,
          data_type,
          0 as character_maximum_length, -- todo
          0 as numeric_precision,
          0 as numeric_scale

      from {{ relation.information_schema('columns') }}
      where table_name = '{{ relation.identifier }}'
        {% if relation.schema %}
        and table_schema = '{{ relation.schema }}'
        {% endif %}
      order by ordinal_position

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

It's because that in the postgres driver where risingwave's driver derives, character_maximum_length is a required param.

Support on_configuration_change on materialized view

https://docs.getdbt.com/reference/resource-configs/on_configuration_change

Support persisting documentation

RisingWave supports COMMENT ON syntax, as documented here: https://docs.risingwave.com/docs/current/sql-comment-on/

However, if I set the persist_docs config like this, as documented at https://docs.getdbt.com/reference/resource-configs/persist_docs:

    +persist_docs:
      relation: true
      columns: true

Then dbt will fail with an error like this:

10:11:19  SQL status: CREATE_MATERIALIZED_VIEW in 8.0 seconds
10:11:19  Using risingwave connection "model.tt_rw_staging.stg_my_view.v1"
10:11:19  On model.tt_rw_staging.stg_my_view.v1: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "thumbtack_rw", "target_name": "local", "node_id": "model.tt_rw_staging.stg_my_view.v1"} */



  comment on materialized_view "thumbtack"."dbt_jamesjohnston_stg_business"."stg_my_view_v1" is $dbt_comment_literal_block$This is the description of my table
$dbt_comment_literal_block$;


10:11:19  Postgres adapter: Postgres error: Failed to run the query

Caused by:
  sql parser error: Expected comment object_type, found: materialized_view at line:5, column:31
Near "
  comment on materialized_view"

Obviously this is invalid SQL.... I'm guessing the feature is not supported yet.

Definitely a low-priority nice-to-have for me, since I'll also be looking at using the dbt documentation generator. But in the future it would be nice if anyone directly browsing the database can have these descriptions close at hand.

unable to run `dbt_utils.get_relations_by_pattern`

dbt_utils.get_relations_by_pattern is a commonly used function that can list the relations that match the pattern.

https://github.com/dbt-labs/dbt-utils/blob/6ba7b660b1d1b4e3b41cb0cf6c3c0e4a70ae54e4/macros/sql/get_tables_by_pattern_sql.sql#L6C10-L6C44

SELECT
    DISTINCT table_schema AS "table_schema",
    table_name AS "table_name",
    CASE
        table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'FOREIGN' THEN 'external'
        WHEN 'MATERIALIZED VIEW' THEN 'materializedview'
        ELSE LOWER(table_type)
    END AS "table_type"
FROM
    dev.information_schema.tables
WHERE
    table_schema ILIKE 'public'
    AND table_name ILIKE 'test_table%'
    AND table_name NOT ILIKE ''

Now the issue is that ILIKE is unsupported by RisingWave.