Code Monkey home page Code Monkey logo

dbt-fundamentals's Introduction

Repository for the dbt courses

Data Lineage

dbt documentation

https://courses.getdbt.com/courses https://docs.getdbt.com/guides/best-practices https://docs.getdbt.com/docs/introduction https://docs.getdbt.com/reference/references-overview

Snowflake instance

https://qr92018.eu-west-2.aws.snowflakecomputing.com/

Setup

Install dbt in a virtual environment

pip install --no-dependencies -r requirements.txt

Create a ~/.dbt/profiles.yml

personal_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: ****
      user: ****
      password: ****
      role: ****
      database: ****
      warehouse: ****
      schema: ****
      threads: 8
      client_session_keep_alive: False
      query_tag: ****
      connect_retries: 0
      connect_timeout: 10
      retry_on_database_errors: False
      retry_all: False
      reuse_connections: False

Check connection

dbt deps
dbt debug

Create database, schemas and tables

dbt run-operation create_db_objects
dbt run-operation create_sample_src_tables

Build and test dbt models.

dbt build

Notes on materializations in dbt

Materialization defines how dbt builds the models (select statements).

  • Tables: Copy of the data from upstream into a NEW table.
    • Higher build time and more storage usage, but shorter query time
    • If new record added in upstream stable, model/table is rebuilt from scratch.
  • Views: No copy, data remains in the upstream table, and the query is run every time we do a select on the view.
    • Quicker build time and less storage usage, but longer query time
    • If new record added in upstream stable, it will be part of the view/query, but slower.
  • Ephemeral: Brings CTE into downstream models (another select statement), but no persistent storage, do not exist in the DB.
    • Can be used if too many CTEs in a model, and it makes sense to break it down, or that the CTE can be reused in other downstream models, but we don't want to store the data. Help reduces the amount of tables in the WH, but harder to debug, cannot be queried directly.
  • Incremental: Only brings in new records in upstream tables, do not rebuild the table from scratch
  • Snapshot: Look at changed records, if anything changes, bring the updated records as a NEW row - no loss of information

Generally, start with a view, when it takes too long to query, "upgrade" the model to a table. When the table takes too long to build, consider upgrading to an incremental table. Upgrading to an incremental table is not straightforward, follow the dbt documentation

Node selection syntax

https://docs.getdbt.com/reference/node-selection/syntax

Tests

  • Native tests and packages (dbt_expectations) in yml
  • Singular tests: sql files in the tests/ folder, with reference to models. Tests business logic and shows up in the lineage graph.
  • Generic tests, macro-like, parameterized sql, which are reusable across models. Usually start with a singular test (pure sql) then "promote" to a generic tests for reusability.
  • Override native tests by creating a generic tests with the same name, e.g 'not_null', for customization

dbt-fundamentals's People

Contributors

gbourniq avatar guillaume-sig avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.