Code Monkey home page Code Monkey logo

dbt-tpch's Introduction

dbt TPC-H Sample

This is a dbt sample project for Snowflake using the TPC-H example dataset that ships as a shared database with Snowflake.

More details can be found on the TPC website and in the specification document.

The project is laid out as follows:

  • _schema contains schema defintions and tests in one .yml file per model
  • _source contains source definitions
  • base contains ephemeral base models that serve as wrappers around source models to define column names and data types where necessary
  • ods represents an Operational Data Store (ODS), i.e. a mostly normalized view of the data. These models may contain more columns than we may choose to publish to the dimensional data warehouse, but don't contain any reporting models.
  • wh represents the Dimensional Data Warehouse (WH). These models use a star schema methodology made up of fact (fct_*) and dimension (dim_*) tables. In addition, this schema contains report models (rpt_*) that combine fact and dimension tables for business reporting.

Profile

  • Add a new profile to ~/.dbt/profiles.yml called tpch.
tpch:
    target: dev
    outputs:
        prod:
            type: snowflake
            threads: 8
            account: <account>
            user: <user>
            password: <password>
            role: <role>
            database: <target_database>
            warehouse: <snowflake_warehouse>
            schema: <default_schema>

        dev:
            type: snowflake
            threads: 8
            account: <account>
            user: <user>
            password: <password>
            role: <role>
            database: <target_database>
            warehouse: <snowflake_warehouse>
            schema: <default_schema>

Packages

This project make use of the dbt_utils package, so you will need to call dbt deps before running any model to ensure dbt can combile all package macros.

Scaling Factor

Also, note that you can change the scaling factor of the TPCH dataset by switching the source database in _source/source_tpch.yml from the default of 10 to 100 or 1000 by changing the database name accordingly.

version: 2

sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF10
    loader: Snowflake

...

Snowflake Usage

Using an X-Small warehouse (1 credit / hour), the project currently runs in about 5 minutes against the TPCH_SF10 database.



dbt-tpch's People

Contributors

clausherther avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.