
Redshift data models and utilities

dbt models for Redshift warehouses.

Installation instructions

  1. Include this package in your packages.yml -- see dbt's package installation documentation for details, and the example sketched below.
  2. Run dbt deps
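
A minimal packages.yml entry might look like the following. The package coordinates and version shown here are assumptions; check the dbt package hub for the current name and latest release.

packages:
  - package: dbt-labs/redshift
    version: 0.8.0  # placeholder -- pin to the latest published version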

Models

This package provides a number of base models for Redshift system tables, as well as a few utility views that usefully combine the base models.

Base Models

Each of these base models maps 1-to-1 with the underlying Redshift table. Some renaming has been performed to make the field names grokable.

Ephemeral Models

These ephemeral models simplify some of Redshift's field naming and logic, to make the data more usable.

  • redshift_cost: transforms the start and max explain cost values from stl_explain into floating point values
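
As a rough illustration of what this model does (a sketch only, not the package's actual SQL, and it assumes the cost range appears in stl_explain's plannode text in the form cost=<start>..<max>):

-- Hypothetical sketch: pull the start and max planner costs out of the
-- plannode text and cast them to floats.
select
    query,
    nodeid,
    split_part(split_part(plannode, 'cost=', 2), '..', 1)::float as cost_start,
    split_part(split_part(split_part(plannode, 'cost=', 2), '..', 2), ' ', 1)::float as cost_max
from stl_explain
where plannode like '%cost=%'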

View Models

These views are designed to make debugging your Redshift cluster more straightforward. They are, in effect, materializations of the Diagnostic Queries for Query Tuning from Redshift's documentation.

  • redshift_admin_queries: Simplified view of queries, including explain cost, execution times, and queue times.
  • redshift_admin_table_stats: Gives insight on tables in your warehouse. Includes information on sort and dist keys, table size on disk, and more.
  • redshift_admin_dependencies: Simplified view of pg_depend, showing the dependent objects (such as views) for a given source object.
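
Once built, these views can be queried directly, for example from a dbt analysis. The sketch below assumes the table stats view exposes a size column (as svv_table_info does); check the compiled model for the exact column names.

-- analyses/largest_tables.sql (hypothetical usage)
select *
from {{ ref('redshift_admin_table_stats') }}
order by size desc
limit 20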

These views are designed to make user privilege management more straightforward.

Introspection Models

These models (ephemeral by default) make it possible to inspect the tables, columns, constraints, and sort/dist keys of the Redshift cluster. They are used to build the column compression queries, but may also be generally useful.

Macros

compress_table (source)

This macro returns the SQL required to auto-compress a table using the results of an analyze compression query. All comments, constraints, keys, and indexes are copied to the newly compressed table. Sort and dist keys can optionally be provided to override the settings of the source table. By default, a backup of the original table is created and left in place; to drop this backup after a successful copy, set the drop_backup flag.

Macro signature:

{{ compress_table(schema, table,
                  drop_backup=False,
                  comprows=none|Integer,
                  sort_style=none|compound|interleaved,
                  sort_keys=none|List<String>,
                  dist_style=none|all|even,
                  dist_key=none|String) }}

Example usage:

{{
  config({
    "materialized":"table",
    "sort": "id",
    "dist": "id",
    "post-hook": [
      "{{ redshift.compress_table(this.schema, this.table, drop_backup=False) }}"
    ]
  })
}}
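
A variant of the example above, overriding the sort and dist settings and dropping the backup table after a successful copy (the key names and styles here are purely illustrative):

{{
  config({
    "materialized":"table",
    "post-hook": [
      "{{ redshift.compress_table(this.schema, this.table,
                                  drop_backup=True,
                                  sort_style='compound',
                                  sort_keys=['created_at', 'id'],
                                  dist_style='even') }}"
    ]
  })
}}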

unload_table (source)

This macro returns the SQL required to unload a Redshift table to one or more files on S3. The macro replicates all functionality provided by Redshift's UNLOAD command.

Macro signature:

{{ unload_table(schema,
                table,
                s3_path,
                iam_role=None|String,
                aws_key=None|String,
                aws_secret=None|String,
                aws_token=None|String,
                aws_region=None|String,
                manifest=Boolean,
                header=Boolean,
                format=None|String,
                delimiter=String,
                null_as=String,
                max_file_size=String,
                escape=Boolean,
                compression=None|GZIP|BZIP2,
                add_quotes=Boolean,
                encrypted=Boolean,
                overwrite=Boolean,
                cleanpath=Boolean,
                parallel=Boolean,
                partition_by=none|List<String>
) }}

Example usage:

{{
  config({
    "materialized":"table",
    "sort": "id",
    "dist": "id",
    "post-hook": [
      "{{ redshift.unload_table(this.schema,
                                this.table,
                                s3_path='s3://bucket/folder',
                                aws_key='abcdef',
                                aws_secret='ghijklm',
                                header=True,
                                delimiter='|') }}"
    ]
  })
}}
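
Credentials can also be supplied via an IAM role instead of a key pair. The role ARN below is a placeholder:

{{
  config({
    "materialized":"table",
    "sort": "id",
    "dist": "id",
    "post-hook": [
      "{{ redshift.unload_table(this.schema,
                                this.table,
                                s3_path='s3://bucket/folder',
                                iam_role='arn:aws:iam::123456789012:role/my-unload-role',
                                header=True,
                                delimiter='|',
                                overwrite=True) }}"
    ]
  })
}}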

redshift_maintenance_operation (source)

This macro is intended to be run as an operation. It vacuums and analyzes each table, with verbose logging.

The user who runs this operation must be a superuser.

$ dbt run-operation redshift_maintenance
Running with dbt=0.14.2
06:35:33 + 1 of 478 Vacuuming "analytics"."customer_orders"
06:35:33 + 1 of 478 Analyzing "analytics"."customer_orders"
06:35:33 + 1 of 478 Finished "analytics"."customer_orders" in 0.29s
06:35:33 + 2 of 478 Vacuuming "analytics"."customer_payments"
06:35:33 + 2 of 478 Analyzing "analytics"."customer_payments"
06:35:33 + 2 of 478 Finished "analytics"."customer_payments" in 0.28s

The command can also be run with optional parameters to exclude schemas, either with exact or regex matching. This can be useful for cases where the models (or the schemas themselves) tend to be short-lived and don't require vacuuming. For example:

$ dbt run-operation redshift_maintenance --args '{exclude_schemas: ["looker_scratch"], exclude_schemas_like: ["sinter\\_pr\\_%"]}'

You can also implement your own query to choose which tables to vacuum. To do so, create a macro in your own project named vacuumable_tables_sql, following the same pattern as the macro in this package. Here's an example:

-- my_project/macros/redshift_maintenance_operation.sql
{% macro vacuumable_tables_sql() %}
{%- set limit=kwargs.get('limit') -%}
select
    current_database() as table_database,
    table_schema,
    table_name
from information_schema.tables
where table_type = 'BASE TABLE'

order by table_schema, table_name
{% if limit %}
limit {{ limit }}
{% endif %}
{% endmacro %}

When you run the redshift_maintenance macro, your version of vacuumable_tables_sql will be respected. You can also add arguments to your version of vacuumable_tables_sql by following the pattern in the vacuumable_tables_sql macro in this package.
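
For instance, assuming the operation forwards its arguments through to vacuumable_tables_sql (as the kwargs.get call in the example above implies), the limit argument could be passed on the command line:

$ dbt run-operation redshift_maintenance --args '{limit: 10}'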

Note: This macro will skip any relation that is dropped in the time between running the initial query and the point at which it is vacuumed. This results in a message like so:

13:18:22 + 1 of 157 Skipping relation "analytics"."dbt_claire"."amazon_orders" as it does not exist

Contributing

Additional contributions to this repo are very welcome! Check out this post on the best workflow for contributing to a package. All PRs should only include functionality that is available in all Redshift deployments; no implementation-specific details should be included.
