oliver-twist's Introduction

oliver-twist

DAG Auditor

oliver-twist is a DAG auditing tool that audits the dbt DAG and generates a summary report. There is a docs site, which includes descriptions of all currently implemented rules.

please sir, can I automate my DAG auditing

Getting Started

To get started, install the package:

$ pip install olivertwist

Then run it, passing in your dbt manifest JSON:

olivertwist check manifest.json

This will report any failures to the console, and also in HTML format in a directory called target. You can optionally auto-open the report in a browser with:

olivertwist check manifest.json --browser

You can also tell Oliver to load and run your own custom rules using the --add-rules-from option. See the documentation for full details.

Full options are available with:

olivertwist check --help

Configuration

All rules are enabled by default. To change this, you need a configuration file called olivertwist.yml in the directory from which you run olivertwist. An example configuration is shown below:

version: '1.0'
universal:
  - id: no-rejoin-models
    enabled: false
  - id: no-disabled-models
    enabled: true

There is a command to help you generate the config automatically:

olivertwist config

This will show all the available rules and allow you to toggle the ones that you want to enforce.

Local Development

Clone this repo and install the project:

poetry install

Install pre-commit hooks for linting

This is optional, but highly recommended to avoid linting failures in CI.

poetry run pre-commit install

To run the pre-commit hooks locally:

poetry run pre-commit run

To get the latest versions of the dependencies and update the poetry.lock file, run:

poetry update

To run oliver-twist and generate the summary report, run:

poetry run olivertwist check example_manifest.json

Working with diagrams

To update and regenerate the images that illustrate rule failures in the documentation, follow these steps:

  • update the diagrams using the mermaid syntax
  • install yarn
  • cd docs/diagrams
  • ./generate.sh
  • inspect the generated images in ./docs/diagrams/output/
  • if you're happy with the results, run ./copy.sh so that they are copied over to ./docs/images
  • you can now reference those images, e.g. in ./docs/rules.md

Creating a distribution

poetry build --format wheel

oliver-twist's People

Contributors

bloomonkey, georgim0, samwedge

oliver-twist's Issues

Add execution time

Would like to add run_results.json to the processing engine, including the execution time per model. Have some code written for this locally and would appreciate guidance on how to commit to your repo for review.
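As a rough illustration of the idea, per-model timings could be collected as sketched below. This assumes a run_results.json shaped like the one recent dbt versions write, with a top-level `results` list whose entries carry `unique_id` and `execution_time`. The function name `execution_times` and the sample data are hypothetical and not part of olivertwist.

```python
import json


def execution_times(run_results: dict) -> dict:
    """Map each node's unique_id to its execution time in seconds.

    Assumes each entry in run_results["results"] has "unique_id" and
    "execution_time" keys; entries missing either key are skipped.
    """
    times = {}
    for result in run_results.get("results", []):
        unique_id = result.get("unique_id")
        seconds = result.get("execution_time")
        if unique_id is not None and seconds is not None:
            times[unique_id] = float(seconds)
    return times


# Inline sample standing in for the contents of target/run_results.json.
sample = json.loads(
    '{"results": ['
    '{"unique_id": "model.example.stg_orders", "execution_time": 1.25},'
    '{"unique_id": "model.example.orders", "execution_time": 3.5}'
    "]}"
)
slowest = max(execution_times(sample).items(), key=lambda item: item[1])
print(slowest)
```

The resulting mapping could then be joined to manifest nodes by `unique_id`, so that a rule or the report can show the slowest models.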

IndexError: list index out of range

Hey team,
thanks for the great work with olivertwist!

We are using olivertwist in the CircleCI pipeline in one of our dbt projects and it works great. When rolling it out to other dbt projects I encountered an issue. The olivertwist setup is the same, and I am not aware of any difference in the dbt project that could explain this.

I am not sure how to debug this, so I wanted to check if you have any ideas or pointers for me. I also talked to a colleague who had the same issue in a different dbt project, so it does not seem to be just my local setup.

Let me know if there is a better place or way to report issues like this. And also if you would need more information around set up to troubleshoot this.

Thank you!

Version: olivertwist 0.2

Traceback:

Traceback (most recent call last):
  File "/.venv/bin/olivertwist", line 8, in <module>
    sys.exit(main())
  File "/.venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/.venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/.venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/.venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/.venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/.venv/lib/python3.8/site-packages/olivertwist/main.py", line 66, in check
    results = rule_engine.run(manifest)
  File "/.venv/lib/python3.8/site-packages/olivertwist/ruleengine/engine.py", line 44, in run
    return [Result(rule, *rule.apply(manifest)) for rule in self.rules]
  File "/.venv/lib/python3.8/site-packages/olivertwist/ruleengine/engine.py", line 44, in <listcomp>
    return [Result(rule, *rule.apply(manifest)) for rule in self.rules]
  File "/.venv/lib/python3.8/site-packages/olivertwist/ruleengine/rule.py", line 30, in apply
    return self.func(manifest)
  File "/.venv/lib/python3.8/site-packages/olivertwist/rules/no_references_outside_of_its_staging_area.py", line 35, in no_references_outside_of_its_own_staging_area
    return list(passes), list(failures)
  File "/.venv/lib/python3.8/site-packages/olivertwist/rules/no_references_outside_of_its_staging_area.py", line 24, in staging_depends_on_staging_in_another_area
    different_staging_area_refs = [
  File "/.venv/lib/python3.8/site-packages/olivertwist/rules/no_references_outside_of_its_staging_area.py", line 28, in <listcomp>
    if not manifest.get_node(p).area == node.area
  File "/.venv/lib/python3.8/site-packages/olivertwist/manifest.py", line 95, in area
    return self.data["fqn"][2]
IndexError: list index out of range
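The last frame shows the `area` property indexing `fqn[2]`, which raises whenever a node's `fqn` list has fewer than three elements. One plausible cause, offered as an assumption, is a model that sits directly in the models directory, so its fqn holds only the project name and the model name. The sketch below is a hypothetical guard (`area_of` is not an olivertwist function) that returns None instead of raising:

```python
from typing import List, Optional


def area_of(fqn: List[str]) -> Optional[str]:
    """Return the third fqn element, or None when the fqn is too short.

    Hypothetical helper: mirrors the `self.data["fqn"][2]` lookup from the
    traceback, but guards against short fqn lists instead of raising.
    """
    return fqn[2] if len(fqn) > 2 else None


print(area_of(["my_project", "staging", "finance", "stg_invoices"]))  # finance
print(area_of(["my_project", "my_model"]))  # None
```

A rule that compares areas would then need to decide how to treat None, since two nodes without an area would otherwise compare as equal.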

Pluggable rule engine

It would be good if adding rules were as simple as telling the CLI to look in a directory containing .py scripts that look something like:

@Rule(link="http://example.com/my-rule")
def obey_my_rule():
   ...
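
One way such a loader might work is sketched here, under stated assumptions: `REGISTRY`, the `Rule` class and `load_rules_from` are all hypothetical names written for this illustration, not olivertwist's actual implementation. Each .py file in a directory is imported with the standard `importlib` machinery, and the decorator records every rule function it wraps.

```python
import importlib.util
import tempfile
from pathlib import Path

REGISTRY = []  # hypothetical module-level registry of discovered rules


class Rule:
    """Decorator that records a function and its documentation link."""

    def __init__(self, link: str):
        self.link = link

    def __call__(self, func):
        REGISTRY.append((func.__name__, self.link, func))
        return func


def load_rules_from(directory: Path) -> None:
    """Import every .py file in `directory` so its decorators run."""
    for path in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        module.Rule = Rule  # inject the decorator so scripts need no import
        spec.loader.exec_module(module)


# Demo: write a one-rule script to a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my_rules.py"
    script.write_text(
        '@Rule(link="http://example.com/my-rule")\n'
        "def obey_my_rule():\n"
        "    return [], []\n"
    )
    load_rules_from(Path(tmp))

print([(name, link) for name, link, _ in REGISTRY])
```

Injecting `Rule` into each module is a convenience chosen for this sketch; a real package would more likely have rule scripts import the decorator explicitly.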
   

a rule check specification

add the following rules:
lineage checks:

  • 🆗 Disabled scripts
  • 🆗 Orphan models
  • 🆗 Rejoined models. Given the following model dependency: A -> B -> C, model C should not also depend directly on A
  • 🆗 Staging scripts referencing a staging script in a different area
  • 🆗 Staging scripts referencing a mart
  • 🆗 Marts directly referencing a source
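
To make the rejoined-models check concrete, here is a minimal sketch. It assumes the DAG is given as a hypothetical mapping from each node to its direct parents, and it is not olivertwist's own implementation. A node fails when one of its direct parents is also an ancestor of another direct parent.

```python
from typing import Dict, List, Set


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every node reachable by following parent edges from `node`."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def rejoined_models(parents: Dict[str, List[str]]) -> List[str]:
    """List nodes with a direct parent that is also an ancestor of another direct parent."""
    failures = []
    for node, direct in parents.items():
        for parent in direct:
            others = [p for p in direct if p != parent]
            if any(parent in ancestors(other, parents) for other in others):
                failures.append(node)
                break
    return sorted(failures)


# A -> B -> C, plus the direct edge A -> C that the rule forbids.
dag = {"A": [], "B": ["A"], "C": ["A", "B"]}
print(rejoined_models(dag))  # ['C']
```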

meta checks on physical models (models that are not ephemeral):

advanced lineage check:

  • detect similar subgraphs that originate from the same set of sources, to find potential code reuse (this enforces a single source of truth, on the assumption that it would highlight duplicate code)

incompatible python packages for dbt 0.20.0

Hiya!

I'm trying to update to dbt 0.20.0, and it seems dbt now requires Jinja2 2.11.3, while olivertwist requires Jinja2 2.11.2.

Obviously this is causing an error for me.

Let me know if there is a better place or way to report issues like this. And also if you would need more information around set up to troubleshoot this.

Thank you in advance!

Version: olivertwist 0.2

Is this project still being maintained?

I'm thinking about using oliver-twist to help with auditing our dbt project and just wondering if this project is being actively maintained/accepting feature requests/etc?

Use graph.pickle dbt artefact

Given that we parse manifest.json into a networkx graph, it feels sensible to just parse the pickled version output by dbt.

Make rules configurable

Read rule configuration from a single config yml file:

global:
    - id: <rule_name_1>
      enabled: false
    - id: <rule_name_2>
      enabled: false

All rules will be enabled by default.

Not compatible with dbt 0.18.1

The specified version of colorama (0.4.4) is not compatible with the latest dbt release (0.18.1), which depends on colorama (>=0.3.9,<0.4.4). This means you cannot have olivertwist and dbt within the same poetry project.

To reproduce

$ poetry init --name example -n
$ poetry add "dbt=^0.18.1"
$ poetry add "olivertwist=^0.1.2"

Updating dependencies
Resolving dependencies... (0.3s)

[SolverProblemError]
Because no versions of olivertwist match >0.1.2,<0.2.0
 and olivertwist (0.1.2) depends on colorama (>=0.4.4,<0.5.0), olivertwist (>=0.1.2,<0.2.0) requires colorama (>=0.4.4,<0.5.0).
And because dbt-core (0.18.1) depends on colorama (>=0.3.9,<0.4.4), olivertwist (>=0.1.2,<0.2.0) is incompatible with dbt-core (0.18.1).
And because dbt (0.18.1) depends on dbt-core (0.18.1)
 and no versions of dbt match >0.18.1,<0.19.0, olivertwist (>=0.1.2,<0.2.0) is incompatible with dbt (>=0.18.1,<0.19.0).
So, because example depends on both dbt (^0.18.1) and olivertwist (^0.1.2), version solving failed.

From a quick look, could colorama be removed as a dependency entirely?
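
The conflict comes down to two half-open version ranges that share no versions. The sketch below checks this with plain tuple comparison; the helpers `parse` and `overlap` are hypothetical and handle only simple numeric versions, not the full version-specifier grammar that poetry resolves.

```python
def parse(version: str) -> tuple:
    """Turn "0.4.4" into (0, 4, 4) for simple numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def overlap(range_a, range_b):
    """Return the shared half-open [low, high) range, or None if empty."""
    low = max(parse(range_a[0]), parse(range_b[0]))
    high = min(parse(range_a[1]), parse(range_b[1]))
    return (low, high) if low < high else None


olivertwist_colorama = ("0.4.4", "0.5.0")  # >=0.4.4,<0.5.0
dbt_core_colorama = ("0.3.9", "0.4.4")     # >=0.3.9,<0.4.4
print(overlap(olivertwist_colorama, dbt_core_colorama))  # None
```

Because dbt-core's upper bound excludes exactly the version where olivertwist's range begins, no colorama release can satisfy both.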

Load custom rules

Enable users to load their own rules.

We want to write rules against our own metadata which wouldn't be generally reusable.

Ignore rule on an individual model

Right now rules appear to be applied to all models by default. A way to ignore a specific model would be really appreciated. When adopting oliver-twist, it can require a significant refactor to get all the models in line with the auditing rules, so being able to specify models to be ignored would be quite useful. This should probably be doable either in the olivertwist.yml file or in the model.yml file, with the meta field specifying some kind of audit ignore.
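
A minimal sketch of the meta-field variant follows. The `audit_ignore` key, the `drop_ignored` function and the node dictionaries are all hypothetical, chosen only to illustrate the request; olivertwist does not define them. Failing nodes whose meta lists the rule id are filtered out before reporting.

```python
from typing import List


def drop_ignored(failures: List[dict], rule_id: str) -> List[dict]:
    """Remove failing nodes whose meta lists `rule_id` under "audit_ignore"."""
    return [
        node
        for node in failures
        if rule_id not in node.get("meta", {}).get("audit_ignore", [])
    ]


failures = [
    {"name": "stg_legacy_orders", "meta": {"audit_ignore": ["no-rejoin-models"]}},
    {"name": "stg_payments", "meta": {}},
]
print([n["name"] for n in drop_ignored(failures, "no-rejoin-models")])
```

Scoping the ignore to a rule id rather than to the whole model keeps the other rules active for that model during a gradual refactor.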

Possible to run an empty rule set

With a configuration file, it is now possible to run with no rules enabled.
Currently this results in a success response from check.
It would be better to exit early with a warning that no enabled rules were loaded.
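
The behaviour requested here might look like the following sketch. It assumes the configuration has already been parsed into a dictionary with the `universal` list shown in the README. The function names and the choice of a non-zero exit code are assumptions made for illustration, not olivertwist's implementation.

```python
import sys
from typing import Dict, List


def enabled_rule_ids(all_rule_ids: List[str], config: Dict) -> List[str]:
    """Apply per-rule `enabled` overrides; rules not mentioned stay enabled."""
    overrides = {
        entry["id"]: entry.get("enabled", True)
        for entry in config.get("universal", [])
    }
    return [rule_id for rule_id in all_rule_ids if overrides.get(rule_id, True)]


def check(all_rule_ids: List[str], config: Dict) -> int:
    """Return a non-zero exit code when no enabled rules remain."""
    enabled = enabled_rule_ids(all_rule_ids, config)
    if not enabled:
        print("Warning: no enabled rules were loaded.", file=sys.stderr)
        return 1
    # ... run the enabled rules here ...
    return 0


rules = ["no-rejoin-models", "no-disabled-models"]
everything_off = {
    "version": "1.0",
    "universal": [
        {"id": "no-rejoin-models", "enabled": False},
        {"id": "no-disabled-models", "enabled": False},
    ],
}
print(check(rules, everything_off))  # 1
print(check(rules, {"version": "1.0"}))  # 0
```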
