Comments (3)

RobMcZagBDS commented on June 29, 2024

Thank you @ernestoongaro

The current unit-test functionality is well suited to very specific, narrow tests: you pick the few columns you need, mock the macro calls to return the values you expect, and check that you get the desired output.
It is a lot like unit tests that mock everything except the one tiny bit of complex logic you want to test.
That is great power, but you can't paint a building with the same brush you use for the small details in fine art paintings.

The typical case I had in mind is a wider validation of a model: you have some sample data, you have isolated a few rows that cover the different use cases, and you have received or manually verified the expected output for them. You would keep these input and expected rows in a table and use them, adding more rows when new use cases are found. One great case is adding the use case data when you find and fix a bug.

In this situation it is also useful to note that the output / expectation of one model easily becomes the input for the next model to test, so you could almost visualize the sequence as a chain of known sets of inputs and their expected outputs down the lineage.
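
As a rough illustration of that chain, here is a minimal Python sketch. It is not dbt syntax: the two model functions, the fixture names and the rows are all made up for the example, and in a real project the models would be SQL and the fixtures would be tables.

```python
# Hypothetical stand-ins for two dbt models (real models would be SQL).
def stg_orders(rows):
    """Normalise raw rows: trim and upper-case status, cast amount to float."""
    return [
        {
            "id": r["id"],
            "status": r["status"].strip().upper(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def fct_paid_orders(rows):
    """Keep only the paid orders."""
    return [r for r in rows if r["status"] == "PAID"]


# Hypothetical fixture "tables": a few hand-verified rows per use case.
FIXTURES = {
    "raw_orders": [
        {"id": 1, "status": " paid ", "amount": "10.50"},
        {"id": 2, "status": "Cancelled", "amount": "3"},
    ],
    "stg_orders__expected": [
        {"id": 1, "status": "PAID", "amount": 10.5},
        {"id": 2, "status": "CANCELLED", "amount": 3.0},
    ],
    "fct_paid_orders__expected": [
        {"id": 1, "status": "PAID", "amount": 10.5},
    ],
}

# Each step is (model, input fixture, expected fixture). The expectation of
# one step is reused as the input of the next step down the lineage.
CHAIN = [
    (stg_orders, "raw_orders", "stg_orders__expected"),
    (fct_paid_orders, "stg_orders__expected", "fct_paid_orders__expected"),
]


def run_chain(chain, fixtures):
    """Return the names of the models whose output differs from expectation."""
    failures = []
    for model, input_name, expected_name in chain:
        if model(fixtures[input_name]) != fixtures[expected_name]:
            failures.append(model.__name__)
    return failures
```

Calling `run_chain(CHAIN, FIXTURES)` returns an empty list when every model reproduces its expected rows. Covering a newly fixed bug then only means adding rows to the relevant fixture entries.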

I would suggest a format of ref or source and then as rows we could put the query that references the ref or source.

from dbt-core.

dbeatty10 commented on June 29, 2024

@RobMcZagBDS and @ernestoongaro thank you both for raising this issue 🤩

After discussing with @graciegoheen, this isn't something we’d prioritize anytime soon, but we will continue to listen for how many folks are asking for this.

A large reason for our prioritization is the complexity that would be involved in implementing this. Here is a summary of some of the obstacles identified by @gshank:

  • SQL fixtures: Each one would need to be compiled, requiring additional code and refactoring.
  • Extra fields: New fields like compiled_sql would need to be added.
  • Dependency handling: Handling dependencies (depends_on) would be difficult because fixture nodes are created dynamically and don’t exist during the initial parsing stage. The unit testing manifest might need separate depends_on structures for each fixture.
  • Additional unknowns: There may be remaining things that would be difficult to handle or that we'd have a hard time detecting.

RobMcZag commented on June 29, 2024

Thank you Doug, Grace, Gerda and Ernesto for looking into it.
I understand the technical difficulties, and we can cope with the current limitations somehow, even if it is not as elegant and maintainable as it would be if we could use a source() reference.

Maybe it is just my feeling, but I would prefer to have a "FIXTURES" schema with "TABLE_XXX" and "TABLE_XXX__EXPECTATION" tables (the latter only if the expectation is not the same as the next input in the pipeline, "TABLE_YYY") and select the rows and columns with SQL, rather than have a similar collection of CSV or SQL files in a folder inside the repository.
We can do that with the current SQL feature by hardcoding the DB & SCHEMA. Do we have variables in the context?

BTW, it would be nice for the docs to have a better description of what is expected from a SQL file.
My gut feeling is that it is a piece of SQL that returns the desired rows and columns when run, but I am not 100% sure and have not experimented with it yet.
