Data Expectations

Are your data meeting your expectations?


Data Expectations is a Python library which takes a declarative approach to asserting qualities of your datasets. Instead of tests like is_sorted to determine if a column is ordered, the expectation is column_values_are_increasing. Most of the time you don't need to know how the data got that way; you are only interested in what it looks like now.

Expectations can be used alongside, or in place of, a schema validator. However, Data Expectations is intended to validate the data in a dataset, not the structure of a table. Records should be Python dictionaries (or dictionary-like objects) and can be processed one by one or as an entire list of dictionaries.

Data Expectations was inspired by the great Great Expectations library, but we wanted something lighter and easier to quickly set up and run. Data Expectations can do less, but it does it with a fraction of the effort and has zero dependencies.

Use Cases

  • Use Data Expectations as a step in data processing pipelines, testing that the data conforms to expectations before it is committed to the warehouse.
  • Use Data Expectations to simplify validating user-supplied values.
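
The pipeline use case can be pictured as a gate in front of the warehouse write. The sketch below is plain Python and deliberately does not use the Data Expectations API; `partition_records`, `has_name` and `age_is_plausible` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Check = Callable[[Record], bool]


def partition_records(
    records: Iterable[Record], checks: List[Check]
) -> Tuple[List[Record], List[Record]]:
    """Split records into those passing every check and those failing any."""
    accepted: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        if all(check(record) for check in checks):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


# Hypothetical checks, loosely mirroring "column exists" and "values between".
def has_name(record: Record) -> bool:
    return "name" in record


def age_is_plausible(record: Record) -> bool:
    age = record.get("age")
    return isinstance(age, int) and 0 <= age <= 120


batch = [
    {"name": "charles", "age": 12},
    {"name": "ada", "age": 230},  # age out of range
    {"age": 40},                  # name column missing
]
accepted, rejected = partition_records(batch, [has_name, age_is_plausible])
```

Only the `accepted` records would then be committed; the `rejected` ones can be logged or quarantined for inspection.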

Provided Expectations

  • expect_column_to_exist (column)
  • expect_column_values_to_not_be_null (column)
  • expect_column_values_to_be_of_type (column, expected_type, ignore_nulls:true)
  • expect_column_values_to_be_in_type_list (column, type_list, ignore_nulls:true)
  • expect_column_values_to_be_more_than (column, threshold, ignore_nulls:true)
  • expect_column_values_to_be_less_than (column, threshold, ignore_nulls:true)
  • expect_column_values_to_be_between (column, maximum, minimum, ignore_nulls:true)
  • expect_column_values_to_be_increasing (column, ignore_nulls:true)
  • expect_column_values_to_be_decreasing (column, ignore_nulls:true)
  • expect_column_values_to_be_in_set (column, symbols, ignore_nulls:true)
  • expect_column_values_to_match_regex (column, regex, ignore_nulls:true)
  • expect_column_values_to_match_like (column, like, ignore_nulls:true)
  • expect_column_values_length_to_be (column, length, ignore_nulls:true)
  • expect_column_values_length_to_be_between (column, maximum, minimum, ignore_nulls:true)
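
To make the parameters above concrete, here is a minimal standalone sketch of what two of these expectations might check. It is not the library's implementation: the function names are hypothetical, and the exact semantics (inclusive bounds, non-strict ordering, nulls passing when `ignore_nulls` is true) are assumptions rather than documented behaviour.

```python
from typing import Iterable


def values_between(value, minimum, maximum, ignore_nulls: bool = True) -> bool:
    """Assumed semantics: inclusive bounds; None passes only if ignore_nulls."""
    if value is None:
        return ignore_nulls
    return minimum <= value <= maximum


def values_increasing(values: Iterable, ignore_nulls: bool = True) -> bool:
    """Assumed semantics: each non-null value is >= the previous non-null one."""
    previous = None
    for value in values:
        if value is None:
            if ignore_nulls:
                continue  # skip nulls without resetting the comparison point
            return False
        if previous is not None and value < previous:
            return False
        previous = value
    return True


in_range = values_between(12, minimum=0, maximum=120)
ordered = values_increasing([1, 2, None, 2, 5])
```

Note that an "increasing" check is stateful across records, whereas a "between" check can be decided from a single value.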

Install

pip install data_expectations

Data Expectations has no external dependencies and can be used ad hoc, in the moment, without complex setup.

Example Usage

Testing Python Dictionaries

import data_expectations as de
from data_expectations import Expectation
from data_expectations import Behaviors

TEST_DATA = {"name": "charles", "age": 12}

set_of_expectations = [
    Expectation(Behaviors.EXPECT_COLUMN_TO_EXIST, column="name"),
    Expectation(Behaviors.EXPECT_COLUMN_TO_EXIST, column="age"),
    Expectation(Behaviors.EXPECT_COLUMN_VALUES_TO_BE_BETWEEN, column="age", config={"minimum": 0, "maximum": 120}),
]

expectations = de.Expectations(set_of_expectations)
try:
    # Raises ExpectationNotMetError if any expectation fails for this record
    de.evaluate_record(expectations, TEST_DATA)
except de.errors.ExpectationNotMetError:
    print("Data Didn't Meet Expectations")

Testing Individual Values

import data_expectations as de
from data_expectations import Expectation
from data_expectations import Behaviors

expectation = Expectation(Behaviors.EXPECT_COLUMN_VALUES_TO_BE_BETWEEN, column="age", config={"minimum": 0, "maximum": 120})

try:
    # Test a single value against the expectation, without a full record
    expectation.test_value(55)
except de.errors.ExpectationNotMetError:
    print("Data Didn't Meet Expectations")

data_expectations's Issues

Create Table Expectations

Expect Table Row Count To Equal
Expect Table Row Count To Be More Than
Expect Table Row Count To Be Less Than
Expect Table Row Count To Be Between

Expect Table Column Count To Equal
Expect Table Column Count To Be More Than
Expect Table Column Count To Be Less Than
Expect Table Column Count To Be Between

Expect Table To Have No Duplicate Rows

Expect Column Values To Add To

Expect Column Values To Not Have Outliers (3 standard deviations)
Expect Column Values Mean To Equal
Expect Column Values Mean To Be Between
Expect Column Values Mean To Be More Than
Expect Column Values Mean To Be Less Than
