
data-factory-testing-framework's Introduction

Data Factory - Testing Framework 🛠️

A stand-alone test framework that lets you write unit tests for Data Factory pipelines on Microsoft Fabric and Azure Data Factory.

The framework is currently in Public Preview and is not officially supported by Microsoft.

Features 🚀

The framework evaluates pipeline and activity definitions so that the results can be asserted in tests. It does so by providing the following features:

  1. Evaluate expressions by using the framework's internal expression parser. It supports all the functions and arguments that are available in the Data Factory expression language.
  2. Test an activity with a specific state and assert the evaluated expressions.
  3. Test a pipeline run by verifying the execution flow of activities for specific input parameters and assert the evaluated expressions of each activity.

The framework does not support running the actual pipeline. It only gives you the ability to test the pipeline and activity definitions.
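To illustrate what "verifying the execution flow" without running anything can mean, here is a minimal stand-alone sketch. It does not use the framework and is not its implementation: `ToyActivity`, `evaluate_toy_pipeline`, and the parameter names are hypothetical, and the real dependency conditions in Data Factory are richer than the simple "all dependencies done" rule assumed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class ToyActivity:
    """Hypothetical stand-in for an activity definition."""
    name: str
    # Stand-in for an expression: computes a value from the run parameters.
    evaluate: Callable[[dict], str]
    depends_on: list[str] = field(default_factory=list)


def evaluate_toy_pipeline(activities: list[ToyActivity], parameters: dict) -> Iterator[tuple[str, str]]:
    """Yield (activity name, evaluated value) in dependency order; nothing is executed."""
    done: set[str] = set()
    pending = list(activities)
    while pending:
        ready = [a for a in pending if all(dep in done for dep in a.depends_on)]
        if not ready:
            raise ValueError("circular or unsatisfied dependency")
        for activity in ready:
            yield activity.name, activity.evaluate(parameters)
            done.add(activity.name)
            pending.remove(activity)


# "Call API" is listed first on purpose: the dependency decides the order, not the list position.
toy_pipeline = [
    ToyActivity("Call API", lambda p: p["BaseUrl"] + "/jobs/" + p["JobId"], depends_on=["Set path"]),
    ToyActivity("Set path", lambda p: "/jobs/" + p["JobId"]),
]

flow = evaluate_toy_pipeline(toy_pipeline, {"BaseUrl": "https://example.com", "JobId": "42"})
first = next(flow)
second = next(flow)
```

A test then asserts on each yielded step in turn, which checks both the order of activities and the values their expressions evaluate to for the given input parameters.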

High-level example 💡

Given a WebActivity with a typeProperties.url property containing the following expression:

@concat(pipeline().globalParameters.BaseUrl, variables('Path'))

A simple test to validate that the concatenation is working as expected could look like this:

    # Arrange
    # `pipeline` is assumed to have been loaded through the framework beforehand.
    activity = pipeline.get_activity_by_name("webactivity_name")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
        ],
        variables=[
            # concat adds no separator, so the leading slash is part of the value
            PipelineRunVariable("Path", "/some-path"),
        ],
    )

    # Act
    activity.evaluate(state)

    # Assert
    assert activity.type_properties["url"].result == "https://example.com/some-path"
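To make the expected value concrete, here is a minimal stand-alone sketch of what evaluating that expression amounts to. It does not use the framework; `toy_state`, `concat`, and `evaluate_url` are hypothetical names used only for illustration. The point it shows is that `concat` joins its arguments without inserting a separator, so any slash between base URL and path has to come from one of the values.

```python
# Hypothetical stand-in for the run state: plain dictionaries instead of framework objects.
toy_state = {
    "globalParameters": {"BaseUrl": "https://example.com"},
    "variables": {"Path": "/some-path"},
}


def concat(*parts: str) -> str:
    """Join the arguments exactly as given; no separator is inserted."""
    return "".join(parts)


def evaluate_url(state: dict) -> str:
    """Python equivalent of @concat(pipeline().globalParameters.BaseUrl, variables('Path'))."""
    return concat(state["globalParameters"]["BaseUrl"], state["variables"]["Path"])


url = evaluate_url(toy_state)
```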

Why ❓

Data Factory does not support unit testing, nor testing of pipelines locally. Having integration and e2e tests running on an actual Data Factory instance is great, but having unit tests on top of them provides additional means of quick iteration, validation and regression testing. Unit testing with the Data Factory Testing Framework has the following benefits:

  • Runs locally with immediate feedback
  • Easier to cover a lot of different scenarios and edge cases
  • Regression testing

Concepts 📚

The following pages go deeper into different topics and concepts of the framework to help you get started.

Basic 🌱

  1. Repository setup
  2. Installing and initializing the framework
  3. State
  4. Activity testing
  5. Pipeline testing

If you are not that experienced with Python, you can follow the Getting started guide to get started with the framework.

Advanced 🔬

  1. Debugging your activities and pipelines
  2. Development workflow
  3. Overriding expression functions
  4. Framework internals

Examples 📝

More advanced examples demonstrating the capabilities of the framework:

Fabric:

  1. Batch job example

Azure Data Factory:

  1. Copy blobs example
  2. Batch job example

Limitations ⚠️

The framework has the following limitations at the moment:

  1. The framework reimplements the Data Factory expression language in Python. This means that its behavior might not match the actual Data Factory language in every case, especially when it comes to XML functions.
  2. The framework does not support automatic type conversion of arguments passed to functions in expressions.

Resolving these limitations is a top priority for future work.

Contributing 🤝

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks ™️

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

data-factory-testing-framework's People

Contributors

arjendev, carlossardo, jayachithra, leonardhd, liliamleme, microsoft-github-operations[bot], microsoft-github-policy-service[bot], microsoftopensource, ydaponte

data-factory-testing-framework's Issues

Publish alpha release for PyPI

We want to publish the PyPI package and indicate that it is an alpha release for now.

Version specifiers are based on PEP 440 (see the Python Packaging User Guide), which requires that releases MUST be numbered in a consistently increasing fashion; otherwise automated tools will not be able to upgrade them correctly.

As such, we would like to publish as X.YaN for now, which indicates an alpha release, and keep our options open to adopt semantic versioning, etc. later on.

The easiest and a consistent approach is to use 0.0.0.a${{ github.run_number}} for now, and evolve from there.
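As a rough illustration of why the run number works as the alpha counter, the sketch below builds such a version string and orders versions numerically. The helper names `alpha_version` and `alpha_sort_key` are hypothetical and not part of the project. Under PEP 440 a separator between the release segment and the pre-release segment is allowed but normalized away, so `0.0.0.a5` and `0.0.0a5` denote the same version; the sketch emits the normalized form.

```python
import re


def alpha_version(run_number: int) -> str:
    """Build a PEP 440 alpha version in normalized form, e.g. 0.0.0a17."""
    if run_number < 0:
        raise ValueError("run_number must be non-negative")
    return f"0.0.0a{run_number}"


def alpha_sort_key(version: str) -> tuple:
    """Parse 'X.Y.ZaN' (or 'X.Y.Z.aN') into integers so ordering is numeric."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)\.?a(\d+)", version)
    if match is None:
        raise ValueError(f"not an alpha version of the expected shape: {version!r}")
    return tuple(int(group) for group in match.groups())


versions = [alpha_version(n) for n in (9, 10, 2)]
ordered = sorted(versions, key=alpha_sort_key)  # numeric ordering by run number
naive = sorted(versions)                        # plain string ordering, for contrast
```

Because the CI run number only ever increases, each published version compares higher than the previous one when the alpha counter is read as an integer, which plain string sorting does not guarantee.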

Explore and test framework - 4

Parent Task: #22
Goal: Get acquainted with the framework, test it and provide feedback.

Success criteria:

  • Test the framework with the examples provided in the repo.
  • Get acquainted with the code
  • Provide feedback (document it in the issue) and create issues to address items

Implement all functions in functions_repository in python version

We can separate the work in the following groups. Please order by amount of work and likelihood of usage:

Tasks

Add guidance on how to approach the framework for a new pipeline

Goal: improve the guidance by adding a step-by-step guide to the things that need to be considered when starting to test a new pipeline.

Success criteria:

  • Add guidance (might include visuals) on how to approach the framework for new pipelines (that are not in the examples)
  • Add more examples to include often used tasks like the Copy activity for example

Publish NuGet under Microsoft org

Goal: Investigate how to publish official Microsoft NuGets and, if possible, find people who have experience with the process.

Success Criteria:

  • Identify official documentation to publish Microsoft NuGets that follow best practices and guidelines
  • Identify potential point of contact in the internal team to ask questions when needed

Tasks

No tasks being tracked yet.

Add clarity on json validity

Goal: create clarity in the README on expression validation

Success Criteria:
[ ] State in the README the assumption that the JSON file is valid
[ ] Add a note that an npm package can be used in CI pipelines to perform that validation
[ ] Remove the expression example from the README file

Explore and test framework - 3

Parent Task: #22
Goal: Get acquainted with the framework, test it and provide feedback.

Success criteria:

  • Test the framework with the examples provided in the repo.
  • Get acquainted with the code
  • Provide feedback (document it in the issue) and create issues to address items

Evaluate child pipelines

Goal: Allow validating whether different pipelines work well together

Success criteria:

  • Can be enabled or disabled on a per test basis
  • If enabled, upon evaluation of an ExecutePipelineActivity by the TestFramework, it returns activities of the referenced child pipeline
  • Execution of child pipeline should have its own scope

Create Fabric pipelines to use for testing

Goal: Create Fabric pipelines to use as examples in the repo

Success criteria:
[ ] Create Fabric pipelines (minimum 2) with typical activities to be used as examples in the repo

Explore and test framework - 1

Parent Task: #22

Goal: Get acquainted with the framework, test it and provide feedback.

Success criteria:

  • Test the framework with the examples provided in the repo.
  • Get acquainted with the code
  • Provide feedback (document it in the issue) and create issues to address items

Explore and test framework - 2

Parent Task: #22
Goal: Get acquainted with the framework, test it and provide feedback.

Success criteria:

  • Test the framework with the examples provided in the repo.
  • Get acquainted with the code
  • Provide feedback (document it in the issue) and create issues to address items

Create Fabric pipelines to use for testing

Goal: Create Fabric pipelines to use as examples in the repo

Success criteria:
[ ] Create Fabric pipelines with typical activities to be used as examples in the repo

ExpressionParsingError for expressions containing formatDateTime

I have a pipeline with some expressions containing formatDateTime function as shown below and always get an ExpressionParsingError exception:

"File_Name": { "value": "@concat(item().TABLESPACE_NAME, '__', item().TABLE_NAME,'.v', formatDateTime(utcnow(), 'yyyyMMdd_HHmmss'), '.parquet')", "type": "Expression" }

Looks just like a typo in the framework code, there I find formatDataTime instead of formatDateTime.
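For reference, the format string in that expression maps directly onto Python's `strftime` directives. The sketch below is a stand-alone illustration, not the framework's code: `format_date_time` and the token table are hypothetical, only the six tokens used above are handled, and the tablespace and table names are made-up sample values.

```python
import re
from datetime import datetime

# Subset of .NET-style custom format tokens mapped to strftime directives.
# Assumption: only the tokens appearing in 'yyyyMMdd_HHmmss' are needed.
_TOKEN_TO_STRFTIME = {
    "yyyy": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
    "ss": "%S",
}
_TOKEN_PATTERN = re.compile("|".join(_TOKEN_TO_STRFTIME))


def format_date_time(value: datetime, fmt: str) -> str:
    """Format `value` using the small token subset above; other characters pass through."""
    escaped = fmt.replace("%", "%%")  # keep literal percent signs literal
    strftime_format = _TOKEN_PATTERN.sub(lambda m: _TOKEN_TO_STRFTIME[m.group(0)], escaped)
    return value.strftime(strftime_format)


stamp = format_date_time(datetime(2024, 1, 2, 3, 4, 5), "yyyyMMdd_HHmmss")
# Sample values standing in for item().TABLESPACE_NAME and item().TABLE_NAME.
file_name = "".join(["TS1", "__", "ORDERS", ".v", stamp, ".parquet"])
```

In a unit test the timestamp would normally be fixed, as it is here, since `utcnow()` returns a different value on every run.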

P.S.: Very nice framework you created here for ADF testing. Thanks 👍

pre-commit commitlint fails

Pre-commit hooks fail on commitlint:

commitlint...............................................................Failed
- hook id: commitlint
- exit code: 1

⧗   input: refactor: make functions internal
✖   Please add rules to your `commitlint.config.js`
    - Getting started guide: https://commitlint.js.org/#/?id=getting-started
    - Example config: https://github.com/conventional-changelog/commitlint/blob/master/%40commitlint/config-conventional/index.js [empty-rules]

✖   found 1 problems, 0 warnings
ⓘ   Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint

Related issue alessandrojcm/commitlint-pre-commit-hook#63

[Spike] Convert framework to Python

Goal: The goal of this spike is to check how feasible it is to convert the framework to Python and investigate what the different options are

Outcome:

  • What's the expected effort to convert this framework to Python?
  • Can we re-use reflection and generic-like features within Python? If not, what does this mean?
  • Can we generate models with dfe (data factory expression) types?

Check types of RunParameter to catch errors and give user feedback

Currently we accept any type of value in test_framework.evaluate_pipeline(...). However, only primitives and dict are supported.

If the user, for example, passes:

    state = [
        RunParameter(
            RunParameterType.Pipeline, "Example", {},
        )
    ]

vs

    state = [
        RunParameter(
            RunParameterType.Pipeline, "Example", {}
        )
    ]

Python constructs a tuple and not a dict, causing an error in the evaluation mechanism.
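A minimal sketch of the kind of check this issue asks for is shown below. It is not the framework's code: `check_parameter_value` and `SUPPORTED_TYPES` are hypothetical, and the exact set of "primitives" is an assumption. A trailing comma produces a tuple when it ends an assignment or a parenthesized expression, so the sketch uses an assignment to show the effect.

```python
from typing import Any

# Assumed reading of "primitives and dict"; the framework may accept a different set.
SUPPORTED_TYPES = (str, int, float, bool, dict)


def check_parameter_value(name: str, value: Any) -> Any:
    """Return the value unchanged, or raise a descriptive error for unsupported types."""
    if not isinstance(value, SUPPORTED_TYPES):
        raise TypeError(
            f"Parameter '{name}' has unsupported type {type(value).__name__}; "
            "expected a primitive or a dict."
        )
    return value


intended = {}
accidental = {},  # the trailing comma turns this assignment into a one-element tuple

checked = check_parameter_value("Example", intended)
try:
    check_parameter_value("Example", accidental)
    error_message = None
except TypeError as error:
    error_message = str(error)
```

Failing early with the parameter name and the offending type gives the test author direct feedback, instead of an error that only surfaces later during expression evaluation.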

TestFramework not working

Hello,

First time opening a bug, so please share any feedback if something is not right.

I installed the library from PyPI both locally and on a Databricks cluster with no issues.

    pip install data-factory-testing-framework

When I try to import TestFramework, I get the following error both locally and on ADB.

    from data_factory_testing_framework import TestFramework
    ImportError: cannot import name 'TestFramework' from 'data_factory_testing_framework' (/opt/homebrew/lib/python3.11/site-packages/data_factory_testing_framework/__init__.py)

If I run something like import data_factory_testing_framework it doesn't complain.

Not sure how to proceed.

Thanks.
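One way to narrow down an import error like this is to look for the name in the installed package's submodules, since a class may live in a submodule without being re-exported from the top-level package in a given version. The sketch below uses only the standard library; `find_modules_exporting` is a hypothetical helper, and it is demonstrated on the built-in `json` package rather than on the framework.

```python
import importlib
import pkgutil


def find_modules_exporting(package_name: str, attribute: str) -> list[str]:
    """Return the package and any submodules that expose `attribute` as a top-level name."""
    package = importlib.import_module(package_name)
    found = [package_name] if hasattr(package, attribute) else []
    for module_info in pkgutil.walk_packages(package.__path__, prefix=f"{package_name}."):
        try:
            module = importlib.import_module(module_info.name)
        except Exception:
            continue  # skip submodules that cannot be imported in this environment
        if hasattr(module, attribute):
            found.append(module_info.name)
    return found


# Demonstration on a standard-library package.
locations = find_modules_exporting("json", "JSONDecoder")
```

In the affected environment the call would be `find_modules_exporting("data_factory_testing_framework", "TestFramework")`. Comparing the installed version, for example via `importlib.metadata.version("data-factory-testing-framework")`, against the version the documentation was written for can also help.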

Improve docs

Tasks

  1. documentation
    arjendev
  2. documentation
    ydaponte
  3. documentation
    arjendev
  4. documentation
    arjendev

Add testing terminology to the repo

Goal: describe the testing terminology assumed in this repo: unit testing and functional testing in the context of data pipelines.

Success criteria:

  • Add testing terminology to the main ReadMe.

Explore and test framework

Goal: Get acquainted with the framework, test it and provide feedback.

Success criteria:

  • Test the framework with the examples provided in the repo.
  • Get acquainted with the code
  • Provide feedback (document it in the issue) and create issues to address items

Tasks

  1. documentation
    sreedhar-guda
  2. documentation
    ydaponte
  3. documentation
    LiliamLeme
  4. documentation
    rajalakshmisenthil
  5. enhancement
    LiliamLeme
  6. enhancement
    ydaponte
