
Nutter

Overview

The Nutter framework makes it easy to test Databricks notebooks. The framework enables a simple inner dev loop and easily integrates with Azure DevOps Build/Release pipelines, among others. When data or ML engineers want to test a notebook, they simply create a test notebook called test_<notebook_under_test>.

Nutter has 2 main components:

  1. Nutter Runner - this is the server-side component that is installed as a library on the Databricks cluster
  2. Nutter CLI - this is the client CLI that can be installed both on a developer's laptop and on a build agent

The tests can be run from within that notebook or executed from the Nutter CLI, which is useful for integration into Build/Release pipelines.

Nutter Runner

Cluster Installation

The Nutter Runner can be installed as a cluster library, via PyPI.

For more information about installing libraries on a cluster, review Install a library on a cluster.

Nutter Fixture

The Nutter Runner is simply a base Python class, NutterFixture, that test fixtures implement. The runner runtime is a module you can use once you install Nutter on the Databricks cluster. The NutterFixture base class can then be imported in a test notebook and implemented by a test fixture:

from runtime.nutterfixture import NutterFixture, tag
class MyTestFixture(NutterFixture):
   …

To run the tests:

result = MyTestFixture().execute_tests()

To view the results from within the test notebook:

print(result.to_string())

To return the test results to the Nutter CLI:

result.exit(dbutils)

Note: Behind the scenes, result.exit calls dbutils.notebook.exit, passing the serialized TestResults back to the CLI. At present, print statements produce no output when dbutils.notebook.exit is called in a notebook, even if they appear before the call. For this reason, it is necessary to temporarily comment out result.exit(dbutils) when running the tests locally.

The following defines a single test fixture named 'MyTestFixture' that has 1 TestCase named 'test_name':

from runtime.nutterfixture import NutterFixture, tag
class MyTestFixture(NutterFixture):
   def run_test_name(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_name(self):
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM sometable')
      first_row = some_tbl.first()
      assert (first_row[0] == 1)

result = MyTestFixture().execute_tests()
print(result.to_string())
# Comment out the next line (result.exit(dbutils)) to see the test result report from within the notebook
result.exit(dbutils)

To execute the test from within the test notebook, simply run the cell containing the above code. At present, in order to see the test result below, you must comment out the call to result.exit(dbutils). That call is required to send the results back when the test is run from the CLI, so do not forget to uncomment it after testing locally.

Notebook: (local) - Lifecycle State: N/A, Result: N/A
============================================================
PASSING TESTS
------------------------------------------------------------
test_name (19.43149897100011 seconds)


============================================================

Test Cases

A test fixture can contain 1 or more test cases. Test cases are discovered when execute_tests() is called on the test fixture. Every test case comprises 1 required and 3 optional methods, which are discovered by the following naming convention: prefix_testname, where valid prefixes are before_, run_, assertion_, and after_. A test fixture that has run_fred and assertion_fred methods has 1 test case called 'fred'. The test case methods are described below (a complete sketch follows the list):

  • before_(testname) - (optional) - if provided, runs prior to the 'run_' method. This method can be used to set up any test pre-conditions

  • run_(testname) - (optional) - if provided, runs after 'before_' if a before_ method was provided; otherwise runs first. This method is typically used to run the notebook under test

  • assertion_(testname) - (required) - runs after 'run_' if a run_ method was provided. This method typically contains the test assertions

Note: You can assert test scenarios using the standard assert statement or the assertion capabilities from a package of your choice.

  • after_(testname) - (optional) - if provided, runs after 'assertion_'. This method is typically used to clean up any test data used by the test
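
For reference, the following is a minimal sketch of a test fixture with all four methods for a single test case called 'fred'. The notebook name and table name are placeholders, and the notebook under test is assumed to insert one row into sometable:

from runtime.nutterfixture import NutterFixture, tag
class FredTestFixture(NutterFixture):
   def before_fred(self):
      # (optional) set up pre-conditions, e.g. create the table the notebook writes to
      sqlContext.sql('CREATE TABLE IF NOT EXISTS sometable (total INT)')

   def run_fred(self):
      # run the notebook under test
      dbutils.notebook.run('notebook_under_test', 600)

   def assertion_fred(self):
      # assert against the state produced by the notebook
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM sometable')
      assert (some_tbl.first()[0] == 1)

   def after_fred(self):
      # (optional) clean up test data
      sqlContext.sql('DROP TABLE IF EXISTS sometable')

result = FredTestFixture().execute_tests()
print(result.to_string())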

A test fixture can have multiple test cases. The following example shows a fixture called MultiTestFixture with 2 test cases: 'test_case_1' and 'test_case_2' (assertion code omitted for brevity):

from runtime.nutterfixture import NutterFixture, tag
class MultiTestFixture(NutterFixture):
   def run_test_case_1(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_case_1(self):
     …

   def run_test_case_2(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_case_2(self):
     …

result = MultiTestFixture().execute_tests()
print(result.to_string())
#result.exit(dbutils)

before_all and after_all

Test fixtures can also have a before_all() method, which is run prior to all tests, and an after_all() method, which is run after all tests.

from runtime.nutterfixture import NutterFixture, tag
class MultiTestFixture(NutterFixture):
   def before_all(self):
      …

   def run_test_case_1(self):
      dbutils.notebook.run('notebook_under_test', 600, args)

   def assertion_test_case_1(self):
     …

   def after_all(self):
      …

Multiple test assertions pattern with before_all

It is possible to support multiple assertions for a single notebook run by implementing a before_all method, no run_ methods, and multiple assertion_ methods. In this pattern, the before_all method runs the notebook under test, and the assertion methods simply assert against what it did.

from runtime.nutterfixture import NutterFixture, tag
class MultiTestFixture(NutterFixture):
   def before_all(self):
     dbutils.notebook.run('notebook_under_test', 600, args) 
      …

   def assertion_test_case_1(self):
      …

   def assertion_test_case_2(self):
     …

   def after_all(self):
      …

Guaranteed test order

After test cases are loaded, Nutter uses a sorted dictionary to order them by name. Therefore test cases will be executed in alphabetical order.
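
For example, in the following sketch (the notebook names are placeholders), the test case 'a_load' always executes before 'b_validate', regardless of the order in which the methods are declared:

from runtime.nutterfixture import NutterFixture, tag
class OrderedFixture(NutterFixture):
   def run_b_validate(self):
      dbutils.notebook.run('validate_data', 600)

   def assertion_b_validate(self):
      assert True   # executes second: 'b_validate' sorts after 'a_load'

   def run_a_load(self):
      dbutils.notebook.run('load_data', 600)

   def assertion_a_load(self):
      assert True   # executes first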

Sharing state between test cases

It is possible to share state across test cases via instance variables. Generally, these should be set in the constructor. Please see below:

class TestFixture(NutterFixture):
  def __init__(self):
    self.file = '/data/myfile'
    NutterFixture.__init__(self)
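
For instance, a value computed once in before_all can be stored on self and read by every assertion method. The sketch below is illustrative only; the Parquet path is hypothetical and spark is the session available in Databricks notebooks:

class SharedStateFixture(NutterFixture):
  def __init__(self):
    self.file = '/data/myfile'   # hypothetical path shared by all test cases
    self.row_count = None
    NutterFixture.__init__(self)

  def before_all(self):
    # state set here is visible to every assertion method
    self.row_count = spark.read.parquet(self.file).count()

  def assertion_file_has_rows(self):
    assert self.row_count > 0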

Running test fixtures in parallel

Version 0.1.35 includes a parallel runner class, NutterFixtureParallelRunner, that executes test fixtures concurrently. This can significantly speed up your testing pipeline.

The following code executes two fixtures, CustomerTestFixture and CountryTestFixture, in parallel.

from runtime.runner import NutterFixtureParallelRunner
from runtime.nutterfixture import NutterFixture, tag
class CustomerTestFixture(NutterFixture):
   def run_customer_data_is_inserted(self):
      dbutils.notebook.run('../data/customer_data_import', 600)

   def assertion_customer_data_is_inserted(self):
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM customers')
      first_row = some_tbl.first()
      assert (first_row[0] == 1)

class CountryTestFixture(NutterFixture):
   def run_country_data_is_inserted(self):
      dbutils.notebook.run('../data/country_data_import', 600)

   def assertion_country_data_is_inserted(self):
      some_tbl = sqlContext.sql('SELECT COUNT(*) AS total FROM countries')
      first_row = some_tbl.first()
      assert (first_row[0] == 1)

parallel_runner = NutterFixtureParallelRunner(num_of_workers=2)
parallel_runner.add_test_fixture(CustomerTestFixture())
parallel_runner.add_test_fixture(CountryTestFixture())

result = parallel_runner.execute()
print(result.to_string())
# Comment out the next line (result.exit(dbutils)) to see the test result report from within the notebook
# result.exit(dbutils)

The parallel runner combines the test results of both fixtures in a single result.

Notebook: N/A - Lifecycle State: N/A, Result: N/A
Run Page URL: N/A
============================================================
PASSING TESTS
------------------------------------------------------------
country_data_is_inserted (11.446587234000617 seconds)
customer_data_is_inserted (11.53276599000128 seconds)


============================================================

Command took 11.67 seconds -- by [email protected] at 12/15/2022, 9:34:24 PM on Foo Cluster

Nutter CLI

The Nutter CLI is a command line interface that allows you to execute and list tests via a Command Prompt.

Getting Started with the Nutter CLI

Install the Nutter CLI

pip install nutter

Note: It's recommended to install the Nutter CLI in a virtual environment.

Set the environment variables.

Linux

export DATABRICKS_HOST=<HOST>
export DATABRICKS_TOKEN=<TOKEN>

Windows PowerShell

$env:DATABRICKS_HOST="HOST"
$env:DATABRICKS_TOKEN="TOKEN"

Note: For more information about personal access tokens, review Databricks API Authentication.

Listing test notebooks

The following command lists all test notebooks in the folder /dataload:

nutter list /dataload

Note: The Nutter CLI lists only test notebooks that follow the naming convention for Nutter test notebooks.

By default, the Nutter CLI lists test notebooks in the given folder, ignoring sub-folders.

You can list all test notebooks in the folder structure using the --recursive flag.

nutter list /dataload --recursive

Executing test notebooks

The run command schedules the execution of test notebooks and waits for their result.

Run single test notebook

The following command executes the test notebook /dataload/test_sourceLoad on the cluster 0123-12334-tonedabc with the notebook_params key-value pairs {"example_key_1": "example_value_1", "example_key_2": "example_value_2"} (note the escaped quotes):

nutter run dataload/test_sourceLoad --cluster_id 0123-12334-tonedabc --notebook_params "{\"example_key_1\": \"example_value_1\", \"example_key_2\": \"example_value_2\"}"

Note: In Azure Databricks you can get the cluster ID by selecting a cluster name from the Clusters tab and clicking on the JSON view.
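
Inside the test notebook, the values passed via --notebook_params surface as notebook widgets and can be read with dbutils.widgets.get, then forwarded to the notebook under test if needed. The following is a minimal sketch; the notebook path and assertion are placeholders:

# inside /dataload/test_sourceLoad
from runtime.nutterfixture import NutterFixture

example_value_1 = dbutils.widgets.get('example_key_1')
example_value_2 = dbutils.widgets.get('example_key_2')

class SourceLoadFixture(NutterFixture):
   def run_source_load(self):
      # forward the CLI parameters to the notebook under test
      dbutils.notebook.run('sourceLoad', 600, {'example_key_1': example_value_1,
                                               'example_key_2': example_value_2})

   def assertion_source_load(self):
      assert example_value_1 == 'example_value_1'

result = SourceLoadFixture().execute_tests()
print(result.to_string())
result.exit(dbutils)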

Run multiple tests notebooks

The Nutter CLI supports the execution of multiple notebooks via name pattern matching. The Nutter CLI applies the pattern to the name of the test notebook without the test_ prefix, and it expects you to omit that prefix when specifying the pattern.

Say the dataload folder contains the test notebooks test_srcLoad and test_srcValidation. The following command executes both tests, passing the notebook_params key-value pairs {"example_key_1": "example_value_1", "example_key_2": "example_value_2"}:

nutter run dataload/src* --cluster_id 0123-12334-tonedabc --notebook_params "{\"example_key_1\": \"example_value_1\", \"example_key_2\": \"example_value_2\"}" 

In addition, if you have tests in a hierarchical folder structure, you can recursively execute all tests by setting the --recursive flag.

The following command will execute all tests in the folder structure within the folder dataload.

nutter run dataload/ --cluster_id 0123-12334-tonedabc --recursive

Parallel Execution

By default the Nutter CLI executes the test notebooks sequentially. The execution is a blocking operation that returns when the job reaches a terminal state or when the timeout expires.

You can execute multiple notebooks in parallel by increasing the level of parallelism. The flag --max_parallel_tests controls the level of parallelism and determines the maximum number of tests that will be executed at the same time.

The following command executes all the tests in the dataload folder structure, and submits and waits for the execution of at most 2 tests in parallel.

nutter run dataload/ --cluster_id 0123-12334-tonedabc --recursive --max_parallel_tests 2

Note: Running test notebooks in parallel introduces the risk of race conditions when two or more test notebooks modify the same tables or files at the same time. Before increasing the level of parallelism, make sure that your test cases modify only tables or files that are used or referenced within the scope of their own test notebook.

Nutter CLI Syntax and Flags

Run Command

SYNOPSIS
    nutter run TEST_PATTERN CLUSTER_ID <flags>

POSITIONAL ARGUMENTS
    TEST_PATTERN
    CLUSTER_ID
FLAGS
    --timeout              Execution timeout in seconds. Integer value. Default is 120
    --junit_report         Create a JUnit XML report from the test results.
    --tags_report          Create a CSV report from the test results that includes the test cases tags.
    --max_parallel_tests   Sets the level of parallelism for test notebook execution.
    --recursive            Executes all tests in the hierarchical folder structure. 
    --poll_wait_time       Polling interval duration for notebook status. Default is 5 (5 seconds).
    --notebook_params      Allows parameters to be passed from the CLI tool to the test notebook. From the 
                           notebook, these parameters can then be accessed by the notebook using 
                           the 'dbutils.widgets.get('key')' syntax.

Note: You can also use flags syntax for POSITIONAL ARGUMENTS

List Command

NAME
    nutter list

SYNOPSIS
    nutter list PATH <flags>

POSITIONAL ARGUMENTS
    PATH
FLAGS
    --recursive         Lists all tests in the hierarchical folder structure.

Note: You can also use flags syntax for POSITIONAL ARGUMENTS

Integrating Nutter with Azure DevOps

You can run the Nutter CLI within an Azure DevOps pipeline. The Nutter CLI will exit with a non-zero code when a test case fails or the execution of the test notebook is not successful.

The following Azure DevOps pipeline installs nutter, recursively executes all tests in the workspace folder /Shared/ and publishes the test results.

Note: The pipeline expects the Databricks cluster ID, host and API token as pipeline variables.

# Starter Nutter pipeline

trigger:
- develop

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.5'

- script: |
    pip install nutter
  displayName: 'Install Nutter'

- script: |
    nutter run /Shared/ $CLUSTER --recursive --junit_report
  displayName: 'Execute Nutter'
  env:
      CLUSTER: $(clusterID)
      DATABRICKS_HOST: $(databricks_host)
      DATABRICKS_TOKEN: $(databricks_token)

- task: PublishTestResults@2
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/test-*.xml'
    testRunTitle: 'Publish Nutter results'
  condition: succeededOrFailed()

In some scenarios, the notebooks under test must be executed in a pre-configured test workspace, separate from the development one, that contains the necessary prerequisites such as test data, tables or mount points. In such scenarios, you can use the pipeline to deploy the notebooks to the test workspace before executing the tests with Nutter.

The following sample pipeline uses the Databricks CLI to publish the notebooks from the triggering branch to the test workspace.

# Starter Nutter pipeline

trigger:
- develop

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.5'

- task: configuredatabricks@0
  displayName: 'Configure Databricks CLI'
  inputs:
    url: $(databricks_host)
    token: $(databricks_token)

- task: deploynotebooks@0
  displayName: 'Publish notebooks to test workspace'
  inputs:
    notebooksFolderPath: '$(System.DefaultWorkingDirectory)/notebooks/nutter'
    workspaceFolder: '/Shared/nutter'

- script: |
    pip install nutter
  displayName: 'Install Nutter'

- script: |
    nutter run /Shared/ $CLUSTER --recursive --junit_report
  displayName: 'Execute Nutter'
  env:
      CLUSTER: $(clusterID)
      DATABRICKS_HOST: $(databricks_host)
      DATABRICKS_TOKEN: $(databricks_token)

- task: PublishTestResults@2
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/test-*.xml'
    testRunTitle: 'Publish Nutter results'
  condition: succeededOrFailed()

Debugging Locally

If using Visual Studio Code, you can use the example_launch.json file provided, editing the variables in the <> symbols to match your environment. You should then be able to use the debugger to see the test run results, much the same as you would in Azure DevOps.

Contributing

Contribution Tips

  • There's a known issue with VS Code and the latest version of pytest.
    • Please make sure that you install pytest 5.0.1.
    • If you installed pytest using VS Code, you are likely using the incorrect version. Run the following command to fix it:
pip install --force-reinstall pytest==5.0.1

Creating the wheel file and manually testing it locally

  1. Change directory to the root that contains setup.py
  2. Update the version in setup.py
  3. Run the following command: python3 setup.py sdist bdist_wheel
  4. (optional) Install the wheel locally by running: python3 -m pip install <path to the .whl file generated in the dist folder>

Contribution Guidelines

If you would like to become an active contributor to this project please follow the instructions provided in Microsoft Azure Projects Contribution Guidelines.


This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


nutter's Issues

Assertions on notebook context

Is there a way to make assertions on a notebook context "imported" with dbutils.notebook.run("some_nb", 600, {})?

In order to clarify, I can have a notebook some_nb.py like this:

# Databricks notebook source
# some_nb.py
MY_VAR = 100

and my nutter notebook test_some_nb.py:

# Databricks notebook source
# test_some_nb.py

# MAGIC %pip install nutter

# COMMAND ----------

from runtime.nutterfixture import NutterFixture

class TestSomeNb(NutterFixture):
    def before_all(self):
        dbutils.notebook.run("some_nb", 600, {})

    def assertion_value(self):
        assert MY_VAR == 100

# COMMAND ----------

result = TestSomeNb().execute_tests()
result.exit(dbutils)

This is obviously broken, since MY_VAR isn't in the scope of "test_some_nb.py", even though I have run the dependent notebook.

Is there a way to do this without explicitly running the notebook with the following command?

# COMMAND ----------

# MAGIC %run ./some_nb

Add Job Cluster Support

Instead of requiring a cluster ID to be provided during nutter execution, can job cluster support be added? In our use case, we would only need to supply a policy ID and not an entire cluster configuration, although if this is implemented it seems like it would make sense to support full-fledged integration with the Jobs 2.1 API and creating a job cluster.

Even if the parameter only allowed a cluster configuration to be passed through to the 2.1 API execution, it would be very useful in scenarios where you don't want your tests executing on an interactive cluster but instead running as a job on a job cluster.

If more details are required for this ask, please respond and I'll try to better articulate the request above.

Variables starting with 'run_' are detected as test cases and hence appear as failed tests in the result

Details

If a variable name starts with 'run_', it is detected as a test case, so a failed test appears in the result.

Code

from runtime.nutterfixture import NutterFixture, tag

class TestFixture(NutterFixture):
  
    def __init__(self):
      self.run_not_a_test = 1
      super().__init__()
      
    def assertion_test_01(self):
      assert True

result = TestFixture().execute_tests()
print(result.to_string(), type(result))

Given result

Notebook: N/A - Lifecycle State: N/A, Result: N/A
Run Page URL: N/A
============================================================
FAILING TESTS
------------------------------------------------------------
not_a_test (8.499999239575118e-06 seconds)

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-3c24d4c3-2c7b-40f5-b888-7fb218c2a6af/lib/python3.8/site-packages/runtime/testcase.py", line 59, in execute_test
    raise NoTestCasesFoundError(
runtime.testcase.NoTestCasesFoundError: Both a run and an assertion are required for every test

NoTestCasesFoundError: Both a run and an assertion are required for every test

PASSING TESTS
------------------------------------------------------------
test_01 (3.399998604436405e-06 seconds)

============================================================
 <class 'common.testexecresults.TestExecResults'>

Expected Result

Notebook: N/A - Lifecycle State: N/A, Result: N/A
Run Page URL: N/A
============================================================
PASSING TESTS
------------------------------------------------------------
test_01 (5.500000042957254e-06 seconds)

============================================================
 <class 'common.testexecresults.TestExecResults'>

Is it possible to run a test case without running a notebook

In our production pipeline, we save intermediate data frames to Parquet.

A comparison of a Parquet file and an expected file is how we plan to test the code. It's not unit tests per se but close enough.
Can Nutter be used in such a use case, where the pipeline runs fully and then tests are performed on the intermediate data frame representations?

allow using DATABRICKS_AAD_TOKEN

Hi,

We recently switched to using a service principal and an Azure AD token instead of a PAT.
However, this is causing errors in Nutter as it is hardcoded to use DATABRICKS_TOKEN, which we no longer use.

Could you please add the option to use the Azure AD token?

Also, is the usage of the DATABRICKS_HOST and DATABRICKS_TOKEN variables actually needed? Can't it use the Databricks CLI with the configuration I made in an earlier step of our DevOps pipeline?

Nutter sometimes reports a client error 400, but the test notebook is still running

nutter sometimes fails with the following error while waiting for a test notebook to run:

--> Execution request: /Shared/Temp-CI/testing/test_MAIN
CRITICAL:NutterCLI:400 Client Error: Bad Request for url: https://adb-6447267702748866.6.azuredatabricks.net/api/2.0/jobs/runs/get-output?run_id=3052
 Response from server: 
 { 'error_code': 'INVALID_STATE',
  'message': 'Run result is empty. There may have been issues while saving or '
             'reading results.'}

However when I look at the state of the run mentioned in the error, it is in fact still running:

databricks runs get-output --run-id 3052
{
  "metadata": {
    "job_id": 3052,
    "run_id": 3052,
    "number_in_job": 1,
    "state": {
      "life_cycle_state": "RUNNING",
      "state_message": "In run"
    },
    "task": {
      "notebook_task": {
        "notebook_path": "/Shared/Temp-CI/testing/test_MAIN"
      }
    },
    "cluster_spec": {
      "existing_cluster_id": "0915-120444-foe167"
    },
    "cluster_instance": {
      "cluster_id": "0915-120444-foe167",
      "spark_context_id": "8879206558517982464"
    },
    "start_time": 1603279418051,
    "setup_duration": 3000,
    "execution_duration": 0,
    "cleanup_duration": 0,
    "creator_user_name": "[email protected]",
    "run_name": "dd2007c3-138f-11eb-a099-000d3adb5ca4",
    "run_page_url": "https://northeurope.azuredatabricks.net/?o=6447267702748866#job/3052/run/1",
    "run_type": "SUBMIT_RUN"
  },
  "notebook_output": {}
}

This does not happen consistently.

Is it possible to generate code coverage with nutter?

I am trying to integrate unit tests, code coverage and SonarCloud. I am looking for a way to generate code coverage reports.

When I tried to run code coverage using the Code Coverage API like this:

def run_tests():
    cov = coverage.Coverage()
    cov.start(source=)

    # TESTS!

    cov.stop()
    cov.save()
    cov.html_report(directory='/dbfs/Users/Vinura/coverage_report')

The following happens: it runs the code coverage on the cluster root. So, is it possible to use nutter to generate code coverage reports?


Unable to import NutterFixture from runtime.nutterfixture in Visual Studio Code

Visual Studio Code version - 1.45.1
Anaconda version - 1.9.12
Nutter version - 0.1.33
Issue details - When trying to import NutterFixture locally in VS Code (specifically using the line 'from runtime.nutterfixture import NutterFixture'), import fails with "ImportError DLL load failed: The file cannot be accessed by the system". When running the same file directly in my anaconda terminal, no error occurs and the file runs successfully.

Proposal: parallel test execution

This is a proposal to implement a new helper class that would allow executing tests in parallel from a test notebook.

The use case: we have a fairly large number of tests to run (40) and each test takes about 3 minutes. We would like to run them in parallel. However, if we use client-side parallelisation via the CLI, we need to create 40 notebooks (one per test), which is not very maintainable. It would be easier for us to have a single test notebook that executes the tests in parallel on the cluster side.

Here is an idea of how the helper class could be used:

from runtime.runner import NutterRunner

all_tests = []

for d in test_data:
  if 'test1' in d:
    test = TestNotebookForTest1(d, other_params)
  elif 'test2' in d:
    test = TestNotebookForTest2(d, other_params)
  else:
    test = TestNotebookForOtherTest(d, other_params)
  all_tests.append(test)

# Run 8 tests in parallel
runner = NutterRunner(all_tests, 8)
all_results = runner.execute_tests()

print(all_results.to_string())

The signature for the constructor could look like this:

class NutterRunner(object):
    def __init__(self, tests, num_of_workers=1):
      # ...

Under the hood, the helper class would leverage the existing Scheduler class which already has an implementation of parallel workers.

Feedback welcome!

Parameterized tests don't execute before_all and after_all for each iteration

If you create a unit test using parameterized, it doesn't execute before_all and after_all for each iteration (the behavior expected in most common test frameworks). During my tests I could see that they execute once for each test and not for each iteration.

I couldn't find any docs saying it is compatible with the parameterized library. Does anyone know of an option?

See the example below:

import uuid
from parameterized import parameterized
from runtime.nutterfixture import NutterFixture, tag

class TestParam(NutterFixture):      
    def before_all(self): 
        self.random_name = uuid.uuid4().hex
        print(f"started [{self.random_name}]")

    @parameterized.expand([ ("AAA"), ("BBB"), ("CCC")])
    def assertion_test(self, param1):
        print(f"processing [{param1}] with [{self.random_name}]")
        assert param1==param1
    
    def after_all(self): 
        print(f"finished [{self.random_name}]")

result = TestParam().execute_tests()
print(result.to_string())

Current results:

started [23c4c652b5364d44a0bcac132df51317]
processing [AAA] with [23c4c652b5364d44a0bcac132df51317]
processing [BBB] with [23c4c652b5364d44a0bcac132df51317]
processing [CCC] with [23c4c652b5364d44a0bcac132df51317]
finished [23c4c652b5364d44a0bcac132df51317]

Notebook: N/A - Lifecycle State: N/A, Result: N/A
Run Page URL: N/A
============================================================
PASSING TESTS
------------------------------------------------------------
test_0_AAA (1.4899997040629387e-05 seconds)
test_1_BBB (9.099996532313526e-06 seconds)
test_2_CCC (7.400005415547639e-06 seconds)
============================================================

Expected results:

started [23c4c652b5364d44a0bcac132df51317]
processing [AAA] with [23c4c652b5364d44a0bcac132df51317]
finished [23c4c652b5364d44a0bcac132df51317]

started [9999c652b5364d44a0bcac132df59999]
processing [BBB] with [9999c652b5364d44a0bcac132df59999]
finished [9999c652b5364d44a0bcac132df59999]

started [aaaac652b5364d44a0bcac132df5aaaa]
processing [CCC] with [aaaac652b5364d44a0bcac132df5aaaa]
finished [aaaac652b5364d44a0bcac132df5aaaa]

Notebook: N/A - Lifecycle State: N/A, Result: N/A
Run Page URL: N/A
============================================================
PASSING TESTS
------------------------------------------------------------
test_0_AAA (1.4899997040629387e-05 seconds)
test_1_BBB (9.099996532313526e-06 seconds)
test_2_CCC (7.400005415547639e-06 seconds)
============================================================

Is it possible to parametrise assertion names

Is it possible to insert variables within assertion function names? In the example below, the function name would change based on the value of self.feature:

self.feature = 'abc'

  def assertion_{self.feature}_test(self):
      assert 1 == 1

Improve error message when the call to the parent class constructor is missing in a test fixture

Current behavior:
When a test fixture is created without calling the parent class initializer, e.g.:

class TestTransform(NutterFixture):
  def __init__(self):
    pass
  def run_testcase(self):
    pass
  def assertion_testcase(self):
    assert True

An attribute error is displayed when executing the test:

AttributeError: 'TestTransform' object has no attribute 'data_loader'
....

Expected:

It should return a custom exception with a better error message, e.g.:

InitializationError: If you have an __init__ method in your test class, please include a call to initialize the parent class. For example: NutterFixture.__init__(self)

Build results unexpectedly return no test cases returned

While running a build in Azure DevOps, nutter is occasionally returning a result of "No test cases were returned." and a lengthy encrypted string as the notebook output. This generally can be resolved by rerunning the build, and seems to be inconsistent in its behavior. The notebook output in the build step is similar to the output if you execute a test locally without commenting out result.exit(dbutils), except it is roughly 5x the length.

I cannot consistently reproduce this behavior, and cannot reproduce it when running the test notebook locally. When running locally, the test cases within the notebook succeed without issue.

Must the run and assertion functions have the same name?

Looks like nutter only works when two functions are defined:

def run_test():
  pass

def assert_test():
  pass

When the two functions have different endings, the user gets an exception. However, it makes sense to run the notebook once and then test multiple assertions. Is this use case supported?

Unable to use an SPN for auth when running tests

Azure Databricks allows for authentication via SPN as well as PAT, but currently Nutter only supports the PAT token auth approach.

This recently blocked us from using Nutter on an engagement with a customer because they wouldn't allow us to generate PAT tokens and insisted that we authenticate via SPN.

It would be really great if the Nutter CLI could be configured to use the standard SPN auth flow of the current user, instead of only allowing PAT.

Adapt code so that it can also set access_list_permissions (added in Jobs API 2.1)

Currently, when you let Nutter run the tests, it starts jobs in Databricks (using the 2.0 Jobs API). However, one of the big downsides of 2.0 vs 2.1 is that you cannot set permissions on your job in 2.0.

What often happens is that the Nutter tests run and show that an error was made, but then only admins in Databricks can actually see the results of the test (that is the default setting for jobs); others will see 'job not found'. Because we generally run these tests in multiple environments, and not everyone can have admin access in all of those environments (we do not give admin access at all in production), this is a pretty big disadvantage.

If we could also set access_list_permissions on the job, this would be mitigated. The underlying package databricks_api, which uses the package databricks_cli, already allows for this.

Relevant nutter code in common/apiclient.py:

db = DatabricksAPI(host=config.host,
                   token=config.token
                   # NEW PROPOSED ADDITION -> add jobs_api_version here as an extra setting
                   )
self.inner_dbclient = db

and

runid = self._retrier.execute(self.inner_dbclient.jobs.submit_run,
                              run_name=name,
                              existing_cluster_id=cluster_id,
                              notebook_task=ntask,
                              # NEW PROPOSED ADDITION -> add access_list_permissions here
                              )

Relevant databricks_api code:

class DatabricksAPI:
    def __init__(self, **kwargs):
        if "host" in kwargs:
            if not kwargs["host"].startswith("https://"):
                kwargs["host"] = "https://" + kwargs["host"]

        self.client = ApiClient(**kwargs)

Relevant databricks_cli SDK code:

class ApiClient(object):
    """
    A partial Python implementation of dbc rest api
    to be used by different versions of the client.
    """
    def __init__(self, user=None, password=None, host=None, token=None,
                 api_version=version.API_VERSION, default_headers={}, verify=True, command_name="", jobs_api_version=None):

And then nutter run needs to accept an extra, optional argument for these access_list_permissions.
