nlu.devops's Introduction

NLU.DevOps

Continuous integration and deployment of NLU models.

Getting Started

Getting Started with the NLU.DevOps Azure DevOps extension

We have published an Azure DevOps extension that wraps the steps below into three pipeline tasks for training, testing and deleting your NLU model. To get started, install the NLU.DevOps extension to your Azure DevOps organization.

See the Azure DevOps extension overview for more details.

Detailed information for each Azure Pipelines task can be found in the docs folder.

Getting Started with the NLU.DevOps CLI

To install the NLU.DevOps CLI tool, run:

dotnet tool install -g dotnet-nlu

This will install the CLI tool to your default .NET Core tools path. See the documentation on the dotnet tool install command for more information on how to customize the installation directory or package source.
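
For example, assuming a local ./tools directory and a custom NuGet feed (both hypothetical here), the --tool-path and --add-source options control where the tool is installed and where the package is resolved from:

dotnet tool install dotnet-nlu --tool-path ./tools --add-source <custom package source>

Note that --tool-path replaces the -g flag when installing to a specific directory.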

The CLI tool by default supports training and testing NLU models against LUIS and Lex.

Detailed information on the CLI tool sub-commands and arguments can be found in the docs folder.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

For more information on reporting potential security vulnerabilities, see the Security overview.

nlu.devops's People

Contributors

bsherwin, esgraham, in4margaret, microsoftopensource, mokarian, msftgits, rozele, sabeegrewal, saroup

nlu.devops's Issues

Change NLU.DevOps.ModelPerformance to generic JSON compare

Change NLU.DevOps.ModelPerformance to a configurable JSON compare library such that any two JSON values can be compared.

We could have a simple interface like IConfusionMatrixEvaluator:

public interface IConfusionMatrixEvaluator
{
    IEnumerable<TestCase> Evaluate(JToken expected, JToken actual);
}

E.g., for NLU today, we would create the following config:

{
  "intent": "default",
  "text": "string-no-punctuation",
  "entities": "NLU.DevOps.ModelPerformance.Comparers.EntitiesEvaluator"
}

Where EntitiesEvaluator is an implementation of IConfusionMatrixEvaluator, and string-no-punctuation and default are syntactic sugar for other evaluator implementations.

We may want to have an option that allows you to evaluate all JSON properties in the expected and actual JSON values (not just the configured properties), in which case any unconfigured property value would just use the default evaluator.
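
For illustration, here is a minimal sketch of what the default evaluator could look like under this design. The TestCase stand-in below is simplified; the real type in NLU.DevOps.ModelPerformance carries more detail.

using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Simplified stand-ins for illustration only.
public enum ConfusionMatrixResult { TruePositive, FalsePositive, FalseNegative }

public class TestCase
{
    public TestCase(string property, ConfusionMatrixResult result)
    {
        this.Property = property;
        this.Result = result;
    }

    public string Property { get; }

    public ConfusionMatrixResult Result { get; }
}

public interface IConfusionMatrixEvaluator
{
    IEnumerable<TestCase> Evaluate(JToken expected, JToken actual);
}

// The "default" evaluator could simply be deep equality on the JSON values.
public class DefaultEvaluator : IConfusionMatrixEvaluator
{
    public IEnumerable<TestCase> Evaluate(JToken expected, JToken actual)
    {
        var result = JToken.DeepEquals(expected, actual)
            ? ConfusionMatrixResult.TruePositive
            : ConfusionMatrixResult.FalsePositive;
        yield return new TestCase((expected ?? actual).Path, result);
    }
}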

Add support for complex entity matching

Currently, we're only matching simple entities that have at most the recognized string in the text and a flat semantic string value. We'll want to add support for composite entities that are made up of groups of tokens in an utterance, or even entities that map to more complicated semantic values (for instance, dates).

Add .nuspec so we can easily create a package with PluginLoader

Ideally, we'd leverage McMaster.DotNetCore.Plugins to resolve the NLU providers rather than the custom solution we have now. The approach works, but when we include the provider assemblies with the approach outlined here, the generated .nupkg does not include the providers subfolder in the NLU.DevOps.CommandLine output.

The way to work around this is likely to provide a .nuspec that we can use to ensure the providers folder is included.

Add support for LUIS batch test format

The LUIS batch test format is slightly different from the generic utterance model. Namely:

{
    "intent": "<intent label>",
    "text": "<utterance text>",
    "entities": [
        {
            "entity": "<entity match text>",
            "startPos": <start pos>,
            "endPos": <end pos>
        }
    ]
}

So, "matchText" is replaced by "entity" and "matchIndex" is replaced by "startPos" and "endPos".

Update to the latest LUIS SDK for .NET

The LUIS SDK has been updated, and it now also points to the V3 API. We should determine whether we need to keep a V2 option or whether we should deprecate and remove it altogether.

Investigate multi-intent support for NLU.DevOps

Some NLU providers may support multiple intents in a single utterance. E.g., "Book a flight to Paris and find me a hotel room". We may want to consider supporting these types of NLU providers. First step is to investigate what multi-intent providers are out there, and what changes we would need to make in order to support this.

Add scenario driven documentation

Current documentation covers variable setting and CLI usage, but scenario driven documentation would be more useful. E.g., adding a doc that talks about when it's worth using the generic utterance model for training, or adding a doc that talks about configuring pipelines to detect regressions.

Add support for loading models from LUDown

LUDown is a concise format for representing NLU models. We should consider adding an option to load NLU models from LUDown, as well as an option to define test sets in LUDown.

ResourceExhausted StatusCode when running a large test set on Dialogflow

  • Dialogflow Standard Edition has a quota limit of 180 text query operations per minute (a client-side throttling sketch follows the error below)
  • Run a large test set (~300 test utterances)
  • Error:
    Unhandled Exception: System.AggregateException: One or more errors occurred. (Status(StatusCode=ResourceExhausted, Detail="Quota exceeded for quota metric 'dialogflow.googleapis.com/StandardTierTextQueryOperations' and limit 'StandardTierTextQueryOperationsPerMinutePerProject' of service 'dialogflow.googleapis.com' for consumer 'project_number:281834563702'.")) ---> Grpc.Core.RpcException: Status(StatusCode=ResourceExhausted, Detail="Quota exceeded for quota metric 'dialogflow.googleapis.com/StandardTierTextQueryOperations' and limit 'StandardTierTextQueryOperationsPerMinutePerProject' of service 'dialogflow.googleapis.com' for consumer 'project_number:281834563702'.")
    at Google.Api.Gax.Grpc.ApiCallRetryExtensions.<>c__DisplayClass0_0`2.<b__0>d.MoveNext()
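
A minimal sketch of a client-side workaround, assuming a hypothetical delegate that wraps the Dialogflow call; capping throughput at roughly 3 requests per second keeps a large test set under the 180-per-minute quota:

using System;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Three slots, each held for one second after a call completes, so at most
    // ~3 calls start per second (~180 per minute).
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(3);

    public static async Task<T> RunAsync<T>(Func<Task<T>> detectIntentAsync)
    {
        await Gate.WaitAsync();
        try
        {
            return await detectIntentAsync();
        }
        finally
        {
            // Release the slot one second after the call completes.
            _ = Task.Delay(TimeSpan.FromSeconds(1)).ContinueWith(_ => Gate.Release());
        }
    }
}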

Allow `--service-settings` to take a raw LUIS JSON app definition

Currently, you need to wrap the LUIS JSON in a top-level property called appTemplate in the service settings. We should allow users to just use a vanilla LUIS JSON app definition. One option could be to check for a top-level property like luis_schema_version, and if it's there, assume we're just loading from a vanilla LUIS JSON file.
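
A minimal sketch of that check, assuming the service settings have been parsed into a JObject:

using Newtonsoft.Json.Linq;

public static class LuisSettings
{
    // If luis_schema_version appears at the top level, treat the settings as a
    // vanilla LUIS app definition; otherwise fall back to the appTemplate wrapper.
    public static JToken GetAppTemplate(JObject serviceSettings)
    {
        return serviceSettings.ContainsKey("luis_schema_version")
            ? serviceSettings
            : serviceSettings["appTemplate"];
    }
}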

Add documentation on alternatives for training models

You can use other tools for importing NLU models to LUIS or Lex, and only use NLU.DevOps to test your model. We should add a doc that discusses how to do this (e.g., using any LUIS or Lex CLI) and potentially include a sample Azure Pipelines YAML.

Test command does not return highest scoring intent in results

Tests run against a model fail because the tool reports that the wrong intent was returned ('AddItem' rather than 'None'). Running the utterance manually in the LUIS single test window shows the correct intent ('None') being returned from LUIS.
Perhaps the tool selects the first intent in alphabetical order rather than the highest scoring intent?
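
For reference, a minimal sketch of the expected selection logic, assuming a hypothetical (Intent, Score) result shape; ordering by score rather than by name would return 'None' here:

using System.Collections.Generic;
using System.Linq;

public static class IntentSelection
{
    // Pick the intent with the highest confidence score, not the first
    // alphabetically.
    public static string TopScoringIntent(IEnumerable<(string Intent, double Score)> intents)
    {
        return intents.OrderByDescending(i => i.Score).First().Intent;
    }
}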

Add configuration class that manages the LUIS app settings

There are a lot of LUIS app settings, and currently there is some spaghetti code in the LuisNLUServiceFactory that extracts the app name, version ID, keys, etc. The configuration validation logic is also split between the factory, the NLU service, and the LUIS client wrapper. We should at least put all the logic into a LuisConfiguration class and ideally put all the validation in a single place (with methods to validate for training, testing, and speech).
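
A minimal sketch of what such a class might look like, assuming hypothetical setting names (the real keys may differ):

using System;
using Microsoft.Extensions.Configuration;

public class LuisConfiguration
{
    private readonly IConfiguration configuration;

    public LuisConfiguration(IConfiguration configuration)
    {
        this.configuration = configuration;
    }

    // Hypothetical setting names for illustration.
    public string AppName => this.configuration["luisAppName"];

    public string VersionId => this.configuration["luisVersionId"];

    public string AuthoringKey => this.configuration["luisAuthoringKey"];

    // One validation entry point per scenario, rather than checks scattered
    // across the factory, the NLU service, and the client wrapper.
    public void ValidateForTraining()
    {
        if (string.IsNullOrEmpty(this.AuthoringKey))
        {
            throw new InvalidOperationException("Missing 'luisAuthoringKey' setting.");
        }
    }
}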

Add support for query data (e.g., LUIS dynamic lists)

Currently, only string queries are supported. It would be more interesting if each service could define its own query capabilities. The interface might change to something like:

public interface INLUService
{
    LabeledUtterance Test(IQuery query);
}
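
Each provider could then supply its own IQuery implementation. A minimal sketch, with a hypothetical LUIS-specific query type carrying dynamic list data:

using System.Collections.Generic;

public interface IQuery
{
    // The raw utterance text common to all providers.
    string Text { get; }
}

// Hypothetical LUIS-specific query carrying dynamic list entries.
public class LuisQuery : IQuery
{
    public LuisQuery(string text, IReadOnlyList<string> dynamicListEntries)
    {
        this.Text = text;
        this.DynamicListEntries = dynamicListEntries;
    }

    public string Text { get; }

    public IReadOnlyList<string> DynamicListEntries { get; }
}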

Move NLU pipeline to Windows Build Agent to work around bug

There is a bug with downloading artifacts on a Mac/Linux build agent, which prevents us from running the NLU pipeline on one of those agents. There is nothing specific in the nlu.yml pipeline that requires *nix, so we should move it over to Windows.

TooManyRequests error while polling for training status

Training jobs not complete. Polling again.
trce: NLU.DevOps.Luis.LuisNLUTrainClient[0]
Training jobs not complete. Polling again.
trce: NLU.DevOps.Luis.LuisNLUTrainClient[0]
Training jobs not complete. Polling again.
trce: NLU.DevOps.Luis.LuisNLUTrainClient[0]
Training jobs not complete. Polling again.
trce: NLU.DevOps.Luis.LuisNLUTrainClient[0]
Training jobs not complete. Polling again.
trce: NLU.DevOps.Luis.LuisNLUTrainClient[0]
Training jobs not complete. Polling again.

Unhandled Exception: System.AggregateException: One or more errors occurred. (Operation returned an invalid status code 'TooManyRequests') ---> Microsoft.Azure.CognitiveServices.Language.LUIS.Authoring.Models.ErrorResponseException: Operation returned an invalid status code 'TooManyRequests'
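
A minimal sketch of polling with exponential backoff, assuming a hypothetical delegate that reports training status; backing off between polls keeps the authoring API under its rate limit:

using System;
using System.Threading.Tasks;

public static class TrainingPoller
{
    public static async Task WaitForTrainingAsync(Func<Task<bool>> isTrainingCompleteAsync)
    {
        var delay = TimeSpan.FromSeconds(2);
        while (!await isTrainingCompleteAsync())
        {
            await Task.Delay(delay);

            // Double the wait after each incomplete poll, capped at 30 seconds.
            delay = TimeSpan.FromSeconds(Math.Min(delay.TotalSeconds * 2, 30));
        }
    }
}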

Add end-to-end testing for Bot Framework with NLU.DevOps

With the upcoming changes to NLU.DevOps.ModelPerformance, we'll be able to support arbitrary testing of JSON results from NLU. We should add a provider, NLU.DevOps.BotService, that supports both Direct Line via Bot Framework and Direct Line Speech via the Cognitive Services Speech SDK. The scenario would be as follows; consider the following tests:

[
  {
      "text": "add two avocados to my cart",
      "conversationID": 1,
      "userID": 1,
      "attachments": [
        [
           {
              "name": "avocado",
              "quantity": 2
           }
        ]
      ]
  },
  {
      "text": "remove one avocado and add two dozen eggs",
      "conversationID": 1,
      "userID": 1,
      "attachments": [
        [
           {
              "name": "avocado",
              "quantity": 1
           },
           {
              "name": "12 eggs",
              "quantity": "2"
           }
        ]
      ]
  }
]

In this case, we will set up the direct line channel to the bot and send two user queries "add two avocados to my cart" and "remove one avocado and add two dozen eggs".

The output may not contain any LUIS or other NLU provider-specific info, but would instead focus on the end state contained in the activity.

Include confidence scores in results output

Currently, dotnet nlu test outputs the top scoring intent with no confidence score. The same goes for recognized entities. For better analysis, it may be worthwhile to output the confidence scores.
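
For illustration, a minimal sketch of what a scored result entry might look like, with hypothetical score fields alongside the generic model's existing properties:

using Newtonsoft.Json.Linq;

public static class ScoredResults
{
    public static JObject Example()
    {
        return new JObject
        {
            ["text"] = "add two avocados to my cart",
            ["intent"] = "AddItem",
            ["intentScore"] = 0.97, // hypothetical top intent confidence
            ["entities"] = new JArray(
                new JObject
                {
                    ["entityType"] = "Item",
                    ["matchText"] = "avocados",
                    ["entityScore"] = 0.85 // hypothetical per-entity confidence
                })
        };
    }
}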
