microsoft / dstoolkit-mlops-base Goto Github PK

View Code? Open in Web Editor NEW

89.0 23.0 39.0 1.72 MB

Support ML teams to accelerate their model deployment to production leveraging Azure

License: MIT License

Jupyter Notebook 25.04% Python 74.96%

mlops azure-machine-learning dstoolkit

dstoolkit-mlops-base's People

Stargazers

Watchers

Forkers

kyoro1 dhinagaran-s rdzotz deepakas farah-saab ljq2021ms malikamalik mariamedp yvonnebarthp eserinanarslan giorgiococci michelole hajjim markverheulmicrosoft povelfk datanateb dalesayers mvbugge joanassantos adriangonzalezsanchez midwilshire adrianfz test-mass-forker-org-1 marcoabcardoso tectonia tuscar2001 gstw7 numericx lucacaivano gallogiulia snikjou jackman337 jueun0819 yj90 nina101 benalexkeen ak-dwivedi pavankumar-tanikella sathishcyberintelsys

dstoolkit-mlops-base's Issues

Remove the temporary fixed for Azure CLI 2.30.0

Work-around for auto-testing after PR

I put my PR today, and auto-testing failed, because we cannot use ACI(Azure Container Instances) in a region.
Can we have any work-around for the above like re-try the auto-testing?

FYI, I didn't change any python codes in the PR, and the processing flows must not be changed.

Add instructions on how to use self-hosted agents in Azure DevOps

The template currently uses Microsoft-hosted agents to run pipelines in Azure DevOps, which is the simplest way to run the jobs and very useful to set up a quick MLOps demo/showcase. However, our customer usually has some specific requirements, for example, security configuration, dependent software needed, etc.; and it can be easier with self-hosted as it gives us more control. Also, the private agent has performance advantages, for example, the ability to run incremental builds, start a job faster, etc.

The documentation needs to have instructions on how to set up a self-hosted agent and how to modify the template pipelines to use it.

Create Github Action Pipelines

We want to extend the devops pipeline to integrate github actions and infrastructure-as-code with terraform scripts. The resulting devops repo may follow this structure:

devops-pipelines

.ado
- pipeline-0 IaC with ARM templates (current)
- pipeline-1
- ....
.github
- pipeline-0 IaC with Terraform templates
- pipeline-1
- ....

Create CONTIBUTING file

We should have a CONTRIBUTING.md file to explain how to contribute to the repo, and remove the section from the README.
Example: https://github.com/microsoft/solution-accelerator-many-models/blob/master/CONTRIBUTING.md

It'd be nice to have detailed instructions (or links to official instructions, depending on the case) on how to:

Fork the repo or create a branch
Submit a bug
Create a pull request
Set up a test environment and pipelines so contributors can make sure everything works before submitting the PR.

To be adapted from the Contribution Guide that @FlorianPydde created in the internal wiki.

The latest version of az cli (2.30.0) break while running az commands

azure-cli 2.30.0 throws ERROR: {'Error': TypeError("init() got an unexpected keyword argument 'async_persist'",)} while running the pipeline: dstoolkit-mlops-base/invoke-aml-pipeline.template.yml at main · microsoft/dstoolkit-mlops-base (github.com)

As a workaround, we found that the az cli 2.30.0 will install azure-cli-ml version 1.5.0 as default
{
"experimental": false,
"extensionType": "whl",
"name": "azure-cli-ml",
"path": "/opt/az/azcliextensions/azure-cli-ml",
"preview": false,
"version": "1.5.0"
}
We need to downgrade the azure-cli 2.29.2 and the azure-cli-ml version 1.33.1 is installed correctly.

Deploy AML webservices and AML pipelines using deployment jobs

Deployment jobs in YAML pipelines, to be able to manage environments, define deployment strategies, etc.

Add extra environment version with custom Docker image

Add an example of environment with custom Dockerfile as another folder inside configuration/environments/.

People will then just need to change the AML_TRAINING_ENV_PATH / AML_BATCHINFERENCE_ENV_PATH in configuration/configuration-aml.variables.yml to this new path to use a custom Docker image for the AML pipelines. We should add instructions on how to do this and how to configure the environment (links to the official AML docs?) in docs/how-to.

Fix pipeline trigger

Pipelines are triggered when changes are made to documentation which should be disregarded

Artifact `DeploymentCode` already exists when running modeling pipeline

Pipeline fails because the artifacts in the three deployment stages are called the same

Model deployment with Docker image

The template currently relies on the azureml SDK to natively deploy the model as a real-time webservice in a selected compute, using Model.deploy. A common request from client is to provide a Docker file that a production team can deploy with a higher degree of flexibility (pod security, management, etc).

The template needs to implement a second scenario which leverages the Model.package functionality to create a Docker imagefile.

It would be nice to have a parameter in the deploy-model YAML template to choose which type of deployment the user wants:

Native deployment: the pipeline deploys the model in a webservice using AML, runs a smoke test, etc. (current behavior).
Docker image: the pipeline generates an artifact with this packaged model instead of deploying it as a webservice.

After the package has been created, a kubectl command may connect to the targeted AKS and run the docker image

Data preparation step in training pipeline

Add data prep as initial step in the training pipeline, where all feature engineering and train-test split work will be done. Providing by default the train and holdout test datasets will enforce good practices and avoid data leakage, thus accelerating the model performance analysis and reporting.

Train sub-dataset should be redirected to the train step (2nd step in the pipeline), and test sub-dataset should be redirected to the evaluation step (3rd in the pipeline). As a result, evaluation step should be modified to include the generation of evaluation metrics, while comparison with the current active model should be done later (as a part of the register step? or include a compare step in between?).

The train step can still have its own data splitting mechanism inside, to do any type of cross-validation needed to select the best model from all the approaches tested out.

Not found `SetupCICD.md` page

When I click the red rectangle part, it points the following URL

https://github.com/microsoft/dstoolkit-mlops-base/blob/main/docs/how-to/SetupCICD.md

But, we cannot find such page:

Support for local prediction

Currently, score.py is the only "src" file that has no main method and thus cannot be easily run locally.

It would be nice to to have that to ease testing during development.

Create AML pipeline endpoints when publishing pipelines

Pipeline endpoints give a fixed REST endpoint while ML pipelines are versioned and updated underneath.
Useful for invoking them from Azure Data Factory for example.

Configure the minimum TLS version for a storage account

Azure Storage is being created with TLS 1.0, the recommendation is to create storage with TLS 1.2
Key Vault is being created with public access, the recommendation is to create with private access

Common functionalities wrapping

As you see our github repositories, there are some similar functionalities like

https://github.com/microsoft/dstoolkit-classification-solution-accelerator/blob/main/src/utils.py
and
https://github.com/microsoft/dstoolkit-mlops-base/blob/main/src/utils.py

They define common functions like generating Workspace, getting Datasets etc.

The proposition is to come up with a pypi package that utilizes common functionalities (see attached ppt)
common_function_dstoolkit.pptx

Use MLflow as logging system

MLflow is becoming the most common model management library. Enabling it by default in the template is a requirement as the new AML SDK version will rely on it more heavily.

Changes needed:

Change AML run.log() methods by mlflow equivalent.
Use mlflow to log models as well.
Add examples on how to log images in experiment.

Instructions in AML docs: https://docs.microsoft.com/azure/machine-learning/how-to-use-mlflow

Refactor Compute Configuration files

Feature Request - Provide High Level Deployment method to higher environments

Hi,

I've used this for demoing to my customer and I think it would great to show how the azure-pipelines can be used to deploy to higher environments using the recommended approach of "compile once promote everywhere" off of the main branch.

As I am new to ML Ops, I'm not sure the recommended approach for deploying to higher environments

Should the training be part of the "compile once" continuous integration/build phase

dstoolkit-mlops-base/azure-pipelines/PIPELINE-1-modeling.yml

Line 76 in 322f451

- stage: training

and these pieces

dstoolkit-mlops-base/azure-pipelines/PIPELINE-1-modeling.yml

Line 104 in 322f451

#########################################

be part of the continuous deployment/promote everywhere

The high level of what I'm trying to understand is how the batch inference and training pipeline should fit into this flow

Common service connection issue forinvoke pipeline task

In the pipeline azure-pipelines/templates/utils/invoke-aml-pipeline.template.yml

- task: ms-air-aiagility.vss-services-azureml.azureml-restApi-task.MLPublishedPipelineRestAPITask@0

Reading online, and from my own experiences, the task will not run unless given a specific machine learning workspace service principal connection in DevOps. A typical service principal connection will not suffice.

No error message is available, hence this issue needs to be documented

Handle Multiple Models in deployment

When training multiple models, the ado pipelines should be able to deploy all trained models into other environments.

The change need to be applied to:

Training and registering script: should be able to save and register multiple models
Deployment scripts: multiple models should be deployed to the next stage (view open branch: https://github.com/microsoft/dstoolkit-mlops-base/blob/feature/62-many-model-deployment/operation/execution/deploy_model.py)
Scoring scripts: should be able to link to register models
Configuration file: should be able to contain a list for aml_model variable (see example AML_models: https://github.com/microsoft/dstoolkit-mlops-base/blob/feature/62-many-model-deployment/configuration/configuration-aml.variables.yml)
pipeline files: should be able to accept lists as arguments

Artefact migration between AML workspaces

Currently the template reruns the scripts in different environments. Although it ensures that automate retraining process works, this functionality should be defined as an integration test on a sample set rather than a mean of promoting artefacts. The template needs to implement a process that download and reuploads artefacts to the next AML workspace. This will lower cost and time to production.

Duplicated utils code

Some of the src/utils.py code seems to be duplicated inside the operations folder:

This is a bit confusing when deploying the template.

Upgrade to latest version of azureml-sdk

Upgrade SDK version to latest v1 version and make sure everything still runs.
As of today, latest version is 1.49.0, released on Feb 14, 2023: https://pypi.org/project/azureml-sdk/

Integrate many models with MLOPS pipeline

It will be good to integrate many models with mlops. Maybe create a folder in notebooks and src directory for different type of ml usecases ? https://github.com/microsoft/solution-accelerator-many-models. Folder name can be basic, manymodels etc .

Unit testing for AMLcomponents (test connection to workspace,)

Testing AML component before pushing new changes will improve infrastructure stability monitoring and accelerate delivery.

Change default auth method from pipelines to Service Principal

Pipeline execution will trigger AML InteractiveLoginAuthentication by default once we switch to Azure CLI 2.30 (see comment here). In preparation to that, we should change workspace authentication to use ServicePrincipalAuthentication as default method when run from pipelines (right now it's using CLI credentials).

microsoft / dstoolkit-mlops-base Goto Github PK

dstoolkit-mlops-base's People

Stargazers

Watchers

Forkers

dstoolkit-mlops-base's Issues

Recommend Projects

Recommend Topics

Recommend Org