aml-run's Introduction

GitHub Action for training Machine Learning Models using Azure

Deprecation notice

This Action is deprecated. Instead, consider using the CLI (v2) to manage and interact with Azure Machine Learning jobs (runs) in GitHub Actions.

Important: The CLI (v2) is not recommended for production use while in preview.

Usage

The Azure Machine Learning training action will help you train your models on Azure Machine Learning using GitHub Actions.

Get started today with a free Azure account!

This repository contains a GitHub Action for training machine learning models using Azure Machine Learning in a few different ways, each with different capabilities. To submit a training run, you have to define the Python file(s) that should run remotely, as well as a config file corresponding to one of the supported methods of training.

Dependencies on other GitHub Actions

  • Checkout: Checks out your Git repository content into the GitHub Actions agent.
  • aml-workspace: This action requires an Azure Machine Learning workspace to be present. You can either create a new one or re-use an existing one using the action.
  • aml-compute: You can use this action to create a new training environment if your workspace doesn't have one already.

Utilize GitHub Actions and Azure Machine Learning to train and deploy a machine learning model

This action is one in a series of actions that can be used to set up an MLOps process. We suggest getting started with one of our template repositories, which will allow you to create an MLOps process in less than 5 minutes.

  1. Simple template repository: ml-template-azure

    Go to this template and follow the getting started guide to set up an MLOps process within minutes and learn how to use the Azure Machine Learning GitHub Actions in combination. This template demonstrates a very simple process for training and deploying machine learning models.

  2. Advanced template repository: aml-template

    This template demonstrates how the actions can be extended to include the normal pull request approval process and how training and deployment workflows can be split. More enhancements will be added to this template in the future to make it more enterprise ready.

Example workflow for training Machine Learning Models using Azure

name: My Workflow
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
    - name: Check Out Repository
      id: checkout_repository
      uses: actions/checkout@v2

    # AML Workspace Action
    - uses: Azure/aml-workspace@v1
      id: aml_workspace
      with:
        azure_credentials: ${{ secrets.AZURE_CREDENTIALS }}

    # AML Run Action
    - uses: Azure/aml-run@v1
      id: aml_run
      with:
        # required inputs as secrets
        azure_credentials: ${{ secrets.AZURE_CREDENTIALS }}
        # optional
        parameters_file: "run.json"

Inputs

| Input | Required | Default | Description |
| --- | --- | --- | --- |
| azure_credentials | x | - | Output of az ad sp create-for-rbac --name <your-sp-name> --role contributor --scopes /subscriptions/<your-subscriptionId>/resourceGroups/<your-rg> --sdk-auth. This should be stored in your secrets. |
| parameters_file | | "run.json" | We expect a JSON file in the .cloud/.azure folder in the root of your repository specifying details of your Azure Machine Learning Run. If you want to provide these details in a file other than "run.json", you need to provide this input in the action. |

azure_credentials (Azure Credentials)

Azure credentials are required to connect to your Azure Machine Learning Workspace. These may have been created for an action you are already using in your repository; if so, you can skip the steps below.

Install the Azure CLI on your computer or use the Cloud CLI and execute the following command to generate the required credentials:

# Replace {service-principal-name} with any name for your service principal, and {subscription-id} and {resource-group} with your Azure subscription ID and resource group name
az ad sp create-for-rbac --name {service-principal-name} \
                         --role contributor \
                         --scopes /subscriptions/{subscription-id}/resourceGroups/{resource-group} \
                         --sdk-auth

This will generate the following JSON output:

{
  "clientId": "<GUID>",
  "clientSecret": "<GUID>",
  "subscriptionId": "<GUID>",
  "tenantId": "<GUID>",
  (...)
}

Add this JSON output as a secret with the name AZURE_CREDENTIALS in your GitHub repository.

parameters_file (Parameter File)

The action tries to load a JSON file in the .cloud/.azure folder in your repository, which specifies details of your Azure Machine Learning Run. By default, the action looks for a file with the name "run.json". If your JSON file has a different name, you can specify it with this parameter. Note that none of these values are required; in their absence, defaults will be created from a combination of the repo name and branch name.

A sample file can be found in this repository in the folder .cloud/.azure. The JSON file can include the following parameters:

| Parameter Name | Required | Allowed Values | Default | Description |
| --- | --- | --- | --- | --- |
| experiment_name | | str | <REPOSITORY_NAME>-<BRANCH_NAME> | The name of your experiment in AML, which must be 3-36 characters, start with a letter or a number, and can only contain letters, numbers, underscores, and dashes. |
| tags | | dict: {"": "", ...} | null | Tags to be added to the submitted run. |
| wait_for_completion | | bool | true | Indicates whether the action will wait for completion of the run. |
| download_artifacts | | bool | false | Indicates whether the created artifacts and logs from runs, pipelines and steps will be downloaded to your GitHub workspace. This only works if wait_for_completion is set to true. |
| pipeline_publish | | bool | false | Indicates whether the action will publish the pipeline after submitting it to Azure Machine Learning. This only works if you submitted a pipeline. |
| pipeline_name | | str | <REPOSITORY_NAME>-<BRANCH_NAME> | The name of the published pipeline. This only works if you submitted a pipeline. |
| pipeline_version | | str | null | The version of the published pipeline. This only works if you submitted a pipeline. |
| pipeline_continue_on_step_failure | | bool | false | Indicates whether the published pipeline will continue execution of other steps in the PipelineRun if a step fails. This only works if you submitted a pipeline. |
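
The naming rule for experiment_name and the documented defaults can be checked locally before committing a parameters file. The following is a minimal sketch using only the Python standard library. The function names are hypothetical and not part of the action, and the regular expression is one reading of the rule stated in the table.

```python
import json
import re

# One reading of the documented rule: 3-36 characters, starts with a letter
# or a number, and contains only letters, numbers, underscores, and dashes.
EXPERIMENT_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{2,35}")


def is_valid_experiment_name(name: str) -> bool:
    """Return True if `name` satisfies the documented experiment_name rule."""
    return EXPERIMENT_NAME_PATTERN.fullmatch(name) is not None


def build_run_parameters(experiment_name: str, **overrides) -> str:
    """Return the text of a run.json file (hypothetical helper).

    The boolean values mirror the defaults listed in the table; any keyword
    argument overrides or adds a parameter.
    """
    if not is_valid_experiment_name(experiment_name):
        raise ValueError(f"Invalid experiment_name: {experiment_name!r}")
    parameters = {
        "experiment_name": experiment_name,
        "wait_for_completion": True,
        "download_artifacts": False,
        "pipeline_publish": False,
    }
    parameters.update(overrides)
    return json.dumps(parameters, indent=2)
```

For example, `build_run_parameters("my-repo-main", download_artifacts=True)` returns JSON text that could be saved as .cloud/.azure/run.json.
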

Inputs specific to method of training

  • Using a Python script (default "code/train/run_config.py") with a function (default "main") in which you define your run and return an Estimator, Pipeline, AutoMLConfig or ScriptRunConfig object. You can change the default values with the runconfig_python_file and runconfig_python_function_name parameters.

| Parameter Name | Required | Allowed Values | Default | Description |
| --- | --- | --- | --- | --- |
| runconfig_python_file | | str | "code/train/run_config.py" | Path to the Python script in your repository in which you define your run and return an Estimator, Pipeline, AutoMLConfig or ScriptRunConfig object. |
| runconfig_python_function_name | | str | "main" | The name of the function in your Python script in which you define your run and return an Estimator, Pipeline, AutoMLConfig or ScriptRunConfig object. The function gets the workspace object passed as an argument. |

  • Using a runconfig YAML file (default "code/train/run_config.yml"), which describes the Azure Machine Learning Script Run that you want to submit. You can change the default value with the runconfig_yaml_file parameter.

| Parameter Name | Required | Allowed Values | Default | Description |
| --- | --- | --- | --- | --- |
| runconfig_yaml_file | | str | "code/train/run_config.yml" | The name of your runconfig YAML file. |

  • Using a Pipeline YAML file (default "code/train/pipeline.yml"), which describes the Azure Machine Learning Pipeline that you want to submit. You can change the default value with the pipeline_yaml_file parameter.

| Parameter Name | Required | Allowed Values | Default | Description |
| --- | --- | --- | --- | --- |
| pipeline_yaml_file | | str | "code/train/pipeline.yml" | The name of your pipeline YAML file. |

Outputs

| Output | Description |
| --- | --- |
| experiment_name | Name of the experiment of the run |
| run_id | ID of the run |
| run_url | URL to the run in the Azure Machine Learning Studio |
| run_metrics | Metrics of the run (only provided if wait_for_completion is set to true) |
| run_metrics_markdown | Metrics of the run formatted as a markdown table (only provided if wait_for_completion is set to true) |
| published_pipeline_id | ID of the published pipeline (only provided if you submitted a pipeline and pipeline_publish is set to true) |
| published_pipeline_status | Status of the published pipeline (only provided if you submitted a pipeline and pipeline_publish is set to true) |
| published_pipeline_endpoint | Endpoint of the published pipeline (only provided if you submitted a pipeline and pipeline_publish is set to true) |
| artifact_path | Path of downloaded artifacts and logs from the Azure Machine Learning (pipeline) run (only provided if wait_for_completion and download_artifacts are both set to true) |

Other Azure Machine Learning Actions

  • aml-workspace - Connects to or creates a new workspace
  • aml-compute - Connects to or creates a new compute target in Azure Machine Learning
  • aml-run - Submits a ScriptRun, an Estimator or a Pipeline to Azure Machine Learning
  • aml-registermodel - Registers a model to Azure Machine Learning
  • aml-deploy - Deploys a model and creates an endpoint for the model

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

aml-run's People

Contributors

ashishonce, awmatheson, lostmygithubaccount, marvinbuss, microsoftopensource, pulkitaggarwl, revodavid

aml-run's Issues

no proper error if training file path is wrong

If the user gives a wrong "runconfig_python_file", or the path is wrong, the code throws the error below:

[image: screenshot of the error]

The error does not explain why the problem occurred. It looks as if the action's own code failed, while there was actually an exception during execution of the module import.
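
One way to make such failures explicit is to separate "file not found", "import failed" and "function missing" into distinct error messages. The following is a minimal sketch using only the Python standard library. It is not the action's actual loading code, and `RunConfigLoadError` and `load_function` are hypothetical names introduced for illustration.

```python
import importlib.util
import os


class RunConfigLoadError(Exception):
    """Hypothetical exception type used only in this sketch."""


def load_function(script_path: str, function_name: str = "main"):
    """Load `function_name` from the Python file at `script_path`.

    Each failure mode gets its own message, so a wrong path is not confused
    with an exception raised while the module is being imported.
    """
    if not os.path.isfile(script_path):
        raise RunConfigLoadError(f"File not found: {script_path}")

    spec = importlib.util.spec_from_file_location("user_run_config", script_path)
    if spec is None or spec.loader is None:
        raise RunConfigLoadError(f"Not an importable Python file: {script_path}")

    module = importlib.util.module_from_spec(spec)
    try:
        # Executes the user's module; import errors inside it surface here.
        spec.loader.exec_module(module)
    except Exception as exc:
        raise RunConfigLoadError(
            f"Importing {script_path} raised {type(exc).__name__}: {exc}"
        ) from exc

    function = getattr(module, function_name, None)
    if not callable(function):
        raise RunConfigLoadError(
            f"{script_path} does not define a callable named {function_name!r}"
        )
    return function
```

Because the original exception is chained with `from exc`, the full traceback of the failing import stays available in the logs.
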

[Brainstorming] - Roadmap

Feature brainstorming for GH integration

  • Pass model back to artifact in actions
  • Make a release flag so that the model will be made into a release automatically.
  • Create a docker image of the model

error while submitting an AML Pipeline

Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 271, in attempt_get_deps
blob_deps_to_file()
File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 263, in blob_deps_to_file
blob = request.urlopen(deps_url, context=ssl_context)
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/code/main.py", line 240, in <module>
main()
File "/code/main.py", line 187, in main
run.wait_for_completion(show_output=True)
File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 294, in wait_for_completion
step_run.wait_for_completion(timeout_seconds=timeout_seconds - time_elapsed,
File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 736, in wait_for_completion
return self._stream_run_output(timeout_seconds=timeout_seconds,
File "/usr/local/lib/python3.8/site-packages/azureml/pipeline/core/run.py", line 827, in _stream_run_output
print(final_details)
File "/usr/local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 129, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 766, in __repr__
steps = self._dataflow._get_steps()
File "/usr/local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 129, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 218, in _dataflow
dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py", line 185, in _set_auth_type
get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(auth_type, json.dumps(auth_value)))
File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 19, in get_engine_api
_engine_api = EngineAPI()
File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 110, in __init__
self._message_channel = launch_engine()
File "/usr/local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py", line 333, in launch_engine
dependencies_path = runtime.ensure_dependencies()
File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 285, in ensure_dependencies
if not attempt_get_deps():
File "/usr/local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 279, in attempt_get_deps
raise NotImplementedError(err_msg + '\n' + _unsupported_help_msg)
NotImplementedError: Linux distribution debian 11. does not have automatic support.
.NET Core 2.1 can still be used via dotnetcore2 if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install dotnet-runtime-* and replace * with 2.1.

How to get run.log() variable values in yaml file?

I am using GitHub workflows and the aml-run action. I am setting a custom variable using run.log("my_age", 10) and want to read its value in the .yml file for the next step. How can I achieve this? I see that {{outputs.run_metrics}} gives me this custom value, but I am unable to extract it.
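
One possible approach is to hand the run_metrics output to a small Python step and extract the value there. The sketch below assumes that the output is the string form of a metrics dictionary, either JSON or a Python dict literal. The README does not specify the exact format, so the helper accepts both, and `parse_metrics` is a hypothetical name.

```python
import ast
import json


def parse_metrics(raw: str) -> dict:
    """Parse a metrics string that is either JSON or a Python dict literal."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to a Python literal such as "{'my_age': 10}".
        parsed = ast.literal_eval(raw)
    if not isinstance(parsed, dict):
        raise ValueError("Expected the metrics to be a dictionary")
    return parsed
```

In a workflow, the raw string could be passed to such a step through an environment variable set to `${{ steps.aml_run.outputs.run_metrics }}`, where `aml_run` is the step id used in the example workflow; `parse_metrics(raw)["my_age"]` would then return the logged value.
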

Convert to Markdown cannot handle dicts

If someone submits a pipeline to Azure Machine Learning, the convert_to_markdown function can receive nested dicts. The function cannot handle this and fails. This needs to be fixed.

Sample dict that needs to be parsed:
{'HD_9cbf07dc-db35-4700-adc3-92253a378992': {'best_child_by_primary_metric': {'metric_name': ['mse',
'mse',
'mse'],
'timestamp': ['2020-03-30 07:58:40.110108+00:00',
'2020-03-30 07:59:42.417631+00:00',
'2020-03-30 07:59:42.417631+00:00'],
'run_id': ['HD_9cbf07dc-db35-4700-adc3-92253a378992_2',
'HD_9cbf07dc-db35-4700-adc3-92253a378992_1',
'HD_9cbf07dc-db35-4700-adc3-92253a378992_1'],
'metric_value': [0.4896792910300023,
0.33284572706891424,
0.33284572706891424],
'final': [False, False, True]}},
'HD_9cbf07dc-db35-4700-adc3-92253a378992_0': {'AUC': 0.6562945178135493,
'RMSE': 0.05120842417634075,
'pAUC': 0.2812047761829226,
'mse': 0.9367719704449606,
'TimeSeries comparison': 'aml://artifactId/ExperimentRun/dcid.HD_9cbf07dc-db35-4700-adc3-92253a378992_0/TimeSeries comparison_1585555145.png'},
'HD_9cbf07dc-db35-4700-adc3-92253a378992_1': {'AUC': 0.3573103979675637,
'pAUC': 0.5616569664904285,
'RMSE': 0.6161010679565609,
'mse': 0.33284572706891424,
'TimeSeries comparison': 'aml://artifactId/ExperimentRun/dcid.HD_9cbf07dc-db35-4700-adc3-92253a378992_1/TimeSeries comparison_1585555146.png'},
'HD_9cbf07dc-db35-4700-adc3-92253a378992_2': {'AUC': 0.1200601984180878,
'pAUC': 0.5805687333099157,
'RMSE': 0.09691275977364577,
'mse': 0.4896792910300023,
'TimeSeries comparison': 'aml://artifactId/ExperimentRun/dcid.HD_9cbf07dc-db35-4700-adc3-92253a378992_2/TimeSeries comparison_1585555076.png'}}
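
A possible fix is to flatten nested dictionaries before rendering the table. The following is a minimal sketch, not the action's actual convert_to_markdown implementation. `flatten_metrics` and `metrics_to_markdown` are hypothetical names, and the dotted key format is an arbitrary choice.

```python
def flatten_metrics(metrics: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse so that any nesting depth ends up as one flat row.
            flat.update(flatten_metrics(value, name))
        else:
            flat[name] = value
    return flat


def metrics_to_markdown(metrics: dict) -> str:
    """Render (possibly nested) metrics as a two-column markdown table."""
    rows = ["| Metric | Value |", "| --- | --- |"]
    for name, value in flatten_metrics(metrics).items():
        # Escape pipes so that a value cannot break the table layout.
        cell = str(value).replace("|", "\\|")
        rows.append(f"| {name} | {cell} |")
    return "\n".join(rows)
```

With the sample above, a row such as `HD_..._0.AUC` would carry the AUC of the first child run, and lists are rendered with their string representation. An empty dictionary produces a table with only the header rows.
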

Get environment variables from GitHub workflow into python script running in AzureML

I have two different pipelines in AzureML that I would like to trigger if any files related to them are changed in my repository. If there have only been changes to the files associated with one of the pipelines, I only want to trigger that one and not the other. I've implemented this logic in a bash script that runs as a step in my GitHub workflow and saves which of the pipelines to run to an environment variable.

My plan was then to make the aml-run action trigger a python script in azure-ml that will read in this environment variable and trigger the right pipelines. My problem is that if I set the environment variable in the Azure/aml-run@v1 step then it's not propagated to the run in azure-ml and I cannot retrieve it within the python script.

Is there any way I could get this variable into the python script I have defined in run_config.yml? I can see that there are arguments for setting both environment variables as well as arguments in the run_config.yml but would it be possible to change this based on what I send in to the aml-run action?
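
One possible approach is to switch to the runconfig_python_file method and translate the environment variable into script arguments inside that function, since arguments travel with the submitted run while the local environment does not. The sketch below rests on an unconfirmed assumption: that environment variables set on the Azure/aml-run@v1 step are visible to the Python function the action executes. `PIPELINES_TO_RUN` and `build_script_arguments` are hypothetical names.

```python
import os


def build_script_arguments(env=None, variable="PIPELINES_TO_RUN"):
    """Turn an environment variable into command-line arguments.

    `PIPELINES_TO_RUN` is a hypothetical variable name; its value is a
    comma-separated list such as "pipeline_a,pipeline_b".
    """
    env = os.environ if env is None else env
    value = env.get(variable, "")
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        # Nothing selected: submit the script without extra arguments.
        return []
    return ["--pipelines", ",".join(names)]
```

The returned list could be passed as the `arguments` of a ScriptRunConfig, and the remote script could read `--pipelines` with argparse.
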

This action stops with an exception, when no metrics are collected with run context

If you do not log any metrics through the run context of the azureml SDK, this action crashes because of an empty metrics list.

Following is my training script:

  • (should be *.py) test_train.txt
  • You can see that there is only the assignment of run context in line 20
  • no other use of run.log(...) or others
  • adding run.log(...) only one time, the pipeline completes the step successfully

Cannot import yaml library in runconfig_python_file, no traceback

Hi,

I am trying to utilize this GitHub Action for a simple CI pipeline.
Unfortunately, it offers no helpful traceback when it fails. There seems to be an issue when I try to extend the runconfig_python_file.
I am currently using the runconfig_python_file from the test folder and simply adding a few lines of code. One thing I tried to do was read a YAML file to make the script more configurable.

Unfortunately, the Action then fails as soon as I import the yaml library, without any helpful error information.

Script:

import os
import yaml

from azureml.core import (
    Workspace,
    Experiment,
    ComputeTarget,
    Environment,
    ScriptRunConfig,
)

def main(workspace):
    ws = workspace
    print("Workspace Name: \t{}".format(ws.name))
    print("Resource Group: \t{}".format(ws.resource_group))
    print("Location: \t\t{}".format(ws.location))
    print("Subscription ID: \t{}".format(ws.subscription_id))

    compute_name = 'my-compute'
    compute_target = ws.compute_targets[compute_name]
    print("Compute Cluster Name: \t{}".format(compute_name))

    print("Loading Environment")
    my_env= Environment.from_conda_specification(
        name="my_env",
        file_path="code/environment.yml"
    )

    tasks_source_dir =  '/tasks' 

    print("Loading script parameters")
    script_args = [
        "--kernel", "linear",
        "--penalty", 1.0
    ]
    
    print("Creating run config")
    run_config = ScriptRunConfig(
        source_directory=tasks_source_dir,
        script="T01_Test_Task.py",
        arguments=script_args,
        run_config="",
        compute_target=compute_target,
        environment=my_env
    )

    return run_config

GitHub Action YAML:

# Actions train a model on Azure Machine Learning
name: Continous Integration
on:
  push:
    branches:
      - dev
    # paths:
    #   - 'code/*'
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
    # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
    - name: Check Out Repository
      id: checkout_repository
      uses: actions/checkout@v2
      
    - name: Python Set Up
      uses: actions/setup-python@v4
      with:
        python-version: '3.8'
        # cache: 'pip'
    - run: pip install -r requirements.txt  # this requirements.txt contains pyyaml
    
    # Connect or Create the Azure Machine Learning Workspace
    - name: Connect/Create Azure Machine Learning Workspace
      id: aml_workspace
      uses: Azure/aml-workspace@v1
      with:
          azure_credentials: ${{ secrets.AZURE_CREDENTIALS }}
    
    # Connect or Create a Compute Target in Azure Machine Learning
    - name: Connect/Create Azure Machine Learning Compute Target
      id: aml_compute_training
      uses: Azure/aml-compute@v1
      with:
          azure_credentials: ${{ secrets.AZURE_CREDENTIALS }}
    
    # Submit a training run to the Azure Machine Learning
    - name: Submit training run
      id: aml_run
      uses: Azure/aml-run@v1
      with:
          azure_credentials: ${{ secrets.AZURE_CREDENTIALS }}
          parameters_file: "run.json"

Traceback (most recent call last):
  File "/code/main.py", line 240, in <module>
    main()
	Message: Failed to load RunConfiguration from path=code/train/run_config.yml name=None
	InnerException None
	ErrorResponse 
***
    "error": ***
        "code": "UserError",
        "message": "Failed to load RunConfiguration from path=code/train/run_config.yml name=None"
    ***
***
None
Error: Error when loading runconfig yaml definition your repository (Path: /code/train/run_config.yml).
Error: Error when loading pipeline yaml definition your repository (Path: /code/train/pipeline.yml).
Error: Error when loading python script or function in your repository which defines the experiment config (Script path: '/code/main.py', Function: 'main()').
Error: You have to provide either a yaml definition for your run, a yaml definition of your pipeline or a python script, which returns a runconfig (Pipeline, ScriptRunConfig, AutoMlConfig, Estimator, etc.). Please read the documentation for more details.
  File "/code/main.py", line 153, in main
    raise AMLExperimentConfigurationException("You have to provide a yaml definition for your run, a yaml definition of your pipeline or a python script, which returns a runconfig. Please read the documentation for more details.")
utils.AMLExperimentConfigurationException: You have to provide a yaml definition for your run, a yaml definition of your pipeline or a python script, which returns a runconfig. Please read the documentation for more details.
