Code Monkey home page Code Monkey logo

aml-modules's Introduction

CI CD

Azure Machine Learning modules repository

Update: Now CI/CD is supported! See Continous deployment

This repository contains un-official Azure Machine Learning modules to use along with Designer. The modules contain a variety of functionalities to improve the current capabilities.

experiments

Modules:

The following modules are available.

Category Module name Description
Experiments Promote metrics to experiment When using the 'Evaluate Model' module, Azure ML will not report the computed metrics back to the experiment. If you are working using both designer and custom-code, this can prevent you from comparing the metrics across those runs inside of an experiment. This module promotes the metrics back to the experiment. In the case of evaluating more than one model, this module allows you to log either both of the metrics are the model with the best one. See documentation. Get YAML.
Feature Engineering Add lag column Adds one or many lag columns into the data set. The amount of time steps to go back can be indicated. This module also allows you to average all the lag columns (average lag). This is useful, image a time series with one sample per day, to form columns like "the average value of the series on the same day of week in the past three weeks". We can do that by indicating lag_columns=3, lag_by=7, average=True. Multiple columns can be generated by indicated comma-separated column names. See documentation. Get YAML.
Feature Engineering Add rolling window column Adds a rolling window column (formed with either backward or forward values). Values are aggregated based on a variety of aggregation functions depending on the type of the column. Multiple columns can be generated by indicated comma-separated column names. Get YAML.
Feature Engineering Aggregate column Groups values of columns in the dataset and aggregates the rest of them with functions depending on the type of data. All the columns will be aggregated if not indicated in the 'group by' selector. Multiple columns can be generated by indicated comma-separated column names. Get YAML.
Feature Engineering Split column values Splits the values of an string column into multiple rows or columns. It can also split them in an array-like. Get YAML.
Feature Engineering Explode column Runs the typical explode function over a column. It can explode into rows or into columns. Get YAML.
Feature Engineering Featurize date column Creates multiple feature columns based on a date column that can be use in a classification or regression problem later. Can handle either date columns as well as datetime. Get YAML.
Dimensionality reduction PCA extraction The module analyzes your data and creates a reduced feature set that captures all the information contained in the dataset, but in a smaller number of features. The module also creates a transformation that you can apply to new data, to achieve a similar reduction in dimensionality and compression of features, without requiring additional training. The computed eigenvectors are also returned as a dataset. See documentation. Get YAML.
Dimensionality reduction Trunked-SVD extration The module analyzes your data and creates a reduced feature set that captures all the information contained in the dataset, but in a smaller number of features. Works similar than PCA but without centring your data. Get YAML.
Dimensionality reduction Hicherical feature aggreation This module is a dimensionality reduction method that groups together features that look very similar, thus decreasing the number of features. The module performs a hierarchical clustering using a bottom up approach: each observation starts in its own cluster, and clusters are successively merged together. See documentation. Get YAML.
Transformations Apply Transformation (Scikit-learn API) The module allows you to apply a given transformation to a new dataset for consistent data handling. For example, if you used PCA to transform your feature space, you would want to use the same transformation was computed for training during the scoring phase as well. You can do this easily by saving the transformation, and then using Apply Transformation to the new data. This works analogy to the Apply Transformation Azure ML Designer module, but works with any transformation that implements the Scikit-learn API. See documentation. Get YAML.
Transformations Apply Transformation (Tensorflow API) The module allows you to apply a given transformation to a new dataset for consistent data handling. For example, if you used PCA to transform your feature space, you would want to use the same transformation was computed for training during the scoring phase as well. You can do this easily by saving the transformation, and then using Apply Transformation to the new data. This works analogy to the Apply Transformation Azure ML Designer module, but works with any transformation implemented as a compute graph in Tensorflow.
Scoring & Evaluation One-way ANOVA Computes the one-way ANOVA table for a linear model. Get YAML.
Scoring & Evaluation Score MLFlow model Loads a model registerd using MLFlow and runs it against a dataset. Get YAML.

Installation

You can install this modules in your workspace manually using the UI, an installation script, or by running our automated GitHub Action pipelines.

Manual installation

You can add a new module by following this steps:

  1. In your workspace, go to the Modules option inside the Assets section in the navigation menu.

  2. Click on option New module and then From Yaml file.

  3. Copy the URL of the YAML file of the module you want to install. For instance, for the module Promote metrics the Yaml file is https://github.com/santiagxf/aml-modules/blob/main/aml-promote-metrics/promote-metrics.spec.yaml. You can also do right click on the link Get YAML and select Copy link.

    Copy

  4. Paste the URL of the yaml file in the wizard and confirm the changes.

    'Manual installation'

  5. The new module will be available across all the pipelines in the workspace.

Important: Notice that you will need to perform this action for each module you want to install. If you have multiple Azure Machine Learning workspaces available, then you will have to perform the operation on each of them.

Using an installation script

In the folder scripts you will find a sample bash script to install all the modules in your workspace.

Ensure to:

  • Clone the current repository.
  • Have Azure CLI and Azure ML CLI installed.
  • Run it using a bash console.

Continous deployment

You can automate the deployment of the actions by using the CI/CD GitHub Actions workflow provided in this repository. The CI workflow can automatically run PyTest test over the modules code and validate modules against Azure ML API to ensure YML correctness. The CD workflow will take all the modules and install them inside the Azure ML Workspace. It will also upgrade the default version of the module to the indicated one.

image

Requirements:

  • A Service Principal created in the tenant.
  • The Client ID, Client Secret and Tenant ID associated with the indicated service principal.
  • Grant permissions to the service principal in the target resource group/Azure Machine Learning Workspace.

Configuration:

Configure the env sections in the CI.yml and CD.yml according to your workspace

    env:
      COMPONENTS_PATH: modules
      WORKSPACE: aa-ml-aml-showroom-ws
      RESOURE_GROUP: Analytics.Aml.Experiments.Workspaces

Note: This CI/CD workflow will look for any module definition inside the COMPONENTS_PATH and will try to install them in the workspace. If the component is already there, it will be upgraded too and the new default version upgraded.

aml-modules's People

Contributors

santiagxf avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.