
solution-accelerator-many-models's Introduction


Many Models Solution Accelerator

In the real world, many problems are too complex to be solved by a single machine learning model. Whether that means predicting sales for each individual store, building a predictive maintenance model for hundreds of oil wells, or tailoring an experience to individual users, building a model for each instance can lead to improved results on many machine learning problems.

This pattern is very common across a wide variety of industries and applies to many real-world use cases. Below are some examples where we have seen this pattern used.

  • Energy and utility companies building predictive maintenance models for thousands of oil wells, hundreds of wind turbines or hundreds of smart meters

  • Retail organizations building workforce optimization models for thousands of stores, campaign promotion propensity models, and price optimization models for the hundreds of thousands of products they sell

  • Restaurant chains building demand forecasting models across thousands of restaurants 

  • Banks and financial institutions building cash replenishment models for individual ATMs or groups of ATMs, or building personalized models for individuals

  • Enterprises building revenue forecasting models at each division level

  • Document management companies building text analytics and legal document search models for each state

Azure Machine Learning (AML) makes it easy to train, operate, and manage hundreds or even thousands of models. This repo walks you through the end-to-end process of creating a many models solution, from training to scoring to monitoring.

Prerequisites

To use this solution accelerator, all you need is access to an Azure subscription and an Azure Machine Learning Workspace that you'll create below.

While it's not required, a basic understanding of Azure Machine Learning will be helpful for understanding the solution. The following resources can help introduce you to AML:

  1. Azure Machine Learning Overview
  2. Azure Machine Learning Tutorials
  3. Azure Machine Learning Sample Notebooks on Github

Getting started

1. Deploy Resources

Start by deploying the resources to Azure. The deployment button in the repository README deploys Azure Machine Learning and its related resources.

2. Configure Development Environment

Next you'll need to configure your development environment for Azure Machine Learning. We recommend using a Notebook VM as it's the fastest way to get up and running. Follow the steps in EnvironmentSetup.md to create a Notebook VM and clone the repo onto it.

3. Run Notebooks

Once your development environment is set up, run through the Jupyter Notebooks sequentially following the steps outlined. By the end, you'll know how to train, score, and make predictions using the many models pattern on Azure Machine Learning.

There are two ways to train many models:

  1. Using a custom training script
  2. Using Automated ML

However, the steps needed to set the workspace up and prepare the datasets are the same no matter which option you choose.

Sequence of Notebooks

Contents

In this repo, you'll train and score a forecasting model for each orange juice brand and for each store at a (simulated) grocery chain. By the end, you'll have used up to 11,973 models to forecast sales for the next few weeks.

The data used in this sample is simulated based on the Dominick's Orange Juice Dataset, sales data from a Chicago area grocery store.
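
The many models pattern described above can be sketched in plain Python, without Azure Machine Learning: partition the rows by store and brand, then fit one model per partition. This is only an illustration under stated assumptions. The rows are made up, the column names (Store, Brand, Quantity) mirror those used in the forecasting notebooks, and the "model" is a naive mean forecast rather than anything the accelerator actually trains.

```python
from collections import defaultdict
from statistics import mean

# Made-up weekly sales rows; the real data set is far larger.
rows = [
    {"Store": 1, "Brand": "tropicana", "Quantity": 10},
    {"Store": 1, "Brand": "tropicana", "Quantity": 14},
    {"Store": 1, "Brand": "dominicks", "Quantity": 7},
    {"Store": 2, "Brand": "tropicana", "Quantity": 20},
    {"Store": 2, "Brand": "tropicana", "Quantity": 22},
]

def train_many_models(rows):
    """Fit one naive model (the mean quantity) per (Store, Brand) partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[(row["Store"], row["Brand"])].append(row["Quantity"])
    return {key: mean(values) for key, values in partitions.items()}

models = train_many_models(rows)
for key, forecast in sorted(models.items()):
    print(key, forecast)
```

Each partition is independent of the others, which is what makes the training step easy to spread across a compute cluster.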

The functionality is split across notebooks, which are organized into folders and designed to be run sequentially.

Before training the models

Notebook | Description
00_Setup_AML_Workspace.ipynb | Creates and configures the AML Workspace, including deploying a compute cluster for training.
01_Data_Preparation.ipynb | Prepares the datasets that will be used during training and forecasting.

Using a custom training script to train the models:

The following notebooks are located under the Custom_Script/ folder.

Notebook | Description
02_CustomScript_Training_Pipeline.ipynb | Creates a pipeline to train a model for each store and orange juice brand in the dataset using a custom script.
03_CustomScript_Forecasting_Pipeline.ipynb | Creates a pipeline to forecast future orange juice sales using the models trained in the previous step.

Using Automated ML to train the models:

The following notebooks are located under the Automated_ML/ folder.

Notebook | Description
02_AutoML_Training_Pipeline.ipynb | Creates a pipeline to train a model for each store and orange juice brand in the dataset using Automated ML.
03_AutoML_Forecasting_Pipeline.ipynb | Creates a pipeline to forecast future orange juice sales using the models trained in the previous step.

How-to-videos

Watch these how-to videos for a step-by-step walk-through of the many models solution accelerator and learn how to set up your models using both the custom training script and Automated ML.

Custom Script

Watch the video

Automated ML

Watch the video

Key concepts

ParallelRunStep

ParallelRunStep enables the parallel training of models and is commonly used for batch inferencing. This document walks through some of the key concepts around ParallelRunStep.
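
As a rough illustration of how the work is split up, a ParallelRunStep entry script exposes a run() function that receives one mini batch at a time (the error messages quoted in the issues further down refer to this run() function). The sketch below imitates that shape locally with a thread pool. It is a stand-in built on assumptions, not the Azure ML API: the mini batches, the train_one helper, and the local driver are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def train_one(partition_name, values):
    """Hypothetical per-partition 'training': store the mean as the model."""
    return {"partition": partition_name, "model": sum(values) / len(values)}

def run(mini_batch):
    """Process one mini batch and return one result per item in it."""
    return [train_one(name, values) for name, values in mini_batch]

# Each mini batch holds the data for one or more partitions (made-up numbers).
mini_batches = [
    [("store1_brandA", [10, 14])],
    [("store1_brandB", [7])],
    [("store2_brandA", [20, 22])],
]

# Local stand-in for the service: fan the mini batches out to workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = [item for batch_result in pool.map(run, mini_batches)
               for item in batch_result]

print(len(results), "models trained")
```

In the real service the fan-out happens across nodes and processes rather than threads, but the contract is similar in spirit: run() is called once per mini batch and its return values are collected.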

Pipelines

Pipelines allow you to create workflows in your machine learning projects. These workflows have a number of benefits including speed, simplicity, repeatability, and modularity.

Automated Machine Learning

Automated Machine Learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.

Other Concepts

In addition to ParallelRunStep, Pipelines, and Automated Machine Learning, you'll also be working with the following concepts: workspace, datasets, compute targets, Python script steps, and Azure Open Datasets.

Contributing

This project welcomes contributions and suggestions. To learn more visit the contributing section.

Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

solution-accelerator-many-models's People

Contributors

aniththa, bhavangowdan, cartacios, deeptim123, hchandola, krishnaanumalasetty, mariamedp

solution-accelerator-many-models's Issues

many models pattern vs regular regression based models - responsible compute - accuracy improvements - available research?

Hi there! We've been exploring this pattern occasionally (frequently using Facebook Prophet). Especially at large scale (large numbers of time series), I was wondering whether there is any known research on accuracy improvements from this pattern versus a more typical single-fit regression approach (using XGBoost or LightGBM), which should easily account for the additional granularity attributes.

I'm asking mainly because of the compute this pattern requires, especially as shown in the samples in this repo. Honestly, this raises some responsible AI considerations: is this really worth the effort and compute? Has anyone studied this? I would love to know and dive deeper.

Also, as far as I can remember, Azure AutoML already has a forecasting option for additional granularity attributes such as stores, locations, etc. So it's not clear what the benefit of parallel AutoML by store is. Is it not redundant? (The parameter in the current version is time_series_id_column_names.)

Some related threads & refs:
multiple time series (ex: locations, skus,stores) | prophet (many models) vs regression (single model) | scalability
facebook/prophet#1687

Fine-Grained Time Series Forecasting At Scale With Facebook Prophet And Apache Spark
https://databricks.com/blog/2020/01/27/time-series-forecasting-prophet-spark.html

Error When Running Forecast Pipeline

I am currently following this notebook and am encountering the error below when the inference experiment is run on AML:

Response status code does not indicate success: 400 (BaseImage, BaseDockerfile, or BuildContext must be set for Docker-based environments.).
Microsoft.RelInfra.Common.Exceptions.ErrorResponseException: BaseImage, BaseDockerfile, or BuildContext must be set for Docker-based environments.

It failed in this step of the AzureML pipeline:
image

The following are the versions of the AML libraries I used:

  • azureml-sdk==1.34.0
  • azureml-train-automl==1.34.0
  • azureml-contrib-automl-pipeline-steps==1.34.1

Is this perhaps related to this PR where the environment needs to be explicitly specified in the training and inference settings?

Can you help to diagnose this issue? Thanks.

Model name can not be made unique to a given automl config / pipeline.

Say, for example, that the many models training accelerator (MMA) is to be used with two different AutoML configs in two pipelines, but with the same training data and partitions. The registered models will have the same names in the registry. As far as I can see, there is currently no way to append information to the model name to make it unique in this situation.

My suggested solution would be to allow the user to optionally pass in a model name prefix or suffix. It then becomes the API consumer's responsibility to make the name unique in whatever scope they choose.
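
The suggestion above could look roughly like the following. This is a hypothetical helper written for illustration only: build_model_name and its prefix and suffix parameters are not part of the accelerator or the Azure ML SDK, and the partition values are made up.

```python
def build_model_name(partition_values, prefix="", suffix=""):
    """Join optional prefix, partition values, and optional suffix with underscores."""
    parts = [prefix, *[str(v) for v in partition_values], suffix]
    return "_".join(p for p in parts if p)

# Same partition, two different (hypothetical) AutoML configs.
name_a = build_model_name(["1002", "tropicana"], prefix="configA")
name_b = build_model_name(["1002", "tropicana"], prefix="configB")
print(name_a, name_b)
```

With no prefix or suffix the helper returns the partition-only name, so the behavior for existing callers would stay unchanged.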

Nested prefix '{}' for Azure File Share is currently not supported

The following cell in '02b_Train_AutoML' fails with an error:

from scripts.helper import get_training_output
import os

training_results_name = "training_results"

training_file = get_training_output(run, training_results_name, training_output_name)
all_columns = ["Framework", "Dataset", "Run", "Status", "Model", "Tags", "StartTime", "EndTime" , "ErrorType", "ErrorCode", "ErrorMessage" ]
df = pd.read_csv(training_file, delimiter=" ", header=None, names=all_columns)
training_csv_file = "training.csv"
df.to_csv(training_csv_file)
print("Training output has", df.shape[0], "rows. Please open", os.path.abspath(training_csv_file), "to browse through all the output.")

Error:

---------------------------------------------------------------------------
UserErrorException                        Traceback (most recent call last)
<ipython-input-28-cb9e2155c53e> in <module>
      4 training_results_name = "training_results"
      5 
----> 6 training_file = get_training_output(run, training_results_name, training_output_name)
      7 all_columns = ["Framework", "Dataset", "Run", "Status", "Model", "Tags", "StartTime", "EndTime" , "ErrorType", "ErrorCode", "ErrorMessage" ]
      8 df = pd.read_csv(training_file, delimiter=" ", header=None, names=all_columns)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/amlcomprh-d3/code/Users/rohoff/stable/many_models/solution-accelerator-many-models/Automated_ML/02b_Train_AutoML/scripts/helper.py in get_training_output(run, training_results_name, training_output_name)
     56 def get_training_output(run, training_results_name, training_output_name):
     57     from common.scripts.helper import get_output
---> 58     return get_output(run, training_results_name, training_output_name)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/amlcomprh-d3/code/Users/rohoff/stable/many_models/solution-accelerator-many-models/Automated_ML/common/scripts/helper.py in get_output(run, results_name, output_name)
     47     batch_run = next(run.get_children())
     48     batch_output = batch_run.get_output_data(output_name)
---> 49     batch_output.download(local_path=results_name)
     50 
     51     keep_root_folder(results_name, results_name)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/graph.py in download(self, local_path, overwrite, show_progress)
   4482             local_path=local_path,
   4483             overwrite=overwrite,
-> 4484             show_progress=show_progress)
   4485 
   4486     def as_download(self, input_name=None, path_on_compute=None, overwrite=None):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_aeva_provider.py in download(self, datastore_name, path_on_datastore, local_path, overwrite, show_progress)
   1533 
   1534         return datastore.download(target_path=local_path, prefix=path_on_datastore, overwrite=overwrite,
-> 1535                                   show_progress=show_progress)
   1536 
   1537 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/data/azure_storage_datastore.py in download(self, target_path, prefix, overwrite, show_progress)
    434         """
    435         module_logger.info("Called AzureFileDatastore.download")
--> 436         AzureFileDatastore._verify_prefix(prefix)
    437         if not os.path.exists(target_path):
    438             os.makedirs(target_path)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/data/azure_storage_datastore.py in _verify_prefix(prefix)
    585         prefix_segments = re.split(r'[/\\]+', prefix)
    586         if len(prefix_segments) > 1:
--> 587             raise UserErrorException("Nested prefix '{}' for Azure File Share is currently not supported.")
    588 
    589 

UserErrorException: UserErrorException:
	Message: Nested prefix '{}' for Azure File Share is currently not supported.
	InnerException None
	ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Nested prefix '{}' for Azure File Share is currently not supported."
    }
}

Thank you and best regards, Robert
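
The last traceback frame shows the check that rejects the download: the prefix is split on forward and backward slashes, and anything with more than one segment is refused. The sketch below reproduces just that check outside the SDK so its behavior is easy to see. The example paths are made up, and the helper name is_nested_prefix is hypothetical.

```python
import re

def is_nested_prefix(prefix):
    """Return True when the prefix spans more than one path segment."""
    return len(re.split(r"[/\\]+", prefix)) > 1

print(is_nested_prefix("training_results"))
print(is_nested_prefix("azureml/run-id/output"))

# The message keeps a literal '{}' because .format() is never called on it.
template = "Nested prefix '{}' for Azure File Share is currently not supported."
print(template)
print(template.format("azureml/run-id/output"))
```

Judging by the quoted source line, the exception is raised with the unformatted template, which would explain why the reported message shows '{}' instead of the offending prefix.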

I can't Replicate the whole pipeline for the Orange juice dataset

image

LOGs: (from execution logs.. other logs are blank..)

[2021-12-14 10:24:25Z] Submitting 1 runs, first five are: 4ec0b35a:035c96c2-2eeb-4538-a050-8d3ceb5ed383
[2021-12-14 10:24:55Z] Execution of experiment failed, update experiment status and cancel running nodes.

I wasn't able to replicate the custom script pipeline for the orange juice dataset.

I attached image for your reference..

Thanks in advance

Dependency failure running 03_CustomScript Forecast

Setup and all previous notebooks run OK. Then the Copy Predictions PythonScriptStep fails with the following standard error:

[stderr]Traceback (most recent call last):
[stderr]  File "copy_predictions.py", line 4, in <module>
[stderr]    import pandas as pd
[stderr]ModuleNotFoundError: No module named 'pandas'

No Environment or RunConfig is set for this step.

After some investigation, the error turned out to be at the Copy Predictions step in the 03_CustomScriptForecast notebook. I was able to remediate it by creating a RunConfiguration and adding it to the PythonScriptStep.

from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import RunConfiguration

conda_deps = CondaDependencies.create(pip_packages=['sklearn', 'pandas', 'joblib', 'azureml-defaults', 'azureml-core', 'azureml-dataprep[fuse]'])
run_config=RunConfiguration(conda_dependencies=conda_deps)

The step definition below is the same as before, except that the runconfig argument is now set to that object:

from azureml.pipeline.steps import PythonScriptStep

upload_predictions_step = PythonScriptStep(
    name="copy_predictions",
    script_name="copy_predictions.py",
    compute_target=compute,
    source_directory='./scripts',
    inputs=[output_dref, output_dir],
    runconfig=run_config,  # environment that provides pandas and the other packages
    allow_reuse=False,
    arguments=['--parallel_run_step_output', output_dir,
               '--output_dir', output_dref,
               '--target_column', 'Quantity',
               '--timestamp_column', 'WeekStarting',
               '--timeseries_id_columns', 'Store', 'Brand']
)

AutoML training pipeline fails

When I submit the AutoML training pipeline for all files, it runs for a while and then crashes. The first time I was able to train 15 models before it failed, then 21, and now 31. The same thing happened when I tried to use my own data.

This is the error message I got:

User program failed with EntryScriptException: Job failed with entry script error No progress update in 11190 seconds. Updated: False, total wait from last update: 11193 seconds, remaining -3 seconds. 31/11973 items processed. new_finished_tasks: 31. Detail: Processed 31 of 11973 mini batches.
The run() function in the entry script had raised exception for 237 times. Please check logs at logs/user/error/* for details.

  • Error ''AutoML_5131f894-4f05-405d-a805-dff6c3493d01'' occurred 52 times.
  • Error ''AutoML_d0abb201-a0c6-4dc1-b2da-8535e2181755'' occurred 29 times.
  • Error ''AutoML_eaf5bd20-3688-47c1-8bb5-b13fdfd6e0a7'' occurred 7 times.
  • Error ''AutoML_ba49f33f-340e-4096-ac0c-0e409145ffe2'' occurred 7 times.
  • Error ''AutoML_99c58925-9b99-40c1-a383-ce1a5ff8fc8a'' occurred 6 times.
  • Error ''AutoML_194dae15-e570-484c-9c8e-c1f464465c17'' occurred 4 times.
  • Error ''AutoML_a9341559-5624-4054-86af-712c34f19701'' occurred 4 times.
  • Error ''AutoML_fb3e83f3-045e-4ced-8586-8115ae417c17'' occurred 2 times.
  • Error ''AutoML_21306702-a26d-4030-a44b-3991472ae503'' occurred 1 times.
  • Error ''AutoML_9c507df2-3869-4a2e-83eb-1c13a10e7dcb'' occurred 1 times.
  • Error ''AutoML_17e6c53f-87db-43d6-981b-d744ac74ec42'' occurred 1 times. No progress update in 11190 seconds. Updated: False, total wait from last update: 11193 seconds, remaining -3 seconds. 31/11973 items processed. new_finished_tasks: 31.

DataFrame column names are stripped when combining timeseries forecast output.

I'm running a time series forecast using the many models accelerator, and the results come back in a plain text file with whitespace-delimited values. There are no headers in the first row. The return value from each mini batch is a pandas DataFrame with column names. These are combined to make the output file, but the column names are lost.
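
Until headers are preserved, one workaround is to reattach the column names when reading the combined file, much as the training-output cell quoted in an earlier issue does with pd.read_csv(..., header=None, names=...). The sketch below shows the same idea with only the standard library. The column names and the two sample rows are assumptions made for illustration; substitute the columns your run() function actually returns.

```python
import csv
import io

# Hypothetical column names; use the ones your run() function returns.
column_names = ["WeekStarting", "Store", "Brand", "Prediction"]

# Stand-in for the headerless, space-delimited output file.
raw_output = io.StringIO(
    "2020-01-02 1002 tropicana 11.5\n"
    "2020-01-09 1002 tropicana 12.0\n"
)

reader = csv.DictReader(raw_output, fieldnames=column_names, delimiter=" ")
records = list(reader)
print(records[0]["Brand"], records[0]["Prediction"])
```

All values come back as strings, so numeric columns still need converting, and the column order has to match the order in which the DataFrame was written.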

Forecasting training data

In the Forecasting Pipeline notebook, we can generate predictions for the testing data.
Is there an option to forecast from the training dataset?

copy_predictions.py fails because of pandas?

I am trying to run the pipeline in 03_CustomScript_Forecasting_Pipeline.ipynb, but it fails when it gets to import pandas as pd in the copy_predictions.py script. This seems strange, since the forecast.py script that precedes it in the pipeline also has an import pandas as pd statement, and it works.

The CondaDependencies.create arguments include pandas, see below:

forecast_conda_deps = CondaDependencies.create(pip_packages=['sklearn', 'pandas', 'joblib', 'azureml-defaults', 'azureml-core', 'azureml-dataprep[fuse]'])

'add_parallel_run_step_dependencies' error

When I execute the following code in 02b_Train_AutoML.ipynb, I receive an unexpected keyword argument error.

I tried upgrading azureml.contrib.pipeline.steps (now on 1.7.0), and it did not resolve the error.

from azureml.contrib.pipeline.steps import ParallelRunStep

parallel_run_step = ParallelRunStep(
    name="many-models-training",
    parallel_run_config=parallel_run_config,
    allow_reuse=False,
    inputs=[filedst_all_models],  # train 10 models
    # inputs=[filedst_all_models_inputs],  # switch to this input to train all 11,973 models
    output=output_dir,
    models=[],
    arguments=[],
    add_parallel_run_step_dependencies=False
)

Error description


TypeError                                 Traceback (most recent call last)
in
     10     models=[],
     11     arguments=[],
---> 12     add_parallel_run_step_dependencies=False
     13 )

TypeError: __init__() got an unexpected keyword argument 'add_parallel_run_step_dependencies'
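
A TypeError of this kind generally means that the installed class does not declare the keyword in its constructor, which can happen when a notebook was written against a different package version than the one installed. One way to check is to inspect the constructor signature before calling it. The sketch below does that with a stand-in class: OlderStep and supported_kwargs are hypothetical names, and nothing here is taken from the Azure ML SDK.

```python
import inspect

def supported_kwargs(cls, **kwargs):
    """Keep only the keyword arguments that cls.__init__ declares."""
    params = inspect.signature(cls.__init__).parameters
    accepts_any = any(p.kind is inspect.Parameter.VAR_KEYWORD
                      for p in params.values())
    if accepts_any:
        return kwargs
    return {k: v for k, v in kwargs.items() if k in params}

class OlderStep:
    """Hypothetical stand-in for a step class from an older SDK release."""
    def __init__(self, name, allow_reuse=True):
        self.name = name
        self.allow_reuse = allow_reuse

kwargs = supported_kwargs(
    OlderStep,
    name="many-models-training",
    allow_reuse=False,
    add_parallel_run_step_dependencies=False,
)
step = OlderStep(**kwargs)
print(sorted(kwargs))
```

Silently dropping an argument can hide a real incompatibility, so a check like this is better suited to diagnosing the mismatch than to working around it permanently.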

Creating compute target fails

The cells to create a compute target fail with:

Creating a new compute target...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-c65a1dcc03d1> in <module>
     19                                                            max_nodes=2), # !!! RH instead of 20
     20     # Create the cluster.
---> 21     compute = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)
     22 
     23 print('Checking cluster status...')

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/compute/compute.py in create(workspace, name, provisioning_configuration)
    306             raise UserErrorException("Please specify a different target name."
    307                                      " {} is a reserved name.".format(name))
--> 308         compute_type = provisioning_configuration._compute_type
    309         return compute_type._create(workspace, name, provisioning_configuration)
    310 

AttributeError: 'tuple' object has no attribute '_compute_type'

For instance the following cell in notebook '02b_Train_AutoML':

from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

# Choose a name for your cluster.
amlcompute_cluster_name = "train-many-model2"

found = False
# Check if this compute target already exists in the workspace.
cts = ws.compute_targets
if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':
    found = True
    print('Found existing compute target.')
    compute = cts[amlcompute_cluster_name]
    
if not found:
    print('Creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D13_V2',
                                                           min_nodes=0, # !!! RH instead of 2
                                                           max_nodes=2), # !!! RH instead of 20
    # Create the cluster.
    compute = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)
    
print('Checking cluster status...')
# Can poll for a minimum number of nodes and for a specific timeout.
# If no min_node_count is provided, it will use the scale settings for the cluster.
compute.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)
    
# For a more detailed view of current AmlCompute status, use get_status().

thank you and best regards, Robert
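
A likely cause, judging from the quoted cell: the line `max_nodes=2),` ends with a comma after the closing parenthesis, so provisioning_config is bound to a one-element tuple rather than to the configuration object. That would be consistent with the message "'tuple' object has no attribute '_compute_type'". The sketch below reproduces the effect with a hypothetical stand-in function, so it runs without the Azure ML SDK.

```python
def provisioning_configuration(**settings):
    """Hypothetical stand-in that just returns the settings it was given."""
    return dict(settings)

# Trailing comma after the call: Python builds a one-element tuple.
with_comma = provisioning_configuration(vm_size="STANDARD_D13_V2",
                                        min_nodes=0,
                                        max_nodes=2),

# No trailing comma: the name is bound to the returned object itself.
without_comma = provisioning_configuration(vm_size="STANDARD_D13_V2",
                                           min_nodes=0,
                                           max_nodes=2)

print(type(with_comma).__name__, type(without_comma).__name__)
```

If this is indeed the cause, removing the comma after `max_nodes=2)` in the notebook cell should be enough.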

ImportError: cannot import name 'list_remove_none_object'

I ran the example notebooks in the following order:
00_Setup_AML_Workspace.ipynb
01_Data_Preparation.ipynb
02_AutoML_Training_Pipeline.ipynb

There is an error in 02_AutoML_Training_Pipeline.ipynb.
ImportError: cannot import name 'list_remove_none_object'

Could you tell me how to fix it?


Environment:
Azure compute instance

SDK version | 1.18.0

!pip install --upgrade azureml-sdk[automl]
!pip install --upgrade azureml-pipeline-steps
!pip install azureml.pipeline.steps


In 02_AutoML_Training_Pipeline.ipynb,
there are some warnings, as shown below.


from scripts.helper import get_automl_environment
train_env = get_automl_environment(workspace=ws, automl_settings_dict=automl_settings)


WARNING:root:Received unrecognized parameter task
WARNING:root:Received unrecognized parameter experiment_timeout_hours
WARNING:root:Received unrecognized parameter time_column_name
WARNING:root:Received unrecognized parameter max_horizon
WARNING:root:Received unrecognized parameter group_column_names
WARNING:root:Received unrecognized parameter grain_column_names

The code below raises an error:


from scripts.helper import build_parallel_run_config

# PLEASE MODIFY the following three settings based on your compute and experiment timeout.
node_count=2
process_count_per_node=8
run_invocation_timeout=3700 # this timeout(in seconds) is inline with AutoML experiment timeout or (no of iterations * iteration timeout)

parallel_run_config = build_parallel_run_config(train_env, compute, node_count, process_count_per_node, run_invocation_timeout)


---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in
6 run_invocation_timeout=3700 # this timeout(in seconds) is inline with AutoML experiment timeout or (no of iterations * iteration timeout)
7
----> 8 parallel_run_config = build_parallel_run_config(train_env, compute, node_count, process_count_per_node, run_invocation_timeout)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/std-ds3-v2-m-aml/code/manymodels_05/Automated_ML/02_AutoML_Training_Pipeline/scripts/helper.py in build_parallel_run_config(train_env, compute, nodecount, workercount, timeout)
31
32 def build_parallel_run_config(train_env, compute, nodecount, workercount, timeout):
---> 33 from azureml.pipeline.steps import ParallelRunConfig
34 from common.scripts.helper import validate_parallel_run_config
35 parallel_run_config = ParallelRunConfig(

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/__init__.py in
32 from .estimator_step import EstimatorStep
33 from .mpi_step import MpiStep
---> 34 from .hyper_drive_step import HyperDriveStep, HyperDriveStepRun
35 from .azurebatch_step import AzureBatchStep
36 from .module_step import ModuleStep

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/hyper_drive_step.py in
10 from azureml.pipeline.core._module_builder import _ModuleBuilder
11 from azureml.pipeline.core.graph import ParamDef, OutputPortBinding
---> 12 from azureml.train.hyperdrive.run import HyperDriveRun
13
14

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/train/hyperdrive/__init__.py in
15 from .policy import BanditPolicy, MedianStoppingPolicy, NoTerminationPolicy, TruncationSelectionPolicy, \
16 EarlyTerminationPolicy
---> 17 from .runconfig import HyperDriveRunConfig, HyperDriveConfig, PrimaryMetricGoal
18 from .run import HyperDriveRun
19 from .sampling import RandomParameterSampling, GridParameterSampling, BayesianParameterSampling, HyperParameterSampling

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/train/hyperdrive/runconfig.py in
17 from azureml.train.hyperdrive.policy import NoTerminationPolicy, _policy_from_dict
18 from azureml.train.hyperdrive.sampling import BayesianParameterSampling, _sampling_from_dict
---> 19 from azureml.train._estimator_helper import _get_arguments
20 from azureml._restclient.constants import RunStatus
21 from azureml.data.constants import _HYPERDRIVE_SUBMIT_ACTIVITY

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/train/_estimator_helper.py in
15 from azureml.data.output_dataset_config import OutputDatasetConfig
16 from azureml.exceptions import ComputeTargetException, UserErrorException, TrainingException
---> 17 from azureml._base_sdk_common.utils import convert_dict_to_list, merge_list, \
18 list_remove_none_object, list_remove_empty_strings
19 from azureml.data.data_reference import DataReference

ImportError: cannot import name 'list_remove_none_object'
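
Separately from the import error, the comment on run_invocation_timeout in the cell above says the timeout should be in line with either the AutoML experiment timeout or the number of iterations multiplied by the iteration timeout. The arithmetic can be sketched as follows. The input values are illustrative (a one-hour experiment timeout, 15 iterations, 10 minutes per iteration), and timeout_candidates_seconds is a hypothetical helper, not part of the accelerator.

```python
def timeout_candidates_seconds(experiment_timeout_hours, iterations,
                               iteration_timeout_minutes):
    """Return the two budgets mentioned in the comment, in seconds."""
    experiment_budget = int(experiment_timeout_hours * 3600)
    iteration_budget = iterations * iteration_timeout_minutes * 60
    return experiment_budget, iteration_budget

experiment_budget, iteration_budget = timeout_candidates_seconds(
    experiment_timeout_hours=1, iterations=15, iteration_timeout_minutes=10)
print(experiment_budget, iteration_budget)
```

Under these illustrative values, a run_invocation_timeout of 3700 seconds sits just above the one-hour experiment budget of 3600 seconds.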

issue in CustomScript_Forecasting_Pipeline

Hi, we are facing an issue in CustomScript_Forecasting_Pipeline: the pipeline that we run in cell 14 fails. Please find the screenshot below for reference:
image

Allowed_models parameter does not work correctly

Is there a solution to using allowed_models in solution-accelerator-many-models?

If I specify automl_settings as follows


import logging
from scripts.helper import write_automl_settings_to_file

automl_settings = {
    "task" : 'forecasting',
    "primary_metric" : 'normalized_root_mean_squared_error',
    "iteration_timeout_minutes" : 10, # This needs to be changed based on the dataset. We ask customer to explore how long training is taking before settings this value
    "iterations" : 15,
    "experiment_timeout_hours" : 1,
    "label_column_name" : 'sales_quantity',
    "n_cross_validations" : 3,
    "verbosity" : logging.INFO, 
    "debug_log": 'automl_oj_sales_debug.txt',   
    "track_child_runs": False,
    "time_column_name": 'sales_date',
    "max_horizon" : 6,
    "group_column_names": ['product_code'],
    "grain_column_names": ['product_code'],
    # "target_rolling_window_size" : 12,
    # "forecasting_parameters" : forecasting_parameters,
    "enable_voting_ensemble" : False,
    "allowed_models" : ['AutoArima','Average','Naive','Prophet','SeasonalAverage','SeasonalNaive']

}

write_automl_settings_to_file(automl_settings)

from scripts.helper import get_automl_environment
train_env = get_automl_environment(workspace=ws, automl_settings_dict=automl_settings)

I got an error message like this

---------------------------------------------------------------------------
ConfigException                           Traceback (most recent call last)
<ipython-input-78-295cd1e5a96f> in <module>
      1 from scripts.helper import get_automl_environment
----> 2 train_env = get_automl_environment(workspace=ws, automl_settings_dict=automl_settings)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/nagata-vm/code/Users/ryoma.nagata/manymodel/solution-accelerator-many-models/Automated_ML/02_AutoML_Training_Pipeline/scripts/helper.py in get_automl_environment(workspace, automl_settings_dict)
     50 def get_automl_environment(workspace: Workspace, automl_settings_dict: dict):
     51     from common.scripts.helper import get_automl_environment as get_env
---> 52     return get_env(workspace, automl_settings_dict)
     53 
     54 

/mnt/batch/tasks/shared/LS_root/mounts/clusters/nagata-vm/code/Users/ryoma.nagata/manymodel/solution-accelerator-many-models/Automated_ML/common/scripts/helper.py in get_automl_environment(workspace, automl_settings_dict)
     22     null_logger.propagate = False
     23     automl_settings_obj = AzureAutoMLSettings.from_string_or_dict(
---> 24         automl_settings_dict)
     25     run_configuration = modify_run_configuration(
     26         automl_settings_obj,

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/train/automl/_azureautomlsettings.py in from_string_or_dict(val, experiment, overrides)
    415             if overrides is not None:
    416                 val.update(overrides)
--> 417             return AzureAutoMLSettings(experiment=experiment, **val)
    418 
    419         if isinstance(val, AzureAutoMLSettings):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/train/automl/_azureautomlsettings.py in __init__(self, experiment, path, iterations, data_script, primary_metric, task_type, compute_target, spark_context, validation_size, n_cross_validations, y_min, y_max, num_classes, featurization, max_cores_per_iteration, max_concurrent_iterations, iteration_timeout_minutes, mem_in_mb, enforce_time_on_windows, experiment_timeout_minutes, experiment_exit_score, enable_early_stopping, blacklist_models, whitelist_models, exclude_nan_labels, verbosity, debug_log, debug_flag, enable_voting_ensemble, enable_stack_ensemble, ensemble_iterations, model_explainability, enable_tf, enable_subsampling, subsample_seed, cost_mode, is_timeseries, enable_onnx_compatible_models, scenario, environment_label, show_deprecate_warnings, **kwargs)
    292             scenario=scenario,
    293             environment_label=environment_label,
--> 294             **kwargs)
    295 
    296         # temporary measure to bypass the typecheck in base settings in common core

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/automl/core/automl_base_settings.py in __init__(self, path, iterations, data_script, primary_metric, task_type, validation_size, n_cross_validations, y_min, y_max, num_classes, featurization, max_cores_per_iteration, max_concurrent_iterations, iteration_timeout_minutes, mem_in_mb, enforce_time_on_windows, experiment_timeout_minutes, experiment_exit_score, blocked_models, blacklist_models, allowed_models, whitelist_models, exclude_nan_labels, verbosity, debug_log, debug_flag, enable_voting_ensemble, enable_stack_ensemble, ensemble_iterations, model_explainability, enable_tf, enable_subsampling, subsample_seed, cost_mode, is_timeseries, enable_early_stopping, early_stopping_n_iters, enable_onnx_compatible_models, enable_feature_sweeping, enable_nimbusml, enable_streaming, force_streaming, label_column_name, weight_column_name, cv_split_column_names, enable_local_managed, vm_type, track_child_runs, show_deprecate_warnings, forecasting_parameters, allowed_private_models, scenario, environment_label, **kwargs)
    549             self.whitelist_models = self.allowed_private_models.copy()
    550 
--> 551         self._verify_settings()
    552 
    553         # Settings that need to be set after verification

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/train/automl/_azureautomlsettings.py in _verify_settings(self)
    363         # Base settings object will do most of the verification. Only add AzureML-specific checks here.
    364         try:
--> 365             super()._verify_settings()
    366         except ValueError as e:
    367             # todo figure out how this is reachable, and if it's right to raise it as ConfigException

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/automl/core/automl_base_settings.py in _verify_settings(self)
    877             )
    878 
--> 879         self._validate_model_filter_lists()
    880         self._validate_allowed_private_model_list()
    881 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/automl/core/automl_base_settings.py in _validate_model_filter_lists(self)
    943                         InvalidArgumentWithSupportedValues, target="allowed_models",
    944                         reference_code=ReferenceCodes._AUTOML_CONFIG_ALLOWEDMODELS_EMPTY,
--> 945                         arguments="allowed_models", supported_values=self._get_supported_model_names()
    946                     )
    947                 )

ConfigException: ConfigException:
	Message: Invalid argument(s) 'allowed_models' specified. Supported value(s): '['XGBoostRegressor', 'TensorFlowDNN', 'ExtremeRandomTrees', 'DecisionTree', 'SGD', 'OnlineGradientDescentRegressor', 'LightGBM', 'FastLinearRegressor', 'TensorFlowLinearRegressor', 'LassoLars', 'KNN', 'RandomForest', 'ElasticNet', 'GradientBoosting']'.
	InnerException: None
	ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Invalid argument(s) 'allowed_models' specified. Supported value(s): '['XGBoostRegressor', 'TensorFlowDNN', 'ExtremeRandomTrees', 'DecisionTree', 'SGD', 'OnlineGradientDescentRegressor', 'LightGBM', 'FastLinearRegressor', 'TensorFlowLinearRegressor', 'LassoLars', 'KNN', 'RandomForest', 'ElasticNet', 'GradientBoosting']'.",
        "details_uri": "https://aka.ms/AutoMLConfig",
        "target": "allowed_models",
        "inner_error": {
            "code": "BadArgument",
            "inner_error": {
                "code": "ArgumentInvalid"
            }
        },
        "reference_code": "a95429c1-1592-4730-b8e8-d52b4db80349"
    }
}

It seems that AzureAutoMLSettings (from azureml.train.automl._azureautomlsettings import AzureAutoMLSettings) does not accept forecasting models in allowed_models.
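The reference code in the traceback (_AUTOML_CONFIG_ALLOWEDMODELS_EMPTY) suggests the check fails when none of the requested names remain after filtering against the supported list. A rough, stdlib-only sketch of that kind of check follows. It is not the SDK's implementation: the function name split_allowed_models and the requested names are hypothetical, and only the supported list is copied from the error message above.

```python
# Supported values copied from the ConfigException message above.
SUPPORTED_MODELS = [
    "XGBoostRegressor", "TensorFlowDNN", "ExtremeRandomTrees", "DecisionTree",
    "SGD", "OnlineGradientDescentRegressor", "LightGBM", "FastLinearRegressor",
    "TensorFlowLinearRegressor", "LassoLars", "KNN", "RandomForest",
    "ElasticNet", "GradientBoosting",
]


def split_allowed_models(requested, supported=SUPPORTED_MODELS):
    """Return (accepted, rejected) model names, preserving the requested order."""
    supported_set = set(supported)
    accepted = [name for name in requested if name in supported_set]
    rejected = [name for name in requested if name not in supported_set]
    return accepted, rejected


# Hypothetical request: one regression model plus two forecasting-only names.
requested = ["LightGBM", "AutoArima", "Prophet"]
accepted, rejected = split_allowed_models(requested)
print("accepted:", accepted)
print("rejected:", rejected)

# An empty accepted list is the situation the error above appears to describe.
only_forecasting, _ = split_allowed_models(["AutoArima", "Prophet"])
print("would fail validation:", not only_forecasting)
```

Running a check like this before building the settings object makes it easier to see which names are being filtered out.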

Exception thrown when dataframes are passed as input to ParallelRunStep class

It would be very helpful if the ParallelRunStep class accepted dataframes as inputs.

I understand that ParallelRunStep currently only allows these input types: [DatasetConsumptionConfig, PipelineOutputFileDataset, PipelineOutputTabularDataset, OutputFileDatasetConfig, OutputTabularDatasetConfig, LinkFileOutputDatasetConfig, LinkTabularOutputDatasetConfig]

Is it possible to pass dataframes as inputs to ParallelRunStep? Could this be a use case that the Azure ML dev team would consider?

Exception                                 Traceback (most recent call last)
<ipython-input-27-215e373515cb> in <module>
      7     output=output_dir,
      8     allow_reuse=False,
----> 9     arguments=None
     10 )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_step.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    155             side_inputs=side_inputs,
    156             arguments=arguments,
--> 157             allow_reuse=allow_reuse,
    158         )
    159 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    259 
    260         self._process_inputs_output_dataset_configs()
--> 261         self._validate()
    262         self._get_pystep_inputs()
    263 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate(self)
    329         """Validate input params to init parallel run step class."""
    330         self._validate_arguments()
--> 331         self._validate_inputs()
    332         self._validate_output()
    333         self._validate_parallel_run_config()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate_inputs(self)
    410 
    411         if self._inputs:
--> 412             self._input_ds_type = self._get_input_type(self._inputs[0])
    413             for input_ds in self._inputs:
    414                 if self._input_ds_type != self._get_input_type(input_ds):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _get_input_type(self, in_ds)
    399             ds_mapping_type = INPUT_TYPE_DICT[input_type]
    400         else:
--> 401             raise Exception("Step input must be of any type: {}, found {}".format(ALLOWED_INPUT_TYPES, input_type))
    402         return ds_mapping_type
    403 

Exception: Step input must be of any type: (<class 'azureml.data.dataset_consumption_config.DatasetConsumptionConfig'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset'>, <class 'azureml.data.output_dataset_config.OutputFileDatasetConfig'>, <class 'azureml.data.output_dataset_config.OutputTabularDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkFileOutputDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkTabularOutputDatasetConfig'>), found <class 'pandas.core.frame.DataFrame'>
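Since the exception shows that only dataset-style inputs pass validation, one possible workaround is to persist the in-memory table to files first and then register those files as a dataset in whatever way you normally do. A minimal, stdlib-only sketch of the first half, assuming rows are plain dictionaries and that one CSV file per partition key is acceptable. The function name write_partitions and the sample columns are illustrative and not part of the Azure ML SDK.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitions(rows, key_columns, out_dir):
    """Write one CSV file per distinct combination of key_columns.

    Returns a dict mapping each key tuple to the file written for it.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by their partition key.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(str(row[col]) for col in key_columns)].append(row)

    written = {}
    for key, group in groups.items():
        path = out_dir / ("_".join(key) + ".csv")
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written[key] = path
    return written


# Illustrative rows; a pandas DataFrame can be turned into this shape
# with df.to_dict("records").
rows = [
    {"Store": 1, "Brand": "a", "Quantity": 10},
    {"Store": 1, "Brand": "a", "Quantity": 12},
    {"Store": 2, "Brand": "b", "Quantity": 7},
]
files = write_partitions(rows, ["Store", "Brand"], tempfile.mkdtemp())
for key, path in sorted(files.items()):
    print(key, path.name)
```

The resulting folder can then be uploaded to a datastore and registered as a dataset, which is a type the step accepts.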

AutoMLPipelineBuilder error

I get errors when trying to run the MMSA AutoML example with the OJ dataset.

Create the experiment

from azureml.core import Experiment

experiment = Experiment(workspace=ws, name='mmsa-automl-training')

Connect to the dataset

from azureml.core.dataset import Dataset

oj_data_small_train_ds = Dataset.get_by_name(workspace=ws, name='oj_data_small_train')
oj_data_small_train_input = oj_data_small_train_ds.as_named_input(name='oj_data_small_train')

Choose a compute target

from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

# Choose a name for your cluster.
amlcompute_cluster_name = "cpu-cluster"

found = False

# Check if this compute target already exists in the workspace.
cts = ws.compute_targets
if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':
    found = True
    print('Found existing compute target.')
    compute = cts[amlcompute_cluster_name]
[...]

Select AutoML settings

import logging

partition_column_names = ['Store', 'Brand']

automl_settings = {
    "task" : 'forecasting',
    "primary_metric" : 'normalized_root_mean_squared_error',
    "iteration_timeout_minutes" : 20,
    "iterations" : 15,
    "experiment_timeout_hours" : 1,
    "label_column_name" : 'Quantity',
    "n_cross_validations" : 3,
    # "verbosity" : logging.INFO,
    "debug_log": 'automl_oj_sales_debug.txt',
    "time_column_name": 'WeekStarting',
    "max_horizon" : 20,
    "track_child_runs": False,
    "partition_column_names": partition_column_names,
    "grain_column_names": ['Store', 'Brand'],
    "pipeline_fetch_max_batch_size": 15
}

Create the AutoML pipeline

from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder

train_steps = AutoMLPipelineBuilder.get_many_models_train_steps(experiment=experiment,
                                                                automl_settings=automl_settings,
                                                                train_data=oj_data_small_train_ds,
                                                                compute_target=compute,
                                                                partition_column_names=partition_column_names,
                                                                node_count=5,
                                                                process_count_per_node=20,
                                                                run_invocation_timeout=3700,
                                                                output_datastore=default_store)

I have attached the error message as a text file:
MMSA-AutoML.txt

I run this with the following Conda environment:

name: azureml-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - numpy
  - pandas
  - pyarrow
  - seaborn
  - nb_conda
  - ipykernel
  - jupyterlab
  - matplotlib
  - scikit-learn
  - pip
  - pip:
    - websocket
    - azureml-sdk
    - azureml-mlflow
    - azureml-widgets
    - azureml-defaults
    - azureml-train-automl
    - azureml-opendatasets
    - azureml-pipeline-steps
    - azureml-contrib-automl-pipeline-steps

Incorrect argument name in many_models_inference_driver.py

In the file many_models_inference_driver.py, line 9 reads:
parser.add_argument("--partition_column_names", '--nargs', nargs='*', type=str, help="partition_column_names")

This should be:
parser.add_argument("--partition_column_names", nargs='*', type=str, help="partition_column_names")

While this is not a particularly dangerous bug, it seems to be incorrect, as it adds an alternative argument name of "--nargs" for the partition_column_names argument. Maybe it is the result of a copy-paste error.
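To see the effect of the extra option string, the following sketch builds both variants of the parser and parses the same arguments. It only uses the standard argparse module; the build_parser helper is illustrative and not part of the driver script.

```python
import argparse


def build_parser(with_extra_name):
    """Build a parser with or without the stray '--nargs' option string."""
    parser = argparse.ArgumentParser()
    names = ["--partition_column_names"]
    if with_extra_name:
        names.append("--nargs")
    parser.add_argument(*names, nargs="*", type=str, help="partition_column_names")
    return parser


current = build_parser(with_extra_name=True)
proposed = build_parser(with_extra_name=False)

# With the extra name, '--nargs' is accepted as an alias for the same destination.
alias_args = current.parse_args(["--nargs", "Store", "Brand"])
print(alias_args.partition_column_names)

# Both parsers handle the intended flag identically.
flag = ["--partition_column_names", "Store", "Brand"]
print(current.parse_args(flag).partition_column_names)
print(proposed.parse_args(flag).partition_column_names)

# Without the extra name, '--nargs' is no longer recognized and is left over.
_, leftover = proposed.parse_known_args(["--nargs", "Store"])
print(leftover)
```

Because the destination is derived from the first long option string, the intended flag behaves the same in both variants; the only difference is whether the unintended "--nargs" alias exists.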

Online endpoints for models

I have trained multiple models and was able to download them. Now I would like to deploy online endpoints that serve predictions from these models instead of using batch inferencing. Do you support such functionality? If so, can you point me to the sample code? Thanks.

seaborn package is not included in AzureML kernel

seaborn needs to be installed for visualization in the 02_CustomScript_Training_Pipeline.ipynb notebook.

I see it is installed manually in the test workflow. The notebook should include a
pip install seaborn
step to handle the requirement.
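As an alternative to installing unconditionally, a notebook cell could first check whether the package is importable and only then prompt for installation. A small sketch using the standard importlib module; the helper name missing_packages is illustrative.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["seaborn"]
missing = missing_packages(required)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```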

AutoMLPipelineBuilder: Exception on step input type of <class 'azureml.data.file_dataset.FileDataset'>

Hi,

I've been following the AutoML Training Pipeline notebook for direction on implementing a many models forecasting solution.

I've registered my data as a FileDataset per instructions mentioned in the 01_Data_Preparation notebook. However, when I call the get_many_models_train_steps method an exception is thrown on step input type of <class 'azureml.data.file_dataset.FileDataset'>.

There doesn't appear to be documentation stating that I should register the dataset as any of the classes listed in the ALLOWED_INPUT_TYPES such as azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset in the exception.

Below is the code that I'm using in an AzureML notebook along with the exception that gets raised. Is there any intermediate processing that I'm missing where the dataset would be converted to one of the allowed input types?

Any direction is greatly appreciated. Thanks!

#keep azureml-core updated to the latest version
!pip install --upgrade azureml-core
#Install the azureml-contrib-automl-pipeline-steps package that is needed for many models
!pip install azureml.contrib.automl.pipeline.steps

#dependencies
import logging
import os
import random
import time
import json

from matplotlib import pyplot as plt
from matplotlib.pyplot import imshow
import numpy as np
import pandas as pd
from datetime import datetime
import time

import azureml.core
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.datastore import Datastore
from azureml.core.dataset import Dataset
from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder
from azureml.automl.core.forecasting_parameters import ForecastingParameters

#initialize workspace and remote compute

ws = Workspace.create(name = workspace_name,
                      subscription_id = subscription_id,
                      resource_group = resource_group, 
                      location = workspace_region,
                      exist_ok=True)

amlcompute_cluster_name = "cluster-{}".format(ws._workspace_id)[:10]
# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',min_nodes=2,max_nodes=12)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

#set up experiment
experiment_name = 'automl-manymodels-salesforecast'
experiment = Experiment(ws, experiment_name)

#sourcing data as FileDataset
ds = ws.get_default_datastore()
train_datastore_paths = [(ds, 'azureml/ForecastsInput/train/')]
test_datastore_paths = [(ds, 'azureml/ForecastsInput/test/')]
train_sales_data = Dataset.File.from_files(path=train_datastore_paths)
test_sales_data = Dataset.File.from_files(path=test_datastore_paths)

smpl_view = pd.read_csv(train_sales_data.download()[0])
smpl_view.head(5) #this works as expected

#configuration parameters for forecasting task 
target_column_name = 'revenue'
time_column_name = 'month_date'
time_series_id_column_names = 'partner'
forecast_horizon = 3
freq='MS' #MonthBegin per pandas offset
ts_models = ['Naive', 'AutoArima', 'SeasonalAverage', 'SeasonalNaive', 'ExponentialSmoothing', 'Arimax', 'Average','Prophet']

automl_settings = {
    "task" : 'forecasting',
    "primary_metric" : 'normalized_root_mean_squared_error',
    "allowed_models": ts_models,
    "iteration_timeout_minutes" : 10, # This needs to be changed based on the dataset
    "iterations" : 15,
    "experiment_timeout_hours" : 0.3,
    "label_column_name" : target_column_name,
    "n_cross_validations" : 3,
    "verbosity" : logging.INFO, 
    "debug_log": 'autoML_manyModels.txt',
    "time_column_name": time_column_name,
    "forecast_horizon" : forecast_horizon,
    "freq": freq,
    "track_child_runs": False,
    "partition_column_names": time_series_id_column_names,
    "time_series_id_column_names": time_series_id_column_names,
    "pipeline_fetch_max_batch_size": 15
}

#AutoMLPipelineBuilder is used to build the many models train step
train_steps = AutoMLPipelineBuilder.get_many_models_train_steps(experiment=experiment,
                                                                automl_settings=automl_settings,
                                                                train_data=train_sales_data,
                                                                compute_target=compute_target,
                                                                node_count=2,
                                                                process_count_per_node=8,
                                                                run_invocation_timeout=3700,
                                                                partition_column_names = time_series_id_column_names,
                                                                output_datastore=ds)

Step Input Exception

Also note that I've tried using a TabularDataset, but the below attribute error gets thrown:

    281         # TODO: Merge these two in better fashion once tabular dataset is released to public.
    282         if(dataset_type == "<class 'azureml.data.tabular_dataset.TabularDataset'>"):
--> 283             parallel_run_config = ParallelRunConfig.create_with_partition_column_names(
    284                 source_directory=PROJECT_DIR,
    285                 entry_script='many_models_train_driver.py',

AttributeError: type object 'ParallelRunConfig' has no attribute 'create_with_partition_column_names'

Logs lost

I am trying to use the proposed solution within my project with a custom training script. I have these two problems:

  • child runs fail with UserError (my fault, of course), but I cannot find the related error message in the logs
  • when all child runs fail, the status of the parent run is sometimes FAIL, sometimes COMPLETED, and sometimes RUNNING

I acknowledge that these do not seem to be issues with the proposed solution itself, but I think they may hamper its robustness.
Please let me know if you can help me.
Thank you a lot!

UPDATE
Modifying the except part of the custom training script as

except Exception as e:
    logging.error(str(e))

was a good first step!
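Building on that update, logging.exception records the full traceback rather than only the message, and re-raising keeps the failure visible to the caller. The sketch below uses only the standard logging module; train_partition and failing_train are hypothetical names, and how a re-raised exception affects the parent run status in Azure ML is not something this sketch can confirm.

```python
import logging

logger = logging.getLogger("many_models_train")


def train_partition(train_fn, partition_name):
    """Run train_fn for one partition, logging any failure with its traceback."""
    try:
        return train_fn(partition_name)
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback.
        logger.exception("Training failed for partition %s", partition_name)
        # Re-raise so the caller can still see that this partition failed.
        raise


def failing_train(partition_name):
    # Hypothetical training function that always fails, for demonstration.
    raise ValueError("no rows for " + partition_name)


logging.basicConfig(level=logging.INFO)
try:
    train_partition(failing_train, "Store1_BrandA")
except ValueError as err:
    print("caught:", err)
```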

DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().

When executing the data preparation notebook on the AML compute instance (CI) created by following the instructions, cell 6 (Workspace.from_config) fails with:

---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in wrapper(self, *args, **kwargs)
    288                     module_logger.debug("{} acquired lock in {} s.".format(type(self).__name__, duration))
--> 289                 return test_function(self, *args, **kwargs)
    290             except Exception as e:

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in _get_all_subscription_ids_internal(self, arm_token)
    516         if isinstance(self._ambient_auth, AbstractAuthentication):
--> 517             return self._ambient_auth._get_all_subscription_ids()
    518         else:

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in _get_all_subscription_ids(self)
   1652         from azureml._base_sdk_common.common import fetch_tenantid_from_aad_token
-> 1653         token_tenant_id = fetch_tenantid_from_aad_token(arm_token)
   1654         return _get_subscription_ids_via_client(msi_auth), token_tenant_id

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_base_sdk_common/common.py in fetch_tenantid_from_aad_token(token)
    115     # verify signature, we just need the tenant id.
--> 116     decode_json = jwt.decode(token, verify=False)
    117     return decode_json['tid']

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/jwt/api_jwt.py in decode(self, jwt, key, algorithms, options, **kwargs)
    118     ) -> Dict[str, Any]:
--> 119         decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
    120         return decoded["payload"]

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/jwt/api_jwt.py in decode_complete(self, jwt, key, algorithms, options, **kwargs)
     85         if options["verify_signature"] and not algorithms:
---> 86             raise DecodeError(
     87                 'It is required that you pass in a value for the "algorithms" argument when calling decode().'

DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().

During handling of the above exception, another exception occurred:

DecodeError                               Traceback (most recent call last)
<ipython-input-6-4fbcce1fbdc9> in <module>
      1 from azureml.core.workspace import Workspace
      2 
----> 3 ws = Workspace.from_config()
      4 
      5 # Take a look at Workspace

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/workspace.py in from_config(path, auth, _logger, _file_name)
    278 
    279         _logger.info('Found the config file in: %s', found_path)
--> 280         return Workspace.get(
    281             workspace_name,
    282             auth=auth,

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/workspace.py in get(name, auth, subscription_id, resource_group)
    546             return workspace_from_auth
    547 
--> 548         result_dict = Workspace.list(
    549             subscription_id, auth=auth, resource_group=resource_group)
    550         result_dict = {k.lower(): v for k, v in result_dict.items()}

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/workspace.py in list(subscription_id, auth, resource_group)
    636                     auth, workspaces_list, result_dict)
    637         elif subscription_id and resource_group:
--> 638             workspaces_list = Workspace._list_legacy(
    639                 auth, subscription_id=subscription_id, resource_group_name=resource_group)
    640 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/workspace.py in _list_legacy(auth, subscription_id, resource_group_name, ignore_error)
   1373                 return None
   1374             else:
-> 1375                 raise e
   1376 
   1377     @staticmethod

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/workspace.py in _list_legacy(auth, subscription_id, resource_group_name, ignore_error)
   1366             # A list of object of
   1367             # azureml._base_sdk_common.workspace.models.workspace.Workspace
-> 1368             workspace_autorest_list = _commands.list_workspace(
   1369                 auth, subscription_id=subscription_id, resource_group_name=resource_group_name)
   1370             return workspace_autorest_list

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_project/_commands.py in list_workspace(auth, subscription_id, resource_group_name)
    387         if resource_group_name:
    388             list_object = WorkspacesOperations.list_by_resource_group(
--> 389                 auth._get_service_client(AzureMachineLearningWorkspaces, subscription_id).workspaces,
    390                 resource_group_name)
    391             workspace_list = list_object.value

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in _get_service_client(self, client_class, subscription_id, subscription_bound, base_url)
    155         # in the multi-tenant case, which causes confusion.
    156         if subscription_id:
--> 157             all_subscription_list, tenant_id = self._get_all_subscription_ids()
    158             self._check_if_subscription_exists(subscription_id, all_subscription_list, tenant_id)
    159 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in _get_all_subscription_ids(self)
    498         """
    499         arm_token = self._get_arm_token()
--> 500         return self._get_all_subscription_ids_internal(arm_token)
    501 
    502     def _get_workspace(self, subscription_id, resource_group, name):

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in wrapper(self, *args, **kwargs)
    293                     InteractiveLoginAuthentication(force=True, tenant_id=self._tenant_id)
    294                     # Try one more time
--> 295                     return test_function(self, *args, **kwargs)
    296                 else:
    297                     raise e

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in _get_all_subscription_ids_internal(self, arm_token)
    515     def _get_all_subscription_ids_internal(self, arm_token):
    516         if isinstance(self._ambient_auth, AbstractAuthentication):
--> 517             return self._ambient_auth._get_all_subscription_ids()
    518         else:
    519             from azureml._vendor.azure_cli_core._profile import Profile

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/authentication.py in _get_all_subscription_ids(self)
   1651         arm_token = self._get_arm_token()
   1652         from azureml._base_sdk_common.common import fetch_tenantid_from_aad_token
-> 1653         token_tenant_id = fetch_tenantid_from_aad_token(arm_token)
   1654         return _get_subscription_ids_via_client(msi_auth), token_tenant_id
   1655 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_base_sdk_common/common.py in fetch_tenantid_from_aad_token(token)
    114     # We set verify=False, as we don't have keys to verify signature, and we also don't need to
    115     # verify signature, we just need the tenant id.
--> 116     decode_json = jwt.decode(token, verify=False)
    117     return decode_json['tid']
    118 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/jwt/api_jwt.py in decode(self, jwt, key, algorithms, options, **kwargs)
    117         **kwargs,
    118     ) -> Dict[str, Any]:
--> 119         decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
    120         return decoded["payload"]
    121 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/jwt/api_jwt.py in decode_complete(self, jwt, key, algorithms, options, **kwargs)
     84 
     85         if options["verify_signature"] and not algorithms:
---> 86             raise DecodeError(
     87                 'It is required that you pass in a value for the "algorithms" argument when calling decode().'
     88             )

DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().
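The traceback ends in jwt.decode(token, verify=False), which the installed PyJWT rejects because signature verification is still enabled and no algorithms were given. In PyJWT 2.x, verification is turned off with options={"verify_signature": False} rather than verify=False. To illustrate what the failing helper is after (reading the tid claim without verifying the signature), here is a stdlib-only sketch. The function names and the demo token are made up for illustration, and this is not the SDK's code.

```python
import base64
import json


def _b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data):
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_unverified_claims(token):
    """Return the payload claims of a JWT WITHOUT checking its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    return json.loads(_b64url_decode(parts[1]))


def make_demo_token(claims):
    """Build an unsigned, made-up token purely for demonstration."""
    header = _b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url_encode(json.dumps(claims).encode("utf-8"))
    return header + "." + payload + ".demo-signature"


token = make_demo_token({"tid": "demo-tenant-id"})
print(read_unverified_claims(token)["tid"])
```

Claims read this way are untrusted and should only be used where, as in the comment quoted in the traceback, the signature is deliberately not being checked.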

ImportError: cannot import name '_DATASET_OUTPUT_ARGUMENT_TEMPLATE'

In the notebook 02_AutoML_Training, running the cell in the "Build many model training steps" section results in ImportError: cannot import name '_DATASET_OUTPUT_ARGUMENT_TEMPLATE'

from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder

train_steps = AutoMLPipelineBuilder.get_many_models_train_steps(experiment=experiment,
                                                                automl_settings=automl_settings,
                                                                train_data=filedst_10_models_input,
                                                                compute_target=compute,
                                                                partition_column_names=partition_column_names,
                                                                node_count=2,
                                                                process_count_per_node=8,
                                                                run_invocation_timeout=3700,
                                                                output_datastore=dstore)

Full error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-16-5cb613553b0a> in <module>
----> 1 from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder
      2 
      3 train_steps = AutoMLPipelineBuilder.get_many_models_train_steps(experiment=experiment,
      4                                                                 automl_settings=automl_settings,
      5                                                                 train_data=filedst_10_models_input,

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/automl/pipeline/steps/__init__.py in <module>
      6 
      7 """
----> 8 from .automl_pipeline_builder import AutoMLPipelineBuilder
      9 
     10 __all__ = ["AutoMLPipelineBuilder"]

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/automl/pipeline/steps/automl_pipeline_builder.py in <module>
     19 from azureml.core import ComputeTarget, Datastore, Environment, Experiment
     20 from azureml.data.data_reference import DataReference
---> 21 from azureml.pipeline.core import PipelineData, PipelineRun, PipelineStep
     22 from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep
     23 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/__init__.py in <module>
     22 service?](https://docs.microsoft.com/azure/machine-learning/concept-ml-pipelines)
     23 """
---> 24 from .builder import PipelineStep, PipelineData, StepSequence
     25 from .pipeline import Pipeline
     26 from .graph import PublishedPipeline, PortDataReference, OutputPortBinding, InputPortBinding, TrainingOutput

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/builder.py in <module>
     12 from azureml.data._dataset import _Dataset
     13 from azureml.data.abstract_dataset import AbstractDataset
---> 14 from azureml.data.constants import _DATASET_ARGUMENT_TEMPLATE, _DATASET_OUTPUT_ARGUMENT_TEMPLATE
     15 from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
     16 from azureml.data.output_dataset_config import OutputDatasetConfig

ImportError: cannot import name '_DATASET_OUTPUT_ARGUMENT_TEMPLATE'

!pip freeze

absl-py==0.11.0
adal==1.2.6
aiohttp==3.7.3
aiohttp-cors==0.7.0
aioredis==1.3.1
alembic==1.4.1
ansiwrap==0.8.4
antlr4-python3-runtime==4.7.2
applicationinsights==0.11.9
argcomplete==1.12.2
argon2-cffi==20.1.0
astor==0.8.1
astroid==2.4.2
astunparse==1.6.3
async-timeout==3.0.1
atari-py==0.2.6
attrs==20.3.0
autokeras==1.0.12
autopep8==1.5.5
azure-appconfiguration==1.1.1
azure-batch==10.0.0
azure-cli==2.19.1
azure-cli-core==2.19.1
azure-cli-telemetry==1.0.6
azure-common==1.1.26
azure-core==1.10.0
azure-cosmos==3.2.0
azure-datalake-store==0.0.51
azure-functions-devops-build==0.0.22
azure-graphrbac==0.61.1
azure-identity==1.2.0
azure-keyvault==1.1.0
azure-keyvault-administration==4.0.0b1
azure-loganalytics==0.1.0
azure-mgmt-advisor==2.0.1
azure-mgmt-apimanagement==0.2.0
azure-mgmt-appconfiguration==1.0.1
azure-mgmt-applicationinsights==0.1.1
azure-mgmt-authorization==0.61.0
azure-mgmt-batch==9.0.0
azure-mgmt-batchai==2.0.0
azure-mgmt-billing==1.0.0
azure-mgmt-botservice==0.3.0
azure-mgmt-cdn==5.2.0
azure-mgmt-cognitiveservices==6.3.0
azure-mgmt-compute==18.2.0
azure-mgmt-consumption==2.0.0
azure-mgmt-containerinstance==1.5.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-containerservice==9.4.0
azure-mgmt-core==1.2.2
azure-mgmt-cosmosdb==1.0.0
azure-mgmt-databoxedge==0.2.0
azure-mgmt-datalake-analytics==0.2.1
azure-mgmt-datalake-nspkg==3.0.1
azure-mgmt-datalake-store==0.5.0
azure-mgmt-datamigration==4.1.0
azure-mgmt-deploymentmanager==0.2.0
azure-mgmt-devtestlabs==4.0.0
azure-mgmt-dns==2.1.0
azure-mgmt-eventgrid==3.0.0rc7
azure-mgmt-eventhub==4.1.0
azure-mgmt-hdinsight==2.2.0
azure-mgmt-imagebuilder==0.4.0
azure-mgmt-iotcentral==3.0.0
azure-mgmt-iothub==0.12.0
azure-mgmt-iothubprovisioningservices==0.2.0
azure-mgmt-keyvault==2.2.0
azure-mgmt-kusto==0.3.0
azure-mgmt-loganalytics==8.0.0
azure-mgmt-managedservices==1.0.0
azure-mgmt-managementgroups==0.2.0
azure-mgmt-maps==0.1.0
azure-mgmt-marketplaceordering==0.2.1
azure-mgmt-media==2.2.0
azure-mgmt-monitor==2.0.0
azure-mgmt-msi==0.2.0
azure-mgmt-netapp==0.15.0
azure-mgmt-network==17.0.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-policyinsights==0.5.0
azure-mgmt-privatedns==0.1.0
azure-mgmt-rdbms==3.1.0rc1
azure-mgmt-recoveryservices==0.4.0
azure-mgmt-recoveryservicesbackup==0.11.0
azure-mgmt-redhatopenshift==0.1.0
azure-mgmt-redis==7.0.0rc2
azure-mgmt-relay==0.1.0
azure-mgmt-reservations==0.6.0
azure-mgmt-resource==10.2.0
azure-mgmt-search==8.0.0
azure-mgmt-security==0.6.0
azure-mgmt-servicebus==0.6.0
azure-mgmt-servicefabric==0.5.0
azure-mgmt-signalr==0.4.0
azure-mgmt-sql==0.26.0
azure-mgmt-sqlvirtualmachine==0.5.0
azure-mgmt-storage==11.2.0
azure-mgmt-synapse==0.6.0
azure-mgmt-trafficmanager==0.51.0
azure-mgmt-web==0.48.0
azure-multiapi-storage==0.5.2
azure-nspkg==3.0.2
azure-storage-blob==12.7.1
azure-storage-common==1.4.2
azure-storage-queue==12.1.5
azure-synapse-accesscontrol==0.2.0
azure-synapse-artifacts==0.3.0
azure-synapse-spark==0.2.0
azureml-accel-models==1.22.0
azureml-automl-core==1.23.0
azureml-automl-runtime==1.23.0
azureml-cli-common==1.22.0
azureml-contrib-automl-pipeline-steps==1.23.0
azureml-contrib-dataset==1.22.0
azureml-contrib-fairness==1.22.0
azureml-contrib-gbdt==1.22.0
azureml-contrib-interpret==1.22.0
azureml-contrib-notebook==1.22.0
azureml-contrib-pipeline-steps==1.22.0
azureml-contrib-reinforcementlearning==1.22.0
azureml-contrib-server==1.22.0
azureml-contrib-services==1.22.0
azureml-core==1.23.0
azureml-datadrift==1.22.0
azureml-dataprep==2.10.1
azureml-dataprep-native==30.0.0
azureml-dataprep-rslex==1.8.0
azureml-dataset-runtime==1.23.0
azureml-defaults==1.23.0
azureml-explain-model==1.22.0
azureml-interpret==1.23.0
azureml-mlflow==1.22.0
azureml-model-management-sdk==1.0.1b6.post1
azureml-opendatasets==1.10.0
azureml-pipeline==1.23.0
azureml-pipeline-core==1.23.0
azureml-pipeline-steps==1.23.0
azureml-samples @ file:///mnt/jupyter-azsamples
azureml-sdk==1.23.0
azureml-telemetry==1.23.0
azureml-tensorboard==1.22.0
azureml-train==1.23.0
azureml-train-automl==1.23.0
azureml-train-automl-client==1.23.0.post1
azureml-train-automl-runtime==1.23.0.post1
azureml-train-core==1.23.0
azureml-train-restclients-hyperdrive==1.23.0
azureml-widgets==1.22.0
backcall==0.2.0
backports.functools-lru-cache==1.6.1
backports.tempfile==1.0
backports.weakref==1.0.post1
bcrypt==3.2.0
beautifulsoup4==4.9.3
bleach==3.3.0
blessings==1.7
blis==0.2.4
bokeh==2.2.3
boto==2.49.0
boto3==1.15.18
botocore==1.18.18
Bottleneck==1.3.2
cached-property==1.5.2
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.4
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
colorama==0.4.4
colorful==0.5.4
configparser==3.7.4
contextlib2==0.6.0.post1
contextvars==2.4
convertdate @ file:///home/conda/feedstock_root/build_artifacts/convertdate_1605102623033/work
coremltools @ git+https://github.com/apple/coremltools@13c064ed99ab1da7abea0196e4ddf663ede48aad
cryptography==3.2
cycler==0.10.0
cymem==2.0.5
Cython @ file:///tmp/build/80754af9/cython_1594831565616/work
databricks-cli==0.14.1
dataclasses==0.8
decorator==4.4.2
defusedxml==0.6.0
dill==0.3.3
distro==1.5.0
dm-tree==0.1.5
docker==4.4.1
dotnetcore2==2.1.20
en-core-web-sm @ https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
encrypted-inference==0.9
entrypoints==0.3
enum34==1.1.10
fabric==2.6.0
fairlearn==0.4.6
fastai==1.0.61
fastprogress==1.0.0
fbprophet==0.5
filelock==3.0.12
fire==0.4.0
flake8==3.8.4
Flask==1.0.3
Flask-Cors==3.0.10
flatbuffers==1.12
fusepy==3.0.1
future==0.18.2
gast==0.2.2
gensim==3.8.3
gevent==21.1.2
gitdb==4.0.5
GitPython==3.1.13
google-api-core==1.26.0
google-auth==1.26.1
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
googleapis-common-protos==1.52.0
gpustat==0.6.0
greenlet==1.0.0
grpcio==1.35.0
gunicorn==19.9.0
gym==0.18.0
h5py==2.10.0
hiredis==1.1.0
holidays==0.9.11
horovod==0.19.1
humanfriendly==9.1
idna==2.10
idna-ssl==1.1.0
imageio==2.9.0
immutables==0.15
importlib-metadata==3.4.0
interpret-community==0.16.0
interpret-core==0.2.1
invoke==1.5.0
ipykernel==5.4.3
ipython==7.16.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
isodate==0.6.0
isort==5.7.0
itsdangerous==1.1.0
javaproperties==0.5.1
jedi==0.18.0
jeepney==0.6.0
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.14.1
jsmin==2.2.2
json-logging-py==0.2
json5==0.9.5
jsondiff==1.2.0
jsonpickle==1.5.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-server-proxy==1.6.0
jupyterlab==2.1.4
jupyterlab-nvdashboard==0.4.0
jupyterlab-server==1.2.0
jupyterlab-widgets==1.0.0
jupytext==1.6.0
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
keras-tuner==1.0.2
keras2onnx==1.6.0
kiwisolver==1.3.1
knack==0.8.0rc2
lazy-object-proxy==1.4.3
liac-arff==2.5.0
lightgbm==2.3.0
lunardate==0.2.0
lz4==3.1.3
Mako==1.1.4
Markdown==3.3.3
markdown-it-py==0.5.8
MarkupSafe==1.1.1
matplotlib==3.2.1
mccabe==0.6.1
mistune==0.8.4
mkl-fft==1.2.0
mkl-random==1.1.0
mkl-service==2.3.0
mlflow==1.13.1
mock==4.0.3
msal==1.8.0
msal-extensions==0.1.3
msgpack==1.0.2
msrest==0.6.21
msrestazure==0.6.4
multidict==5.1.0
murmurhash==1.0.5
nbconvert==5.6.1
nbformat==5.1.2
ndg-httpsclient==0.5.1
networkx==2.5
nimbusml==1.8.0
notebook==6.2.0
numexpr==2.7.2
numpy==1.18.5
nvidia-ml-py3==7.352.0
oauthlib==3.1.0
olefile==0.46
onnx==1.7.0
onnxconverter-common==1.6.0
onnxmltools==1.4.1
onnxruntime==1.3.0
opencensus==0.7.12
opencensus-context==0.1.2
opencv-python==4.5.1.48
opencv-python-headless==4.3.0.36
opt-einsum==3.3.0
packaging==20.9
pandas==0.25.3
pandas-ml==0.6.1
pandocfilters==1.4.3
papermill==1.2.1
paramiko==2.7.2
parso==0.8.1
pathlib2==2.3.5
pathspec==0.8.1
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.0
pkginfo==1.7.0
plac==0.9.6
pluggy==0.13.1
pmdarima==1.1.1
portalocker==1.7.1
preshed==2.0.1
prometheus-client==0.9.0
prometheus-flask-exporter==0.18.1
prompt-toolkit==3.0.14
protobuf==3.14.0
psutil==5.8.0
psycopg2==2.8.4
ptyprocess==0.7.0
py-cpuinfo==5.0.0
py-spy==0.3.4
py4j==0.10.9
pyarrow==1.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.0
pycodestyle==2.6.0
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pydocstyle==5.1.1
pyflakes==2.2.0
pyglet==1.5.0
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1610742651773/work
PyJWT==1.7.1
pylint==2.6.0
PyMeeus @ file:///home/conda/feedstock_root/build_artifacts/pymeeus_1589222711601/work
PyNaCl==1.4.0
pynvml==8.0.4
pyodbc===4.0.0-unsupported
pyOpenSSL==20.0.1
pyparsing==2.4.7
pyrsistent==0.17.3
pyspark==3.0.1
pystan==2.19.0.0
python-dateutil==2.8.1
python-editor==1.0.4
python-jsonrpc-server==0.4.0
python-language-server==0.35.0
pytorch-transformers==1.0.0
pytz==2021.1
PyWavelets==1.1.1
PyYAML==5.4.1
pyzmq==22.0.2
qtconsole==5.0.2
QtPy==1.9.0
QuantLib==1.21
querystring-parser==1.2.4
ray==1.2.0
redis==3.5.3
regex==2020.11.13
requests==2.25.1
requests-oauthlib==1.3.0
rope==0.18.0
rsa==4.7
ruamel.yaml==0.16.12
ruamel.yaml.clib==0.2.2
s3transfer==0.3.4
sacremoses==0.0.43
scikit-image==0.17.2
scikit-learn==0.22.2.post1
scipy==1.4.1
scp==0.13.3
scrapbook==0.5.0
SecretStorage==3.3.0
Send2Trash==1.5.0
sentencepiece==0.1.95
setuptools-git==1.2
shap==0.34.0
simpervisor==0.4
sip==4.19.24
six==1.15.0
skl2onnx==1.4.9
sklearn==0.0
sklearn-pandas==1.7.0
smart-open==1.9.0
smmap==3.0.5
snowballstemmer==2.1.0
soupsieve==2.2
spacy==2.1.8
SQLAlchemy==1.3.23
sqlparse==0.4.1
srsly==1.0.5
sshtunnel==0.1.5
statsmodels==0.10.2
style==1.1.0
tabulate==0.8.7
tenacity==6.3.1
tensorboard==2.1.1
tensorboard-plugin-wit==1.8.0
tensorboardX==2.1
tensorflow==2.1.0
tensorflow-estimator==2.1.0
tensorflow-gpu==2.1.0
termcolor==1.1.0
terminado==0.9.2
terminaltables==3.1.0
testpath==0.4.4
textwrap3==0.9.2
thinc==7.0.8
threadpoolctl @ file:///tmp/tmp9twdgx9k/threadpoolctl-2.1.0-py3-none-any.whl
tifffile==2020.9.3
tokenizers==0.10.1
toml==0.10.2
torch==1.6.0
torchvision==0.7.0
tornado==6.1
tqdm==4.56.0
traitlets==4.3.3
transformers==4.3.2
typed-ast==1.4.2
typing-extensions==3.7.4.3
ujson==4.0.2
update==0.0.1
urllib3==1.25.11
vsts==0.1.25
vsts-cd-manager==1.0.2
waitress==1.4.4
wasabi==0.8.2
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1600965781394/work
webencodings==0.5.1
websocket-client==0.57.0
websockets==8.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
xgboost==1.3.3
xmltodict==0.12.0
yapf==0.30.0
yarl==1.6.3
zipp==3.4.0
zope.event==4.5.0
zope.interface==5.2.0

Using OutputFileDatasetConfig with a ParallelRunStep

We're trying to use a ParallelRunStep for data preprocessing and were wondering if it's possible to use an OutputFileDatasetConfig to register a "dataset of datasets" (a dataset of metadata). Our entry script writes each processed file as parquet to a local directory on the compute cluster (./outputs/data), but the only thing that actually gets written to our blob store using the strategy below is the parallel_run_step.txt. We successfully use this strategy on a different PythonScriptStep but it does not seem to work the same way with a ParallelRunStep. We've also tried with PipelineData instead of OutputFileDatasetConfig.

output_dir = OutputFileDatasetConfig(name="etl_prepped", destination=(DEFAULT_DATASTORE, 'data/etl_prepped'), source='./outputs/data/').register_on_complete('preprocessed_files')

parallel_run_config = ParallelRunConfig(
    source_directory=parent_dir,
    entry_script='etl.py',
    mini_batch_size="1",
    run_invocation_timeout=timeout,
    error_threshold=10,
    output_action="append_row",
    environment=CURATED_ENVIRONMENT,
    process_count_per_node=processes_per_node,
    compute_target=COMPUTE_TARGET,
    node_count=node_count
)

parallel_run_step = ParallelRunStep(
    name="etl",
    parallel_run_config=parallel_run_config,
    inputs=[small_seeq_cache],
    output=output_dir,
    allow_reuse=False,
)

OjSalesSimulated. Problem opening dataset.

I was trying to follow the instructions for training many models in Azure Machine Learning:
https://github.com/microsoft/solution-accelerator-many-models

When preparing the data and pulling all of the data in the dataset, a memory error is raised when executing (in Python):

oj_sales_files = OjSalesSimulated.get_file_dataset()

It is a memory exception.

I have tried to execute the code on two machines: one with 14 GB and the other with 28 GB of RAM. The result is the same on both:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-10-443b3ed80ee7> in <module>
      3 
      4 # Pull all of the data
----> 5 oj_sales_files = OjSalesSimulated.get_file_dataset()
      6 

After further investigation, profiling the memory of the Compute Instance suggests that this is not actually a memory problem.

It looks like a parsing problem:

MemoryError Traceback (most recent call last)
in
3
4 # Pull all of the data
----> 5 oj_sales_files = OjSalesSimulated.get_file_dataset()
6
7 # Pull only the first dataset_maxfiles files

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/opendatasets/_oj_sales_simulated.py in get_file_dataset(cls, enable_telemetry)
33 open_datasets = cls._get_open_dataset(
34 enable_telemetry=enable_telemetry)
---> 35 return open_datasets.get_file_dataset()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/opendatasets/accessories/open_dataset_base.py in _instance_get_file_dataset(self, start_date, end_date, enable_telemetry, **kwargs)
289 if DefaultArgKey.ENDDATE.value in kwargs:
290 kwargs.pop(DefaultArgKey.ENDDATE.value)
--> 291 return self.__class__.get_file_dataset(start_date, end_date, enable_telemetry, **kwargs)
292
293 def _to_spark_dataframe(self):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/opendatasets/accessories/_loggerfactory.py in wrapper(*args, **kwargs)
138 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
139 try:
--> 140 return func(*args, **kwargs)
141 except Exception as e:
142 al.activity_info['error_message'] = str(e)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/opendatasets/accessories/open_dataset_base.py in get_file_dataset(cls, start_date, end_date, enable_telemetry, **kwargs)
266 if end_date:
267 kwargs[DefaultArgKey.ENDDATE.value] = end_date
--> 268 return cls._blob_accessor.get_file_dataset(**kwargs)
269
270 def _instance_get_file_dataset(

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/opendatasets/dataaccess/_blob_accessor.py in get_file_dataset(self, **kwargs)
151 properties["opendatasets"] = self.id
152 ds = FileDataset._create(self.get_file_dataflow(
--> 153 **kwargs), properties=properties)
154 ds._telemetry_info = _DatasetTelemetryInfo(
155 entry_point='PythonSDK:OpenDataset')

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/opendatasets/dataaccess/_blob_accessor.py in get_file_dataflow(self, **kwargs)
158 def get_file_dataflow(self, **kwargs) -> dprep.Dataflow:
159 self._check_dataprep()
--> 160 dflow = dprep.Dataflow.get_files(self.get_urls(**kwargs))
161 dflow = self._filter_file_dataflow(dflow, **kwargs)
162 # skip for now and wait for DataPrep Official release to studio

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/dataflow.py in get_files(path)
2396 """
2397 Expands the path specified by reading globs and files in folders and outputs one record per file found.
-> 2398
2399 :param path: The path or paths to expand.
2400 :return: A new Dataflow.

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/dataflow.py in _path_to_get_files_block(path, archive_options)
2469
2470 self._set_values_to_find(replace_dict, find)
-> 2471
2472 if replace_with is None:
2473 replace_dict['replaceWithType'] = FieldType.NULL

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/dataflow.py in _get_files(path, archive_options)
2490 error_replace_with = str(error_replace_with) if error_replace_with is not None else None
2491 return self.add_step('Microsoft.DPrep.ReplaceBlock', {
-> 2492 'columns': column_selection_to_selector_value(columns),
2493 'valueToFindType': replace_dict['valueToFindType'],
2494 'stringValueToFind': replace_dict['stringValueToFind'],

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/api.py in get_engine_api()
17 if not _engine_api:
18 _engine_api = EngineAPI()
---> 19
20 from .._dataset_resolver import register_dataset_resolver
21 register_dataset_resolver(_engine_api.requests_channel)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/api.py in __init__(self)
118 return typedefinitions.ExecuteInspectorCommonResponse.from_pod(response) if response is not None else None
119
--> 120 @update_aml_env_vars(get_engine_api)
121 def execute_inspectors(self, message_args: List[typedefinitions.ExecuteInspectorsMessageArguments], cancellation_token: CancellationToken = None) -> Dict[str, typedefinitions.ExecuteInspectorCommonResponse]:
122 response = self._message_channel.send_message('Engine.ExecuteInspectors', message_args, cancellation_token)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/api.py in connect_to_requests_channel()
105 @update_aml_env_vars(get_engine_api)
106 def execute_anonymous_activity(self, message_args: typedefinitions.ExecuteAnonymousActivityMessageArguments, cancellation_token: CancellationToken = None) -> None:
--> 107 response = self._message_channel.send_message('Engine.ExecuteActivity', message_args, cancellation_token)
108 return response
109

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/_aml_helper.py in wrapper(op_code, message, cancellation_token)
36 if len(changed) > 0:
37 engine_api_func().update_environment_variable(changed)
---> 38 return send_message_func(op_code, message, cancellation_token)
39
40 return wrapper

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/api.py in sync_host_secret(self, message_args, cancellation_token)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/engine.py in send_message(self, op_code, message, cancellation_token)
273 self._process = self._process_opener()
274 self._renew_response_thread()
--> 275 self._renew_wait_thread()
276 _LoggerFactory.trace(log, 'MultiThreadMessageChannel_create_engine', { 'engine_pid': self._process.pid } )
277 with self._messages_lock:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/engine.py in process_responses()
221 self._responses_thread = Thread(target=process_responses, daemon=True)
222 self._responses_thread.start()
--> 223
224 def on_relaunch(self, callback: Callable[[], None]):
225 self._relaunch_callback = callback

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/dataprep/api/engineapi/engine.py in _read_response(self, caller)
146 parsed = json.loads(string)
147 finally:
--> 148 if parsed is None: # Exception is being thrown
149 print('Line read from engine could not be parsed as JSON. Line:')
150 try:

MemoryError: Engine process terminated. This is most likely due to system running out of memory. Please retry with increased memory. |session_id=10efdc4c-45af-4701-ba6c-bbc5ee225681

Trying a workaround:

Following the documentation for the OjSalesSimulated dataset (https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/open-datasets/dataset-oj-sales-simulated.md), I tried the following:

oj_sales_files = OjSalesSimulated.get_file_dataset(num_files=10)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-47ae2fe8410e> in <module>
      3 
      4 # Pull all of the data
----> 5 oj_sales_files = OjSalesSimulated.get_file_dataset(num_files=10)
      6 
      7 # Pull only the first `dataset_maxfiles` files

TypeError: get_file_dataset() got an unexpected keyword argument 'num_files'

The method does not accept the num_files parameter.

If the library does not support the num_files parameter, what is the hardware recommendation for downloading the dataset?

Model naming convention is not surfaced in API

The Model registered by the many models accelerator (MMA) is named with a SHA256 hash of the strings of the partition names concatenated with '_'. The code to do this is duplicated in the inference and training scripts and is not surfaced to the client consuming the API. This means a consumer of the MMA has no official/guaranteed way to recreate the name. This is very important because the name or id of the model must be supplied in order to retrieve the model from the registry. (As far as I can see, the only other way to retrieve it is with Model.list and filter down to what is required). For use in production there must be a reliable way to generate the model name.
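The naming scheme described above can be sketched as follows. This is a minimal illustration, assuming the partition values are stringified, joined with '_' in the order given, UTF-8 encoded, and hashed with SHA-256. The function name `model_name_from_partitions` and the example partition values are hypothetical; the accelerator's own scripts may differ in ordering, encoding, or any prefix they apply.

```python
import hashlib


def model_name_from_partitions(partition_values):
    """Return the SHA-256 hex digest of the partition values joined with '_'.

    Assumption: values are converted with str() and joined in the order given,
    then UTF-8 encoded. Check the training script before relying on this.
    """
    joined = "_".join(str(value) for value in partition_values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical partition values, for illustration only.
name = model_name_from_partitions([1000, "dominicks"])
print(name)
```

If a single helper like this were shared by the training script, the inference script, and client code, the name would be defined in one place and a consumer could compute it before looking the model up in the registry.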

custom script - parallelrunstep not working

When I try to run the example Custom_Script/02_CustomScript_Training_Pipeline.ipynb, I cannot create a ParallelRunConfig:

parallel_run_config = ParallelRunConfig(
    source_directory='./scripts',
    entry_script='train.py',
    mini_batch_size="1",
    run_invocation_timeout=timeout,
    error_threshold=10,
    output_action="append_row",
    environment=train_env,
    process_count_per_node=processes_per_node,
    compute_target=compute,
    node_count=node_count)

It gives this error:

in /anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_config.py
...
TypeError: __init__() got an unexpected keyword argument 'allowed_failed_count'

I have updated to the latest SDK (pipeline packages):

azureml-pipeline-core==1.22.0
azureml-pipeline-steps==1.22.0

When I downgrade to 1.20.0, it works:

azureml-pipeline-core==1.20.0
azureml-pipeline-steps==1.20.0

So the fix is:

!pip install --upgrade azureml-pipeline-steps==1.20.0
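One possible cause of a TypeError like this is a mix of azureml-* package versions in the same environment; that is an assumption here, not a confirmed diagnosis. The sketch below groups the azureml-* entries of a `pip freeze` listing by pinned version so that a mismatch is easy to spot. The helper name `group_by_version` is hypothetical, and the sample lines are taken from the package listing earlier on this page.

```python
from collections import defaultdict


def group_by_version(freeze_lines, prefix="azureml-"):
    """Map each pinned version to the matching packages that use it.

    Only lines of the form `name==version` are considered; entries such as
    `name @ file:///...` carry no pinned version and are skipped.
    """
    versions = defaultdict(list)
    for line in freeze_lines:
        name, sep, version = line.strip().partition("==")
        if sep and name.lower().startswith(prefix):
            versions[version].append(name)
    return dict(versions)


sample = [
    "azureml-core==1.23.0",
    "azureml-pipeline-steps==1.23.0",
    "azureml-contrib-pipeline-steps==1.22.0",
    "numpy==1.18.5",
]
for version, names in sorted(group_by_version(sample).items()):
    print(version, names)
```

More than one version in the result does not prove an incompatibility, but it is a reasonable first thing to check before pinning individual packages.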

Issue with 03_AutoML_Forecasting_Pipeline

I've been trying to get the Many Model solution accelerator working with customer data. I cannot get the forecasting pipeline notebook to run without erroring out. Troubleshooting is turning out to be a problem because the error message returned from the run has no useful information in it.

JSON file from the run is attached
many_models_error.txt

I'm essentially stuck using this method. Please advise.

Issue in 02_CustomScript_Training_Pipeline

Hi, while running cell 22 in section 6.4 "Visualize Performance across models" of the notebook 02_CustomScript_Training_Pipeline.ipynb, we get the error shown below:
manymodels-01
manymodels-02
Could you please check into this?

Add a feature to store prediction results to a storage : mysql

My last client had a requirement to store predictions in a storage layer so that applications can access them through an API and reporting users can view them in Power BI. I tested this with MySQL, but because the data was huge we could not use it. It would still be good to add a feature for storing prediction results in different storage backends, starting with MySQL and then adding options such as Cosmos DB for large data.

Dynamic Train_Run_ID in inference pipeline

Hello,

I am using the solution accelerator (AutoML) to run a training and inference pipeline that I am hoping to publish.
Is there a way to pass the most recent training run ID as a parameter to the inference pipeline?
Right now, as far as I can tell, I have to register the pipeline with a static train_run_id value and can't add a pipeline parameter that I can change dynamically from inside Azure Synapse when the pipeline is called on a daily basis.

Fully automate training and forecasting pipeline

This refers to 03_AutoML_Forecasting_Pipeline.ipynb.
We plan to automate the modeling and forecasting pipeline,
but we currently need to enter training_pipeline_run_id manually, as in the script below.
Could code be added to get the training_pipeline_run_id of the latest run?


from scripts.helper import get_automl_environment

training_pipeline_run_id = "<training_pipeline_run_id_goes_here>"
training_experiment_name = "<training_experiment_name_goes_here>"
forecast_env = get_automl_environment(workspace=ws,
                                      training_pipeline_run_id=training_pipeline_run_id,
                                      training_experiment_name=training_experiment_name)
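One way the latest run ID might be looked up instead of typed in is sketched below. It is not part of the accelerator: the helper name `latest_completed_run_id` is hypothetical, and it assumes the runs are supplied newest first and expose `.id` and `.get_status()`, as azureml-core `Run` objects do. The commented lines show how it might be wired to an experiment.

```python
def latest_completed_run_id(runs):
    """Return the id of the first run whose status is 'Completed', else None.

    `runs` is any iterable of objects exposing `.id` and `.get_status()`,
    assumed to be ordered newest first.
    """
    for run in runs:
        if run.get_status() == "Completed":
            return run.id
    return None


# With the azureml-core SDK this might be called as follows (not executed here;
# `ws` and `training_experiment_name` come from the notebook):
#   from azureml.core import Experiment
#   experiment = Experiment(ws, training_experiment_name)
#   training_pipeline_run_id = latest_completed_run_id(experiment.get_runs())
```

Skipping runs that are not completed avoids picking up a training run that failed or is still in progress; whether that is the right rule depends on how the pipelines are scheduled.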
