danielsc / azureml-workshop-2019
AzureML Workshop for the 2019 Euro Tour
License: MIT License
When running local training with AutoML, it outputs DATA GUARDRAILS information by default.
However, remote training with AutoML doesn't show any DATA GUARDRAILS info by default.
Default behavior and output should be consistent between local and remote AutoML training.
Local AutoML Training (It shows DATA GUARDRAILS):
Remote AutoML Training: (It does NOT show DATA GUARDRAILS)
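As a workaround until the outputs are consistent, the guardrails can be fetched programmatically; the local log below mentions run.get_guardrails(), so a minimal sketch (assuming the remote AutoML parent run is available as remote_run) could be:

# Workaround sketch, assuming the remote AutoML parent run is `remote_run`;
# the get_guardrails() API name comes from the local log output shown below.
remote_run.wait_for_completion(show_output=False)
guardrails = remote_run.get_guardrails()
print(guardrails)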
Scoring with the exported ONNX model is very slow compared to the original Scikit-Learn model:
Making 294 predictions with the exported ONNX model takes about 1.3 seconds, while the original Scikit-Learn model needs only about 0.5 seconds. So, the exported ONNX model is roughly 2.6x slower.
294 predictions from Test dataset:
Time for predictions with Scikit-Learn model: --- 0.48 seconds ---
Time for predictions with ONNX model: --- 1.27 seconds ---
Confirmed by Yunsong Bai that "Currently in the onnx inference helper it uses per record to feed data in the onnxruntime, we used this mode since there was errors found in previous ort version when feeding data in batch."
The ONNX inference helper needs to be fixed to feed data in batch mode by default.
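To illustrate the difference, here is a minimal timing sketch comparing per-record feeding against a single batched call in onnxruntime. The model.onnx path and the feature matrix X_test are assumptions for illustration, not from the original repro:

import time
import numpy as np
import onnxruntime as rt

# Assumed inputs: an exported model saved as model.onnx and a NumPy
# feature matrix X_test with 294 rows (both hypothetical here).
sess = rt.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name
X_test = X_test.astype(np.float32)

# Per-record feeding (what the current inference helper reportedly does).
start = time.time()
preds_per_record = [sess.run(None, {input_name: row.reshape(1, -1)})[0] for row in X_test]
print(f"Per-record: {time.time() - start:.2f} s")

# Single batched call (the proposed default).
start = time.time()
preds_batch = sess.run(None, {input_name: X_test})[0]
print(f"Batch: {time.time() - start:.2f} s")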
Interpretability widget failing with "NameError: name 'model' is not defined"
However, the model is created and available:
Repro Notebooks:
Local training:
https://github.com/danielsc/azureml-workshop-2019/blob/master/2-training-and-interpretability/2.2-aml-interpretability/1-simple-feature-transformations-explain-local.ipynb
Remote Training:
https://github.com/danielsc/azureml-workshop-2019/blob/master/2-training-and-interpretability/2.2-aml-interpretability/2-explain-model-on-amlcompute.ipynb
azureml.contrib.interpret.explanation
The tutorial says to set the training time to 15 minutes, but the minimum time allowed in the UI is 1 hour.
Step 7 here: https://github.com/danielsc/azureml-workshop-2019/blob/master/1-new-workspace/3-automl.md
To run Azure CLI commands for automated pipelines, we need a terminal interface, like the terminal UI in Visual Studio.
In the AML notebooks UI, when creating a folder or file, it should be created inside the folder that was selected before clicking "New Folder".
Instead, in the dialog window, the user needs to select the parent folder again.
This causes poor usability: in most cases you create the child folder or file in a folder you didn't want, and then need to re-create it.
There's just a toggle to enable auth. The UI should call out what happens when it's enabled.
Currently, users would not know that they should go to Endpoints, find the endpoint, see that the auth is key-based, and find the key.
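For example, the UI could surface a snippet like this one; a sketch, where the service name "my-service" is hypothetical:

from azureml.core.webservice import Webservice

# What users currently have to discover on their own: fetch the deployed
# service and its auth keys (service name is hypothetical).
service = Webservice(workspace=ws, name="my-service")
primary_key, secondary_key = service.get_keys()
print(primary_key)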
Fix in the .MD and notebook:
https://github.com/danielsc/azureml-workshop-2019/blob/master/4-mlops/mlopsworkshop.md
This might be because the training datasets are quite small: remote training needs to deploy Docker containers for the runs, whereas local training is straightforward and just trains on a ready-to-go machine/VM.
If the datasets were large, the time needed for Docker containers would be small compared to the training times.
But this is a papercut for folks experimenting with small, downsampled datasets, where end-to-end training on remote compute takes too long due to the infrastructure time needed (containers?):
Local Training: Total Time: 5.7 minutes
versus
Remote Training: Total Time: 67 minutes
Basically, each child run takes around 5 seconds in local training versus around 1.5 minutes in remote training.
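For reference, a minimal sketch of how the "Manual run timing" figures below can be captured, assuming experiment and automl_config are already defined:

import time

# Wrap the whole submit + wait cycle to capture end-to-end wall time,
# including any infrastructure preparation on remote compute.
start = time.time()
run = experiment.submit(automl_config, show_output=True)
run.wait_for_completion()
print(f"--- {time.time() - start} seconds needed for running the whole AutoML Experiment ---")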
Local Training: Total Time: 5.7 minutes
01-13-2020-05
classif-automl-local-01-13-2020-05
Running on local machine
Parent Run ID: AutoML_a8a0a27e-6228-481b-bde0-406ec5a6ded0
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
****************************************************************************************************
DATA GUARDRAILS SUMMARY:
For more details, use API: run.get_guardrails()
TYPE: Class balancing detection
STATUS: PASSED
DESCRIPTION: Classes are balanced in the training data.
TYPE: Missing values imputation
STATUS: PASSED
DESCRIPTION: There were no missing values found in the training data.
TYPE: High cardinality feature detection
STATUS: PASSED
DESCRIPTION: Your inputs were analyzed, and no high cardinality features were detected.
****************************************************************************************************
Current status: ModelSelection. Beginning model selection.
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
ITERATION PIPELINE DURATION METRIC BEST
0 MaxAbsScaler SGD 0:00:04 0.8716 0.8716
1 MaxAbsScaler SGD 0:00:05 0.7696 0.8716
2 MaxAbsScaler ExtremeRandomTrees 0:00:05 0.7220 0.8716
3 MaxAbsScaler SGD 0:00:05 0.8801 0.8801
4 MaxAbsScaler RandomForest 0:00:05 0.8154 0.8801
5 MaxAbsScaler SGD 0:00:05 0.8682 0.8801
6 MaxAbsScaler RandomForest 0:00:05 0.7483 0.8801
7 StandardScalerWrapper RandomForest 0:00:05 0.7228 0.8801
8 MaxAbsScaler RandomForest 0:00:06 0.7415 0.8801
9 MaxAbsScaler ExtremeRandomTrees 0:00:05 0.8478 0.8801
10 MaxAbsScaler BernoulliNaiveBayes 0:00:05 0.7823 0.8801
11 StandardScalerWrapper BernoulliNaiveBayes 0:00:05 0.7347 0.8801
12 MaxAbsScaler BernoulliNaiveBayes 0:00:05 0.7704 0.8801
13 MaxAbsScaler RandomForest 0:00:05 0.7152 0.8801
14 MaxAbsScaler RandomForest 0:00:05 0.6591 0.8801
15 MaxAbsScaler SGD 0:00:05 0.8733 0.8801
16 MaxAbsScaler ExtremeRandomTrees 0:00:05 0.8503 0.8801
17 MaxAbsScaler RandomForest 0:00:05 0.7100 0.8801
18 StandardScalerWrapper ExtremeRandomTrees 0:00:05 0.7100 0.8801
19 StandardScalerWrapper ExtremeRandomTrees 0:00:07 0.8478 0.8801
20 MaxAbsScaler SGD 0:00:06 0.8478 0.8801
21 StandardScalerWrapper LightGBM 0:00:07 0.8656 0.8801
22 MaxAbsScaler ExtremeRandomTrees 0:00:06 0.8478 0.8801
23 MaxAbsScaler LightGBM 0:00:07 0.8741 0.8801
24 StandardScalerWrapper LightGBM 0:00:05 0.8665 0.8801
25 StandardScalerWrapper SGD 0:00:06 0.8478 0.8801
26 StandardScalerWrapper LightGBM 0:00:07 0.8690 0.8801
27 MaxAbsScaler LightGBM 0:00:06 0.8554 0.8801
28 MaxAbsScaler LightGBM 0:00:06 0.8478 0.8801
29 SparseNormalizer ExtremeRandomTrees 0:00:06 0.8478 0.8801
30 VotingEnsemble 0:00:15 0.8860 0.8860
31 StackEnsemble 0:00:14 0.8809 0.8860
Stopping criteria reached at iteration 31. Ending experiment.
Manual run timing: --- 341.8196201324463 seconds needed for running the whole LOCAL AutoML Experiment ---
Remote Training: Total Time: 67 minutes
01-13-2020-05
classif-automl-remote-01-13-2020-05
Running on remote compute: cesardl-cpu-clus
Parent Run ID: AutoML_c833d1c3-ce81-43cc-bdaf-a24858744afd
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: ModelSelection. Beginning model selection.
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
ITERATION PIPELINE DURATION METRIC BEST
0 MaxAbsScaler SGD 0:01:44 0.8232 0.8232
1 MaxAbsScaler SGD 0:01:38 0.7830 0.8232
2 MaxAbsScaler ExtremeRandomTrees 0:01:36 0.7329 0.8232
3 MaxAbsScaler SGD 0:01:37 0.8635 0.8635
4 MaxAbsScaler RandomForest 0:01:42 0.7990 0.8635
5 MaxAbsScaler SGD 0:01:45 0.8581 0.8635
6 MaxAbsScaler RandomForest 0:01:41 0.7444 0.8635
7 StandardScalerWrapper RandomForest 0:01:43 0.7201 0.8635
8 MaxAbsScaler RandomForest 0:01:44 0.7481 0.8635
9 MaxAbsScaler ExtremeRandomTrees 0:01:43 0.8377 0.8635
10 MaxAbsScaler BernoulliNaiveBayes 0:01:43 0.7610 0.8635
11 StandardScalerWrapper BernoulliNaiveBayes 0:01:37 0.7003 0.8635
12 MaxAbsScaler BernoulliNaiveBayes 0:01:37 0.7466 0.8635
13 MaxAbsScaler RandomForest 0:01:45 0.6927 0.8635
14 MaxAbsScaler RandomForest 0:01:39 0.6981 0.8635
15 MaxAbsScaler SGD 0:01:38 0.8612 0.8635
16 MaxAbsScaler ExtremeRandomTrees 0:01:47 0.8445 0.8635
17 MaxAbsScaler RandomForest 0:01:44 0.7307 0.8635
18 StandardScalerWrapper ExtremeRandomTrees 0:01:46 0.7186 0.8635
19 MaxAbsScaler LightGBM 0:01:48 0.8665 0.8665
20 StandardScalerWrapper LightGBM 0:01:40 0.8377 0.8665
21 StandardScalerWrapper ExtremeRandomTrees 0:01:46 0.8377 0.8665
22 MaxAbsScaler LightGBM 0:01:35 0.8612 0.8665
23 MaxAbsScaler LightGBM 0:01:40 0.8673 0.8673
24 TruncatedSVDWrapper LinearSVM 0:01:44 0.8377 0.8673
25 StandardScalerWrapper LightGBM 0:01:44 0.8377 0.8673
26 StandardScalerWrapper LightGBM 0:01:44 0.8635 0.8673
27 StandardScalerWrapper LightGBM 0:01:38 0.8559 0.8673
28 SparseNormalizer LightGBM 0:01:38 0.8543 0.8673
29 MaxAbsScaler LightGBM 0:01:34 0.8377 0.8673
30 StandardScalerWrapper LightGBM 0:01:43 0.8377 0.8673
31 StandardScalerWrapper LightGBM 0:01:42 0.8528 0.8673
32 StandardScalerWrapper LightGBM 0:01:41 0.8650 0.8673
33 StandardScalerWrapper LightGBM 0:01:44 0.8543 0.8673
34 VotingEnsemble 0:02:06 0.8764 0.8764
35 StackEnsemble 0:01:52 0.8703 0.8764
Manual run timing: --- 4020.8364148139954 seconds needed for running the whole Remote AutoML Experiment ---
Starting a container in ACI should only take a matter of seconds. If it takes this long, it is probably re-creating Docker images; ACI should have the Docker images cached so it doesn't need to pull the image on every deployment.
It would be important to cache those Docker images per model so the Docker image doesn't need to be pulled into ACI for every deployment, making deployment to ACI a lot faster.
Example of timing of deployment to ACI:
Model deployment to ACI: --- 200.5308485031128 seconds needed to deploy to ACI ---
--> That is around 3.3 minutes to deploy a single Docker container.
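A sketch of how this deployment time was measured, assuming model, inference_config, and an ACI deployment_config are already defined; the service name is hypothetical:

import time
from azureml.core.model import Model

# Time the full ACI deployment, which today includes building/pulling the image.
start = time.time()
service = Model.deploy(ws, "attrition-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(f"--- {time.time() - start} seconds needed to deploy to ACI ---")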
When I am trying to "Deploy best model" after running AutoML, I click on "AutoML," and then I have to click on the user-unfriendly Run ID hyperlink in order to get a UI that lets me "Deploy best model." If I click on "Experiment name" (more natural to me as a user), I get directed to the Experiment UI, rather than the AutoML UI.
The UI lists Notebooks, AutoML, Designer in that order. This is not alphabetical nor is it easy->advanced.
There's no UI management for Environments. Since Environments are infrastructure assets that can be re-used (in many ways comparable to Model management or Compute management), there should be a UI feature to manage Environments, such as importing from Conda, cloning from curated environments, listing the Environments in your Workspace, etc.
There doesn't seem to be a way to request an ONNX model to be generated
Afaik, there's no direct way to copy a curated environment into a custom environment in the workspace.
There should be an easy way; instead, you need to save Conda environments to files, then load them from files, etc. That's not intuitive or easy for users.
These are the currently needed steps, which you should be able to do in a single line instead:
# Save curated environment definition to a folder (two files: conda_dependencies.yml and azureml_environment.json)
curated_environment.save_to_directory(path="./curated_environment_definition", overwrite=True)

# Create a custom Environment from the Conda specification file
custom_environment = Environment.from_conda_specification(name="custom-workshop-environment", file_path="./curated_environment_definition/conda_dependencies.yml")

# Save the custom environment definition to a folder and register it
custom_environment.save_to_directory(path="./custom_environment_definition", overwrite=True)
custom_environment.register(ws)
Something like the following should be possible to do in a single line:
curated_environment.clone("my-custom-environment")
Every time you create a new compute instance and then call:
ws = Workspace.from_config()
you need to authenticate through the browser by copy/pasting a string:
I would expect to not have to do this given I'm using a compute instance in the workspace.
Also I get the annoying warning pictured above every time I have to do this.
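A possible workaround sketch: reuse the compute instance's existing Azure CLI login instead of the interactive device-code prompt (this assumes az login has already been run on the instance):

from azureml.core import Workspace
from azureml.core.authentication import AzureCliAuthentication

# Reuse the CLI session so from_config() doesn't prompt for a device code.
cli_auth = AzureCliAuthentication()
ws = Workspace.from_config(auth=cli_auth)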
AutoML should start with the task/scenario. As a new/novice user, I'd like to know what is possible before I spend time configuring data and compute. The current experience is not user friendly and potentially wastes customers' time.
Confirmed with Dom that the only way to do this is to unregister and create a new one
When running an AutoML job for the first time, it takes time to prepare the image and compute, but the UX only shows "Preparing". It takes at least 10 minutes to get started. Users will get confused and won't know whether they should cancel the job.
[AutoML SDK] Can an experiment be submitted async from the notebook?
When using the following, the call is synchronous, so you cannot see the widget info until the whole AutoML process is completed:
run = experiment.submit(automl_config, show_output=True)
The following cell cannot be run until AutoML is done:
Is there any way to submit the AutoML process asynchronously from the notebook?
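For what it's worth, a sketch of an asynchronous pattern that should work: submitting with show_output=False returns right after submission, so the widget can render while the run is still executing.

from azureml.widgets import RunDetails

# Returns right after submission instead of streaming until completion.
run = experiment.submit(automl_config, show_output=False)
RunDetails(run).show()

# Block explicitly later, only when the result is actually needed:
# run.wait_for_completion(show_output=False)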
Afaik, using the plain Estimator is almost the same as using the SKLearn estimator, with no additional benefit or reason why you should use one or the other. If that's confirmed, why do we have two approaches for doing the same thing? One of the two APIs should be deprecated:
# Using Estimator class
estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=cluster,
                      entry_script='train.py',
                      pip_packages=pip_packages,
                      conda_packages=['scikit-learn==0.20.3'],
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])

# Using SKLearn estimator class
estimator = SKLearn(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=cluster,
                    entry_script='train.py',
                    pip_packages=pip_packages,
                    conda_packages=['scikit-learn==0.20.3'],
                    inputs=[aml_dataset.as_named_input('attrition')])
Hi,
The AML "Notebook VM" has been replaced by "Compute Instances".
The instructions need to be modified:
https://github.com/danielsc/azureml-workshop-2019/blob/master/1-workspace-concepts/1-setup-compute.md
The UI says "If your data contains text, enabling deep learning could provide higher accuracy":
This is not true for text that is used as categorical values.
Per the current workshop sequence, AutoML is in section 1-3. After running AutoML, the training cluster created earlier in the workshop will be occupied for hours. No other UX tutorial can run until either the AutoML run finishes (which takes hours) or a new training cluster is created, which is not in the workshop instructions.
When trying to get an ONNX model from AutoML, you need to set configurations in 3 places.
Ideally this should be controlled in just 1 place, perhaps when getting the model (step 2). Step 1 should go away once we have 100% ONNX support for AutoML models, so for the short term it's OK.
It's unclear why step 3 needs a separate ONNXConverter. Can this step be merged with step 2? The mechanism/convention for saving an ONNX model should be the same as for saving a non-ONNX model.
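For context, a sketch of the 3 places as they stand today; names like experiment, train_dataset, and cluster are assumptions:

from azureml.train.automl import AutoMLConfig
from azureml.automl.runtime.onnx_convert import OnnxConverter

# (1) Request ONNX-compatible models at configuration time.
automl_config = AutoMLConfig(task='classification',
                             enable_onnx_compatible_models=True,
                             training_data=train_dataset,
                             label_column_name='Attrition',
                             compute_target=cluster)
run = experiment.submit(automl_config, show_output=True)

# (2) Ask for the ONNX model when retrieving the best run.
best_run, onnx_model = run.get_output(return_onnx_model=True)

# (3) Save it through a separate converter helper.
OnnxConverter.save_onnx_model(onnx_model, './best_model.onnx')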
Need to align to a unified naming convention - upper/lower case, special characters, length.
Need to save the new pipeline before creating the release. Update documentation.
Users would not expect that they have to go into the Compute tab to launch these.
This is confusing for users. There are two ways of submitting experiments to AML compute:
Both can then be used with the HyperDriveConfig class, with no clear reason why you should use one approach or the other.
Also, nowhere in the docs could I find when one or the other should be used, or which one is better under what circumstances.
If both are very similar, we should probably deprecate one of them so the usage path is simplified for the user. Having multiple ways of doing something without clear reasons is very confusing for users.
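To make the overlap concrete, a sketch of the two paths; param_sampling, cluster, and env are assumed to be defined elsewhere:

from azureml.core import ScriptRunConfig
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

# Way 1: wrap an Estimator.
estimator = Estimator(source_directory='.', entry_script='train.py', compute_target=cluster)
hd_config = HyperDriveConfig(estimator=estimator,
                             hyperparameter_sampling=param_sampling,
                             primary_metric_name='accuracy',
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=20)

# Way 2: wrap a ScriptRunConfig with an explicit environment.
src = ScriptRunConfig(source_directory='.', script='train.py', compute_target=cluster, environment=env)
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             primary_metric_name='accuracy',
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=20)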
The description tells users to set min_node to 0, but the screenshot shows 1. This needs to be updated.
When selecting "Include child runs" in the "Experiments" section, I see a flat list of all runs, with no sense of the hierarchy in runs. We need to indicate to the user which runs are children of which other runs, either through a nested tree structure or some naming convention that appends the name of the parent run to the name of the child run.
We have documentation on experiments and on runs, but no guidance in our "Concepts" section on how to decide whether something the user is doing is an experiment or a run.
Bug in azureml.widgets RunDetails UI showing non-existing metric visualization errors.
Running a HyperDrive job on remote AML compute with multiple child runs.
The training process runs okay with no issues, and after the parent run I'm able to get the primary metric of the best model with no issues.
But the widget UI is showing errors in red like the following:
We need to modify the instructions for ML Ops:
mlWorkspaceConnection: 'build-demo' ,
mlWorkspaceName: 'build-2019-demo',
resourceGroupName: 'scottgu-all-hands'
Once the model is deployed into AML compute, the UI doesn't help much with how to consume it and try the service. It simply provides a URI, but a user who just deployed it won't be able to try it unless they invest time researching docs from scratch.
This is a clear paper cut because it blocks the path to a fast getting-started experience.
This is the only info provided so far for consumption:
That "Consume" page section should provide sample code in the following areas, showing client code for Python apps, .NET apps, Java apps, Node.js apps, etc.:
import json
import pandas as pd
# the sample below contains the data for an employee that is not an attrition risk
sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 'No', 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])
# converts the sample to JSON string
sample = pd.DataFrame.to_json(sample)
# deserializes sample to a python object
sample = json.loads(sample)
# serializes sample to JSON formatted string as expected by the scoring script
sample = json.dumps({"data":sample})
prediction = service.run(sample)
print(prediction)
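In addition to the SDK-based service.run() call above, the Consume page could show a plain REST example; a sketch, assuming key auth is enabled and sample is the JSON payload built above:

import requests

# Call the scoring endpoint directly over REST with key-based auth.
scoring_uri = service.scoring_uri
key, _ = service.get_keys()
headers = {'Content-Type': 'application/json', 'Authorization': f'Bearer {key}'}
response = requests.post(scoring_uri, data=sample, headers=headers)
print(response.json())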
In the set-up: https://github.com/danielsc/azureml-workshop-2019/blob/master/1-workspace-concepts/1-setup-compute.md
there is no mention of how to enter the new interface. This needs to be added to follow the rest.
Afaik, the Estimator class always needs to use a custom Environment, probably because it always creates a Docker image under the covers?
However, for simplicity's sake, it should be able to use a curated Environment (as you actually can with the ScriptRunConfig class).
If you try to use a curated environment, you get this error:
Error: "Environment name can not start with the prefix AzureML. To alter a curated environment first create a copy of it."
Therefore, you need to copy/clone a curated environment first, which is also not straightforward and needs the following code:
# Get the curated environment
curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

# Save curated environment definition to a folder (two files: conda_dependencies.yml and azureml_environment.json)
curated_environment.save_to_directory(path="./curated_environment_definition", overwrite=True)

# Create a custom Environment from the Conda specification file
custom_environment = Environment.from_conda_specification(name="custom-workshop-environment", file_path="./curated_environment_definition/conda_dependencies.yml")

# Save the custom environment definition to a folder and register it
custom_environment.save_to_directory(path="./custom_environment_definition", overwrite=True)
custom_environment.register(ws)

estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=cluster,
                      # use_docker=True,  # AML Cluster only supports Docker runs
                      entry_script='train.py',
                      environment_definition=custom_environment,
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])
If the Estimator could use a curated environment as you can with the ScriptRunConfig class, you would simply need the following code:

curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=cluster,
                      # use_docker=True,  # AML Cluster only supports Docker runs
                      entry_script='train.py',
                      environment_definition=curated_environment,
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])
Why is drop_column_names in the AutoMLConfig class only available for forecasting in AutoML, but not for regression and classification?
If you try to use it for another ML task, like classification or regression, you get an error.
This looks inconsistent, especially since that action (dropping columns) could also be useful for other ML tasks, not just forecasting.
Also, the docs/reference don't say it is exclusive to forecasting; you only realize it when getting an error:
If something is for exclusive use in a single scenario or ML task, the docs should say so.
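In the meantime, a workaround sketch: drop the columns from the TabularDataset itself before handing it to AutoMLConfig. The dataset, column, and cluster names here are hypothetical:

from azureml.train.automl import AutoMLConfig

# Drop unwanted columns at the dataset level instead of via drop_column_names.
train_dataset = train_dataset.drop_columns(columns=['EmployeeNumber'])

automl_config = AutoMLConfig(task='classification',
                             training_data=train_dataset,
                             label_column_name='Attrition',
                             compute_target=cluster)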