danielsc / azureml-workshop-2019
AzureML Workshop for the 2019 Euro Tour
License: MIT License
When running local training with AutoML, it outputs DATA GUARDRAILS information by default.
However, remote training with AutoML doesn't show any DATA GUARDRAILS info by default.
Default behavior and output should be consistent between local and remote AutoML training.
Local AutoML Training (It shows DATA GUARDRAILS):
Remote AutoML Training: (It does NOT show DATA GUARDRAILS)
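As a workaround until the outputs are consistent, the guardrails can be fetched programmatically; the local log below mentions run.get_guardrails(), so a minimal sketch (assuming the remote AutoML parent run is available as remote_run) could be:

# Workaround sketch, assuming the remote AutoML parent run is `remote_run`;
# the get_guardrails() API name comes from the local log output shown below.
remote_run.wait_for_completion(show_output=False)
guardrails = remote_run.get_guardrails()
print(guardrails)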
Scoring with the exported ONNX model is very slow compared to the original Scikit-Learn model:
Making 294 predictions with the exported ONNX model takes about 1.3 seconds, while the original Scikit-Learn model needs only about 0.5 seconds. So, the exported ONNX model is roughly 2.6x slower.
294 predictions from Test dataset:
Time for predictions with Scikit-Learn model: --- 0.48 seconds ---
Time for predictions with ONNX model: --- 1.27 seconds ---
Confirmed by Yunsong Bai that "Currently in the onnx inference helper it uses per record to feed data in the onnxruntime, we used this mode since there was errors found in previous ort version when feeding data in batch."
The ONNX inference helper needs to be fixed to feed data in batch mode by default.
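To illustrate the difference, here is a minimal timing sketch comparing per-record feeding against a single batched call in onnxruntime. The model.onnx path and the feature matrix X_test are assumptions for illustration, not from the original repro:

import time
import numpy as np
import onnxruntime as rt

# Assumed inputs: an exported model saved as model.onnx and a NumPy
# feature matrix X_test with 294 rows (both hypothetical here).
sess = rt.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name
X_test = X_test.astype(np.float32)

# Per-record feeding (what the current inference helper reportedly does).
start = time.time()
preds_per_record = [sess.run(None, {input_name: row.reshape(1, -1)})[0] for row in X_test]
print(f"Per-record: {time.time() - start:.2f} s")

# Single batched call (the proposed default).
start = time.time()
preds_batch = sess.run(None, {input_name: X_test})[0]
print(f"Batch: {time.time() - start:.2f} s")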
Interpretability widget failing with "NameError: name 'model' is not defined"
However, the model is created and available:
Repro Notebooks:
Local training:
https://github.com/danielsc/azureml-workshop-2019/blob/master/2-training-and-interpretability/2.2-aml-interpretability/1-simple-feature-transformations-explain-local.ipynb
Remote Training:
https://github.com/danielsc/azureml-workshop-2019/blob/master/2-training-and-interpretability/2.2-aml-interpretability/2-explain-model-on-amlcompute.ipynb
azureml.contrib.interpret.explanation
The tutorial says to set the training time to 15 minutes, but the minimum time allowed in the UI is 1 hour.
Step 7 here: https://github.com/danielsc/azureml-workshop-2019/blob/master/1-new-workspace/3-automl.md
To run Azure CLI commands for automated pipelines, we need a terminal interface, like the terminal UI in Visual Studio.
In the AML notebooks UI, when creating a folder or file, it should be created inside the folder that was selected before clicking "New Folder".
Instead, in the dialog window, the user needs to select the parent folder again.
This causes poor usability: in most cases you create the child folder or file in a folder you didn't want, and then need to re-create it.
There's just a toggle to enable auth. The UI should call out what happens when it's enabled.
Currently, users would not know that they should go to Endpoints, find the endpoint, see that the auth is key-based, and find the key.
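For example, the UI could surface a snippet like this one; a sketch, where the service name "my-service" is hypothetical:

from azureml.core.webservice import Webservice

# What users currently have to discover on their own: fetch the deployed
# service and its auth keys (service name is hypothetical).
service = Webservice(workspace=ws, name="my-service")
primary_key, secondary_key = service.get_keys()
print(primary_key)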
Fix in the .MD and notebook:
https://github.com/danielsc/azureml-workshop-2019/blob/master/4-mlops/mlopsworkshop.md
This might be because the training datasets are quite small: remote training needs to deploy Docker containers for the runs, whereas local training is straightforward and just trains on a ready-to-go machine/VM.
If the datasets were large, the time needed for Docker containers would be small compared to the training times.
But this is a papercut for folks experimenting with small, downsampled datasets, where end-to-end training on remote compute takes too long due to the infrastructure time needed (containers?):
Local Training: Total Time: 5.7 minutes
versus
Remote Training: Total Time: 67 minutes
Basically, each child run takes around 5 seconds in local training versus around 1.5 minutes in remote training.
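For reference, a minimal sketch of how the "Manual run timing" figures below can be captured, assuming experiment and automl_config are already defined:

import time

# Wrap the whole submit + wait cycle to capture end-to-end wall time,
# including any infrastructure preparation on remote compute.
start = time.time()
run = experiment.submit(automl_config, show_output=True)
run.wait_for_completion()
print(f"--- {time.time() - start} seconds needed for running the whole AutoML Experiment ---")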
Local Training: Total Time: 5.7 minutes
01-13-2020-05
classif-automl-local-01-13-2020-05
Running on local machine
Parent Run ID: AutoML_a8a0a27e-6228-481b-bde0-406ec5a6ded0
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
****************************************************************************************************
DATA GUARDRAILS SUMMARY:
For more details, use API: run.get_guardrails()
TYPE: Class balancing detection
STATUS: PASSED
DESCRIPTION: Classes are balanced in the training data.
TYPE: Missing values imputation
STATUS: PASSED
DESCRIPTION: There were no missing values found in the training data.
TYPE: High cardinality feature detection
STATUS: PASSED
DESCRIPTION: Your inputs were analyzed, and no high cardinality features were detected.
****************************************************************************************************
Current status: ModelSelection. Beginning model selection.
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
ITERATION PIPELINE DURATION METRIC BEST
0 MaxAbsScaler SGD 0:00:04 0.8716 0.8716
1 MaxAbsScaler SGD 0:00:05 0.7696 0.8716
2 MaxAbsScaler ExtremeRandomTrees 0:00:05 0.7220 0.8716
3 MaxAbsScaler SGD 0:00:05 0.8801 0.8801
4 MaxAbsScaler RandomForest 0:00:05 0.8154 0.8801
5 MaxAbsScaler SGD 0:00:05 0.8682 0.8801
6 MaxAbsScaler RandomForest 0:00:05 0.7483 0.8801
7 StandardScalerWrapper RandomForest 0:00:05 0.7228 0.8801
8 MaxAbsScaler RandomForest 0:00:06 0.7415 0.8801
9 MaxAbsScaler ExtremeRandomTrees 0:00:05 0.8478 0.8801
10 MaxAbsScaler BernoulliNaiveBayes 0:00:05 0.7823 0.8801
11 StandardScalerWrapper BernoulliNaiveBayes 0:00:05 0.7347 0.8801
12 MaxAbsScaler BernoulliNaiveBayes 0:00:05 0.7704 0.8801
13 MaxAbsScaler RandomForest 0:00:05 0.7152 0.8801
14 MaxAbsScaler RandomForest 0:00:05 0.6591 0.8801
15 MaxAbsScaler SGD 0:00:05 0.8733 0.8801
16 MaxAbsScaler ExtremeRandomTrees 0:00:05 0.8503 0.8801
17 MaxAbsScaler RandomForest 0:00:05 0.7100 0.8801
18 StandardScalerWrapper ExtremeRandomTrees 0:00:05 0.7100 0.8801
19 StandardScalerWrapper ExtremeRandomTrees 0:00:07 0.8478 0.8801
20 MaxAbsScaler SGD 0:00:06 0.8478 0.8801
21 StandardScalerWrapper LightGBM 0:00:07 0.8656 0.8801
22 MaxAbsScaler ExtremeRandomTrees 0:00:06 0.8478 0.8801
23 MaxAbsScaler LightGBM 0:00:07 0.8741 0.8801
24 StandardScalerWrapper LightGBM 0:00:05 0.8665 0.8801
25 StandardScalerWrapper SGD 0:00:06 0.8478 0.8801
26 StandardScalerWrapper LightGBM 0:00:07 0.8690 0.8801
27 MaxAbsScaler LightGBM 0:00:06 0.8554 0.8801
28 MaxAbsScaler LightGBM 0:00:06 0.8478 0.8801
29 SparseNormalizer ExtremeRandomTrees 0:00:06 0.8478 0.8801
30 VotingEnsemble 0:00:15 0.8860 0.8860
31 StackEnsemble 0:00:14 0.8809 0.8860
Stopping criteria reached at iteration 31. Ending experiment.
Manual run timing: --- 341.8196201324463 seconds needed for running the whole LOCAL AutoML Experiment ---
Remote Training: Total Time: 67 minutes
01-13-2020-05
classif-automl-remote-01-13-2020-05
Running on remote compute: cesardl-cpu-clus
Parent Run ID: AutoML_c833d1c3-ce81-43cc-bdaf-a24858744afd
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: ModelSelection. Beginning model selection.
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
ITERATION PIPELINE DURATION METRIC BEST
0 MaxAbsScaler SGD 0:01:44 0.8232 0.8232
1 MaxAbsScaler SGD 0:01:38 0.7830 0.8232
2 MaxAbsScaler ExtremeRandomTrees 0:01:36 0.7329 0.8232
3 MaxAbsScaler SGD 0:01:37 0.8635 0.8635
4 MaxAbsScaler RandomForest 0:01:42 0.7990 0.8635
5 MaxAbsScaler SGD 0:01:45 0.8581 0.8635
6 MaxAbsScaler RandomForest 0:01:41 0.7444 0.8635
7 StandardScalerWrapper RandomForest 0:01:43 0.7201 0.8635
8 MaxAbsScaler RandomForest 0:01:44 0.7481 0.8635
9 MaxAbsScaler ExtremeRandomTrees 0:01:43 0.8377 0.8635
10 MaxAbsScaler BernoulliNaiveBayes 0:01:43 0.7610 0.8635
11 StandardScalerWrapper BernoulliNaiveBayes 0:01:37 0.7003 0.8635
12 MaxAbsScaler BernoulliNaiveBayes 0:01:37 0.7466 0.8635
13 MaxAbsScaler RandomForest 0:01:45 0.6927 0.8635
14 MaxAbsScaler RandomForest 0:01:39 0.6981 0.8635
15 MaxAbsScaler SGD 0:01:38 0.8612 0.8635
16 MaxAbsScaler ExtremeRandomTrees 0:01:47 0.8445 0.8635
17 MaxAbsScaler RandomForest 0:01:44 0.7307 0.8635
18 StandardScalerWrapper ExtremeRandomTrees 0:01:46 0.7186 0.8635
19 MaxAbsScaler LightGBM 0:01:48 0.8665 0.8665
20 StandardScalerWrapper LightGBM 0:01:40 0.8377 0.8665
21 StandardScalerWrapper ExtremeRandomTrees 0:01:46 0.8377 0.8665
22 MaxAbsScaler LightGBM 0:01:35 0.8612 0.8665
23 MaxAbsScaler LightGBM 0:01:40 0.8673 0.8673
24 TruncatedSVDWrapper LinearSVM 0:01:44 0.8377 0.8673
25 StandardScalerWrapper LightGBM 0:01:44 0.8377 0.8673
26 StandardScalerWrapper LightGBM 0:01:44 0.8635 0.8673
27 StandardScalerWrapper LightGBM 0:01:38 0.8559 0.8673
28 SparseNormalizer LightGBM 0:01:38 0.8543 0.8673
29 MaxAbsScaler LightGBM 0:01:34 0.8377 0.8673
30 StandardScalerWrapper LightGBM 0:01:43 0.8377 0.8673
31 StandardScalerWrapper LightGBM 0:01:42 0.8528 0.8673
32 StandardScalerWrapper LightGBM 0:01:41 0.8650 0.8673
33 StandardScalerWrapper LightGBM 0:01:44 0.8543 0.8673
34 VotingEnsemble 0:02:06 0.8764 0.8764
35 StackEnsemble 0:01:52 0.8703 0.8764
Manual run timing: --- 4020.8364148139954 seconds needed for running the whole Remote AutoML Experiment ---
Starting a container in ACI should only take a matter of seconds. If it takes this long, it is probably re-creating Docker images; ACI should have the Docker images cached so it doesn't need to pull the image on every deployment.
It would be important to cache those Docker images per model so the Docker image doesn't need to be pulled into ACI for every deployment, making deployment to ACI a lot faster.
Example of timing of deployment to ACI:
Model deployment to ACI: --- 200.5308485031128 seconds needed to deploy to ACI ---
--> That is around 3.3 minutes to deploy a single Docker container.
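A sketch of how this deployment time was measured, assuming model, inference_config, and an ACI deployment_config are already defined; the service name is hypothetical:

import time
from azureml.core.model import Model

# Time the full ACI deployment, which today includes building/pulling the image.
start = time.time()
service = Model.deploy(ws, "attrition-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(f"--- {time.time() - start} seconds needed to deploy to ACI ---")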
When I am trying to "Deploy best model" after running AutoML, I click on "AutoML," and then I have to click on the user-unfriendly Run ID hyperlink in order to get a UI that lets me "Deploy best model." If I click on "Experiment name" (more natural to me as a user), I get directed to the Experiment UI, rather than the AutoML UI.
The UI lists Notebooks, AutoML, Designer in that order. This is not alphabetical nor is it easy->advanced.
There's no UI management for Environments. Since Environments are infrastructure assets that can be re-used (in many ways comparable to Model management or Compute management), there should be a UI feature to manage Environments, such as importing from Conda, cloning from curated environments, listing the Environments in your Workspace, etc.
There doesn't seem to be a way to request an ONNX model to be generated
Afaik, there's no direct way to copy a curated environment into a custom environment in the workspace.
There should be an easy way; instead, you need to save Conda environments to files, then load them from files, etc. That's not intuitive or easy for users.
These are the currently needed steps, which you should be able to do in a single line instead:
# Save curated environment definition to a folder (two files: conda_dependencies.yml and azureml_environment.json)
curated_environment.save_to_directory(path="./curated_environment_definition", overwrite=True)

# Create a custom Environment from the Conda specification file
custom_environment = Environment.from_conda_specification(name="custom-workshop-environment", file_path="./curated_environment_definition/conda_dependencies.yml")

# Save the custom environment definition to a folder and register it
custom_environment.save_to_directory(path="./custom_environment_definition", overwrite=True)
custom_environment.register(ws)
Something like the following should be possible to do in a single line:
curated_environment.clone("my-custom-environment")
Every time you create a new compute instance and then call:
ws = Workspace.from_config()
you need to authenticate through the browser by copy/pasting a string:
I would expect to not have to do this given I'm using a compute instance in the workspace.
Also I get the annoying warning pictured above every time I have to do this.
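A possible workaround sketch: reuse the compute instance's existing Azure CLI login instead of the interactive device-code prompt (this assumes az login has already been run on the instance):

from azureml.core import Workspace
from azureml.core.authentication import AzureCliAuthentication

# Reuse the CLI session so from_config() doesn't prompt for a device code.
cli_auth = AzureCliAuthentication()
ws = Workspace.from_config(auth=cli_auth)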
AutoML should start with the task/scenario. As a new/novice user, I'd like to know what is possible before I spend time configuring data and compute. The current experience is not user friendly and potentially wastes customers' time.
Confirmed with Dom that the only way to do this is to unregister and create a new one
When running an AutoML job for the first time, it takes time to prepare the image and compute, but the UX only shows "Preparing". It takes at least 10 minutes to get started. Users will get confused and won't know whether they should cancel the job.
[AutoML SDK] Can an experiment be submitted async from the notebook?
When using the following, the call is synchronous, so you cannot see the widget info until the whole AutoML process is completed:
run = experiment.submit(automl_config, show_output=True)
The following cell cannot be run until AutoML is done:
Is there any way to submit the AutoML process asynchronously from the notebook?
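For what it's worth, a sketch of an asynchronous pattern that should work: submitting with show_output=False returns right after submission, so the widget can render while the run is still executing.

from azureml.widgets import RunDetails

# Returns right after submission instead of streaming until completion.
run = experiment.submit(automl_config, show_output=False)
RunDetails(run).show()

# Block explicitly later, only when the result is actually needed:
# run.wait_for_completion(show_output=False)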
Afaik, using the plain Estimator is almost the same as using the SKLearn estimator, with no additional benefit or reason why you should use one or the other. If that's confirmed, why do we have two approaches for doing the same thing? One of the two APIs should be deprecated:
# Using Estimator class
estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=cluster,
                      entry_script='train.py',
                      pip_packages=pip_packages,
                      conda_packages=['scikit-learn==0.20.3'],
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])

# Using SKLearn estimator class
estimator = SKLearn(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=cluster,
                    entry_script='train.py',
                    pip_packages=pip_packages,
                    conda_packages=['scikit-learn==0.20.3'],
                    inputs=[aml_dataset.as_named_input('attrition')])
Hi,
The AML "Notebook VM" has been replaced by "Compute Instances".
The instructions need to be modified:
https://github.com/danielsc/azureml-workshop-2019/blob/master/1-workspace-concepts/1-setup-compute.md
The UI says "If your data contains text, enabling deep learning could provide higher accuracy":
This is not true for text that is used as categorical values.
Per the current workshop sequence, AutoML is in section 1-3. After running AutoML, the training cluster created earlier in the workshop will be occupied for hours. No other UX tutorial can run until either the AutoML run finishes (which takes hours) or a new training cluster is created, which is not in the workshop instructions.
When trying to get an ONNX model from AutoML, you need to set configurations in 3 places.
Ideally this should be controlled in just 1 place, perhaps when getting the model (step 2). Step 1 should go away once we have 100% ONNX support for AutoML models, so for the short term it's OK.
It's unclear why step 3 needs a separate ONNXConverter. Can this step be merged with step 2? The mechanism/convention for saving an ONNX model should be the same as for saving a non-ONNX model.
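For context, a sketch of the 3 places as they stand today; names like experiment, train_dataset, and cluster are assumptions:

from azureml.train.automl import AutoMLConfig
from azureml.automl.runtime.onnx_convert import OnnxConverter

# (1) Request ONNX-compatible models at configuration time.
automl_config = AutoMLConfig(task='classification',
                             enable_onnx_compatible_models=True,
                             training_data=train_dataset,
                             label_column_name='Attrition',
                             compute_target=cluster)
run = experiment.submit(automl_config, show_output=True)

# (2) Ask for the ONNX model when retrieving the best run.
best_run, onnx_model = run.get_output(return_onnx_model=True)

# (3) Save it through a separate converter helper.
OnnxConverter.save_onnx_model(onnx_model, './best_model.onnx')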
Need to align to a unified naming convention - upper/lower case, special characters, length.
Need to save the new pipeline before creating the release. Update documentation.
Users would not expect that they have to go into the Compute tab to launch these.
This is confusing for users. There are two ways of submitting experiments to AML compute:
Both can then be used with the HyperDriveConfig class, with no clear reason why you should use one approach or the other.
Also, nowhere in the docs could I find when one or the other should be used, or which one is better under what circumstances.
If both are very similar, we should probably deprecate one of them so the usage path is simplified for the user. Having multiple ways of doing something without clear reasons is very confusing for users.
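To make the overlap concrete, a sketch of the two paths; param_sampling, cluster, and env are assumed to be defined elsewhere:

from azureml.core import ScriptRunConfig
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

# Way 1: wrap an Estimator.
estimator = Estimator(source_directory='.', entry_script='train.py', compute_target=cluster)
hd_config = HyperDriveConfig(estimator=estimator,
                             hyperparameter_sampling=param_sampling,
                             primary_metric_name='accuracy',
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=20)

# Way 2: wrap a ScriptRunConfig with an explicit environment.
src = ScriptRunConfig(source_directory='.', script='train.py', compute_target=cluster, environment=env)
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             primary_metric_name='accuracy',
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=20)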
The description tells users to set min_node to 0, but the screenshot shows 1. This needs to be updated.
When selecting "Include child runs" in the "Experiments" section, I see a flat list of all runs, with no sense of the hierarchy in runs. We need to indicate to the user which runs are children of which other runs, either through a nested tree structure or some naming convention that appends the name of the parent run to the name of the child run.
We have documentation on experiments and on runs, but no guidance in our "Concepts" section on how to decide whether something the user is doing is an experiment or a run.
Bug in azureml.widgets RunDetails UI showing non-existing metric visualization errors.
Running a HyperDrive job on remote AML compute with multiple child runs.
The training process runs okay with no issues, and after the parent run I'm able to get the primary metric of the best model with no issues.
But the widget UI is showing errors in red like the following:
We need to modify the instructions for ML Ops:
mlWorkspaceConnection: 'build-demo' ,
mlWorkspaceName: 'build-2019-demo',
resourceGroupName: 'scottgu-all-hands'
Once the model is deployed into AML compute, the UI doesn't help much with how to consume it and try the service. It simply provides a URI, but a user who just deployed it won't be able to try it unless they invest time researching docs from scratch.
This is a clear paper cut because it blocks the path to a fast getting-started experience.
This is the only info provided so far for consumption:
That "Consume" page section should provide sample code in the following areas, showing client code for Python apps, .NET apps, Java apps, Node.js apps, etc.:
import json
import pandas as pd
# the sample below contains the data for an employee that is not an attrition risk
sample = pd.DataFrame(data=[{'Age': 41, 'BusinessTravel': 'Travel_Rarely', 'DailyRate': 1102, 'Department': 'Sales', 'DistanceFromHome': 1, 'Education': 2, 'EducationField': 'Life Sciences', 'EnvironmentSatisfaction': 2, 'Gender': 'Female', 'HourlyRate': 94, 'JobInvolvement': 3, 'JobLevel': 2, 'JobRole': 'Sales Executive', 'JobSatisfaction': 4, 'MaritalStatus': 'Single', 'MonthlyIncome': 5993, 'MonthlyRate': 19479, 'NumCompaniesWorked': 8, 'OverTime': 'No', 'PercentSalaryHike': 11, 'PerformanceRating': 3, 'RelationshipSatisfaction': 1, 'StockOptionLevel': 0, 'TotalWorkingYears': 8, 'TrainingTimesLastYear': 0, 'WorkLifeBalance': 1, 'YearsAtCompany': 6, 'YearsInCurrentRole': 4, 'YearsSinceLastPromotion': 0, 'YearsWithCurrManager': 5}])
# converts the sample to JSON string
sample = pd.DataFrame.to_json(sample)
# deserializes sample to a python object
sample = json.loads(sample)
# serializes sample to JSON formatted string as expected by the scoring script
sample = json.dumps({"data":sample})
prediction = service.run(sample)
print(prediction)
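In addition to the SDK-based service.run() call above, the Consume page could show a plain REST example; a sketch, assuming key auth is enabled and sample is the JSON payload built above:

import requests

# Call the scoring endpoint directly over REST with key-based auth.
scoring_uri = service.scoring_uri
key, _ = service.get_keys()
headers = {'Content-Type': 'application/json', 'Authorization': f'Bearer {key}'}
response = requests.post(scoring_uri, data=sample, headers=headers)
print(response.json())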
In the set-up: https://github.com/danielsc/azureml-workshop-2019/blob/master/1-workspace-concepts/1-setup-compute.md
there is no mention of how to enter the new interface. This needs to be added to follow the rest.
Afaik, the Estimator class always needs to use a custom Environment, probably because it always creates a Docker image under the covers?
However, for simplicity's sake, it should be able to use a curated Environment (as you actually can with the ScriptRunConfig class).
If you try to use a curated environment, you get this error:
Error: "Environment name can not start with the prefix AzureML. To alter a curated environment first create a copy of it."
Therefore, you need to copy/clone a curated environment first, which is also not straightforward and needs the following code:
# Get the curated environment
curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

# Save curated environment definition to a folder (two files: conda_dependencies.yml and azureml_environment.json)
curated_environment.save_to_directory(path="./curated_environment_definition", overwrite=True)

# Create a custom Environment from the Conda specification file
custom_environment = Environment.from_conda_specification(name="custom-workshop-environment", file_path="./curated_environment_definition/conda_dependencies.yml")

# Save the custom environment definition to a folder and register it
custom_environment.save_to_directory(path="./custom_environment_definition", overwrite=True)
custom_environment.register(ws)

estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=cluster,
                      # use_docker=True,  # AML Cluster only supports Docker runs
                      entry_script='train.py',
                      environment_definition=custom_environment,
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])
If the Estimator could use a curated environment as you can with the ScriptRunConfig class, you would simply need the following code:

curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

estimator = Estimator(source_directory=project_folder,
                      script_params=script_params,
                      compute_target=cluster,
                      # use_docker=True,  # AML Cluster only supports Docker runs
                      entry_script='train.py',
                      environment_definition=curated_environment,
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])
Why is drop_column_names in the AutoMLConfig class only available for forecasting in AutoML, but not for regression and classification?
If you try to use it for another ML task, like classification or regression, you get an error.
This looks inconsistent, especially since that action (dropping columns) could also be useful for other ML tasks, not just forecasting.
Also, the docs/reference don't say it is exclusive to forecasting; you only realize it when getting an error:
If something is for exclusive use in a single scenario or ML task, the docs should say so.
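In the meantime, a workaround sketch: drop the columns from the TabularDataset itself before handing it to AutoMLConfig. The dataset, column, and cluster names here are hypothetical:

from azureml.train.automl import AutoMLConfig

# Drop unwanted columns at the dataset level instead of via drop_column_names.
train_dataset = train_dataset.drop_columns(columns=['EmployeeNumber'])

automl_config = AutoMLConfig(task='classification',
                             training_data=train_dataset,
                             label_column_name='Attrition',
                             compute_target=cluster)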