Is that wine good or bad? A beginner tutorial on how to build a binary classification machine learning model with no code using Azure Machine Learning Visual Interface

Azure Tools and Data

Create Resource in Azure

Go to the Azure Portal and log in or create an account
Click "Create resource"
Select "AI + Machine Learning" then "Machine Learning service workspace"
Fill in required fields and select "Review + Create" then select "Create"
It will take a few minutes to create the resources needed for your workspace. Below is a list of all the resources that are created:

Launch Azure Machine Learning Visual Interface

Navigate to the resource group in which you created the workspace
Click the "Machine Learning Service Workspace" resource listed in the resource group
In the left nav click on "Visual Interface"
Then click "Launch visual interface"
This will open a new tab for the Visual interface for Azure Machine Learning Service

We need data!

I used a dataset I found on Kaggle. Kaggle is an online community of data scientists.
Download the dataset from this repo rather than from Kaggle, because I have added an additional field (qualityBool) to it.

Wine Dataset from Repo
Kaggle Dataset
Relevant publication: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
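The tutorial does not say how qualityBool was derived, so the following is only a sketch of how a 0/1 label could be built from the numeric quality score. The cutoff of 7 and the sample rows are assumptions made for illustration, not values taken from the repo.

```python
import csv
import io

# Hypothetical cutoff: the tutorial does not state which scores count as "good".
GOOD_THRESHOLD = 7


def add_quality_bool(rows, threshold=GOOD_THRESHOLD):
    """Return copies of the rows with a 0/1 'qualityBool' column added."""
    labeled = []
    for row in rows:
        new_row = dict(row)
        new_row["qualityBool"] = 1 if int(row["quality"]) >= threshold else 0
        labeled.append(new_row)
    return labeled


# Tiny made-up sample standing in for the real CSV file.
sample_csv = "alcohol,quality\n9.4,5\n11.2,7\n10.0,6\n12.8,8\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
labeled = add_quality_bool(rows)
print([r["qualityBool"] for r in labeled])
```

With the downloaded file, the same function could be applied to rows read with csv.DictReader from the CSV on disk.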

Getting data into Azure Machine Learning Visual Interface

There are a few different ways to import data into Visual Interface. You can use the Import Data module to import data from Azure Blob Storage or from a web URL via HTTP. In this tutorial we are going to upload our data into "My Datasets".

Select "New" from the bottom left corner of the browser
From the left nav bar Select "Datasets"
Select "Upload from Local file"
Navigate to downloaded data and select it to be uploaded
Update the name and add a description (it's helpful to have a detailed description once there are lots of datasets uploaded)

Create New Experiment

Select "New" from the bottom left corner of the browser
Select "Blank Experiment"
In the top left of the workspace select the experiment name text "Experiment created on xx/xx/xxxx" and edit the name of your experiment.
Go to "My Datasets" to find the uploaded data OR use the Import Data module to import from the GitHub CSV link
Drag data module onto workspace

Build the Model

Assign the label attribute to the dataset

We have now created an experiment and imported the data. Let's build the model. In the left-hand nav there are different modules that you can drag and drop onto the workspace to build the model.

Under Data Transformation > Manipulation drag and drop the "Edit Metadata" module onto the workspace
Connect the modules together by clicking and dragging on the circles, like a Visio diagram.
Click on the "Edit Metadata" and select "Edit Columns" from the right hand side of the workspace
Leave the default configuration and type qualityBool into the textbox and click "Ok"

The First Run of the Experiment

Select "Run" from the bottom of the workspace
Select "Create new" to create a new compute target
Enter a name for the new compute target
Select "Run"

Select Feature Columns

Under Data Transformation > Manipulation drag and drop the "Select Columns in dataset" module onto the workspace
Connect the modules together by clicking and dragging on the circles, like a Visio diagram.
Click on the "Select Columns in dataset" and select "Edit Columns" from the right hand side of the workspace
Choose to exclude the quality column
Select the arrow to move the highlighted feature into the "Selected Columns" box and click "Ok".
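For comparison, excluding a column can be sketched in a few lines of plain Python. The reason to drop quality is that qualityBool was presumably derived from it, so leaving it in would hand the model the answer. The rows and values below are made up for illustration.

```python
def exclude_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    excluded = set(excluded)
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


# Made-up rows; the real dataset has more columns.
rows = [
    {"alcohol": 9.4, "pH": 3.51, "quality": 5, "qualityBool": 0},
    {"alcohol": 12.8, "pH": 3.20, "quality": 8, "qualityBool": 1},
]

features = exclude_columns(rows, ["quality"])
print(list(features[0].keys()))
```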

Visualize the Data

Data visualizations are an important part of the data science process.

To visualize the data, right click on the Edit Metadata module and select "Visualize"
Select each column to see the data visualized on the right side.

Split the Data

When you train a model, the standard practice is to split your data into one part that trains the model and one part that scores it. Here 70% of the data trains the model and the remaining 30% scores it to see how well the training went. Keep in mind that true model accuracy should also be tested on unseen data beyond this 30%. The score gives you an idea of how the model is performing, but it is not definitive and can sometimes be misleading.

In the left nav type "Split Data" in the textbox at the top
Drag and drop the module onto the workspace and connect it to the existing modules
Select the "Split Data" module and change the split from 0.5 to 0.7
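As a rough illustration of what a 0.7 split means, here is a minimal sketch in plain Python. It shuffles the rows with a fixed seed before cutting them; the shuffle, the seed, and the function name are assumptions of this sketch rather than a description of how the Split Data module works internally.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows, then cut them into a training part and a scoring part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten rows of wine data
train, test = split_rows(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two parts, so the scoring rows are never seen during training.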

Train, Score and Evaluate the Model

Now we have prepared our data by selecting features, assigning labels, cleaning and preprocessing. It's time to train the model.

There are many different algorithms to choose from when building a model. Many professional data scientists try a few different ones to see which provides a better accuracy score. Here is a cheatsheet for choosing an algorithm. For this model we are going to use a Two-Class Logistic Regression.
Add the following modules to the workspace: Two-Class Logistic Regression, Train Model, Score Model, Evaluate Model
Hint: if you have questions about modules or concepts, click on the module and in the lower right corner of the workspace you will see a "more help" link. Click the link to get information about how the module works and help with data science terms.
Connect them together as displayed below
Select the Train Model module and click "Edit Columns" in the right side of the workspace
Type qualityBool into the textbox to indicate the dataset label
Run the Experiment
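To give a feel for what the train and score steps do conceptually, here is a minimal logistic regression sketch in plain Python. It uses simple batch gradient descent on made-up one-feature data; it is not the implementation behind the Two-Class Logistic Regression module, and the learning rate, epoch count, and data are all assumptions.

```python
import math


def predict_proba(weights, bias, features):
    """Probability that the label is 1, using the logistic (sigmoid) function."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 label."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


def train_logistic_regression(X, y, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, value in enumerate(features):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


# Made-up, already scaled feature values with their 0/1 labels.
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_train, y_train)
print(predict(weights, bias, [1.2]), predict(weights, bias, [-1.2]))
```

In the Visual Interface, Train Model plays the role of the fitting function and Score Model plays the role of predict.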

Check Accuracy of Model

We now have a trained model in Azure Machine Learning Visual Interface. Let's visualize our results to see how it performed.

Right click on the bottom circle of the Evaluate Model module.
Select "Visualize" from the menu that pops up
How to understand metrics for classification models
Our accuracy is ok, but we can probably do better.
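The common metrics for a two-class model can be computed by hand from the actual and predicted labels. The sketch below uses made-up labels, not results from the wine model, and the function name is hypothetical.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # good wine, predicted good
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # bad wine, predicted bad
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # bad wine, predicted good
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # good wine, predicted bad
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for ten wines.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

In this made-up case accuracy is 0.7 even though only one of the three good wines was found, which is why it helps to look at precision and recall as well as accuracy.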

Different ways to Improve Accuracy

Evaluate the selected features with data visualizations to see if they are helping or hurting accuracy
Try a different machine learning algorithm
Do you have enough data? Sometimes a low accuracy means you don't have enough data
If the data is noisy, it can be hard for the algorithm to pick out the signal

Deploy the Web Service

Once the model has an acceptable or "good enough" accuracy, it's time to deploy your model to a web service.

Click "Create predictive experiment" in the bottom nav of the workspace
Click "Run" on the predictive experiment, select the compute and click "Run"
The model you created will now show up under "Trained Models" in the left nav of the workspace. This allows you to import trained models into different experiments.
Click "Deploy Web Service" in the bottom nav of the workspace
Now we need to create a web service compute target (if you don't already have one)
Click "Create new" and then click the "Go to azure portal" link. This will open a new tab and bring you to the Azure Machine Learning workspace resource with Compute selected in the left-hand nav. Follow the instructions in the pane to create the compute target for the web service.
Once you have created the compute target, click refresh in the corner of the pane to show the newly created compute target
Select the Compute target and click "Deploy"

Test the Web Service

Select the "Web Service" icon on the left nav of the workspace. The web service that was created will be listed.
Click the web service that was created
Here you can test and get the information needed to consume the API created.
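Once you have the endpoint details, consuming the API from code could look roughly like the sketch below. The URL, the key, the bearer-token header, and the shape of the JSON payload are all placeholders and assumptions; copy the real values and input schema from the web service page rather than relying on these.

```python
import json
import urllib.request

# Placeholders: replace with the values shown on the web service page.
SCORING_URL = "https://example.invalid/score"
API_KEY = "<your-api-key>"


def build_scoring_request(url, api_key, wine):
    """Build (but do not send) a JSON POST request for one wine."""
    # Assumed payload shape; the real service defines its own input schema.
    body = json.dumps({"data": [wine]}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed authentication scheme; check the consume information.
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


wine = {"alcohol": 10.5, "pH": 3.3}  # made-up values, not a full input row
request = build_scoring_request(SCORING_URL, API_KEY, wine)
print(request.get_method(), request.full_url)

# To actually call a deployed service, send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```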

You have now created a machine learning model using Azure Machine Learning Visual Interface! 🎉✨

Machine Learning Beginner Gotchas

Selecting what features will give the best accuracy (Feature Engineering)

In this example we used almost all the attributes in the dataset as features (only quality was excluded). When building a model, it is important to think about which features help make a decision or a good prediction.
Data visualizations and talking to subject matter experts can help identify which features are best. It is also an iterative process: expect to experiment, using trial and error to find the features that give the best accuracy.

Is 100% accuracy always good? What is overfitting?

Overfitting means the model has memorized your training data instead of actually "learning" from it, often because there is not enough data. It will do great on your data, but as soon as it is put out into the real world it will fail. This is why you always want to test with unseen data.
If the data is very imbalanced, meaning you have lots of one label and very little of another, this can also lead to overfitting and to models that look like they are performing really well when they are actually not good models. There are different techniques for working with imbalanced data and overfitting, such as oversampling, creating synthetic data, collecting more data, weighting the data, dropout, and others.
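As one example of these techniques, random oversampling can be sketched as follows: rows from the smaller class are duplicated at random until every class is the same size. The rows, seed, and function name are made up for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, label_key="qualityBool", seed=0):
    """Duplicate randomly chosen rows of the smaller classes until all classes match."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Made-up imbalanced data: eight "bad" wines and two "good" ones.
rows = [{"alcohol": 9.0 + i * 0.1, "qualityBool": 0} for i in range(8)]
rows += [{"alcohol": 12.5, "qualityBool": 1}, {"alcohol": 13.0, "qualityBool": 1}]

balanced = oversample_minority(rows)
counts = Counter(row["qualityBool"] for row in balanced)
print(counts[0], counts[1])
```

Oversampling should be applied only to the training part of the data, after the split, so that duplicated rows do not end up in the data used for scoring.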

Helpful Links

Machine Learning Visual Interface Overview Doc
Flavors of Machine Learning Doc
MS Learn Intro to Data Science in Azure
Stanford Machine Learning Cheatsheet

Want to see how to build this same model in Python?

Here is a link to the notebook included in this repo. If you want to run it, I recommend using the Notebook VMs in the Azure Machine Learning workspace you created in this workshop.
