Workshop Resources

Hands-On Guide: https://github.com/IBMDeveloperMEA/AI-Integrity-Improving-AI-models-with-Cortex-Certifai/blob/main/README.md

Slides: https://github.com/IBMDeveloperMEA/AI-Integrity-Improving-AI-models-with-Cortex-Certifai

Workshop Replay: https://www.crowdcast.io/e/integrityinai

This repo accompanies the webinar "AI Integrity: Improving AI models with Cortex Certifai". Register for the live stream or access the replay at https://www.crowdcast.io/e/integrityinai

Prerequisites

Sign up for or log in to IBM Cloud - https://ibm.biz/BdfhxH/

If you are an existing user, log in to IBM Cloud.

If you are not, don't worry, we have you covered. Creating an IBM Cloud account takes 3 steps:

Enter your email and password.
Open the verification link sent to the registered email address to verify your account.
Fill in the personal information fields. Make sure you select the country you are in whenever asked during the registration process.

Black box AI models explained using Cortex Certifai

Explaining AI models is a difficult task, which Cortex Certifai makes simpler. It evaluates AI models for robustness, fairness, and explainability, and lets users compare different models or model versions on these qualities. Certifai can be applied to any black-box model, including machine learning and predictive models, and works with a variety of input datasets.

How does Certifai work?

Data scientists create model scan definitions, which include the trained models they want to evaluate against the parameters listed below.

Performance Metric: (e.g. Accuracy)

Robustness: how well the model generalizes to new data.

Fairness by group: measures the bias in the data.

Explainability: measures the explanations provided for each model.

Explanations: display the change that must occur in a dataset with given restrictions to obtain a different outcome.
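
To make the "Explanations" entry concrete, here is a minimal sketch of a counterfactual search: given a black-box predict function and one input row, it looks for the smallest change to a single permitted feature that flips the outcome. The toy model, the feature names, and the step size are all hypothetical, and this is not Certifai's own algorithm, only an illustration of the idea.

```python
from typing import Callable, Dict, Optional, Tuple

Row = Dict[str, float]


def toy_credit_model(row: Row) -> int:
    """Hypothetical black-box model: approve (1) when income covers 3x the loan amount."""
    return 1 if row["income"] >= 3 * row["loan_amount"] else 0


def find_counterfactual(
    predict: Callable[[Row], int],
    row: Row,
    feature: str,
    step: float,
    max_steps: int = 1000,
) -> Optional[Tuple[Row, float]]:
    """Search outward along one feature for the smallest change that flips the prediction.

    Only `feature` may change (the "restriction"). Returns the changed row and
    the signed change, or None if nothing flips within max_steps * step.
    """
    original = predict(row)
    for i in range(1, max_steps + 1):
        for direction in (1, -1):
            delta = direction * i * step
            candidate = dict(row)
            candidate[feature] = row[feature] + delta
            if predict(candidate) != original:
                return candidate, delta
    return None


# Hypothetical applicant who is currently rejected by the toy model.
applicant = {"income": 2500.0, "loan_amount": 1000.0}
result = find_counterfactual(toy_credit_model, applicant, "income", step=100.0)
if result is not None:
    changed_row, delta = result
    print(f"Changing income by {delta:+.0f} flips the outcome: {changed_row}")
```

The search treats the model purely as a function from inputs to outputs, which is what makes the approach applicable to black-box models.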

Business decision makers are able to view the evaluation comparison through visualizations and scores to select the best models for business goals and to identify whether or not models meet thresholds for robustness, fairness, and/or explainability. Data Scientists can use the evaluation results for analysis to provide more trustworthy AI models.
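
The threshold check described above can be sketched in a few lines of Python. The model names, scores, and thresholds below are invented for illustration and are not output from a real scan; in practice the values would come from a Certifai scan report.

```python
from typing import Dict, List

Scores = Dict[str, float]


def models_meeting_thresholds(
    model_scores: Dict[str, Scores], thresholds: Scores
) -> List[str]:
    """Return the names of models whose score is at or above every threshold."""
    return [
        name
        for name, scores in model_scores.items()
        if all(
            scores.get(metric, 0.0) >= minimum
            for metric, minimum in thresholds.items()
        )
    ]


# Hypothetical scores on a 0-100 scale; not taken from a real scan.
model_scores = {
    "logistic_regression": {"robustness": 82.0, "fairness": 71.0, "explainability": 90.0},
    "decision_tree": {"robustness": 64.0, "fairness": 78.0, "explainability": 95.0},
}
thresholds = {"robustness": 70.0, "fairness": 70.0}

passing = models_meeting_thresholds(model_scores, thresholds)
print("Models meeting all thresholds:", passing)
```

A metric missing from a model's scores is treated as 0.0, so an unevaluated quality never passes by accident.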

This code pattern demonstrates how to use the Certifai Toolkit to create scans that evaluate the performance of multiple predictive models on the IBM Watson Studio platform.

Architecture Diagram

Log in to Watson Studio powered by Spark, initiate Cloud Object Storage, and create a project.
Upload the .csv data file to Object Storage.
Load the data file in a Watson Studio notebook.
Install the Cortex Certifai Toolkit in the Watson Studio notebook.
Visualize the explainability and interpretability of the AI model for the three different types of users.

Included components

IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market. This code pattern uses Cloud Object Storage.

Featured technologies

Artificial Intelligence: Any system which can mimic cognitive functions that humans associate with the human mind, such as learning and problem solving.
Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
Analytics: Analytics delivers the value of data for the enterprise.
Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Scans can be run with Cortex Certifai either from Watson Studio or from the command line interface. This code pattern demonstrates how to run scans in Watson Studio for two machine learning techniques: regression and classification.

Download Certifai Toolkit

Toolkit Edition: You can sign up for free use of the Certifai Toolkit on the CognitiveScale website. A download link is provided in the confirmation email.

Steps using Cortex Certifai on Watson Studio

Create an account with IBM Cloud
Create a new Watson Studio project
Add Data
Create the notebook
Insert the data as dataframe
Run the notebook
Analyze the results

1. Create an account with IBM Cloud

2. Create a new Watson Studio project

Click on New Project and select per below.

Define the project by giving a Name and hit 'Create'.

3. Add Data

Clone this repo, navigate to data/assets, and save the file german_credit_eval.csv to disk. The dataset is also available in the Certifai toolkit downloaded earlier.

Click on Assets, select Browse, and add the CSV file from your file system.

4. Create the notebook

Open IBM Watson Studio.
Go to the project and click on Add
Click on Create notebook to create a notebook.
Select the From URL tab.
Enter a name for the notebook.
Optionally, enter a description for the notebook.
Enter this Notebook URL : https://github.com/IBM/blackbox-ai-models-explained-using-cortexcertifai/blob/main/notebooks/WS_classifier.ipynb
Select the runtime (8 vCPU and 32GB RAM)
Click the Create button.

After the notebook is imported, click on Not Trusted and select Yes to trust the source of the notebook.

This notebook was created to demonstrate the steps for building the model on the Watson Studio platform. For other use cases, a notebook has to be created from scratch.

5. Insert the data as dataframe

Click on the 0010 icon at the top right to bring up the data assets tab.

Click on the Insert to code dropdown and select Insert pandas DataFrame.
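
The cell that Watson Studio inserts reads the data asset from Cloud Object Storage using project-specific credentials, so its exact contents differ per project. As a rough local equivalent, assuming pandas is installed and german_credit_eval.csv is saved next to the notebook, the load amounts to something like the sketch below. The helper name load_dataset is hypothetical.

```python
import pandas as pd


def load_dataset(source) -> pd.DataFrame:
    """Read a CSV from a path or file-like object and fail early if it has no rows."""
    df = pd.read_csv(source)
    if df.empty:
        raise ValueError("dataset has no rows")
    return df


# Usage in a notebook cell (the path is an assumption):
# df_data = load_dataset("german_credit_eval.csv")
# df_data.head()
```

Checking for an empty frame up front gives a clearer error than a failure later in the scan.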

6. Run the notebook

When a notebook is executed, each code cell in the notebook is executed in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

A blank: the cell has never been executed.
A number: the relative order in which this code step was executed.
A *: the cell is currently executing.
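
The same tags can be recovered from a saved notebook, since an .ipynb file is JSON in which each code cell carries an execution_count field (null when the cell has never run). The following is a minimal sketch using only the standard library; the function name cell_tags is hypothetical. The * state exists only while a notebook is running, so it does not appear in a saved file.

```python
import json
from typing import List


def cell_tags(notebook_json: str) -> List[str]:
    """Return the In [x]: tag for each code cell of a notebook given as JSON text."""
    notebook = json.loads(notebook_json)
    tags = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no execution tag
        count = cell.get("execution_count")
        tags.append("In [ ]:" if count is None else f"In [{count}]:")
    return tags


# A tiny hand-written notebook: one markdown cell, one unexecuted code cell,
# and one code cell that was the third step to run.
sample = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {"cell_type": "code", "execution_count": None, "source": ["x = 1"]},
            {"cell_type": "code", "execution_count": 3, "source": ["print(x)"]},
        ]
    }
)
print(cell_tags(sample))
```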

There are several ways to execute the code cells in your notebook:

One cell at a time.
- Select the cell, and then press the Play button in the toolbar.
Batch mode, in sequential order.
- The Cell menu offers several options. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and continues through all cells that follow.

7. Analyze the results

After all cells in the notebook have run, the scan results are uploaded to Object Storage, from where they can be downloaded as follows. Log in to IBM Cloud, navigate to the Dashboard on the left-hand side, and click on Storage. Click on the bucket name, which is an extension of the project name in Watson Studio, and select the scan_results.csv file to download it.
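
Once scan_results.csv is on disk, a quick way to see what it contains is to print its header and first few rows. The sketch below uses only the standard library and makes no assumption about the column names in the file; the function names and the local path are hypothetical.

```python
import csv
from typing import Dict, List


def read_scan_results(path: str) -> List[Dict[str, str]]:
    """Load a downloaded results CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def print_preview(rows: List[Dict[str, str]], limit: int = 5) -> None:
    """Print the column names and the first few rows."""
    if not rows:
        print("no rows found")
        return
    print("columns:", ", ".join(rows[0].keys()))
    for row in rows[:limit]:
        print(row)


# Usage (the path is an assumption about where the file was saved):
# print_preview(read_scan_results("scan_results.csv"))
```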

Cloud Object Storage

Download results file

View results file

How to run the scan locally using CLI

Create a folder in your local file system, download this repo into the folder, and unzip it.
Make sure that Python version 3.6 or higher is installed.
Open a command prompt, cd into the notebooks subfolder, and type jupyter notebook. When Jupyter launches, select the notebook named regressor.ipynb and run all the cells from top to bottom.
After the cells have run, the scan is complete and the results are stored in the reports folder under the notebook's current directory.
Open a new command prompt, cd into the reports folder, and run the command certifai console reports. This starts the Flask server that serves the UI.
Launch the UI at http://localhost:8000/, where the scan reports and the comparative analysis are ready for review.
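
Before starting the steps above, it can help to confirm that the local environment is ready. The following is a small preflight sketch, assuming that the jupyter and certifai commands used above are expected on the PATH once Jupyter and the toolkit are installed; the function name preflight_problems is hypothetical.

```python
import shutil
import sys
from typing import List, Sequence, Tuple


def preflight_problems(
    min_version: Tuple[int, int] = (3, 6),
    tools: Sequence[str] = ("jupyter", "certifai"),
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, found {found}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("WARNING:", problem)
```

The check only reports problems and never installs or launches anything, so it is safe to run repeatedly.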

Scan results for Classification use case - Predict Fraud

Comparative analysis

Scan results for Regression use case - Predict Customer spend

Comparative analysis

Note: The scan results depend on the input dataset. If the input data changes, the scan results change accordingly.

Based on business requirements, we can choose the best model for production deployment. The scan result files in CSV format are also available for review under the certifai-scan-results folder.

This code pattern helps developers, machine learning engineers, data scientists, and architects compare multiple models and evaluate them under different criteria to select the best model for their requirements. Remote scans can also be run from a Red Hat OpenShift cluster, provided storage is allocated from Amazon S3, GCP, or Azure.

Troubleshooting

See DEBUGGING.md.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Check the ASL FAQ link for more details

Thank you,

Sbusiso Mkhombe

Cloud Engineer, Hybrid Cloud Build Team

IBM Technology Sales

