
microsoft / responsible-ai-toolbox


Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

Home Page: https://responsibleaitoolbox.ai/

License: MIT License

Topics: ui, responsible-ai, data-science, fairness, fairness-ml, fairness-ai, explainable-ai, explainable-ml, explainability, machinelearning

responsible-ai-toolbox's Introduction

PyPI packages: raiwidgets, responsibleai, erroranalysis, raiutils, rai_test_utils. npm package: model-assessment.

Responsible AI Toolbox

Responsible AI is an approach to assessing, developing, and deploying AI systems in a safe, trustworthy, and ethical manner, and to making responsible decisions and taking responsible actions.

Responsible AI Toolbox is a suite of tools providing a collection of model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

[Diagram: Responsible AI Toolbox overview]

The Toolbox consists of four repositories:

 

Repository | Tools covered
Responsible-AI-Toolbox Repository (Here) This repository contains four visualization widgets for model assessment and decision making:
1. Responsible AI dashboard, a single pane of glass bringing together several mature Responsible AI tools from the toolbox for a holistic responsible assessment and debugging of models and making informed business decisions. With this dashboard, you can identify model errors, diagnose why those errors are happening, and mitigate them. Moreover, the causal decision-making capabilities provide actionable insights to your stakeholders and customers.
2. Error Analysis dashboard, for identifying model errors and discovering cohorts of data for which the model underperforms.
3. Interpretability dashboard, for understanding model predictions. This dashboard is powered by InterpretML.
4. Fairness dashboard, for understanding a model's fairness issues using various group-fairness metrics across sensitive features and cohorts. This dashboard is powered by Fairlearn.
Responsible-AI-Toolbox-Mitigations Repository The Responsible AI Mitigations Library helps AI practitioners explore different measurements and mitigation steps that may be most appropriate when the model underperforms for a given data cohort. The library currently has three modules:
1. DataProcessing, which offers mitigation techniques for improving model performance for specific cohorts.
2. DataBalanceAnalysis, which provides metrics for diagnosing errors that originate from data imbalance either on class labels or feature values.
3. Cohort, which provides classes for handling and managing cohorts, allowing the creation of custom pipelines for each cohort through an easy and intuitive interface. The module also provides techniques for learning different decoupled estimators (models) for different cohorts and combining them in a way that optimizes different definitions of group fairness.
Responsible-AI-Tracker Repository Responsible AI Toolbox Tracker is a JupyterLab extension for managing, tracking, and comparing results of machine learning experiments for model improvement. Using this extension, users can view models, code, and visualization artifacts within the same framework, thereby enabling fast model iteration and evaluation processes. Main functionalities include:
1. Managing and linking model improvement artifacts
2. Disaggregated model evaluation and comparisons
3. Integration with the Responsible AI Mitigations library
4. Integration with mlflow
Responsible-AI-Toolbox-GenBit Repository The Responsible AI Gender Bias (GenBit) Library helps AI practitioners measure gender bias in Natural Language Processing (NLP) datasets. The main goal of GenBit is to analyze your text corpora and compute metrics that give insights into the gender bias present in a corpus.

Introducing Responsible AI dashboard

Responsible AI dashboard is a single pane of glass, enabling you to easily flow through different stages of model debugging and decision-making. This customizable experience can be taken in a multitude of directions, from analyzing the model or data holistically, to conducting a deep dive or comparison on cohorts of interest, to explaining and perturbing model predictions for individual instances, and to informing users on business decisions and actions.

[Screenshot: Responsible AI dashboard]

In order to achieve these capabilities, the dashboard integrates ideas and technologies from several open-source toolkits in the areas of

  • Error Analysis powered by Error Analysis, which identifies cohorts of data with higher error rate than the overall benchmark. These discrepancies might occur when the system or model underperforms for specific demographic groups or infrequently observed input conditions in the training data.

  • Fairness Assessment powered by Fairlearn, which identifies which groups of people may be disproportionately negatively impacted by an AI system and in what ways.

  • Model Interpretability powered by InterpretML, which explains black-box models, helping users understand their model's global behavior or the reasons behind individual predictions.

  • Counterfactual Analysis powered by DiCE, which shows feature-perturbed versions of the same datapoint that would have received a different prediction outcome, e.g., Taylor's loan was rejected by the model, but they would have received the loan if their income had been higher by $10,000.

  • Causal Analysis powered by EconML, which focuses on answering What If-style questions to apply data-driven decision-making – how would revenue be affected if a corporation pursues a new pricing strategy? Would a new medication improve a patient’s condition, all else equal?

  • Data Balance powered by Responsible AI, which helps users gain an overall understanding of their data, identify features receiving the positive outcome more than others, and visualize feature distributions.

Responsible AI dashboard is designed to achieve the following goals:

  • To help further accelerate engineering processes in machine learning by enabling practitioners to design customizable workflows and tailor Responsible AI dashboards that best fit with their model assessment and data-driven decision making scenarios.
  • To help model developers create end-to-end, fluid debugging experiences and navigate seamlessly through error identification and diagnosis by using interactive visualizations that identify errors, inspect the data, generate global and local model explanations, and potentially inspect problematic examples.
  • To help business stakeholders explore causal relationships in the data and take informed decisions in the real world.

This repository contains Jupyter notebooks with examples showcasing how to use these widgets. Get started here.

Installation

Use the following pip command to install the Responsible AI Toolbox.

If running in Jupyter, please make sure to restart the Jupyter kernel after installing.

pip install raiwidgets
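Once installed, a typical entry point is to build an RAIInsights object and hand it to the dashboard widget. The snippet below is a minimal, hedged sketch using a scikit-learn toy dataset; the exact constructor arguments and component names may vary across raiwidgets/responsibleai versions.

    # Minimal sketch: train a small model and open the Responsible AI dashboard for it.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from responsibleai import RAIInsights
    from raiwidgets import ResponsibleAIDashboard

    data = load_breast_cancer(as_frame=True)
    df = data.frame  # features plus a 'target' label column
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)

    model = RandomForestClassifier().fit(
        train_df.drop(columns=['target']), train_df['target'])

    rai_insights = RAIInsights(model=model, train=train_df, test=test_df,
                               target_column='target', task_type='classification')
    rai_insights.explainer.add()        # model interpretability component
    rai_insights.error_analysis.add()   # error analysis component
    rai_insights.compute()              # run the analyses

    ResponsibleAIDashboard(rai_insights)  # serves the widget, e.g. inline in Jupyter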

Responsible AI dashboard Customization

The Responsible AI Toolbox’s strength lies in its customizability. It empowers users to design tailored, end-to-end model debugging and decision-making workflows that address their particular needs. Need some inspiration? Here are some examples of how Toolbox components can be put together to analyze scenarios in different ways:

Please note that model overview (including fairness analysis) and data explorer components are activated by default!  

Responsible AI dashboard flow | Use case
Model Overview -> Error Analysis -> Data Explorer: to identify model errors and diagnose them by understanding the underlying data distribution
Model Overview -> Fairness Assessment -> Data Explorer: to identify model fairness issues and diagnose them by understanding the underlying data distribution
Model Overview -> Error Analysis -> Counterfactuals Analysis and What-If: to diagnose errors in individual instances with counterfactual analysis (the minimum change that leads to a different model prediction)
Model Overview -> Data Explorer -> Data Balance: to understand the root cause of errors and fairness issues introduced via data imbalances or lack of representation of a particular data cohort
Model Overview -> Interpretability: to diagnose model errors through understanding how the model has made its predictions
Data Explorer -> Causal Inference: to distinguish between correlations and causations in the data, or to decide the best treatments to apply to see a positive outcome
Interpretability -> Causal Inference: to learn whether the factors the model has used for decision making have any causal effect on the real-world outcome
Data Explorer -> Counterfactuals Analysis and What-If: to address customer questions about what they can do next time to get a different outcome from an AI system
Data Explorer -> Data Balance: to gain an overall understanding of the data, identify features receiving the positive outcome more than others, and visualize feature distributions
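The flows above are assembled by choosing which components to add before computing insights. Continuing the hedged sketch from the Installation section (an RAIInsights object named rai_insights, before any components were added), something along these lines selects the pieces for a counterfactual- and causal-oriented flow; argument names such as total_CFs, desired_class, and treatment_features reflect common usage and may differ by version:

    # Sketch: enable only the components needed for a particular flow, e.g.
    # Model Overview -> Error Analysis -> Counterfactuals Analysis and What-If.
    rai_insights.error_analysis.add()
    rai_insights.counterfactual.add(total_CFs=10, desired_class='opposite')

    # For decision-making flows, causal analysis needs the features you can act on;
    # 'mean radius' is only a placeholder column from the toy dataset above.
    rai_insights.causal.add(treatment_features=['mean radius'])

    rai_insights.compute()
    ResponsibleAIDashboard(rai_insights)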

Useful Links

Tabular Examples:

Text Examples:

Vision Examples:

Supported Models

This Responsible AI Toolbox API supports models that are trained on datasets in Python numpy.ndarray, pandas.DataFrame, iml.datatypes.DenseData, or scipy.sparse.csr_matrix format.

The explanation functions of Interpret-Community accept both models and pipelines as input as long as the model or pipeline implements a predict or predict_proba function that conforms to the Scikit convention. If not compatible, you can wrap your model's prediction function into a wrapper function that transforms the output into the format that is supported (predict or predict_proba of Scikit), and pass that wrapper function to your selected interpretability techniques.
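For example, a model whose prediction method does not follow the scikit-learn convention can be adapted with a small wrapper. The sketch below is illustrative only; the wrapped model and the shape of its output are assumptions, not part of the toolbox API.

    import numpy as np

    class ProbaWrapper:
        """Adapts a model with a non-scikit prediction API to predict/predict_proba."""

        def __init__(self, model):
            self.model = model

        def predict_proba(self, X):
            # Assumption: the wrapped model returns an (n_samples, n_classes)
            # array of class probabilities from its own prediction method.
            return np.asarray(self.model.predict(np.asarray(X)))

        def predict(self, X):
            # Hard labels derived from the probabilities.
            return np.argmax(self.predict_proba(X), axis=1)

    # wrapped_model = ProbaWrapper(my_keras_model)  # hypothetical; pass `wrapped_model`
    # to the explainer or dashboard in place of the original model.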

If a pipeline script is provided, the explanation function assumes that the running pipeline script returns a prediction. The repository also supports models trained via PyTorch, TensorFlow, and Keras deep learning frameworks.

Other Use Cases

Tools within the Responsible AI Toolbox can also be used with AI models offered as APIs by providers such as Azure Cognitive Services. To see example use cases, see the folders below:

Maintainers


responsible-ai-toolbox's Issues

Fairness: Configurable single model view table

We'd like the table in the single model view to be configurable so that we can select additional metrics to show in the table. For example, in addition to the previously selected performance (e.g. accuracy) and fairness metrics (e.g. demographic parity difference) it may make sense to compare other performance metrics (e.g. F1 score) or fairness metrics (e.g. false positive rate difference).

This builds on #63 as an extension that's nice to have but not essential for getting this to users.
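For context, the kind of disaggregated values such a table would surface can be computed on the Python side with Fairlearn's MetricFrame. The snippet below is only a sketch with placeholder data, not the dashboard implementation:

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score
    from fairlearn.metrics import MetricFrame, false_positive_rate, demographic_parity_difference

    # Placeholder labels, predictions, and sensitive feature.
    y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
    y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])
    sex = np.array(['F', 'F', 'M', 'M', 'F', 'M', 'F', 'M'])

    mf = MetricFrame(
        metrics={'accuracy': accuracy_score,
                 'f1 score': f1_score,
                 'false positive rate': false_positive_rate},
        y_true=y_true, y_pred=y_pred, sensitive_features=sex)

    print(mf.by_group)  # one row per group, one column per selected metric
    print(demographic_parity_difference(y_true, y_pred, sensitive_features=sex))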

Export atomic pages

Pages in dashboards should be exportable and then consumable as standalone components.
Currently, they rely on data processing that happens in the main component; this should be extracted into static methods so that exportable page components can have interfaces that are a subset of the main page interface and reuse its helper functions for processing.

Azure ML VM environment

The environment detection test for determining whether the code is running on an Azure ML VM returns a false positive when the user runs the Azure ML SDK on a local machine, because the SDK sets the os.environ variable we check against.
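To illustrate the failure mode only (the variable and function names below are hypothetical, not the repository's actual detection code): a check based solely on the presence of an SDK-set environment variable cannot distinguish an Azure ML VM from a local machine that has loaded the SDK.

    import os

    def looks_like_azureml_vm():
        # Hypothetical key for illustration; the SDK can set such variables on a
        # local machine too, so presence alone is a false-positive-prone signal.
        return 'AZUREML_SOME_SDK_VARIABLE' in os.environ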

Fairness: surface errors properly as opposed to showing spinner

Example: balanced accuracy as performance metric, where one group has only a single class (say, all labels = 0, none with 1)

Error from python metrics code:
error=Only one class present in y_true. ROC AUC score is not defined in that case.

However, the spinner is shown instead of an explanation of why the result will never show up.

[screenshot]
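A hedged sketch of one possible direction for the fix on the Python side: catch the metric failure and return it in the response so the UI can render the message instead of spinning indefinitely. Function and field names below are illustrative, not the actual raiwidgets code.

    def compute_metric_safely(metric_fn, y_true, y_pred):
        # Illustrative only: report failures to the frontend instead of swallowing them.
        try:
            return {"data": metric_fn(y_true, y_pred)}
        except ValueError as exc:
            # e.g. "Only one class present in y_true. ROC AUC score is not defined
            # in that case." -> surface this text in place of the spinner.
            return {"error": str(exc)}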

Fairness: download report should have more information

Currently the download button functionality in Insights.tsx is disabled. @nessamilan's idea for that functionality:

If I'm thinking from the perspective of a user printing a full report, let's say, to email someone else or add to a document, it would be nice to include:

  • model name
  • data set specs (when applicable)
  • date report is generated
  • drop down category label: selection
  • visualization

Alternatively, if the chart is all some users may want, we could offer the option to download visualization -or- full report (we would need to tweak the call to action UI slightly).

Error Analysis: Widget crash, both in full screen and inline in Jupyter

This happened on two occasions:
1. When clicking the first cell in a 1-dimensional heatmap.
2. When clicking Cohort Info.

This is the last part of the error in console.

Uncaught TypeError: Cannot read property 'label' of undefined
at (index):371456
at Array.map ()
at ErrorCohort.cohortFiltersToString ((index):371453)
at (index):371503
at Array.map ()
at ErrorCohort.cohortCompositeFiltersToString ((index):371501)
at (index):371506
at Array.map ()
at ErrorCohort.cohortCompositeFiltersToString ((index):371501)
at (index):371506
(index):338687 Warning: Can't perform a React state update on an unmounted component. This is a no-op, but it indicates a memory leak in your application. To fix, cancel all subscriptions and asynchronous tasks in the componentWillUnmount method.
in TreeViewRenderer (created by ErrorAnalysisView)

=====================================
This is the full error

Download the React DevTools for a better development experience: https://fb.me/react-devtools
(index):338687 Warning: Using UNSAFE_componentWillReceiveProps in strict mode is not recommended and may indicate bugs in your code. See https://fb.me/react-unsafe-component-lifecycles for details.

  • Move data fetching code or side effects to componentDidUpdate.
  • If you're updating state whenever props change, refactor your code to use memoization techniques or move it to static getDerivedStateFromProps. Learn more at: https://fb.me/react-derived-state

Please update the following components: DropdownBase, ResizeGroupBase
printWarning @ (index):338687
(index):338687 Warning: Using UNSAFE_componentWillUpdate in strict mode is not recommended and may indicate bugs in your code. See https://fb.me/react-unsafe-component-lifecycles for details.

  • Move data fetching code or side effects to componentDidUpdate.

Please update the following components: OverflowSetBase
printWarning @ (index):338687
(index):338687 Warning: findDOMNode is deprecated in StrictMode. findDOMNode was passed an instance of WithResponsiveMode which is inside StrictMode. Instead, add a ref directly to the element you want to reference. Learn more about using refs safely here: https://fb.me/react-strict-mode-find-node
in div (created by DropdownBase)
in DropdownBase (created by WithResponsiveMode)
in WithResponsiveMode (created by StyledWithResponsiveMode)
in StyledWithResponsiveMode (created by commandBarButtonAs)
in commandBarButtonAs (created by OuterWithDefaultRender)
in OuterWithDefaultRender (created by OverflowSetBase)
in div (created by OverflowSetBase)
in div (created by OverflowSetBase)
in OverflowSetBase (created by StyledOverflowSetBase)
in StyledOverflowSetBase (created by ResizeGroupBase)
in div (created by FocusZone)
in FocusZone (created by ResizeGroupBase)
in div (created by ResizeGroupBase)
in div (created by ResizeGroupBase)
in div (created by ResizeGroupBase)
in ResizeGroupBase (created by CommandBarBase)
in CommandBarBase (created by StyledCommandBarBase)
in StyledCommandBarBase (created by MainMenu)
in div (created by MainMenu)
in div (created by MainMenu)
in MainMenu (created by ErrorAnalysisDashboard)
in div (created by ErrorAnalysisDashboard)
in ErrorAnalysisDashboard (created by ErrorAnalysis)
in ErrorAnalysis (created by App)
in App
in StrictMode
printWarning @ (index):338687
(index):364550 Warning: Can't call setState on a component that is not yet mounted. This is a no-op, but it might indicate a bug in your application. Instead, assign to this.state directly or define a state = {}; class property with the desired state in the TreeViewRenderer component.
printWarning @ (index):364550
(index):364550 Warning: Can't call forceUpdate on a component that is not yet mounted. This is a no-op, but it might indicate a bug in your application. Instead, assign to this.state directly or define a state = {}; class property with the desired state in the TreeViewRenderer component.
printWarning @ (index):364550
(index):338687 Warning: Using UNSAFE_componentWillMount in strict mode is not recommended and may indicate bugs in your code. See https://fb.me/react-unsafe-component-lifecycles for details.

  • Move code with side effects to componentDidMount, and set initial state in the constructor.

Please update the following components: CalloutContentBase, Popup
printWarning @ (index):338687
(index):338687 Warning: Using UNSAFE_componentWillUpdate in strict mode is not recommended and may indicate bugs in your code. See https://fb.me/react-unsafe-component-lifecycles for details.

  • Move data fetching code or side effects to componentDidUpdate.

Please update the following components: CalloutContentBase
printWarning @ (index):338687
(index):338687 Warning: Using UNSAFE_componentWillReceiveProps in strict mode is not recommended and may indicate bugs in your code. See https://fb.me/react-unsafe-component-lifecycles for details.

  • Move data fetching code or side effects to componentDidUpdate.
  • If you're updating state whenever props change, refactor your code to use memoization techniques or move it to static getDerivedStateFromProps. Learn more at: https://fb.me/react-derived-state

Please update the following components: Autofill, ComboBox
printWarning @ (index):338687
(index):364550 Warning: Can't call setState on a component that is not yet mounted. This is a no-op, but it might indicate a bug in your application. Instead, assign to this.state directly or define a state = {}; class property with the desired state in the MatrixArea component.
printWarning @ (index):364550
(index):371456 Uncaught TypeError: Cannot read property 'label' of undefined
at (index):371456
at Array.map ()
at ErrorCohort.cohortFiltersToString ((index):371453)
at (index):371503
at Array.map ()
at ErrorCohort.cohortCompositeFiltersToString ((index):371501)
at (index):371506
at Array.map ()
at ErrorCohort.cohortCompositeFiltersToString ((index):371501)
at (index):371506
(index):338687 Warning: Using UNSAFE_componentWillReceiveProps in strict mode is not recommended and may indicate bugs in your code. See https://fb.me/react-unsafe-component-lifecycles for details.

  • Move data fetching code or side effects to componentDidUpdate.
  • If you're updating state whenever props change, refactor your code to use memoization techniques or move it to static getDerivedStateFromProps. Learn more at: https://fb.me/react-derived-state

Please update the following components: FocusTrapZone
printWarning @ (index):338687
(index):358126 The above error occurred in the component:
in PredictionPath (created by CohortInfo)
in div (created by CohortInfo)
in div (created by CohortInfo)
in div (created by PanelBase)
in div (created by PanelBase)
in div (created by PanelBase)
in div (created by FocusTrapZone)
in FocusTrapZone (created by PanelBase)
in div (created by PanelBase)
in div (created by Popup)
in Popup (created by PanelBase)
in div (created by FabricBase)
in FabricBase (created by StyledFabricBase)
in StyledFabricBase (created by LayerBase)
in span (created by LayerBase)
in LayerBase (created by Context.Consumer)
in CustomizedLayer (created by StyledCustomizedLayer)
in StyledCustomizedLayer (created by PanelBase)
in PanelBase (created by StyledPanelBase)
in StyledPanelBase (created by CohortInfo)
in CohortInfo (created by ErrorAnalysisDashboard)
in div (created by FabricBase)
in FabricBase (created by StyledFabricBase)
in StyledFabricBase (created by LayerBase)
in span (created by LayerBase)
in LayerBase (created by Context.Consumer)
in CustomizedLayer (created by StyledCustomizedLayer)
in StyledCustomizedLayer (created by ErrorAnalysisDashboard)
in Customizer (created by ErrorAnalysisDashboard)
in div (created by ErrorAnalysisDashboard)
in div (created by ErrorAnalysisDashboard)
in ErrorAnalysisDashboard (created by ErrorAnalysis)
in ErrorAnalysis (created by App)
in App
in StrictMode

Consider adding an error boundary to your tree to customize error handling behavior.
Visit https://fb.me/react-error-boundaries to learn more about error boundaries.
logCapturedError @ (index):358126
(index):349701 Uncaught TypeError: Cannot read property 'label' of undefined
at (index):371456
at Array.map ()
at ErrorCohort.cohortFiltersToString ((index):371453)
at (index):371503
at Array.map ()
at ErrorCohort.cohortCompositeFiltersToString ((index):371501)
at (index):371506
at Array.map ()
at ErrorCohort.cohortCompositeFiltersToString ((index):371501)
at (index):371506
(index):338687 Warning: Can't perform a React state update on an unmounted component. This is a no-op, but it indicates a memory leak in your application. To fix, cancel all subscriptions and asynchronous tasks in the componentWillUnmount method.
in TreeViewRenderer (created by ErrorAnalysisView)
printWarning @ (index):338687
DevTools failed to load SourceMap: Could not load content for http://localhost:5000/runtime.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE
DevTools failed to load SourceMap: Could not load content for http://localhost:5000/polyfills.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE
DevTools failed to load SourceMap: Could not load content for http://localhost:5000/vendor.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE
DevTools failed to load SourceMap: Could not load content for http://localhost:5000/main.js.map: HTTP error: status code 404, net::ERR_HTTP_RESPONSE_CODE_FAILURE

Upgrade FairlearnDashboard in python package to work with fairlearn version vNext (>0.4.6)

The next version adds several important capabilities that we require including additional metrics. Those capabilities are currently commented out. This issue tracks the work required to uncomment and thereby enable the new metrics.

Missing metrics in v0.4.6:

  • log loss
  • F1 score
  • several more missing but the capabilities in Fairlearn need to be set up as well (e.g. for parity metrics)

Lots of missing disparity metrics in FairlearnDashboard

Currently only four are shown, and their descriptions are identical to the titles, which isn't helpful.

These should be identical to the ones from the metrics proposal, and vary depending on the task (regression, classification, etc.): https://github.com/fairlearn/fairlearn-proposals/blob/master/api/METRICS.md

This also means we need to add descriptions for all the metrics.

For classification we'll have at least around 16 metrics, so the dropdown by itself may not be a good solution long term.

Extract common components

Any components that could have use across projects should be moved to the common-ui package, so that code is shared. This will require defining interfaces for things that up until now have been defined by concrete classes (e.g. Cohort, dataset).

Fairness: key insights for single model view not defined yet

The insights work only for model comparison right now, because we haven't defined what they should be for a single model. Probably something along the lines of the following (a sketch of how these values could be computed follows the list):

  • disparity in <chosen performance metric> is <disparity>. Max value <val> is from group <max group>, min value <val> is from group <min group>
  • disparity in <chosen parity metric> is <disparity>. Max value in the underlying metric <metric> is <val> from group <max group>, min value <val> is from group <min group>
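The quantities in those templates (the disparity plus the groups contributing the max and min values) could be derived with Fairlearn; the snippet below is only a sketch with placeholder inputs, not the dashboard's implementation.

    import numpy as np
    from sklearn.metrics import recall_score
    from fairlearn.metrics import MetricFrame

    # Placeholder predictions and sensitive feature.
    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])
    group = np.array(['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A'])

    mf = MetricFrame(metrics=recall_score, y_true=y_true, y_pred=y_pred,
                     sensitive_features=group)

    disparity = mf.difference(method='between_groups')  # max minus min across groups
    print(f"disparity in recall is {disparity:.2f}; "
          f"max value {mf.group_max():.2f} is from group {mf.by_group.idxmax()}, "
          f"min value {mf.group_min():.2f} is from group {mf.by_group.idxmin()}")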

Error Analysis: Shifting a cohort does not refresh the tree or the heatmap

See error below:

Found array with 0 sample(s) (shape=(0, 6)) while a minimum of 1 is required.

===================

Traceback (most recent call last):
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\raiwidgets\error_analysis_dashboard_input.py", line 398, in debug_ml
    diff = self._model.predict(input_data) != true_y
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\utils\metaestimators.py", line 119, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\pipeline.py", line 407, in predict
    Xt = transform.transform(Xt)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\compose\_column_transformer.py", line 604, in transform
    Xs = self._fit_transform(X, None, _transform_one, fitted=True)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\compose\_column_transformer.py", line 467, in _fit_transform
    self._iter(fitted=fitted, replace_strings=True), 1))
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\joblib\parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\pipeline.py", line 719, in _transform_one
    res = transformer.transform(X)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\pipeline.py", line 549, in _transform
    Xt = transform.transform(Xt)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\impute\_base.py", line 415, in transform
    X = self._validate_input(X, in_fit=False)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\impute\_base.py", line 251, in _validate_input
    raise ve
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\impute\_base.py", line 244, in _validate_input
    copy=self.copy)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\base.py", line 420, in _validate_data
    X = check_array(X, **check_params)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File "C:\Users\benushi.REDMOND\.conda\envs\ea\lib\site-packages\sklearn\utils\validation.py", line 653, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 6)) while a minimum of 1 is required.
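For illustration only (not the repository's actual fix), the failing call in error_analysis_dashboard_input.py could guard against an empty cohort before invoking the pipeline, so the user gets an explicit message rather than the opaque scikit-learn error above. The helper below is a hypothetical sketch reusing the names visible in the traceback.

    import numpy as np

    def predict_diff(model, input_data, true_y):
        # Fail early with a clear message when the shifted cohort has no samples.
        input_data = np.asarray(input_data)
        if input_data.shape[0] == 0:
            raise ValueError("The shifted cohort contains no samples; cannot "
                             "recompute the error tree or heatmap.")
        return model.predict(input_data) != true_y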

Fairness: roc_auc_score failure should be caught

Example: balanced accuracy score (perhaps also ROC AUC score)

{"error":"Only one class present in y_true. ROC AUC score is not defined in that case.","locals":"{'data': {'binVector': [1, 1, 2, 2, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 2, 2, 0, 4, 4, 4, 1, 2, 1, 1, 4, 1, 1, 2, 1, 3, 1, 4, 1, 2, 0, 1, 1, 3, 2, 2, 2, 0, 2, 0, 1, 4, 0, 2, 1, 2, 3, 2, 3, 1, 1, 1, 1, 1, 1, 4, 2, 2, 1, 1, 3, 1, 4, 3, 4, 0, 2, 2, 1, 2, 3, 1, 0, 1, 1, 0, 3, 4, 2, 1, 2, 1, 1, 0, 3, 4, 0, 2, 2, 2, 0, 0, 2, 1, 1, 1, 0, 1, 2, 2, 3, 0, 3, 1, 2, 3, 1, 4, 3, 2], 'metricKey': 'balanced_accuracy_score', 'modelIndex': 0, 'true_y': [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1], 'predicted_ys': [[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1], [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1]], 'dataset': [[13.4], [13.21], [14.02], [14.26], [13.03], [11.34], [12.05], [11.7], [7.729], [10.26], [14.69], [14.62], [9.397], [16.84], [14.64], [15.46], [9.042], [20.51], [19.55], [20.94], [11.84], [16.24], [13.47], [11.84], [21.37], [11.93], [10.9], [13.73], [13.4], [18.77], [12.81], [20.57], [13.14], [16.16], [9.738], [12.68], [13.27], [19.0], [14.86], [15.08], [15.13], [8.878], [16.02], [10.32], [13.24], [21.56], [8.671], [14.03], [11.81], [14.54], [19.17], [14.86], [18.45], [11.6], [12.46], [11.47], [11.32], [12.27], [12.18], [21.75], [13.77], [16.07], [11.14], [12.87], [17.6], [12.04], [22.27], [17.46], [20.58], [9.465], [14.42], [16.13], [11.52], [14.6], [18.65], [13.0], [8.571], [13.05], [11.66], [8.734], [16.6], [20.18], [15.78], [12.43], [16.03], [12.32], [13.5], [8.196], [17.19], [20.6], [9.504], [15.12], [14.99], [15.22], [9.405], [9.876], [15.5], [12.89], [11.57], [11.43], [8.888], [12.36], [14.97], [14.05], [18.46], [9.268], [17.91], [12.88], [15.61], [17.42], [12.75], [20.18], [18.31], [15.04]], 'classification_methods': ['accuracy_score', 'balanced_accuracy_score', 'precision_score', 'recall_score', 'f1_score'], 'regression_methods': ['root_mean_squared_error', 'mean_squared_error', 'mean_absolute_error', 'r2_score'], 'probability_methods': ['auc', 'root_mean_squared_error', 'balanced_root_mean_squared_error', 'mean_squared_error', 'mean_absolute_error', 'log_loss'], 'model_names': ['a', 'b']}, 'metric_method': <function roc_auc_score at 0x000002A424AF6678>, 'ex': ValueError('Only one class present in y_true. ROC AUC score is not defined in that case.'), 'sys': <module 'sys' (built-in)>, 'traceback': <module 'traceback' from 'C:\\\\Anaconda3\\\\lib\\\\traceback.py'>, 'exc_type': <class 'ValueError'>, 'exc_value': ValueError('Only one class present in y_true. 
ROC AUC score is not defined in that case.'), 'exc_traceback': <traceback object at 0x000002A42C7705C8>, 'self': <raiwidgets.fairness_dashboard.FairnessDashboard object at 0x000002A406220D88>}","stacktrace":"['Traceback (most recent call last):\\n', '  File \"c:\\\\git\\\\responsible-ai-core\\\\raiwidgets\\\\raiwidgets\\\\fairness_dashboard.py\", line 123, in fairness_metrics_calculation\\n    sensitive_features=data[\"binVector\"])\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\fairlearn\\\\metrics\\\\_metric_frame.py\", line 151, in __init__\\n    self._by_group = self._compute_by_group(func_dict, y_t, y_p, sf_list, cf_list)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\fairlearn\\\\metrics\\\\_metric_frame.py\", line 169, in _compute_by_group\\n    return self._compute_dataframe_from_rows(func_dict, y_true, y_pred, rows)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\fairlearn\\\\metrics\\\\_metric_frame.py\", line 194, in _compute_dataframe_from_rows\\n    curr_metric = func_dict[func_name].evaluate(y_true, y_pred, mask)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\fairlearn\\\\metrics\\\\_function_container.py\", line 103, in evaluate\\n    return self.func_(y_true[mask], y_pred[mask], **params)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\sklearn\\\\utils\\\\validation.py\", line 73, in inner_f\\n    return f(**kwargs)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\sklearn\\\\metrics\\\\_ranking.py\", line 393, in roc_auc_score\\n    sample_weight=sample_weight)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\sklearn\\\\metrics\\\\_base.py\", line 77, in _average_binary_score\\n    return binary_metric(y_true, y_score, sample_weight=sample_weight)\\n', '  File \"C:\\\\Anaconda3\\\\lib\\\\site-packages\\\\sklearn\\\\metrics\\\\_ranking.py\", line 223, in _binary_roc_auc_score\\n    raise ValueError(\"Only one class present in y_true. ROC AUC score \"\\n', 'ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.\\n']"}

Replace user token in GitHub workflow

Currently @KeXu444's user token is used. This should probably be replaced with a more generic team token; otherwise we all have access to npm through @KeXu444's account.

Fairlearn Dashboard: single model view should have several charts

In v1 we had "fairness in accuracy" and "fairness in predictions" charts, which correspond to an over- and underprediction chart for binary classification and a selection rate chart.

In the current v2 there's only an equalized odds chart, and a table for related metrics.

What it should look like:

  • table with performance and fairness metrics at the top.
  • dropdown or other selection mechanism to select which chart to show from
    • over- and underprediction chart (binary classification)
    • selection rate chart (binary classification) to show demographic disparity
    • error rate chart (for regression)
    • other charts in the future (e.g. calibration)

Adding new charts is not part of this feature, just the general setup to allow for switching between charts.

Fairness Dashboard: table should have the right performance and fairness metrics

It should by default show the chosen performance and fairness metrics. The fairness metric won't have individual values per group since it's an aggregate value. It may be nice to show relevant metrics, though, e.g. selection rate when the fairness metric is demographic parity.

There should be a description for how it calculates these metrics including which groups contributed min and max value.

#65 goes a step further by making the columns configurable but that's out of scope for this initial change.

ACTION REQUIRED: Microsoft needs this private repository to complete compliance info

There are open compliance tasks that need to be reviewed for your responsible-ai-widgets repo.

Action required: 4 compliance tasks

To bring this repository to the standard required for 2021, we require administrators of this and all Microsoft GitHub repositories to complete a small set of tasks within the next 60 days. This is critical work to ensure the compliance and security of your Microsoft GitHub organization.

Please take a few minutes to complete the tasks at: https://repos.opensource.microsoft.com/orgs/microsoft/repos/responsible-ai-widgets/compliance

  • The GitHub AE (GitHub inside Microsoft) migration survey has not been completed for this private repository
  • No Service Tree mapping has been set for this repo. If this team does not use Service Tree, they can also opt-out of providing Service Tree data in the Compliance tab.
  • No repository maintainers are set. The Open Source Maintainers are the decision-makers and actionable owners of the repository, irrespective of administrator permission grants on GitHub.
  • Classification of the repository as production/non-production is missing in the Compliance tab.

You can close this work item once you have completed the compliance tasks, or it will automatically close within a day of taking action.

If you no longer need this repository, it might be quickest to delete the repo, too.

GitHub inside Microsoft program information

More information about GitHub inside Microsoft and the new GitHub AE product can be found at https://aka.ms/gim or by contacting [email protected]

FYI: current admins at Microsoft include @xuke444, @romanlutz, @chnldw, @gregorybchris, @imatiach-msft, @riedgar-ms

Error Analysis: Can't seem to change axis in explanation view

I went to the following page with the breast cancer example, and clicked on the indicated axis label (side note, it wasn't particularly obvious that this was clickable):

[screenshot]

Should there be an "OK" or "Apply" button in the pane popping out on the right? I can't seem to get the axis to update.

Shifting a cohort on the tree map view is not working

I shifted the cohort from all data (2 filters) to all data on the tree map view and the tree view did not get updated. I was expecting the tree view to show the root node selected, but the tree kept its previous node selection.

Fairness: points can be in the same spot and unreachable

If two models are in the same spot (as defined by performance and parity metrics) in the multimodel view of the fairness dashboard we can't click on both of them to get the single model view, but rather only the one on top.

Move legacy interpret dashboard

Interpret dashboard has two top-level components, ExplanationDashboard and NewExplanationDashboard. When the new dashboard is determined to be sufficient, the old dashboard and its components should be moved to a legacy folder.

Fairness: more intuitive selection of performance and fairness metric in model comparison view

Long-term we want the dropdowns / selectors to be more intuitive in the model comparison view. Perhaps this could be handled similarly to what the ExplanationDashboard does, i.e. the selector is at the corresponding axis. This won't work for sensitive features, but it will for the performance metric and fairness metric.

Given that they'll have quite a few entries we may want to make them searchable as well.

Related to #59
