xplainable's Introduction

xplainable

Real-time explainable machine learning for business optimisation

Xplainable makes tabular machine learning transparent, fair, and actionable.

Why Was Xplainable Created?

In machine learning, there has long been a trade-off between accuracy and explainability. This trade-off has led to the creation of explainable ML libraries such as SHAP and LIME, which estimate model decision processes. These estimations can be incredibly time-expensive and often present steep learning curves, making them challenging to implement effectively in production environments.

To solve this problem, we created xplainable. xplainable presents a suite of novel machine learning algorithms specifically designed to match the performance of popular black box models like XGBoost and LightGBM while providing complete transparency, all in real-time.

Simple Interface

You can interface with xplainable either through a typical Pythonic API, or using a notebook-embedded GUI in your Jupyter Notebook.

Models

Xplainable offers each of the fundamental tabular models used by data science teams. They are fast, accurate, and easy to use.

| Model | Python API | Jupyter GUI |
|:------|:----------:|:-----------:|
| Regression | ✅ | ✅ |
| Binary Classification | ✅ | ✅ |
| Multi-Class Classification | 🔜 | 🔜 |

Installation

You can install the core features of xplainable with:

pip install xplainable

To use the xplainable GUI in a Jupyter Notebook, install with:

pip install xplainable[gui]

Getting Started

Basic Example

import xplainable as xp
from xplainable.core.models import XClassifier
import pandas as pd
from sklearn.model_selection import train_test_split

# Load data
data = xp.load_dataset('titanic')

X, y = data.drop(columns=['Survived']), data['Survived']

X_train, X_test, y_train, y_test = train_test_split(
     X, y, test_size=0.25, random_state=42)

# Train a model
model = XClassifier()
model.fit(X_train, y_train)

# Explain the model
model.explain()

Features

Xplainable helps to streamline development processes by making model tuning and deployment simpler than you can imagine.

Preprocessing

We built a comprehensive suite of preprocessing transformers for rapid and reproducible data preprocessing.

| Feature | Python API | Jupyter GUI |
|:--------|:----------:|:-----------:|
| Data Health Checks | ✅ | ✅ |
| Transformers Library | ✅ | ✅ |
| Preprocessing Pipelines | ✅ | ✅ |
| Pipeline Persistence | ✅ | ✅ |

Using the API

from xplainable.preprocessing.pipeline import XPipeline
from xplainable.preprocessing import transformers as xtf

pipeline = XPipeline()

# Add stages for specific features
pipeline.add_stages([
    {"feature": "age", "transformer": xtf.Clip(lower=18, upper=99)},
    {"feature": "balance", "transformer": xtf.LogTransform()}
])

# Add stages that apply across multiple features
pipeline.add_stages([
    {"transformer": xtf.FillMissing({'job': 'mode', 'age': 'mean'})},
    {"transformer": xtf.DropCols(columns=['duration', 'campaign'])}
])

# Fit and transform the data
train_transformed = pipeline.fit_transform(train)

# Apply transformations on new data
test_transformed = pipeline.transform(test)

Using the GUI

pp = xp.Preprocessor()

pp.preprocess(train)

Modelling

Xplainable models can be developed, optimised, and re-optimised using Pythonic APIs or the embedded GUI.

| Feature | Python API | Jupyter GUI |
|:--------|:----------:|:-----------:|
| Classic Vanilla Data Science APIs | ✅ | - |
| AutoML | ✅ | ✅ |
| Hyperparameter Optimisation | ✅ | ✅ |
| Partitioned Models | ✅ | ✅ |
| Rapid Refitting (novel to xplainable) | ✅ | ✅ |
| Model Persistence | ✅ | ✅ |

Using the API

import xplainable as xp
from xplainable.core.models import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser
from sklearn.model_selection import train_test_split
import pandas as pd

# Load your data
data = xp.load_dataset('titanic')

# note: the data requires preprocessing, so results may be poor
X, y = data.drop('Survived', axis=1), data['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Optimise params
opt = XParamOptimiser(metric='roc-auc')
params = opt.optimise(X_train, y_train)

# Train your model
model = XClassifier(**params)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Explain the model
model.explain()

Using the GUI

model = xp.classifier(train)

Rapid Refitting

Fine tune your models by refitting model parameters on the fly, even on individual features.

Using the API

new_params = {
    "features": ['Age'],
    "max_depth": 6,
    "min_info_gain": 0.01,
    "min_leaf_size": 0.03,
    "weight": 0.05,
    "power_degree": 1,
    "sigmoid_exponent": 1,
    "x": X_train,
    "y": y_train
}

model.update_feature_params(**new_params)

Using the GUI


Explainability

Models are explainable and real-time, right out of the box, without having to fit surrogate models such as SHAP or LIME.

| Feature | Python API | Jupyter GUI |
|:--------|:----------:|:-----------:|
| Global Explainers | ✅ | ✅ |
| Regional Explainers | ✅ | ✅ |
| Local Explainers | ✅ | ✅ |
| Real-time Explainability | ✅ | ✅ |

model.explain()

Action & Optimisation

We leverage the explainability of our models to provide real-time recommendations on how to optimise predicted outcomes at a local and global level.

| Feature | Status |
|:--------|:------:|
| Automated Local Prediction Optimisation | ✅ |
| Automated Global Decision Optimisation | 🔜 |
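
As a purely conceptual illustration (none of these names are part of the xplainable API), local prediction optimisation amounts to searching over actionable values of a feature for a single row and keeping the value that most improves the predicted outcome:

import numpy as np
import pandas as pd

def optimise_feature(model, row: pd.Series, feature: str, candidates):
    """Illustrative brute-force search over candidate values for one feature."""
    best_value, best_score = row[feature], -np.inf
    for value in candidates:
        trial = row.copy()
        trial[feature] = value
        # predict_proba may return shape (1, 1) or (1, 2); take the last column
        score = np.asarray(model.predict_proba(trial.to_frame().T)).reshape(-1)[-1]
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score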

Deployment

Xplainable brings transparency to API deployments, and it's easy. By the time your finger leaves the mouse, your model is on a secure server and ready to go.

| Feature | Python API | Xplainable Cloud |
|:--------|:----------:|:----------------:|
| < 1 Second API Deployments | ✅ | ✅ |
| Explainability-Enabled API Deployments | ✅ | ✅ |
| A/B Testing | - | 🔜 |
| Champion Challenger Models (MAB) | - | 🔜 |
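
For context, a deployment call is intended to be a one-liner. The sketch below is hypothetical: the method and parameter names ('create_deployment', 'model_id', 'version_id') are assumptions for illustration, not the confirmed client API; see the xplainable docs for the real calls.

import os
import xplainable as xp

xp.initialise(api_key=os.environ['XP_API_KEY'])

# Hypothetical method and parameter names, shown for illustration only
deployment = xp.client.create_deployment(
    model_id='<model_id>', version_id='<version_id>')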

#FairML

We promote fair and ethical use of technology for all machine learning tasks. To help encourage this, we're working on additional bias detection and fairness testing classes to ensure that everything you deploy is safe, fair, and compliant.

| Feature | Python API | Xplainable Cloud |
|:--------|:----------:|:----------------:|
| Bias Identification | ✅ | ✅ |
| Automated Bias Detection | 🔜 | 🔜 |
| Fairness Testing | 🔜 | 🔜 |

Xplainable Cloud

This Python package is free and open-source. To add more value to data teams within organisations, we also created Xplainable Cloud that brings your models to a collaborative environment.

import xplainable as xp
import os

xp.initialise(api_key=os.environ['XP_API_KEY'])

Contributors

We'd love to welcome contributors to xplainable to keep driving forward more transparent and actionable machine learning. We're working on our contributor docs at the moment, but if you're interested in contributing, please send us a message at [email protected].





Thanks for trying xplainable!

Made with ❤️ in Australia


© Xplainable Pty Ltd


xplainable's Issues

XClassifier.predict_proba returns an array of shape (n_samples, 1) instead of (n_samples, 2).

Description of Issue:
The predict_proba method implemented in scikit-learn estimators and LightGBM returns the predicted probability of each class for each sample. XClassifier deviates from this behaviour, returning only the probability estimates for class 1.

How to reproduce:

from xplainable.core.models import XClassifier
from sklearn.datasets import make_classification
import numpy as np
import pandas as pd
X, y = make_classification(n_samples=1000,n_features=4,random_state=42,n_classes=2)
X = pd.DataFrame(X,columns=['Feat_'+str(i) for i in range(4)])
y = pd.Series(y)
print(f'Shape of training data {X.shape}')
print(f'Shape of target {y.shape}')

x_model = XClassifier()
x_model.fit(X, y)
print(x_model.predict_proba(X).shape)

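Until the behaviour is aligned with scikit-learn, a minimal workaround is to stack the complement alongside the returned column (assuming that column holds the class-1 probabilities):

import numpy as np

# Assumes the single returned column is the probability of class 1
proba_pos = np.asarray(x_model.predict_proba(X)).reshape(-1)
proba = np.column_stack([1 - proba_pos, proba_pos])
print(proba.shape)  # (n_samples, 2), matching scikit-learn's convention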

Add key attributes to the BasePartition class

Problem Statement

While using the XClassifier and XRegressor classes, it's straightforward to retrieve the feature importances and profile after training using the model.feature_importances and model.profile attributes. However, the PartitionedClassifier class, which contains multiple model objects, currently lacks this functionality.

Proposed Solution

It would be beneficial if we could list the feature importances of all XClassifier and XRegressor objects contained within a PartitionedClassifier instance. Ideally, we would return the metadata for all partitions in a nested dict, along the lines of the sketch below.
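
A minimal sketch of what this could look like (hypothetical, not the current xplainable API; it assumes the fitted models live in a dict-like attribute mapping partition value to model):

# Illustrative stand-in for the real PartitionedClassifier class
class PartitionedClassifier:

    def __init__(self):
        self.partitions = {}  # e.g. {'NSW': XClassifier(), 'VIC': XClassifier()}

    @property
    def feature_importances(self) -> dict:
        """Nested dict: partition value -> that partition's feature importances."""
        return {name: model.feature_importances
                for name, model in self.partitions.items()}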

Auto-Blueprint for Data Preprocessing

Current state

Xplainable has a suite of data transformers aimed at simplifying data preprocessing. These stages are generally added to a Pipeline() object. One advantage of using xplainable for modelling is that our models generally require less complex transformations (no need for class balancing or log transforms), which makes it easier to infer which transformations should be applied given the structure of a feature.

What we want to see

An automated "blueprint" that generates what a transformation pipeline should look like for a given dataset. Essentially, an automated-preprocessing tool. You can then pass the blueprint to a pipeline and it will build it.

Vision

This is a rough vision of what it could look like, but be flexible!

from xplainable.preprocessing.auto import AutoBlueprint
from xplainable.preprocessing.pipeline import Pipeline
import pandas as pd

df = pd.read_csv("path/to/data.csv")

X, y = df.drop(columns=['target']), df['target']

blueprint = AutoBlueprint()
stages = blueprint.fit(X, y)

pipeline = Pipeline(stages=stages)
pipeline.fit(X, y)

df_transformed = pipeline.transform(df)

Creating a Conda Install for Xplainable Python Package

Summary

We aim to expand the availability of our open-source Python package, currently installable via pip, by making it accessible through Conda. This will enhance the ease of installation and broaden the user base.

Objectives

  • Create a Conda package for Xplainable.
  • Submit the package to a widely-used Conda channel (e.g., Anaconda, conda-forge).
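
If accepted on a channel like conda-forge, installation would presumably look like this (hypothetical until the package is published):

conda install -c conda-forge xplainable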

Tasks

  1. Research the requirements and guidelines for creating a Conda package.
  2. Adapt the package structure if necessary to meet Conda packaging standards.
  3. Create a recipe for building the Conda package.
  4. Thoroughly test the Conda package to ensure it installs and works correctly.
  5. Submit the package to a Conda channel for review and inclusion.

Considerations

  • Ensure compatibility with major platforms (Windows, macOS, Linux).
  • Provide clear installation instructions for users.

Potential Challenges

  • Handling dependencies that may not be available on Conda.
  • Maintaining the package for future updates.

Contributions and suggestions are welcome.

MLflow Integration (dependent on Surrogate Modelling)

Dependency

#83

What we want to achieve

We would like to integrate xplainable into the MLflow workflow. This involves the following workflows:

  • If the model is an xplainable model, the model is logged to MLflow and Xplainable Cloud (with the same experiment_id)
  • If the model is NOT an xplainable model, a surrogate model is created and logged to MLflow and Xplainable Cloud

Key considerations

This should be as uninvasive as possible. We don't want people to have to change their workflow drastically, rather just import xplainable and automate the explainer creation and logging.

Flexibility in approach

Please run with your own creative freedom here. We would love to discuss different approaches, as long as the disruption to user workflow is minimal.

Vision 1

This idea involves implementing a manual logging step. Behind the scenes it will train an xplainable surrogate model and log to both MLflow and Xplainable Cloud.

import xplainable as xp
import mlflow
from mlflow.models import infer_signature

# <- Build any model here

with mlflow.start_run():
    signature = infer_signature(X_train, model.predict(X_train))

    # <- other logging here

    mlflow.log_model(
        model=model,
        signature=signature,
        input_example=X_train,
        registered_model_name="model-with-xplainable"
    )

    # This should log to mlflow and to xplainable cloud (if an api key is active)
    xp.mlflow.log_explanation(model.predict, X)

Vision 2

This is similar to vision 1, but without the need to manually log explainers. It's not clear how this would be achieved, but it is an idea worth fleshing out.

import xplainable as xp
import mlflow
from mlflow.models import infer_signature

# <- Build any model here

xp.mlflow.auto_logging = True

with mlflow.start_run():
    signature = infer_signature(X_train, model.predict(X_train))

    # <- other logging here

    mlflow.log_model(
        model=model,
        signature=signature,
        input_example=X_train,
        registered_model_name="model-with-xplainable"
    ) # < -- This should auto-log to mlflow and to xplainable cloud (if an api key is active)

ENH : Option to explicitly mention categorical columns during fit

Description of Enhancement:
BaseModel, during self._fetch_meta(), selects categorical columns based on whether the column dtype is 'object'. This typically covers 95% of use cases, but columns using pandas' built-in categorical dtype are missed. I propose adding a categorical_feature parameter to our fit methods, as in LGBM's fit method: categorical_feature (list of str or int, or 'auto', optional (default='auto')) (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier.fit)

How to reproduce:

from xplainable.core.models import XClassifier
from sklearn.datasets import make_classification
import numpy as np
import pandas as pd
X, y = make_classification(n_samples=1000,n_features=4,random_state=42,n_classes=2)
X = pd.DataFrame(X,columns=['Feat_'+str(i) for i in range(4)])
y = pd.Series(y)
X['Feat_0'] = X['Feat_0'].astype('category')
print(list(X.select_dtypes('object')))
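
The proposed usage, mirroring LightGBM's signature (this parameter does not exist in xplainable yet; it is the suggested API):

x_model = XClassifier()
# Proposed, not-yet-implemented parameter mirroring LightGBM's fit method
x_model.fit(X, y, categorical_feature=['Feat_0'])  # or 'auto' (default)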


Xplainable as a surrogate model

Proposed Feature

Currently, xplainable has a suite of solid standalone models designed to be inherently explainable. We would also like to implement a new feature that allows xplainable to be used as a surrogate model to explain other, more complex models.

Vision

from xplainable.surrogates import XClfSurrogate

# Other code here ...

# Some complex model
model = XGBClassifier()
model.fit(X_train, y_train)

surrogate = XClfSurrogate()
explainer = surrogate.fit(model, X_train, y_train)

explainer.explain()

XClassifier fails and raises Division By Zero error when training data column contains 100% null values

Description of Issue:
XClassifier fails and raises a Division By Zero error when a training data column contains 100% null values.

How to reproduce:

from xplainable.core.models import XClassifier
from sklearn.datasets import make_classification
import numpy as np
import pandas as pd
X, y = make_classification(n_samples=1000,n_features=4,random_state=42,n_classes=2)
X = pd.DataFrame(X,columns=['Feat_'+str(i) for i in range(4)])
y = pd.Series(y)
print(f'Shape of training data {X.shape}')
print(f'Shape of target {y.shape}')
print(X.isna().sum())
print(X.nunique(dropna=False))

# Case 1 : 100% null values in feature 0

X_case_1 = X.copy(deep=True)
X_case_1['Feat_0'] = np.nan

print(X_case_1.isna().sum())
print(X_case_1.nunique(dropna=False))

x_model = XClassifier()
x_model.fit(X_case_1, y)
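
Until a guard is added, a simple interim workaround is to drop entirely-null columns before fitting:

# Drop columns where every value is null, then fit as usual
X_clean = X_case_1.dropna(axis=1, how='all')

x_model = XClassifier()
x_model.fit(X_clean, y)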


GUI for Preprocessing

Current Implementation

We are currently using ipywidgets to render an in-notebook GUI for data preprocessing.

The issue

ipywidgets brings with it a multitude of bugs:

  • The GUI doesn't render in environments like Google Colab and Databricks
  • Some IPython dependencies, including tornado, cause kernel crashes when rendering results

Desired Solution

A custom GUI implementation that does not use ipywidgets and will safely render in any notebook environment.

Key considerations

  • The current implementation works in a Jupyter Notebook with the required dependencies installed with pip install xplainable[gui]. You should use this environment to understand the desired functionality.

XMultiClassifier Implementation

Background

Currently, xplainable is limited to Binary Classification (XClassifier) and Regression (XRegressor) models. We have run internal testing that validates the efficacy of using XClassifier in one-vs-rest (OvR) and one-vs-one (OvO) schemes to achieve robust performance; a minimal OvR sketch follows the key considerations below.

Desired Feature

We want to integrate XMultiClassifier into our suite of models.

Key considerations

  • The class must be usable in the same way as XClassifier
  • How can we effectively visualise the explainers?
  • The metadata must be compatible with XClassifier and the xplainable client
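
A minimal one-vs-rest sketch built on the existing XClassifier (illustrative only, not the proposed implementation; it assumes predict_proba exposes a positive-class probability per sample and sidesteps the explainer visualisation question):

import numpy as np
from xplainable.core.models import XClassifier

def fit_ovr(X, y):
    """Fit one binary XClassifier per class (one-vs-rest)."""
    models = {}
    for cls in np.unique(y):
        model = XClassifier()
        model.fit(X, (y == cls).astype(int))
        models[cls] = model
    return models

def predict_ovr(models, X):
    """Predict the class whose one-vs-rest model scores highest."""
    classes = list(models)
    # predict_proba may return shape (n, 1) or (n, 2); take the last column
    scores = np.column_stack([
        np.asarray(models[c].predict_proba(X)).reshape(len(X), -1)[:, -1]
        for c in classes
    ])
    return np.asarray(classes)[scores.argmax(axis=1)]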

Contact us at [email protected] for more information on this feature.

Multiclass Client

Following the implementation of XMultiClassifier, the ability to persist to the cloud must be implemented.

Enhance .explain() method in BasePartition

The output of the .explain() method on partitioned models should include an additional dropdown that lets you select which partition to interrogate. Currently, this can be achieved only with the 'partition' parameter, as shown below.
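
For reference, the current per-partition call looks like this (the variable and partition value are illustrative):

partitioned_model.explain(partition='<partition_value>')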

Improve and complete unit testing

The Issue

Due to our haste to release xplainable, we haven't completed unit testing across the entire package (I know, I know...).

Desired Input

Someone to help us implement proper unit testing across the entire xplainable library.

GUI for loading models and preprocessors

Current implementation

Currently, we are using ipywidgets to render widgets for users to easily load models and preprocessors.

The problem

The widgets don't load in Google Colab, Databricks, and some other notebook environments.

What needs to be done

We would like to implement a custom solution that doesn't use ipywidgets so we can safely render a model selector and a preprocessor selector in any notebook environment.

Key considerations

  • To properly test this solution, you will need access to the xplainable client. While you can achieve this with a free xplainable account, we will offer a free premium account to whoever takes on this issue. Please contact us at [email protected] and mention this issue.

GUI for Model Training

Current Implementation

We are currently using ipywidgets to render an in-notebook GUI for model training.

The issue

ipywidgets brings with it a multitude of bugs:

  • The GUI doesn't render in environments like Google Colab and Databricks
  • Some IPython dependencies, including tornado, cause kernel crashes when rendering results

Desired Solution

A custom GUI implementation that does not use ipywidgets and will safely render in any notebook environment.

Key considerations

  • The current implementation works in a Jupyter Notebook with the required dependencies installed with pip install xplainable[gui]. You should use this environment to understand the desired functionality.

API key management

Details

The original API key management implementation relied on the machine keychain (using keyring) to store API keys. This was done for better usability for users in notebook environments. However, keychains are not well supported by platforms like Google Colab and Databricks, nor are they smooth on Linux virtual machines.

What we want to change

Update API key management to use simple instantiation that lets users pass environment variables instead of being prompted to insert an API key. The goal is to improve usability and reduce friction.
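
The desired pattern matches the initialisation already shown in this README:

import os
import xplainable as xp

xp.initialise(api_key=os.environ['XP_API_KEY'])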

Local Explanation Plotting

The Issue

The plotting of local explainers for both XClassifier and XRegressor is relatively primitive and buggy.

Desired Result

A better designed and interactive local explainer plot that will render in all common notebook environments

Functions to alter

  • _plot_local_explainer() found in xplainable.visualisation.explain
  • local_explainer() in the BaseModel class

Key considerations

  • When there are many features, we would like to default to showing the Top 10 contributing features with the rest bundled into "other". The user should be able to specify the top N features; a minimal sketch of this bundling logic follows.
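
A minimal sketch of the bundling logic (assuming per-feature contributions for one prediction arrive as a pandas Series; all names are illustrative):

import pandas as pd

def bundle_top_n(contributions: pd.Series, top_n: int = 10) -> pd.Series:
    """Keep the top-N features by absolute contribution; sum the rest as 'other'."""
    ranked = contributions.reindex(
        contributions.abs().sort_values(ascending=False).index)
    top = ranked.iloc[:top_n].copy()
    if len(ranked) > top_n:
        top['other'] = ranked.iloc[top_n:].sum()
    return top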
