xplainable's Introduction

xplainable

Real-time explainable machine learning for business optimisation

Xplainable makes tabular machine learning transparent, fair, and actionable.

Why Was Xplainable Created?

In machine learning, there has long been a trade-off between accuracy and explainability. This trade-off has led to the creation of explainable ML libraries such as SHAP and LIME, which estimate model decision processes. These estimations can be incredibly time-expensive and often present steep learning curves, making them challenging to implement effectively in production environments.

To solve this problem, we created xplainable. xplainable presents a suite of novel machine learning algorithms specifically designed to match the performance of popular black box models like XGBoost and LightGBM while providing complete transparency, all in real-time.

Simple Interface

You can interface with xplainable either through a typical Pythonic API, or using a notebook-embedded GUI in your Jupyter Notebook.

Models

Xplainable offers each of the fundamental tabular models used by data science teams. They are fast, accurate, and easy to use.

| Model | Python API | Jupyter GUI |
|:------|:----------:|:-----------:|
| Regression | ✅ | ✅ |
| Binary Classification | ✅ | ✅ |
| Multi-Class Classification | 🔜 | 🔜 |

Installation

You can install the core features of xplainable with:

pip install xplainable

To use the xplainable GUI in a Jupyter Notebook, install with:

pip install xplainable[gui]

Getting Started

Basic Example

import xplainable as xp
from xplainable.core.models import XClassifier
import pandas as pd
from sklearn.model_selection import train_test_split

# Load data
data = xp.load_dataset('titanic')

X, y = data.drop(columns=['Survived']), data['Survived']

X_train, X_test, y_train, y_test = train_test_split(
     X, y, test_size=0.25, random_state=42)

# Train a model
model = XClassifier()
model.fit(X_train, y_train)

# Explain the model
model.explain()

Features

Xplainable helps to streamline development processes by making model tuning and deployment simpler than you can imagine.

Preprocessing

We built a comprehensive suite of preprocessing transformers for rapid and reproducible data preprocessing.

| Feature | Python API | Jupyter GUI |
|:--------|:----------:|:-----------:|
| Data Health Checks | ✅ | ✅ |
| Transformers Library | ✅ | ✅ |
| Preprocessing Pipelines | ✅ | ✅ |
| Pipeline Persistence | ✅ | ✅ |

Using the API

from xplainable.preprocessing.pipeline import XPipeline
from xplainable.preprocessing import transformers as xtf

pipeline = XPipeline()

# Add stages for specific features
pipeline.add_stages([
    {"feature": "age", "transformer": xtf.Clip(lower=18, upper=99)},
    {"feature": "balance", "transformer": xtf.LogTransform()}
])

# Add stages that apply across multiple features
pipeline.add_stages([
    {"transformer": xtf.FillMissing({'job': 'mode', 'age': 'mean'})},
    {"transformer": xtf.DropCols(columns=['duration', 'campaign'])}
])

# Fit and transform the data
train_transformed = pipeline.fit_transform(train)

# Apply transformations on new data
test_transformed = pipeline.transform(test)

Using the GUI

pp = xp.Preprocessor()

pp.preprocess(train)

Modelling

Xplainable models can be developed, optimised, and re-optimised using Pythonic APIs or the embedded GUI.

| Feature | Python API | Jupyter GUI |
|:--------|:----------:|:-----------:|
| Classic Vanilla Data Science APIs | ✅ | - |
| AutoML | ✅ | ✅ |
| Hyperparameter Optimisation | ✅ | ✅ |
| Partitioned Models | ✅ | ✅ |
| Rapid Refitting (novel to xplainable) | ✅ | ✅ |
| Model Persistence | ✅ | ✅ |

Using the API

import xplainable as xp
from xplainable.core.models import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser
from sklearn.model_selection import train_test_split
import pandas as pd

# Load your data
data = xp.load_dataset('titanic')

# note: the data requires preprocessing, so results may be poor
X, y = data.drop('Survived', axis=1), data['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Optimise params
opt = XParamOptimiser(metric='roc-auc')
params = opt.optimise(X_train, y_train)

# Train your model
model = XClassifier(**params)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Explain the model
model.explain()

Using the GUI

model = xp.classifier(train)

Rapid Refitting

Fine tune your models by refitting model parameters on the fly, even on individual features.

Using the API

new_params = {
    "features": ['Age'],
    "max_depth": 6,
    "min_info_gain": 0.01,
    "min_leaf_size": 0.03,
    "weight": 0.05,
    "power_degree": 1,
    "sigmoid_exponent": 1,
    "x": X_train,
    "y": y_train
}

model.update_feature_params(**new_params)

Using the GUI


Explainability

Models are explainable and real-time, right out of the box, without having to fit surrogate models such as SHAP or LIME.

| Feature | Python API | Jupyter GUI |
|:--------|:----------:|:-----------:|
| Global Explainers | ✅ | ✅ |
| Regional Explainers | ✅ | ✅ |
| Local Explainers | ✅ | ✅ |
| Real-time Explainability | ✅ | ✅ |

model.explain()

Action & Optimisation

We leverage the explainability of our models to provide real-time recommendations on how to optimise predicted outcomes at a local and global level.

| Feature | Status |
|:--------|:------:|
| Automated Local Prediction Optimisation | ✅ |
| Automated Global Decision Optimisation | 🔜 |
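
As a purely conceptual illustration (none of these names are part of the xplainable API), local prediction optimisation amounts to searching over actionable values of a feature for a single row and keeping the value that most improves the predicted outcome:

import numpy as np
import pandas as pd

def optimise_feature(model, row: pd.Series, feature: str, candidates):
    """Illustrative brute-force search over candidate values for one feature."""
    best_value, best_score = row[feature], -np.inf
    for value in candidates:
        trial = row.copy()
        trial[feature] = value
        # predict_proba may return shape (1, 1) or (1, 2); take the last column
        score = np.asarray(model.predict_proba(trial.to_frame().T)).reshape(-1)[-1]
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score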

Deployment

Xplainable brings transparency to API deployments, and it's easy. By the time your finger leaves the mouse, your model is on a secure server and ready to go.

| Feature | Python API | Xplainable Cloud |
|:--------|:----------:|:----------------:|
| < 1 Second API Deployments | ✅ | ✅ |
| Explainability-Enabled API Deployments | ✅ | ✅ |
| A/B Testing | - | 🔜 |
| Champion Challenger Models (MAB) | - | 🔜 |
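
For context, a deployment call is intended to be a one-liner. The sketch below is hypothetical: the method and parameter names ('create_deployment', 'model_id', 'version_id') are assumptions for illustration, not the confirmed client API; see the xplainable docs for the real calls.

import os
import xplainable as xp

xp.initialise(api_key=os.environ['XP_API_KEY'])

# Hypothetical method and parameter names, shown for illustration only
deployment = xp.client.create_deployment(
    model_id='<model_id>', version_id='<version_id>')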

#FairML

We promote fair and ethical use of technology for all machine learning tasks. To help encourage this, we're working on additional bias detection and fairness testing classes to ensure that everything you deploy is safe, fair, and compliant.

| Feature | Python API | Xplainable Cloud |
|:--------|:----------:|:----------------:|
| Bias Identification | ✅ | ✅ |
| Automated Bias Detection | 🔜 | 🔜 |
| Fairness Testing | 🔜 | 🔜 |

Xplainable Cloud

This Python package is free and open-source. To add more value to data teams within organisations, we also created Xplainable Cloud that brings your models to a collaborative environment.

import xplainable as xp
import os

xp.initialise(api_key=os.environ['XP_API_KEY'])

Contributors

We'd love to welcome contributors to xplainable to keep driving forward more transparent and actionable machine learning. We're working on our contributor docs at the moment, but if you're interested in contributing, please send us a message at [email protected].





Thanks for trying xplainable!

Made with ❤️ in Australia


© Xplainable Pty Ltd


xplainable's Issues

XClassifier.predict_proba returns an array of shape (n_samples, 1) instead of (n_samples, 2).

Description of Issue:
The predict_proba method implemented in scikit-learn estimators and LightGBM returns the predicted probability of each class for each sample. XClassifier deviates from this behaviour, returning only the probability estimates for class 1.

How to reproduce:

from xplainable.core.models import XClassifier
from sklearn.datasets import make_classification
import numpy as np
import pandas as pd
X, y = make_classification(n_samples=1000,n_features=4,random_state=42,n_classes=2)
X = pd.DataFrame(X,columns=['Feat_'+str(i) for i in range(4)])
y = pd.Series(y)
print(f'Shape of training data {X.shape}')
print(f'Shape of target {y.shape}')

x_model = XClassifier()
x_model.fit(X, y)
print(x_model.predict_proba(X).shape)

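Until the behaviour is aligned with scikit-learn, a minimal workaround is to stack the complement alongside the returned column (assuming that column holds the class-1 probabilities):

import numpy as np

# Assumes the single returned column is the probability of class 1
proba_pos = np.asarray(x_model.predict_proba(X)).reshape(-1)
proba = np.column_stack([1 - proba_pos, proba_pos])
print(proba.shape)  # (n_samples, 2), matching scikit-learn's convention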

Add key attributes to the BasePartition class

Problem Statement

While using the XClassifier and XRegressor classes, it's straightforward to retrieve the feature importances and profile after training using the model.feature_importances and model.profile attributes. However, the PartitionedClassifier class, which contains multiple model objects, currently lacks this functionality.

Proposed Solution

It would be beneficial if we could list the feature importances of all XClassifier and XRegressor objects contained within a PartitionedClassifier instance. Ideally, we would return the metadata for all partitions in a nested dict, along the lines of the sketch below.
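
A minimal sketch of what this could look like (hypothetical, not the current xplainable API; it assumes the fitted models live in a dict-like attribute mapping partition value to model):

# Illustrative stand-in for the real PartitionedClassifier class
class PartitionedClassifier:

    def __init__(self):
        self.partitions = {}  # e.g. {'NSW': XClassifier(), 'VIC': XClassifier()}

    @property
    def feature_importances(self) -> dict:
        """Nested dict: partition value -> that partition's feature importances."""
        return {name: model.feature_importances
                for name, model in self.partitions.items()}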

Auto-Blueprint for Data Preprocessing

Current state

Xplainable has a suite of data transformers aimed at simplifying data preprocessing. These stages are generally added to a Pipeline() object. One advantage of using xplainable for modelling is that our models generally require less complex transformations (no need for class balancing or log transforms), which makes it easier to infer which transformations should be applied given the structure of a feature.

What we want to see

An automated "blueprint" that generates what a transformation pipeline should look like for a given dataset. Essentially, an automated-preprocessing tool. You can then pass the blueprint to a pipeline and it will build it.

Vision

This is a rough vision of what it could look like, but be flexible!

from xplainable.preprocessing.auto import AutoBlueprint
from xplainable.preprocessing.pipeline import Pipeline
import pandas as pd

df = pd.read_csv("path/to/data.csv")

X, y = df.drop(columns=['target']), df['target']

blueprint = AutoBlueprint()
stages = blueprint.fit(X, y)

pipeline = Pipeline(stages=stages)
pipeline.fit(X, y)

df_transformed = pipeline.transform(df)

Creating a Conda Install for Xplainable Python Package

Summary

We aim to expand the availability of our open-source Python package, currently installable via pip, by making it accessible through Conda. This will enhance the ease of installation and broaden the user base.

Objectives

  • Create a Conda package for Xplainable.
  • Submit the package to a widely-used Conda channel (e.g., Anaconda, conda-forge).
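
If accepted on a channel like conda-forge, installation would presumably look like this (hypothetical until the package is published):

conda install -c conda-forge xplainable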

Tasks

  1. Research the requirements and guidelines for creating a Conda package.
  2. Adapt the package structure if necessary to meet Conda packaging standards.
  3. Create a recipe for building the Conda package.
  4. Thoroughly test the Conda package to ensure it installs and works correctly.
  5. Submit the package to a Conda channel for review and inclusion.

Considerations

  • Ensure compatibility with major platforms (Windows, macOS, Linux).
  • Provide clear installation instructions for users.

Potential Challenges

  • Handling dependencies that may not be available on Conda.
  • Maintaining the package for future updates.

Contributions and suggestions are welcome.

MLflow Integration (dependent on Surrogate Modelling)

Dependency

#83

What we want to achieve

We would like to integrate xplainable into the MLflow workflow. This involves the following workflows:

  • If the model is an xplainable model, the model is logged to MLflow and Xplainable Cloud (with the same experiment_id)
  • If the model is NOT an xplainable model, a surrogate model is created and logged to MLflow and Xplainable Cloud

Key considerations

This should be as uninvasive as possible. We don't want people to have to change their workflow drastically, rather just import xplainable and automate the explainer creation and logging.

Flexibility in approach

Please run with your own creative freedom here. We would love to discuss different approaches, as long as the disruption to user workflow is minimal.

Vision 1

This idea involves implementing a manual logging step. Behind the scenes it will train an xplainable surrogate model and log to both MLflow and Xplainable Cloud.

import xplainable as xp
import mlflow
from mlflow.models import infer_signature

# <- Build any model here

with mlflow.start_run():
    signature = infer_signature(X_train, model.predict(X_train))

    # <- other logging here

    mlflow.log_model(
        model=model,
        signature=signature,
        input_example=X_train,
        registered_model_name="model-with-xplainable"
    )

    # This should log to mlflow and to xplainable cloud (if an api key is active)
    xp.mlflow.log_explanation(model.predict, X)

Vision 2

This is similar to vision 1, but without the need to manually log explainers. It's not clear how this would be achieved, but it is an idea worth fleshing out.

import xplainable as xp
import mlflow
from mlflow.models import infer_signature

# <- Build any model here

xp.mlflow.auto_logging = True

with mlflow.start_run():
    signature = infer_signature(X_train, model.predict(X_train))

    # <- other logging here

    mlflow.log_model(
        model=model,
        signature=signature,
        input_example=X_train,
        registered_model_name="model-with-xplainable"
    ) # < -- This should auto-log to mlflow and to xplainable cloud (if an api key is active)

ENH : Option to explicitly mention categorical columns during fit

Description of Enhancement:
BaseModel, during self._fetch_meta(), selects categorical columns based on whether the column dtype is 'object'. This typically covers 95% of use cases, but columns using pandas' built-in categorical dtype are missed. I propose adding a categorical_feature parameter to our fit methods, as in LGBM's fit method: categorical_feature (list of str or int, or 'auto', optional (default='auto')) (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier.fit)

How to reproduce:

from xplainable.core.models import XClassifier
from sklearn.datasets import make_classification
import numpy as np
import pandas as pd
X, y = make_classification(n_samples=1000,n_features=4,random_state=42,n_classes=2)
X = pd.DataFrame(X,columns=['Feat_'+str(i) for i in range(4)])
y = pd.Series(y)
X['Feat_0'] = X['Feat_0'].astype('category')
print(list(X.select_dtypes('object')))
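
The proposed usage, mirroring LightGBM's signature (this parameter does not exist in xplainable yet; it is the suggested API):

x_model = XClassifier()
# Proposed, not-yet-implemented parameter mirroring LightGBM's fit method
x_model.fit(X, y, categorical_feature=['Feat_0'])  # or 'auto' (default)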


Xplainable as a surrogate model

Proposed Feature

Currently, xplainable has a suite of solid standalone models designed to be inherently explainable. We would also like to implement a new feature that allows xplainable to be used as a surrogate model to explain other, more complex models.

Vision

from xplainable.surrogates import XClfSurrogate

# Other code here ...

# Some complex model
model = XGBClassifier()
model.fit(X_train, y_train)

surrogate = XClfSurrogate()
explainer = surrogate.fit(model, X_train, y_train)

explainer.explain()

XClassifier fails and raises Division By Zero error when training data column contains 100% null values

Description of Issue:
XClassifier fails and raises a Division By Zero error when a training data column contains 100% null values.

How to reproduce:

from xplainable.core.models import XClassifier
from sklearn.datasets import make_classification
import numpy as np
import pandas as pd
X, y = make_classification(n_samples=1000,n_features=4,random_state=42,n_classes=2)
X = pd.DataFrame(X,columns=['Feat_'+str(i) for i in range(4)])
y = pd.Series(y)
print(f'Shape of training data {X.shape}')
print(f'Shape of target {y.shape}')
print(X.isna().sum())
print(X.nunique(dropna=False))

# Case 1 : 100% null values in feature 0

X_case_1 = X.copy(deep=True)
X_case_1['Feat_0'] = np.nan

print(X_case_1.isna().sum())
print(X_case_1.nunique(dropna=False))

x_model = XClassifier()
x_model.fit(X_case_1, y)
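
Until a guard is added, a simple interim workaround is to drop entirely-null columns before fitting:

# Drop columns where every value is null, then fit as usual
X_clean = X_case_1.dropna(axis=1, how='all')

x_model = XClassifier()
x_model.fit(X_clean, y)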


GUI for Preprocessing

Current Implementation

We are currently using ipywidgets to render an in-notebook GUI for data preprocessing.

The issue

ipywidgets brings with it a multitude of bugs:

  • The GUI doesn't render in environments like Google Colab and Databricks
  • Some IPython dependencies, including tornado, cause kernel crashes when rendering results

Desired Solution

A custom GUI implementation that does not use ipywidgets and will safely render in any notebook environment.

Key considerations

  • The current implementation works in a Jupyter Notebook with the required dependencies installed with pip install xplainable[gui]. You should use this environment to understand the desired functionality.

XMultiClassifier Implementation

Background

Currently, xplainable is limited to Binary Classification (XClassifier) and Regression (XRegressor) models. We have run internal testing that validates the efficacy of using XClassifier in one-vs-rest (OvR) and one-vs-one (OvO) schemes to achieve robust performance; a minimal OvR sketch follows the key considerations below.

Desired Feature

We want to integrate XMultiClassifier into our suite of models.

Key considerations

  • The class must be usable in the same way as XClassifier
  • How can we effectively visualise the explainers?
  • The metadata must be compatible with XClassifier and the xplainable client
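
A minimal one-vs-rest sketch built on the existing XClassifier (illustrative only, not the proposed implementation; it assumes predict_proba exposes a positive-class probability per sample and sidesteps the explainer visualisation question):

import numpy as np
from xplainable.core.models import XClassifier

def fit_ovr(X, y):
    """Fit one binary XClassifier per class (one-vs-rest)."""
    models = {}
    for cls in np.unique(y):
        model = XClassifier()
        model.fit(X, (y == cls).astype(int))
        models[cls] = model
    return models

def predict_ovr(models, X):
    """Predict the class whose one-vs-rest model scores highest."""
    classes = list(models)
    # predict_proba may return shape (n, 1) or (n, 2); take the last column
    scores = np.column_stack([
        np.asarray(models[c].predict_proba(X)).reshape(len(X), -1)[:, -1]
        for c in classes
    ])
    return np.asarray(classes)[scores.argmax(axis=1)]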

Contact us at [email protected] for more information on this feature.

Multiclass Client

Following the implementation of XMultiClassifier, the ability to persist to the cloud must be implemented.

Enhance .explain() method in BasePartition

The output of the .explain() method on partitioned models should include an additional dropdown that lets you select which partition to interrogate. Currently, this can be achieved only with the 'partition' parameter, as shown below.
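
For reference, the current per-partition call looks like this (the variable and partition value are illustrative):

partitioned_model.explain(partition='<partition_value>')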

Improve and complete unit testing

The Issue

Due to our haste to release xplainable, we haven't completed unit testing across the entire package (I know, I know...).

Desired Input

Someone to help us implement proper unit testing across the entire xplainable library.

GUI for loading models and preprocessors

Current implementation

Currently, we are using ipywidgets to render widgets for users to easily load models and preprocessors.

The problem

The widgets don't load in Google Colab, Databricks, and some other notebook environments.

What needs to be done

We would like to implement a custom solution that doesn't use ipywidgets so we can safely render a model selector and a preprocessor selector in any notebook environment.

Key considerations

  • To properly test this solution, you will need access to the xplainable client. While you can achieve this with a free xplainable account, we will offer a free premium account to whoever takes on this issue. Please contact us at [email protected] and mention this issue.

GUI for Model Training

Current Implementation

We are currently using ipywidgets to render an in-notebook GUI for model training.

The issue

ipywidgets brings with it a multitude of bugs:

  • The GUI doesn't render in environments like Google Colab and Databricks
  • Some IPython dependencies, including tornado, cause kernel crashes when rendering results

Desired Solution

A custom GUI implementation that does not use ipywidgets and will safely render in any notebook environment.

Key considerations

  • The current implementation works in a Jupyter Notebook with the required dependencies installed with pip install xplainable[gui]. You should use this environment to understand the desired functionality.

API key management

Details

The original API key management implementation relied on the machine keychain (using keyring) to store API keys. This was done for better usability for users in notebook environments. However, keychains are not well supported by platforms like Google Colab and Databricks, nor are they smooth on Linux virtual machines.

What we want to change

Update API key management to use simple instantiation that lets users pass environment variables instead of being prompted to insert an API key. The goal is to improve usability and reduce friction.
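
The desired pattern matches the initialisation already shown in this README:

import os
import xplainable as xp

xp.initialise(api_key=os.environ['XP_API_KEY'])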

Local Explanation Plotting

The Issue

The plotting of local explainers for both XClassifier and XRegressor is relatively primitive and buggy.

Desired Result

A better designed and interactive local explainer plot that will render in all common notebook environments

Functions to alter

  • _plot_local_explainer() found in xplainable.visualisation.explain
  • local_explainer() in the BaseModel class

Key considerations

  • When there are many features, we would like to default to showing the Top 10 contributing features with the rest bundled into "other". The user should be able to specify the top N features; a minimal sketch of this bundling logic follows.
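
A minimal sketch of the bundling logic (assuming per-feature contributions for one prediction arrive as a pandas Series; all names are illustrative):

import pandas as pd

def bundle_top_n(contributions: pd.Series, top_n: int = 10) -> pd.Series:
    """Keep the top-N features by absolute contribution; sum the rest as 'other'."""
    ranked = contributions.reindex(
        contributions.abs().sort_values(ascending=False).index)
    top = ranked.iloc[:top_n].copy()
    if len(ranked) > top_n:
        top['other'] = ranked.iloc[top_n:].sum()
    return top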
