
scikit-uplift's Introduction


scikit-uplift: uplift modeling in scikit-learn style in Python


scikit-uplift (sklift) is an uplift modeling Python package that provides fast sklearn-style model implementations, evaluation metrics and visualization tools.

Uplift modeling estimates a causal effect of treatment and uses it to effectively target customers that are most likely to respond to a marketing campaign.
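For reference, the estimated quantity is usually written as the conditional average treatment effect. The notation below is ours, not the package's: X is the features, T is a binary treatment flag, and Y is the outcome.

```latex
\text{uplift}(x) = \mathbb{E}[Y \mid X = x,\ T = 1] - \mathbb{E}[Y \mid X = x,\ T = 0]
```

For a binary target, this is the difference between the response probabilities with and without treatment.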

Use cases for uplift modeling:

  • Target customers in a marketing campaign. This is especially useful when promoting a popular product, where a large share of customers perform the target action on their own, without any intervention. By modeling uplift, you can find the customers who are likely to perform the target action (for instance, install an app) only when treated (for instance, after receiving a push notification).
  • Combine a churn model and an uplift model to offer some bonus to a group of customers who are likely to churn.
  • Select a small group of customers for a campaign where the cost per customer is high.

Read more about the uplift modeling problem in the User Guide.

Articles in Russian on habr.com: Part 1, Part 2 and Part 3.

Why sklift

  • Comfortable and intuitive scikit-learn-like API;
  • More uplift metrics than you have ever seen in one place! Includes gems such as Area Under Uplift Curve (AUUC) and Area Under Qini Curve (Qini coefficient), with ideal (perfect) cases;
  • Supports any estimator compatible with scikit-learn (e.g. XGBoost, LightGBM, CatBoost, etc.);
  • All approaches can be used in sklearn.pipeline. See the example of usage on the Tutorials page;
  • Metrics are also compatible with the classes from sklearn.model_selection. See the example of usage on the Tutorials page;
  • Almost all implemented approaches solve both classification and regression problems;
  • Nice and useful visualizations for analysing model performance.

Installation

Install the package by the following command from PyPI:

pip install scikit-uplift

Or install from source:

git clone https://github.com/maks-sh/scikit-uplift.git
cd scikit-uplift
python setup.py install

Documentation

The full documentation is available at uplift-modeling.com.

Or you can build the documentation locally using Sphinx 1.4 or later:

cd docs
pip install -r requirements.txt
make html

If you now point your browser to _build/html/index.html, you should see the documentation site.

Quick Start

See the RetailHero tutorial notebook (available in English and Russian; both can be opened in Colab) for details.

Train and predict uplift model

Use the intuitive Python API to train uplift models with sklift.models.

# import approaches
from sklift.models import SoloModel, ClassTransformation
# import any estimator that adheres to scikit-learn conventions
from lightgbm import LGBMClassifier

# X_tr, y_tr, trmnt_tr (training features, binary target and binary treatment flag)
# and X_val (validation features) are assumed to be prepared beforehand

# define the base estimator
estimator = LGBMClassifier(n_estimators=10)

# define the metamodel (S-learner)
slearner = SoloModel(estimator=estimator)

# fit the model
slearner.fit(
    X=X_tr,
    y=y_tr,
    treatment=trmnt_tr,
)

# predict uplift
uplift_slearner = slearner.predict(X_val)

Evaluate your uplift model

Uplift model evaluation metrics are available in sklift.metrics.

# import metrics to evaluate your model
from sklift.metrics import (
    uplift_at_k, uplift_auc_score, qini_auc_score, weighted_average_uplift
)

# y_val and trmnt_val (validation target and treatment flags, aligned with X_val)
# are assumed to be prepared beforehand

# Uplift@30%
# (stored as uplift_k so that the uplift_at_k function itself is not overwritten)
uplift_k = uplift_at_k(y_true=y_val, uplift=uplift_slearner,
                       treatment=trmnt_val,
                       strategy='overall', k=0.3)

# Area Under Qini Curve
qini_coef = qini_auc_score(y_true=y_val, uplift=uplift_slearner,
                           treatment=trmnt_val)

# Area Under Uplift Curve
uplift_auc = uplift_auc_score(y_true=y_val, uplift=uplift_slearner,
                              treatment=trmnt_val)

# Weighted average uplift
wau = weighted_average_uplift(y_true=y_val, uplift=uplift_slearner,
                              treatment=trmnt_val)

Visualize the results

Visualize performance metrics with sklift.viz.

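from sklift.viz import plot_qini_curve
import matplotlib.pyplot as plt

# uplift_revert: uplift predicted on X_val by a second model, e.g. ClassTransformation
# ("revert label" approach), obtained the same way as uplift_slearner (assumed)

fig, ax = plt.subplots(1, 1)
ax.set_title('Qini curves')

plot_qini_curve(
    y_val, uplift_slearner, trmnt_val,
    perfect=True, name='Slearner', ax=ax
)

plot_qini_curve(
    y_val, uplift_revert, trmnt_val,
    perfect=False, name='Revert label', ax=ax
)

plt.show()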

Figure: Qini curves of several models, together with the perfect and random Qini curves.

Development

We welcome new contributors of all experience levels.

Thanks to all our contributors!

Contributors

If you have any questions, please contact us at [email protected]


Papers and materials

  1. Gutierrez, P., & Gérardy, J. Y.

    Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive Applications and APIs (pp. 1-13).

  2. Artem Betlei, Criteo Research; Eustache Diemert, Criteo Research; Massih-Reza Amini, Univ. Grenoble Alpes

    Dependent and Shared Data Representations improve Uplift Prediction in Imbalanced Treatment Conditions FAIM'18 Workshop on CausalML.

  3. Eustache Diemert, Artem Betlei, Christophe Renaudin, and Massih-Reza Amini. 2018.

    A Large Scale Benchmark for Uplift Modeling. In Proceedings of AdKDD & TargetAd (ADKDD’18). ACM, New York, NY, USA, 6 pages.

  4. Athey, Susan, and Imbens, Guido. 2015.

    Machine learning methods for estimating heterogeneous causal effects. Preprint, arXiv:1504.01132.

  5. Oscar Mesalles Naranjo. 2012.

    Testing a New Metric for Uplift Models. Dissertation Presented for the Degree of MSc in Statistics and Operational Research.

  6. Kane, K., V. S. Y. Lo, and J. Zheng. 2014.

    Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods. Journal of Marketing Analytics 2 (4): 218–238.

  7. Maciej Jaskowski and Szymon Jaroszewicz. 2012.

    Uplift modeling for clinical trial data. ICML Workshop on Clinical Data Analysis.

  8. Lo, Victor. 2002.

    The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations. 4. 78-86.

  9. Zhao, Yan & Fang, Xiao & Simchi-Levi, David. 2017.

    Uplift Modeling with Multiple Treatments and General Response Types. 10.1137/1.9781611974973.66.

  10. Nicholas J Radcliffe. 2007.

    Using control groups to target on predicted lift: Building and assessing uplift model. Direct Marketing Analytics Journal, (3):14–21.

  11. Devriendt, F., Guns, T., & Verbeke, W. 2020.

    Learning to rank for uplift modeling. ArXiv, abs/2002.05897.

Tags

EN: uplift modeling, uplift modelling, causal inference, causal effect, causality, individual treatment effect, true lift, net lift, incremental modeling

RU: аплифт моделирование, Uplift модель

ZH: uplift增量建模, 因果推断, 因果效应, 因果关系, 个体干预因果效应, 真实增量, 净增量, 增量建模

scikit-uplift's People

Contributors

00helloworld, acssar, adivarma27, az0, bwbelljr, dennisliub, elisovaira, elmaxuno, ezhdi, flashlight101, kkapka, ksyula, lyutov89, maks-sh, mcullan, mogby, muhamob, patpanda94, robbstarkk, rooti123, semenova-pd, sidorovtv, spiaz, tankudo, wrapper228


scikit-uplift's Issues

Generating a random uplift dataset

💡 Feature request

We need a function, make_uplift_classification, that generates a random binary uplift task.

It should return three columns: target, treatment and uplift. The target and treatment columns are binary.

The result has to meet the following requirements:

  • reproducibility
  • continuous uplift (there should not be many identical values for a large n = number of observations)
  • clear parameters to move the uplift curve closer to the ideal one or closer to the random one
  • the generated uplift should be non-negative by default, with an option to change this.
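One way to meet these requirements is sketched below. Everything here is hypothetical: the function is not part of sklift, and the parameter names and the probability model are our own assumptions.

- A seeded NumPy generator gives reproducibility.
- A normally distributed latent driver makes the uplift continuous.
- `uplift_strength` scales the treatment effect: 0 gives zero uplift, which corresponds to a random curve.
- `allow_negative` switches between a non-negative and a signed effect.

```python
import numpy as np
import pandas as pd


def make_uplift_classification(n_samples=10_000, uplift_strength=1.0,
                               allow_negative=False, random_state=None):
    """Hypothetical generator of a binary uplift task (not part of sklift).

    Returns a DataFrame with columns: target, treatment, uplift.
    """
    rng = np.random.default_rng(random_state)  # same seed -> same data

    # Control response probability in (0, 0.3), driven by a latent feature.
    x = rng.normal(size=n_samples)
    p_control = 0.3 / (1.0 + np.exp(-x))

    # Continuous treatment effect, so uplift values almost never repeat.
    # s = uplift_strength; s = 0 gives zero uplift everywhere.
    z = rng.normal(size=n_samples)
    if allow_negative:
        effect = 0.2 * uplift_strength * np.tanh(z)          # in (-0.2s, 0.2s)
    else:
        effect = 0.2 * uplift_strength / (1.0 + np.exp(-z))  # in (0, 0.2s)

    p_treated = np.clip(p_control + effect, 0.0, 1.0)

    treatment = rng.integers(0, 2, size=n_samples)  # random 50/50 assignment
    p = np.where(treatment == 1, p_treated, p_control)
    target = (rng.random(n_samples) < p).astype(int)

    return pd.DataFrame({
        "target": target,
        "treatment": treatment,
        "uplift": p_treated - p_control,  # true individual uplift
    })
```

Calling it twice with the same `random_state` returns identical frames. Larger `uplift_strength` values make the true effect stronger relative to the noise in the binary target.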

Checker: add checker that target is binary

💡 Feature request

check_is_binary checker function already exists. Please add it in the code for target checking.
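sklift's own check_is_binary lives in utils.py. The function below is only a self-contained stand-in, and its exact behaviour may differ from the library's. It shows the kind of check that would reject a continuous target before fitting. Note that this version also accepts an array containing only zeros or only ones.

```python
import numpy as np


def check_is_binary(array):
    """Raise ValueError unless `array` contains only 0/1 values (stand-in sketch)."""
    values = np.unique(np.asarray(array))
    if not np.isin(values, [0, 1]).all():
        raise ValueError(
            "Input array is not binary: expected only 0 and 1, "
            f"got values such as {values[:5]}"
        )


# Hypothetical placement: call it on the target at the start of a model's fit().
y_binary = [0, 1, 1, 0]
check_is_binary(y_binary)  # passes silently

y_continuous = [0.3, 1.7, 2.0]
try:
    check_is_binary(y_continuous)
except ValueError as err:
    print(err)
```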

Motivation

Users try to fit datasets with a continuous target, which is not supported for now.

Additional context

Source:

check_is_binary in utils.py.

Where to add:

Understanding of Uplift curve

Hello! Could you please help me? How can I interpret the different lifts shown on the uplift curve, compared with the random (null) line and the optimal line (a model with high discriminatory power)?
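One way to read the curve is to compute it by hand. The sketch below is a simplified, per-observation version; sklift's uplift_curve may group tied scores differently.

- It sorts observations by predicted uplift.
- For each top-k cutoff, it multiplies the treated-minus-control response rate by k.
- The random line runs straight from (0, 0) to the curve's last point, which is the overall effect times n.

The further a model's curve lies above that line, the better it ranks customers. The perfect curve is what an ideal ranking would achieve.

```python
import numpy as np


def uplift_curve_points(y_true, uplift, treatment):
    """Return (k, curve) where curve[k-1] = (rate_treated - rate_control) * k
    over the k observations with the highest predicted uplift."""
    order = np.argsort(uplift, kind="mergesort")[::-1]  # highest uplift first
    y = np.asarray(y_true, dtype=float)[order]
    t = np.asarray(treatment)[order]

    n_t = np.cumsum(t == 1)        # treated count in top-k
    n_c = np.cumsum(t == 0)        # control count in top-k
    y_t = np.cumsum(y * (t == 1))  # treated responders in top-k
    y_c = np.cumsum(y * (t == 0))  # control responders in top-k

    with np.errstate(divide="ignore", invalid="ignore"):
        # Rate difference is set to 0 until both groups are present.
        rate_diff = np.where((n_t > 0) & (n_c > 0), y_t / n_t - y_c / n_c, 0.0)

    k = np.arange(1, len(y) + 1)
    return k, rate_diff * k


def random_line(k, curve):
    """Straight line from (0, 0) to the last curve point (random targeting)."""
    return curve[-1] * k / k[-1]
```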

Qini curve

🐛 Bug

Hi! First of all, great project. I just noticed something weird with the Qini curve:
the model should never be able to surpass the red line, right?

To Reproduce

from sklift.models import SoloModel
from sklift.viz import plot_qini_curve
from lightgbm import LGBMRegressor
import numpy as np

X = np.random.randn(100000, 2)
t = (X[:, 0] + X[:, 1] > 0.5)
y = X[:, 0]*X[:, 1] + 2*t + np.random.randn(100000)

s = SoloModel(LGBMRegressor())
s.fit(X, y, t)

plot_qini_curve(y, s.predict(X), t)

Environment

  • scikit-uplift version: 0.2.0
  • lightgbm version: 3.1.1
  • Python version: 3.7.7
  • OS: Windows 10
  • Any other relevant information:

Thanks for your work!

Best regards
Robert

Computing Metrics for a Regression problem

💡 Feature request

In all the sample notebooks, the metrics are computed for a binary outcome y. Can we also do this for a regression problem, and how would that change the metrics and the corresponding curves?

Motivation

An example for regression problems in the library would also help users apply sklift to similar problems.

Additional context

Currently, the metrics assume that y_val is binary. What if it is a regression problem? How should the metrics be modified so that they do not check y_val for binary values and can still be computed?
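As an illustration only: the Qini-style curve does not itself require a binary outcome, because it is built from cumulative sums of the outcome in the treated and control groups. The hypothetical helper below is not sklift's qini_curve. It computes the curve for a continuous y, and treats the control-to-treated scaling factor as 0 until the first control observation appears.

```python
import numpy as np


def qini_curve_points(y_true, uplift, treatment):
    """Qini-style curve for binary or continuous outcomes (sketch).

    qini[k-1] = Y_t(k) - Y_c(k) * N_t(k) / N_c(k), where Y_t, Y_c are the sums
    of the outcome and N_t, N_c the group sizes among the top-k observations.
    """
    order = np.argsort(uplift, kind="mergesort")[::-1]  # highest uplift first
    y = np.asarray(y_true, dtype=float)[order]
    t = np.asarray(treatment)[order]

    n_t = np.cumsum(t == 1)
    n_c = np.cumsum(t == 0)
    y_t = np.cumsum(y * (t == 1))
    y_c = np.cumsum(y * (t == 0))

    # N_t / N_c, set to 0 while there are no control observations yet.
    ratio = np.divide(n_t, n_c, out=np.zeros(len(y)), where=n_c > 0)
    return np.arange(1, len(y) + 1), y_t - y_c * ratio
```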

Predicted vs actual uplift not possible anymore

Hi!

We are using uplift_by_percentile to compute predicted and actual uplift by decile of predicted uplift.
This is then used to compute MSRC as mentioned in Michel, R., Schnakenburg, I. and von Martens, T., 2019. Targeting Uplift. An Introduction to Net Scores/by René Michel, Igor Schnakenburg, Tobias von Martens.

Since PR 120 this is not possible due to check_is_binary(y_true) in uplift_by_percentile function.
We'd like to pass a float (the prediction) as y_true to this function.

Could the check allow y_true to be a float instead of requiring binary values?

check_is_binary(y_true)
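For reference, a table of predicted versus actual uplift per decile does not itself need a binary target. The hypothetical helper below is a pandas sketch, not sklift's uplift_by_percentile. It accepts a float y_true and puts the highest predicted uplift in bin 0.

```python
import numpy as np
import pandas as pd


def uplift_by_bins(y_true, uplift, treatment, bins=10):
    """Mean predicted uplift and actual uplift (treated mean minus control mean)
    per quantile bin of predicted uplift; works for binary or float y_true."""
    df = pd.DataFrame({"y": y_true, "uplift": uplift, "t": treatment})
    # Rank first so that ties do not break qcut; rank 1 = highest predicted uplift.
    ranks = df["uplift"].rank(method="first", ascending=False)
    df["bin"] = pd.qcut(ranks, bins, labels=False)

    rows = []
    for b, g in df.groupby("bin"):
        treated = g.loc[g["t"] == 1, "y"]
        control = g.loc[g["t"] == 0, "y"]
        rows.append({
            "bin": b,
            "n_treatment": len(treated),
            "n_control": len(control),
            "predicted_uplift": g["uplift"].mean(),
            "actual_uplift": treated.mean() - control.mean(),
        })
    return pd.DataFrame(rows).set_index("bin")
```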

Bug when importing all metrics

🐛 Bug

To Reproduce

from sklift.metrics import *

TypeError                                 Traceback (most recent call last)
<ipython-input-2-89f0d9d4fc6d> in <module>
----> 1 from sklift.metrics import *

~/anaconda3/lib/python3.7/importlib/_bootstrap.py in _handle_fromlist(module, fromlist, import_, recursive)

~/anaconda3/lib/python3.7/importlib/_bootstrap.py in _handle_fromlist(module, fromlist, import_, recursive)

TypeError: Item in sklift.metrics.__all__ must be str, not function


Environment

  • scikit-uplift version (e.g., 0.1.2): 0.3.2
  • Python version (e.g., 3.7): 3.6
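The traceback says that sklift.metrics.__all__ contained function objects instead of their names. Below is a self-contained reproduction using a throwaway, hypothetical module called demo_metrics. It shows both the failure and the fix, which is to list the names as strings. The exact error message differs between Python versions, so only the exception type is relied on.

```python
import sys
import types


def make_module(use_strings):
    """Register a throwaway module demo_metrics whose __all__ is either
    a list of names (correct) or a list of function objects (buggy)."""
    mod = types.ModuleType("demo_metrics")

    def uplift_at_k():
        return "ok"

    mod.uplift_at_k = uplift_at_k
    mod.__all__ = ["uplift_at_k"] if use_strings else [uplift_at_k]
    sys.modules["demo_metrics"] = mod
    return mod


def star_import():
    """Run `from demo_metrics import *` and return the names it bound."""
    namespace = {}
    exec("from demo_metrics import *", namespace)
    return namespace


make_module(use_strings=False)
try:
    star_import()
except TypeError as err:
    print("TypeError:", err)

make_module(use_strings=True)
print(star_import()["uplift_at_k"]())
```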

Add get_scorer

💡 Feature request

Add get_scorer() function which can be passed to cross-validation pipelines.

Motivation

Sklift lacks the ability to pass uplift metrics to model selection functions such as cross_validate and GridSearchCV. This function would provide it by creating a sklearn.metrics.make_scorer() object that can be passed in the standard manner.

Additional context

Tutorial notebook "uplift_model_selection_tutorial.ipynb" included.

Why is perfect uplift calculated differently for uplift curve and qini curve?

💡 Feature request

Hi! Perfect uplift is required to compute both the perfect uplift curve and the perfect Qini curve. Why are the formulas used to generate the perfect uplift different? Does it make sense to unify them?

perfect uplift curve

cr_num = np.sum((y_true == 1) & (treatment == 0)) # Control   Responders
tn_num = np.sum((y_true == 0) & (treatment == 1))  # Treated Non-Responders
summand = y_true if cr_num > tn_num else treatment
perfect_uplift = 2 * (y_true == treatment) + summand

perfect qini curve
perfect_uplift = y_true * treatment - y_true * (1 - treatment)
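To see how the two formulas differ in practice, the snippet below evaluates both, as copied from the issue, on a made-up example with one observation from each group:

- treated responder (TR)
- control responder (CR)
- treated non-responder (TN)
- control non-responder (CN)

Here cr_num and tn_num are both 1, so the uplift-curve formula uses summand = treatment. This is a worked illustration, not an explanation of the authors' intent.

```python
import numpy as np

# One observation per group, in the order TR, CR, TN, CN
y_true = np.array([1, 1, 0, 0])
treatment = np.array([1, 0, 1, 0])

# Formula used for the perfect uplift curve
cr_num = np.sum((y_true == 1) & (treatment == 0))  # control responders
tn_num = np.sum((y_true == 0) & (treatment == 1))  # treated non-responders
summand = y_true if cr_num > tn_num else treatment
perfect_uplift_curve = 2 * (y_true == treatment) + summand

# Formula used for the perfect Qini curve
perfect_qini = y_true * treatment - y_true * (1 - treatment)

print(perfect_uplift_curve)  # [3 0 1 2]
print(perfect_qini)          # [ 1 -1  0  0]
```

Both formulas rank the treated responder first and the control responder last. They disagree on the non-responders: the uplift-curve formula puts the control non-responder above the treated non-responder, while the Qini formula ties them. When cr_num > tn_num, the uplift-curve formula switches to summand = y_true. It then gives TR 3, CR 1, TN 0 and CN 2, so the control responder is no longer ranked last.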

EDA notebook for the Megafon dataset

📚 Documentation

Make an EDA notebook for the Megafon dataset. The function fetch_megafon, which loads and returns this dataset, was released in PR #115.

Points to include:

  • how to download
  • simple dataset EDA (include treatment column value_counts, target column value_counts)
  • simple model fit/predict
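Below is a possible starting point for the EDA cells shared by this and the other dataset notebooks. The helper is hypothetical. It only assumes that the target and treatment are available as array-likes, for example as the target and treatment fields of the object returned by a fetch_* function; check that function's docstring for the exact return type.

```python
import pandas as pd


def quick_eda(target, treatment):
    """Print value counts and the target rate per treatment group (sketch)."""
    target = pd.Series(target, name="target")
    treatment = pd.Series(treatment, name="treatment")

    print("Treatment value counts:\n", treatment.value_counts(), sep="")
    print("Target value counts:\n", target.value_counts(), sep="")

    # Count and mean target per treatment group
    summary = target.groupby(treatment).agg(["count", "mean"])
    print("Target by treatment group:\n", summary, sep="")
    return summary
```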

Bug when importing all models

🐛 Bug

from sklift.models import *
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-fb12e2634ba6> in <module>
----> 1 from sklift.models import *

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py in _handle_fromlist(module, fromlist, import_)

TypeError: hasattr(): attribute name must be string


Environment

  • scikit-uplift version (e.g., 0.1.2): 0.3.2
  • Python version (e.g., 3.7): 3.6

Dataset: Marketing Promotion Campaign Uplift Modelling

Context

Marketing Promotion Campaign
with data on a total of 6,400 customers.

This dataset shows brief information about the customers,
their historical use of discount or BOGO (Buy One Get One) promotions,
the offer that was made, and the conversion result (buy or not).
The average conversion value is $25.

Acknowledgements
This dataset is a fictional dataset for practice purposes.

Inspiration
Predict the customers' conversion rate.
Use uplift modelling to maximize the marketing campaign's effect and reduce campaign costs.

License:
Public Domain

Outdated data source links in RetailHero (RU+EN) notebooks.

Progress bar | dataset loading

💡 Feature request

Add a progress bar for dataset loading in the datasets module. Please use the tqdm module.

Additional

Example

from tqdm.auto import tqdm
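A sketch of how such a progress bar could wrap a chunked download. The helper name is hypothetical, and it is not how the datasets module currently downloads files. It copies any binary file-like source into a destination while updating a tqdm bar. For a real download, the source could be the response of urllib.request.urlopen(url) and the destination an open(path, "wb") file, with total taken from the Content-Length header.

```python
import io
from tqdm.auto import tqdm


def copy_with_progress(src, dst, total=None, chunk_size=1 << 16, desc="Downloading"):
    """Copy binary file-like `src` into `dst` chunk by chunk with a tqdm bar.

    `total` is the expected size in bytes; with total=None, tqdm shows a
    running byte counter without a percentage. Returns the bytes copied.
    """
    copied = 0
    with tqdm(total=total, unit="B", unit_scale=True, desc=desc) as bar:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            bar.update(len(chunk))
            copied += len(chunk)
    return copied


# In-memory demo standing in for a network download
payload = b"x" * 300_000
out = io.BytesIO()
copy_with_progress(io.BytesIO(payload), out, total=len(payload))
```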

Superfluous treatment feature in a notebook with an example

Hello!

When you create a dataset for training, you save the treatment flags in a separate pd.Series, which is passed to the model during training along with the main X_train. However, you do not drop the treatment column from X_train, so the model is trained on two treatment features at once.
When the model is applied, we are dealing with data for which we do not know whether the client was contacted. We need to calculate the uplift and use it to judge whether contacting the client is worthwhile. At the application stage, treatment is set to 0 and to 1, and the difference between the model's predictions for the two cases is calculated. However, if treatment is not dropped from X_train during training, applying the model to data with the same features but without treatment will, at the very least, raise an error about a missing feature. The main problem, though, is that this contradicts the logic of the theory described in the same article on scikit-uplift.
Please correct this small but serious typo in your example. In my case, at least, it became a bottleneck in my work.

Sincerely, Gleb Koptev!
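A minimal sketch of the point being made, using a plain scikit-learn classifier rather than sklift:

- The treatment flag is dropped from the feature frame.
- The model appends it exactly once.
- Uplift is the difference between predictions with treatment set to 1 and to 0. This is the S-learner idea behind SoloModel.

The column names, the synthetic data and both helper functions are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def fit_solo_model(X, y, treatment):
    """Fit one classifier on the features plus a single treatment column."""
    if "treatment" in X.columns:
        raise ValueError("Drop the treatment column from X before fitting")
    return LogisticRegression().fit(X.assign(treatment=treatment), y)


def predict_uplift(model, X):
    """Uplift = P(y=1 | X, treatment=1) - P(y=1 | X, treatment=0)."""
    p_treated = model.predict_proba(X.assign(treatment=1))[:, 1]
    p_control = model.predict_proba(X.assign(treatment=0))[:, 1]
    return p_treated - p_control


# Synthetic example: the raw frame contains the treatment flag as a column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"f1": rng.normal(size=2000), "f2": rng.normal(size=2000)})
df["treatment"] = rng.integers(0, 2, size=2000)
logit = df["f1"] + 1.5 * df["treatment"] - 0.5
df["target"] = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

X_train = df.drop(columns=["treatment", "target"])  # features only
model = fit_solo_model(X_train, df["target"], df["treatment"])
uplift = predict_uplift(model, X_train)
```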

Unwanted treatment variable in plot_uplift_preds( )

🐛 Bug

Error: 'treatment is not defined'

Running plot_uplift_preds() raises the error 'treatment is not defined'.

To Rectify

  • check_is_binary(treatment) seems unnecessary for this particular visualization function.

Implement the ASD metric

💡 Feature request

Implement the ASD metric from the book René Michel, Igor Schnakenburg, Tobias von Martens. Targeting Uplift. An Introduction to Net Scores.

Where: metrics module

Explanation

📌 The ASD (average squared deviation) is a model stability metric that shows whether the model overfits on the training data (extremely important in uplift modeling).


The meaning of l^T and l^V can be found in Table 4.1 of the book (page 103). This table is the source for the ASD metric and is already implemented as the metrics.uplift_by_percentile function.

📌 Implementing the metric comes down to the following steps:

  • call uplift_by_percentile twice: for the training data and for the validation data
  • get two tables
  • compute the ASD metric and return it
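A sketch of the last step, under an explicit assumption. The formula image is not reproduced here, so the code assumes ASD is the mean, over the bins, of the squared difference between the training uplift l^T and the validation uplift l^V of each bin. Please check this against the book before implementing. The inputs could be, for example, the uplift columns of the two uplift_by_percentile tables, computed with the same number of bins.

```python
import numpy as np


def average_squared_deviation(uplift_train, uplift_val):
    """Assumed ASD: mean over bins of (l_T - l_V) ** 2.

    uplift_train, uplift_val: per-bin uplift values from the training and
    validation tables, in the same bin order.
    """
    l_t = np.asarray(uplift_train, dtype=float)
    l_v = np.asarray(uplift_val, dtype=float)
    if l_t.shape != l_v.shape:
        raise ValueError("Both tables must have the same number of bins")
    return float(np.mean((l_t - l_v) ** 2))
```

A value close to 0 would mean that the per-bin uplift on validation data reproduces the training one, i.e. little sign of overfitting.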

Additional context

Please provide screenshots with working examples.

Add citation in the docstring

💡 Feature request

Add a BibTeX citation to the function docstring.

Where: fetch_criteo function in the datasets module

Type of docstring format: Sphinx

Motivation

Please refer to the Criteo uplift dataset page:


Additional context

The citation has to be displayed in the correct format in the documentation files.

Please provide screenshots.

Documentation how to make citation in the Sphinx docstring
https://readthedocs.org/projects/sphinxcontrib-bibtex/downloads/pdf/latest/

How to build documentation files

You can build the documentation locally using Sphinx 1.4 or later:

cd docs
pip install -r requirements.txt
make html

If you now point your browser to _build/html/index.html, you should see a documentation site.

(Navigate to the doc file, right-click it and choose Open with -> your browser.)

Add a page to the `dataset` documentation

📚 Documentation

There should be a table with approximately the following columns:

  • Dataset name
  • Task
  • Share of the treatment group
  • Share of the target
  • Link to the API function
  • Link to the EDA notebook
  • Tutorials in which the dataset is used
  • Source

EDA notebook for the X5 dataset

💡 Feature request

There are 4 datasets available for downloading in the datasets module.

It would be great to make an EDA notebook for the X5 dataset in notebooks. See the fetch_x5 function source code.

Points to include:

  • how to download
  • simple dataset EDA (include treatment column value_counts, target column value_counts)
  • simple model fit/predict

Additional context

Notebook example

Total in uplift_by_percentile might be confusing

Hi all,
thanks for providing this package!

I'm getting my hands dirty with the package for my thesis and came across the following:
in its calculation, uplift_by_percentile refers to the definition of weighted_average_uplift.
Unfortunately, the results differ, as the example shows.
I calculated uplift_by_percentile for the models and displayed the total column: it has the same value for all models.
When calculating weighted_average_uplift with its dedicated function, different values are shown.
By setting total=False and calculating it manually, the result is 0.473009...
Calculating it manually again, the result is 0.473009... (or 0.473010 due to rounding).

Maybe someone can shed some light on this issue. Thanks in advance.
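One possible explanation, stated as an assumption about how the total row is built. If the total row is the overall difference in response rates (all treated versus all control), it cannot depend on the model, which would match the identical totals across models. A treated-count-weighted average of the per-bin uplifts generally differs from that value; this sketch assumes that is what weighted_average_uplift computes. The tiny two-bin example below is made up for illustration.

```python
import numpy as np


def per_bin_uplift(y, t, bins):
    """Treated count and uplift (treated rate minus control rate) in each bin."""
    y, t, bins = np.asarray(y, dtype=float), np.asarray(t), np.asarray(bins)
    n_treated, uplift = [], []
    for b in np.unique(bins):
        in_bin = bins == b
        y_t, y_c = y[in_bin & (t == 1)], y[in_bin & (t == 0)]
        n_treated.append(len(y_t))
        uplift.append(y_t.mean() - y_c.mean())
    return np.array(n_treated), np.array(uplift)


y = [1, 1, 0, 0, 0, 1, 0, 0]
t = [1, 1, 1, 0, 1, 0, 0, 0]
bins = [0, 0, 0, 0, 1, 1, 1, 1]

n_treated, uplift = per_bin_uplift(y, t, bins)
weighted = np.dot(n_treated, uplift) / n_treated.sum()  # 5/12 here

y_arr, t_arr = np.asarray(y), np.asarray(t)
overall = y_arr[t_arr == 1].mean() - y_arr[t_arr == 0].mean()  # 0.25 here
```

The two values coincide when, for example, each bin holds the same share of the treated group as of the control group.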

Incorrect Percentile binning

In the metrics.py file, line 533 (order = np.argsort(uplift, kind='mergesort')[::-1]) sorts observations in decreasing order of the uplift score. Therefore, the uplift_by_percentile method should return a table whose first row is the 90-100 percentile, not 0-10. Please have a look at this.

Thanks.

DataFrame object has no attribute 'column' in fetch_x5 function call

🐛 Bug

There is a bug when attempting to fetch the X5 dataset with the fetch_x5() function: it fails to get the attribute 'column' from a DataFrame.

To Reproduce

from sklift import datasets
datasets.fetch_x5()

Expected behavior

Dataset should be fetched with no errors

Environment

  • scikit-uplift version (e.g., 0.1.2): 0.3.1
  • scikit-learn version (e.g., 0.22.2): 0.23.2
  • Python version (e.g., 3.7): 3.7.6
  • OS (e.g., Linux): Debian
  • Any other relevant information:

Adding Jupyter notebooks as pages in the documentation

📚 Documentation

We look up to the best, so we want our documentation to be as convenient as the PyTorch documentation. We want users to be able to read the tutorials directly in the documentation, without leaving the site.

Therefore, we need to add a Sphinx extension (Jupyter-sphinx, sphinxcontrib-jupyter, etc.) that renders notebooks as site pages.

NameError: name 'FileNotFoundError' is not defined - cannot import sklift in Jupyter Notebook

ModuleNotFoundError Sklift even though scikit-uplift is installed

I installed scikit-uplift with pip3, and I can see it in the correct directory for my Anaconda Jupyter Notebook. However, when I try to import it in Jupyter, I get ModuleNotFoundError: No module named 'sklift'. I then reinstalled the package from the GitHub source as described in the docs, and I get the same error.

To Reproduce

After installation I try to run from sklift.metrics import uplift_at_k but I get ModuleNotFoundError: No module named 'sklift'. I checked and the package is correctly installed, I restarted the notebook and it still didn't work.

I decided to reinstall it from source, and now when I run python setup.py install in the scikit-uplift directory in terminal, I get

File "setup.py", line 24, in <module> except FileNotFoundError: NameError: name 'FileNotFoundError' is not defined

Edit:
If I run python3 setup.py install, I get this error:

ERROR: Command errored out with exit status 1:
command: /Library/Frameworks/Python.framework/Versions/3.9/bin/python3.9 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setup.py'"'"'; file='"'"'/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va
cwd: /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/
Complete output (64 lines):
WARNING: The wheel package is not available.
WARNING: The wheel package is not available.

Edit setup.cfg to change the build options; suppress output with --quiet.

BUILDING MATPLOTLIB
  matplotlib: yes [3.3.2]
      python: yes [3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23)  [Clang 6.0
                  (clang-600.0.57)]]
    platform: yes [darwin]
 sample_data: yes [installing]
       tests: no  [skipping due to configuration]
      macosx: yes [installing]

running egg_info
creating /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info
writing /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info/PKG-INFO
writing dependency_links to /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info/dependency_links.txt
writing namespace_packages to /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info/namespace_packages.txt
writing requirements to /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info/requires.txt
writing top-level names to /private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info/top_level.txt
writing manifest file '/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-pip-egg-info-adyt94va/matplotlib.egg-info/SOURCES.txt'
init_dgelsd failed init
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setup.py", line 242, in <module>
    setup(  # Finally, pass this all along to distutils to do the heavy lifting.
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/setuptools/__init__.py", line 165, in setup
    return distutils.core.setup(**attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 297, in run
    self.find_sources()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 304, in find_sources
    mm.run()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 535, in run
    self.add_defaults()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 571, in add_defaults
    sdist.add_defaults(self)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/command/sdist.py", line 228, in add_defaults
    self._add_defaults_ext()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/command/sdist.py", line 311, in _add_defaults_ext
    build_ext = self.get_finalized_command('build_ext')
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/cmd.py", line 299, in get_finalized_command
    cmd_obj.ensure_finalized()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/distutils/cmd.py", line 107, in ensure_finalized
    self.finalize_options()
  File "/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setup.py", line 88, in finalize_options
    self.distribution.ext_modules[:] = [
  File "/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setup.py", line 91, in <listcomp>
    for ext in package.get_extensions()
  File "/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setupext.py", line 345, in get_extensions
    add_numpy_flags(ext)
  File "/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/setupext.py", line 469, in add_numpy_flags
    import numpy as np
  File "/private/var/folders/b3/v9dt6kxj0fx7k40tn4ppx6t40000gn/T/pip-install-gwr73n6u/matplotlib/.eggs/numpy-1.19.2-py3.9-macosx-10.9-x86_64.egg/numpy/__init__.py", line 286, in <module>
    raise RuntimeError(msg)
RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, see site.cfg.example for information. Otherwise report this to the vendor that provided NumPy.
RankWarning: Polyfit may be poorly conditioned

----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Environment

  • scikit-uplift (0.2.0):
  • scikit-learn (0.22.1):
  • Python (3.9):
  • OS (MacOS)
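Two observations that may help, offered as a diagnosis rather than a confirmed cause:

- FileNotFoundError exists only in Python 3.3 and later, so the NameError suggests that `python` on this machine points to a Python 2 interpreter.
- A ModuleNotFoundError in Jupyter often means that the notebook kernel runs a different interpreter from the one pip3 installed into.

The stdlib-only helper below (the name is hypothetical) reports which interpreter the kernel uses and whether it can find sklift.

```python
import importlib.util
import sys


def diagnose_import(module_name="sklift"):
    """Report the running interpreter and whether `module_name` is importable."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "version": sys.version.split()[0],
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


print(diagnose_import())
```

If "found" is False, one way to make the two interpreters match is to install into the kernel's own interpreter. You can run `%pip install scikit-uplift` in a notebook cell (available in recent IPython versions) or run `<that python> -m pip install scikit-uplift`.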

Site menu on 404 page

📚 Documentation

Currently, when a user hits a 404 page, the standard Read the Docs page is shown.

It would be great if, when opening a non-existent page, the user saw the menu on the left.

Any other creative design for the content of the 404 page is welcome 😄

Design of Jupyter notebooks in the same style

💡 Feature request

We already have a relatively large number of notebooks. It is important that they are all in the same style:

  • The title in the general style
  • Header with links to colab, nbviewer, github
  • a cell for installing the necessary dependencies, so that it works on colab too
  • If possible, approximately the same structure (for example, for all EDA notebooks and for all tutorial notebooks)
  • It is important that the text in the notebooks does not overlap with the documentation. If such intersections occur, then it is necessary to delete them and add a link to the documentation.

create ClassTransformationReg | sklift.models

💡 Feature request

Create another ClassTransformation model (aka Transformed Outcome approach) as ClassTransformationReg.

There is a ClassTransformation class in the models module. It takes a classifier object as a parameter.

The goal is to create another class named ClassTransformationReg that takes regressor object as a parameter.

The transformed outcome approach was presented in the paper Athey, Susan & Imbens, Guido & Ramachandra, Vikas. (2015). Machine Learning Methods for Estimating Heterogeneous Causal Effects.

The main idea of the method is to transform the target as:

$$Y^{*} = Y^{obs} \cdot \frac{W - p}{p\,(1 - p)}$$

Where

  • Yobs - the original target
  • W - treatment column
  • p - propensity score (the probability of being assigned to the treatment group, W = 1). The propensity score estimate can be a scalar (the share of W = 1) or a vector (use a classifier to get it)

Then fit a standard regression model on the transformed target; its predictions are the uplift scores.

Key Features

  • The base estimator should be a regressor
  • propensity score should be a constant or a vector
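A minimal sketch of the requested behaviour, assuming the transformed target above and a randomized experiment. The class name TransformedOutcomeRegressor and its propensity_val parameter are hypothetical, not the sklift API. Any scikit-learn regressor can be passed as the estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class TransformedOutcomeRegressor:
    """Fit a regressor on Y* = Y_obs * (W - p) / (p * (1 - p)); its predictions
    estimate the uplift."""

    def __init__(self, estimator, propensity_val=None):
        self.estimator = estimator
        self.propensity_val = propensity_val  # None -> share of W = 1; scalar or vector

    def fit(self, X, y, treatment):
        w = np.asarray(treatment, dtype=float)
        y = np.asarray(y, dtype=float)
        p = w.mean() if self.propensity_val is None else self.propensity_val
        p = np.broadcast_to(np.asarray(p, dtype=float), w.shape)
        if np.any((p <= 0) | (p >= 1)):
            raise ValueError("Propensity scores must lie strictly between 0 and 1")
        y_transformed = y * (w - p) / (p * (1 - p))
        self.estimator.fit(X, y_transformed)
        return self

    def predict(self, X):
        return self.estimator.predict(X)


# Synthetic check: the true uplift is 2 * x, treatment assigned at random.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(size=(n, 1))
w = rng.integers(0, 2, size=n)
y = x[:, 0] + 2 * x[:, 0] * w + rng.normal(scale=0.1, size=n)

model = TransformedOutcomeRegressor(LinearRegression()).fit(x, y, w)
print(model.predict(np.array([[0.5]])))  # roughly 1.0, the true uplift at x = 0.5
```

The transformed target is noisy, so this approach needs fairly large samples. The mean of Y* given X equals the uplift only when the propensity scores are correct.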

Additional info

habr.com [RUS]

EDA notebook for the Criteo dataset

💡 Feature request

There are 4 datasets available for downloading in the datasets module.

It would be great to make an EDA notebook for the Criteo dataset in notebooks. See the fetch_criteo function source code.

Points to include:

  • how to download
  • simple dataset EDA (include treatment column value_counts, target column value_counts)
  • simple model fit/predict

Additional context

Notebook example

"About the company" section in each dataset

📚 Documentation

We really appreciate companies that share data with the community for building uplift models.
Therefore, we want to have a section About the company on the page of each dataset.

In this section, we would like to see the company's logo, a link, information about its activities, and much more.


Use fetch_x5 func in RetailHero notebook

💡 Feature request

The dataset is downloaded via a link in the RetailHero tutorials (EN and RU). This is inconvenient and looks ugly.

We have the fetch_x5 function, which easily allows you to download the dataset.

It is necessary to rewrite these notebooks using the fetch_x5 function.

Add Megafon dataset in sklift.datasets

💡 Feature request

Add a new function, fetch_megafon, that loads and returns the train part of the Megafon dataset.

Motivation

Recently, one of the Russian telecom companies (Megafon) held the MegaFon Uplift Competition.

Megafon's real data is sensitive, and the company takes the confidentiality of this information very seriously. Therefore, they generated synthetic data, trying to make it as close as possible to the real case they encountered.

We want our package to be convenient for researchers, so we try to collect datasets for uplift modeling in one place.

If you have any problems getting access to the data write to us at [email protected]

Add example of usages for fetch func

📚 Documentation

It is necessary to add examples with code for using these functions to the docstrings:

  • fetch_lenta
  • fetch_x5
  • fetch_criteo
  • fetch_hillstrom

An example of how to do this can be found here.

It would also be nice to add a "See also:" section that links to the other datasets, as is already done for the metrics.
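Below is a sketch of the two docstring sections in numpydoc style, shown on a hypothetical placeholder function (fetch_dataset_template is not part of sklift). The parameter names data_home and download_if_missing follow the existing fetch_* functions. The doctest lines carry `+SKIP` because they would download data. Adapt the "See Also" format to whatever convention the metrics docstrings already use.

```python
def fetch_dataset_template(data_home=None, download_if_missing=True):
    """Load an uplift dataset (placeholder illustrating the docstring sections).

    Parameters
    ----------
    data_home : str or None, default=None
        Directory where the dataset is cached.
    download_if_missing : bool, default=True
        Whether to download the data if it is not cached yet.

    Returns
    -------
    sklearn.utils.Bunch
        Object with ``data``, ``target`` and ``treatment`` attributes.

    Examples
    --------
    >>> from sklift.datasets import fetch_lenta  # doctest: +SKIP
    >>> dataset = fetch_lenta()  # doctest: +SKIP
    >>> data, target, treatment = dataset.data, dataset.target, dataset.treatment  # doctest: +SKIP

    See Also
    --------
    fetch_x5 : Load the X5 RetailHero dataset.
    fetch_criteo : Load the Criteo dataset.
    fetch_hillstrom : Load the Hillstrom dataset.
    """
    raise NotImplementedError("docstring template only")
```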

EDA notebook for the Lenta dataset

💡 Feature request

There are 4 datasets available for downloading in the datasets module.

It would be great to add an EDA notebook for the Lenta dataset to notebooks (see the fetch_lenta function source code).

Points to include:

  • how to download
  • simple dataset EDA (including value_counts for the treatment and target columns)
  • simple model fit/predict
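For the "simple model fit/predict" point, the two-model approach is easy to show. One classifier is fit on treated customers and another on control customers, and the predicted uplift is the difference of their probabilities. sklift's TwoModels class wraps this idea. The sketch below uses plain scikit-learn so it runs without sklift. It assumes Lenta's group column holds the labels "test" and "control". Both function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def encode_lenta_treatment(group):
    """Map the assumed 'test'/'control' labels to 1/0."""
    return (np.asarray(group) == "test").astype(int)


def two_model_uplift(X_train, y_train, t_train, X_new):
    """Fit separate treated/control classifiers; uplift = P(y=1 | treated) - P(y=1 | control)."""
    X_train, y_train, t_train = np.asarray(X_train), np.asarray(y_train), np.asarray(t_train)
    model_t = LogisticRegression(max_iter=1000).fit(X_train[t_train == 1], y_train[t_train == 1])
    model_c = LogisticRegression(max_iter=1000).fit(X_train[t_train == 0], y_train[t_train == 0])
    return model_t.predict_proba(X_new)[:, 1] - model_c.predict_proba(X_new)[:, 1]
```

Any scikit-learn compatible classifier could replace LogisticRegression here. A linear model just keeps the notebook fast.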

Additional context

Notebook example

Advanced uplift metrics tutorial notebook

💡 Feature request

Create uplift_metrics_tutorial_advanced.ipynb

Use uplift_metrics_tutorial.ipynb as a baseline.

Use the same dataset.

The notebook should cover the following points:

  1. uplift@k with a small step for the k parameter (smaller than 0.1, for instance 0.01)

  2. ASD metric

  3. the uplift scores of 2 different models displayed on one Qini plot
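For the first point, uplift@k can be evaluated on a fine grid of k. The notebook would presumably call sklift.metrics.uplift_at_k in a loop. The sketch below instead reimplements the "overall" idea in NumPy so it runs standalone. It ranks all samples by predicted uplift, keeps the top k fraction, and subtracts the control response rate from the treated response rate in that slice. The function names are hypothetical. Rounding k * n to the nearest integer is a choice of this sketch and may differ from sklift's.

```python
import numpy as np


def uplift_at_k_overall(y_true, uplift, treatment, k):
    """Treated minus control response rate within the top-k fraction ranked by predicted uplift."""
    y_true, uplift, treatment = map(np.asarray, (y_true, uplift, treatment))
    n_top = max(1, int(round(k * len(uplift))))
    top = np.argsort(-uplift, kind="mergesort")[:n_top]  # stable sort keeps ties in input order
    y_top, t_top = y_true[top], treatment[top]
    if (t_top == 1).sum() == 0 or (t_top == 0).sum() == 0:
        return np.nan  # one of the groups is empty in this slice
    return y_top[t_top == 1].mean() - y_top[t_top == 0].mean()


def uplift_at_k_curve(y_true, uplift, treatment, step=0.01):
    """Evaluate uplift@k for k = step, 2*step, ..., 1."""
    ks = np.round(np.arange(step, 1 + step / 2, step), 10)
    return ks, np.array([uplift_at_k_overall(y_true, uplift, treatment, k) for k in ks])
```

Plotting `ks` against the returned values shows how quickly the uplift decays as the targeted share grows. NaN entries at very small k mean the top slice contains only one group.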

EDA notebook for the Hillstrom dataset

💡 Feature request

There are 4 datasets available for downloading in the datasets module.

It would be great to add an EDA notebook for the Hillstrom dataset to notebooks (see the fetch_hillstrom function source code).

Points to include:

  • how to download
  • simple dataset EDA (including value_counts for the treatment and target columns)
  • simple model fit/predict
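Hillstrom differs from the other datasets because its treatment has three arms: the segment values "Mens E-Mail", "Womens E-Mail" and "No E-Mail". A binary uplift model therefore needs one e-mail arm compared against the control. The sketch below assumes those segment labels and that fetch_hillstrom returns the segment as its treatment. All helper names are hypothetical.

```python
import pandas as pd

CONTROL = "No E-Mail"  # assumed control label of the Hillstrom segment column


def binarize_hillstrom(data, target, segment, treated_segment="Womens E-Mail"):
    """Keep one e-mail arm plus control and encode the treatment as 1/0."""
    mask = segment.isin([treated_segment, CONTROL])
    treatment = (segment[mask] == treated_segment).astype(int)
    return data[mask], target[mask], treatment


def response_rate_by_segment(target, segment):
    """Mean target per segment, highest first."""
    return target.groupby(segment).mean().sort_values(ascending=False)


def load_hillstrom():
    """Hypothetical loader: needs scikit-uplift installed and network access."""
    from sklift.datasets import fetch_hillstrom  # lazy import

    dataset = fetch_hillstrom(target_col="visit")
    return dataset.data, dataset.target, dataset.treatment
```

After binarization, the data can go into any binary-treatment model, for example the fit/predict step of the other EDA notebooks.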

Additional context

Notebook example

Request for contribution: Including Interaction terms (X*T) for SoloModel

Hey! I was wondering if I could contribute to scikit-uplift by adding a parameter to the SoloModel class.

In the paper by Lo (2002), equation (6), which gives the general form of the model, includes interaction terms between the features and the treatment (X*T). Reference: Lo, Victor. 2002. The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations 4: 78-86.

The change would look like this:

sm = SoloModel(CatBoostClassifier(verbose=100, random_state=777), treatment_interaction=False)

  • We would pass treatment_interaction as a class-level (constructor) parameter and check it in both the fit and predict methods. When it is enabled, the feature-treatment interaction terms would be added to the X array.

  • If you have any other suggestions to change it elsewhere, we could do that as well.
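Interaction terms matter because, without them, a logistic model logit = a + bX + gT gives an uplift of σ(a + bX + g) - σ(a + bX). That uplift always has the sign of g, so the model cannot separate customers who react positively from those who react negatively. Below is a minimal sketch of the proposal, assuming one classifier trained on [X, T, X*T]. Uplift is scored by predicting each row with T set to 1 and then to 0. InteractionSoloModel and with_interactions are hypothetical names, not sklift's SoloModel.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def with_interactions(X, t):
    """Stack [X, t, X * t] column-wise (treatment interaction features)."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return np.hstack([X, t, X * t])  # X * t broadcasts t across every feature column


class InteractionSoloModel:
    """Hypothetical sketch: a single classifier on [X, T, X*T]."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, treatment):
        self.estimator.fit(with_interactions(X, treatment), y)
        return self

    def predict(self, X):
        n = np.asarray(X).shape[0]
        # Score every row as if treated and as if untreated; the difference is the uplift
        p_treated = self.estimator.predict_proba(with_interactions(X, np.ones(n)))[:, 1]
        p_control = self.estimator.predict_proba(with_interactions(X, np.zeros(n)))[:, 1]
        return p_treated - p_control
```

For tree ensembles such as CatBoost, the explicit X*T columns are less essential because trees can form interactions on their own. They mainly help linear or additive estimators.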

Kindly let me know your thoughts.

check_consistent_length not found (plot_uplift_by_percentile)

NameError: name 'check_consistent_length' is not defined

NameError Traceback (most recent call last)
in
----> 1 eval_obja = evaluate_uplift(models, datasets, feature_group, features_to_add[i], model_type)

in evaluate_uplift(lrModels, datasets, feature_list, features_to_add, model_type)
69 #break
70 print (key + " Curves")
---> 71 plot_uplift_by_percentile(y_true=dataset_uplift_eval["label"], uplift=dataset_uplift_eval["uplift_score"], treatment=dataset_uplift_eval["treatment"], strategy='overall', bins=100)
72 plot_uplift_qini_curves(y_true=dataset_uplift_eval["label"], uplift=dataset_uplift_eval["uplift_score"], treatment=dataset_uplift_eval["treatment"], perfect=True)
73 print ("Finished plotting curves")

/databricks/python/lib/python3.7/site-packages/sklift/viz/base.py in plot_uplift_by_percentile(y_true, uplift, treatment, strategy, bins)
129
130 n_samples = len(y_true)
--> 131 check_consistent_length(y_true, uplift, treatment)
132
133 if strategy not in strategy_methods:

NameError: name 'check_consistent_length' is not defined
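The traceback suggests that sklift/viz/base.py calls check_consistent_length without importing it. If the intended helper is scikit-learn's sklearn.utils.validation.check_consistent_length, adding that import at the top of the module should resolve the error. The snippet shows the import and the check it performs. validate_uplift_inputs is a hypothetical wrapper used only for illustration.

```python
from sklearn.utils.validation import check_consistent_length


def validate_uplift_inputs(y_true, uplift, treatment):
    """Raise ValueError if the three inputs have different numbers of samples."""
    check_consistent_length(y_true, uplift, treatment)
    return len(y_true)
```

Until a fixed release is installed, importing the name into the module manually is a possible stopgap, though upgrading sklift is the cleaner option.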
