Shapley regressions code base (BoE SWP 784)

This repository provides the code, data and results used for Bank of England Staff Working Paper 784

"Shapley regressions: A framework for statistical inference on machine learning models"

by Andreas Joseph (March 2019).

The paper introduces a well-motivated and rigorous approach to address the black-box critique of machine learning models. Model interpretability is transferred to a multiple linear regression analysis, one of the most transparent and most widely used modelling techniques.

The output of machine learning models can now be presented as a regression table. As an example, the paper reports inference results for modelling changes in UK and US unemployment using quarterly macroeconomic time series. These compare several machine learning models (columns 1-3 for each country) with a linear regression (Reg column). As expected, all models learn similar variable dependencies, while the machine learning models are generally more accurate (lower RMSE) and provide richer information, e.g. about non-linearity of the data-generating process. Please see Table 4 in the paper for technical details.
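
The core idea can be illustrated in a few lines. The following is a minimal sketch, not the repository's implementation: it decomposes a model's predictions into exact Shapley values, using a single baseline point (the feature means) to stand in for features absent from a coalition, and then regresses the target on these Shapley components by OLS. The polynomial "model" is simply the true data-generating process, and the names baseline_shapley, shapley_regression and model are hypothetical; the repository's own code draws on the packages listed under Dependencies & versions.

```python
import itertools
import math

import numpy as np


def baseline_shapley(f, X, baseline):
    """Exact Shapley values of f at every row of X.

    Features outside a coalition are fixed at `baseline` (a simplifying
    assumption). All coalitions are enumerated, so this only suits a handful
    of features. Returns an (n_obs, n_features) array `phi` satisfying
    f(x) = f(baseline) + phi.sum(axis=1) for each row (efficiency).
    """
    n_obs, n_feat = X.shape
    phi = np.zeros((n_obs, n_feat))
    for k in range(n_feat):
        others = [j for j in range(n_feat) if j != k]
        for size in range(n_feat):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions S without k
            weight = (math.factorial(size) * math.factorial(n_feat - size - 1)
                      / math.factorial(n_feat))
            for S in itertools.combinations(others, size):
                without_k = np.tile(baseline, (n_obs, 1))
                if S:
                    cols = list(S)
                    without_k[:, cols] = X[:, cols]
                with_k = without_k.copy()
                with_k[:, k] = X[:, k]
                # Marginal contribution of feature k to coalition S
                phi[:, k] += weight * (f(with_k) - f(without_k))
    return phi


def shapley_regression(y, phi):
    """OLS of y on an intercept plus the Shapley components; returns (intercept, betas)."""
    design = np.column_stack([np.ones(len(y)), phi])
    coefs, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], coefs[1:]


def model(Z):
    # Hypothetical "fitted model": here simply the true polynomial DGP
    return 1.0 + 2.0 * Z[:, 0] + Z[:, 1] ** 2 - 0.5 * Z[:, 0] * Z[:, 2]


rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))
y = model(X) + rng.normal(scale=0.1, size=len(X))

baseline = X.mean(axis=0)
phi = baseline_shapley(model, X, baseline)
intercept, betas = shapley_regression(y, phi)
print(betas)  # each coefficient should be close to 1
```

Because Shapley values add up to the prediction minus f(baseline), a model that captures the data-generating process yields coefficients close to one. The paper builds its statistical tests on these coefficients; see the paper for the exact hypotheses and for how non-zero baselines and real ML models are handled.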

The material provided here allows one to reproduce all empirical and simulation results in the paper. It is not intended as a stand-alone package, although parts of it may be transferred to other applications. No warranty is given; please consult the licence file.

Should you have any queries or spot an issue, please email [email protected] or raise an issue within the repository.

Link to paper: www.bankofengland.co.uk/working-paper/2019/shapley-regressions-a-framework-for-statistical-inference-on-machine-learning-models

Download of full results: https://www.dropbox.com/s/bkdjpbqrabgtwr4/SWP784_all_results.zip?dl=0

Code structure

- 1_macro_Shapley_regressions.py: UK and US macroeconomic time series analysis using 
	machine learning (ML) models and Shapley regressions for statistical inference (section 5.2 of paper).
- 2a_ML_inference_simulation.py: Simulation of polynomial data-generating processes and
	ML inference based on Shapley decompositions and reconstruction
	(suited for parallel/cloud processing, section 5.1 of paper).
- 2b_ML_inference_analysis.py: Collection of simulation results and graphical output (section 5.1 of paper). 
- ML_inference_aux.py: Auxiliary code for parts 1 and 2, application-specific inputs and 
	general functions (partly inherited from https://github.com/andi-jo/ML_projection_toolbox).
	shapley_coeffs() calculates Shapley share coefficients (SSC).
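
The README describes shapley_coeffs() only as calculating Shapley share coefficients (SSC). As a rough sketch of the idea, assuming an SSC is each variable's average share of the total absolute Shapley value per observation, optionally signed by its direction of effect, it could be computed as below. The function shapley_share_sketch and its signs argument are hypothetical; consult the paper and the docstring of shapley_coeffs() in ML_inference_aux.py for the actual definition.

```python
import numpy as np


def shapley_share_sketch(phi, signs=None):
    """Illustrative Shapley share coefficients (not the repository's function).

    phi   : (n_obs, n_features) array of Shapley values.
    signs : optional sequence of +1/-1, one per feature, giving each
            variable's assumed direction of effect.
    Returns each feature's average share of the total absolute Shapley
    value per observation; the absolute shares sum to one.
    """
    abs_phi = np.abs(np.asarray(phi, dtype=float))
    totals = abs_phi.sum(axis=1)
    # Drop observations with no attribution at all to avoid dividing by zero
    keep = totals > 0
    shares = (abs_phi[keep] / totals[keep][:, None]).mean(axis=0)
    if signs is not None:
        shares = shares * np.sign(np.asarray(signs, dtype=float))
    return shares


phi = np.array([[1.0, -1.0, 2.0],
                [0.5, 0.5, 0.0]])
print(shapley_share_sketch(phi, signs=[1, -1, 1]))  # values: 0.375, -0.375, 0.25
```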

Instructions

- Parts 1 and 2 are independent of each other.
- Part 2b depends on 2a or on pre-computed results (SWP results are provided in
	ML_inf_joint_results_swp.pkl).
- The "main_dir" variable needs to be set in both parts.
- Options can be set at the beginning of parts 1 and 2 (a and b).
- Please consult the comments in the codes and docstrings for further documentation.
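
For example, main_dir might be set near the top of each script, and the pre-computed results loaded for part 2b, roughly as follows. The path, the assumption that ML_inf_joint_results_swp.pkl sits directly in main_dir, and the helper load_swp_results are hypothetical; check the comments in the scripts for the layout they actually expect.

```python
import os

import pandas as pd

# Hypothetical local path: set this to wherever the repository was cloned
main_dir = os.path.expanduser("~/shapley_regressions/")


def load_swp_results(main_dir, filename="ML_inf_joint_results_swp.pkl"):
    """Load the pre-computed SWP simulation results, if present.

    Assumes the pickle sits directly inside main_dir (adjust if not).
    Returns None when the file cannot be found.
    """
    path = os.path.join(main_dir, filename)
    if not os.path.exists(path):
        return None
    return pd.read_pickle(path)


results = load_swp_results(main_dir)  # None if the file is not there
```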

Dependencies & versions

- python (3.6.8, Anaconda distribution)
- numpy (1.15.4)
- scipy (1.2.0)
- pandas (0.24.1)
- sklearn (0.20.2)
- shap (0.28.3)
- statsmodels (0.9.0)
- matplotlib (3.0.2)
- patsy (0.5.1)
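
Later library versions may behave differently from those used for the paper. A small, hypothetical helper like the one below compares the installed versions against those listed above; it only reads each module's __version__ attribute and reports None for packages that fail to import (sklearn is the import name of scikit-learn).

```python
import importlib
import sys

# Versions listed in this README
PINNED = {
    "numpy": "1.15.4",
    "scipy": "1.2.0",
    "pandas": "0.24.1",
    "sklearn": "0.20.2",
    "shap": "0.28.3",
    "statsmodels": "0.9.0",
    "matplotlib": "3.0.2",
    "patsy": "0.5.1",
}


def compare_versions(pinned=PINNED):
    """Map each module name to (pinned, installed); installed is None if the import fails."""
    report = {}
    for module_name, wanted in pinned.items():
        try:
            module = importlib.import_module(module_name)
            installed = getattr(module, "__version__", "unknown")
        except Exception:  # missing or broken installation
            installed = None
        report[module_name] = (wanted, installed)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(README: 3.6.8)")
    for name, (wanted, installed) in compare_versions().items():
        flag = "" if installed == wanted else "  <- differs"
        print("{}: pinned {}, installed {}{}".format(name, wanted, installed, flag))
```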

Data & sources

Data description:

- Quarterly macroeconomic time series (UK: 1955Q1-2017Q4, US: 1965Q1-2017Q4).
- Series are either year-on-year (yoy) percentage changes or first differences (see Table 2 of the paper).
- For the analysis, series are standardised to have mean zero and standard deviation one.
- Raw data and standardised series are provided.
- Series names: GDP, labour productivity, broad money, private non-financial sector debt, 
	unemployment rate, household gross disposable income, consumer price inflation, 
	central bank's main policy rate, current account balance, effective exchange rate.
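
The transformations described above can be sketched with pandas. This is an illustration under stated assumptions, not the repository's preprocessing code: the helper names are hypothetical, yoy changes are taken against the same quarter one year earlier (4 periods back), the standard deviation is the pandas default sample estimate (ddof=1), and the series are made-up numbers. Which transformation applies to which series is given in Table 2 of the paper.

```python
import numpy as np
import pandas as pd


def yoy_pct_change(series, periods=4):
    """Year-on-year percentage change of a quarterly series (vs. 4 quarters earlier)."""
    return 100.0 * series.pct_change(periods=periods)


def first_difference(series):
    """Quarter-on-quarter first difference."""
    return series.diff()


def standardise(df):
    """Rescale every column to mean zero and (sample) standard deviation one."""
    return (df - df.mean()) / df.std()


# Made-up quarterly series, for illustration only
idx = pd.period_range("2000Q1", periods=12, freq="Q")
gdp = pd.Series(np.linspace(100.0, 111.0, 12), index=idx)
rate = pd.Series([5.0, 5.0, 4.75, 4.5, 4.5, 4.25, 4.0, 4.0, 3.75, 3.5, 3.5, 3.5], index=idx)

panel = pd.DataFrame({
    "gdp_yoy": yoy_pct_change(gdp),
    "rate_diff": first_difference(rate),
}).dropna()  # the first 4 quarters are lost to the yoy transformation
z = standardise(panel)
print(z.describe().loc[["mean", "std"]])
```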

Individual sources by ID:

- BOE: IUQLBEDR, XUQLBK82, IUQLBEDR, LPQAUYN.
- ONS: D7BT, UKEA, PGDP, PRDY, MGSX.
- BIS: US private sector debt: Q:US:P:A:M:XDC:A
       UK: ERI, GBP/USD (1955 only).
- OECD: US CPI, US M3, US GDP, US Unemployment, US CA.
- FRED: RNUSBIS, FEDFUNDS, PRS85006163, A229RX0.
- A Millennium of UK Data, Ryland Thomas (2017): private sector debt, 
	M4, labour productivity.

Data Classification

Bank of England Data Classification: OFFICIAL BLUE