lmdiag's Introduction

lmdiag

Python Library providing Diagnostic Plots for Linear Regression Models. (Like plot.lm in R.)

I built this because I missed R's diagnostic plots during a university project. Python has substitutes for some individual charts, but they are spread across different libraries and sometimes don't show exactly the same thing. My implementation tries to copy the R plots, but I didn't reimplement the R code: the charts are based only on the available documentation.

Installation

pip install lmdiag

Usage

lmdiag generates plots for fitted linear regression models from statsmodels, linearmodels and scikit-learn.

You can find some usage examples in this Jupyter notebook.

Example

import numpy as np
import statsmodels.api as sm
import lmdiag

# Fit model with random sample data
np.random.seed(20)
predictor = np.random.normal(size=30, loc=20, scale=3)
response = 5 + 5 * predictor + np.random.normal(size=30)
X = sm.add_constant(predictor)  # intercept required by statsmodels
lm = sm.OLS(response, X).fit()

# Plot lmdiag facet chart
lmdiag.style.use(style="black_and_red")  # Mimic R's plot.lm style
fig = lmdiag.plot(lm)
fig.show()

Methods

  • Draw matrix of all plots:

    lmdiag.plot(lm)

  • Draw individual plots:

    lmdiag.resid_fit(lm)

    lmdiag.q_q(lm)

    lmdiag.scale_loc(lm)

    lmdiag.resid_lev(lm)

  • Print description to aid plot interpretation:

    lmdiag.help() (for all plots)

    lmdiag.help('<method name>') (for an individual plot)
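Since the individual plots copy R's plot.lm panels, it can help to know which quantities each one displays. Below is a minimal NumPy sketch, assuming the textbook definitions R uses (lmdiag's own internals may compute them differently): it fits OLS and derives fitted values, residuals, standardized residuals, leverage and Cook's distance. The helper name `diagnostic_quantities` and its dictionary keys are hypothetical. The keys only mirror the lmdiag function names for orientation and are not part of lmdiag's API.

```python
import numpy as np
from statistics import NormalDist


def diagnostic_quantities(X, y):
    """Hypothetical helper: textbook quantities behind R-style diagnostic plots.

    X must already contain an intercept column (as produced by sm.add_constant).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Ordinary least squares fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted

    # Leverage = diagonal of the hat matrix H = X (X'X)^-1 X'.
    # With the reduced QR decomposition X = QR, H = QQ', so h_ii is the
    # squared norm of row i of Q (assumes X has full column rank).
    q, _ = np.linalg.qr(X)
    leverage = np.sum(q**2, axis=1)

    # Internally studentized ("standardized") residuals
    sigma = np.sqrt(resid @ resid / (n - p))
    std_resid = resid / (sigma * np.sqrt(1 - leverage))

    # Cook's distance
    cooks_d = std_resid**2 * leverage / (p * (1 - leverage))

    # Theoretical normal quantiles for the Q-Q plot, using the same
    # plotting positions as R's ppoints(): (i - a) / (n + 1 - 2a)
    a = 3 / 8 if n <= 10 else 0.5
    probs = (np.arange(1, n + 1) - a) / (n + 1 - 2 * a)
    theoretical = np.array([NormalDist().inv_cdf(pr) for pr in probs])

    return {
        "resid_fit": (fitted, resid),                        # residuals vs. fitted
        "q_q": (theoretical, np.sort(std_resid)),            # normal Q-Q
        "scale_loc": (fitted, np.sqrt(np.abs(std_resid))),   # scale-location
        "resid_lev": (leverage, std_resid, cooks_d),         # residuals vs. leverage
    }


# Similar simulated data to the statsmodels example, built with NumPy only
rng = np.random.default_rng(20)
predictor = rng.normal(loc=20, scale=3, size=30)
response = 5 + 5 * predictor + rng.normal(size=30)
X = np.column_stack([np.ones_like(predictor), predictor])  # intercept column first
diag = diagnostic_quantities(X, response)
```

A useful sanity check is that the leverages sum to the number of model parameters (here 2), because the trace of the hat matrix equals its rank.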

Increase performance

Plotting models fitted on large datasets might be slow. There are some things you can try to speed it up:

1. Tune LOWESS-parameters

The red smoothing lines are calculated using the "Locally Weighted Scatterplot Smoothing" algorithm, which can be quite expensive. Try a lower value for lowess_it and a higher value for lowess_delta to gain speed at the cost of accuracy:

lmdiag.plot(lm, lowess_it=1, lowess_delta=0.02)
# Defaults are: lowess_it=2, lowess_delta=0.005

(For details about those parameters, see statsmodels docs.)
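To see the trade-off in isolation, the sketch below calls statsmodels' `lowess` directly on simulated residuals, once with settings like the defaults and once with the faster ones. It assumes that lmdiag forwards `lowess_it` as `it` and scales `lowess_delta` by the range of the x values to get statsmodels' absolute `delta`. This has not been checked against lmdiag's source. The smoothing span `frac=2/3` is also an assumption (it is statsmodels' default).

```python
import time

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Simulated residuals vs. fitted values for a larger model (5,000 observations)
rng = np.random.default_rng(0)
fitted = rng.uniform(0, 10, size=5_000)
resid = rng.normal(size=5_000)
x_range = fitted.max() - fitted.min()


def smooth(it, delta_fraction):
    # it: number of robustifying iterations (each one repeats the local fits)
    # delta: points closer than this (in x units) to the last fitted point are
    #        linearly interpolated instead of getting their own local regression
    start = time.perf_counter()
    curve = lowess(resid, fitted, frac=2 / 3, it=it, delta=delta_fraction * x_range)
    return curve, time.perf_counter() - start


accurate, t_accurate = smooth(it=2, delta_fraction=0.005)
fast, t_fast = smooth(it=1, delta_fraction=0.02)
print(f"it=2, delta=0.005 * range: {t_accurate:.3f} s")
print(f"it=1, delta=0.02 * range:  {t_fast:.3f} s")

# Both results are sorted by the same x values, so the curves compare directly
print("max difference between curves:", np.abs(accurate[:, 1] - fast[:, 1]).max())
```

Larger `delta` values mean fewer local regressions, and lower `it` means fewer passes over the data. Both reduce runtime, but the smoothed line may deviate more from the fully computed one.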

2. Change matplotlib backend

Try a different matplotlib backend. Static backends such as Agg or Cairo in particular should be faster, e.g.:

import matplotlib
matplotlib.use('agg')
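With a static backend no window opens, so the figure is written to a file or an in-memory buffer instead of being shown. Below is a minimal sketch in which a simple scatter plot stands in for the figure returned by `lmdiag.plot`. That figure is assumed here to be a regular matplotlib `Figure`, as the `fig.show()` call in the example suggests.

```python
import io

import matplotlib

matplotlib.use("agg")  # select the non-interactive Agg backend before plotting
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for: fig = lmdiag.plot(lm)
fig, ax = plt.subplots()
ax.scatter(np.arange(10), np.arange(10) ** 2)

# Render to PNG in memory (use a filename such as "diagnostics.png" to write a file)
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)  # free the figure's memory when producing many plots
png_bytes = buf.getvalue()
```

Under Agg, calling `fig.show()` does not open a window, so saving the figure is the way to look at the result.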

Setup development environment

python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
pre-commit install

lmdiag's People

Contributors

dynobo

lmdiag's Issues

Python 2 compatibility

@dynobo I want to start off by saying that I love this package. Inferential statistics is super easy in R thanks to things like plot.lm, so lmdiag really helps bridge the gap between Python and R! I was hunting down an easy way to do something similar to plot.lm in Python without having to manually build a bunch of the plots, so I was very happy to find this package.

So I found one case in info.py where you used some Python 3-specific string formatting that prevents this package from working under Python 2. If you're only planning on having this be a Python 3 package, I completely respect that, though in that case it would be nice to add a note saying so in the README. If you are interested in making this package work for Python 2, here is a super quick fix I made in a fork of this package that fixes the issue for Python 2 (note: I haven't done extensive testing of this fix, I just know it lets the package be imported without raising an error).

Slow for large datasets

Thank you for creating a nice package. It is very handy to be able to install such functionality with pip.

However, I find this package to be rather slow for large datasets compared with the LinearRegDiagnostic class described in the "Linear regression diagnostics" example of statsmodels. This may indicate some inefficiency in the package.

Example:

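import numpy as np  # needed for np.log10 / np.log inside the formula
import statsmodels.api as sm
import statsmodels.formula.api as smf
import lmdiag

df = sm.datasets.get_rdataset("ames", "openintro").data  # downloads the dataset
res = smf.ols("np.log10(price) ~ Q('Overall.Qual') + np.log(area)", df).fit()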

lmdiag

%%time
lmdiag.plot(res)
CPU times: user 15.1 s, sys: 215 ms, total: 15.3 s
Wall time: 16.1 s

LinearRegDiagnostic

%%time
LinearRegDiagnostic(res)()
CPU times: user 2.17 s, sys: 125 ms, total: 2.29 s
Wall time: 2.17 s
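`%%time` is an IPython cell magic, so it only works in Jupyter or IPython. To repeat such a comparison in a plain script, a small context manager can report CPU time (`time.process_time`) and wall-clock time (`time.perf_counter`), roughly like `%%time` does. The `timed` helper below is hypothetical and not part of lmdiag or statsmodels. A cheap busy loop stands in for the plotting call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Hypothetical helper: report CPU and wall time of the enclosed block."""
    cpu_start = time.process_time()   # CPU time of this process (user + system)
    wall_start = time.perf_counter()  # monotonic wall-clock timer
    try:
        yield
    finally:
        cpu = time.process_time() - cpu_start
        wall = time.perf_counter() - wall_start
        if results is not None:
            results[label] = {"cpu_s": cpu, "wall_s": wall}
        print(f"{label}: CPU time {cpu:.2f} s, wall time {wall:.2f} s")


timings = {}
with timed("busy loop", timings):
    total = sum(i * i for i in range(200_000))
```

In the setting of this issue, it would wrap the calls shown above, e.g. `with timed("lmdiag", timings): lmdiag.plot(res)`, and likewise for `LinearRegDiagnostic(res)()`.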
