pylogit's Introduction

PyLogit

PyLogit is a Python package for performing maximum likelihood estimation of conditional logit models and similar discrete choice models.

Main Features

  • It supports
    • Conditional Logit (Type) Models
      • Multinomial Logit Models
      • Multinomial Asymmetric Models
        • Multinomial Clog-log Model
        • Multinomial Scobit Model
        • Multinomial Uneven Logit Model
        • Multinomial Asymmetric Logit Model
    • Nested Logit Models
    • Mixed Logit Models (with Normal mixing distributions)
  • It supports datasets where the choice set differs across observations
  • It supports model specifications where the coefficient for a given variable may be
    • completely alternative-specific
      (i.e. one coefficient per alternative, subject to identification of the coefficients),
    • subset-specific
      (i.e. one coefficient per subset of alternatives, where each alternative belongs to only one subset, and there are more than 1 but less than J subsets, where J is the maximum number of available alternatives in the dataset),
    • completely generic
      (i.e. one coefficient across all alternatives).

Installation

Available from PyPI:

pip install pylogit

Available through Anaconda:

conda install -c conda-forge pylogit

or

conda install -c timothyb0912 pylogit

Usage

For Jupyter notebooks filled with examples, see examples.

For More Information

For more information about the asymmetric models that can be estimated with PyLogit, see the following paper

Brathwaite, T., & Walker, J. L. (2018). Asymmetric, closed-form, finite-parameter models of multinomial choice. Journal of Choice Modelling, 29, 78–112. https://doi.org/10.1016/j.jocm.2018.01.002

A free and better-formatted version is available on arXiv.

Attribution

If PyLogit (or its constituent models) is useful in your research or work, please cite this package by citing the paper above.

License

Modified BSD (3-clause). See here.

Changelog

See here.

pylogit's People

Contributors

gboeing, sash-ko, synapticarbors, timothyb0912

pylogit's Issues

Create contributors updating process

Create or find an automated way to update contributors.rst based on the merged pull requests or git commit record of the develop branch.

For instance, I need to add danphan from #72.

Feature Request : Regularization

I'm not sure whether this strays too far from the intended purpose of the package, but would it be possible for pylogit to implement L1 or L2 regularization, similar to what scikit-learn does?

Add contributing.md

What

This package is missing a CONTRIBUTING.md file.
Contributors therefore start without clarity on how to:

  • begin
  • understand pylogit internals
  • set up their development environment
  • document their architectural / coding design decisions
  • ensure that their code meets minimum package standards
  • document their changes in docstrings / the changelog / example-notebooks
  • prepare their contribution for review
    • run tests
    • prepare their PR (including what branch to target)

This has already come up here and here.

I should add a CONTRIBUTING.md to address all of the issues above.

relation to softmax

Hi,

Yesterday I reread the paper. I woke up thinking that the logit-type formulation is similar to the softmax normalization. I am a little surprised the comparison was not in the paper.

This is not a call to action, but the start of a conversation. I was going to send you an email, but thought I may as well work in public.

To what degree can the paper be described as:

  • Probability of choice is softmax of utility.
  • Utility is S(data).
  • Where S is a nonlinear function
  • Justify and test 4 examples for S
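
For reference in this comparison: within a single choice set, the multinomial logit probabilities are the softmax of the systematic utilities. A minimal NumPy sketch, assuming one observation with three alternatives; the function name `softmax` and the utility values are illustrative and not taken from PyLogit or the paper:

```python
import numpy as np


def softmax(utilities):
    """Return exp(u_j) / sum_k exp(u_k), computed in a numerically stable way."""
    shifted = utilities - np.max(utilities)  # subtracting the max avoids overflow
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum()


# Hypothetical systematic utilities for one observation's three alternatives.
utilities = np.array([1.0, 0.5, -0.2])
probs = softmax(utilities)
print(probs)
```

Adding a constant to every utility leaves the probabilities unchanged, which is the same identification property that discrete choice models rely on.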

Memory error when fitting model

First, thank you so much for creating this wonderful library!

I'm using pylogit for a multinomial logit model that currently, for test purposes, has just four parameters. I can successfully fit the model with up to about 40,000 rows of training data, but when I use 56,000 rows I get a Python memory error (see below).

I was previously getting a memory error at about 20,000 rows but switched to the develop branch and saw some improvement. My code looks like this:

model = pl.create_choice_model(trainPD,
                               alt_id_col='partid',
                               obs_id_col='segmentid',
                               choice_col='result',
                               specification=spec,
                               model_type='MNL')
model.fit_mle(np.zeros(4))

Does it seem as if I’m bumping against a known limitation, or is there something else going on? Can you recommend any fix or workaround? My model will eventually have 100+ parameters which I’m assuming will make the memory situation worse.

Thanks in advance for your help.

-Barton Listick

Traceback (most recent call last):
  File "model_20.py", line 369, in <module>
    run()
  File "model_20.py", line 256, in run
    model.fit_mle(np.zeros(4))
  File "C:\ProgramData\Anaconda3\lib\site-packages\pylogit\conditional_logit.py", line 401, in fit_mle
    just_point=just_point)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pylogit\estimation.py", line 707, in estimate
    results = calc_and_store_post_estimation_results(results, estimator)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pylogit\estimation.py", line 574, in calc_and_store_post_estimation_results
    estimator.convenience_calc_fisher_approx(final_params)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pylogit\estimation.py", line 417, in convenience_calc_fisher_approx
    return cc.calc_fisher_info_matrix(*args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pylogit\choice_calcs.py", line 931, in calc_fisher_info_matrix
    np.max(rows_to_obs.toarray() * weights[:, None], axis=0)
MemoryError

Estimation process unstable

Hi,

When I try to run your code with my sample data, the estimated coefficients are not stable, and I'm not sure why. Can you have a look? Sometimes even the sign of a coefficient changes. For example, my "Male" variable can have either a positive or a negative coefficient. Can you explain why? Is there some random input in your estimation process?

Update Paper Reference

The readme file only links to the arxiv version. It should be updated to refer to the Journal of Choice Modelling paper.

Refactor

Request

The current codebase should be substantially simplified and clarified. There are a number of code smells (e.g. mega classes with no separation of concerns), and the codebase could benefit from following object oriented programming principles (e.g. composition over inheritance, dependency injection, etc.). The internals of PyLogit should undergo a large architectural refactor.

Things to do include:

  • reflect on and make use of most appropriate design patterns whenever possible. (Helpful tools here)
  • systematically search for and eliminate code smells
  • make use of more classes internally and less reliance on passing tons of arguments around
  • implement composition over inheritance
  • create custom exceptions
  • use python-valid8 for argument validation in functions
  • use attrs for class definition and validation of class initialization arguments
  • use marshmallow or cattr for serialization
  • use pydeps to generate dependency diagrams showing how modules relate to each other
  • use jupytext to version control notebooks
  • grappa for assertions in tests
  • behavioral testing (in Gherkin / English) for user acceptance testing
  • hypothesis for property testing and finding limits to the code
  • papermill for testing that the example notebooks run successfully

Sparse to Dense

In choice_calcs.py line 930, it appears the library is calling rows_to_obs.toarray() which converts a sparse array to a potentially huge dense array (for my use case, the resulting sparse array is small, and the dense version is about 200 GiB).
Here is the existing code:

weights_per_obs =\
        np.max(rows_to_obs.toarray() * weights[:, None], axis=0)

Is the intended behavior given below?

M = rows_to_obs.multiply(weights.reshape(-1,1))
weights_per_obs = np.max(M, axis=0).toarray().reshape(-1)

If so, I think this is a simple fix.
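
One way to check the proposed replacement is to compare both expressions on a small input. The sketch below assumes `rows_to_obs` is a SciPy CSR matrix and `weights` is a 1-D NumPy array; the toy values are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical mapping matrix: 5 long-format rows belonging to 2 observations.
rows_to_obs = sparse.csr_matrix(
    np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
)
weights = np.array([2.0, 2.0, 2.0, 0.5, 0.5])

# Dense version, as in the existing code.
dense_result = np.max(rows_to_obs.toarray() * weights[:, None], axis=0)

# Sparse version: scale each row by its weight, then take the column-wise max.
scaled = rows_to_obs.multiply(weights.reshape(-1, 1)).tocsr()
sparse_result = scaled.max(axis=0).toarray().ravel()

print(dense_result, sparse_result)
```

For this toy input the two results agree, and the sparse version never builds the full dense array.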

UI

Request

Create a Web UI for PyLogit so users don't have to write much (if any) code in order to estimate basic (and perhaps not so basic) discrete choice models. One option is to use Streamlit to create the UI. The experience could draw inspiration from the UI of Biogeme, as a starting point.

Some desired features would be that the following workflow all be doable from the UI:

  • design matrix specification
  • model instantiation
    • utility specification
    • model type selection
    • model argument setting (e.g. intercept reference alternative)
  • model estimation
  • model viewing (post-estimation results and summary)
  • model checking (interactively visualize and save model checking plots from one's model)
  • model serialization
  • model prediction (new data, batch or online)

Scikit-API compliance

Request

Numerous users have noted the desire for PyLogit to conform to the Scikit-Learn API. See for example #23 and #41. Such a change would enable the use of discrete choice models inside scikit learn pipelines.

If #51 is implemented, then using skorch or a wrapper around pytorch-lightning would be one way to achieve sklearn compatibility.

Derive and Implement Analytic Hessian for Mixed Logit

This is on the agenda for the coming months, but pull requests or contributions are always welcome.

At the moment, the sum of the outer products of the gradient is used as an approximation to the actual hessian.

LinAlgError: singular matrix (with covariant features + ridge)

I'm getting a numpy.linalg.linalg.LinAlgError: singular matrix error from line 1216 of base_multinomial_cm_v2

I'm getting these errors seemingly at random (I am setting numpy.random.seed(1) before I train the model). Note that I am working with a feature matrix that has covariant features, but I'm setting the ridge parameter of fit_mle.

Here's a small example of what I think is causing the issue.

import numpy as np
import pandas as pd
import pylogit as pl

from collections import OrderedDict

np.random.seed(1)
# x is a valid feature
x = np.array([1,2,3,1.5,3.5])

# x_redundant is a redundant feature
x_redundant = x * -1

fake_df = pd.DataFrame({"obs_id": [1, 1, 1, 2, 2],
                        "alt_id": [1, 2, 3, 1, 3],
                        "choice": [0, 1, 0, 0, 1],
                        "x": x,
                        "x_redundant": x_redundant,
                        "intercept": [1 for i in range(5)]})

specification = OrderedDict()
specification['x'] = 'all_same'
specification['x_redundant'] = 'all_same'


model = pl.create_choice_model(
    data = fake_df,
    obs_id_col='obs_id',
    alt_id_col='alt_id',
    choice_col='choice',
    specification=specification,
    model_type='MNL'
)

model.fit_mle(np.zeros(2))

If I set model.fit_mle(np.zeros(2), ridge=0.8) the error doesn't get raised in this example, yet in my code I'm setting the ridge=0.8 and the error still gets raised.

I read a few articles/cross-validated posts about singular matrices, but I'm still figuring out what it means. I wish I could be more descriptive in this issue.
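
For context, a matrix is singular when it has no inverse, which happens when one column is an exact linear combination of the others. The sketch below uses X^T X as a simplified stand-in for the curvature of the log-likelihood (an assumption made for illustration; PyLogit's actual hessian also involves the choice probabilities) and the `x` values from the example above.

```python
import numpy as np

x = np.array([1, 2, 3, 1.5, 3.5])
design = np.column_stack([x, -x])  # second column is an exact multiple of the first

gram = design.T @ design  # 2 x 2 matrix, but only rank 1
rank_without_ridge = np.linalg.matrix_rank(gram)

ridge = 0.8
# A ridge penalty contributes curvature on the diagonal only.
penalized = gram + 2 * ridge * np.identity(2)
rank_with_ridge = np.linalg.matrix_rank(penalized)
inverse = np.linalg.inv(penalized)  # succeeds because the matrix is now full rank

print(rank_without_ridge, rank_with_ridge)
```

This only illustrates why redundant columns make the unpenalized matrix non-invertible; it does not reproduce the error seen in the larger dataset.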

Thanks for the help.

EDIT:
Here's a screenshot of my hessian when I'm getting the error: (screenshot not preserved)

PyTorch as computational backend

Request

Currently, PyLogit is built atop numpy and scipy.sparse for computation of choice probabilities, gradients, and hessians. This computational backend has at least two problems.

  1. It restricts us to analytical derivatives that must be programmed by hand.
  2. It practically restricts us to batch optimization since stochastic optimization methods are currently only in libraries with automatic differentiation support.

Note, packages such as autograd and jax are of no help here because they don't support sparse matrices.

PyLogit should move to using PyTorch as its computational backend. There are almost no immediately known downsides. Upsides include resolving both problems above, allowing essentially arbitrary choice models to be estimated through PyLogit, and providing access to a large and growing ecosystem of tools that are all designed around PyTorch (e.g. for model serialization, for scikit-learn compatibility, for standardization of model estimation code by end users, etc.).

Question about using Pylogit

I am very new to Python programming so please excuse me if my questions are naive.

I think the features of pylogit fit perfectly with what I want to do so I tried to get familiar with using it for a Random Coefficients model by simulating some data and then using the example notebooks to process the data and estimate the model parameters. So the problem I created is very simple, 300 "consumers" choose, on five occasions, between one of 4 alternative products, whose utilities are composed of Brand specific intercept and a price term (beta*Price_alternative). So I transformed these 1500 observations into the long form and tried to use Pylogit to estimate the parameters.

When I generate data from a simple MNL, no random coefficients, it works fine. But in the actual RCL, the values seem very sensitive to starting values and don't move much from the starting values. The thing that struck me the most though was the reported log-likelihood when all the parameters are zero. With 4 alternatives, and 1500 choices this should be 1500*Ln(0.25), but the weird thing is that the number that pylogit reports is very different (-1500 or so) instead of -2079, which makes me think I may be doing something else wrong.
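
The expected value quoted above can be checked directly. A minimal sketch, assuming every one of the 1,500 choice situations offers all four alternatives and contributes one term to the log-likelihood (the variable names are illustrative):

```python
import math

num_choices = 300 * 5  # 300 consumers, five choice occasions each
num_alternatives = 4

# With all parameters at zero, every alternative has probability 1 / J.
null_log_likelihood = num_choices * math.log(1.0 / num_alternatives)
print(round(null_log_likelihood, 2))
```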

Happy to send you the dataset and my Python notebook. I'd really appreciate any insights you might have about this problem.

Regards,
Siddarth

TypeError: invalid type comparison In[14]: of 'Mlogit Benchmark 2: Kenneth Train's Heating Data'

Hi Timothy,

Thank you so much for sharing this wonderful package! I'm new to the mixed logit model. I've run this Heating dataset in R successfully. When I ran the above-mentioned example with Pylogit, an error arose at In[14].
I downloaded 'mlogit_Benchmark--Heating-checkpoint.ipynb' and ran it in the Jupyter notebook of Anaconda3 for Python 3.5. I installed Pylogit two days ago with 'conda install -c timothyb0912 pylogit'. Here is the error information from Jupyter:

TypeError Traceback (most recent call last)
in ()
6 specification=model_1_spec,
7 model_type="MNL",
----> 8 names=model_1_names)
9
10 # Estimate the given model, starting from a point of all zeros

D:\Program Files\Anaconda3\lib\site-packages\pylogit\pylogit.py in create_choice_model(data, alt_id_col, obs_id_col, choice_col, specification, model_type, intercept_ref_pos, shape_ref_pos, names, intercept_names, shape_names, nest_spec, mixing_id_col, mixing_vars)
223 choice_col,
224 specification,
--> 225 **model_kwargs)

D:\Program Files\Anaconda3\lib\site-packages\pylogit\conditional_logit.py in __init__(self, data, alt_id_col, obs_id_col, choice_col, specification, names, *args, **kwargs)
296 specification,
297 names=names,
--> 298 model_type=model_type_to_display_name["MNL"])
299
300 # Store the utility transform function

D:\Program Files\Anaconda3\lib\site-packages\pylogit\base_multinomial_cm_v2.py in __init__(self, data, alt_id_col, obs_id_col, choice_col, specification, intercept_ref_pos, shape_ref_pos, names, intercept_names, shape_names, nest_spec, mixing_vars, mixing_id_col, model_type)
877 specification,
878 alt_id_col,
--> 879 names=names)
880
881 ##########

D:\Program Files\Anaconda3\lib\site-packages\pylogit\choice_tools.py in create_design_matrix(long_form, specification_dict, alt_id_col, names)
694 else: # the group is an integer
695 # Create the variable column
--> 696 new_col_vals = ((long_form[alt_id_col] == group).values *
697 long_form[variable].values)
698 independent_vars.append(new_col_vals)

D:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
1281
1282 with np.errstate(all='ignore'):
-> 1283 res = na_op(values, other)
1284 if is_scalar(res):
1285 raise TypeError('Could not compare {typ} type with Series'

D:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
1167 result = method(y)
1168 if result is NotImplemented:
-> 1169 raise TypeError("invalid type comparison")
1170 else:
1171 result = op(x, y)

TypeError: invalid type comparison

I also copied and pasted the code to Spyder in Anaconda3. It generated the same error.
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Larry/Google Drive/CS/Programming_languages/Python/Python programs/exer_19_03_ML_mixlogit_Pylogit.py', wdir='C:/Users/Larry/Google Drive/CS/Programming_languages/Python/Python programs')

File "D:\Program Files\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
execfile(filename, namespace)

File "D:\Program Files\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Larry/Google Drive/CS/Programming_languages/Python/Python programs/exer_19_03_ML_mixlogit_Pylogit.py", line 147, in
names=model_1_names)

File "D:\Program Files\Anaconda3\lib\site-packages\pylogit\pylogit.py", line 225, in create_choice_model
**model_kwargs)

File "D:\Program Files\Anaconda3\lib\site-packages\pylogit\conditional_logit.py", line 298, in __init__
model_type=model_type_to_display_name["MNL"])

File "D:\Program Files\Anaconda3\lib\site-packages\pylogit\base_multinomial_cm_v2.py", line 879, in __init__
names=names)

File "D:\Program Files\Anaconda3\lib\site-packages\pylogit\choice_tools.py", line 696, in create_design_matrix
new_col_vals = ((long_form[alt_id_col] == group).values *

File "D:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1283, in wrapper
res = na_op(values, other)

File "D:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1169, in na_op
raise TypeError("invalid type comparison")

TypeError: invalid type comparison

Could you give me some ideas about how to fix it?

Thanks,

Larry

Derive and Implement Analytic Hessian for Nested Logit

I plan to get around to this at some point in the coming months. Pull requests, in the meantime, are always welcome though.

At the moment, the sum of the outer products of the gradient is used as an approximation to the actual hessian.

Plot loss during training

Request

Plot the training loss during model estimation so users can know how the estimation is proceeding. The livelossplot package is probably good for this.

Add method for training in batches to avoid memory errors

Awesome package! Looking forward to probit models :)

Hard for me to provide samples at the moment, but for larger datasets (~10000 observations with an average of 10 alternatives per observation) instantiating the mapping matrices can raise a memory error. This was on a 32gb box. I can't get to the exact call at the moment, but basically memory spikes for one call to allocate a massive matrix.

A batch option to iteratively calculate the utilities would be great.

Prediction on values with no "Choice" column

It seems that prediction requires the data to be in long format. However, if you have new data where the choice is not known and you are trying to predict it, the wide_to_long function does not seem to allow omitting the choice column. Are there plans to fix this?

Future Warning

When running the model with Python 3.7 and NumPy 1.16, I encountered the following FutureWarning:

//anaconda3/lib/python3.7/site-packages/pylogit/choice_tools.py:703: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
design_matrix = np.hstack((x[:, None] for x in independent_vars))
Log-likelihood at zero: -245.0792
Initial Log-likelihood: -245.0792
//anaconda3/lib/python3.7/site-packages/scipy/optimize/_minimize.py:505: RuntimeWarning: Method BFGS does not use Hessian information (hess).
RuntimeWarning)
Estimation Time for Point Estimation: 0.02 seconds.
Final log-likelihood: -240.9183
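
The warning is triggered by passing a generator expression to np.hstack. A sketch of the kind of change that avoids it, using stand-in data for `independent_vars` (the arrays below are invented; only the hstack call mirrors the line quoted in the warning):

```python
import numpy as np

# Stand-ins for the per-variable columns collected while building the design matrix.
independent_vars = [np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])]

# Passing a list (a sequence) instead of a generator avoids the FutureWarning.
design_matrix = np.hstack([x[:, None] for x in independent_vars])
print(design_matrix.shape)
```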

Feature Request : Frailty

I'm not sure if this is already implemented, but does the package offer a way to add a frailty parameter to the likelihood, which simply adds a weight factor to the linear combination of coefficients and parameters?

Equivalent of R strata() term in formula

Hi,
Is there an equivalent of R's strata() term in pylogit?
In R's survival package I would type clogit(y ~x1 + … + xn + strata(c),...) for stratified conditional logistic regression, where c labels the strata. How can I do this in pylogit?

Passing dataset in fit method

I am trying to use pylogit as one member of a group of classifiers and would like to mimic the scikit-learn interface. Is it possible to pass the dataset to the model in the 'fit' method instead of the 'init' method?

Hessian is incorrect with ridge penalty

Right now, the calc_hessian function adds a constant penalty to all elements of the hessian matrix when using a ridge penalty. See Line 848

This is incorrect. The mixed partial derivatives in the hessian should cause the penalty to only be applied to the diagonal.

In other words, instead of hess -= 2 * ridge I should use hess -= 2 * ridge * np.identity(hess.shape[0]).
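
The claim can be checked numerically. The sketch below assumes the penalty has the form ridge * sum(beta ** 2), which is consistent with the factor of 2 in the expressions above; the function names and test values are illustrative and not part of PyLogit.

```python
import numpy as np


def ridge_penalty(beta, ridge):
    """Assumed penalty term: ridge * sum(beta ** 2)."""
    return ridge * np.sum(beta ** 2)


def numerical_hessian(func, beta, step=1e-3):
    """Central finite-difference hessian of a scalar function."""
    k = beta.size
    eye = np.identity(k) * step
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            hess[i, j] = (
                func(beta + eye[i] + eye[j])
                - func(beta + eye[i] - eye[j])
                - func(beta - eye[i] + eye[j])
                + func(beta - eye[i] - eye[j])
            ) / (4 * step ** 2)
    return hess


ridge = 0.8
beta = np.array([0.3, -1.2, 2.0])
approx = numerical_hessian(lambda b: ridge_penalty(b, ridge), beta)
expected = 2 * ridge * np.identity(beta.size)
print(np.round(approx, 6))
```

The off-diagonal entries come out as zero because each squared term depends on a single coefficient, so the mixed partial derivatives vanish.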

new release

@timothyb0912 would you create a release of your fantastic code commits since last winter? This would make installation much easier, especially for python 3 folks. Latest on pypi dates back to December 2016 and there aren't any releases tagged on GitHub.

Clarification on specification

Hey @timothyb0912, thank you for your work on this library!

I'm trying to apply the conditional logit (MNL) model to racing data. I was working through the examples and I don't fully understand what to pass for my specification parameter.

My data is structured like this:

race_id | driver_id | driver_age | attr_a | attr_b | etc.. | was_winner

pl.create_choice_model(
    data       = X,
    alt_id_col = 'driver_id',
    obs_id_col = 'race_id',
    choice_col = 'was_winner',
    specification = ...
)

I saw issue #21 , which is a similar question to mine and currently I'm setting my specification like you suggested:

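from collections import OrderedDict

spec = OrderedDict()  # maps each variable name to its coefficient structure
characteristics = ['driver_age', 'attr_a', 'attr_b']
for variable in characteristics:
    spec[variable] = 'all_same'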

The problem is I don't really understand why I'm setting the specification that way. I don't really understand what the specification param represents.

  • Am I setting the param properly?

  • Is this model the correct type for my problem?

Thanks again for your time writing/maintaining pylogit,
George

Issue when trying to run nested logit example

Great package!, thanks for sharing it and for providing very useful examples...I was trying to run the example for nested logit estimation but I am getting this error after I try to fit the model:

ufunc 'isinf' not supported for the input types, and the inputs could not be safely coerced to any
supported types according to the casting rule ''safe''

These are the calls for the error:

File "..\lib\site-packages\pylogit\nested_logit.py", line 261, in convenience_calc_log_likelihood
log_likelihood = general_log_likelihood(*args, **kwargs)
File "..\lib\site-packages\pylogit\nested_choice_calcs.py", line 317, in calc_nested_log_likelihood
return_type='long_probs')
File "..\lib\site-packages\pylogit\nested_choice_calcs.py", line 157, in calc_nested_probs
inf_idx = np.isposinf(ind_exp_sums_per_nest)
File "..\lib\site-packages\numpy\lib\ufunclike.py", line 34, in func
return f(x, out=out, **kwargs)
File "..\lib\site-packages\numpy\lib\ufunclike.py", line 141, in isposinf
return nx.logical_and(nx.isinf(x), ~nx.signbit(x), out)

Any ideas of what might be happening? @timothyb0912 @gboeing

Weighting

Hey @timothyb0912, it seems that functionality for weighting (particularly for mixed logit) is almost there. I'm wondering if there's a workaround to pass weights to the estimator or if you'd be open to guiding me through the implementation?

It seems like the weights parameter is needed here:

mixl_estimator = MixedEstimator(self,

Equivalent of Biogeme Correlation of Coefficients Table?

First, thank you for your excellent work on PyLogit, it is immensely useful! This may already exist but is there a way in which to produce an equivalent of the Biogeme correlation of coefficients table? I am aware that the covariance matrix can be obtained in Pylogit by negating the hessian and then inverting, but is there a way in which to include the t-statistics and p-values?
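
For illustration, the quantities mentioned above can be derived from a hessian with a few lines of NumPy. This is a sketch under stated assumptions: the coefficient and hessian values are made up, the p-values use the asymptotic standard normal approximation, and nothing here relies on PyLogit's own results objects.

```python
import math

import numpy as np

# Hypothetical estimates and hessian of the log-likelihood at the optimum.
coefs = np.array([0.8, -1.5])
hessian = np.array([[-50.0, 10.0],
                    [10.0, -40.0]])

cov = np.linalg.inv(-hessian)  # asymptotic covariance matrix
std_errs = np.sqrt(np.diag(cov))
t_stats = coefs / std_errs

# Two-sided p-values from the standard normal: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

# Correlation matrix of the coefficient estimates.
corr = cov / np.outer(std_errs, std_errs)
print(np.round(corr, 4))
```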

Thanks!

Remove personal anaconda channel from README

What

Currently, the README references both the conda-forge and the timothyb0912 anaconda channels.
Given that I didn't get the conda skeleton build to succeed for my personal channel, I should remove it from the repo in favor of only pointing users to the conda-forge version.

Pre-commit hooks

Request

To improve code quality and consistency, add pre-commit hooks such as

  • isort / reorder-python-imports
  • flake8
  • black

Fitting into an Ensemble

I am trying to fit the MNL model into an ensemble of scikit-learn classifiers.
Is there a way to modify the syntax manually so that it is similar to scikit-learn's and can fit in an ensemble? That is:

  • when defining the model (init): pass in the hyperparameters, e.g. ridge
  • when fitting (fit_mle): pass in the data instead of passing it at definition time, i.e. the Choice_Column, basic_specification, data, and sample weight

Is there any wrapper for scikit-learn?
I hope this is possible, because pylogit seems to be the only reliable MNL model for Python.

Lower memory usage for hessian calculation with large design matrices

Purpose:

This issue is being opened to centralize and publicize discussion on efforts to reduce the memory usage for the computation of the hessian.

Problem:

Currently, large memory spikes occur during the hessian calculation if one's design matrix is large. See for example, Issue 9 and Issue 22.

Root cause:

The problem seems to be caused by the memory allocation for an array dp_dh used during the calculation of the hessian. If one's design matrix has shape (num_rows, num_columns), then dp_dh has shape (num_rows, num_rows). See line 734.
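
To give a sense of scale, the memory needed for a square array of that shape can be estimated directly. A rough sketch, assuming dp_dh is stored as a dense float64 array; the dataset size and the helper name `dense_float64_gib` are hypothetical:

```python
def dense_float64_gib(num_rows, num_cols):
    """Memory needed for a dense float64 array of the given shape, in GiB."""
    return num_rows * num_cols * 8 / 2 ** 30


# Hypothetical dataset: 10,000 observations with 10 alternatives each.
num_rows = 10_000 * 10
dp_dh_gib = dense_float64_gib(num_rows, num_rows)
print(round(dp_dh_gib, 1))
```

Because the size grows with the square of the number of long-format rows, doubling the dataset roughly quadruples the memory needed under this assumption.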

Helpful information:

For the math of what this matrix is used for, see section 3.3 of the pylogit computation document, especially section 3.3.10. dp_dh in the code is the same as dp_ds in the document. Apologies in advance for confusion caused by writing the code and writing the document at different times with different notation.

Question about syntax

The model I am trying to build predicts the probability of Result = 1 for each sample in a set, subject to the condition that the probabilities within a set sum to 1.

Data

Result  ID  Attribute_1  Attribute_2
0       1   0.34         0.21
1       1   0.24         0.11
0       1   0.13         0.16
0       1   0.29         0.52
1       2   0.14         0.13
0       2   0.17         0.08
0       2   0.38         0.51
0       2   0.31         0.28

basic_specification_custom = OrderedDict()
basic_specification_custom["Attribute_1"] = [?]
basic_specification_custom["Attribute_2"] = [?]
Choice_Column_custom = "Result"
Obs_Id_Column_custom = "ID"
Model = pylogit.create_choice_model(data=simulation_dataset_train,
alt_id_col=,
obs_id_col=Obs_Id_Column_custom,
choice_col=Choice_Column_custom,
specification=basic_specification_custom,
model_type="MNL")

Given that the obs_id_col values already start from 1, what should the alt_id_col and the index in the specification dictionary be? Thanks.

Make static analysis routine

Request

Make static analysis tasks, such as monitoring code complexity and test coverage, run on every pull request. See tools like radon for measuring code complexity.

Performance optimization for large matrices

In choice_calcs.py line 928, the library checks whether weights for computing the weighted log-likelihood were provided. If they were not, it sets them to an array of ones, then performs a multiplication and a per-column max with the rows_to_obs array. However, when rows_to_obs is large, this can lead to an out-of-memory error. If the weights are not provided, or they are all one, I think we can simply set weights_per_obs to an array of ones without doing the multiplication and max operations, which would greatly improve performance.

The existing code:

if weights is None:
    weights = np.ones(design.shape[0])
weights_per_obs =\
    np.max(rows_to_obs.toarray() * weights[:, None], axis=0)

and, the proposed fix:

if weights is None or np.all(weights == 1):
    weights_per_obs = np.ones(rows_to_obs.shape[1])
else:
    weights_per_obs = \
        np.max(rows_to_obs.toarray() * weights[:, None], axis=0)

I have created a pull request to address the issue (see #85).

Model attributes and methods not accessible when using point estimation

When the parameter just_point is set to True in the fit_mle method (and possibly in other methods as well), many of the model's methods and attributes are not accessible. For example, when one tries to use the predict method, the resulting error indicates that the model does not have a coefs attribute. I'm sure we can get around this by assigning the attributes their values.

Please correct me if I am wrong on this issue. Also, more than happy to help solve it.

Create release runbook wiki-page

Request

Create a wiki page on the repo that describes (in detail) the steps to be performed before releasing a new version of PyLogit to PyPI and Conda. Should include instructions about and a checklist for passing all tests, updating the changelog, bumping the package version number, uploading files to PyPI and Conda, etc. See tools like the following for updating one's changelog and the following for bumping one's version number.

Python 2/3 compatibility for xrange()

Hi! I just encountered a small bug using PyLogit on a near-clean install of Anaconda Python 3.6.

choice_calcs.py uses the xrange() function, which has been replaced by range() in Python 3. To get around this, it tries to import xrange() from the future/past library. But Anaconda Python 3 no longer installs future/past by default, so this failed silently.

https://github.com/timothyb0912/pylogit/blob/master/pylogit/choice_calcs.py#L575
https://github.com/timothyb0912/pylogit/blob/master/pylogit/choice_calcs.py#L17-L21

Two easy potential solutions:

  1. Add the future package to requirements.txt

  2. Replace xrange() with range()

The second seems better, but I haven't verified that it will work. I'm happy to take care of this and submit a PR, but in the meantime wanted to write it up in case you've looked into this before!
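
A third possibility, shown here only as a sketch and not as code from the repository, is a dependency-free fallback that defines xrange when it is missing:

```python
try:
    xrange  # defined on Python 2 only
except NameError:
    xrange = range  # on Python 3, range is already lazy

total = sum(i for i in xrange(5))
print(total)
```

This keeps existing xrange() calls working on both interpreters without adding the future package as a requirement.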

Does the package work with outside option?

I am trying to fit the model with an outside (no-purchase) option. In that case, no features are available for this outside alternative. Am I able to fit the MNL model using this package?

Conditional Logistic Regression

When modeling a conditional logit model with five parameters using the MNL model type, what will be the specification dictionary for those attributes?
