
Comments (6)

ricardoV94 commented on May 28, 2024

@AlexAndorra,

A better model would be one that actually corresponds to the data-generating process. To achieve that, it should include the income information of all categories:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as m_11_13:
    a = pm.Normal('a', 0., 1., shape=2)
    b = pm.HalfNormal('b', 0.5)
    s1 = a[0] + b*income[0]
    s2 = a[1] + b*income[1]
    s3 = 0 + b*income[2]
    
    p = tt.nnet.softmax(pm.math.stack([s1, s2, s3]))
    obs = pm.Categorical('career', p=p, observed=career)
    
    trace_11_13 = pm.sample(target_accept=.95)

The model above runs just fine and makes correct inferences. Equivalently, if you want to shift one of the categories to be zero before the softmax transformation (which is not needed), you should do the following:

with pm.Model() as m_11_13_alt:
    a = pm.Normal('a', 0., 1., shape=2)
    b = pm.HalfNormal('b', 0.5)
    s1 = a[0] + b*(income[0] - income[2])
    s2 = a[1] + b*(income[1] - income[2])
    s3 = 0
    
    p = tt.nnet.softmax(pm.math.stack([s1, s2, s3]))
    obs = pm.Categorical('career', p=p, observed=career)
    
    trace_11_13_alt = pm.sample(target_accept=.95)

This is not the same as simply setting the unnormalized probability of the pivot category to zero, as the original model was doing. I think this stems from a confusion (one I also used to have) about conventional multinomial models, where the predictors are shared and different choices have different coefficients. In that more common case, setting the pivot's coefficients (and intercept) to zero is exactly the same as setting the unnormalized probability of the pivot category to zero.
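The difference between the two pivoting strategies can be checked numerically. This is a minimal sketch with made-up intercepts, income values, and coefficient (none of these numbers come from the notebook): subtracting the third category's score from all scores leaves the softmax probabilities unchanged, while simply overwriting that score with zero silently drops the b*income[2] term and changes the probabilities.

```python
import numpy as np

def softmax(s):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(s - np.max(s))
    return e / e.sum()

# Hypothetical numbers for illustration only: intercepts for the first two
# categories, a shared income coefficient, and per-category incomes.
a = np.array([0.3, -0.5])
b = 0.4
income = np.array([1.0, 2.0, 5.0])

scores = np.array([a[0] + b * income[0],
                   a[1] + b * income[1],
                   b * income[2]])

# Correct pivot: subtract the third score everywhere
# (softmax is invariant to adding a constant to all scores).
pivoted = scores - scores[2]
assert np.allclose(softmax(scores), softmax(pivoted))

# Incorrect "pivot": overwrite the third score with 0,
# which drops the b * income[2] term and changes the probabilities.
zeroed = scores.copy()
zeroed[2] = 0.0
assert not np.allclose(softmax(scores), softmax(zeroed))
```

This is exactly why m_11_13 and m_11_13_alt above agree with each other but not with the original model.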

Here is a Colab Notebook showing the results: https://colab.research.google.com/drive/1GajarpZ3M-QLs7pQWLDpoqC-YOWBp7aU?usp=sharing

from pymc-resources.

AlexAndorra commented on May 28, 2024

Thanks for your insight @ricardoV94! Indeed, the original model has some problems. I did try to bound the b coefficient, but (I don't remember exactly why, though I do remember banging my head against the wall quite a lot 😅) it didn't work at all -- I think it's related to the choice of pivot.

I reached out to R. McElreath about this but never got an answer. So I opted for the model that seemed most accurate and closest to the book (and as you can see, b is indeed inferred to be positive when you take the right pivot).

As a result, I don't think it's an issue with the PyMC model per se, so I'm closing this -- but feel free to reopen if you think you can do a PR with a better model; that'd be awesome!


AlexAndorra commented on May 28, 2024

Thanks a lot @ricardoV94, this makes sense and is very interesting!
Wanna make a PR to implement this in the NB? Also, do you happen to have a reference that helped you clear that confusion between the two types of multinomial models?


ricardoV94 commented on May 28, 2024

@AlexAndorra,

I can do the PR if you think it makes sense to include this model, which differs from the one implemented in the book (then again, the current one is already different).

Two sources really helped me clear the confusion:

  • Hoffman, S. D., & Duncan, G. J. (1988). Multinomial and conditional logit discrete-choice models in demography. Demography, 25(3), 415-427.
  • Croissant, Y. (2020). Estimation of Random Utility Models in R: The mlogit Package. Journal of Statistical Software, 95(1), 1-41.

The second paper describes the R library mlogit, which can fit these types of multinomial models. You can jump to section 2.2, Model description, for the relevant distinction between shared and unique coefficients/covariates.

The terms are a bit confusing, but this gets to the heart of it:

It is clear from the previous expression that coefficients of choice situation specific variables (the intercept being one of those) should be alternative specific, otherwise they would disappear in the differentiation. Moreover, only differences of these coefficients are relevant and can be identified. For example, with three alternatives 1, 2 and 3, the three coefficients γ1, γ2, γ3 associated to a choice situation specific variable cannot be identified, but only two linear combinations thereof. Therefore, one has to make a choice of normalization and the simplest one is to simply set γ1 = 0.

Coefficients for alternative and choice situation specific variables may (or may not) be alternative specific. For example, transport time is alternative specific, but 10 min in public transport may not have the same impact on utility than 10 min in a car. In this case, alternative specific coefficients are relevant. Monetary cost is also alternative specific, but in this case, one can consider that 1$ is 1$ whether it is spent for the use of a car or in public transports. In this case, a generic coefficient is appropriate.

He doesn't say it outright, but the second type of coefficient does not need to be pivoted, as you can also see from the output of the model in section 3.5, Application.
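The identification point in the first quoted paragraph can also be checked numerically. A small sketch with made-up values: when a variable x is shared across alternatives (choice situation specific), adding any constant c to every per-alternative coefficient γ_k shifts all scores by the same amount c*x, so the softmax probabilities, and hence the likelihood, cannot distinguish γ from γ + c; only differences such as γ_k − γ_1 are identified, which is why one normalizes γ1 = 0.

```python
import numpy as np

def softmax(s):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(s - np.max(s))
    return e / e.sum()

# Shared (choice situation specific) variable with per-alternative
# coefficients; all numbers are arbitrary, for illustration only.
x = 1.7                              # same value for every alternative
gamma = np.array([0.8, -0.3, 0.5])   # one coefficient per alternative
c = 10.0                             # arbitrary constant shift

p_original = softmax(gamma * x)
p_shifted = softmax((gamma + c) * x)  # every score moves by c * x

# Identical probabilities: the data cannot tell gamma from gamma + c,
# so only two coefficient differences are identified.
assert np.allclose(p_original, p_shifted)
```

In the income model above the covariate differs across alternatives, so the shared coefficient b does not suffer from this degeneracy; that is the distinction the quote is drawing.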

The author also clears up some of the confusion around the different terms people have been using:

A logit model with only choice situation specific variables is sometimes called a multinomial logit model, one with only alternative specific variables a conditional logit model and one with both kind of variables a mixed logit model. This is seriously misleading: conditional logit model is also a logit model for longitudinal data in the statistical literature and mixed logit is one of the names of a logit model with random parameters. Therefore, in what follows, we will use the name multinomial logit model for the model we have just described whatever the nature of the explanatory variables used.


AlexAndorra commented on May 28, 2024

This is super useful! Honestly, I think this would clearly be valuable as a PR (you can briefly summarize the problem and the differences as you did here, plus link to the two references).
As the original model from the book seems to have issues, we might as well err towards a statistically sounder one, while making our choice explicit -- and as you said, the PyMC3 model is already different 🤷‍♂️
Reopening as a consequence, to link the issue to the PR 😉


ricardoV94 commented on May 28, 2024

Will do ;)

