
Comments (6)

ricardoV94 commented on May 28, 2024

@AlexAndorra,

A better model would be one that actually corresponds to the data-generating process. To achieve that, it should include the income information of all categories:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as m_11_13:
    a = pm.Normal('a', 0., 1., shape=2)
    b = pm.HalfNormal('b', 0.5)
    s1 = a[0] + b*income[0]
    s2 = a[1] + b*income[1]
    s3 = 0 + b*income[2]
    
    p = tt.nnet.softmax(pm.math.stack([s1, s2, s3]))
    obs = pm.Categorical('career', p=p, observed=career)
    
    trace_11_13 = pm.sample(target_accept=.95)

The model above runs just fine and makes correct inferences. Equivalently, if you want to shift one of the categories to be zero before the softmax transformation (which is not needed), you should do the following:

with pm.Model() as m_11_13_alt:
    a = pm.Normal('a', 0., 1., shape=2)
    b = pm.HalfNormal('b', 0.5)
    s1 = a[0] + b*(income[0] - income[2])
    s2 = a[1] + b*(income[1] - income[2])
    s3 = 0
    
    p = tt.nnet.softmax(pm.math.stack([s1, s2, s3]))
    obs = pm.Categorical('career', p=p, observed=career)
    
    trace_11_13_alt = pm.sample(target_accept=.95)

This is not the same as simply setting the unnormalized probability of the pivot category to zero, as the original model was doing. I think this stems from a confusion (one I also used to have) about conventional multinomial models, where the predictors are shared and different choices have different coefficients. In that more common case, setting the pivot's coefficients (and intercept) to zero is exactly the same as setting the unnormalized probability of the pivot category to zero.
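The difference between the two pivoting strategies can be checked numerically. This is a minimal sketch with made-up intercepts, income values, and coefficient (none of these numbers come from the notebook): subtracting the third category's score from all scores leaves the softmax probabilities unchanged, while simply overwriting that score with zero silently drops the b*income[2] term and changes the probabilities.

```python
import numpy as np

def softmax(s):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(s - np.max(s))
    return e / e.sum()

# Hypothetical numbers for illustration only: intercepts for the first two
# categories, a shared income coefficient, and per-category incomes.
a = np.array([0.3, -0.5])
b = 0.4
income = np.array([1.0, 2.0, 5.0])

scores = np.array([a[0] + b * income[0],
                   a[1] + b * income[1],
                   b * income[2]])

# Correct pivot: subtract the third score everywhere
# (softmax is invariant to adding a constant to all scores).
pivoted = scores - scores[2]
assert np.allclose(softmax(scores), softmax(pivoted))

# Incorrect "pivot": overwrite the third score with 0,
# which drops the b * income[2] term and changes the probabilities.
zeroed = scores.copy()
zeroed[2] = 0.0
assert not np.allclose(softmax(scores), softmax(zeroed))
```

This is exactly why m_11_13 and m_11_13_alt above agree with each other but not with the original model.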

Here is a Colab Notebook showing the results: https://colab.research.google.com/drive/1GajarpZ3M-QLs7pQWLDpoqC-YOWBp7aU?usp=sharing

from pymc-resources.

AlexAndorra commented on May 28, 2024

Thanks for your insight @ricardoV94! Indeed, the original model has some problems. I did try to bound the b coefficient, but (I don't remember exactly why, though I do remember banging my head against the wall quite a lot 😅) it didn't work at all -- I think it's related to the choice of pivot.

I reached out to R. McElreath about this but never got an answer. So I opted for the model that seemed most accurate and closest to the book (and as you can see, b is indeed inferred to be positive when you take the right pivot).

As a result, I don't think it's an issue with the PyMC model per se, so I'm closing this -- but feel free to reopen if you think you can do a PR with a better model; that'd be awesome!


AlexAndorra commented on May 28, 2024

Thanks a lot @ricardoV94, this makes sense and is very interesting!
Wanna make a PR to implement this in the NB? Also, do you happen to have a reference that helped you clear that confusion between the two types of multinomial models?


ricardoV94 commented on May 28, 2024

@AlexAndorra,

I can do the PR if you think it makes sense to include this model, which differs from the one implemented in the book (then again, the current one is already different).

Two sources really helped me clear the confusion:

  • Hoffman, S. D., & Duncan, G. J. (1988). Multinomial and conditional logit discrete-choice models in demography. Demography, 25(3), 415-427.
  • Croissant, Y. (2020). Estimation of Random Utility Models in R: The mlogit Package. Journal of Statistical Software, 95(1), 1-41.

The second paper describes the R library mlogit, which can fit these types of multinomial models. You can jump to section 2.2, Model description, for the relevant distinction between shared and unique coefficients/covariates.

The terms are a bit confusing, but this gets to the heart of it:

It is clear from the previous expression that coefficients of choice situation specific variables (the intercept being one of those) should be alternative specific, otherwise they would disappear in the differentiation. Moreover, only differences of these coefficients are relevant and can be identified. For example, with three alternatives 1, 2 and 3, the three coefficients γ1, γ2, γ3 associated to a choice situation specific variable cannot be identified, but only two linear combinations thereof. Therefore, one has to make a choice of normalization and the simplest one is to simply set γ1 = 0.

Coefficients for alternative and choice situation specific variables may (or may not) be alternative specific. For example, transport time is alternative specific, but 10 min in public transport may not have the same impact on utility than 10 min in a car. In this case, alternative specific coefficients are relevant. Monetary cost is also alternative specific, but in this case, one can consider that 1$ is 1$ whether it is spent for the use of a car or in public transports. In this case, a generic coefficient is appropriate.

He doesn't say it outright, but the second type of coefficient does not need to be pivoted, as you can also see from the output of the model in section 3.5, Application.
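The identification point in the first quoted paragraph can also be checked numerically. A small sketch with made-up values: when a variable x is shared across alternatives (choice situation specific), adding any constant c to every per-alternative coefficient γ_k shifts all scores by the same amount c*x, so the softmax probabilities, and hence the likelihood, cannot distinguish γ from γ + c; only differences such as γ_k − γ_1 are identified, which is why one normalizes γ1 = 0.

```python
import numpy as np

def softmax(s):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(s - np.max(s))
    return e / e.sum()

# Shared (choice situation specific) variable with per-alternative
# coefficients; all numbers are arbitrary, for illustration only.
x = 1.7                              # same value for every alternative
gamma = np.array([0.8, -0.3, 0.5])   # one coefficient per alternative
c = 10.0                             # arbitrary constant shift

p_original = softmax(gamma * x)
p_shifted = softmax((gamma + c) * x)  # every score moves by c * x

# Identical probabilities: the data cannot tell gamma from gamma + c,
# so only two coefficient differences are identified.
assert np.allclose(p_original, p_shifted)
```

In the income model above the covariate differs across alternatives, so the shared coefficient b does not suffer from this degeneracy; that is the distinction the quote is drawing.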

The author also clears up some of the confusion around the different terms people have been using:

A logit model with only choice situation specific variables is sometimes called a multinomial logit model, one with only alternative specific variables a conditional logit model and one with both kind of variables a mixed logit model. This is seriously misleading: conditional logit model is also a logit model for longitudinal data in the statistical literature and mixed logit is one of the names of a logit model with random parameters. Therefore, in what follows, we will use the name multinomial logit model for the model we have just described whatever the nature of the explanatory variables used.


AlexAndorra commented on May 28, 2024

This is super useful! Honestly, I think this would clearly be valuable as a PR (you can briefly summarize the problem and the differences as you did here, plus link to the two references).
As the original model from the book seems to have issues, we might as well err towards a statistically sounder one, while making our choice explicit -- and as you said, the PyMC3 model is already different 🤷‍♂️
Reopening as a consequence, to link the issue to the PR 😉


ricardoV94 commented on May 28, 2024

Will do ;)

