jwiley / brmsmargins Goto Github PK

View Code? Open in Web Editor NEW

20.0 20.0 2.0 1.13 MB

Average marginal estimates for Bayesian models fit with the brms package

Home Page: https://joshuawiley.com/brmsmargins

License: Other

R 89.46% C++ 10.08% C 0.46%

brmsmargins's People

Contributors

Stargazers

Watchers

Forkers

allisterh jwrozelle

brmsmargins's Issues

Question about difference between

Dear Joshua,

Thank you very much for developing the brmsmargins package. I am trying to understand a paragraph included in the help file for the brmsmargins() function and would appreciate any clarifications you are able to provide. The paragraph in question is as follows:

"Use the fitted() function to generate predictions based on this modified dataset. If effects is set to “fixedonly” (meaning only generate predictions using fixed effects) or to “includeRE” (meaning generate predictions using fixed and random effects), then predictions are generated entirely using the fitted() function and are, typically back transformed to the response scale. For mixed effects models with fixed and random effects where effects is set to “integrateoutRE”, then fitted() is only used to generate predictions using the fixed effects on the linear scale. For each prediction generated, the random effects are integrated out by drawing k random samples from the model assumed random effect(s) distribution. These are added to the fixed effects predictions, back transformed, and then averaged over all k random samples to perform numerical Monte Carlo integration."

Question 1

Can you confirm whether the last part of the above paragraph, which starts with "For each prediction generated, the random effects are integrated out", applies to each of the settings effects = "fixedonly”, effects = "includeRE" and effects =“integrateoutRE”? Or does it apply only to the effects =“integrateoutRE” setting?

Question 2

Can you confirm whether both effects = "fixedonly” and effects =“integrateoutRE” ignore the random effects (i.e., they set them to 0), with the difference between them being that effects = "fixedonly” returns the results on the natural response scale while effects =“integrateoutRE” on the link scale? (For example, for a mixed effects binary regression model, effects = "fixedonly” returns fitted probabilities whereas effects =“integrateoutRE” returns fitted log odds.)

Question 3

If both effects = "fixedonly” and effects =“integrateoutRE” set the random effects in the fitted part of the model to 0, I do not understand why we would want to "integrate out the random effects" for these two settings. The only setting for which it would make sense to "integrate out the random effects" would be effects = "includeRE" (since that is the only setting where the random effects would NOT be set to 0; unless, perhaps, there are multiple random effects in the model and only some of those would be set to 0?). I don't see a re_formula as an argument to brmsmargins() which would be used to indicate which random effects are set to 0 and which are not, so I am not sure how one would even specify which random effects are set to 0 and which are kept in the model?

Question 4

When would one want to use effects = "includeRE" versus effects =“integrateoutRE”?

Question 5

Getting back to the example of a mixed effects binary logistic regression model (with just a random intercept u for simplicity and 2 predictors, X1 and X2), if I needed to compute something like Prob(Y = 1|X1, X2, u) for X1 and X2 set at 2 relevant values x1 and x2 and u averaged across the observed random intercept values, would there be a way to get this out of brmsmargins? If I then needed to compute Prob(Y = 1|X1, X2, u) but averaged across random realizations of u from the assumed distribution of the random intercepts - say, Normal(0, sigma_u^2) with sigma_u^2 estimated from the data - would there be a way to compute this via brmsmargins? What does the latter estimate? What does the former estimate? Why not use the former instead of the latter (e.g., because we can get a better approximation if we use more random effects than what was observed in the data)?

Multivariable model "object 'post' not found", p.s. thank you so much for your amazing work

First of all, this package is great - huge thanks to you and to Paul Buerkner for making bayesian modeling so much more accessible for those among us that are slow on learning STAN / JAGS.

edit: correction to a previous version of this issue.

I stumbled on your package trying to estimate average marginal effects for a brms model, and was getting Error in brmsmargins(model.fit$model) : object 'post' not found. Couldn't figure out why when I was reviewing the function code. I have two random intercepts in the model, if that makes a difference, and five fixed effects.

Error when setting newdata argument in brmsmargins

I'm trying to estimate average marginal effects for a societal growth curve model (see Fairbrother 2014) and I've run into an error when specifying the newdata argument. The model is as follows

# Specify the formula for the unconditional linear varying slopes model
sgc_uncond_linvar <- bf(
  churchill ~ time + (1 + time | country_jj) + (1 | survey_tt),
  family = bernoulli(link = "logit"),
  decomp = "QR"
)

# Specify Priors for the model
sgc_uncond_linvar_priors <-
  prior(student_t(3, 0, 2.5), class = "Intercept") +
  prior(normal(0, 2), class = "b") +
  prior(exponential(0.8), class = "sd", group = "survey_tt") +
  prior(exponential(0.8), class = "sd", group = "country_jj") +
  prior(lkj(4), class = "cor")

# Fit the model (6 chains, 3k iterations), takes ~5 hours
sgc_uncond_linvar_fit <- brm(
  formula = sgc_uncond_linvar,
  prior = sgc_uncond_linvar_priors,
  data = model_df,
  cores = 12,
  chains = 6,
  iter = 3000,
  warmup = 1500,
  refresh = 10,
  sample_prior = "yes",
  threads = threading(
    threads = 2,
    grainsize = 500
  ),
  save_pars = save_pars(all = TRUE),
  seed = 666,
  backend = "cmdstanr",
  save_model = str_c(stan_dir, "SGC_HLogit_LV_Unconditional.stan"),
  file = str_c(models_dir, "SGC_HLogit_LV_Unconditional")
)

I then try to estimate an AME for time based on the syntax in the vignette

h <- .001
ames <- brmsmargins(
  object = sgc_uncond_linvar_fit,
  newdata = model.frame(sgc_uncond_linvar_fit) %>% 
    distinct(churchill, time, country_jj, survey_tt),
  add = data.frame(time = c(0, h)),
  contrasts = cbind("AME time" = c(-1 / h, 1 / h)),
  effects = "integrateoutRE", 
  k = 100L, 
  seed = 1234
  )

which fails with the following error

Error in `[.data.table`(data, , ..n) : column(s) not found: L_2[1,1]

Since the time slope only varies across countries there is no L_2 parameter in the model but I'm not sure I understand why its looking for it in the first place?

support multivariate outcomes

Currently brmsmargins does not support any outcome where a call to fitted() would produce more than one value for one input. This means that multivariate models and models like zero-inflated models that have a predicted probability of being zero / non-zero and then predicted value if not zero are not supported.

Error: column(s) not found: [L_1[1,1], L_1[2,1], L_1[1,2], L_1[2,2]]

When I reran the example to get the AMEs and marginal coefficients on the page, I encountered an error like this:

Error: column(s) not found: [L_1[1,1], L_1[2,1], L_1[1,2], L_1[2,2]]

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.9 brmsmargins_0.2.0 brms_2.20.6 Rcpp_1.0.11

Comparison to `marginaleffects` (and a few questions)

Hi @JWiley!

I just spent a half day playing with brmsmargins. Awesome, powerful stuff! There are some really neat design ideas in there.

Cards on the table: I was particularly interested in learning about brmsmargins because I am developing a similar R package: marginaleffects. As you know, this is kind of a crowded field, and people are often asking me how the packages relate to each others. So I started writing a vignette with short discussions and side-by-side syntax comparisons for different packages. Obviously, I know that I am fundamentally biased, but I am still trying hard to be as generous as possible in highlighting the benefits of other packages, and in giving them some exposure (we’re all on the same FOSS team!).

I would like to add brmsmargins to that comparisons page. Here’s a preliminary draft of the section:

Rendered HTML: http://www.arelbundock.com/brmsmargins.html
Rmarkdown: https://github.com/vincentarelbundock/marginaleffects/blob/main/sandbox/brmsmargins.Rmd

Note that the notebook requires the development version of marginaleffects:

remotes::install_github("vincentarelbundock/marginaleffects")

I was wondering if you could answer a couple questions I had while trying to replicate the analyses in your vignettes. No pressure if you don’t have time; feel to ignore and close this Issue:

Are there any cool brmsfit features that I ignored in the notebook and that you would like me to highlight?
Do we know why there are small numerical differences between our results in the “integrate out” example?

Cheers!

Vincent

Random-Effect Integration for Factors

Hello,

I'm having a bit of trouble getting random-effect integration to produce marginal sigma effects when the key predictor is a factor. Things work just fine when I ask for just the fixed effects, but they break for the more complex case.

A reproducible example is below:

dmixed <- withr::with_seed(
seed = 12345, code = {
nGroups <- 100
nObs <- 20
theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
theta.location[, 1] <- theta.location[, 1] - 2.5
theta.location[, 2] <- theta.location[, 2] + 1
dmixed <- data.table::data.table(
x = rep(rep(0:1, each = nObs / 2), times = nGroups))
dmixed[, ID := rep(seq_len(nGroups), each = nObs)]

for (i in seq_len(nGroups)) {
  dmixed[ID == i, y := rnorm(
    n = nObs,
    mean = theta.location[i, 1] + theta.location[i, 2] * x,
    sd = exp(1 + theta.location[i, 1] + theta.location[i, 2] * x))
  ]
}
data.table::copy(dmixed)

})

#Making 'x' into a factor#
dmixed$x<-factor(dmixed$x)

ls.me <- bf(y ~ 1 + x + (1+x|ID),
sigma ~ 1 + x + (1+x|ID))
ls.me_out <- brm(ls.me,
data = dmixed,
chains = 4, cores = 8, iter = 5000, control = list(adapt_delta = .85, max_treedepth = 15),
seed = 7766, save_all_pars = T)

#This seems to work#
ame2b.lsme.f <- brmsmargins(
ls.me_out,
at = data.frame(x = c("0", "1")),
contrasts = cbind("AME x" = c(-1, 1)),
dpar = "sigma",
effects = "fixed", k = 100L, seed = 1234)

#This does not seem to work#
ame2b.lsme.i <- brmsmargins(
ls.me_out,
at = data.frame(x = c("0", "1")),
contrasts = cbind("AME x" = c(-1, 1)),
dpar = "sigma",
effects = "integrateoutRE", k = 100L, seed = 1234)

Fix Documentation & Helpful Errors

Thanks to feedback / suggestions from @jwilliamrozelle several priority areas to improve are:

fix documentation for brmsmargins() that misleading suggests both at and add are optional when at least one is required
ensure documentation for brmsmargins() and prediction() are clear (& check for) which arguments are required vs. optional
add helpful errors for all required arguments in brmsmargins()
add helpful errors for arguments in prediction()

support centered, categorical predictors

In multilevel data, categorical predictors should be centered:

Yaremych, H. E., Preacher, K. J., & Hedeker, D. (2021). Centering categorical predictors in multilevel models: Best practices and interpretation. Psychological Methods.

Marginal effects on categorical predictors (e.g., using dummy coding) typically hold values at "0" and "1" and take the difference in the marginal predictions. This breaks when centering by cluster. Instead, plan is to implement something like:

at = c("Low", "High")

for categorical predictors centered by cluster. Approach is to set the categorical predictors at the "0" and "1" values for each individual person. That is, set the "observed" value at 1, then center using the existing mean, then predict. Repeat setting the "observed" value at 1, then center, then predict. The figure below is an example of this.

jwiley / brmsmargins Goto Github PK

brmsmargins's People

Contributors

Stargazers

Watchers

Forkers

brmsmargins's Issues

Question about difference between

Multivariable model "object 'post' not found", p.s. thank you so much for your amazing work

Error when setting newdata argument in brmsmargins

support multivariate outcomes

Error: column(s) not found: [L_1[1,1], L_1[2,1], L_1[1,2], L_1[2,2]]

Comparison to `marginaleffects` (and a few questions)

Random-Effect Integration for Factors

Fix Documentation & Helpful Errors

support centered, categorical predictors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent