Code Monkey home page Code Monkey logo

ggeffects's Introduction

ggeffects - Estimated Marginal Means and Adjusted Predictions from Regression Models

DOI Documentation downloads total

Lüdecke D (2018). ggeffects: Tidy Data Frames of Marginal Effects from Regression Models. Journal of Open Source Software, 3(26), 772. doi: 10.21105/joss.00772

Why do we need (marginal/conditional) effects or (adjusted) predicted values?

After fitting a model, it is useful generate model-based estimates (expected values, or adjusted predictions) of the response variable for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables.

Adjusted predictions or marginal means are often easier to understand than raw regression coefficients. In particular, their visualization allows to intuitively get the idea of how predictors and outcome are associated, even for complex models.

Aims of this package

ggeffects is a light-weight package that aims at easily calculating adjusted predictions and estimated marginal means at meaningful values of covariates from statistical models. Furthermore, it is possible to compute contrasts or pairwise comparisons, to test predictions and differences in predictions for statistical significance. Finally, you can easily produce nice figures to visualize the results.

What you basically would need for your workflow is:

  • predict_response() (understand your results)
  • test_predictions() (check for “significant” results)
  • plot() (communicate your results)

Three core ideas describe the philosophy of the function design and help users to achieve the above mentioned goals:

  1. Functions are type-safe and always return a data frame with the same, consistent structure;

  2. there is a simple, unique approach to calculate adjusted predictions and estimated marginal means for many different models;

  3. the package supports “labelled data” (Lüdecke 2018), which allows human readable annotations for graphical outputs.

This means, users do not need to care about any expensive steps after modeling to visualize the results. The returned as data frame is ready to use with the ggplot2-package, however, there is also a plot()-method to easily create publication-ready figures.

Adjusted predictions or estimated marginal means are always calculated on the response scale, which is the easiest and most intuitive scale to interpret the results.

It is easy to start, you just need one function: predict_response(), and two arguments: the model and the “focal terms”, i.e. the predictors that you are mainly interested in. Examples are shown below.

So, when do I need the ggeffects package?

You should use ggeffects

  • … when you want to understand how predictors and outcome are related, no matter whether you have simple or complex models, interaction or transformed terms. See how to start in this vignette. The syntax for the ggeffects functions is super easy and consistent across the different type of models and complexity.

  • … when you want to perform pairwise comparisons, in order to see whether there are statistically significant differences in the association of, for instance, different groups or categories of your predictors and your outcome (“effects”, or sometimes “marginal effects”). There are several vignettes describing this in detail, starting with this vignette.

  • … when you need impressive figures instead of long, confusing tables to describe your results. There are several examples in the documentation, for example this vignette.

  • … and even when you want to check, whether your model appropriately describes your data. See this vignette to learn more about how to use ggeffects for model diagnostics.

A workflow in R would then include using following functions in this order: predict_response(), plot(), and test_predictions() - that’s all you need! See also this example workflow using logistic regression.

Installation

CRAN parameters status badge

Type Source Command
Release CRAN install.packages("ggeffects")
Development r - universe install.packages("ggeffects", repos = "https://strengejacke.r-universe.dev")
Development GitHub remotes::install_github("strengejacke/ggeffects")

Or you can run ggeffects::install_latest() to install the latest development version from r-universe.

Adjusted predictions at…: marginalizing over non-focal predictors

predict_response() is a wrapper around three “workhorse” functions, ggpredict(), ggemmeans() and ggaverage(). Depending on the value of the margin argument, predict_response() calls one of those functions, with different arguments. It’s important to note that:

  1. Predictions are always returned on the response scale, no matter which model is used. This is the most intuitive scale to interpret your results (the predicted values).

  2. The response is predicted at the values or levels of your focal terms, i.e. you specify the predictors you are mainly interested in, using the terms argument. The predicted values are calculated for these values, while all other predictors are marginalized over.

The margin argument in predict_response() indicates how to marginalize over the non-focal predictors, i.e. those variables that are not specified in terms. Each option answers slightly different questions. Possible values are:

  • "mean_reference" and "mean_mode": "mean_reference" calls ggpredict(), i.e. non-focal predictors are set to their mean (numeric variables), reference level (factors), or “most common” value (mode) in case of character vectors. "mean_mode" calls ggpredict(typical = c(numeric = "mean", factor = "mode")), i.e. non-focal predictors are set to their mean (numeric variables) or mode (factors, or “most common” value in case of character vectors).

    Question answered: “What is the predicted value of the response at meaningful values or levels of my focal terms for a ‘typical’ observation in my data?”, where ‘typical’ refers to certain characteristics of the remaining predictors.

  • "marginalmeans": calls ggemmeans(), i.e. non-focal predictors are set to their mean (numeric variables) or marginalized over the levels or “values” for factors and character vectors. Marginalizing over the factor levels of non-focal terms computes a kind of “weighted average” for the values at which these terms are hold constant. There are different weighting options that can be chosen with the weights argument.

    Question answered: “What is the predicted value of the response at meaningful values or levels of my focal terms for an ‘average’ observation in my data?”. It refers to randomly picking a subject of your sample and the result you get on average.

  • "empirical" (or "counterfactual"): calls ggaverage(), i.e. non-focal predictors are marginalized over the observations in your sample. The response is predicted for each subject in the data and predicted values are then averaged across all subjects, aggregated/grouped by the focal terms. Averaging is applied to “counterfactual” predictions (Dickerman and Hernán 2020). There is a more detailed description in this vignette.

    Question answered: “What is the predicted value of the response at meaningful values or levels of my focal terms for the ‘average’ observation in the population?”. It does not only refer to the actual data in your sample, but also “what would be if” we had more data, or if we had data from a different population.

And what about marginal effects? Marginal effects refer to the difference between two adjacent predictions. They are not the same as marginal means or adjusted predictions. However, calculating contrasts or pairwise comparisons with test_predictions() can be used to test for differences in predictions (aka marginal effects).

Here’s an example:

library(ggeffects)
library(marginaleffects)
m <- lm(Petal.Width ~ Petal.Length + Species, data = iris)

# we want the marginal effects for "Species". We can calculate
# the marginal effect using the "marginaleffects" package
marginaleffects::avg_slopes(m, variables = "Species")
#> 
#>     Term                        Contrast Estimate Std. Error    z Pr(>|z|)    S
#>  Species mean(versicolor) - mean(setosa)    0.435      0.103 4.23   <0.001 15.4
#>  Species mean(virginica) - mean(setosa)     0.838      0.145 5.76   <0.001 26.9
#>  2.5 % 97.5 %
#>  0.234  0.637
#>  0.553  1.123
#> 
#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
#> Type:  response
# here we show that marginal effects refer to the difference between
# predictions for a one-unit change of "Species"
out <- predict_response(m, "Species", margin = "empirical")
out
#> # Average predicted values of Petal.Width
#> 
#> Species    | Predicted |     95% CI
#> -----------------------------------
#> setosa     |      0.77 | 0.61, 0.94
#> versicolor |      1.21 | 1.15, 1.27
#> virginica  |      1.61 | 1.48, 1.74
# marginal effects of "versicolor", relative to "setosa"
out$predicted[2] - out$predicted[1]
#> [1] 0.44
# marginal effects of "virginica", relative to "setosa"
out$predicted[3] - out$predicted[1]
#> [1] 0.84
# finally, test_predictions() returns the same. while the previous results
# report the marginal effect compared to the reference level "setosa",
# test_predictions() returns the marginal effects for all pairwise comparisons
test_predictions(m, "Species")
#> # Pairwise comparisons
#> 
#> Species              | Contrast |       95% CI |      p
#> -------------------------------------------------------
#> setosa-versicolor    |    -0.44 | -0.64, -0.23 | < .001
#> setosa-virginica     |    -0.84 | -1.12, -0.55 | < .001
#> versicolor-virginica |    -0.40 | -0.52, -0.29 | < .001

Documentation and Support

Please visit https://strengejacke.github.io/ggeffects/ for documentation and vignettes. For questions about the functionality, you may either contact me via email or also file an issue.

ggeffects supports many different models and is easy to use

Adjusted predictions can be calculated for many different models. Currently supported model-objects are: averaging, bamlss, bayesx, betabin, betareg, bglmer, bigglm, biglm, blmer, bracl, brglm, brmsfit, brmultinom, cgam, cgamm, clm, clm2, clmm, coxph, feglm, fixest, flac, flic, gam, Gam, gamlss, gamm, gamm4, gee, geeglm, glimML, glm, glm.nb, glmer.nb, glmerMod, glmgee, glmmPQL, glmmTMB, glmrob, glmRob, glmx, gls, hurdle, ivreg, lm, lm_robust, lme, lmerMod, lmrob, lmRob, logistf, logitr, lrm, mblogit, mclogit, MCMCglmm, merModLmerTest, MixMod, mixor, mlogit, multinom, negbin, nestedLogit, nlmerMod, ols, orm, phyloglm, phylolm, plm, polr, rlm, rlmerMod, rq, rqs, rqss, sdmTMB, speedglm, speedlm, stanreg, survreg, svyglm, svyglm.nb, tidymodels, tobit, truncreg, vgam, vglm, wblm, wbm, Zelig-relogit, zeroinfl, zerotrunc.

Support for models varies by marginalization method (the margin argument), i.e. although predict_response() supports most models, some models are only supported exclusively by one of the four downstream functions (ggpredict(), ggemmeans(), ggeffect() or ggaverage()). This means that not all models work for every margin option of predict_response(). Other models not listed here might work as well, but are currently not tested.

Interaction terms, splines and polynomial terms are also supported. There is a generic plot()-method to plot the results using ggplot2.

Examples

The returned data frames always have the same, consistent structure and column names, so it’s easy to create ggplot-plots without the need to re-write the function call. x and predicted are the values for the x- and y-axis. conf.low and conf.high could be used as ymin and ymax aesthetics for ribbons to add confidence bands to the plot. group can be used as grouping-aesthetics, or for faceting.

predict_response() requires at least one, but not more than four terms specified in the terms-argument. Predicted values of the response, along the values of the first term are calculated, optionally grouped by the other terms specified in terms.

Adjusted predictions for one focal predictor

library(ggeffects)
library(splines)
library(datawizard)
data(efc, package = "ggeffects")
efc <- to_factor(efc, c("c161sex", "e42dep"))
fit <- lm(barthtot ~ c12hour + bs(neg_c_7) * c161sex + e42dep, data = efc)

predict_response(fit, terms = "c12hour")
#> # Predicted values of barthtot
#> 
#> c12hour | Predicted |       95% CI
#> ----------------------------------
#>       4 |     89.91 | 84.18, 95.63
#>      12 |     89.34 | 83.62, 95.06
#>      22 |     88.63 | 82.90, 94.36
#>      36 |     87.64 | 81.88, 93.40
#>      49 |     86.72 | 80.90, 92.53
#>      70 |     85.23 | 79.30, 91.16
#>     100 |     83.10 | 76.92, 89.29
#>     168 |     78.28 | 71.24, 85.33
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> * c161sex =        Male
#> *  e42dep = independent

A possible call to ggplot could look like this:

library(ggplot2)
mydf <- predict_response(fit, terms = "c12hour")
ggplot(mydf, aes(x, predicted)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1)

However, there is also a plot()-method. This method uses convenient defaults, to easily create the most suitable plot for the predictions.

mydf <- predict_response(fit, terms = "c12hour")
plot(mydf)

Adjusted predictions for several focal predictors

With three variables, predictions can be grouped and faceted.

result <- predict_response(fit, terms = c("neg_c_7", "c161sex", "e42dep"))
# we want a more compact table, thus we use `print()` explicitly
print(result, collapse_table = TRUE, collapse_ci = TRUE)
#> # Predicted values of barthtot
#> 
#> neg_c_7 | c161sex |               e42dep |     Predicted (95% CI)
#> -----------------------------------------------------------------
#>       7 |    Male |          independent |  93.73 (87.01, 100.44)
#>      12 |         |                      |  86.89 (81.09,  92.70)
#>      17 |         |                      |  80.62 (73.69,  87.54)
#>      28 |         |                      | 148.54 (85.66, 211.42)
#>       7 |         |   slightly dependent |  87.41 (81.27,  93.56)
#>      12 |         |                      |  80.58 (76.32,  84.84)
#>      17 |         |                      |  74.31 (68.46,  80.15)
#>      28 |         |                      | 142.23 (79.71, 204.75)
#>       7 |         | moderately dependent |  78.29 (72.08,  84.49)
#>      12 |         |                      |  71.46 (67.64,  75.27)
#>      17 |         |                      |  65.18 (59.75,  70.60)
#>      28 |         |                      | 133.10 (70.44, 195.76)
#>       7 |         |   severely dependent |  41.93 (35.66,  48.21)
#>      12 |         |                      |  35.10 (30.98,  39.22)
#>      17 |         |                      |  28.82 (23.41,  34.24)
#>      28 |         |                      |  96.75 (34.08, 159.41)
#>       7 |  Female |          independent |  98.04 (93.06, 103.02)
#>      12 |         |                      |  86.61 (81.85,  91.37)
#>      17 |         |                      |  82.58 (77.33,  87.82)
#>      28 |         |                      |  81.57 (64.41,  98.73)
#>       7 |         |   slightly dependent |  91.73 (87.89,  95.57)
#>      12 |         |                      |  80.30 (77.43,  83.17)
#>      17 |         |                      |  76.26 (72.57,  79.96)
#>      28 |         |                      |  75.26 (58.64,  91.87)
#>       7 |         | moderately dependent |  82.60 (78.62,  86.59)
#>      12 |         |                      |  71.17 (68.79,  73.56)
#>      17 |         |                      |  67.14 (63.95,  70.33)
#>      28 |         |                      |  66.13 (49.52,  82.74)
#>       7 |         |   severely dependent |  46.25 (41.93,  50.57)
#>      12 |         |                      |  34.82 (32.27,  37.37)
#>      17 |         |                      |  30.78 (27.67,  33.90)
#>      28 |         |                      |  29.78 (13.33,  46.23)
#> 
#> Adjusted for:
#> * c12hour = 42.10
ggplot(result, aes(x = x, y = predicted, colour = group)) +
  geom_line() +
  facet_wrap(~facet)

plot() works for this case, as well:

plot(result)

Contrasts and pairwise comparisons

Next, an example of an interaction term. We want to know whether the two slopes are significantly different from each other.

fit <- lm(neg_c_7 ~ c12hour + barthtot * c161sex + e42dep, data = efc)
result <- predict_response(fit, c("barthtot", "c161sex"))
plot(result)

This can be achieved by test_predictions().

test_predictions(result)
#> # (Average) Linear trend for barthtot
#> 
#> c161sex     | Contrast |      95% CI |     p
#> --------------------------------------------
#> Male-Female |     0.01 | -0.01, 0.03 | 0.466

We can conclude that slopes (or “linear trends”) of barthtot for the different groups of c161sex are not statistically significantly different from each other.

More features are explained in detail in the package-vignettes.

Citation

In case you want / have to cite my package, please use citation('ggeffects') for citation information:

Lüdecke D (2018). ggeffects: Tidy Data Frames of Marginal Effects from Regression Models. Journal of Open Source Software, 3(26), 772. doi: 10.21105/joss.00772

References

Dickerman, Barbra A., and Miguel A. Hernán. 2020. “Counterfactual Prediction Is Not Only for Causal Inference.” European Journal of Epidemiology 35 (7): 615–17. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7393612/.

Lüdecke, Daniel. 2018. “Sjlabelled: Labelled Data Utility Functions,” May. https://doi.org/10.5281/zenodo.1249215.

ggeffects's People

Contributors

b0ydt avatar crsh avatar gregorsteiner avatar indrajeetpatil avatar kant avatar mattansb avatar sam-crawley avatar seananderson avatar strengejacke avatar vincentarelbundock avatar wibeasley avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ggeffects's Issues

Support

It would be nice to include in the README file some discussion of what users should do if they need support.

Incorrect Predictions for Non-linear Transformations

When using the log transform within the model, the wrong predictions are generated. The ggpredict function seems to treat the logged values of income as the raw values of income and then logs them again to generate the prediction.

> library(car)
> library(ggeffects)
> data(Prestige)
> summary(Prestige$income)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    611    4106    5930    6798    8187   25879 
> mod <- lm(prestige ~ log(income), data=Prestige)
> ggpredict(mod, terms="income")
Following variables had many unique values and were prettified: income. Use `pretty = FALSE` to get smoother plots with all values, however, at the cost of increased memory usage.
# A tibble: 4 x 5
      x predicted conf.low conf.high group
  <int>     <dbl>    <dbl>     <dbl> <fct>
1     6    -101.     -128.     -74.8 1    
2     8     -95.0    -120.     -69.7 1    
3    10     -90.2    -115.     -65.8 1    
4    12     -86.3    -110.     -62.5 1    

The values above in x are the logged values of income. Using the raw values of income to generate predictions from the model, we get something different and more resonable

> predict(mod, newdata=data.frame(income = exp(6)))
        1 
-10.51755 

Allow "constant" option

Add argument "constant", that allows specific values to hold a variable constant at, e.g. constant = c(age = 40) would use "40" for age, instead of the typical value like mean.

Support for factors in plot.ggeffects()

At the moment plot.ggeffects() handle variables that are labelled but fails with factors. Could it be possible to make it handle also factors, as this is rather widely used data type in R.

data(efc)
fitLab <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
(efLab = ggeffect(fitLab, "c161sex"))
plot(efLab)

efc$c161sex = factor(efc$c161sex, labels = names(attributes(efc$c161sex)$labels))
fitFact <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
(efFact = ggeffect(fitFact, "c161sex"))
plot(efFact)

ci.style for plot()-method

dot-style and dash-style, currently also available throught the ci-argument, will be options for the ci.style-argument. This argument may also get an option to plot lines and error bars, like in this example: #48

`ggpredict` with `ppd = TRUE` fails for multivariate `brmsfit` objects

Here is a reproducible example:

library(brms)
library(ggeffects)

x <- rnorm(10, 0)
b <- runif(2)
s <- ifelse(diag(2) == 0, 0.23, 1)
er <- mvtnorm::rmvnorm(10, rep(0, 2), s)
y <- apply(t(b), 2, `*`, x) + er

d <- data.frame(y1 = y[,1], y2 = y[,2], x)

m <- brm(cbind(y1, y2) ~ 1 + x,
        data = d)

p <- ggpredict(m, ppd = TRUE)
Error in predictive_interval.default(prdat2) : 
  For the default method 'object' should be a matrix.
traceback()
9: stop("For the default method 'object' should be a matrix.")
8: predictive_interval.default(prdat2)
7: rstantools::predictive_interval(prdat2)
6: get_predictions_stan(model, expanded_frame, ci.lvl, type, faminfo, 
       ppd, terms, ...)
5: select_prediction_method(fun, model, expanded_frame, ci.lvl, 
       type, faminfo, ppd, terms = ori.terms, typical, vcov.fun, 
       vcov.type, vcov.args, ...)
4: ggpredict_helper(model = model, terms = .x, ci.lvl = ci.lvl, 
       type = type, full.data = full.data, typical = typical, ppd = ppd, 
       x.as.factor = x.as.factor, condition = condition, vcov.fun = vcov.fun, 
       vcov.type = vcov.type, vcov.args = vcov.args, ...)
3: .f(.x[[i]], ...)
2: purrr::map(predictors, function(.x) {
       tmp <- ggpredict_helper(model = model, terms = .x, ci.lvl = ci.lvl, 
           type = type, full.data = full.data, typical = typical, 
           ppd = ppd, x.as.factor = x.as.factor, condition = condition, 
           vcov.fun = vcov.fun, vcov.type = vcov.type, vcov.args = vcov.args, 
           ...)
       tmp$group <- .x
       tmp
   })
1: ggpredict(m, ppd = TRUE)

Please see the associated PR #45, which should fix the issue.
You may also consider including a test on a case that looks like the above.

Automated tests

Hi @strengejacke. I'm currently moving my way through your JOSS review. I can't seem to find any information about how the function of the software is verified. Is this done through either manual or automatic tests? My apologies if this information is listed in documentation or paper, but I couldn't find it in either. Thanks!

No applicable method for 'select_'

I figured out that the error comes from a somewhat damaged tidyverse installation. Even though I had some issues with the SE ranges beforehand, now it seems to result in the same plot for both, old and new versions of R and ggplot2.

Since I cannot delete this issue, I am goign to close it again.

Thank you very much for the package!

problems install nloptr causing ggeffects not to install

see astamm/nloptr#40

sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 scales_0.4.1     assertthat_0.1   lazyeval_0.2.0   plyr_1.8.4       tools_3.3.3      gtable_0.2.0     tibble_1.2      
 [9] remotes_1.1.0    Rcpp_0.12.8      ggplot2_2.2.0    grid_3.3.3       munsell_0.4.3   

x.as.factor = TRUE does not work for several levels?

Hi Daniel, again thank you very much for this nice package!

I have used your package a lot in the last few months to plot numeric as well as factorial explanatories. Until now, everything was working as expected but today I increased the numbers of levels in my factorial explanatory (going from seasons = 4 levels to months = 12 levels). As usual, I transformed the variable to a factor before fitting the model. But however ggpredict() plots a line as for numeric values, even when trying to force it not to do so by adding x.as.factor = TRUE.

Is this the behaviour we should expect? Is there any way to plot this as factors (point + errorbar), even though it are many levels?

Thank you very much in advance!

A bug in ggpredict()?

Hi,

Thank you for sharing very nice package! I might have a bug with ggpredict().
It didn't seem to work with glmer.nb and nbinom1 and nbinom2 families in glmmTMB properly. The following is an example using the Owls dataset.

m00 <- glmmTMB(SiblingNegotiation~SexParent+(1|Nest),data=Owls,family=nbinom1)
m01 <- glmmTMB(SiblingNegotiation~SexParent+(1|Nest),data=Owls,family=nbinom2)
m02 <- glmer.nb(SiblingNegotiation~SexParent+(1|Nest),data=Owls)

ggpredict of the first two models returned an error.

> ggpredict(m00,terms="SexParent")
Error in lf(prdat$fit) : could not find function "lf"

ggpredict of the last model returned results, but the confidence intervals were very large.
Is this expected?

> ggpredict(m02,terms="SexParent")
Note: uncertainty of the random effects parameters are not taken into account for confidence intervals.
# A tibble: 2 x 5
      x predicted conf.low conf.high  group
  <dbl>     <dbl>    <dbl>     <dbl> <fctr>
1     1  5.721317 250.0507  372.7732      1
2     2  6.433730 517.0217  749.4767      1

Thanks!!

Problems with offsets

For calls to ggpredict on glm model objects from poisson family with more than one offset, the following error occurs:
Error in offset(log(xxxxxx)) :
object 'xxxxxx' not found

For calls to ggpredict on glm model objects from poisson family with an offset with arithmetic performed within the offset call (e.g. offset(log(xxxxxx + 0.00001)), the same error as above occurs.

Change default for parameter `ppd` to TRUE

Since the recommendation for running ggpredict with stan models is to set ppd = TRUE, with which I agree that it is the sensible thing to do, shouldn't the default be set to TRUE? Unless I am mistaken, I don't see any unintended side-effect from this choice.

Error message while installing: Ggeffects is corrupt

Hi!

I tried to install sjPlot and all relevant dependencies via github (to run the syntax for plotting my graphs, which you provided earlier), using the following syntax:

library(devtools)
devtools::install_github("strengejacke/sjlabelled", force= TRUE)
devtools::install_github("strengejacke/sjmisc", force= TRUE)
devtools::install_github("strengejacke/sjstats", force= TRUE)
devtools::install_github("strengejacke/ggeffects", force= TRUE)
devtools::install_github("strengejacke/sjPlot", force= TRUE)

However, when I try to install ggeffects, I get the following error message:

Downloading GitHub repo strengejacke/ggeffects@master
from URL https://api.github.com/repos/strengejacke/ggeffects/zipball/master
Installation failed: lazy-load database 'C:/Users/Marco/Documents/R/win-library/3.5/ggeffects/R/ggeffects.rdb' is corrupt
Warning messages:
1: In eapply(ns_env(pkg), force, all.names = TRUE) :
restarting interrupted promise evaluation
2: In eapply(ns_env(pkg), force, all.names = TRUE) :
internal error -3 in R_decompress1

Can you please tell me how to fix this issue? Thanks in advance!

Kind regards,

Marco

closure error

I have a simple multilevel (mixed) model I am fitting with glmer(). Here is the call:

fit.r <- glmer(switch~(1|st)+(1|year)+(1|k.id)+diff, family="binomial", data=subset(sw.sample,party=="R"),control=glmerControl(optimizer="bobyqa"))

I have no problem fitting it, I get sensible results, etc.

> summary(fit.r)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: switch ~ (1 | st) + (1 | year) + (1 | k.id) + diff
   Data: subset(sw.sample, party == "R")
Control: glmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid 
   697.2    726.4   -343.6    687.2     2549 

Scaled residuals: 
   Min     1Q Median     3Q    Max 
 -1.62  -0.17  -0.10  -0.06 323.79 

Random effects:
 Groups Name        Variance  Std.Dev. 
 k.id   (Intercept) 2.297e-14 1.516e-07
 st     (Intercept) 1.605e+00 1.267e+00
 year   (Intercept) 2.852e-01 5.341e-01
Number of obs: 2554, groups:  k.id, 2229; st, 49; year, 31

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -4.4517     0.3195 -13.934   <2e-16 ***
diff         -4.3385     0.4461  -9.724   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
diff 0.446 

When I try to use ggeffects on the fit object and the primary variable of interest (diff), I get the following error:

ggpredict(fit.r,terms="diff")
Error in model.frame.default(object, data, xlev = xlev) : 
  invalid type (closure) for variable 'diff'

typical values for mermod object

Hello, I am trying to hold other variables in my model at 0 for continuous and the mode for categorical variables, using ggpredict. When I try to run this:
effect<- ggpredict(model, terms = "x1", type = c("fe", "re"),ci.lvl = NA,typical =c(numeric = "zero", factor = "mode"))

I get the following error:
Error in model.frame.default(delete.response(Terms), newdata, na.action = na.action, : factor x2 has new level 1.

Setting it just to zero gives me the same problem. Is there a way around this? I would ideally like to set factors at their reference level in the model. Thanks.

Richard

Error since updating to version 0.2.2

I've been using ggpredict on a polr object without any problem, but since updating to the latest CRAN version, I get the following error:

Error in UseMethod("droplevels") :
no applicable method for 'droplevels' applied to an object of class "character"

The command that generates this error is:
ggpredict(fit.tri.int.b, terms = c('chain_type', 'sector'))

The sector variable in the data is of class character rather than factor, which seems to be the source of the error. I converted sector to a factor and the error disappears, and that is a simple workaround, but I'd prefer not to have to use factors if I can avoid it. Is this something that could be fixed?

Thanks,
David

Confidence intervals with survey weighted data

I am running into an issue with the confidence intervals produced after ggpredict. Well, not an issue per se, but they don't match the output produced by Stata, although the point estimates match Stata's exactly. Here is a working example:

library(tibble)
library(survey)
library(ggeffects)

# Data set 1

set.seed(123)

df1 <- tibble(id      = seq(1, 100, by = 1),
              gender  = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              income  = sample(0:100000, 100),
              pweight = sample(50:500, 100,  replace   = TRUE))


# Apply weights via svydesign

weights <- svydesign(id      = ~id,
                      weights = ~pweight,
                       data    = df1)
                                            

# Logit model with weights

logit <- svyglm(gender ~ working*income,
                family = quasibinomial(),
                data = df1,
                design = weights)

ggpredict(logit, terms = c("working [0, 1]", "income [100, 1000, 10000]"))

I get the following results:

# A tibble: 6 x 5
      x predicted  conf.low conf.high  group
  <dbl>     <dbl>     <dbl>     <dbl> <fctr>
1     0 0.4906395 0.2467707 0.7390463    100
2     0 0.4893027 0.2484972 0.7351775   1000
3     0 0.4759466 0.2652582 0.6955592  10000
4     1 0.5794210 0.2617570 0.8425931    100
5     1 0.5773197 0.2637649 0.8388980   1000
6     1 0.5561654 0.2830412 0.7990959  10000

I do the same exact analysis in Stata. The output from svyglm in R and svy: glm in Stata match perfectly (i.e., point estimates, t-values, confidence intervals). When I run the margins command in Stata, which is just like ggpredict in this example, I get the same point estimate, but different confidence intervals:

margins, at(working=(0 1) income=(100 1000 10000))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .4906395   .1375135     3.57   0.001     .2177829    .7634961
          2  |   .4893027   .1356353     3.61   0.000     .2201729    .7584325
          3  |   .4759466   .1173992     4.05   0.000     .2430012     .708892
          4  |    .579421   .1687542     3.43   0.001      .244576    .9142659
          5  |   .5773197    .166619     3.46   0.001     .2467115    .9079279
          6  |   .5561654   .1454702     3.82   0.000      .267521    .8448097
------------------------------------------------------------------------------

As you can see, the confidence intervals don't match. In this case they are relatively similar, but in my real-world analyses, there is a bigger discrepancy. I was wondering if you could shed some light on what is happening? Are the confidence intervals in ggpredict incorporating the survey weights? I know they don't have to exactly match, but since everything else does, it made me wonder how ggpredict is handling the confidence intervals.

Confidence intervals not included for polr

First of all, a great package – thanks a lot for your work. I ran an ordinal logistic regression using the polr function from the MASS package. While the predicted probabilities are returned, the data frame does not include confidence intervals. How can I retrieve them? Thanks for a reply and your help!

library(MASS)
options(contrasts = c("contr.treatment", "contr.poly"))
house_plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

predict_house <- ggpredict(house_plr, terms = "Type")

head(predict_house)
# # A tibble: 6 x 4
# x predicted response.level  group
# <dbl>     <dbl>          <chr> <fctr>
#   1     1 0.3784493            Low      1
# 2     1 0.2876752         Medium      1
# 3     1 0.3338755           High      1
# 4     2 0.5190445            Low      1
# 5     2 0.2605077         Medium      1
# 6     2 0.2204478           High      1

Request: could you make ggeffects take objects from metafor (class rma.mv rma)?

Dear SJ

I was wondering whether you could get your wonderful package to work with rma.mv and rma objects? I just recently came across (yesterday) your sjstats and this package so wondering how difficult it would be to do this for the metafor package (a package for meta-analysis). In a meta-analysis, we often have a lot of categories so it would be good to know marginal effects but currently, this is not possible (at least easily) with metafor. Anyway, many thanks for these cool packages!

Thanks!

Shinichi

ggpredict is not working for glmmTMB hurdle-models

library(glmmTMB)
library(ggeffects)

fit1 <- glmmTMB(SiblingNegotiation~FoodTreatment+ArrivalTime+(1|Nest),
                data=Owls,
                ziformula=~1,
                family=truncated_poisson)

When I try to use ggeffects on the fit object, I get the following error:

> ggpredict(fit1, "FoodTreatment")
Error in lf(prdat$fit) : could not find function "lf"

(It's working for zero-inflated-models with family=poisson)

ggeffect applied to a factor term

When you apply ggeffect to a factor term, e.g.

ggeffect(mod, "sex")

the values of x in the return object are numbers (i.e. 1 and 2) instead of the factor labels.

Differences between ggeffect and ggpredict

The package is really useful. I have been using Effects for a while, and having it tidy is great.

May I ask about the differences between ggeffect, ggpredict and ggaverage? I am having problems understanding the differences between the first and the other two. Adding ggeffect to the Vignette might be very useful. For instance, I have been trying to replicate the results provided by effects:Effect using predict with no success.

clustered standard errors with glm logit

Hi,

I am wondering if it is possible to use clustered standard errors with ggeffects to plot confidence intervals accordingly. I found this thread on how to do it with the effects package, but it's somewhat counterintuitive, and I was not sure I really get the syntax right. Therefore I wanted something more straightforward, preferably in a format I can then plot with ggplot (hence this package). I need this from a glm logit.

Thanks!

maintain variable names in data frame

first of all: thank you so much for this work! I've been searching for something like this for years. I will most likely introduce ggeffects in the book I am writing.

Still, I am wondering, why in the produced data frames, the original effect names are not kept, but replaced by x and group. As ggpredict even takes the variable names as argument, this seems a missed chance to create fully fledged plots right out of the box.

Memory error with high number of observations

Hi,
I'm trying to plot the fixed effects of a lmer model, but the number of data points seems to high for ggeffects despite being a reasonnalby large dataset.

reproducible example :

library(lme4)
data = data.frame(X=rnorm(30000), Y=rnorm(30000), Z=rep(1:2),cell=rep(c(1:10),3000))
LME <- lmer(X ~ Y * Z + (1|cell), data=data)
library(ggeffects)
dat <- ggpredict(LME, terms = c("Y","Z"))
plot(dat)

Erreur : impossible d'allouer un vecteur de taille 26.8 Go

sessionInfo() :

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
ggeffects_0.3.1 lme4_1.1-15 

Is there a way to reduce the object size ?

Thanks for your package by the way !

Cannot get ggeffects for glm.nb and glm with contiunous covariate

I'm using ggeffects to get marginal effects plotted from different glmer and glm models and I've found out ggeffects gives the following result when asking for a marginal effect of one of the continuous variables with glm.nb or glm models:

[1] x predicted conf.low conf.high
<0 rows> (or 0-length row.names)

For example, I cannot get ggeffects for anything with anosc in this model:
fit<- glm.nb(formula = ncone.san ~ Tratamiento * C + anosc + I(anosc^2) +
anosc:Tratamiento + I(anosc^2):Tratamiento + ivsc, data = datos,
init.theta = 1.133641349, link = log)

ggeffect(fit, c("anosc")) gives 0 rows, while ggeffect(fit, c("ivsc")) gives correct ggeffects:
# Predicted incidents for N.sanas.sum
# x = ivsc

x predicted conf.low conf.high
-1 442.571 254.379 769.991
0 274.681 170.779 441.798
1 170.480 82.409 352.674
2 105.808 35.151 318.496
3 65.670 14.490 297.622
4 40.758 5.898 281.638

Both are scales variables and are numeric (class(fit$model$anosc) = class(fit$model$ivsc)=numeric). The only difference is that anosc has only four unique values, but I don't think this should make a difference.

Any hint of what's going on?
Best regards,
Asier

modify_if() has stricter requirements with data frame columns

This line is problematic with the new implementation of modify():

mf <- suppressWarnings(as.data.frame(purrr::modify_if(mf, is.array, as.vector)))

We're now wrapping around [[<- instead of [<-. The data frame method for [[<- is stricter and throws an error when the output is too long. The method for [<- just truncates with a warning, which you're hiding with suppressWarnings.

Could you fix this problem please? We're planning to release purrr in a couple of weeks.

Error when plotting raw data for values at specific levels

Gives error:

data(efc)
m <- lm(neg_c_7 ~ e42dep * c160age, data = efc)
dat <- ggpredict(m, c("e42dep", "c160age [40,50,60]"))
plot(dat, rawdata = T)

Workaround:

p <- plot(dat, rawdata = F)
library(ggplot2)
p + geom_point(data = attr(dat, "rawdata"),
               mapping = aes(x = x, y = as.numeric(group)), colour = "black", fill = "black")

ggpredict fail when model is coxph with splines and polynomial

Thanks for your awesome package
It, seems that ggpredict fails when I use a coxph model with a polynomial when the variable in Surv() are not called time and status. Works fine for glm()

d1 <- tibble(status = sample(0:1, 100, replace = TRUE), time = rnorm(100, 60, 30), age = rnorm(100, 60,15))
d2 <- tibble(alive = sample(0:1, 100, replace = TRUE), delay = rnorm(100, 60, 30), age = rnorm(100, 60,15))

mod1 <- coxph(Surv(time, status) ~ poly(age, 3), data = d1)
mod2 <- coxph(Surv(delay, alive) ~ poly(age, 3), data = d2)

# the first one will work while the second one will fail
ggpredict(mod1, "age")
ggpredict(mod2, "age")


## Not happenning without poly or ns
mod2_ter <- coxph(Surv(delay, alive) ~ age, data = d2)
ggpredict(mod2_ter)

## glm works
mod2_bis <- glm(alive ~ poly(age, 3), data = d2)
ggpredict(mod2_bis, "age")

The functions fails with "Error: Unknown columns time and status"

Hope it won't be to hard to fix
Thanks again, work like a charm for interaction and others and will likely replace rms::Predict() for me !

Question regarding squared terms and multiple interactions

Hi,

first of all thanks for yet another nice package :)
I don't fully understand whether in ggeffects all interactions and polynomial terms are correctly accounted for when computing average marginal effects.
As an example, this is a nbreg model, effects calculated with your package:

fit <- glm.nb(comments_count_fb ~ maxprop + I(day^2) + day * maxprop , data = d)
ame <- ggaverage(fit, terms = c("day", "maxprop"))

ame %>% filter(group %in% c(6,9)) %>% 
ggplot( aes(x, predicted, colour = group)) + 
  geom_line()  + ylim(0,2000) + theme_bw() + 
  scale_x_continuous(breaks = c(6,63,123,184,246,308, 369, 429,489),
                     labels = c("15/01", "15/03", "15/05",
                                "15/07", "15/09", "15/11",
                                "16/01", "16/03", "16/05"))

image

And this is the same in Stata (the coefficients are equivalent):

 nbreg comments_count_fb c.day##c.day##i.maxprop
 margins, at(day=(6  63  123  184  246 308 369  429  489) maxprop=(9 6)) 

 marginsplot, legend(ring(0) position(12) bmargin(large)) title("") ///
  xlabel(6 "15/01" 63 "15/03"  123 "15/05" ///
 184 "15/07" 246  "15/09"  308 "15/11" 369  ///
 "16/01"  429 "16/03" 489 "16/05",  angle(vertical)) ///
 ytitle("Predicted number of Comments") scheme(s1mono)

graph

Especially for the red curve this looks a little different to me. Do you have an idea why? And is there a reason why confidence intervals are missing for ggaverage() estimates? The two columns conf.low and conf.high are filled with NA values for me.

memory allocation error with ggeffect, ggpredict

I have a three-way interaction from an lmer model that I was able to summarize and plot easily with the effects package. However, when I try to summarize it using ggeffect I get the following error message: Error: cannot allocate vector of size 84619.5 Gb. With ggpredict I get a similar but less alarming message (the vector size is ~8.3 Gb). I have no sample data I can share, but the fact that this works in effects but not ggeffects makes me suspect a bug.

`ggeffect()` not working for `mgcv::gam()`

Hello, I'm getting errors when trying to use ggeffect() on a model fitted by mgcv::gam() with and without spline terms.

> library(mgcv)
Loading required package: nlme
This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.
> library(ggeffects)
> 
> fit1 <- gam(mpg ~ s(wt), data=mtcars)
> ggeffect(fit1, "wt")
Error in model.frame.default(mpg ~ s(wt) + wt, data = list(mpg = c(21,  : 
  invalid type (list) for variable 's(wt)'
> 
> fit2 <- gam(mpg ~ wt, data=mtcars)
> ggeffect(fit2, "wt")
Error in estimate.gam(G, method, optimizer, control, in.out, scale, gamma,  : 
  unknown smoothness selection criterion
> 
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggeffects_0.4.0 mgcv_1.8-24     nlme_3.1-137   

loaded via a namespace (and not attached):
 [1] tidyr_0.8.1        splines_3.5.0      carData_3.0-1      modelr_0.1.2      
 [5] assertthat_0.2.0   stats4_3.5.0       coin_1.2-2         pillar_1.2.3      
 [9] lattice_0.20-35    glue_1.2.0         glmmTMB_0.2.2.0    snakecase_0.9.1   
[13] minqa_1.2.4        colorspace_1.3-2   sandwich_2.4-0     Matrix_1.2-14     
[17] survey_3.33-2      plyr_1.8.4         psych_1.8.4        pkgconfig_2.0.1   
[21] broom_0.4.5        haven_1.1.2        purrr_0.2.5        xtable_1.8-2      
[25] mvtnorm_1.0-8      scales_0.5.0       stringdist_0.9.5.1 lme4_1.1-17       
[29] emmeans_1.2.2      tibble_1.4.2       effects_4.0-2      bayesplot_1.5.0   
[33] ggplot2_3.0.0      sjlabelled_1.0.11  TH.data_1.0-8      nnet_7.3-12       
[37] TMB_1.7.14         lazyeval_0.2.1     mnormt_1.5-5       survival_2.41-3   
[41] magrittr_1.5       crayon_1.3.4       estimability_1.3   MASS_7.3-49       
[45] forcats_0.3.0      foreign_0.8-70     tools_3.5.0        data.table_1.11.4 
[49] hms_0.4.2          multcomp_1.4-8     stringr_1.3.1      munsell_0.5.0     
[53] prediction_0.3.6   bindrcpp_0.2.2     compiler_3.5.0     rlang_0.2.1       
[57] grid_3.5.0         nloptr_1.0.4       ggridges_0.5.0     rstudioapi_0.7    
[61] gtable_0.2.0       codetools_0.2-15   sjstats_0.15.0     reshape2_1.4.3    
[65] sjmisc_2.7.3       R6_2.2.2           zoo_1.8-2          pwr_1.2-2         
[69] dplyr_0.7.6        bindr_0.1.1        modeltools_0.2-21  stringi_1.2.3     
[73] parallel_3.5.0     Rcpp_0.12.17       tidyselect_0.2.4   lmtest_0.9-36

Passing poly objects to ggpredict

I've been struggling for a while now to visualize a set of relatively complex three-way interactions arising from a series of linear mixed effects model in R. I recently came across the ggeffects package, which looked like it might finally offer a solution, and I've mostly been having a lot of success using it. The problem I'm having is that one of my models involves a polynomial term -- for simplicity sake, let's say that the model takes the form of

m <- lmer(poly(X, 2)*Y*Z + (1|SubjectID))

where X and Y are continuous variables, and Z is a categorical variable. However, when I attempt to run the command...

dat <- ggpredict(model.Test1, terms = c("X", "Y [alpha, beta, gamma]", "Z"), type = "re", allow.new.levels = TRUE)

...it produces the following error: "Column poly(x, 2) must be a 1d atomic vector or a list." I'm wondering if there are any good solutions that might allow me to visualize the three-way interaction while still retaining the polynomial term. Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.