
bayestestR's Introduction

bayestestR


Become a Bayesian master you will


⚠️ We changed the default CI width! Please make an informed decision and set it explicitly (ci = 0.89, ci = 0.95 or anything else that you decide) ⚠️


Existing R packages allow users to easily fit a large variety of models and extract and visualize the posterior draws. However, most of these packages only return a limited set of indices (e.g., point-estimates and CIs). bayestestR provides a comprehensive and consistent set of functions to analyze and describe posterior distributions generated by a variety of model objects, including popular modeling packages such as rstanarm, brms or BayesFactor.

You can reference the package and its documentation as follows:

  • Makowski, D., Ben-Shachar, M. S., & Lüdecke, D. (2019). bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework. Journal of Open Source Software, 4(40), 1541. doi: 10.21105/joss.01541
  • Makowski, D., Ben-Shachar, M. S., Chen, S. H. A., & Lüdecke, D. (2019). Indices of Effect Existence and Significance in the Bayesian Framework. Frontiers in Psychology, 10, 2767. doi: 10.3389/fpsyg.2019.02767

Installation


The bayestestR package is available on CRAN, while its latest development version is available on R-universe (from rOpenSci).

Type        | Source     | Command
------------|------------|--------
Release     | CRAN       | install.packages("bayestestR")
Development | R-universe | install.packages("bayestestR", repos = "https://easystats.r-universe.dev")

Once you have installed the package, you can load it using:

library("bayestestR")

Tip

Instead of library(bayestestR), use library(easystats). This will make all features of the easystats-ecosystem available.

To stay updated, use easystats::install_latest().

Documentation

Access the package documentation and check out these vignettes:

Tutorials

Articles

Features

In the Bayesian framework, parameters are estimated in a probabilistic fashion as distributions. These distributions can be summarised and described by reporting four types of indices, covering centrality (point-estimates), uncertainty (CI), and effect existence and significance, as detailed in the sections below.

describe_posterior() is the master function with which you can compute all of the indices cited below at once.

describe_posterior(
  rnorm(10000),
  centrality = "median",
  test = c("p_direction", "p_significance"),
  verbose = FALSE
)
## Summary of Posterior Distribution
## 
## Parameter | Median |        95% CI |     pd |   ps
## --------------------------------------------------
## Posterior |  -0.01 | [-1.98, 1.93] | 50.52% | 0.46

describe_posterior() works for many objects, including more complex brmsfit-models. For better readability, the output is separated by model components:

library(brms)

zinb <- read.csv("http://stats.idre.ucla.edu/stat/data/fish.csv")
set.seed(123)
model <- brm(
  bf(
    count ~ child + camper + (1 | persons),
    zi ~ child + camper + (1 | persons)
  ),
  data = zinb,
  family = zero_inflated_poisson(),
  chains = 1,
  iter = 500
)

describe_posterior(
  model,
  effects = "all",
  component = "all",
  test = c("p_direction", "p_significance"),
  centrality = "all"
)
## Summary of Posterior Distribution
## 
## # Fixed effects (conditional)
## 
## Parameter   | Median |  Mean |   MAP |         95% CI |     pd |   ps |  Rhat |    ESS
## --------------------------------------------------------------------------------------
## (Intercept) |   0.96 |  0.96 |  0.96 | [-0.81,  2.51] | 90.00% | 0.88 | 1.011 | 110.00
## child       |  -1.16 | -1.16 | -1.16 | [-1.36, -0.94] |   100% | 1.00 | 0.996 | 278.00
## camper      |   0.73 |  0.72 |  0.73 | [ 0.54,  0.91] |   100% | 1.00 | 0.996 | 271.00
## 
## # Fixed effects (zero-inflated)
## 
## Parameter   | Median |  Mean |   MAP |         95% CI |     pd |   ps |  Rhat |    ESS
## --------------------------------------------------------------------------------------
## (Intercept) |  -0.48 | -0.51 | -0.22 | [-2.03,  0.89] | 78.00% | 0.73 | 0.997 | 138.00
## child       |   1.85 |  1.86 |  1.81 | [ 1.19,  2.54] |   100% | 1.00 | 0.996 | 303.00
## camper      |  -0.88 | -0.86 | -0.99 | [-1.61, -0.07] | 98.40% | 0.96 | 0.996 | 292.00
## 
## # Random effects (conditional) Intercept: persons
## 
## Parameter |    Median |  Mean |   MAP |         95% CI |     pd |   ps |  Rhat |    ESS
## ---------------------------------------------------------------------------------------
## persons.1 |     -0.99 | -1.01 | -0.84 | [-2.68,  0.80] | 92.00% | 0.90 | 1.007 | 106.00
## persons.2 | -4.65e-03 | -0.04 |  0.03 | [-1.63,  1.66] | 50.00% | 0.45 | 1.013 | 109.00
## persons.3 |      0.69 |  0.66 |  0.69 | [-0.95,  2.34] | 79.60% | 0.78 | 1.010 | 114.00
## persons.4 |      1.57 |  1.56 |  1.56 | [-0.05,  3.29] | 96.80% | 0.96 | 1.009 | 114.00
## 
## # Random effects (zero-inflated) Intercept: persons
## 
## Parameter | Median |  Mean |   MAP |         95% CI |     pd |   ps |  Rhat |    ESS
## ------------------------------------------------------------------------------------
## persons.1 |   1.10 |  1.11 |  1.08 | [-0.23,  2.72] | 94.80% | 0.93 | 0.997 | 166.00
## persons.2 |   0.18 |  0.18 |  0.22 | [-0.94,  1.58] | 63.20% | 0.54 | 0.996 | 154.00
## persons.3 |  -0.30 | -0.31 | -0.54 | [-1.79,  1.02] | 64.00% | 0.59 | 0.997 | 154.00
## persons.4 |  -1.45 | -1.46 | -1.44 | [-2.90, -0.10] | 98.00% | 0.97 | 1.000 | 189.00
## 
## # Random effects (conditional) SD/Cor: persons
## 
## Parameter   | Median | Mean |  MAP |         95% CI |   pd |   ps |  Rhat |    ESS
## ----------------------------------------------------------------------------------
## (Intercept) |   1.42 | 1.58 | 1.07 | [ 0.71,  3.58] | 100% | 1.00 | 1.010 | 126.00
## 
## # Random effects (zero-inflated) SD/Cor: persons
## 
## Parameter   | Median | Mean |  MAP |         95% CI |   pd |   ps |  Rhat |    ESS
## ----------------------------------------------------------------------------------
## (Intercept) |   1.30 | 1.49 | 0.99 | [ 0.63,  3.41] | 100% | 1.00 | 0.996 | 129.00

bayestestR also includes many other features useful for your Bayesian analyses. Here are some more examples:

Point-estimates

library(bayestestR)

posterior <- distribution_gamma(10000, 1.5) # Generate a skewed distribution
centrality <- point_estimate(posterior) # Get indices of centrality
centrality
## Point Estimate
## 
## Median | Mean |  MAP
## --------------------
## 1.18   | 1.50 | 0.51

As with other easystats packages, plot() methods for many functions are available via the see package.

While the median and the mean are available through base R functions, map_estimate() in bayestestR can be used to directly find the Maximum A Posteriori (MAP) estimate of a posterior, i.e., the value associated with the highest probability density (the “peak” of the posterior distribution). In other words, it is an estimate of the mode for continuous parameters.
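
For instance, a minimal sketch reusing the skewed posterior generated above (the exact print format may vary across versions):

map_estimate(posterior)
# Should closely match the MAP column of point_estimate() above (~0.51)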

Uncertainty (CI)

hdi() computes the Highest Density Interval (HDI) of a posterior distribution, i.e., the interval within which all points have a higher probability density than points outside the interval. The HDI can be used in the context of Bayesian posterior characterization as a Credible Interval (CI).

Unlike equal-tailed intervals (see eti()) that typically exclude 2.5% from each tail of the distribution, the HDI is not equal-tailed and therefore always includes the mode(s) of posterior distributions.

posterior <- distribution_chisquared(10000, 4)

hdi(posterior, ci = 0.89)
## 89% HDI: [0.18, 7.63]

eti(posterior, ci = 0.89)
## 89% ETI: [0.75, 9.25]

Existence and Significance Testing

Probability of Direction (pd)

p_direction() computes the Probability of Direction (pd, also known as the Maximum Probability of Effect - MPE). It varies between 50% and 100% (i.e., 0.5 and 1) and can be interpreted as the probability (expressed as a percentage) that a parameter (described by its posterior distribution) is strictly positive or negative (whichever is the most probable). It is mathematically defined as the proportion of the posterior distribution that has the same sign as the median. Although differently expressed, this index is fairly similar to (i.e., strongly correlated with) the frequentist p-value.

Relationship with the p-value: In most cases, the pd corresponds to the frequentist one-sided p-value through the formula p-value = 1 - pd/100, and to the two-sided p-value (the most commonly reported) through the formula p-value = 2 * (1 - pd/100). Thus, a pd of 95%, 97.5%, 99.5% and 99.95% corresponds approximately to a two-sided p-value of .1, .05, .01 and .001, respectively. See the reporting guidelines.

posterior <- distribution_normal(10000, 0.4, 0.2)
p_direction(posterior)
## Probability of Direction
## 
## Parameter |     pd
## ------------------
## Posterior | 97.72%
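
As a quick sanity check of the conversion formula above (plain base R, not package output):

pd <- 0.9772 # the pd from the example above, as a proportion
2 * (1 - pd) # approximate two-sided p-value
## [1] 0.0456

Recent versions of bayestestR also provide a pd_to_p() helper for this conversion.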

ROPE

rope() computes the proportion (as a percentage) of the HDI (by default, the 89% HDI) of a posterior distribution that lies within a region of practical equivalence (ROPE).

Statistically, the probability of a posterior distribution being different from 0 does not make much sense (the probability of it being different from a single point being infinite). Therefore, the idea underlying the ROPE is to let the user define an area around the null value enclosing values that are equivalent to the null value for practical purposes (Kruschke, 2018).

Kruschke suggests that such a null value could be set, by default, to the -0.1 to 0.1 range of a standardized parameter (a negligible effect size according to Cohen, 1988). This could be generalized: for instance, for linear models, the ROPE could be set as 0 +/- .1 * sd(y). This ROPE range can be automatically computed for models using the rope_range() function.

Kruschke suggests using the proportion of the 95% (or 90%, considered more stable) HDI that falls within the ROPE as an index for “null-hypothesis” testing (as understood under the Bayesian framework; see equivalence_test()).

posterior <- distribution_normal(10000, 0.4, 0.2)
rope(posterior, range = c(-0.1, 0.1))
## # Proportion of samples inside the ROPE [-0.10, 0.10]:
## 
## inside ROPE
## -----------
## 4.40 %
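
For the related “null-hypothesis” testing flavour, here is a hedged sketch using equivalence_test() on the same posterior (the exact output format may differ across versions):

equivalence_test(posterior, range = c(-0.1, 0.1))
# Returns the proportion of the HDI inside the ROPE together with an
# "Accepted" / "Rejected" / "Undecided" decision, following Kruschke's rule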

Bayes Factor

bayesfactor_parameters() computes Bayes factors against the null (either a point or an interval), based on prior and posterior samples of a single parameter. This Bayes factor indicates the degree by which the mass of the posterior distribution has shifted further away from or closer to the null value(s) (relative to the prior distribution), thus indicating whether the null value has become less or more likely given the observed data.

When the null is an interval, the Bayes factor is computed by comparing the prior and posterior odds of the parameter falling within or outside the null; when the null is a point, a Savage-Dickey density ratio is computed, which is also an approximation of a Bayes factor comparing the marginal likelihoods of the model against a model in which the tested parameter has been restricted to the point null (Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010).

prior <- distribution_normal(10000, mean = 0, sd = 1)
posterior <- distribution_normal(10000, mean = 1, sd = 0.7)

bayesfactor_parameters(posterior, prior, direction = "two-sided", null = 0, verbose = FALSE)
## Bayes Factor (Savage-Dickey density ratio)
## 
## BF  
## ----
## 1.94
## 
## * Evidence Against The Null: 0

The lollipops represent the density of a point-null on the prior distribution (the blue lollipop on the dotted distribution) and on the posterior distribution (the red lollipop on the yellow distribution). The ratio between the two - the Savage-Dickey ratio - indicates the degree by which the mass of the parameter distribution has shifted away from or closer to the null.

For more info, see the Bayes factors vignette.

Utilities

Find ROPE’s appropriate range

rope_range(): This function attempts to automatically find suitable “default” values for the Region Of Practical Equivalence (ROPE). Kruschke (2018) suggests that such a null value could be set, by default, to a range from -0.1 to 0.1 of a standardized parameter (a negligible effect size according to Cohen, 1988), which can be generalised for linear models to -0.1 * sd(y), 0.1 * sd(y). For logistic models, the parameters expressed in log odds ratios can be converted to standardized differences through the formula sqrt(3)/pi, resulting in a range of -0.05 to 0.05.

rope_range(model)
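
As a sketch of what to expect for a simple linear model (assuming rstanarm is installed; the bounds scale with the standard deviation of the response):

m <- rstanarm::stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
rope_range(m)
## [1] -0.6026948  0.6026948
# i.e., roughly 0 +/- 0.1 * sd(mtcars$mpg)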

Density Estimation

estimate_density(): This function is a wrapper around different methods of density estimation. By default, it uses the base R density() function, but with a different smoothing bandwidth ("SJ") than the legacy default ("nrd0"). However, Deng & Wickham suggest that method = "KernSmooth" is the fastest and the most accurate.
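
A minimal sketch (column names assumed from the current API):

posterior <- distribution_normal(10000)
d <- estimate_density(posterior) # data frame with x (values) and y (density)
head(d, 3)

# The faster and more accurate alternative mentioned above:
d2 <- estimate_density(posterior, method = "KernSmooth")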

Perfect Distributions

distribution(): Generate a sample of size n with near-perfect distributions.

distribution(n = 10)
##  [1] -1.55 -1.00 -0.66 -0.38 -0.12  0.12  0.38  0.66  1.00  1.55

Probability of a Value

density_at(): Compute the density at a given point of a distribution.

density_at(rnorm(1000, 1, 1), 1)
## [1] 0.41

Code of Conduct

Please note that the bayestestR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270–280.

Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25(1), 178–206.

Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60(3), 158–189.

bayestestR's People

Contributors

adamhsparks, ajnafa, bwiernik, canyon289, danielskatz, dominiquemakowski, etiennebacher, github-actions[bot], humanfactors, indrajeetpatil, jackdolgin, leungi, mattansb, mutlusun, nahorp, rempsyc, sam-crawley, singmann, strengejacke, tjmahr, vincentarelbundock


bayestestR's Issues

Plotting

Now that the initial release rush is over, I've checked your plotting methods, which have a very ingenious way of retrieving the original data.

However, I feel like this would need a more systematic implementation (so that the data retrieval attempt works for all functions, p_direction, hdi, etc.).

Do you think adding the name of the object as an attribute would work in most cases? (I believe it won't work in cases such as hdi(rnorm(1000)).) Maybe it would be simpler (though obviously not from an efficiency point of view) to store the original data in the objects (especially when these are non-models)? A sketch of the idea follows.
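
For illustration, a sketch of the attribute-based idea (hypothetical internals, not the actual implementation; .hdi_values() is a made-up workhorse):

hdi.numeric <- function(x, ci = 0.89, ...) {
  out <- .hdi_values(x, ci = ci) # hypothetical computation of the interval
  attr(out, "object_name") <- deparse(substitute(x)) # name of the supplied object
  # alternatively, store the data itself (costly, but robust for non-models):
  # attr(out, "data") <- x
  out
}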

Unify printing methods

Would be nice to unify the printing output up to the standard of ROPE/equivalence test.

  • p_direction: dataframe-like output needed
  • p_map: everything needed

DESCRIPTION: Remotes

Do we need insight listed under "remotes" when it's also listed under "imports"?

Multivariate response models

Maybe we can manage to make functions like equivalence_test() also work for multivariate response models (like stanmvreg); however, this seems more like a long-term goal to me.

plot() / print() methods

Once I'm done with the remaining stuff for insight and it's submitted to CRAN, I would try to write some print / plot methods for some of the bayestestR functions.

I think especially equivalence_test() would benefit from a nice print-method. What would you say?

Remove priors from output

equivalence_test() returns results for the complete posterior distribution. However, if you fit your model with brms and use sample_prior = TRUE, then you have parameters for the priors as well, which are most likely of no interest for functions like equivalence_test().

Hence, remove all parameters starting with prior_.

I can work on this issue...
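
A minimal sketch of the proposed filtering (assuming a data frame with a Parameter column):

out <- out[!startsWith(out$Parameter, "prior_"), ]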

Remove dataset from package

Hmm, I just realised that the bayes_indices.csv dataset, used in the vignettes, is loaded with the package, which is unintended and unwanted from a weight perspective. Moreover, the vignettes work by downloading the dataset from GitHub, so there is no need for it to be stored in the package.

From here:

  • If you want to store parsed data, but not make it available to the user, put it in R/sysdata.rda. This is the best place to put data that your functions need.
  • If you want to store raw data, put it in inst/extdata.

Improve testing

It would be good if we managed to improve testing coverage to > 90% before submitting to JOSS or somewhere else, which might necessitate making the tests work for brms models.

`equivalence_test()` plot shows error

I ran a Bayesian multilevel model in brms and tried to plot the results using the equivalence_test() function:

library(bayestestR)
x <- equivalence_test(model)
plot(x)

and got the following error:
Error in FUN(X[[i]], ...) : object 'grp' not found

I then used the function equi_test from the sjstats package and it worked:

library(sjstats)
equi_test(x, rope = c(-0.05, 0.05), plot = TRUE, out = 'plot')

which did not give me the error. I wonder if I am doing something wrong or if this might be a bug?

max density as an index of posterior characterisation

The individual y values of the density function are not commonly used (as their meaning relates to portions of the function from which to estimate probabilities). Nevertheless, it seems like the density value at the MAP is sensitive to the general precision (i.e., range and spread) of the density function: the higher it is, the more "precise" the posterior. Hence, I wonder how this index would behave in relationship to other markers such as ROPE percentage, pd and such. It could be interesting to include it in a revised simulation study.
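
A rough sketch of the idea using bayestestR building blocks (narrower posteriors should yield higher density at the MAP):

x_wide <- distribution_normal(10000, 0, 1)
x_narrow <- distribution_normal(10000, 0, 0.2)
density_at(x_wide, as.numeric(map_estimate(x_wide)))     # ~0.40
density_at(x_narrow, as.numeric(map_estimate(x_narrow))) # ~2.0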

To Do for current PR

  • I'll check the new hdi() for some models tonight, and if it works, we may extend the arguments effects and component to rope() as well.
  • The print()-method for equivalence_test() for models should be revised, so it does not print redundant information, but includes useful information (HDI for each parameter) instead.
  • The print()-method for equivalence_test() should work for multiple CI-values.
  • Then, the ci()-method can be targeted, and this PR is "complete".

Originally posted by @strengejacke in #29 (comment)

Consistent structure for HDI_area attribute in rope()

@strengejacke I am having issues with ROPE support (see bottom of README for example). It seems the rope object based on models does not contain all the information regarding HDI_area. It has (in attributes) only one value for it, instead of several for parameters and levels of ci. Is it possible to address it here, or to modify the thing in bayestestR?

Docs for frequentist CI and equivalence test

It seems that the extension of ci and equivalence_test in parameters works, although documenting their methods separately is not straightforward. Users will probably be shown bayestestR's documentation instead of the one specific to frequentist models.

Thus, I think we will have to add a link to the parameters website ("for frequentist models, you can find the documentation here") and/or directly a small paragraph in the details section documenting the method for frequentist models.

rope_range(): allow for a flexible "method" paradigm

Although for now there is only the "kruschke2018" method, it might be interesting to prepare the terrain for an easy switch between methods of default-value selection, in a similar way to the interpret_* functions in report. As with every rule of thumb, I can foresee a debate surrounding the appropriate values in the literature, so we might as well support them here :)

Scope of the package

Related to #5 #6

Currently, all functions in the package are "model agnostic", i.e., they don't work on models but directly on the posterior vectors.

These are fundamental operations that would be used in functions dealing with Bayesian models.

In fact, I first considered dissociating the "model-related" part from the "posterior-related" part, but writing about this, I am not sure it makes much sense. First, the scope (and name) of the package suggests its ability to deal with models. And second, it would be nice to have the same function names for posteriors and models (hdi(rnorm(1000)) and hdi(stanmodel)).

The other possibility would be to move the model-targeted functions to the "parameters" (such as sjstats::tidy_stan) and "model performance" (e.g., for sjstats::msce) packages.

What is your feeling about this?

ci()

We already have a ci() method currently defined in parameters. However, parameters does depend on bayestestR. Thus, we can try defining the method in bayestestR and expanding it to frequentist models in parameters. We just have to find a way to put links to the online documentation of each method in the docstring of bayestestR's definition.

Allow specifying "pars" for inclusion in bayestestR functions

See strengejacke/sjstats#72

I fit an errors-in-variables model in brms (i.e., with the term me(age, age_se)) and sjstats parameter extraction misfired (it shows a bunch of parameters in tab_model and equi_test that are not shown when calling summary.brmsfit). It would be cool if you permitted a pars argument, like many brms and bayesplot functions, to allow customisation.

This would not just be useful for misfiring like above, but also to trim the tested parameters in equi_test if not all are of substantial interest.

Add function for Savage-Dickey density ratio Bayes factors

Would you consider adding a function for a Savage-Dickey density ratio Bayes factor?

Something like this, perhaps?

savage_dickey_BF <- function(xpost, xprior, null = 0, direction = 0) {
  if (!requireNamespace("logspline", quietly = TRUE)) {
    stop("Package 'logspline' is needed for this function.")
  }
  # Fit logspline densities to the posterior and prior samples
  f_post <- suppressWarnings(logspline::logspline(xpost))
  f_prior <- suppressWarnings(logspline::logspline(xprior))

  # Density height at the null value
  d_post <- logspline::dlogspline(null, f_post)
  d_prior <- logspline::dlogspline(null, f_prior)

  # For one-sided tests, renormalize by the mass on the tested side
  norm_post <- norm_prior <- 1
  if (direction < 0) {
    norm_post <- logspline::plogspline(null, f_post)
    norm_prior <- logspline::plogspline(null, f_prior)
  } else if (direction > 0) {
    norm_post <- 1 - logspline::plogspline(null, f_post)
    norm_prior <- 1 - logspline::plogspline(null, f_prior)
  }

  # Savage-Dickey ratio: posterior vs. prior density at the null
  (d_post / norm_post) / (d_prior / norm_prior)
}
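
Example usage, mirroring the prior/posterior pair from the README above:

prior <- rnorm(10000, mean = 0, sd = 1)
posterior <- rnorm(10000, mean = 1, sd = 0.7)
savage_dickey_BF(posterior, prior)
# Roughly 0.51, i.e., 1/1.94: as written, this returns the BF in favour of
# the null (BF01), the reciprocal of the "evidence against the null"
# reported by bayesfactor_parameters() in the README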

Vignette builder

I can't check the package because vignettes can't be built. The vignette index is missing. Any ideas? Or is it because I'm testing the dev branch?

To do for CRAN submission

My impression is that bayestestR is not far away from its initial CRAN submission. Would you agree?
Let's put together a to-do list to check what needs to be done before submission:

  • implement ci() (related to #30) (DL)
  • maybe implement plot() for equivalence_test()? I really would like to have this feature, because I think that (visual) equivalence-testing is raising more and more awareness, and I've already been contacted two or three times regarding this issue (related to #15 and #34) (DL)
  • check test-coverage (DL / DM)
  • compile vignettes (to see if it works and they look good?) (related to #32) (DM)
  • remove Remotes field from DESCRIPTION (DM)
  • related to the previous point, do we need to fix the test-files, i.e. to make them not run on CRAN? (also related to #19) (DL / DM)
  • check docs / helpfiles (related to #33 ) (DL / DM)
  • allow specification of pars (related to #37) (DL)
  • final check / website

What else? I also made some suggestions as to who takes over which tasks; feel free to add / remove yourself from the list. ;-)

CRAN submission

Thanks,

we see:
Size of tarball: 5048980 bytes

Not more than 5 MB for a CRAN package, please.

If there are any references describing the methods in your package,
please add these in the Description field of your DESCRIPTION file in
the form
authors (year) doi:...
authors (year) arXiv:...
authors (year, ISBN:...)
with no space after 'doi:', 'arXiv:' and angle brackets for auto-linking.

Please fix and resubmit.

HDI_area attribute as data frame also for rope.numeric()

model <- rstanarm::stan_glm(mpg ~ wt + cyl, data = mtcars)
attributes(rope(model))$HDI_area
#> $`(Intercept)`
#>     CI  HDI_low HDI_high
#> 1 0.90 36.60966 42.66055
#> 2 0.95 36.12163 43.45867
#> 
#> $wt
#>     CI   HDI_low  HDI_high
#> 1 0.90 -4.432082 -1.776227
#> 2 0.95 -4.668768 -1.481225
#> 
#> $cyl
#>     CI   HDI_low   HDI_high
#> 1 0.90 -2.254774 -0.8010249
#> 2 0.95 -2.396010 -0.6578996

Since we return one or multiple dataframes for models (which I like), could we also return such a dataframe for the cases where a vector is passed to rope()?

library(bayestestR)
attributes(rope(x = rnorm(1000, 1, 1)))$HDI_area
#> [[1]]
#> [1] -0.5071979  2.7443272
attributes(rope(x = rnorm(1000, 1, 1), ci = c(.90, .95)))$HDI_area
#> [[1]]
#> [1] -0.8034466  2.5204936
#> 
#> [[2]]
#> [1] -0.8451218  2.9656530

So the output would look like:

attributes(rope(x = rnorm(1000, 1, 1), ci = c(.90, .95)))$HDI_area
#>     CI  HDI_low HDI_high
#> 1 0.90 36.60966 42.66055
#> 2 0.95 36.12163 43.45867

Sorry to bother you with this; I would gladly change it myself, but I am afraid of breaking everything, since the attribute code is still not perfectly clear to me (I am studying it though :)

Originally posted by @DominiqueMakowski in #69 (comment)

difference between bayestestR::hdi() and sjstats::hdi()

I found a strange inconsistency for hdi() with brms models, but I'm not sure if this is only happening for brms models or not. I'm also not sure how far your implementation of the HDI differs from mine (which is adapted from Kruschke, 2015).

stanreg = identical
brms = differences

m1 <- insight::download_model("stanreg_lm_1")

as.data.frame(bayestestR::hdi(m1, ci = .95))
#>     Parameter CI    CI_low   CI_high
#> 1 (Intercept) 95 33.641431 41.069859
#> 2          wt 95 -6.418331 -4.236012

as.data.frame(sjstats::hdi(m1, prob = .95))
#>          term   hdi.low  hdi.high
#> 1 (Intercept) 33.641431 41.069859
#> 2          wt -6.418331 -4.236012

set.seed(123)
library(brms)
b4 <- brms::brm(mpg ~ wt + cyl + gear, data = mtcars, iter = 500, chains = 1)

as.data.frame(bayestestR::hdi(b4, ci = .95))
#>     Parameter CI    CI_low    CI_high
#> 1 b_Intercept 95 34.996964 50.9281495
#> 2        b_wt 95 -5.339557 -1.9348616
#> 3       b_cyl 95 -2.380895 -0.6869611
#> 4      b_gear 95 -1.790944  1.0685669

as.data.frame(sjstats::hdi(b4, prob = .95))
#>          term   hdi.low   hdi.high
#> 1 b_Intercept 34.549999 50.8355624
#> 2        b_wt -5.373982 -1.9348616
#> 3       b_cyl -2.295808 -0.5923283
#> 4      b_gear -1.845044  1.0685669

Created on 2019-03-14 by the reprex package (v0.2.1)

allow more objects

I think it would be nice to have methods for stanfit or brms-objects as well, so the functions not only expect a single vector, but may also cope with model objects.

rope/equivalence test

@strengejacke I am still unsure about the name of the equivalence test, currently called rope_test() (as it is directly related to the rope() function). I know that you named it equi_test in sjstats (although to be used on models instead of vectors of posteriors). What is your opinion?

docs map_estimate

@DominiqueMakowski This commit requires some attention! Since map_estimate() returns different things, depending on the input and the arguments, I tried to add the section for the return value to the docs. However, I'm not sure what density does, so can you please check this and add a short description to the @return part of the docs?

hdi error

I get this error when testing hdi() for brms and stanreg models:

> bayestestR::hdi(x)
 Error in data[[variable]] : subscript out of bounds 
5.
which(data[[variable]] == value) at utils.R#22
4.
.select_rows(dat, "Group", "fixed") at utils_hdi_ci.R#79
3.
.compute_interval_stanreg(x, ci, effects, parameters, verbose, 
    fun = "hdi") at hdi.R#72
2.
hdi.stanreg(x) at hdi.R#54
1.
bayestestR::hdi(x) 

I think I have the other packages correctly updated (insight). Do you have it too?

output from equivalence_test

Especially for models, the function has some redundant output (CI and ROPE bounds are identical for each parameter). Maybe we can also add the HDI, so users see the HDI and the ROPE and can tell which part lies inside/outside the ROPE? We should probably think about a print() method, to have some user-friendly output.

class class class

library(bayestestR)
library(insight)
m <- insight::download_model("stanreg_glm_5")

class(hdi(m))
#> [1] "hdi"        "hdi"        "hdi"        "data.frame"

class(ci(m, ci = c(.5, .8)))
#> [1] "ci"         "data.frame"

class(rope(m))
#> [1] "rope"       "data.frame"

Created on 2019-04-18 by the reprex package (v0.2.1)

Have found this only for hdi() so far. We should make sure that class attributes are simply unique().
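
i.e., something like:

class(out) <- unique(class(out))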

Reorder vignettes order

Turns out vignette files can't start with numbers, which were added for dealing with their order on the website. Must specify it in the pkgdown parameters:

check sensitivity to priors specification

It would be interesting to include some methods to check the parameters' sensitivity to different prior specifications. I am not sure if there is any "standard default" method for doing so.

Description p_rope()

The ROPE-based p-value represents the maximum Credible Interval (HDI) that does not contain (positive values) or is entirely contained (negative values) in the negligible values space defined by the ROPE. A ROPE-based p of 97

  1. it's not only positive, but also only negative values.
  2. the description ends abrupt.

Design

@strengejacke I would be grateful if you could give me your input and opinion regarding this package and its functions 😺

Publication in JOSS

What do you think about preparing (and submitting) a paper for JOSS?

https://joss.theoj.org/

It would be a very short paper, mostly similar to a readme-file, but would have the advantage of a "nicer" citation for the package. The journal has an ISSN, so it is considered "citable".

docs

I have revised the docs slightly in this commit: 856cd25

  1. I changed "Value or vector of HDI probability" to "Value or vector of probability of the interval", as we inherit the docs from hdi() also in ci(), so this is more general.

  2. In ci(), I removed "For Bayesian Credible Intervals, CIs are often computed by the hdi method". I'm not sure if HDIs are really computed more often than equal-tailed credible intervals, so unless we have proof for this statement, we may better remove it.

Documentation overhaul

This issue is to plan the documentation, to avoid redundancy and to make the documentation useful, clear and easily findable.

  • README
    • All main functions with their docstrings?
  • Vignettes
    • Get started
      • Why use the Bayesian framework?
      • A very simple linear model
    • Example: Bayesian GLMs
    • Example: Bayesian mixed models
    • In depth: Point estimates comparison
    • In depth: Indices of effect existence comparison
    • Reporting guidelines

@strengejacke How do you see it?

Although I feel like analysis examples should be in this repo, the home for Bayesian-related stuff, at the same time it would then make sense to also add here the other aspects related to Bayesian models present in easystats (such as model performance, contrast estimates, report (and visreport?) etc.).

Resources (for inspiration):

What does p_rope() exactly return?

I would have thought that p_rope() is similar or even equivalent to sjstats::equi_test(). In this model, 28.6% of the values of the posterior of the parameter am are inside the ROPE of (-.1, .1). p_rope() returns a p-value of 8.5. How do these values relate to each other?

library(sjstats)
library(bayestestR)
library(rstanarm)
set.seed(123)
m <- stan_glm(mpg ~ wt + am, data = mtcars, chains = 1)

equi_test(m)
#> 
#> # Test for Practical Equivalence of Model Predictors
#> 
#>   Effect Size: 0.10
#>          ROPE: [-0.60 0.60]
#>       Samples: 1000
#> 
#>                         H0 %inROPE      HDI(95%)
#>  (Intercept) (*)    reject    0.00 [30.54 42.76]
#>  wt (*)             reject    0.00 [-6.96 -3.79]
#>  am (*)          undecided   28.60 [-2.92  2.98]
#>  sigma (*)          reject    0.00 [ 2.47  4.08]
#> 
#> (*) the number of effective samples may be insufficient for some parameters

p_rope(as.data.frame(m)$am)
#> [1] 8.5

ROPE: default

Enable the ROPE parameter to accept "default", which will be -0.1, 0.1 for rope.numeric and 0.1 * SD(response) for models... But what about the case of logistic models? What's the best way to return the ROPE bounds?
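
For reference, the sqrt(3)/pi conversion mentioned in the README gives:

0.1 * sqrt(3) / pi
## [1] 0.05513289
# hence the [-0.05, 0.05] range discussed for logistic models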
