Comments (10)
This is actually okay. The actual definition of the pd is:

pd = max(Pr(x > 0), Pr(x < 0))

That is, the maximum a posteriori probability that x has a sign.
Currently, the docs reflect only cases in which x is continuous, which is the cause of the confusion.
I will update the docs to reflect the possible values and interpretation for cases with discrete (or mixture with discrete) values.
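A minimal sketch of this definition in R (my own illustration, not the bayestestR implementation):

```r
# pd: the maximum a posteriori probability that the parameter is
# strictly positive or strictly negative of the null value.
pd <- function(x, null = 0) {
  max(mean(x > null), mean(x < null))
}

set.seed(123)
continuous <- rnorm(1000, mean = 0.5)           # no mass exactly at 0: pd lies in [0.5, 1]
discretized <- c(rep(0, 600), rnorm(400, 0.5))  # spike at 0: pd can fall below 0.5
pd(continuous)
pd(discretized)
```

For a fully continuous posterior the two sign probabilities sum to 1, so the maximum is always at least 0.5; a point mass at the null breaks that guarantee.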
from bayestestr.
That makes sense, but in this case the formula for converting a p-direction into a two-sided frequentist p-value no longer works: with 2 * (1 - p_direction), we would get values larger than 1 whenever p_direction < 0.5.
Is there any way to fix this calculation in my scenario to restore the equivalence between the p-direction and frequentist p-values?
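To make the problem concrete, here is the two-sided conversion in question as a small sketch (the function name is my own; bayestestR provides this conversion as pd_to_p()):

```r
# Two-sided conversion from the probability of direction to a p-value.
# Only meaningful when pd lies in [0.5, 1].
pd_to_p_twosided <- function(pd) {
  2 * (1 - pd)
}

pd_to_p_twosided(0.975)  # 0.05 -- the usual continuous case
pd_to_p_twosided(0.36)   # 1.28 -- exceeds 1 once pd < 0.5
```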
I'm not sure if or how the pd to p conversion is valid if there are discrete 0s.
Do you have a reproducible example you can share?
It is hard to provide an example without posting lots of code, but what I am trying to do is compare the inferences made by Bayesian Model Averaging (BMA) of linear regression models with frequentist methods. Since it is a known problem with model-selection techniques such as stepwise regression that p-values are systematically underestimated, I would like to examine whether BMA is more robust in this regard. The p-direction seems very fitting for this question, as it translates well into frequentist p-values.
Right now I am creating data following a linear model, y <- X %*% beta + rnorm(n, 0, 1), with some positive beta coefficients and some betas equal to zero. I am giving this to the BMS package and looking at the marginal densities it returns.
Now the problem is that the coefficient plot is only accurate if the top models considered by BMA actually include the predictor. Since this is almost never the case for all variables, I have used the posterior inclusion probabilities to inflate the model with zeroes, so that the marginal density also includes information about the models that did not include the predictor at all. This was done by sampling 1,000,000 times from the marginal distribution and then replacing a portion of the sample with zeroes. The proportion of zeroes is equal to the posterior probability assigned to the models that do not contain the regressor in question.
Depending on how important the BMA ensemble considers a predictor to be, the coefficient plots look like one of the following plots. These plots are the zero-inflated marginal densities of each beta, averaged across multiple simulation runs to make them more robust against sampling variation.
Intuitively, the last plot suggests a very high probability that the predictor is zero (or very close to it), so I would like a p-value-like metric to reflect this idea.
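The zero-inflation step described above can be sketched roughly like this (the inclusion probability and the stand-in marginal density are my own assumptions; the real draws would come from the BMS output):

```r
set.seed(42)
pip <- 0.3                                 # assumed posterior inclusion probability
draws <- rnorm(1e6, mean = 0.4, sd = 0.2)  # stand-in for the marginal density of beta
# Replace a share (1 - pip) of the sample with exact zeros, matching the
# posterior mass on models that exclude the predictor.
n_zero <- round((1 - pip) * length(draws))
draws[sample(length(draws), n_zero)] <- 0
mean(draws == 0)  # proportion of exact zeros, equal to 1 - pip
```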
Two points:
- Once you discretize the parameter space like this, the pd-to-p conversion is no longer valid.
See example:
library(MuMIn)
library(brms)
library(bayestestR)
set.seed(1234)
d <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100, x, sd = 6)
)
## Freq ---------------------------------------------------
m0 <- lm(y ~ 1, d)
m1 <- lm(y ~ 1 + x, d)
ma <- model.avg(m0, m1)
## Bayes ---------------------------------------------------
m0b <- brm(y ~ 1, d, backend = "cmdstanr", cores = 4)
m1b <- brm(y ~ 1 + x, d, backend = "cmdstanr", cores = 4)
mab <- posterior_average(m1b, m0b, missing = list(b_x = 0))
## Compare ------------------------------------------------
### Estimate
apply(mab[1:2], 2, mean)
#> b_Intercept b_x
#> 0.1567762 0.3292163
coef(ma, full = TRUE)
#> (Intercept) x
#> 0.1523379 0.3932120
### P vs PD
(pd1 <- p_direction(m1b, parameters = "x"))
#> Probability of Direction
#>
#> Parameter | pd
#> ------------------
#> x | 91.03%
pd_to_p(pd1[["pd"]])
#> [1] 0.1795
summary(m1)[["coefficients"]]
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.2229243 0.6298725 0.3539198 0.7241593
#> x 0.8434903 0.6226555 1.3546662 0.1786388
(pda <- p_direction(mab["b_x"]))
#> Probability of Direction
#>
#> Parameter | pd
#> ------------------
#> b_x | 35.68%
summary(ma)[["coefmat.full"]]
#> Estimate Std. Error Adjusted SE z value Pr(>|z|)
#> (Intercept) 0.1523379 0.6306727 0.6384294 0.2386136 0.8114052
#> x 0.3932120 0.5981554 0.6019445 0.6532364 0.5136039
pd_to_p(pda[["pd"]])
#> Probability of Direction: 0.71
Also, a gentle reminder that p-values cannot be used to support the null. However...
- When you discretize the parameter space like this, pd can be used to support the null! If pd << 0.5, there is more evidence that the parameter does not have a direction. You can read more about this in this paper: https://doi.org/10.1177/2515245921992035
To do:
- Revise pd docs
  - How the pd is actually computed
  - Possible ranges of values for continuous (0.5-1) vs discretized (0-1) parameter spaces
  - How values close to 0 can be used to support the null (discretized) hypothesis (https://doi.org/10.1177/2515245921992035)
- Revise pd-to-p
  - Docs: not valid for discretized parameter spaces
  - When pd is smaller than 0.5, we need to either:
    - error (@DominiqueMakowski we discussed this in the past; seems like we do need an error here)
    - return a p of exactly 1 with a warning
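A small illustration of both points, with a hypothetical guard for the pd < 0.5 case (the error-vs-warning choice is still an open design question above, so this is only a sketch, not the bayestestR implementation):

```r
# A discretized (spike-at-zero) posterior: most mass sits exactly at the null.
set.seed(1)
x <- c(rep(0, 9000), rnorm(1000, mean = 0.3))
pd <- max(mean(x > 0), mean(x < 0))
pd  # far below 0.5 -- evidence that the parameter has no direction

# Hypothetical guard: refuse the conversion when pd < 0.5.
pd_to_p_safe <- function(pd) {
  if (pd < 0.5) {
    stop("pd < 0.5: pd-to-p conversion is not valid for discretized parameter spaces.")
  }
  2 * (1 - pd)
}
```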
Yes, we should remove the reference to "the median's direction" from the definition.
Note that during a PR to add pd to Julia/Turing, one of the devs came up with a much more efficient algorithm than the one-liner equivalent we have in bayestestR (though it doesn't look like it):
function p_direction(x::AbstractArray{<:Union{Missing,Real}})
    ntotal = 0
    npositive = 0
    nnegative = 0
    for xi in x
        if xi > 0
            npositive += 1
        elseif xi < 0
            nnegative += 1
        end
        ntotal += 1
    end
    return max(npositive, nnegative) / ntotal
end
We might run a quick benchmark to see if it's also more efficient.
we discussed this in the past, seems like we do need an error here).
I'm alright with that
We can mention a weird theoretical caveat: let's say a posterior has 999 values of exactly 0 and one value of 0.1; then the pd using the formula above will be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.
The Julia solution is less efficient in R (probably because R doesn't have the += operator?), but I found a solution that is marginally more efficient than the one we currently have:
pd_classic <- function(x, null = 0) {
  # Current algo
  max(c(length(x[x > null]) / length(x), length(x[x < null]) / length(x)))
}

pd_Julia <- function(x, null = 0) {
  # Julia algo
  ntotal <- 0
  npositive <- 0
  nnegative <- 0
  for (xi in x) {
    if (xi > null) {
      npositive <- npositive + 1
    } else if (xi < null) {
      nnegative <- nnegative + 1
    }
    ntotal <- ntotal + 1
  }
  return(max(npositive, nnegative) / ntotal)
}

pd_new <- function(x, null = 0) {
  # New algo?
  max(sum(x > null), sum(x < null)) / length(x)
}
# continuous
microbenchmark::microbenchmark(
  pd_classic(rnorm(10000)),
  pd_Julia(rnorm(10000)),
  pd_new(rnorm(10000)),
  setup = set.seed(111)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> pd_classic(rnorm(10000)) 450.0 464.60 495.095 473.35 537.3 671.1 100 a
#> pd_Julia(rnorm(10000)) 727.8 800.45 817.058 817.70 828.8 884.4 100 b
#> pd_new(rnorm(10000)) 369.1 403.15 420.443 409.85 440.6 520.0 100 c
# discrete
microbenchmark::microbenchmark(
  pd_classic(rpois(10000, 0.4)),
  pd_Julia(rpois(10000, 0.4)),
  pd_new(rpois(10000, 0.4)),
  setup = set.seed(111)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> pd_classic(rpois(10000, 0.4)) 178.7 187.80 190.381 188.7 190.40 246.4 100 a
#> pd_Julia(rpois(10000, 0.4)) 493.2 520.35 528.264 525.8 532.90 599.0 100 b
#> pd_new(rpois(10000, 0.4)) 160.0 170.80 173.672 172.5 173.95 210.8 100 c
Also,
We can mention a weird theoretical caveat: let's say a posterior has 999 values of exactly 0 and one value of 0.1; then the pd using the formula above will be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.
The current solution and the Julia solution would give
x <- c(rep(0, 999), 0.1)
bayestestR::p_direction(x)
#> Probability of Direction: 1.00e-03
pd_Julia(x)
#> [1] 0.001
pd_new(x)
#> [1] 0.001
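As a sanity check on the benchmark, the implementations should also agree numerically on the same draws. A quick self-contained check (redefining the two R variants from the comment above):

```r
pd_classic <- function(x, null = 0) {
  # Current algo
  max(c(length(x[x > null]) / length(x), length(x[x < null]) / length(x)))
}
pd_new <- function(x, null = 0) {
  # New algo
  max(sum(x > null), sum(x < null)) / length(x)
}

set.seed(111)
x <- rnorm(10000)          # continuous sample
y <- rpois(10000, 0.4)     # discrete sample with exact zeros
stopifnot(isTRUE(all.equal(pd_classic(x), pd_new(x))))
stopifnot(isTRUE(all.equal(pd_classic(y), pd_new(y))))
```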
(Why would a loop be faster than a vectorized operation in Julia? How odd....)
Your answers were very helpful and the paper is also very interesting.
Thank you very much!
The current solution and the Julia solution would give .001
right yeah I brainfroze