Comments (10)

mattansb commented on August 15, 2024

This is actually okay. The definition of the pd is:

$$ pd = \max\big(P(x < 0),\ P(x > 0)\big) $$

That is, the maximum a posteriori probability that x has a given sign (the probability of the more likely direction).

Currently, the docs only reflect cases in which x is continuous, which is the cause of the confusion. I will update the docs to reflect the possible values and their interpretation for discrete parameters (or mixtures with a discrete component).
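
For concreteness, here is a minimal sketch of that definition applied directly to a vector of posterior draws (all numbers are made up for illustration):

# pd computed directly from draws as max(P(x > 0), P(x < 0))
draws <- rnorm(4000, mean = 0.5, sd = 1)        # a purely continuous posterior
max(mean(draws > 0), mean(draws < 0))
#> roughly 0.69 -- never below 0.5 when no draw is exactly 0

draws0 <- c(rep(0, 3000), rnorm(1000, 0.5, 1))  # 75% of the mass exactly at 0
max(mean(draws0 > 0), mean(draws0 < 0))
#> roughly 0.17 -- below 0.5, which is what happens with a point mass at 0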

tobiasweckop commented on August 15, 2024

That makes sense, but in that case the formula for converting a p-direction into a two-sided frequentist p-value no longer works: with 2 * (1 - p_direction), we would get values larger than 1 whenever p_direction < 0.5.

Is there any way to fix this calculation in my scenario, so that the equivalence of the p-direction with frequentist p-values is restored?
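
To illustrate with a made-up value, the conversion overshoots 1 as soon as pd drops below 0.5:

pd <- 0.36     # e.g. from a zero-inflated marginal posterior
2 * (1 - pd)
#> [1] 1.28    # larger than 1, so no longer a valid p-value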

mattansb commented on August 15, 2024

I'm not sure if or how the pd to p conversion is valid if there are discrete 0s.

Do you have a reproducible example you can share?

tobiasweckop commented on August 15, 2024

It is hard to provide an example without posting lots of code, but what I am trying to do is compare the inferences made by Bayesian Model Averaging (BMA) of linear regression models with those of frequentist methods. Since it is a known problem with model-selection techniques such as stepwise regression that p-values are systematically underestimated, I would like to examine whether BMA is more robust in this regard. The p-direction seems very fitting for this question, as it translates well into frequentist p-values.

Right now I am just creating data following a linear model, y <- X %*% beta + rnorm(n, 0, 1), with some positive beta coefficients and some betas equal to zero. I am giving this to the BMS package and looking at the marginal densities it returns.

Now the problem is that the coefficient plot is only accurate if the top models considered by BMA actually include the predictor in question. Since this is almost never the case for all variables, I have used the posterior inclusion probabilities to inflate the marginal samples with zeroes, so that the marginal density also includes information about the models that did not include the predictor at all. This was done by sampling 1,000,000 times from the marginal distribution and then replacing a portion of the sample with zeroes; the proportion of zeroes equals the posterior probability assigned to the models that do not contain the regressor in question (a sketch of this step is shown below).
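
A minimal sketch of that zero-inflation step (all numbers are hypothetical, and marginal_draws stands in for draws from the marginal density of one coefficient, conditional on inclusion):

set.seed(123)
pip <- 0.30                                         # posterior inclusion probability of the predictor
marginal_draws <- rnorm(1e6, mean = 0.4, sd = 0.2)  # placeholder for the conditional marginal
n_zero <- round((1 - pip) * length(marginal_draws))
idx <- sample.int(length(marginal_draws), n_zero)
marginal_draws[idx] <- 0                            # mass from the models that exclude the predictor
bayestestR::p_direction(marginal_draws)
#> roughly 0.29 with these made-up numbers -- well below 0.5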

Depending on how important the BMA ensemble considers a predictor to be, the coefficient plots look like one of the following. These plots are the zero-inflated marginal densities of each beta, averaged across multiple simulation runs to make them more robust against sampling variation.

[Three coefficient plots: zero-inflated marginal densities of the betas, ranging from clearly directional to almost entirely concentrated at zero]

Intuitively, the last plot suggests a very high probability that the coefficient is zero (or very close to it), and so I would like to have a p-value-like metric that reflects this idea.

mattansb commented on August 15, 2024

Two points:

  1. Once you discretize the parameter space like this, the pd-to-p conversion is not valid.

See example:

library(MuMIn)
library(brms)
library(bayestestR)

set.seed(1234)
d <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100, x, sd = 6)
)

## Freq ---------------------------------------------------

m0 <- lm(y ~ 1, d)
m1 <- lm(y ~ 1 + x, d)

ma <- model.avg(m0, m1)

## Bayes ---------------------------------------------------

m0b <- brm(y ~ 1, d, backend = "cmdstanr", cores = 4)
m1b <- brm(y ~ 1 + x, d, backend = "cmdstanr", cores = 4)

mab <- posterior_average(m1b, m0b, missing = list(b_x = 0))

## Compare ------------------------------------------------

### Estimate

apply(mab[1:2], 2, mean)
#> b_Intercept         b_x 
#>   0.1567762   0.3292163 
coef(ma, full = TRUE)
#> (Intercept)           x 
#>   0.1523379   0.3932120 

### P vs PD

(pd1 <- p_direction(m1b, parameters = "x"))
#> Probability of Direction 
#> 
#> Parameter |     pd
#> ------------------
#> x         | 91.03%
pd_to_p(pd1[["pd"]])
#> [1] 0.1795
summary(m1)[["coefficients"]]
#>              Estimate Std. Error   t value  Pr(>|t|)
#> (Intercept) 0.2229243  0.6298725 0.3539198 0.7241593
#> x           0.8434903  0.6226555 1.3546662 0.1786388


(pda <- p_direction(mab["b_x"]))
#> Probability of Direction
#> 
#> Parameter |     pd
#> ------------------
#> b_x       | 35.68%
summary(ma)[["coefmat.full"]]
#>              Estimate Std. Error Adjusted SE   z value  Pr(>|z|)
#> (Intercept) 0.1523379  0.6306727   0.6384294 0.2386136 0.8114052
#> x           0.3932120  0.5981554   0.6019445 0.6532364 0.5136039
pd_to_p(pda[["pd"]])
#> Probability of Direction: 0.71

Also, a gentle reminder that p-values cannot be used to support the null. However...

  2. When you discretize the parameter space like this, pd can be used to support the null! If pd << 0.5, there is more evidence that the parameter does not have a direction. You can read more about this in this paper: https://doi.org/10.1177/2515245921992035

To do:

  • Revise pd docs
    • How the pd is actually computed
    • Possible ranges of values for continuous (0.5-1) vs discretized (0-1) parameter spaces
    • How values close to 0 can be used to support the null (discretized) hypothesis (https://doi.org/10.1177/2515245921992035)
  • Revise pd-to-p
    • docs: not valid for discretized parameter spaces
    • when pd is smaller than 0.5 we need to either (a rough sketch of such a guard follows this list):
      • error (@DominiqueMakowski we discussed this in the past, seems like we do need an error here)
      • return a p of exactly 1 with a warning
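
A rough sketch (not the actual bayestestR implementation) of what such a guarded conversion could look like, going with the error option:

pd_to_p_sketch <- function(pd) {
  if (pd < 0.5) {
    # Discretized / zero-inflated posteriors can yield pd < 0.5, where the
    # two-sided conversion is undefined (the alternative discussed above would
    # be to return exactly 1 with a warning instead).
    stop("pd < 0.5: the pd-to-p conversion is not valid for discretized parameter spaces.")
  }
  2 * (1 - pd)
}

pd_to_p_sketch(0.9103)  # 0.1794, essentially what pd_to_p() gave above
pd_to_p_sketch(0.3568)  # errors, since the conversion is undefined here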

DominiqueMakowski commented on August 15, 2024

Yes, we should remove the "of the median's direction" part from the definition.

Note that during a PR to add pd to Julia/Turing, one of the devs came up with a much more efficient algorithm than the one-liner equivalent we have in bayestestR (though it doesn't look like it):

function p_direction(x::AbstractArray{<:Union{Missing,Real}})
    ntotal = 0
    npositive = 0
    nnegative = 0
    for xi in x
        if xi > 0
            npositive += 1
        elseif xi < 0
            nnegative += 1
        end
        ntotal += 1
    end
    return max(npositive, nnegative) / ntotal
end

We might run a quick benchmark to see if it's also more efficient in R.

> we discussed this in the past, seems like we do need an error here

I'm alright with that

We can mention a weird theoretical caveat: say a posterior has 999 values of 0 and a single value of 0.1; then the pd using the formula above would be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.

mattansb commented on August 15, 2024

The Julia solution is less efficient in R (probably because R doesn't have the += operator?), but I found a solution that is marginally more efficient than the one we currently have:

pd_classic <- function(x, null = 0) {
  # Current algo
  max(c(length(x[x > null])/length(x), length(x[x < null])/length(x)))  
}

pd_Julia <- function(x, null = 0) {
  # Julia algo
  ntotal <- 0
  npositive <- 0
  nnegative <- 0
  for (xi in x) {
    if (xi > null) {
      npositive <- npositive + 1  
    } else if (xi < null) {
      nnegative <- nnegative + 1  
    }
    ntotal <- ntotal + 1
  }
  return(max(npositive, nnegative) / ntotal)
}

pd_new <- function(x, null = 0) {
  # New algo?
  max(sum(x > null), sum(x < null)) / length(x)
}


# continuous
microbenchmark::microbenchmark(
  pd_classic(rnorm(10000)),
  pd_Julia(rnorm(10000)),
  pd_new(rnorm(10000)), 
  
  setup = set.seed(111)
)
#> Unit: microseconds
#>                      expr   min     lq    mean median    uq   max neval cld
#>  pd_classic(rnorm(10000)) 450.0 464.60 495.095 473.35 537.3 671.1   100 a  
#>    pd_Julia(rnorm(10000)) 727.8 800.45 817.058 817.70 828.8 884.4   100  b 
#>      pd_new(rnorm(10000)) 369.1 403.15 420.443 409.85 440.6 520.0   100   c

# discrete
microbenchmark::microbenchmark(
  pd_classic(rpois(10000, 0.4)),
  pd_Julia(rpois(10000, 0.4)),
  pd_new(rpois(10000, 0.4)), 
  
  setup = set.seed(111)
)
#> Unit: microseconds
#>                           expr   min     lq    mean median     uq   max neval cld
#>  pd_classic(rpois(10000, 0.4)) 178.7 187.80 190.381  188.7 190.40 246.4   100 a  
#>    pd_Julia(rpois(10000, 0.4)) 493.2 520.35 528.264  525.8 532.90 599.0   100  b 
#>      pd_new(rpois(10000, 0.4)) 160.0 170.80 173.672  172.5 173.95 210.8   100   c

Also,

> We can mention a weird theoretical caveat: say a posterior has 999 values of 0 and a single value of 0.1; then the pd using the formula above would be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.

The current solution and the Julia solution would give $pd = 0.001$:

x <- c(rep(0, 999), 0.1)

bayestestR::p_direction(x)
#> Probability of Direction: 1.00e-03
pd_Julia(x)
#> [1] 0.001
pd_new(x)
#> [1] 0.001

mattansb commented on August 15, 2024

(Why would a loop be faster than a vectorized operation in Julia? How odd....)

tobiasweckop commented on August 15, 2024

Your answers were very helpful and the paper is also very interesting.

Thank you very much!

DominiqueMakowski commented on August 15, 2024

> The current solution and the Julia solution would give $pd = 0.001$

Right, yeah, I brainfroze.
