Comments (10)

mattansb commented on August 15, 2024

This is actually okay. The definition of the pd is:

$$ pd = \max\big(P(x < 0),\ P(x > 0)\big) $$

That is, the maximum a posteriori probability that x has a given sign (the probability of the more likely direction).

Currently, the docs only reflect cases in which x is continuous, which is the cause of the confusion. I will update the docs to reflect the possible values and their interpretation for discrete parameters (or mixtures with a discrete component).
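
For concreteness, here is a minimal sketch of that definition applied directly to a vector of posterior draws (all numbers are made up for illustration):

# pd computed directly from draws as max(P(x > 0), P(x < 0))
draws <- rnorm(4000, mean = 0.5, sd = 1)        # a purely continuous posterior
max(mean(draws > 0), mean(draws < 0))
#> roughly 0.69 -- never below 0.5 when no draw is exactly 0

draws0 <- c(rep(0, 3000), rnorm(1000, 0.5, 1))  # 75% of the mass exactly at 0
max(mean(draws0 > 0), mean(draws0 < 0))
#> roughly 0.17 -- below 0.5, which is what happens with a point mass at 0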

tobiasweckop commented on August 15, 2024

That makes sense, but in that case the formula for converting a p-direction into a two-sided frequentist p-value no longer works: with 2 * (1 - p_direction), we would get values larger than 1 whenever p_direction < 0.5.

Is there any way to fix this calculation in my scenario, so that the equivalence of the p-direction with frequentist p-values is restored?
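
To illustrate with a made-up value, the conversion overshoots 1 as soon as pd drops below 0.5:

pd <- 0.36     # e.g. from a zero-inflated marginal posterior
2 * (1 - pd)
#> [1] 1.28    # larger than 1, so no longer a valid p-value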

mattansb commented on August 15, 2024

I'm not sure if or how the pd to p conversion is valid if there are discrete 0s.

Do you have a reproducible example you can share?

tobiasweckop commented on August 15, 2024

It is hard to provide an example without posting lots of code, but what I am trying to do is compare the inferences made by Bayesian Model Averaging (BMA) of linear regression models with those of frequentist methods. Since it is a known problem with model-selection techniques such as stepwise regression that p-values are systematically underestimated, I would like to examine whether BMA is more robust in this regard. The p-direction seems very fitting for this question, as it translates well into frequentist p-values.

Right now I am just creating data following a linear model, y <- X %*% beta + rnorm(n, 0, 1), with some positive beta coefficients and some betas equal to zero. I am giving this to the BMS package and looking at the marginal densities it returns.

Now the problem is that the coefficient plot is only accurate if the top models considered by BMA actually include the predictor in question. Since this is almost never the case for all variables, I have used the posterior inclusion probabilities to inflate the marginal samples with zeroes, so that the marginal density also includes information about the models that did not include the predictor at all. This was done by sampling 1,000,000 times from the marginal distribution and then replacing a portion of the sample with zeroes; the proportion of zeroes equals the posterior probability assigned to the models that do not contain the regressor in question (a sketch of this step is shown below).
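
A minimal sketch of that zero-inflation step (all numbers are hypothetical, and marginal_draws stands in for draws from the marginal density of one coefficient, conditional on inclusion):

set.seed(123)
pip <- 0.30                                         # posterior inclusion probability of the predictor
marginal_draws <- rnorm(1e6, mean = 0.4, sd = 0.2)  # placeholder for the conditional marginal
n_zero <- round((1 - pip) * length(marginal_draws))
idx <- sample.int(length(marginal_draws), n_zero)
marginal_draws[idx] <- 0                            # mass from the models that exclude the predictor
bayestestR::p_direction(marginal_draws)
#> roughly 0.29 with these made-up numbers -- well below 0.5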

Depending on how important the BMA ensemble considers a predictor to be, the coefficient plots look like one of the following. These plots are the zero-inflated marginal densities of each beta, averaged across multiple simulation runs to make them more robust against sampling variation.

[Three coefficient plots: zero-inflated marginal densities of the betas, ranging from clearly directional to almost entirely concentrated at zero]

Intuitively, the last plot suggests a very high probability that the coefficient is zero (or very close to it), and so I would like to have a p-value-like metric that reflects this idea.

mattansb commented on August 15, 2024

Two points:

  1. Once you discretize the parameter space like this, the pd-to-p conversion is not valid.

See example:

library(MuMIn)
library(brms)
library(bayestestR)

set.seed(1234)
d <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100, x, sd = 6)
)

## Freq ---------------------------------------------------

m0 <- lm(y ~ 1, d)
m1 <- lm(y ~ 1 + x, d)

ma <- model.avg(m0, m1)

## Bayes ---------------------------------------------------

m0b <- brm(y ~ 1, d, backend = "cmdstanr", cores = 4)
m1b <- brm(y ~ 1 + x, d, backend = "cmdstanr", cores = 4)

mab <- posterior_average(m1b, m0b, missing = list(b_x = 0))

## Compare ------------------------------------------------

### Estimate

apply(mab[1:2], 2, mean)
#> b_Intercept         b_x 
#>   0.1567762   0.3292163 
coef(ma, full = TRUE)
#> (Intercept)           x 
#>   0.1523379   0.3932120 

### P vs PD

(pd1 <- p_direction(m1b, parameters = "x"))
#> Probability of Direction 
#> 
#> Parameter |     pd
#> ------------------
#> x         | 91.03%
pd_to_p(pd1[["pd"]])
#> [1] 0.1795
summary(m1)[["coefficients"]]
#>              Estimate Std. Error   t value  Pr(>|t|)
#> (Intercept) 0.2229243  0.6298725 0.3539198 0.7241593
#> x           0.8434903  0.6226555 1.3546662 0.1786388


(pda <- p_direction(mab["b_x"]))
#> Probability of Direction
#> 
#> Parameter |     pd
#> ------------------
#> b_x       | 35.68%
summary(ma)[["coefmat.full"]]
#>              Estimate Std. Error Adjusted SE   z value  Pr(>|z|)
#> (Intercept) 0.1523379  0.6306727   0.6384294 0.2386136 0.8114052
#> x           0.3932120  0.5981554   0.6019445 0.6532364 0.5136039
pd_to_p(pda[["pd"]])
#> Probability of Direction: 0.71

Also, a gentle reminder that p-values cannot be used to support the null. However...

  2. When you discretize the parameter space like this, pd can be used to support the null! If pd << 0.5, there is more evidence that the parameter does not have a direction. You can read more about this in this paper: https://doi.org/10.1177/2515245921992035

To do:

  • Revise pd docs
    • How the pd is actually computed
    • Possible ranges of values for continuous (0.5-1) vs discretized (0-1) parameter spaces
    • How values close to 0 can be used to support the null (discretized) hypothesis (https://doi.org/10.1177/2515245921992035)
  • Revise pd-to-p
    • docs: not valid for discretized parameter spaces
    • when pd is smaller than 0.5 we need to either (a rough sketch of such a guard follows this list):
      • error (@DominiqueMakowski we discussed this in the past, seems like we do need an error here)
      • return a p of exactly 1 with a warning
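
A rough sketch (not the actual bayestestR implementation) of what such a guarded conversion could look like, going with the error option:

pd_to_p_sketch <- function(pd) {
  if (pd < 0.5) {
    # Discretized / zero-inflated posteriors can yield pd < 0.5, where the
    # two-sided conversion is undefined (the alternative discussed above would
    # be to return exactly 1 with a warning instead).
    stop("pd < 0.5: the pd-to-p conversion is not valid for discretized parameter spaces.")
  }
  2 * (1 - pd)
}

pd_to_p_sketch(0.9103)  # 0.1794, essentially what pd_to_p() gave above
pd_to_p_sketch(0.3568)  # errors, since the conversion is undefined here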

DominiqueMakowski commented on August 15, 2024

Yes, we should remove the "of the median's direction" part from the definition.

Note that during a PR to add pd to Julia/Turing, one of the devs came up with a much more efficient algorithm than the one-liner equivalent we have in bayestestR (though it doesn't look like it):

function p_direction(x::AbstractArray{<:Union{Missing,Real}})
    ntotal = 0
    npositive = 0
    nnegative = 0
    for xi in x
        if xi > 0
            npositive += 1
        elseif xi < 0
            nnegative += 1
        end
        ntotal += 1
    end
    return max(npositive, nnegative) / ntotal
end

We might run a quick benchmark to see if it's also more efficient in R.

> we discussed this in the past, seems like we do need an error here

I'm alright with that

We can mention a weird theoretical caveat: say a posterior has 999 values of 0 and a single value of 0.1; then the pd using the formula above would be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.

mattansb commented on August 15, 2024

The Julia solution is less efficient in R (probably because R doesn't have the += operator?), but I found a solution that is marginally more efficient than the one we currently have:

pd_classic <- function(x, null = 0) {
  # Current algo
  max(c(length(x[x > null])/length(x), length(x[x < null])/length(x)))  
}

pd_Julia <- function(x, null = 0) {
  # Julia algo
  ntotal <- 0
  npositive <- 0
  nnegative <- 0
  for (xi in x) {
    if (xi > null) {
      npositive <- npositive + 1  
    } else if (xi < null) {
      nnegative <- nnegative + 1  
    }
    ntotal <- ntotal + 1
  }
  return(max(npositive, nnegative) / ntotal)
}

pd_new <- function(x, null = 0) {
  # New algo?
  max(sum(x > null), sum(x < null)) / length(x)
}


# continuous
microbenchmark::microbenchmark(
  pd_classic(rnorm(10000)),
  pd_Julia(rnorm(10000)),
  pd_new(rnorm(10000)), 
  
  setup = set.seed(111)
)
#> Unit: microseconds
#>                      expr   min     lq    mean median    uq   max neval cld
#>  pd_classic(rnorm(10000)) 450.0 464.60 495.095 473.35 537.3 671.1   100 a  
#>    pd_Julia(rnorm(10000)) 727.8 800.45 817.058 817.70 828.8 884.4   100  b 
#>      pd_new(rnorm(10000)) 369.1 403.15 420.443 409.85 440.6 520.0   100   c

# discrete
microbenchmark::microbenchmark(
  pd_classic(rpois(10000, 0.4)),
  pd_Julia(rpois(10000, 0.4)),
  pd_new(rpois(10000, 0.4)), 
  
  setup = set.seed(111)
)
#> Unit: microseconds
#>                           expr   min     lq    mean median     uq   max neval cld
#>  pd_classic(rpois(10000, 0.4)) 178.7 187.80 190.381  188.7 190.40 246.4   100 a  
#>    pd_Julia(rpois(10000, 0.4)) 493.2 520.35 528.264  525.8 532.90 599.0   100  b 
#>      pd_new(rpois(10000, 0.4)) 160.0 170.80 173.672  172.5 173.95 210.8   100   c

Also,

> We can mention a weird theoretical caveat: say a posterior has 999 values of 0 and a single value of 0.1; then the pd using the formula above would be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.

The current solution and the Julia solution would give $pd = 0.001$:

x <- c(rep(0, 999), 0.1)

bayestestR::p_direction(x)
#> Probability of Direction: 1.00e-03
pd_Julia(x)
#> [1] 0.001
pd_new(x)
#> [1] 0.001

mattansb commented on August 15, 2024

(Why would a loop be faster than a vectorized operation in Julia? How odd....)

tobiasweckop commented on August 15, 2024

Your answers were very helpful and the paper is also very interesting.

Thank you very much!

DominiqueMakowski commented on August 15, 2024

> The current solution and the Julia solution would give $pd = 0.001$

Right, yeah, I brainfroze.
