betanalpha / knitr_case_studies Goto Github PK

View Code? Open in Web Editor NEW

162.0 22.0 94.0 943.19 MB

Inference case studies in knitr

HTML 99.08% R 0.35% Stan 0.07% TeX 0.45% Shell 0.05% Mathematica 0.01%

r case-studies stan statistics

knitr_case_studies's People

Contributors

Stargazers

Watchers

Forkers

bluelou apollostream jguan1 mcsalgado gladstone-institutes zdelrosario jmc129 blacng royemanuel imai-n rich-d-wilkinson eugmath lyshds babicb miloventimiglia jgcolman ianasta23 leandroroser jmaurit alessandrocasa nhanhuynh46 hal2001 josuema stonegold546 davidlovitz alkemalabcourses dakshabopanna25 evgeni-nikolaev keithorourke campbell-2589 mcol shiraamitchell richarddmorey vasishth jerzybaranowski waniomar tinyvanboekel csetraynor erocarrera eipi10 andrscyv dr-dong anhnguyendepocen kyhkong innocentposas vreuter ideas-lab-nus mpmendespt baruuum schollj michaelbiafore hrgentry fazepher spinkney nicokist akatie mirca l-d-s datafusion18 maansea scisus standardgalactic geraintpratten bnesposito juliengamartin peter277 shleeneu japhir brochhagen cjyarnell svilupp zhuanzhuan0316 nickdgardner stjordanis t-huck jmkuhns asunyliu gugushvili pitmonticone yuchenjt tatsuookubo stevenworthington adayim stephenll-forks yuhangx susurant trevorlolsen animesh jpiburn am0505 divagora sarahsmithtripp zacharmentrout

knitr_case_studies's Issues

Typo

I noticed what may be a minor typo in your excellent Gaussian Processes tutorial. I believe that line 1342 (at commit 8db2461) of knitr_case_studies/gaussian_processes/gaussian_processes.Rmd should be
N_obs=data$N_obs, x_obs=data$x_obs, y_obs=data$y_obs,
currently it is
N_obs=data$N_obs, x=data$x_obs, y_obs=data$y_obs,

Syntax Error in `principled_bayesian_workflow/sample_joint_ensemble4.stan`

I just tried to compile the principled bayesian workflow case study, but this fails with a stan syntax error.
In sample_joint_ensemble4.stan you declare an array y[N] but try to assing to y_ppc[n] in the loop. Changig y_ppc to y would be consistent with the other sample*.stan files.

Section 2.5 first sentence points to section 3.4 rather than 2.4

Inversed definition of biased SBC histograms?

The case study "Towards A Principled Bayesian Workflow" presents two possible definitions of SBC ranks:

$$ \rho = \sharp \left{ \tilde{\theta} < \tilde{\theta}'_{r} \right}. $$

and

$$ \rho = \sharp \left{ \tilde{\theta} > \tilde{\theta}'_{r} \right} $$

with $\tilde{\theta}$ being the true parameter value and $\tilde{\theta}'_{r}$ being a posterior draw (I assume). For the first definition, you say

Similarly if the estimated posterior is biased low then the prior ranks will be biased high and the SBC histogram will concentrate at the right boundary.

For the second definition, you say

then a low bias in the posterior fit will manifest in a SBC histogram that concentrates at the left boundary.

My question is if these two interpretations shouldn't be inversed.

\mathcal{} symbols like chi-square are not rendered properly

\mathcal{} symbols like chi-square are not rendered properly in html. They appear as an empty rectangles in a web browser.

Typo in 'Underdetermined linear regression'?

I am not sure if I got everything from the 'Underdetermined linear regression' case study correctly, but line 50 of linear_regression_w_proj.stan is

gamma_par = append_row(beta, alpha) - gamma_perp;

though I would have thought it should be

gamma_par = (append_row(beta, alpha) - R) - gamma_perp;

Am I getting something wrong or is this a typo?

Possible typo in "Identifying Bayesian Mixture models"

Hi Michael,

Thanks for these write-ups; they are fantastic. In "Identifying Bayesian Mixture Models" I think the sum in the first equation of section 4 should be a product, to properly define the likelihood.

Best,
Connor

Minor comment on censoring vs truncation in "A Principled Bayesian Workflow"

Thanks for the great article on the Bayesian workflow. This is just a minor comment on the terminology of censoring and truncation in the given example case.

The text says:
"When the student repeated the measurement for detectors in this state they unintentionally induced a censored obsevation where the Poisson distribution for the observations is cutoff at a certain value."

But based on the terminology that I've learned from survival analysis, this would actually be truncation rather than censoring, since the observations that went over the scale of the sensor are not recorded at all in the data (they would be censored, if the student had recorded each case where the measurement went over the scale). This terminology seems also to follow the one in Bayesian Data Analysis book (2nd edition), Chapter 7.8.

(There's also a typo "obsevation" in the quoted sentence above.)

Ordinal Regression: Wrong indices in figure multi.tex

Thank you for all your excellent case studies!

In the right plot of figure multi.tex in the Ordinal Regression case study, some of the cut point indices are wrong.
I think it should be:
$p_{4} = \Pi(c_{4}) - \Pi(c_{3})$
$\Pi(c_{4}) = 1$
$\Pi(c_{3})$
$\Pi(c_{2})$
$\Pi(c_{1})$
$\Pi(c_{0}) = 0$

Prior standard deviation in "A Principled Bayesian Workflow"

Since the prior is a half-normal (i.e., truncated at 0), shouldn't we use qnorm(0.995) instead of qnorm(0.99) to get the prior standard deviation?

TruncatedNormal::qtnorm(0.99, 0, 15 / qnorm(.995), 0, Inf)
#> [1] 15
TruncatedNormal::qtnorm(0.99, 0, 15 / qnorm(.99), 0, Inf)
#> [1] 16.60863

# where

15 / qnorm(.995)
#> [1] 5.823367
15 / qnorm(.99)
#> [1] 6.447875

^{Created on 2019-11-18 by the reprex package (v0.3.0)}

Typo in probability theory doc: complement B1 should be in set of Reals

Hi Michael, thanks for writing up all the tutorials. I wasn't sure how you'd like feedback/typo catching, so I'll just post here. In the doc probability theory, the complement of B1 should be in the set of Reals.

Best,
Scott

Typo

Conditional not onditional in the following

knitr_case_studies/stochastic_differential_equations/stochastic_differential_equations.Rmd

Line 344 in 2527512

From this onditional probability density function between neighboring states we

Expectation value section 2.2 (minor nit)

The paragraph talks about f: X -> R, and then talks about E(g). Recommend making it E(f)

expression for prior_sd_lambda on inverse gamma is not quite correct

There are 2 places in principled_bayesian_workflow.Rmd where the expression
for the sd of an inverse gamma is used viz:

prior_sd_lambda <- sqrt( (9.21604)**2 / ((3.48681 - 1)**2 * (3.48681 - 1)) )

I think this should be (note the -2):

prior_sd_lambda <- sqrt( (9.21604)**2 / ((3.48681 - 1)**2 * (3.48681 - 2)) )

BTW: awesome work!

Sbc_rank in "A Principled Bayesian Workflow"

Thank you @betanalpha for this informative and hands on tutorial on the workflow!

I am confused and had been thinking for days on the sbc_rank of prior draw with respect to thinned posterior draws. the distribution of these rankings are supposed to be uniform, and for good models the result seems to be uniform.

My question, though, is why isn't it more concentrated? the code seems to calculate the rank of the true parameter that were used to generate the date against the fitted draws from posterior. If the posterior is well-fitted, wouldn't we expect to recover this parameter?

of course the choice of prior matters, in this case it means the posterior would have a tendency to go for smaller values.

then based on the code, I would expect the mode of the rank to be smaller than 250 (4000/8/2), yet the result was uniform.

Is there any resources that I can read to clear my head?

thanks

stan_utility update for other folders + rstan substitute?

Hi Mike,

I have a couple of questions about the stan_utility script:

I think the stan_utility.R's latest version is in the stan_intro folder, but would it help to have a pull request that updates it for all the other folders?
Is the check_div, check_treedepth, and check_energy fully substitutable with rstan's own functions, with the caveat that it is rstan::check_divergence and not rstan::check_div?
Curious why check_n_eff and check_rhat aren't in the package itself---or did I miss something?

My Stan is a bit rusty at the moment.

Unexpected behaviour of check_treedepth()

I'm not sure whether I am interpreting the output message correctly, for the same stan fit object I get:

 check_treedepth(tabi_res$fit, max_depth = 10)
[1] "6 of 1000 iterations saturated the maximum tree depth of 10 (0.6%)"
[1] "  Run again with max_depth set to a larger value to avoid saturation"

> check_treedepth(tabi_res$fit, max_depth = 12)
[1] "988 of 1000 iterations saturated the maximum tree depth of 12 (98.8%)"
[1] "  Run again with max_depth set to a larger value to avoid saturation"

Shouldn't the first check include the interactions of the second check?

Modifications for stan_utility.R to work with rstanarm

The stan_utility.R functions are great for model checking. If anyone wants to use them with models fit using the 'rstanarm' package, the following modified functions can be used.

# copied from 
# https://github.com/betanalpha/knitr_case_studies/blob/master/rstan_workflow/stan_utility.R
# modified to work with models fit using rstanarm 

# Tidybayes to extract model parameters and stuff
library(tidybayes) # Tested on tidybayes version 1.0.3
library(tidyverse)
library(rstanarm)

# Check transitions that ended with a divergence
check_div <- function(fit, quiet=FALSE) {
  sampler_params <- fit %>% tidy_draws()
  divergent <- sampler_params[['divergent__']]
  n = sum(divergent)
  N = length(divergent)

  if (!quiet) print(sprintf('%s of %s iterations ended with a divergence (%s%%)',
                    n, N, 100 * n / N))
  if (n > 0) {
    if (!quiet) print('  Try running with larger adapt_delta to remove the divergences')
    if (quiet) return(FALSE)
  } else {
    if (quiet) return(TRUE)
  }
}

# Check transitions that ended prematurely due to maximum tree depth limit
check_treedepth <- function(fit, max_depth = 15, quiet=FALSE) {
  sampler_params <- fit %>% tidy_draws()
  treedepths <- sampler_params[['treedepth__']]
  print(paste('Max treedepths reached is',max(treedepths)))
  n = length(treedepths[sapply(treedepths, function(x) x == max_depth)])
  N = length(treedepths)

  if (!quiet)
    print(sprintf('%s of %s iterations saturated the maximum tree depth of %s (%s%%)',
                            n, N, max_depth, 100 * n / N))
  
  if (n > 0) {
    if (!quiet) print('  Run again with max_treedepth set to a larger value to avoid saturation')
    if (quiet) return(FALSE)
  } else {
    if (quiet) return(TRUE)
  }
}

# Checks the energy fraction of missing information (E-FMI)
check_energy <- function(fit, quiet=FALSE) {
  sampler_params <- fit %>% tidy_draws()
  no_warning <- TRUE

  energies = sampler_params[['energy__']]
  numer = sum(diff(energies)**2) / length(energies)
  denom = var(energies)
  print(paste('Energy is ',numer / denom)) 
  if (numer / denom > 0.2) {
    if (!quiet) print('E-BFMI indicated no pathological behavior')
    if (quiet) return(TRUE)
  } else {
    if (!quiet) print('  E-BFMI below 0.2 indicates you may need to reparameterize your model')
    if (quiet) return(FALSE)
  }
}

# Checks the effective sample size per iteration
check_n_eff <- function(fit, quiet=FALSE) {
  fit_summary <- fit$stan_summary
  N <- dim(fit_summary)[[1]]

  iter <- prod(dim(fit$stanfit)[1:2])

  no_warning <- TRUE
  ratvals = c()
  for (n in 1:N) {
    ratio <- fit_summary[n,'n_eff'] / iter
    ratvals[n] = ratio
    if (ratio < 0.001) {
      if (!quiet) print(sprintf('n_eff / iter for parameter %s is %s!',
                        rownames(fit_summary)[n], ratio))
      no_warning <- FALSE
    }
  }
  print(paste('lowest n_eff is ',min(ratvals))) 
  if (no_warning) {
    if (!quiet) print('n_eff / iter looks reasonable for all parameters')
    if (quiet) return(TRUE)
  }
  else {
    if (!quiet) print('  n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated')
    if (quiet) return(FALSE)
  }
}

# Checks the potential scale reduction factors
check_rhat <- function(fit, quiet=FALSE) {
  fit_summary <- fit$stan_summary
  N <- dim(fit_summary)[[1]]

  ratvals = c()
  no_warning <- TRUE
  for (n in 1:N) {
    rhat <- fit_summary[n,'Rhat']
    ratvals[n] = rhat
    if (rhat > 1.1 || is.infinite(rhat) || is.nan(rhat)) {
      if (!quiet) print(sprintf('Rhat for parameter %s is %s!',
                        rownames(fit_summary)[n], rhat))
      no_warning <- FALSE
    }
  }
  print(paste('highest rhat is ',max(ratvals))) 
  if (no_warning) {
    if (!quiet) print('Rhat looks reasonable for all parameters')
    if (quiet) return(TRUE)
  } else {
    if (!quiet) print('  Rhat above 1.1 indicates that the chains very likely have not mixed')
    if (quiet) return(FALSE)
  }
}

Typo in Probability Theory (For Scientists and Engineers)

Hello, Michael

I was reading your case study on Probability Theory Concepts titled "Probability Theory (For Scientists and Engineers)" and I think I found a typo.

On the 4th paragraph of Section 2.1 Probability Distributions, you write:

Moreover, any distribution of probability should _globally consistent_ over any 
disjoint decomposition of the total space.

I believe there is a missing "be" between "should" and "globally".

I know this is minor and won't affect the comprehension of your reader, yet, your work is so thorough and top-notch that I thought you'd like to know about it so you could make the correction should you judge it necessary.

P.S.: I'd also like to take this opportunity to thank you for your generosity. It's not every day that we find sterling work like yours available as free online tutorials. It's of prime quality and deeply insightful. Your clear written explanation aided by sharp visualizations helps all of us, non-Mathematicians, to have a firmer grip on vital concepts that sometimes are poorly covered at college-level courses.

Potential Bayes' Theorem typo in Probability Theory on Product Spaces

Would you mind taking a minute to look at the last two equations of subsection 3.4 in Probability Theory on Product Spaces? There are two issues that I'm struggling to wrap my head around.

I'm used to seeing Bayes' Theorem flip the conditioning in the numerator of the right hand side,

P(A | B) = P(B|A) P(A) / P(B),

but it appears that you have the same conditioning happening on the left hand side as on the right hand side.

\pi(x_2 | \tilde{x}_1) = \frac{ \pi(x_2 | \tilde{x}_1) }{ constant } \pi(x_2)

Given Section 1.3, it seems that \pi(\tilde{x}_1 | x_2) should rightfully be function defined on x_1 \cross X_2. Is this interpretation correct?

The constant in the denominator of these equations is different, when I don't think it should be. In the first, you integrate with respect to x1, dx1, and in the second, x2, dx2. They should both integrate over x2, right?

Thanks for these great case studies. I really appreciate the clarity of mathematical concepts that you offer, especially when it comes to random variables.

Confusion & potential error in Probability Theory (For Scientists and Engineers)

First, thank you for providing these studies. I have been looking for probability theory intro material to provide to my data science students & advisees, and I think your studies will be quite useful for their intermediate-level study of probability.

There are two things that tripped me up, though.

The first is the pushforward map definition. Since $f$ is not restricted to be one-to-one at this point, isn't $f^{-1}$ a set? Use later in 4.1 would be consistent with this understanding. If $f^{-1}$ is a set, $f^{-1} \in A$ didn't make sense to me; $f^{-1} \cap A \ne \emptyset$ seems to be the meaning here.

The second is the definition of absolutely continuous in 4.3. The statement:

A measure $\nu$ is absolutely continuous with respect to another measure $\mu$ when $\nu$ allocates zero volume only to those sets which $\mu$ also allocates zero volume

My translation of this, consistent with the 'only if' in the following formula, would be that $\nu$ absolutely continuous w.r.t. $\mu$ means $\nu(A) = 0 \implies \mu(A) = 0$ ($\nu(A) = 0$ only if $\mu(A) = 0$). However, this is the converse of the definition I find in Athreya and Lahiri (p. 53) and the Encyclopedia of Mathematics: $\nu$ is absolutely continuous w.r.t. $\mu$ if $\mu(A) = 0 \implies \nu(A) = 0$. If I'm understanding correctly, it should say:

A measure $\nu$ is absolutely continuous with respect to another measure $\mu$ when $\nu$ allocates zero volume to all sets to which $\mu$ also allocates zero volume

It's quite possible I'm missing something, as I am pretty new to measure theory, but I don't see how these definitions don't contradict.

Misstatement of non-centered implementation?

Awesome resource Michael. However, I think there's a conflict between your description of the non-centered implementation of the Eight Schools model and the actual Stan code. The code states that "theta[j] = mu + tau * theta_tilde[j]" whereas the written description of the model has sigma in place of tau. Figured it would be worth the clarification for those (like myself) who are just getting familiar with coding non-centered versions of models.

Typos in figures of section 4.3.4 of probability on product spaces?

Hi there - I think I've spotted some typos in Section 4.3.4 of probability on product spaces, specifically in the equations in the figures:

in the first figure - i.e.dgm_condition_1.png - the first line should be condtional on \tilde{x}_1 rather than \tilde{x}_4
in the second figure - i.e.dgm_condition_2.png - on the last line, in the final two factors, x_4 should be replaced by x_1

Also: thanks for all the great resources!

Typo in posterior z-score?

In the case study "Towards A Principled Bayesian Workflow", I think there's a typo in the definition of the posterior z-score (section 1.3.2 "A Bayesian Eye Chart"): Shouldn't the denominator be the marginal posterior standard deviation instead of the marginal posterior mean (see also here)?

Typo in Metropolis-Hastings conditional PDF

Hi!

I think the Metropolis-Hastings conditional PDF in markov_chain_monte_carlo.Rmd should be:

$$
T(q' \mid q) = a(q', q) \cdot Q (q' \mid q )
+ \left(1 - \int a(q, q') Q(q' \mid q) d q' \right) \cdot \delta(q - q').
$$

instead of


$$
T(q' \mid q) = a(q', q) \cdot Q (q' \mid q )
+ \left(1 - a(q, q') \right) \cdot \delta(q - q').
$$

Else, the conditional PDF T(q' | q) doesn't integrate to 1. See, for instance, pg 3 of this note.

Great work by the way! I found the bit on correctly estimating the maximum lag for the autocorrelation very useful.

Probability distribution breakdown in figure error?

The figures show:

P(A_5)+P(A_6) = P (A_1) = 1

(and similarly the one next to it). That does not look correct as P(A_1) + P (A_2) = P(A_0) = 1. It should be:

P(A_5) + P(A_6) = P(A_1) (but not equal to 1).

Half Normal or Log Normal

Your gaussian processes 3 write-up states that the priors for the hyper parameters are half normal,

e.g.
$$\alpha \sim $ half- $\mathcal{N}(0,2)$$

and the corresponding code is
parameters{ ... real<lower=0> alpha; } model{ alpha ~ normal(0,2); ... }

But according to the stan reference manual (v 2.16.0) pg. 400: "If a variable X is declared to have a lower bound $a$, it is transformed to be an unbounded random variable $Y$, where ."

$$ Y = log (X - A)$$

which in this case would just be the log, suggesting that the new random variable is log normal, not half - would you mind clearing up this discrepancy for me?

I'm using a similar parameterization in a model I'm working with and would like to know the "real" prior distribution I'm using. - Many Thanks.

Predictive Realizations vs Posterior Predictive Realizations

I am a bit confused why the plots for Predictive Realizations and Posterior Predictive Realizations (in Part 1 of Gaussian Processes) are identical.

Could it be that plot_gp_realizations was used, mistakenly, instead of plot_gp_pred_realizations to plot the Posterior Predictive Realizations?

Ordinal regression determinant

I like your ordinal regression case study, and the approach to setting priors on the cutpoints is relevant to some models I am working on. I think that the determinant of that Jacobian matrix does have an analytic determinant, though. I have not written up a full proof, but I am convinced that the determinant is the product of the diagonal times the matrix dimension. In R, it would be something like

prod(diag(J)) * nrow(J)

Here is some code for testing this expression on matrices of the specified structure, with random entries:

nrep <- 100
## dimension of matrix:
nr <- sample(3:20, 100, replace=TRUE)

res <- matrix(NA, nrep, 2)

for(i in 1:nrep){
  ## matrix entries:
  ents <- rnorm((nr[i] - 1), sd=2)

  tmpmat <- matrix(0, nr[i], nr[i])
  tmpmat[,1] <- 1

  for(j in 1:(nr[i] - 1)){
    tmpmat[j:(j+1), (j+1)] <- c(ents[j], -ents[j])
  }

  res[i,1] <- det(tmpmat)
  res[i,2] <- prod(diag(tmpmat)) * nr[i]
}

all.equal(res[,1], res[,2])

Typo in interpretation of posterior z-score and contraction?

In the case study "Towards A Principled Bayesian Workflow", I think the sentence

On the other hand a posterior distribution with both a small posterior z-score and a small posterior contraction indicates ideal learning behavior.

has to read

"On the other hand a posterior distribution with both a small posterior z-score and a high posterior contraction indicates ideal learning behavior."

Otherwise, it would also contradict with the sentence

Likewise a posterior distribution with a small posterior z-score and a small posterior contraction is only poorly informed by the observations.