
Comments (10)

tseyanglim commented on August 26, 2024

Wait, so is the idea to first generate 30 synthetic data sets using 30 randomly drawn sets of parameter values, plus process noise, and then run MCMC on each of those 30 synthetic datasets? Is the goal to see whether the original 30 sets of parameter values can be recovered?

And I'm not sure I understand the aim of step 4 - why rank them?


hyunjimoon commented on August 26, 2024

> goal to see whether the original 30 sets of parameter values can be recovered?

Yes, the idea is to verify the generator we are using. However, I think there are ways to extend this to validation (real data, not synthetic), e.g. using the real $Y$ rather than $\tilde{Y}$, but this is under development.

> the aim of step 4 - why rank them?

Thanks for this question. This method, called simulation-based calibration (SBC), is what Bayesian computation people use for verification (to check how well the statistical and computational models match). There are many packages that support this, e.g. the R package Andrew introduced here:
https://statmodeling.stat.columbia.edu/2022/02/17/sbc-a-new-r-package-for-simulation-based-calibration-of-bayesian-models/
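For concreteness, here is a minimal Python sketch of the SBC ranking step; `draw_prior`, `simulate_data`, and `run_mcmc` are hypothetical stand-ins for the actual generator and sampler, and the parameter is treated as a scalar:

```python
import numpy as np

def sbc_ranks(draw_prior, simulate_data, run_mcmc, n_sims=30, n_draws=100):
    """Return the rank of each true parameter within its posterior sample."""
    ranks = []
    for _ in range(n_sims):
        theta_true = draw_prior()               # 1. draw parameters from the prior
        y_tilde = simulate_data(theta_true)     # 2. generate synthetic data (incl. process noise)
        posterior = run_mcmc(y_tilde, n_draws)  # 3. fit the model to the synthetic data
        # 4. rank the true value among the posterior draws; a calibrated
        #    sampler yields ranks uniform on {0, ..., n_draws}
        ranks.append(int(np.sum(np.asarray(posterior) < theta_true)))
    return np.array(ranks)
```

Checking the rank histogram for uniformity (e.g. with `np.histogram(ranks, bins=10)`) is the point of step 4: systematic deviations indicate a mismatch between the generator and the sampler.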


tseyanglim commented on August 26, 2024

Do you have an existing mdl file that incorporates process noise in the same way that you're injecting it into the Stan version of the model?

As we discussed, there are a few ways to create the 30 sets of random draws. The key distinction is whether you want to 1) rely on Vensim's random number generation, or 2) use exactly the same 30 sets of values (and the same process noise stream) to get an exact replication of the random component you have in Stan.

My inclination is that you should do the former, but it's up to you. The former would mean the datasets produced are not expected to be identical, just statistically comparable; the latter would be more complicated to implement.

Also, doing steps 1 and 2 is easy enough, but steps 3 and 4 would be outside Vensim / VST's current capabilities. Instead you'd get 30 exported CSVs (or similar files) with 100 sets of parameter estimates each, which you could then use to calculate and rank log likelihoods elsewhere (Python, R, etc.). (@tomfid, is there a way to get the corresponding payoff for each accepted point in the MCMC sample in Vensim, if the likelihood is constructed manually as a variable in the model?)
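As a sketch of what that outside step might look like in Python, assuming each exported file holds ~100 parameter sets and includes a payoff (log likelihood) column; the file pattern and column name here are assumptions:

```python
import glob
import pandas as pd

# Hypothetical sketch: one exported file per synthetic dataset.
# "estimates_*.csv" and the "Payoff" column name are assumptions.
for path in sorted(glob.glob("estimates_*.csv")):
    draws = pd.read_csv(path)
    draws["rank"] = draws["Payoff"].rank(ascending=False)  # 1 = highest log likelihood
    draws.sort_values("rank").to_csv(path.replace(".csv", "_ranked.csv"), index=False)
```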

(Sorry, I've been working quite slowly yesterday and today; I've been dealing with a sick baby.)


tomfid commented on August 26, 2024

One of the files generated in an MCMC run ends with _MCMC_points.tab, and it contains the payoff as well as the parameters for each point. It includes rejected points, so you'd have to filter a bit to match the sample.


tseyanglim commented on August 26, 2024

> One of the files generated in an MCMC run ends with _MCMC_points.tab, and it contains the payoff as well as the parameters for each point. It includes rejected points, so you'd have to filter a bit to match the sample.

Is there an easy way to do that (e.g. a unique point identifier), or is it just a matter of matching on the parameter value sets that are in the sample file?


tomfid commented on August 26, 2024

Now that I think about it, using the Vensim RNG for process noise is fine, but it would take a little care to get it behaving the same way as the Stanified version. The reason is that TIME STEP = 0.0625, but the point vector for process noise has 20 points at unit time intervals (we think), interpolated in between. In Vensim, you'd normally either SMOOTH a white noise input to filter the high frequencies, or use SAMPLE IF TRUE to update the noise in a stepwise fashion. Neither of these exactly matches the linear interpolation in the Stanified version. You could use VECTOR LOOKUP to get the same linear interpolation behavior, I guess.
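To illustrate the distinction, a small Python sketch (not Vensim code) assuming 20 unit-spaced noise points and TIME STEP = 0.0625, showing the two behaviors side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.0625
t_knots = np.arange(21)                   # noise points at unit time intervals
noise_knots = rng.normal(0.0, 1.0, t_knots.size)
t_sim = np.arange(0.0, 20.0 + dt, dt)     # simulation grid at TIME STEP = 0.0625

# What the Stanified version (reportedly) does, and what VECTOR LOOKUP
# could reproduce: linear interpolation between the unit-spaced knots.
noise_interp = np.interp(t_sim, t_knots, noise_knots)

# What SAMPLE IF TRUE would do instead: hold each knot's value until the next.
noise_step = noise_knots[np.floor(t_sim).astype(int)]
```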


tomfid commented on August 26, 2024

Points are coded - status codes are as follows:
-2 Payoff error (e.g., an FP error or negative payoff that should represent a likelihood)
-1 Rejected
0 Initial
1 Accepted
2 Repeated
3 Accepted during burnin
4 Repeated during burnin
5 Accepted, but above payoff sensitivity threshold
6 Repeated, but above payoff sensitivity threshold
7 Improvement on best payoff (this will duplicate a point reported as 0-6)

So, you'd want the 1s and 2s I guess.
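In Python that filtering might look like the following (the file is tab-delimited per its extension; the file name and "Status" column name here are assumptions):

```python
import pandas as pd

# Keep only accepted (1) and repeated (2) points from the MCMC points file.
points = pd.read_csv("model_MCMC_points.tab", sep="\t")  # file name is hypothetical
sample = points[points["Status"].isin([1, 2])]
print(f"kept {len(sample)} of {len(points)} points")
```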


hyunjimoon commented on August 26, 2024

For stanify, this commit resolves the issue.

@tseyanglim, for hierarchical Bayesian models, would using VST have an edge over Venpy? Since you have worked with hierarchical Bayesian models, I suspect VST would have better support for analysis and plots with subscripts.

@tomfid, could you elaborate on the process behind the following plots for the MCMC prey-predator model here?

[Image: MCMC prey-predator plots]

I am asking because @ali-akhavan89 shared during the 879 seminar that he used Excel for the following analysis, and I wonder if there is a way to do this directly in Python.
[Image: analysis from @ali-akhavan89's 879 seminar slide]
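A minimal Python version of that kind of estimate-versus-truth check, with hypothetical parameter names and values standing in for the real model, might be:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical inputs: ground-truth values used to generate the synthetic
# data, plus posterior means and SDs recovered by MCMC.
names = ["alpha", "beta", "gamma"]
truth = np.array([0.80, 0.05, 0.40])
est_mean = np.array([0.78, 0.06, 0.35])
est_sd = np.array([0.03, 0.01, 0.05])

x = np.arange(len(names))
plt.errorbar(x, est_mean, yerr=2 * est_sd, fmt="o", label="estimate ±2 SD")
plt.scatter(x, truth, marker="x", color="red", label="ground truth")
plt.xticks(x, names)
plt.legend()
plt.show()
```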


tomfid commented on August 26, 2024

Q answered in other thread.

I did the same sort of ground-truth analysis for the spatial/hierarchical pred prey model:
[Image: ground-truth analysis of the spatial/hierarchical predator-prey model]

Basically everything works - all parameters are within the expected ±2 SD ranges of the estimates (I didn't get around to plotting actual distributions yet, so non-Gaussian behavior might be an issue).

Since the pred/prey model has a bigger parameter space, has hierarchy, and is (probably) more nonlinear, I would expect it to be a harder problem. Therefore I'd interpret the result above as a convergence failure, or maybe some kind of identifiability problem.


tomfid commented on August 26, 2024

I guess another possibility in @ali-akhavan89's slide is that "Least Squares" implies the Normal calibration payoff distribution was used instead of Gaussian. That's a pure sum of squares, omitting the /2 and LN(sd) factors in the likelihood. The sd doesn't matter as long as you're not estimating it, but the missing /2 means the log likelihood is bigger than it should be. That would make the bounds too tight. You can compensate by setting the MCMC temp = 2.
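A quick numeric check of the factor-of-2 argument (a sketch, not Vensim's actual payoff code):

```python
import numpy as np

rng = np.random.default_rng(1)
err = rng.normal(size=50)  # residuals y - y_hat
sd = 1.0

# Gaussian log likelihood (dropping the constant -n/2 * log(2*pi) term):
ll_gaussian = -0.5 * np.sum(err**2) / sd**2 - err.size * np.log(sd)
# "Least squares" payoff as described above: pure sum of squares, no /2, no LN(sd):
ll_leastsq = -np.sum(err**2) / sd**2

# MCMC acceptance uses exp(payoff / temp), so temp=2 halves the least-squares
# payoff and recovers the Gaussian kernel (exactly so here, since LN(sd) = 0):
print(ll_leastsq / 2, ll_gaussian)
```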

