
Comments (10)

tseyanglim commented on August 26, 2024

Wait, so is the idea to first generate 30 synthetic data sets using 30 randomly drawn sets of parameter values, plus process noise, and then run MCMC on each of those 30 synthetic datasets? Is the goal to see whether the original 30 sets of parameter values can be recovered?

And I'm not sure I understand the aim of step 4 - why rank them?


hyunjimoon commented on August 26, 2024

> goal to see whether the original 30 sets of parameter values can be recovered?

Yes, the idea is to verify the generator we are using. However, I think there are ways to extend this to validation (real data, not synthetic), e.g. using the real $Y$ rather than $\tilde{Y}$, but this is under development.

> the aim of step 4 - why rank them?

Thanks for this question. This method, called simulation-based calibration (SBC), is what Bayesian computation people use for verification (to check how well the statistical and computational models match). There are many packages that support this, e.g. the R package Andrew introduced here:
https://statmodeling.stat.columbia.edu/2022/02/17/sbc-a-new-r-package-for-simulation-based-calibration-of-bayesian-models/
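For concreteness, here is a minimal Python sketch of the SBC ranking step; `draw_prior`, `simulate_data`, and `run_mcmc` are hypothetical stand-ins for the actual generator and sampler, and the parameter is treated as a scalar:

```python
import numpy as np

def sbc_ranks(draw_prior, simulate_data, run_mcmc, n_sims=30, n_draws=100):
    """Return the rank of each true parameter within its posterior sample."""
    ranks = []
    for _ in range(n_sims):
        theta_true = draw_prior()               # 1. draw parameters from the prior
        y_tilde = simulate_data(theta_true)     # 2. generate synthetic data (incl. process noise)
        posterior = run_mcmc(y_tilde, n_draws)  # 3. fit the model to the synthetic data
        # 4. rank the true value among the posterior draws; a calibrated
        #    sampler yields ranks uniform on {0, ..., n_draws}
        ranks.append(int(np.sum(np.asarray(posterior) < theta_true)))
    return np.array(ranks)
```

Checking the rank histogram for uniformity (e.g. with `np.histogram(ranks, bins=10)`) is the point of step 4: systematic deviations indicate a mismatch between the generator and the sampler.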


tseyanglim commented on August 26, 2024

Do you have an existing mdl file that incorporates process noise in the same way that you're injecting it into the Stan version of the model?

As we discussed, there are a few ways to create the 30 sets of random draws. The key distinction is whether you want to 1) rely on Vensim's random number generation, or 2) use exactly the same 30 sets of values (and the same process noise stream) to get an exact replication of the random component you have in Stan.

My inclination is that you should do the former, but it's up to you. The former would mean the datasets produced are not expected to be identical, just statistically comparable; the latter would be more complicated to implement.

Also, doing steps 1 and 2 is easy enough, but steps 3 and 4 would be outside Vensim / VST's current capabilities. Instead you'd get 30 exported CSVs (or similar files) with 100 sets of parameter estimates each, which you could then use to calculate and rank log likelihoods elsewhere (Python, R, etc.). (@tomfid, is there a way to get the corresponding payoff for each accepted point in the MCMC sample in Vensim, if the likelihood is constructed manually as a variable in the model?)
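As a sketch of what that outside step might look like in Python, assuming each exported file holds ~100 parameter sets and includes a payoff (log likelihood) column; the file pattern and column name here are assumptions:

```python
import glob
import pandas as pd

# Hypothetical sketch: one exported file per synthetic dataset.
# "estimates_*.csv" and the "Payoff" column name are assumptions.
for path in sorted(glob.glob("estimates_*.csv")):
    draws = pd.read_csv(path)
    draws["rank"] = draws["Payoff"].rank(ascending=False)  # 1 = highest log likelihood
    draws.sort_values("rank").to_csv(path.replace(".csv", "_ranked.csv"), index=False)
```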

(Sorry, I've been working quite slowly yesterday and today; I've been dealing with a sick baby.)


tomfid commented on August 26, 2024

One of the files generated in an MCMC run ends with _MCMC_points.tab, and it contains the payoff as well as the parameters for each point. It includes rejected points, so you'd have to filter a bit to match the sample.


tseyanglim commented on August 26, 2024

> One of the files generated in an MCMC run ends with _MCMC_points.tab, and it contains the payoff as well as the parameters for each point. It includes rejected points, so you'd have to filter a bit to match the sample.

Is there an easy way to do that (e.g. a unique point identifier), or is it just a matter of matching on the parameter value sets that are in the sample file?


tomfid commented on August 26, 2024

Now that I think about it, using the Vensim RNG for process noise is fine, but it would take a little care to get it behaving the same way as the Stanified version. The reason is that TIME STEP = 0.0625, but the point vector for process noise has 20 points at unit time intervals (we think), interpolated in between. In Vensim, you'd normally either SMOOTH a white noise input to filter the high frequencies, or use SAMPLE IF TRUE to update the noise in a stepwise fashion. Neither of these exactly matches the linear interpolation in the Stanified version. You could use VECTOR LOOKUP to get the same linear interpolation behavior, I guess.
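To illustrate the distinction, a small Python sketch (not Vensim code) assuming 20 unit-spaced noise points and TIME STEP = 0.0625, showing the two behaviors side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.0625
t_knots = np.arange(21)                   # noise points at unit time intervals
noise_knots = rng.normal(0.0, 1.0, t_knots.size)
t_sim = np.arange(0.0, 20.0 + dt, dt)     # simulation grid at TIME STEP = 0.0625

# What the Stanified version (reportedly) does, and what VECTOR LOOKUP
# could reproduce: linear interpolation between the unit-spaced knots.
noise_interp = np.interp(t_sim, t_knots, noise_knots)

# What SAMPLE IF TRUE would do instead: hold each knot's value until the next.
noise_step = noise_knots[np.floor(t_sim).astype(int)]
```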


tomfid commented on August 26, 2024

Points are coded - status codes are as follows:
-2 Payoff error (e.g., an FP error or negative payoff that should represent a likelihood)
-1 Rejected
0 Initial
1 Accepted
2 Repeated
3 Accepted during burnin
4 Repeated during burnin
5 Accepted, but above payoff sensitivity threshold
6 Repeated, but above payoff sensitivity threshold
7 Improvement on best payoff (this will duplicate a point reported as 0-6)

So, you'd want the 1s and 2s I guess.
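In Python that filtering might look like the following (the file is tab-delimited per its extension; the file name and "Status" column name here are assumptions):

```python
import pandas as pd

# Keep only accepted (1) and repeated (2) points from the MCMC points file.
points = pd.read_csv("model_MCMC_points.tab", sep="\t")  # file name is hypothetical
sample = points[points["Status"].isin([1, 2])]
print(f"kept {len(sample)} of {len(points)} points")
```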


hyunjimoon commented on August 26, 2024

For stanify, this commit resolves the issue.

@tseyanglim, for hierarchical Bayesian models, would using VST have an edge over Venpy? Since you have worked with hierarchical Bayesian models, I suspect VST would have better support for analysis and plots with subscripts.

@tomfid, could you elaborate on the process behind the following plots for the MCMC prey-predator model here?

[Image: MCMC prey-predator plots]

I am asking because @ali-akhavan89 shared during the 879 seminar that he used Excel for the following analysis, and I wonder if there is a way to do this directly in Python.
[Image: analysis from @ali-akhavan89's 879 seminar slide]
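A minimal Python version of that kind of estimate-versus-truth check, with hypothetical parameter names and values standing in for the real model, might be:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical inputs: ground-truth values used to generate the synthetic
# data, plus posterior means and SDs recovered by MCMC.
names = ["alpha", "beta", "gamma"]
truth = np.array([0.80, 0.05, 0.40])
est_mean = np.array([0.78, 0.06, 0.35])
est_sd = np.array([0.03, 0.01, 0.05])

x = np.arange(len(names))
plt.errorbar(x, est_mean, yerr=2 * est_sd, fmt="o", label="estimate ±2 SD")
plt.scatter(x, truth, marker="x", color="red", label="ground truth")
plt.xticks(x, names)
plt.legend()
plt.show()
```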


tomfid commented on August 26, 2024

Q answered in other thread.

I did the same sort of ground-truth analysis for the spatial/hierarchical pred prey model:
[Image: ground-truth analysis of the spatial/hierarchical predator-prey model]

Basically everything works - all parameters are within the expected ±2 SD ranges of the estimates (I didn't get around to plotting actual distributions yet, so non-Gaussian behavior might be an issue).

Since the pred/prey model has a bigger parameter space, has hierarchy, and is (probably) more nonlinear, I would expect it to be a harder problem. Therefore I'd interpret the result above as a convergence failure, or maybe some kind of identifiability problem.


tomfid commented on August 26, 2024

I guess another possibility in @ali-akhavan89's slide is that "Least Squares" implies the Normal calibration payoff distribution was used instead of Gaussian. That's a pure sum of squares, omitting the /2 and LN(sd) factors in the likelihood. The sd doesn't matter as long as you're not estimating it, but the missing /2 means the log likelihood is bigger than it should be. That would make the bounds too tight. You can compensate by setting the MCMC temp = 2.
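A quick numeric check of the factor-of-2 argument (a sketch, not Vensim's actual payoff code):

```python
import numpy as np

rng = np.random.default_rng(1)
err = rng.normal(size=50)  # residuals y - y_hat
sd = 1.0

# Gaussian log likelihood (dropping the constant -n/2 * log(2*pi) term):
ll_gaussian = -0.5 * np.sum(err**2) / sd**2 - err.size * np.log(sd)
# "Least squares" payoff as described above: pure sum of squares, no /2, no LN(sd):
ll_leastsq = -np.sum(err**2) / sd**2

# MCMC acceptance uses exp(payoff / temp), so temp=2 halves the least-squares
# payoff and recovers the Gaussian kernel (exactly so here, since LN(sd) = 0):
print(ll_leastsq / 2, ll_gaussian)
```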

