Fields of MCMCChain (klara.jl, closed)

papamarkou commented on June 11, 2024

Fields of MCMCChain

from klara.jl.

Comments (10)

fredo-dedup commented on June 11, 2024

OK with including the gradient, as long as the field is allowed to be empty when the model does not provide it.

I have taken a look at the mcmc class of the R package coda, and it seems pretty simple: an array with named columns plus start/end/thinning-interval attributes. Apparently this is enough to support all of its diagnostic and plotting tools, so we are probably not forgetting anything fundamental.

We could also store both samples and gradient in a DataFrame to take advantage of the added functionality (summary, subsetting, etc.).

How about this?

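```julia
using DataFrames

# Proposed container for the output of one MCMC run.
# MCMCTask is defined elsewhere in the package.
mutable struct MCMCChain
    range::AbstractRange   # retained iterations: start, stop and optional thinning
    samples::DataFrame     # one row per retained iteration
    gradient::DataFrame    # same columns and rows as samples; may be empty
    diag::DataFrame        # per-sample diagnostics, same number of rows as samples
    task::MCMCTask
    runTime::Float64
end
```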

range would contain the start/stop and, optionally, the thinning interval, represented as a Julia range.
samples and gradient would be DataFrames constrained to have the same columns and the same number of rows.
diag would hold diagnostic variables, typically sampler-dependent, such as an accept/reject boolean for MH samplers or a weight for sequential MC. It is constrained to have the same number of rows as samples.
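A minimal sketch of how the range field could encode burn-in and thinning, assuming it is an ordinary Julia step range. The numbers and the helper rows_match are illustrative only and are not part of klara.jl.

```julia
# Hypothetical settings: 10_000 iterations in total, the first 1_000 discarded
# as burn-in, and every 10th iteration kept afterwards.
burnin, nsteps, thinning = 1_000, 10_000, 10

# The range field would record which iterations were retained.
r = (burnin + 1):thinning:nsteps

# samples, gradient and diag would then each need exactly length(r) rows.
nrows = length(r)

# Hypothetical check of the row-count constraint for a table with n rows.
rows_match(r::AbstractRange, n::Integer) = length(r) == n

println((first(r), step(r), last(r), nrows))  # (1001, 10, 9991, 900)
```

Julia normalizes the stop of a step range, so last(r) is the final iteration actually kept (9991), not the requested 10000.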

Open questions:

For model parameters that are vectors or arrays, should we split their individual values across multiple columns (easier for diagnostic tools) or use a single DataFrame column (clearer)?
diag is fine for diagnostic variables that exist for each sample, but not for those that are meaningful only for the full run (acceptance rate, ...). Should we add a field for those? (Note that the acceptance rate, for one, can be calculated on the fly in the show method and need not really be stored.)

papamarkou commented on June 11, 2024

Your proposed MCMCChain type definition looks good to me @fredo-dedup.

As for the open questions:

My suggestion would be to use a single DataFrame column for model parameters that are vectors, because I agree that it seems clearer in terms of coding, and also because your suggested MCMCChain has DataFrame fields, so let's adhere to DataFrame conventions as much as possible.
I would say that an extra field for diagnostic variables that are meaningful for the full run, such as the acceptance rate, is probably not needed for now. Since the acceptance rate can be calculated dynamically via show, we can wrap show in a function that takes an extra optional parameter determining the length of the sub-chain over which the acceptance rate is calculated. For example, if the chain has length 1000, the optional argument would default to the whole length (1000), and the user could compute the rate for the first 100 iterations with the function call rate(len=100).
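A minimal sketch of such a rate function, assuming the per-iteration accept/reject booleans are available as a plain Bool vector (standing in for a column of diag). Passing the vector explicitly is an assumption for illustration; in the package the function would presumably take the chain itself.

```julia
# Hypothetical sketch: `accepted` stands in for the per-iteration accept/reject
# column that an MH sampler would store in diag.
function rate(accepted::AbstractVector{Bool}; len::Integer = length(accepted))
    1 <= len <= length(accepted) ||
        throw(ArgumentError("len must lie between 1 and $(length(accepted))"))
    # count() on a Bool collection returns the number of true entries.
    return count(view(accepted, 1:len)) / len
end

accepted = [true, false, true, true, false, false, true, false, true, true]
rate(accepted)           # 0.6, i.e. 6 of 10 proposals accepted
rate(accepted; len = 4)  # 0.75, i.e. 3 of the first 4
```

Making len a keyword argument whose default is the full chain length matches the behaviour described above: no argument gives the whole-run rate, and len restricts it to the first len iterations.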

A final pedantic point concerning the naming convention of the fields:

Shall we use sample instead of samples? Otherwise, we could change gradient to gradients.
diag seems ambiguous to the user at first glance, because it is also the usual short name for the diagonal of a matrix. Maybe diagnostic is a clearer name?
This thought isn't related to MCMCChain, but I also had in mind that the name beta could be changed to pars or parameters throughout the MCMC package.

These suggestions on the names of variables are very flexible - just sharing them, they don't imply we need to follow them.

fredo-dedup commented on June 11, 2024

I would use the plural for both samples and gradients, since each field contains multiple samples/gradients.
Agreed on diagnostic(s?) and pars.

As a general rule, I will gladly bow to all renaming suggestions since English is not my first language.

papamarkou commented on June 11, 2024

English is not my mother tongue either, so I am also open to naming suggestions. Good, for now we agree on the names samples, gradients, diagnostics and pars, unless somebody else comes up with other suggestions.

papamarkou commented on June 11, 2024

@fredo-dedup, I changed the MCMCChain type according to your suggestion and our discussion. There are a few points on which I would like your feedback and help (whenever you get the chance to look at the changes):

I was thinking of defining samples and gradients as DataArray instead of DataFrame, if you agree, and keeping only diagnostics as a DataFrame. This would make the numerical and statistical processing of samples and gradients easier (I don't know whether it would also offer any speed-up).
Although I have included the range field in the MCMCChain definition, I haven't implemented it yet, because I am not yet clear on how you envisage the role of range in relation to the code in src/runners/run.jl. Would you like to sort this part out, as you probably already have a better view of this matter?
Line 62 (res.misc[:mod][pos] = at) of src/runners/serialMC.jl has been temporarily commented out, because I wanted to ask you to which field of the newly defined MCMCChain the old misc field would correspond.

papamarkou commented on June 11, 2024

@fredo-dedup, just a quick update:

I left samples and gradients as they are (dataframes) because this allows extra functionality that may be useful as you had previously pointed out.
I implemented the range field of MCMCChain in src/runners/run.jl, including the thinning interval.
This probably completes MCMCChain, apart from:
having to transpose the DataFrame fields so that they have size (mcmclength, npars) instead of (npars, mcmclength). We want each column of the samples DataFrame to correspond to a covariate (parameter) of the model, not to an iteration of the sampler. I don't know yet how easy it would be to make this change, but it seems the natural thing to do.
I haven't updated seqMC or the README, and haven't looked into the misc field yet, so there are still some minor loose ends to tie up.
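For the transposition mentioned in the list above, here is a minimal sketch on a plain matrix, assuming the raw draws are held as an (npars, mcmclength) array before being wrapped in a DataFrame. The sizes and values are illustrative only.

```julia
# Hypothetical example: 2 parameters, 5 iterations, stored as (npars, mcmclength).
npars, mcmclength = 2, 5
raw = reshape(collect(1.0:10.0), npars, mcmclength)  # one column per iteration

# permutedims swaps the two dimensions, giving (mcmclength, npars):
# one row per iteration and one column per parameter.
samples = permutedims(raw)

size(samples)   # (5, 2)
samples[:, 1]   # [1.0, 3.0, 5.0, 7.0, 9.0], all draws of the first parameter
```

permutedims returns a new array rather than a lazy transpose, so the result can be stored or wrapped directly.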

fredo-dedup commented on June 11, 2024

Sorry for being slow to answer; I have been having problems with my package installation since yesterday.

I was also working on the same part of the code. Since you got there first, I'll merge my changes with yours.

I don't really know which to prefer in the DataFrame / DataArray choice. I think DataFrames would allow for different kinds of parameters (for example, one column containing a scalar alongside another column containing a vector; this could also be useful for samplers that generate vectors of varying size, such as Dirichlet processes). On the other hand, I agree that post-processing would be more complicated.

If we stick with DataFrames we can keep that option open (splitting or not vectors across columns).
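To illustrate the layout discussed above without depending on DataFrames.jl, here is a minimal sketch that uses a NamedTuple of equally long column vectors as a stand-in for a DataFrame. The column names and values are hypothetical; the point is that a column can hold one vector per iteration, even when the vectors differ in length.

```julia
# Hypothetical stand-in for a samples DataFrame: a NamedTuple of equally long
# column vectors. `mu` is a scalar parameter; `weights` is a vector-valued
# parameter whose length changes from one iteration to the next.
samples = (
    mu      = [0.12, 0.15, 0.11],
    weights = [[1.0], [0.4, 0.6], [0.2, 0.3, 0.5]],
)

# Every column still has one entry per retained iteration.
nrows = length(samples.mu)
all_same_length = all(length(col) == nrows for col in samples)

# Post-processing a vector-valued column needs an extra step, e.g. broadcasting.
ncomponents = length.(samples.weights)  # [1, 2, 3]
```

Splitting weights across fixed columns would not work here, because the number of components varies by iteration; this is the trade-off against simpler post-processing.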

I'll move the misc data to diagnostics.

papamarkou commented on June 11, 2024

Thanks @fredo-dedup, you weren't slow (I found some time last night to do a bit of work). Sure, merge any additional work you did.

Let's stick with dataframes then. Even if the post-processing is at first a bit more complicated, we will simply add another layer of functions to assist the users.

Great that you will take care of misc, as you are more familiar with its functionality.

fredo-dedup commented on June 11, 2024

I have rewritten part of run() to handle ranges a little differently in the argument list (see the updated README), and added column naming in the samples DataFrame (in the same manner as Stan and JAGS, by appending .1, .2, ... after the parameter name).
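A minimal sketch of the column-naming scheme described above, assuming each parameter is given as a name => length pair. The helper column_names is hypothetical, and keeping the bare name for scalar parameters (length 1) is an assumption, since the comment only describes the .1, .2, ... suffixes.

```julia
# Hypothetical helper: one column name per scalar value. Vector-valued
# parameters get .1, .2, ... appended to their name; scalars keep the bare
# name (an assumption; the thread only describes the suffixes).
function column_names(pars)
    cols = String[]
    for (name, len) in pars
        if len == 1
            push!(cols, name)
        else
            append!(cols, ["$name.$i" for i in 1:len])
        end
    end
    return cols
end

column_names(["mu" => 1, "beta" => 3])  # ["mu", "beta.1", "beta.2", "beta.3"]
```

A length-1 vector parameter would be indistinguishable from a scalar under this rule; a real implementation might need to carry that distinction separately.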

I have also extended the task field of MCMCChain to accommodate Arrays of MCMCTasks, which is better suited to seqMC.

Weights are now in the diagnostics field.

I'll close this issue, since MCMCChain is more or less operational.

papamarkou commented on June 11, 2024

That's great @fredo-dedup.
