Comments (10)
Ok for including the gradient, as long as the field is allowed to be empty when the model does not provide it.
I have taken a look at the `mcmc` type of the coda R package, and it seems pretty simple: an array with named columns and a start/end/thinning interval. All this is apparently enough to support all of their diagnostic and plotting tools, so we are probably not forgetting anything fundamental.
We could also store both samples and gradient in a DataFrame to take advantage of the added functionality (summary, subsetting, etc.).
How about this?
```julia
type MCMCChain
  range::Range
  samples::DataFrame
  gradient::DataFrame
  diag::DataFrame
  task::MCMCTask
  runTime::Float64
end
```
- `range` would contain the start/stop and, optionally, the thinning interval, represented as a Range.
- `samples` and `gradient` would be DataFrames constrained to have the same columns and the same number of rows.
- `diag` would hold diagnostic variables, typically dependent on the sampler, such as an accept/reject boolean for MH samplers, or a weight for sequential MC. It is constrained to have the same number of rows as `samples`.
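The constraints listed above could be enforced at construction time. The following is only a rough sketch under several assumptions: it uses current Julia syntax (`struct`) rather than the `type` keyword from the proposal, plain matrices stand in for the DataFrames to keep it dependency-free, and `ChainSketch` and its `names` field are hypothetical names, not part of the package.

```julia
# Sketch only: plain matrices stand in for the DataFrames discussed above,
# and `ChainSketch` is a hypothetical name, not the package's type.
struct ChainSketch
    range::StepRange{Int,Int}                  # start:thinning:stop
    names::Vector{String}                      # one name per parameter column
    samples::Matrix{Float64}                   # rows = stored iterations
    gradients::Union{Matrix{Float64},Nothing}  # `nothing` if the model has no gradient
    diagnostics::Dict{Symbol,Vector}           # per-sample diagnostic variables

    function ChainSketch(range, names, samples, gradients, diagnostics)
        nrows, ncols = size(samples)
        length(names) == ncols || error("need one name per samples column")
        if gradients !== nothing
            # Same columns and same number of rows as the samples.
            size(gradients) == size(samples) ||
                error("gradients must match samples in rows and columns")
        end
        for (key, col) in diagnostics
            # Every diagnostic has one entry per stored sample.
            length(col) == nrows ||
                error("diagnostic $key must have one entry per sample")
        end
        new(range, names, samples, gradients, diagnostics)
    end
end

# A chain of 5 stored samples for 2 parameters, from a model without gradient.
chain = ChainSketch(1:2:9, ["a", "b"], zeros(5, 2), nothing,
                    Dict{Symbol,Vector}(:accept => fill(true, 5)))
```

Allowing `nothing` for the gradients mirrors the request that the field may be empty when the model does not provide a gradient.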
Open questions:
- For model parameters that are vectors or arrays, should we split all their individual values across multiple columns (easier for diagnostic tools), or should we use a single dataframe column (clearer)?
- `diag` is OK for diagnostic variables that exist for each sample, but not for those that are meaningful for the full run (acceptance rate, ...). Should we add a field for that? (Note that the acceptance rate can be calculated dynamically in the `show` method and need not really be stored.)
from klara.jl.
Your proposed `MCMCChain` type definition looks good to me, @fredo-dedup.
As for the open questions:
- My suggestion would be to use a single dataframe column for model parameters that are vectors, because I agree that it seems clearer in terms of coding, and also because your suggested `MCMCChain` takes a dataframe approach, so let's adhere to dataframe field definitions as much as possible.
- I would say that an extra field for diagnostic variables that are meaningful for the full run, such as the acceptance rate, is probably not needed for now. Since the acceptance rate can be calculated dynamically via `show`, we can wrap `show` in a function that takes an extra optional parameter determining the length of the sub-chain for which the acceptance rate is calculated. For example, if the length of the chain is 1000, the default value of the optional argument would be the whole length (1000), and if the user wants, they may compute the rate for the first 100 iterations with the function call `rate(len=100)`.
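A minimal sketch of such a function follows. It assumes the accept/reject flags are available as a boolean vector (here a hypothetical `accepted` argument, standing in for the corresponding diagnostics column), and it takes that vector explicitly rather than being wired into `show`.

```julia
using Statistics  # standard library, provides `mean`

# `accepted` is a hypothetical per-iteration accept/reject vector.
# `len` defaults to the whole chain; a smaller value restricts the
# computation to the first `len` iterations.
rate(accepted::AbstractVector{Bool}; len::Integer=length(accepted)) =
    mean(accepted[1:len])

accepted = [true, false, true, true, false, true, false, false, true, true]
rate(accepted)          # acceptance rate over the whole chain
rate(accepted; len=4)   # acceptance rate over the first 4 iterations only
```

Computing the rate on demand like this keeps run-level summaries out of the stored chain, which is the point made above.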
A final pedantic point concerning the naming convention of the fields:
- Shall we use `sample` instead of `samples`? Otherwise, we may change `gradient` to `gradients`.
- `diag` seems ambiguous to the user at first look, because it recalls the short name for the diagonal of a matrix. Maybe `diagnostic` is a clearer name?
- This thought doesn't have to do with `MCMCChain`; I just had in mind that the `beta` name may be changed to `pars` or `parameters` throughout the `MCMC` package.
These suggestions on the names of variables are very flexible; I am just sharing them, and they don't imply we need to follow them.
from klara.jl.
- I would use the plural for both samples and gradients, since the field contains multiple samples/gradients
- Agreed on `diagnostic`(s?) and `pars`
As a general rule, I will gladly bow to all renaming suggestions since English is not my first language.
from klara.jl.
English is not my mother tongue either, so I am also open to naming suggestions. Good, we agree for now on the names `samples`, `gradients`, `diagnostics` and `pars`, as long as nobody else comes up with other suggestions.
from klara.jl.
@fredo-dedup, I changed the `MCMCChain` type according to your suggestion and our discussion. There are a few points on which I would like your feedback and help (whenever you get the chance to look at the changes):
- I was thinking of defining `samples` and `gradients` as `DataArray` instead of `DataFrame`, if you agree, and keeping only `diagnostics` as a `DataFrame`. This would make the numerical and statistical processing of `samples` and `gradients` easier (I don't know if it would offer any relative speed-up too).
- Although I have included the `range` field in the `MCMCChain` definition, I haven't implemented it. The reason is that I haven't yet understood how you envisage the role of `range` in relation to the code in `src/runners/run.jl`. Would you like to sort this part out, as you probably already have a better view of this specific matter?
- Line 62 (`res.misc[:mod][pos] = at`) of `src/runners/serialMC.jl` has been temporarily commented out, because I wanted to ask you which field of the newly defined `MCMCChain` the old `misc` field would correspond to.
from klara.jl.
@fredo-dedup, just a quick update:
- I left `samples` and `gradients` as they are (dataframes), because this allows extra functionality that may be useful, as you had previously pointed out.
- I implemented the `range` field of `MCMCChain` in `src/runners/run.jl`, including the thinning interval.
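To illustrate how a single range can carry the start, stop and thinning interval, here is a small sketch using Julia's built-in `start:step:stop` ranges. The numbers and the `draws` vector are made up for illustration and say nothing about how `run.jl` actually uses the field.

```julia
# Hypothetical run settings: iterations 101 through 1000, keeping every 10th draw.
kept = 101:10:1000

first(kept), step(kept), last(kept)   # start, thinning interval, last kept iteration
length(kept)                          # number of stored samples

# Thin a full run of draws down to the kept iterations.
draws = collect(1.0:1000.0)           # stand-in for one parameter's raw draws
thinned = draws[kept]
```

Note that `last(kept)` is the last iteration actually reachable from the start with the given step, which may be smaller than the requested stop.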
This probably completes `MCMCChain`, apart from:
- having to transpose the dataframe fields so that they have (mcmclength, npars) size instead of (npars, mcmclength) size. We would want each column of the `samples` dataframe to correspond to a covariate (parameter) of the model, not to an iteration of the sampler. I don't know yet how easy it would be to make this change, but it seems the natural thing to do.
- I haven't updated the seqMC or the README, and haven't looked into the `misc` field yet, so there are still some minor loose ends to cover.
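The change of layout mentioned above can be sketched on a plain matrix. This is only an illustration with made-up sizes, not the dataframe code from the package; `raw` and `samples` are hypothetical names.

```julia
npars, mcmclength = 3, 5

# Current layout: one row per parameter, one column per iteration.
raw = reshape(collect(1.0:15.0), npars, mcmclength)   # size (npars, mcmclength)

# Desired layout: one row per iteration, one column per parameter.
samples = permutedims(raw)                            # size (mcmclength, npars)

size(samples)
samples[:, 1] == raw[1, :]   # column 1 now holds every draw of parameter 1
```

`permutedims` returns a new array with the dimensions swapped, whereas `transpose` returns a lazy wrapper around the original data.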
from klara.jl.
Sorry for being slow to answer; I have been having problems with my package installation since yesterday.
I was also working on the same part of the code. Since you were the first to shoot, I'll merge my changes with yours.
I don't really know what to prefer in the DataFrame / DataArray choice. I think DataFrames would allow for different kinds of parameters (one column containing a scalar alongside another column containing a vector, for example; this could also be useful for samplers that generate a vector of varying size, such as Dirichlet processes). On the other hand, I agree that the post-processing would be more complicated.
If we stick with DataFrames, we can keep that option open (splitting or not splitting vectors across columns).
I'll move the `misc` data to `diagnostics`.
from klara.jl.
Thanks @fredo-dedup, you weren't slow (I found some time last night to do a bit of work). Sure, merge any additional work you did.
Let's stick with dataframes then. Even if the post-processing is at first a bit more complicated, we will simply add another layer of functions to assist the users.
Great you will take care of misc, as you are more familiar with its functionality.
from klara.jl.
I have rewritten part of `run()` to handle ranges a little differently in the arguments list (see the updated README), and added column naming in the samples DataFrame (in the same manner as Stan and JAGS, by appending .1, .2, ... after the parameter name).
I have also extended the `task` field of `MCMCChain` to accommodate Arrays of MCMCTasks, which is better suited to seqMC.
Weights are now in the diagnostics field.
I'll close this issue since the MCMCChain is more or less operational.
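The column naming scheme described above (appending .1, .2, ... to the parameter name) might look roughly like the sketch below. `expand_names` and `column_names` are hypothetical helpers written for illustration, and leaving scalar parameters without a suffix is an assumption, not something stated in this thread.

```julia
# Hypothetical helper: expand a parameter with `n` elements into column names.
# Assumption: a scalar parameter (n == 1) keeps its bare name.
expand_names(name::AbstractString, n::Integer) =
    n == 1 ? [String(name)] : [string(name, ".", i) for i in 1:n]

# Build the full list of column names from `name => length` pairs.
column_names(pars) = reduce(vcat, [expand_names(n, len) for (n, len) in pars])

column_names(["mu" => 1, "beta" => 3])
```

With this layout every element of a vector parameter gets its own column, which is the "split across columns" option from the earlier open question.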
from klara.jl.
That's great @fredo-dedup.
from klara.jl.