juliastats / klara.jl
MCMC inference in Julia
License: Other
I really like the MCMC package, in particular the neat interface and the autodiff possibility.
Unfortunately, I can't parse model expressions. For example:
modelxpr = quote
  v ~ Normal(0, 1)
end
mymodel3 = model(modelxpr, v=ones(3))
throws an error:
ERROR: [parseModel] unmanaged expr type macrocall
in error at error.jl:21
in explore at /home/ich/.julia/MCMC/src/autodiff/parsing.jl:86
in explore at /home/ich/.julia/MCMC/src/autodiff/parsing.jl:85
in explore at /home/ich/.julia/MCMC/src/autodiff/parsing.jl:100
in explore at /home/ich/.julia/MCMC/src/autodiff/parsing.jl:85
in parseModel! at /home/ich/.julia/MCMC/src/autodiff/parsing.jl:119
in generateModelFunction at /home/ich/.julia/MCMC/src/autodiff/parsing.jl:429
in MCMCLikelihoodModel at /home/ich/.julia/MCMC/src/modellers/likmodel.jl:93
in model at /home/ich/.julia/MCMC/src/modellers/mcmcmodels.jl:28
I'm using MCMC 0.0.3, and the versioninfo() output is:
Julia Version 0.3.0-prerelease
Platform Info:
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-3687U CPU @ 2.10GHz
WORD_SIZE: 64
BLAS: libblas.so.3
LAPACK: liblapack.so.3
LIBM: libopenlibm
I hope that helps.
The current codebase seems to be a little liberal with memory. If we're hoping to achieve good performance, we probably need to avoid lines like:
oldbeta = copy(beta)
beta += randn(model.size) * scale
I'm still working through everything, but this code seems to involve allocating a new vector for every sample, which we shouldn't really be doing. At least in principle, we can perform all memory allocation at the start of an MCMC run.
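As a minimal sketch of what pre-allocating could look like for a random-walk proposal (illustrative only — `rwm_steps!` is a made-up helper, not the package's code, written in the Julia 0.3 style used elsewhere in this thread):

```julia
# Illustrative sketch, not the package's actual code: allocate all buffers
# once before the loop and mutate them in place on every iteration.
function rwm_steps!(beta::Vector{Float64}, scale::Float64, nsteps::Int)
    oldbeta = similar(beta)      # scratch copy, allocated once
    noise   = similar(beta)      # proposal noise buffer, allocated once
    for i in 1:nsteps
        copy!(oldbeta, beta)     # save current state without a fresh copy()
        randn!(noise)            # fill the existing buffer with N(0,1) draws
        for j in 1:length(beta)
            beta[j] += noise[j] * scale
        end
        # ... evaluate the log-target here and, on rejection, restore the
        # state with copy!(beta, oldbeta) ...
    end
    return beta
end
```

With this shape, the only allocations per run are the two `similar` calls before the loop.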
How general would you want the specification of a Bayes GLM model to be? If the prior on the coefficients for the linear predictor is multivariate normal then the calculation of the estimates is straightforward. Just replace the iteratively reweighted least squares (IRLS) algorithm by a penalized iteratively reweighted least squares (PIRLS) algorithm. We do this when fitting GLMMs. If a more general specification of the logprior density is desired then other methods will be needed.
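For concreteness, with a multivariate normal prior $\beta \sim \mathcal{N}(\mu_0, \Sigma_0)$ on the coefficients, each PIRLS iteration solves the usual IRLS working least-squares problem with a ridge-like penalty (a standard textbook form, stated here as a sketch rather than as this package's implementation):

```latex
\hat{\beta}^{(t+1)} = \left( X^\top W^{(t)} X + \Sigma_0^{-1} \right)^{-1}
                      \left( X^\top W^{(t)} z^{(t)} + \Sigma_0^{-1} \mu_0 \right),
```

where $W^{(t)}$ is the IRLS weight matrix and $z^{(t)}$ the working response at iteration $t$; setting $\Sigma_0^{-1} = 0$ recovers plain IRLS.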
I'm trying to get the simple examples to run.
julia> using MCMC
Loading RMW(scale) sampler
Loading MALA(driftStep, monitorPeriod) sampler
Loading HMC(innerSteps, stepSize) sampler
julia> mymodel = Model(v-> -dot(v,v), 3, ones(3))
Model(# function,[:beta=>PDims(1,(3,))],3,[1.0,1.0,1.0])
julia> res = mymodel * RWM(0.1) * (100:1000)
ERROR: no method MCMCChain(Dict{Any,Any},Array{Float64,1},MCMCTask,Float64)
in run at /home/simon/.julia/MCMC/src/runners/run.jl:17
in * at operators.jl:47
My build of Julia is as follows:
Version 0.2.0-prerelease+2820
Commit 44c9887* 2013-07-26 06:32:24 UTC
x86_64-linux-gnu
This is a placeholder for now, will get back to it.
The following loading messages are, in my humble opinion, not very beneficial, especially during batch execution.
Loading MCMCLikModel model
Loading ARS(logCandidate, logCandidateScalingFactor, scale, tuner) sampler
Loading IMH(logCandidate, randCandidate!) sampler
Loading RWM(scale, tuner) sampler
Loading RAM(scale, rate) sampler
Loading MALA(driftStep, tuner) sampler
...
So I propose that these loading messages should be printed to the screen only in an interactive session. The isinteractive() function is helpful for judging whether the execution is interactive or not.
Moreover, this macro will scrap the boilerplate code:
macro loading(component)
  if isinteractive()
    quote
      println("Loading $(string($component)) $(super($component)):")
      for method in methods($component)
        print(" ")
        println(method)
      end
    end
  end
end
...
@loading HMC
This prints much more informative messages (though the file locations look pesky):
Loading HMC MCMCSampler:
HMC(nLeaps::Int64,leapStep::Float64) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:78
HMC(nLeaps::Int64,leapStep::Real,tuner::Union(Nothing,MCMCTuner)) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:71
HMC() at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:80
HMC(tuner::Union(Nothing,MCMCTuner)) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:76
HMC(nLeaps::Int64) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:77
HMC(nLeaps::Int64,tuner::Union(Nothing,MCMCTuner)) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:77
HMC(leapStep::Float64) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:79
HMC(leapStep::Float64,tuner::Union(Nothing,MCMCTuner)) at /Users/kenta/.julia/v0.3/MCMC/src/samplers/HMC.jl:79
Automatic differentiation is currently used only when the model is specified via expression parsing. We should enable optional use of automatic differentiation when the model is specified explicitly via a function or distribution too.
Would there be interest in some code for a couple of convergence diagnostics?
The routines that are meant to compute the asymptotic variance of Monte Carlo chains are not functional anymore. This is because of recent relevant advances in DataFrames and StatsBase. The issue is that we store the MCMC samples in a dataframe in the MCMC package. We then look at these samples columnwise, and therefore work with DataArrays. The problem is that at the moment none of the autocovariance and autocorrelation functions in StatsBase operate on DataArrays. How shall we bridge this gap between the JuliaStats packages to ensure portability and integration of code?
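As a possible stopgap until the packages converge (hypothetical glue code, assuming DataArrays' `dropna` and StatsBase's `autocor` with their circa-2014 signatures), one could strip the DataArray wrapper before computing autocorrelations:

```julia
using DataArrays, StatsBase

# Hypothetical glue: convert a DataArray column of MCMC samples to a plain
# Vector{Float64} so that StatsBase's autocorrelation functions accept it.
da  = @data([0.1, 0.3, NA, 0.2, 0.4])        # samples with a missing entry
v   = convert(Vector{Float64}, dropna(da))   # drop NAs, unwrap to a plain vector
acf = autocor(v, 0:2)                        # autocorrelation at lags 0, 1, 2
```

This of course discards the NA bookkeeping, so a proper fix inside StatsBase or DataArrays would be preferable.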
@fredo-dedup, @StefanKarpinski, @johnmyleswhite, @lindahua, @dmbates
Andrew Gelman contacted me about the possibility of a Julia interface to Stan, similar to the rstan package for R. I'll just quote our exchange here. Let me know if it would be more appropriate to move this to the milestones issue under StatsBase or to one of the discussion groups.
On Mon, Apr 28, 2014 at 6:01 AM, Andrew Gelman [email protected] wrote:
Hi Doug. I hope all is well with you. Bob Carpenter was telling me that you’ve been active in the Stan list recently. I’ve also heard you’ve been working with Julia. We have some interest in putting together a Julia/Stan link, at the very least having a way to run Stan from Julia. One reason for this is that soon Stan should have the capability of quickly fitting a lot of models that we now do in R (lm, glm, lmer, etc) using point estimation of various sorts as well as full HMC. Do you have any suggestions/thoughts about how a Julia link could be built? I don’t think any of us in stan-core use Julia, but we’ve been having a lot of contributions from outsiders (for example, Allen Riddell who wrote PyStan and now is part of stan-core). Any advice would be appreciated. Thanks.
-- My reply
There is definitely interest in the Julia community in Stan. To get an idea of the current statistical capabilities in Julia take a look at the JuliaStats group on github.com. A more complete list of packages is at
http://julia.readthedocs.org/en/latest/packages/packagelist/
An interface between Julia and Stan is, in principle, no more complicated than rstan and could, in fact, be somewhat easier than rstan. A package implementing such an interface would typically be called something like JuliaStan (camel-case names for packages are preferred) and stored in a repository named JuliaStan.jl. There are sophisticated, although somewhat complicated, ways of building binary dependencies on various operating systems. I am currently struggling with implementing such dependencies for the Metis graph-partitioning package, now mostly complete.
Julia has very flexible facilities for calling C functions, see the section on "calling C and Fortran code" in http://docs.julialang.org/en/latest/ (note that the latest Julia release, 0.2, is badly out-of-date now relative to the 0.3-prerelease version and developers mostly work in 0.3). At present it is not easy to call C++ functions or methods directly because of name mangling but anything declared within
extern "C" {
...
}
is okay.
The package structure is reasonably easy to pick up for anyone familiar with github.
I can imagine several ways of proceeding.
It really depends on whether you want to emphasize the Julia aspects or the Stan aspects of the project.
If you wish, I could block out a JuliaStan.jl repository based on rstan under my github login and then we could determine where to transfer it.
May I forward this message or a summary of this message to the JuliaStats members to indicate that the Stan community is interested in Julia?
There is an MCMC.jl project that seeks to implement HMC and NUTS samplers in Julia and there is active development of forward and backward automatic differentiation. I don't know the developers working on that project well and cannot vouch for their capabilities.
How can I retrieve the function values from an MCMC object, say, if I want to find out which sample point had the highest (posterior) density?
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
The status changed: Tests pass. → Package doesn't load.

"Tests pass." means that PackageEvaluator found the tests for your package, executed them, and they all passed.
"Package doesn't load." means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with `using` failed.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("MCMC")' log
INFO: Cloning cache of MCMC from git://github.com/JuliaStats/MCMC.jl.git
INFO: Cloning cache of ReverseDiffSource from git://github.com/JuliaDiff/ReverseDiffSource.jl.git
INFO: Installing ArrayViews v0.4.8
INFO: Installing Distributions v0.5.5
INFO: Installing MCMC v0.0.9
INFO: Installing PDMats v0.2.4
INFO: Installing ReverseDiffSource v0.0.5
INFO: Installing StatsBase v0.6.9
INFO: Package database updated
INFO: METADATA is out-of-date -- you may not have the latest version of MCMC
INFO: Use `Pkg.update()` to get the latest versions of your packages
>>> 'using MCMC' log
Julia Version 0.3.1
Commit c03f413* (2014-09-21 21:30 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
ERROR: invalid method definition: not a generic function
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:51
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
in _start_3B_1712 at /home/idunning/julia03/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.3/MCMC/src/jobs/jobs.jl, in expression starting on line 1
while loading /home/idunning/pkgtest/.julia/v0.3/MCMC/src/MCMC.jl, in expression starting on line 94
while loading /home/idunning/pkgtest/.julia/v0.3/MCMC/testusing.jl, in expression starting on line 2
>>> test log
no tests to run
>>> end of log
@scidom, here are a few ideas, tell me what you think before I proceed...
- A model() function that would act as a single entry point for creating models; this would be more user friendly and less intimidating. The large set of model types that we have created (and will certainly continue to grow) would remain available for more advanced users.
- A scale field added to the model type, either a scalar or a vector (or left empty), to give a hint to the samplers as to how the parameters should be moved (I am thinking of RWM, but it would surely be useful elsewhere). This kind of information more logically belongs to the model type, and we could still force another scaling at sampler creation if needed.

I'm a little worried by the duplication of so many distributions in this codebase. Julia should have exactly one implementation of all distributions. If something's wrong with how Distributions.jl is implemented, we should find a way to fix that.
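The single entry point could be as thin as a set of dispatch methods. A hypothetical sketch follows — the `MCMCLikelihoodModel` name comes from the stack traces in this thread, but its constructor here is a stub just to make the sketch self-contained:

```julia
# Stub stand-in for the package's model types, only so the sketch runs;
# the real constructors live in src/modellers/ (Julia 0.3 syntax).
abstract MCMCModel
type MCMCLikelihoodModel <: MCMCModel
    spec
end

# Hypothetical single entry point: dispatch on how the user specifies the model.
model(f::Function) = MCMCLikelihoodModel(f)
model(e::Expr)     = MCMCLikelihoodModel(e)
```

Advanced users could keep calling the concrete constructors directly, while `model()` stays the friendly front door.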
Apart from the samples field of the MCMCChain type, another field is needed, say gradient, which will hold the computed gradient at each point of the chain. The gradient is needed for post-processing the chain (for calculating control variates, for instance). @fredo-dedup, would you agree that this extra field is the most plausible way of returning the gradient values to the user?
We may need to consider how to organize the fields of MCMCChain more generally - this relates to the seqMC work in progress too.
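A rough sketch of the proposed layout — the package actually stores samples in a DataFrame and has further fields, so plain matrices are used here only to keep the sketch self-contained:

```julia
# Sketch only: the real MCMCChain has more fields and uses a DataFrame.
type MCMCChain
    samples::Matrix{Float64}    # one row per Monte Carlo draw
    gradient::Matrix{Float64}   # proposed: log-target gradient at each draw
end

# Post-processing such as control variates could then read chain.gradient
# columnwise alongside chain.samples.
chain = MCMCChain(randn(100, 2), randn(100, 2))
```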
DataArray isn't defined in SerialMC.jl; a fix is to explicitly import it:
import DataArrays.DataArray
and/or to add DataArrays to REQUIRE.
Hi @scidom
I have a procedural question. Starting to work using the MCMC framework, occasionally I will have questions. Would you like to see those raised as issues here or in some other way? I do realize we are all busy.
As a concrete example, the "accept" key in diagnostics is ignored in the MCMC stats functions like mean(). This is ok for MH, but not for ARS (Acceptance Rejection Sampling).
The accept key right now does not indicate valid posterior samples. There are many ways around this, but just wanted to get your input from the procedural side.
@IainNZ I tried to update all the badges on README. travis and readthedocs are now fine, while the pkgeval badge doesn't seem to be ok yet (I have changed MCMC->Lora in the pkgeval URLs). No rush with this, it is not urgent, just brought it to your attention.
This issue has been opened solely for keeping a record of how the MCJob and MCRunner types are interweaved and designed (in case these thoughts need to be reproduced).
MCJob subtypes hold a field of type MCRunner. The main concern has been whether it would be more sensible to define the field to be of type MCRunner or Vector{MCRunner}. The answer is that the field should be of type MCRunner.
To understand why, notice that no matter whether a serial, sequential or parallel tempering Monte Carlo runner is defined, it regulates (and in that sense characterizes) the whole MCJob. In other words, the runner dictates how to run the job. This makes sense in plain English at a first, verbose level, i.e. the runner runs the job. Moreover, it makes sense logically, meaning that an algorithm indicates how to run a specific job, and the name "runner" is attached to this algorithm.
The only vague point that needs to be clarified is that a runner may hold fields such as the burnin or total Monte Carlo length or the thinning interval (see the coded runners). Now, all existing runners assume that these fields are unique, therefore scalar Float64 entities. For example, a sequential Monte Carlo runner runs several chains in parallel, but even then the runner presumes that the burnin, thinning and total length are shared between the parallel chains.
If a new sequential or tempering method is designed one day, it may be the case that a different burnin, thinning or total length will be designated to each of the involved chains. Then it would make sense to have a vector holding the burnin lengths of the chains, and similarly a vector for the thinning and a vector for the total length of the chains. Nevertheless, the method is unique, that is, there is a unique algorithm (runner) that consumes the job. For this reason, the right design is to encapsulate the burnin, thinning and total lengths (vectors) in the runner instead of creating a vector of runners, each one holding the burnin, thinning and total length (scalars) of a single chain.
So, this is right:
type NewMCRunner <: MCRunner
  burnin::Vector{Float64}
  thinning::Vector{Float64}
  nsteps::Vector{Float64}
end

type NewMCJob <: MCJob
  runner::NewMCRunner
  ...
end

run(r::NewMCRunner, ...)
On the other hand, this is conceptually wrong:
type NewMCRunner <: MCRunner
  burnin::Float64
  thinning::Float64
  nsteps::Float64
end

type NewMCJob <: MCJob
  runner::Vector{NewMCRunner}
  ...
end

run(r::Vector{NewMCRunner}, ...)
E.g., after:
@time mcchain = run(m * [HMC(1, 0.025), NUTS()] * SerialMC(1000:10000))
describe(mcchain) would produce something like:
|                 | HMC(1, 0.025)         | NUTS()                 |
|-----------------|-----------------------|------------------------|
| **Parameter 1** |                       |                        |
| Min             | -0.9911931944813289   | -0.9784462069915573    |
| Mean            | -0.7659593771072393   | -0.7652180273205432    |
| Max             | -0.5299229353887581   | -0.5228078563712695    |
| MC Error        | 0.0007079890147378    | 0.0006563835134460041  |
| ESS             | 6561.211265417861     | 7905.80020498207       |
| AC Time         | 1.3718503544370717    | 1.1385311754182401     |
| **Parameter 2** |                       |                        |
| Min             | -1.3720937309757602   | -1.3720937309757602    |
| ...             | ...                   | ...                    |
If this is reasonable, I can take a stab at a first version. Clearly in this example I'm not showing aspects like thinning.
@johnmyleswhite do you manage the permissions for JuliaStats? I would like to kindly ask for re-granting me moderator access to MCMC.jl; I can't access the package's settings so as to enable the readthedocs webhook. Cheers.
In the most recent update we have declared abstract MCJob in input.jl, and then in jobs.jl we try to define various methods named MCJob. This doesn't work.
Another thing to notice about the current state of jobs.jl is that the MCJob function is not type stable: depending on the j parameter, it returns a PlainMCJob or a TaskMCJob.
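One way to restore type stability would be to move the branch into dispatch. A sketch under assumptions follows — the lowercase `mcjob` name is made up to avoid clashing with the abstract type, and the real constructors take more arguments:

```julia
# Julia 0.3 syntax, matching the rest of the codebase.
abstract MCJob

type PlainMCJob <: MCJob end
type TaskMCJob  <: MCJob end

# Instead of one MCJob(...) function branching on a runtime flag j, let the
# caller pick the concrete type, so each method's return type is fixed and
# inferable by the compiler:
mcjob(::Type{PlainMCJob}) = PlainMCJob()
mcjob(::Type{TaskMCJob})  = TaskMCJob()
```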
This issue is being filed by a script, but if you reply, I will see it.
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3).
The results of this script are used to generate a package listing enhanced with testing results.
The status of this package, MCMC, on...
'No tests, but package loads.' can be due to there being no tests (you should write some if you can!) but can also be due to PackageEvaluator not being able to find your tests. Consider adding a test/runtests.jl file.
'Package doesn't load.' is the worst-case scenario. Sometimes this arises because your package doesn't have BinDeps support, or needs something that can't be installed with BinDeps. If this is the case for your package, please file an issue and an exception can be made so your package will not be tested.
This automatically filed issue is a one-off message. Starting soon, issues will only be filed when the testing status of your package changes in a negative direction (gets worse). If you'd like to opt-out of these status-change messages, reply to this message.
@fredo-dedup I was thinking that the next massive improvement for our package would be to generalize some aspects of model specification, namely:
I haven't touched model specification at all until today. Does the relevant code reside in src/parsers/modelparser.jl? Have you made any progress in any other branch or repo, to make sure we avoid duplication of work? I was thinking of using the Julia Graphs package for model parsing (still in an imperative fashion, I don't plan to go for declarative programming), what do you think, would that be a good direction?
Installing the MCMC package and then loading it gives a warning that Distributions is not found.
WARNING: Distributions not found
at C:\Users\K.julia\MCMC\src\autodiff/distributions.jl:9
at C:\Users\K.julia\MCMC\src\MCMC.jl:33
at In[2]:1
in require at loading.jl:44
in include at boot.jl:238
in include_from_node1 at loading.jl:118
in include at boot.jl:238
in include_from_node1 at loading.jl:118
in reload_path at loading.jl:144
I was trying to test a simple model using the MCMC package, but noticed that generated code differs between invocations of "model" and usually this code returns the wrong result. After some looking around and testing, I have figured out that the ReverseDiffSource package causes the problem, it seems to return the wrong value. I'm reporting the issue here because ReverseDiffSource has drastically changed since 0.0.4.
My example code is published as a gist: https://gist.github.com/tomhaber/691bd1a8b8672cdd894f
The lowered code clearly shows the problem, but is easily overlooked :)
Tom
@fredo-dedup, as I was writing a test file for the empirical tuner, I wondered if we need the tuner field in the NUTS sampler. I was thinking that if we choose an adaptive version of NUTS it will probably be standalone (in a similar way that HMCDA is separate from HMC). If you agree with this line of thinking and approve it, I will then remove tuner from NUTS.
@fredo-dedup, do you know what causes the failure? See also papamarkou/ZVSimulator.jl#1
In runners.jl there is a lot of manual type checking (isa(t.runner, SerialMC)).
Would it be better to parameterize the MCMCTask type on the type of the runner, so that it can be dispatched via Julia's type system rather than by manually checking types?
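A sketch of the parameterization — the type names follow this thread, but the field layout and the `process` function are assumptions made only to illustrate the dispatch:

```julia
# Stand-ins for the runner hierarchy (Julia 0.3 syntax).
abstract MCRunner
type SerialMC <: MCRunner end
type SeqMC    <: MCRunner end

# Parameterize the task on its runner type...
type MCMCTask{R<:MCRunner}
    runner::R
end

# ...so multiple dispatch replaces the manual isa(t.runner, SerialMC) checks:
process(t::MCMCTask{SerialMC})       = "serial-specific path"
process{R<:MCRunner}(t::MCMCTask{R}) = "generic path"
```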
This is a set of notes towards designing model specification using factor graphs as part of the MCMC package.
To start with, here is a first thought on how to define a factor graph as a type. There are three main components that fully specify a factor graph:
- A set V, holding the local variables. The terminology comes from [Frey, 1997]. To comply with the Graphs.jl package, these would also be termed the variable vertices.
- A set S (also called the subset vertices), holding the associated local functions. In an MCMC context these functions would be the log-targets of the associated variable vertex blocks, specified as functions or as distributions. Apparently V and S are disjoint.
- A set of edges E. Since a factor graph is a bipartite graph, there should only be edges connecting a vertex from S to a vertex from V, which would tell which variables from V the corresponding function from S updates. It should be relatively easy to code an is_bipartite() algorithm operating on existing graphs, since the Graphs.jl package supports depth-first as well as breadth-first traversal.

The Mamba.jl package does something similar by using stochastic, logical and input nodes. Using a factor graph, all these nodes would belong to V. The functions in S, which could be priors or likelihoods, would update the corresponding blocks of nodes.
This of course still needs to be pushed forward some more, but it is already rather clear that an imperative exploitation of factor graphs is not all that hard to achieve.
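The three components above could map onto a type along these lines — a rough sketch in which the field names V, S, E follow the notes, and everything else is an assumption:

```julia
# Rough sketch of a factor graph type (Julia 0.3 syntax).
type FactorGraph
    V::Vector{Symbol}          # variable vertices (local variables)
    S::Vector{Function}        # subset vertices (local log-target functions)
    E::Vector{(Int, Int)}      # edges as (index into S, index into V) pairs
end

# Bipartiteness holds by construction here, since an edge can only pair an
# element of S with an element of V.
fg = FactorGraph([:v], [v -> -dot(v, v)], [(1, 1)])
```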
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
The status changed: Tests pass. → Tests fail, but package loads.

"Tests pass." means that PackageEvaluator found the tests for your package, executed them, and they all passed.
"Tests fail, but package loads." means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with `using` worked.
Special message from @IainNZ: This change may be due to breaking changes to Dict
in JuliaLang/julia#8521, or the removal of deprecated syntax in JuliaLang/julia#8607.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("MCMC")' log
INFO: Installing ArrayViews v0.4.6
INFO: Installing Distributions v0.5.4
INFO: Installing MCMC v0.0.7
INFO: Installing PDMats v0.2.4
INFO: Installing ReverseDiffSource v0.0.4
INFO: Installing StatsBase v0.6.6
INFO: Package database updated
INFO: METADATA is out-of-date -- you may not have the latest version of MCMC
INFO: Use `Pkg.update()` to get the latest versions of your packages
>>> 'using MCMC' log
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:98.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:122.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Float64)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:162.
Use "Dict{T,Float64}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:192.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>W)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:193.
Use "Dict{T,W}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:66.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:77.
... truncated ...
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/HMC.jl:140.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/HMC.jl:146.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/HMCDA.jl:111.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/HMCDA.jl:116.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/NUTS.jl:176.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/SMMALA.jl:103.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/SMMALA.jl:108.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/PMALA.jl:120.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/PMALA.jl:126.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/RMHMC.jl:164.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/RMHMC.jl:168.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/runners/SerialMC.jl:41.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/runners/SeqMC.jl:108.
Use "Dict{Any,Any}(a=>b, ...)" instead.
Julia Version 0.4.0-dev+998
Commit e24fac0 (2014-10-07 22:02 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
>>> test log
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:98.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:122.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Float64)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:162.
Use "Dict{T,Float64}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:192.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>W)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:193.
Use "Dict{T,W}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:66.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:77.
... truncated ...
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/PMALA.jl:126.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/RMHMC.jl:164.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/samplers/RMHMC.jl:168.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/runners/SerialMC.jl:41.
Use "Dict{Any,Any}(a=>b, ...)" instead.
WARNING: deprecated syntax "{a=>b, ...}" at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/runners/SeqMC.jl:108.
Use "Dict{Any,Any}(a=>b, ...)" instead.
ERROR: `Dict{K,V}` has no method matching Dict{K,V}(::Array{Symbol,1}, ::Array{(Int64,(Int64,)),1})
in MCMCLikelihoodModel at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/modellers/MCMCLikModel.jl:123
in model at /home/idunning/pkgtest/.julia/v0.4/MCMC/src/modellers/models.jl:28
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in anonymous at no file:14
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:293
in _start at ./client.jl:362
in _start_3B_3789 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/MCMC/test/test_hmc.jl, in expression starting on line 23
while loading /home/idunning/pkgtest/.julia/v0.4/MCMC/test/runtests.jl, in expression starting on line 12
Running tests:
* test_hmc.jl *
Testing basic HMC constructors...
Testing that HMC sampler can run...
INFO: Testing MCMC
================================[ ERROR: MCMC ]=================================
failed process: Process(`/home/idunning/julia04/usr/bin/julia /home/idunning/pkgtest/.julia/v0.4/MCMC/test/runtests.jl`, ProcessExited(1)) [1]
================================================================================
INFO: No packages to install, update or remove
ERROR: MCMC had test errors
in error at error.jl:21
in test at pkg/entry.jl:719
in anonymous at pkg/dir.jl:28
in cd at ./file.jl:20
in cd at pkg/dir.jl:28
in test at pkg.jl:68
in process_options at ./client.jl:221
in _start at ./client.jl:362
in _start_3B_3789 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
>>> end of log
Note to self: define jobs as parametric types, parameterized by the choice of runner. This is because some job fields (such as samplers and tasks) may need to be vectors depending on the runner type.
I am working on writing a generic Gibbs sampler that shares design principles with the code here. (the WIP is on the gibbs branch of my fork)
I was wondering why we don't always return a fresh task after running a chain? Specifically, I noticed that a task is stopped before being returned in parallel (run_serialmc_exit), but not if you run in serial. Unless I'm mistaken, this is just a way to tell Julia that it is ok to serialize the task so it can be returned from another process.
When I wrote a simple random walk Metropolis routine, I found that it was easier to always spin a new task at the end of running a chain (in serial or in parallel), because that way the task is always ready to resume (again, in serial or in parallel). I found that I could get the task to resume at the same place without losing any samples simply by consuming the task one more time to find a new model.init value, spinning a new task with that updated model, then making sure that the first produce in the sampler was model.init.
Is there a reason we don't do that here?
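For what it's worth, the resume-by-respawning idea is easy to sketch. The toy below uses a Channel-backed task (the modern replacement for produce/consume, so it runs on current Julia); all names are illustrative and none of them are the package's API. The point is only that seeding the new task with the last consumed state lets the chain pick up where it stopped:

```julia
# Toy random-walk Metropolis chain produced by a task; the first value put on
# the channel is the initial state, mirroring "the first produce is model.init".
function rwm_task(init::Float64, n::Int)
    Channel{Float64}(0) do ch
        x = init
        put!(ch, x)                    # first value produced is the initial state
        for _ in 1:n
            xp = x + 0.1 * randn()
            # toy target: standard normal log-density
            if log(rand()) < (-0.5 * xp^2) - (-0.5 * x^2)
                x = xp
            end
            put!(ch, x)
        end
    end
end

chain1 = collect(rwm_task(0.0, 100))
newinit = chain1[end]                  # consume the last state as the new init
chain2 = collect(rwm_task(newinit, 100))
```

In the package, model.init would play the role of newinit here.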
Hi Julians,
As you may have seen, the package has gone through a (semi-complete) major upgrade. I am currently documenting the package so that users can catch up with the changes. This will take me a few more days, as I am writing up bits of doc sections daily whenever I can squeeze some time in my timetable.
In the meantime, we have been thinking about how we could broaden the range of models accommodated by the package. For this reason, I would like to invite those of you interested in MCMC to catch up on Google Hangout. @spencerlyon2 and I are going to have a chat tomorrow at 11:00am New York time (3:00pm London time). Please feel free to drop in if interested in participating. My hope is that we can make these e-meetings more regular, meeting once per fortnight or month to discuss progress. I will set up a new repo to upload minutes, possibly under JuliaStats.
I have given some thought to how factor graphs could possibly be used for MCMC model specification; see issue #72. Nevertheless, tomorrow's chat will be focused mostly on Gibbs sampling. @spencerlyon2 will explain how he has coded Gibbs to fit the existing state of MCMC.jl. I think it would be good for us to understand how he set up the updating mechanism and get his code merged soon, so that the relevant functionality is available. We can return to the more general issue of model specification later.
If you have any other MCMC issues that you would like to discuss in future e-meetings, you can mention them tomorrow.
@fredo-dedup @spencerlyon2 @goedman @johnmyleswhite @lindahua @dmbates @StefanKarpinski (please forward this to people I may have missed out)
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass. → Package doesn't load.
"Tests pass." means that PackageEvaluator found the tests for your package, executed them, and they all passed.
"Package doesn't load." means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with using failed.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
INFO: Installing ArrayViews v0.4.6
INFO: Installing Distributions v0.5.2
INFO: Installing MCMC v0.0.6
INFO: Installing PDMats v0.2.1
INFO: Installing ReverseDiffSource v0.0.4
INFO: Installing StatsBase v0.5.3
INFO: Package database updated
ERROR: NumericExtensions not found
in require at loading.jl:47
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:54
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:54
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at loading.jl:152
in _require at loading.jl:67
in require at loading.jl:51
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
while loading /home/idunning/pkgtest/.julia/v0.3/PDMats/src/PDMats.jl, in expression starting on line 9
while loading /home/idunning/pkgtest/.julia/v0.3/Distributions/src/Distributions.jl, in expression starting on line 4
while loading /home/idunning/pkgtest/.julia/v0.3/MCMC/src/MCMC.jl, in expression starting on line 9
while loading /home/idunning/pkgtest/.julia/v0.3/MCMC/testusing.jl, in expression starting on line 1
INFO: Package database updated
In examples/probit_regression.jl, for example, it is better to use
vaso = readdlm(Pkg.dir("MCMC","examples","vaso.txt"), ' ')
than the current
vaso = readdlm("vaso.txt", ' ')
so that the script can be run from other directories.
I can submit a pull request if desired.
In the near future, once code and documentation are satisfactorily settled. See Gadfly as an example of how the badges would look.
I haven't quite formulated in my mind how we would do this, but it would make sense to define two types related to MCMCChain, one for vectors and one for dataframes, following the paradigm of DataFrame and DataArrays being separate entities. When one works on summary stats on MCMC output it becomes obvious that some restructuring is needed towards this direction. This is open for discussion as to how it will be settled.
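One hypothetical shape for that split, with all names invented here for illustration (a DataFrame-backed sibling type would then wrap a DataFrame in the same way that DataFrame and DataArrays are separate entities):

```julia
# Sketch only: an abstract chain type with a vector/matrix-backed subtype.
# A DataFrameMCMCChain <: AbstractMCMCChain would live alongside it.
abstract type AbstractMCMCChain end

struct VectorMCMCChain <: AbstractMCMCChain
    samples::Matrix{Float64}   # niter × npars
    names::Vector{Symbol}      # parameter names
end

# summary statistics could then dispatch on the abstract type
nsamples(c::AbstractMCMCChain) = size(c.samples, 1)

c = VectorMCMCChain(randn(100, 2), [:a, :b])
```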
@LaurenceA, your suggestion in #34 to somehow allow use of MCMCChain and samples with external models is very useful. I want to work in this direction, and my first impression is that your suggestion is feasible and attainable. I would like to ask you for some input from a user's perspective, in order to see how exactly to achieve the end goal.

Let's say that you have a matrix of Monte Carlo samples that you have somehow computed. This matrix will populate the MCMCChain.samples field. You may (or may not) have the relevant scores (the gradient of the log-likelihood, for instance) stored in a separate matrix, which could populate the MCMCChain.gradients field. Assuming this situation, I would like to ask you the following:

MCMC package. There is a reason of course for this, since tasks make it much easier to handle the simulations. This task needs in turn a triplet (model, sampler, runner) as its fields. The sampler and runner are easy to instantiate using the existing samplers and runners. What you want is to define your own model rather than using the likelihood model of the MCMC package. Do I understand correctly what you are trying to achieve?

MCMCModel, or else we would have to simply change the MCMCTask.model field so as to be a subtype of Model instead of MCMCModel. The main question is what kind of statistical model you want to define. This is a more general question, since one may want to input here a model from GLM or from any other Julia package. @fredo-dedup, @johnmyleswhite, @dmbates, @andreasnoackjensen, @lindahua, I am copying you in here so that you can follow up the discussion (and contribute if you have some ideas). If you can all give me an idea of possible models and situations that one may want to use with MCMC, then I can start extending the codebase in MCMC to better work with external models.

I am not sure what is the most Julia-compatible way of naming the various samplers, models and runners. I think that camel case is not desired among the Julians, is this correct? For example, we have the HMC type, for which it is probably fine to use capitals, but I guess the actual external constructors should be called hmc instead? Furthermore, is it better to call the sampler file HMC.jl or hmc.jl?
To calculate marginal likelihoods via power posteriors (Friel and Pettitt 2008; DOI: 10.1111/j.1467-9868.2007.00650.x) or stepping stone sampling (DOI: 10.1093/sysbio/syq085), one needs to run MCMC, sampling from
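To make the power-posterior computation concrete: Friel and Pettitt's identity expresses the log marginal likelihood as an integral over a temperature ladder, log p(y) = ∫₀¹ E_{p_t}[log p(y|θ)] dt with p_t(θ|y) ∝ p(y|θ)^t p(θ). The sketch below is plain Julia, not package API; in practice each expectation would come from an MCMC run at the corresponding temperature, but here a conjugate normal model is used so the tempered expectations are in closed form and the trapezoid quadrature can be checked against the analytic evidence.

```julia
# Model: y_i ~ N(θ, 1), prior θ ~ N(0, 1); tempered posterior is
# θ | y, t ~ N(tS/(tn+1), 1/(tn+1)) with S = Σ y_i.
y = [0.8, 1.2, -0.3, 1.5, 0.1]
n, S, Q = length(y), sum(y), sum(abs2, y)

# E_t[log p(y|θ)] under the tempered posterior, in closed form
function e_loglik(t)
    prec = t * n + 1
    m, v = t * S / prec, 1 / prec
    -n/2 * log(2π) - (Q - 2m*S + n*m^2 + n*v) / 2
end

# powered temperature ladder t_k = (k/K)^5 (as suggested by Friel & Pettitt)
# and trapezoid-rule integration over t
K = 1000
ts = [(k / K)^5 for k in 0:K]
logml = sum((ts[k+1] - ts[k]) * (e_loglik(ts[k+1]) + e_loglik(ts[k])) / 2 for k in 1:K)

# analytic log evidence: y ~ N(0, I + 11'), |Σ| = n + 1, y'Σ⁻¹y = Q - S²/(n+1)
exact = -n/2 * log(2π) - log(n + 1)/2 - (Q - S^2 / (n + 1)) / 2
```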
I was wondering whether it would be better to rename the existing HMC sampler to LMC (standing for Lagrangian Monte Carlo). If I am right, the existing sampler relies on the velocities and the Lagrangian, rather than the momentum and the Hamiltonian. These two samplers are mathematically equivalent, yet it may be better to have two different implementations, called LMC and HMC respectively. If you are ok with this @fredo-dedup, I will rename the existing HMC to LMC and will port its Hamiltonian equivalent under the name HMC.
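For reference when porting the Hamiltonian version, a minimal leapfrog step in the momentum/Hamiltonian formulation might look like the following; this is a generic sketch with a unit mass matrix, not the existing sampler's code:

```julia
# gradlogp: gradient of the log target; ϵ: step size; L: number of leapfrog steps
function leapfrog(x, p, gradlogp, ϵ, L)
    p = p .+ (ϵ / 2) .* gradlogp(x)     # initial half step for momentum
    for _ in 1:L-1
        x = x .+ ϵ .* p                 # full position step
        p = p .+ ϵ .* gradlogp(x)       # full momentum step
    end
    x = x .+ ϵ .* p
    p = p .+ (ϵ / 2) .* gradlogp(x)     # final half step for momentum
    x, p
end

# for a standard normal target the Hamiltonian x²/2 + p²/2 is nearly conserved
xnew, pnew = leapfrog([1.0], [0.5], x -> -x, 0.05, 20)
```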
@scidom and @fredo-dedup - Great job so far on MCMC.jl. I had already tweaked RWM.jl to accept a vector of scale parameters, but it looks as though you're already on top of this. A few requests (that I am happy to help with):
The way the samplers and runners are integrated makes it probably impossible for the samplers to access the diagnostics field of the MCMCChain. It is necessary for the samplers to access the diagnostics; for example, they are needed for calculating the acceptance rate for tuning. Do we have an easy way to change this, or do we need to drastically redesign the sampler-runner-modeller scheme?
We currently have the following 7 Julia sources for automatic and symbolic differentiation, with varying approaches to the matter:
It may be challenging to merge all this code because the above approaches to autodiff and symbolic differentiation vary. We may need to open a discussion as to how to achieve the unification.
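To make the comparison concrete, the core of one of these approaches, forward-mode autodiff with dual numbers, fits in a few lines; this is a toy sketch, not taken from any of the seven sources:

```julia
# A dual number carries a value and a derivative; arithmetic propagates both.
struct Dual
    v::Float64   # value
    d::Float64   # derivative
end
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.d * b.v + a.v * b.d)  # product rule

# seed the derivative slot with 1 and read it off the result
derivative(f, x) = f(Dual(x, 1.0)).d

derivative(x -> x * x * x, 2.0)   # d/dx x³ at x = 2 is 12.0
```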
I opened this issue as a placeholder-reminder to add expression parsing and autodiff for correlated model parameters, i.e. for multivariate distributions with a covariance matrix that is neither isotropic nor diagonal. For example, we would want to draw Monte Carlo samples from a multivariate Normal:
modelxpr = quote
v ~ MultivariateNormal(zeros(2), [1 0.3; 0.3 1])
end
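For reference, what the parser and autodiff would ultimately need to emit for this example is the log-density of the correlated multivariate Normal and its gradient, ∇ log p(v) = -Σ⁻¹(v - μ). A hand-written sketch (modern Julia syntax, not generated code), with a finite-difference check of the gradient:

```julia
using LinearAlgebra

μ = zeros(2)
Σ = [1.0 0.3; 0.3 1.0]

# log p(v) = -d/2 log(2π) - 1/2 log|Σ| - 1/2 (v-μ)'Σ⁻¹(v-μ);
# log|Σ| is computed from the Cholesky factor as 2 Σ log L_ii
function logpdf_mvn(v)
    r = v - μ
    C = cholesky(Σ)
    -length(v)/2 * log(2π) - sum(log, diag(C.L)) - dot(r, Σ \ r) / 2
end

grad_mvn(v) = -(Σ \ (v - μ))          # ∇ log p(v) = -Σ⁻¹ (v - μ)

# central finite differences at a test point (exact here, since log p is quadratic)
v = [0.5, -0.2]
ϵ = 1e-6
fd = [(logpdf_mvn(v + ϵ*e) - logpdf_mvn(v - ϵ*e)) / (2ϵ) for e in ([1.0, 0.0], [0.0, 1.0])]
```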
One of the most important aspects of MCMC simulation is the adaptive (in the broad sense of the word) tuning of the Monte Carlo algorithm's parameters (e.g. the MALA drift step or the HMC leap step). Our MCMC code lacks this crucial feature so far. This could be done either on the basis of a user-provided function, which tunes the step size according to the current acceptance ratio, or in a more automated way in lieu of such user input. A first implementation of this adaptation is available in the GeometricMCMC package (in src/setStep.jl), but I would like to discuss how this can be done more elegantly (and possibly more efficiently). I believe this is one of the highest-priority enhancements from the user's point of view.
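As a starting point for the discussion, here is one simple acceptance-rate adaptation (a Robbins-Monro style update on the log step size with diminishing adaptation, not the GeometricMCMC implementation), applied to a toy random-walk Metropolis sampler; all names are illustrative:

```julia
using Random
Random.seed!(1)

# Nudge log(scale) toward a target acceptance rate after each batch;
# 0.234 is the classic asymptotically optimal rate for random-walk Metropolis.
function adapt_rwm(logtarget; n_batches=200, batch=50, target=0.234)
    x, logscale = 0.0, 0.0
    rate = 0.0
    for b in 1:n_batches
        acc = 0
        for _ in 1:batch
            xp = x + exp(logscale) * randn()
            if log(rand()) < logtarget(xp) - logtarget(x)
                x, acc = xp, acc + 1
            end
        end
        rate = acc / batch
        logscale += (rate - target) / sqrt(b)   # diminishing adaptation step
    end
    exp(logscale), rate
end

scale, rate = adapt_rwm(x -> -x^2 / 2)   # standard normal target
```

The 1/sqrt(b) factor makes the adaptation diminish over time, which is what keeps adaptive chains ergodic.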
I am going to write up a Travis yml test configuration for our package by the end of the day. @fredo-dedup, if you manage to get the autodiff-related tests working by then, this would be great; if you need more time, that is ok, I will temporarily disable the autodiff testing until it runs successfully.
Syntax:
Use of Julia Tasks:
Looking forward:
I'm really impressed by the number of samplers and diagnostic techniques you have here, and it'd be awesome to incorporate some of that functionality into work that doesn't neatly fit into the MCMC.jl framework. Perhaps you might want to:
Number 1 seems pretty straightforward, especially if you can thin out all but the last sample. But (at some point in the distant future) it might be nice to have functions that are optimized for giving you just one sample, rather than a full MCMCChain object.
Number 2 is a bit more difficult - I can't see how to construct an MCMCChain object with only samples from my custom sampler.
I'm particularly interested in this, because I'd like to incorporate some of your functionality in Church.jl, a probabilistic programming language that focuses more on non-parametrics. But I think there are lots of people in the ML community who would appreciate modular implementations of MCMC methods and diagnostics that they can use easily in their own samplers.