
pdsampler.jl's Introduction

PDSampler


PDSampler.jl is a package designed to provide an efficient, flexible, and extensible framework for samplers based on Piecewise Deterministic Markov Processes and their applications. This includes the Bouncy Particle Sampler and the Zig-Zag Sampler.

Please refer to the documentation for information on how to use or extend this package. The project is hosted by the Alan Turing Institute (ATI). If you encounter problems, please open an issue on GitHub. If you have comments or wish to collaborate, please send an email to tlienart > cpg σ gmail > com.

If you find this toolbox useful please star the repo. If you use it in your work, please cite this code and send us an email so that we can cite your work here.

If you want to make suggestions or request new features, please don't hesitate: open an issue or send an email.


Installation and requirements

(This is explained in more detail in the documentation.)

Requirements:

In the Julia REPL:

] add PDSampler
using PDSampler

Note that loading the package may take several seconds, as some of the dependencies (in particular ApproxFun.jl) are quite slow to load.

References

Note: if your paper is not listed here and you feel it should be, please open an issue (the same goes if there is a mistake, or if a preprint has since been properly published).

pdsampler.jl's People

Contributors

martintoreilly, sjvollmer, tlienart


pdsampler.jl's Issues

Hide functions that do not strictly need to be exposed

So far, every single function is exported through export statements. This is not useful: it clutters the namespace and obscures the public API.

  • review the functions that are likely to be used externally and export them
  • remove from the exports the functions that are unlikely to be used externally, and fix the testing code accordingly by adding a PDMP. prefix in front of the relevant function calls (see the sketch below).
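
A minimal sketch of the target layout, assuming simulate, Simulation and LocalSimulation end up on the public list (the exact list would come out of the review above):

module PDMP

export simulate, Simulation, LocalSimulation   # public entry points only

# Internal helper: deliberately unexported; tests call PDMP.ls_updatepq!
ls_updatepq!(pq, fidx, tauf) = nothing         # stub for illustration

end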

loading time

Due to the dependencies, loading time is quite slow (several seconds). It would be good to check how to strip down the dependencies so that only the strict minimum is loaded, in order to speed up loading.
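
For reference, the load time can be measured in a fresh session with the standard @time macro:

# time spent loading the package and its dependencies (fresh session)
@time using PDSampler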

Julia 0.6.3 & 0.7-alpha compat

Julia 0.6.3 has just been released

PDSampler still passes tests as expected, though when testing I had to run:

Pkg.build("FFTW")
Pkg.build("Distributions")
Pkg.build("PDMats")
Pkg.build("PDSampler")

This likely had nothing to do with 0.6.3, but rather with the fact that some packages had been updated and needed to be rebuilt.


Julia 0.7-alpha has just been released

Consequently, this package will be updated (slowly). Progress will be tracked here.

using Pkg
Pkg.add("PDSampler")
Pkg.test("PDSampler")

unsurprisingly throws quite a few errors:

   Testing PDSampler
ERROR: LoadError: ArgumentError: Module DiffResults not found in current path.
Run `Pkg.add("DiffResults")` to install the DiffResults package.
Stacktrace:
 [1] require(::Module, ::Symbol) at ./loading.jl:868
 [2] include at ./boot.jl:314 [inlined]
 [3] include_relative(::Module, ::String) at ./loading.jl:1071
 [4] include(::Module, ::String) at ./sysimg.jl:29
 [5] top-level scope
 [6] eval at ./boot.jl:316 [inlined]
 [7] eval(::Expr) at ./client.jl:394
 [8] macro expansion at ./none:3 [inlined]
 [9] top-level scope at ./<missing>:0
in expression starting at /Users/tlienart/.julia/packages/Klara/iHrv/src/Klara.jl:5
ERROR: LoadError: Failed to precompile Klara to /Users/tlienart/.julia/compiled/v0.7/Klara/LH3t.ji.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]
 [2] compilecache(::Base.PkgId) at ./loading.jl:1207
 [3] _require(::Base.PkgId) at ./loading.jl:978
 [4] require(::Base.PkgId) at ./loading.jl:878
 [5] require(::Module, ::Symbol) at ./loading.jl:873
 [6] include at ./boot.jl:314 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1071
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] top-level scope
 [10] eval at ./boot.jl:316 [inlined]
 [11] eval(::Expr) at ./client.jl:394
 [12] macro expansion at ./none:3 [inlined]
 [13] top-level scope at ./<missing>:0
in expression starting at /Users/tlienart/.julia/packages/PDSampler/qauV/src/PDSampler.jl:6
ERROR: LoadError: Failed to precompile PDSampler to /Users/tlienart/.julia/compiled/v0.7/PDSampler/nwMk.ji.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]
 [2] compilecache(::Base.PkgId) at ./loading.jl:1207
 [3] _require(::Base.PkgId) at ./loading.jl:1007
 [4] require(::Base.PkgId) at ./loading.jl:878
 [5] require(::Module, ::Symbol) at ./loading.jl:873
 [6] include at ./boot.jl:314 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1071
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] include(::String) at ./client.jl:393
 [10] top-level scope
in expression starting at /Users/tlienart/.julia/packages/PDSampler/qauV/test/runtests.jl:1
ERROR: Package PDSampler errored during testing
Stacktrace:
 [1] #test#57(::Bool, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/Types.jl:359
 [2] #test at ./<missing>:0 [inlined]
 [3] #test#35(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:216
 [4] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:205 [inlined]
 [5] #test#34 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:202 [inlined]
 [6] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:202 [inlined]
 [7] #test#33 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:201 [inlined]
 [8] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:201 [inlined]
 [9] #test#32 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:200 [inlined]
 [10] test(::String) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:200
 [11] top-level scope
  • I don't know why Klara is still in there somewhere, given that I thought I had explicitly removed all references to it
  • DiffResults will likely need to be added to REQUIRE

registering package

It would be nice to register this package on METADATA.jl.

After several failed attempts I'm giving up, but if someone more savvy than I am offers to help, I'll be grateful. One difficulty is that the name of the package changed (PDMP -> PDSampler); I have no idea how to get around this cleanly.

GBPS for local BPS

It should work in just the same way:

  • Add the possibility for lambdaref=0.0 in the local simulate
  • Use the gbps kernel with standard specular reflection at the boundary
  • Check that it works with the LBPS example.

add possibility to index factors by key value

[migrated from private repo]

At the moment, factors are indexed from 1 to K, where K is the number of factors.

In some cases there may be another indexing the user wants to use. For example, if the factors correspond to entries of a matrix, it is more convenient to refer to a factor using a tuple (i,j).

In order to do that, the only thing we care about is having a mapping from key to index, so I don't think it's necessary to modify Factor.

  • In the definition of the FactorGraph there should be the possibility for the user to pass an array of keys
  • that should build a mapping which should then be used when required

I believe this should be rather easy to do (a sketch is below), although it's late and I need to think about it a bit more carefully.
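
A minimal sketch of the mapping, with the type and field names purely illustrative; Factor and FactorGraph stay untouched, only a key-to-index dictionary is layered on top:

struct FactorKeys{K}
    keymap::Dict{K,Int}   # user-facing key -> internal factor index
end
FactorKeys(keys) = FactorKeys(Dict(k => i for (i, k) in enumerate(keys)))

# Example: factors keyed by matrix entries (i, j)
fk   = FactorKeys([(1, 1), (1, 2), (2, 1)])
fidx = fk.keymap[(1, 2)]   # -> 2, usable wherever a factor index is expected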

Use Compat library to help with forward looking language

Language breaking changes: https://github.com/JuliaLang/julia/blob/master/NEWS.md

A few things have already been sorted, like rewriting abstract Blah as @compat abstract type Blah end, etc. A few more things need to be fixed in order for PDMP to be callable from both 0.6 and 0.7.
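
For reference, a minimal sketch of the Compat pattern mentioned above (assuming Compat.jl is installed; this was the documented idiom for the 0.5/0.6 transition):

using Compat

# old 0.5 syntax `abstract Blah`, written so it parses on newer versions too
@compat abstract type Blah end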

This issue lists what's currently not sorted (i.e. cases where include("PDMP.jl") fails on a more recent Julia).

Current error to investigate (with 0.6):

ERROR: LoadError: LoadError: ArgumentError: invalid type for argument pq in method definition for ls_firstbranch! at /Users/tlienart/.julia/v0.5/PDMP/src/local/simulate.jl:157
Stacktrace:
 [1] include_from_node1(::String) at ./loading.jl:539
 [2] include(::String) at ./sysimg.jl:14
 [3] include_from_node1(::String) at ./loading.jl:539
 [4] include(::String) at ./sysimg.jl:14
while loading /Users/tlienart/.julia/v0.5/PDMP/src/local/simulate.jl, in expression starting on line 517
while loading /Users/tlienart/.julia/v0.5/PDMP/src/PDMP.jl, in expression starting on line 46

make all arguments of Simulation and LocalSimulation explicit

At the moment, some of the arguments are named and some are not. A user is bound to forget the order of the arguments and end up having to copy-paste, etc. (this may happen anyway).

It may be a good idea to refactor into something like

Simulation(dict)

where the dict looks then like

simparams = Dict("x0"=>x0, "v0"=>v0)

etc.

The question is then whether it makes sense to have a full immutable type for the simulation, or whether it would be better to just have a dictionary.
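
A hedged sketch of a keyword-argument alternative (type and field names are illustrative, not the actual Simulation API); it keeps a concrete immutable type while removing the need to remember positional order:

struct SimParams
    x0::Vector{Float64}   # initial position
    v0::Vector{Float64}   # initial velocity
    T::Float64            # maximum clock time
end
SimParams(; x0, v0, T=1e6) = SimParams(x0, v0, T)

sp = SimParams(x0=zeros(2), v0=ones(2))   # order-free and self-documenting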

test for node with linear thinning

In the helper function ls_updatepq!, these lines:

# propose an event time for factor fidx and accept/reject it (thinning)
bounce = fg.factors[fidx].nextevent(vcxf, vcvf)
acc    = bounce.dobounce(g, vcvf)
while !acc   # rejected by thinning: propose again
    bounce = fg.factors[fidx].nextevent(vcxf, vcvf)
    acc    = bounce.dobounce(g, vcvf)
end
tauf = bounce.tau   # accepted bounce time

the while !acc is there for thinning (it corresponds to the accept/reject step). This is of course valid (same code as for the BPS), but so far the test case in the LBPS has been a multivariate Gaussian, for which acc==true always. For the sake of testing, one should:

  • define a chain with simple factors,
  • whose factors require thinning for sampling,
  • and test the helper function with that.

Testing script Azure

The executing machine Mi needs to have:

  • julia 0.5
  • PDMP + generic script generalchild.jl
  • specific script child.jl
  • the data file data/ratings.csv

The data can be obtained via the file ratings.dat from https://grouplens.org/datasets/movielens/1m/. Its format is awkward, so before putting it on the machine it should be pre-processed as:

sed 's/::/,/g' ratings.dat > ratings.csv

On the machine Mi, call julia child.jl, then retrieve the generated data child_(hash).jld as well as STDERR and STDOUT (if possible).

The format of a child script is (https://github.com/tlienart/pmf/blob/master/child.jl):

CHILDNAME  = "A"
LATENT_D   = 30
SIGMA_U    = 10.0
SIGMA_V    = 20.0
SIGMA_R    =  1.0
LAMBDAREF  =   .01
MAXNEVENTS = 10
MAXT       = Inf

include("generalchild.jl")

The first 8 lines can trivially be generated from the Python mother script. Ideally we would want ranges for one or two of those parameters, and that would form one experiment. So the mother script would have:

SIGMA_U = [10.0, 20.0]
SIGMA_V = [10.0, 20.0]

which would generate four child scripts (Cartesian product); all other parameters should be fixed, and that should correspond to four machines (see the sketch below).
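
A hedged sketch of that generation step, written here in Julia rather than in the Python mother script (file names illustrative):

# one child script per parameter combination (Cartesian product)
for (i, (su, sv)) in enumerate(Iterators.product([10.0, 20.0], [10.0, 20.0]))
    open("child_$i.jl", "w") do io
        println(io, "SIGMA_U = $su")
        println(io, "SIGMA_V = $sv")
        println(io, "include(\"generalchild.jl\")")
    end
end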

The code for the general script generalchild is on https://github.com/tlienart/pmf/blob/master/generalchild.jl

@martintoreilly, if you could adapt your mother script so that it sends to an executing machine the data and whatever is needed so that the call julia child.jl works, that would be fantastic, thanks!

add shortcut for spherical gaussians

At the moment we can't build an isotropic Gaussian with just a float as the covariance; it has to be a vector of floats, so s*ones(d) works but a plain s does not.

Easy fix, would be nice to have; a possible sketch is below.
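
A minimal sketch of such a shortcut (the helper name is hypothetical, not part of the current API):

using LinearAlgebra

# expand a scalar covariance s into the d-dimensional diagonal it stands
# for, so callers can pass s instead of s*ones(d)
isotropic_cov(s::Real, d::Int) = Diagonal(fill(float(s), d))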

Naming?

Hi all,

As a new Turing fellow, I was looking at the ATI GitHub site and saw you had a Julia package on inference in PDMPs. I'm glad to see there's interest in using Julia there; we've had a package for simulating PDMPs on GitHub for a while: https://github.com/sdwfrost/PDMP.jl. It mainly employs a simple change-of-variable approach as an alternative to thinning, so there isn't any overlap between our package and yours, but it would probably be good to talk about the name clash; we were just about to submit to METADATA, but have held off for now.

Add switch for multithreading and test resulting performance

When the package is installed on a new machine, the user should do something like

export JULIA_NUM_THREADS=$(julia -e "println(Sys.CPU_CORES)")

which sets the number of usable threads to the number of CPU cores as detected by Julia (e.g. 4 on my machine).
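
Inside Julia, the resulting setting can then be checked with the standard Base.Threads query:

Threads.nthreads()   # should match JULIA_NUM_THREADS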

Experiment

On a reasonably complex example, one should try with this environment variable set to 1 or to the number of CPUs, and see whether that is a positive change or not.

discuss stopping criterion

The proper stopping criterion (with theoretical guarantees) is when the clock in the BPS or LBPS goes over a pre-specified (large) time T, such as 1e8.

At the moment we also stop:

  • when the number of events generated hits a pre-specified number (in LBPS)
  • when the number of gradient evaluations hits a pre-specified number (in BPS)

Both potentially add bias of some form. Even though they make sense and should probably remain available, the user should be warned that the recommended choice is a maximum time.

add quadrature for path and alleventlist

[migrated from private repo]

  • For polynomials, we can do exact quadrature, which can be implemented directly along a piecewise-linear path
  • For general functions, we can use a simple Gauss-Legendre (GL) or Gauss-Kronrod (GK) quadrature with a number of points proportional to the size of the interval, using a budget on the number of function evaluations
  • If feeling fancy, we could do adaptive quadrature, but it's kind of overkill

Note that this is all well and easy for reduced dimensionality, as usual (a sketch is below).
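
A minimal sketch of the piecewise-linear case (function and variable names illustrative; a fixed 2-point Gauss-Legendre rule per segment, which is exact for polynomials of degree at most 3 along the path):

# 2-point Gauss-Legendre nodes and weights on [-1, 1]
const GL2_NODES   = (-1/sqrt(3), 1/sqrt(3))
const GL2_WEIGHTS = (1.0, 1.0)

# Integrate f(x(t)) over a piecewise-linear path with event times ts
# and positions xs (one position per event time).
function path_quadrature(f, ts, xs)
    total = 0.0
    for k in 1:length(ts)-1
        h = ts[k+1] - ts[k]                      # segment length
        for (u, w) in zip(GL2_NODES, GL2_WEIGHTS)
            s = 0.5 * (1 + u)                    # map node to [0, 1]
            x = (1 - s) * xs[k] + s * xs[k+1]    # linear interpolation
            total += 0.5 * h * w * f(x)
        end
    end
    return total
end

# e.g. path_quadrature(x -> sum(abs2, x), ts, xs)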

Problem with referencing of pages in documentation

This part in particular:

### Examples

The following examples will introduce you to the functionalities of the package.
The code to all examples can be found in the [examples directory](https://github.com/alan-turing-institute/PDMP.jl/tree/master/example).

```@contents
Pages = ["examples/bps_mvg_constr.md", "examples/lbps_gchain.md"]
Depth = 2
x```

(The x is not actually present; it is just there to escape the triple backquote.) This does not work.

It's converted to

<p>The following examples will introduce you to the functionalities of the package.
The code to all examples can be found in the <a href="https://github.com/alan-turing-institute/PDMP.jl/tree/master/example">examples directory</a>.</p>
<p>```@contents
Pages = ["examples/bps_mvg_constr.md", "examples/lbps_gchain.md"]
Depth = 2</p>
<pre class="codehilite"><code>### Code documentation

However, locally, docs/build/index.md contains:

- [Truncated Multivariate Gaussian](examples/bps_mvg_constr.md#Truncated-Multivariate-Gaussian-1)
- [Chain of Multivariate Gaussian](examples/lbps_gchain.md#Chain-of-Multivariate-Gaussian-1)

which should be appropriately converted.

@martintoreilly could you look into this?

Add cloud parallelisation capability

Description

Create a script to deploy a set of simulations to Azure. Other cloud providers are not in scope for this issue.

Notes

  • Should provide a single parent script that spawns child processes on either the local machine or Azure.
  • Look at Azure Batch
  • Use the Microsoft Data Science VM
  • Should we use a Jupyter Notebook or a plain Julia script? @tlienart What are your thoughts on this?
  • How to handle authentication while ensuring that the risk of credentials being committed to GitHub is very low?
    • Look at using the Python Azure Active Directory Authentication Library, which can handle 2FA.
    • Alternatively, look at the Python keyring library, which supports OSX, Windows, and some Linux local credential stores.
    • In either case, we can use PyCall to call Python from Julia (see the sketch below).
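
A hedged sketch of the keyring route via PyCall (assumes PyCall and the Python keyring package are installed; service and account names are made up):

using PyCall
keyring = pyimport("keyring")

# store a credential in the OS credential store, then read it back
keyring.set_password("azure-batch", "pdsampler-deploy", "not-a-real-secret")
token = keyring.get_password("azure-batch", "pdsampler-deploy")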

investigate multiple core usage for the local BPS

In Julia:

Sys.CPU_CORES

may return something like 4 (on my machine), in which case Julia can be run with julia -p 4.

In theory, when one factor is updated, all adjacent factors should also be updated, in that their respective bounce times should be recomputed. All of these updates are independent, so we should be able to use pmap to do them (see the sketch below).
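
A hedged sketch of what this could look like (bounce_time is a dummy stand-in for the actual per-factor computation):

using Distributed
addprocs(3)   # e.g. 4 processes in total on a 4-core machine

# dummy stand-in for the per-factor bounce-time update
@everywhere bounce_time(fidx) = (sleep(0.1); fidx + rand())

adjacent = [2, 5, 7, 11]            # factors adjacent to the updated one
taus = pmap(bounce_time, adjacent)  # independent updates run in parallel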
