probcomp / metaprob

An embedded language for probabilistic programming and meta-programming.

License: GNU General Public License v3.0

Clojure 20.71% Makefile 0.68% Shell 0.30% Gnuplot 0.11% JavaScript 77.68% Dockerfile 0.52%

probabilistic-programming data-science clojure machine-learning

Metaprob

A language for probabilistic programming and metaprogramming, embedded in Clojure.

Note: Metaprob is currently an unstable research prototype, with little documentation and low test coverage. Also, future versions may not be backwards compatible with this version. We do not recommend using it for any purpose other than basic research, and are not yet able to support users outside of the MIT Probabilistic Computing Project.

Key features

Models can be represented via generative code, i.e. ordinary code that makes stochastic choices
Models can also be represented via approximations, e.g. importance samplers with nontrivial weights
Custom inference algorithms can be written in user-space code, via reflective language constructs for:
- tracing program executions
- using partial traces to specify interventions and constraints
Generic inference algorithms are provided via user-space code in a standard library; adding new algorithms does not require modifying the language implementation
All inference algorithms are ordinary generative code and can be traced and treated as models
New probability distributions and inference algorithms are first-class citizens that can be created dynamically during program execution

Motivations

Lightweight embeddings of probabilistic programming and inference metaprogramming
- Interactive, browser-based data analysis tools (via ClojureScript)
- Smart data pipelines suitable for enterprise deployment (via Clojure on the JVM)
“Small core” language potentially suitable for formal specification and verification
Teaching
- Undergraduates and graduate students interested in implementing their own minimal PPL
- Software engineers and data engineers interested in probabilistic modeling and inference
Research in artificial intelligence and cognitive science
- Combining symbolic and probabilistic reasoning, e.g. via integration with Clojure’s core.logic
- “Theory of mind” models, where an agent’s reasoning is modeled as an inference metaprogram acting on a generative model
- Reinforcement learning and other “nested” applications of modeling and approximate inference
- Causal reasoning, via a notion of interventions that extends Pearl's “do” operator
Research in probabilistic meta-programming, e.g. synthesis, reflection, runtime code generation

Modeling and tracing

Generative models are represented as ordinary functions that make stochastic choices.

;; Flip a fair coin n times
(def fair-coin-model
  (gen [n]
    (map (fn [i] (at i flip 0.5)) (range n))))

;; Flip a possibly weighted coin n times
(def biased-coin-model
  (gen [n]
    (let [p (at "p" uniform 0 1)]
      (map (fn [i] (at i flip p)) (range n)))))

Execution traces of models, which record the random choices they make, are first-class values that inference algorithms can manipulate.

We obtain scored traces using infer-and-score, which invokes a “tracing interpreter” that is itself a Metaprob program.

(infer-and-score :procedure fair-coin-model, :inputs [3])
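To make the idea of a scored trace concrete, here is a minimal sketch in plain Clojure that does not use Metaprob at all. The names `flip-at` and `toy-infer-and-score`, and the representation of a trace as a map from address to `{:value ...}`, are assumptions made for illustration; they are not Metaprob's API or its internal representation.

```clojure
;; Toy illustration only: NOT Metaprob's implementation.
;; A "trace" here is a map from address to {:value v}.

(defn flip-at
  "Toy traced coin flip. If `observations` constrains `addr`, reuse that value
  and score it with its log-probability; otherwise sample and score 0."
  [observations addr p]
  (if (contains? observations addr)
    (let [v (get observations addr)]
      {:addr addr :value v :score (Math/log (if v p (- 1.0 p)))})
    {:addr addr :value (< (rand) p) :score 0.0}))

(defn toy-infer-and-score
  "Flips n coins with weight p, honoring `observations`, and returns the
  output value, the trace of all choices, and the accumulated log score."
  [n p observations]
  (let [choices (mapv #(flip-at observations % p) (range n))]
    {:value (mapv :value choices)
     :trace (into {} (map (fn [c] [(:addr c) {:value (:value c)}]) choices))
     :score (reduce + 0.0 (map :score choices))}))

;; Constrain flips 0 and 2; flip 1 is sampled freely.
(toy-infer-and-score 3 0.5 {0 true, 2 false})
```

In this sketch the score is the sum of the log-probabilities of the constrained choices only, so with two constrained fair flips it equals `(* 2 (Math/log 0.5))`.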

Documentation

metaprob's Issues

Test of MH

VKM: "defining a test of MH to infer which proposals are probably accepted and rejected, and a figure that shows it"

Extend the Metaprob pretty printer to display the source code of generative procedures nicely

I think this just means: if the thing to be printed is a procedure, and is also a trace (via meta), and there is a "source" property, then translate the parse tree to clojure, and prettyprint the clojure code (using the clojure prettyprinter).

The tree-to-clojure translator (src/metaprob/to-clojure.clj) will probably have to be updated; it hasn't been used in a while.

Maybe do something smart with inf procedures as well.

Makefile fails in parse-all

When I try to build metaprob-clojure on my Mac (in ~/src/metaprob-clojure) I get this error:

$ make
bash bin/parse-all
find: ‘metacirc’: No such file or directory
make: *** [Makefile:3: all] Error 1

parse-all tries to cd to ../metaprob, so is the Makefile making an assumption about us also having some other metaprob repository checked out? If so, will you please add this to the installation instructions?

Clean up trace implementation

Fix synchronization logic (maybe change choice of clojure side effects operators/type)
Attempt simplifications that exploit new representation of mutable traces
Try to simplify trace.clj and impose better structure on it

metaprob.tutorial.jupyter cannot be loaded in some contexts

Description

src/metaprob/tutorial/jupyter.clj requires clojupyter.misc.display. Because clojupyter.misc.display is a namespace included by the lein-jupyter plugin it is only available in Clojure processes that were started via the plugin. This means that attempting to load the metaprob.tutorial.jupyter namespace from Clojure processes started in any other way (most commonly clj or lein repl) will fail.

There are two ways this could happen:

A user not familiar with this issue attempts to require the namespace manually.
A user attempts to refresh all the Clojure namespaces using a utility like clojure.tools.namespace.repl/refresh.

Potential solutions

Option A

Move src/metaprob/tutorial outside of src and add an entry like

:source-paths ["path/to/tutorial"]

to project.clj. This means that any .jar built with Leiningen will include the tutorial files but not , and as a result anyone using Metaprob from such a .jar will still be unable to run clojure.tools.namespace.repl/refresh. Probably what we would do in this case is either add another project.clj for build configuration or use a clj-based method for building such as juxt/pack.alpha.

Option B

Declare an explicit dependency on lein-jupyter by adding

lein-jupyter {:mvn/version "0.1.16"}

to :deps in deps.edn. This means that all artifacts built by either lein or clj / clojure will include the tutorial, lein-jupyter, and its dependencies. This may be desirable, but it will increase the size of the .jar file and does place a hard dependency on libraries that Metaprob does not need to function.

Reproduction steps

Attempt to load the metaprob.tutorial.jupyter namespace either explicitly

clj -e "(require 'metaprob.tutorial.jupyter)"

or via clojure.tools.namespace.repl/refresh:

clj -Srepro \
    -Sdeps "{:deps {org.clojure/tools.namespace {:mvn/version \"0.2.11\"}}}" \
    -e "(require '[clojure.tools.namespace.repl :refer [refresh]])
        (refresh)"

Expected results

nil

if the namespace was loaded via require.

:ok

if the namespace was loaded via clojure.tools.namespace.repl/refresh.

Actual results

:error-while-loading metaprob.tutorial.jupyter
#error {
 :cause "Could not locate clojupyter/misc/display__init.class or clojupyter/misc/display.clj on classpath."
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message "java.io.FileNotFoundException: Could not locate clojupyter/misc/display__init.class or clojupyter/misc/display.clj on classpath., compiling:(metaprob/tutorial/jupyter.clj:1:1)"
   :at [clojure.lang.Compiler load "Compiler.java" 7526]}
  {:type java.io.FileNotFoundException
   :message "Could not locate clojupyter/misc/display__init.class or clojupyter/misc/display.clj on classpath."
   :at [clojure.lang.RT load "RT.java" 463]}]
 :trace …}
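The failure shown above can be detected at run time before any dependent code is loaded. The following is a minimal sketch, not one of the options proposed in this issue; `try-require` is a hypothetical helper name, and it relies only on the fact that `require` throws `java.io.FileNotFoundException` when a namespace is not on the classpath.

```clojure
;; Hypothetical helper, for illustration only.
(defn try-require
  "Attempts to load the namespace named by `ns-sym`. Returns true on success
  and false if the namespace cannot be located on the classpath."
  [ns-sym]
  (try
    (require ns-sym)
    true
    (catch java.io.FileNotFoundException _
      false)))

;; clojure.string ships with Clojure, so this returns true.
(try-require 'clojure.string)

;; Returns false unless the process was started with clojupyter on the classpath.
(try-require 'clojupyter.misc.display)
```

A guard like this only reports whether the namespace is available; it does not by itself stop `clojure.tools.namespace.repl/refresh` from trying to load a source file that requires it.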

Port to clojurescript

Talk to @zane about this?

Tail recursion

If we want to have a chance at tail recursion in metaprob, we’ll have to pass the accumulating score and output trace in as arguments, rather than relying on logic to combine scores and outputs post evaluation.

But using Clojure-on-Java is unsafe in any case, so there would be more to do in order to get tail recursion, beyond just this.

I don't think this is a priority right now.
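As a rough illustration of the accumulator-passing style described above, here is a plain-Clojure sketch. The function name `run-choices` and the `[address value log-probability]` triple format are assumptions for this example, not part of the Metaprob interpreter. The point is only that the score and output trace travel as loop arguments, so nothing is left to combine after the recursive step returns.

```clojure
;; Illustration only: a toy loop, not the Metaprob interpreter.
(defn run-choices
  "Walks a sequence of [address value log-probability] triples, threading the
  output trace and the accumulated score through `recur`, so the loop runs in
  constant stack space."
  [choices]
  (loop [remaining (seq choices)
         trace {}
         score 0.0]
    (if-let [[addr value logp] (first remaining)]
      (recur (next remaining)
             (assoc trace addr {:value value})
             (+ score (double logp)))
      {:trace trace :score score})))

;; A long sequence of choices does not grow the stack.
(def result
  (run-choices (for [i (range 100000)] [i true (Math/log 0.5)])))

(count (:trace result))
```

Clojure's `recur` gives this guarantee only for self-calls in tail position; it does not provide general tail-call elimination across different functions.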

Variational inference using reverse-mode AD

VKM: "variational inference on a toy Bayesian linear regression example, with a naive mean field [variational] approximation, using reverse-mode AD [on a subset of Metaprob]" (see https://github.com/probcomp/metaprob-clojure/milestones/5 and https://github.com/probcomp/metaprob-clojure/milestone/4 )

Output traces should be outputs, not inputs

Modify the interpreter so that each evaluation returns an output trace, and no output trace is passed in as an argument.

In the process, we should be able to get rid of 'locatives', which were kind of a kludge all along.

In conjunction with #20, this change should make the interpreter compositional, and more in line both with Ben and Eric's semantics work, and Marco's work in Gen.

Ensure proper function of infer-apply when interpreted by infer-apply

Perhaps it just works already, but it would be a good idea to test it, at least.

A good test would be to do the rejection sampling example, maybe just one sample if it's really slow.

Then some kind of sanity check, like eyeballing the resulting trace.

trace-copy should share substructure when possible

For trace-copy, now that it does a deep copy, it might be a good idea to get it to notice fully immutable substructures, and share them between the original and the copy. This would have two benefits:

Could save a lot of space, in some situations.
Could improve run time of recursions that are coded to do eq checks for identity, in cases when there are such identities to be detected.

Declaring probprogs to be opaque to prevent interpretation by any metacircular interpreter

If a metacircular interpreter interprets code that contains a call to metacircular interpreter #2 (a different one, or the same one), it will simply plow ahead and interpret the source code for MCI #2, creating three levels of interpretation (the JVM interpreting MCI #1 interpreting MCI #2).

I thought this was a problem from the biased coin example, but now I don't know. When, say, trace_choices calls interpret, it simply calls it; it won't interpret it. interpret can be interpreted by the JVM, so that doesn't initiate another level of interpretation.

If this were a problem, then it could be fixed by introducing a source-code barrier, call it opaque, that would force JVM interpretation and inhibit source code interpretation, whenever there were a choice. opaque could be used either on the defining side (so all uses of the MCI would automatically drop to the JVM) or on the consuming side (so each use of the MCI could decide which kind of interpretation it would want).

The question isn't raised by the python version of metaprob because it always uses metacirc-stub.vnts, effectively defining all the MCI names as opaque. That is, the MCIs are never used for anything significant. In the clojure version, however, the MCIs (in their compiled-to-JVM versions) are doing serious work, because there is no equivalent of the python propose module.

opaque could have other uses, e.g. speeding up deterministic code by forcing it to run under the JVM instead of under an MCI.

Not a critical issue, I think, but not totally sure.

Should we make master a protected branch?

GitHub has a feature called "protected branches". They describe it as follows:

Protected branches ensure that collaborators on your repository cannot make irrevocable changes to branches. Enabling protected branches also allows you to enable other optional checks and requirements, like required status checks and required reviews.

Repository owners and people with admin permissions for a repository can enforce certain workflows or requirements, before a collaborator can merge a branch in your repository by creating protected branch rules.

— About protected branches - User Documentation

Here is the full list of available branch protection settings.

Environments as traces?

VKM expressed a desire to have the trace as the only compound data structure, i.e. to implement everything else (lists, addresses, etc) as traces.

In python-metaprob, this is the case for lists and addresses, but it is not true of environments. Environments have a primitive implementation, the VentureEnvironment class.

This question affects how the primitives make_env, match_bind, and env_lookup are to be written in clojure.

The tradeoff is that if these environments are traces, rather than something more clojure-idiomatic, then the very inner loop of every metacircular interpreter becomes slower. (I have not measured the difference.) Also, as usual when exposing the implementation of an abstraction, reasoning and transparent changes on either side of the abstraction become difficult or impossible.

(Environments are provided as primitives (from python), and are used in the metacircular interpreters. The language has no environment reification or reflection capabilities, so environments as values have no bearing on how any native metaprob interpreter or compiler might implement its environments. For example, when translated to clojure, environments can be implemented as clojure environments, without breaking anything.)

Installation of Clojure command line tools should be documented

Per @jar398's comments in #53 we should document how to install the Clojure command line tools clj and clojure, probably by linking to the official Clojure website.

Make a figure summarizing (trace ...) syntax

Per VKM on 4/10.

Remove dependence on Java

https://twitter.com/bobfrankston/status/1027210690443010049?s=11

Looks like Oracle will be charging for updates going forward, which I assume includes security updates. This would mean the end of Java as a 'free as in beer' platform, and we are currently depending on that, assuming we want ordinary people to use metaprob eventually.

Maybe there will still be OpenJDK, but I don't know how solid that effort is.

Fortunately there are other Clojure implementations, right?

Remove 'locatives' from trace.clj

They are no longer needed, if PR #25 is merged.

N-ary procedures

Need to implement & in match-bind!

Switch to clojure-native syntax for variable binding

This issue follows on from #8 (comment) . This also relates to #52.

Currently clojure-metaprob uses syntax for variable binding inherited from python-metaprob, specifically block and define (well, define was written = in python-metaprob and block was written {....}, but you get the idea). In informal conversation it has been suggested quite vaguely and without analysis that Metaprob should use clojure syntax for binding, i.e. let and letfn, and add support for do which would just be block without the option of define.

In my opinion what is needed is a moderately detailed proposal for this: what changes would have to be done to the two implementations (syntax.clj and compositional.clj), exactly? How hard is it to implement them? What happens to the parse trees (traces, from-clojure)? What will their traces look like? What should we do about recursive functions, which are impossible to support properly in clojure (as traces, and without side effects)? Are we really ready to completely jettison python-clojure compatibility (to_clojure.clj), something we haven't completely done yet?

Then there are user-facing questions: Do these changes make the language harder for beginners to understand? Do they make the interpreter harder to read?

I don't recommend proceeding with implementation until we have answers for most of these questions.

Make interpreter source code more pedagogical

“Make the interpreter more pedagogical, with comments for each clause” (VKM 3/30)

I made a start on this. I should probably review and do some more. Then VKM should review.

Remove with-handler and &this from system

Rewrite the prelude and other modules, as needed, so that they do not use with-address or &this (I think this is mainly just the definition of map) - use inf instead
Remove with-address and &this from interpreter and documentation

Clarify current directory setting in jupyter setup

File src/metaprob/tutorial/README.md is unclear about what directory to do these steps in. Since 2, 3, and 4 all talk about the lein command, they really ought to happen in the same directory. And because of step 4, that directory has to be the root of the clone.

So, we need to say that before step 2.

It would be nice if the setup were compatible with the 'normal' metaprob install, which currently moves lein to bin/lein, but we could change that...

Generate illustrations for use in LaTeX, via graphviz, SVG, or the like

A facility for generating graphviz illustrations of traces, similar to what python metaprob has.

Tests for 'make exa' and 'make histograms'

Create an examples/main_test.clj file that invokes the -main function.
Suppress a lot of the dribble that is currently generated. Maybe via a flag, since sometimes you do want to see the dribble.
Check to make sure that the .samples and .commands files get written
Maybe some kind of sanity check on the .samples files, like make sure they have enough rows, are not all zero, and so on.
Maybe do some stats on the samples, to make sure the results are generally sensible (there is code for computing averages and maxima there already I think).
Maybe generate the png files, to ensure that the gnuplot script can run to completion successfully?

use of `which` in Makefile does not work on Ubuntu

On Ubuntu 16.04, which does not seem to take -s as an argument.

$ which -s java
Illegal option -s
Usage: /usr/bin/which [-a] args

The relevant line in the makefile is here.

Controlling trace addresses, and finding locations in traces

These are two closely related issues so I will put them together. This is a design issue, not yet ready for implementation work.

Allow control over the structure of a trace, i.e. over the paths to the values in the trace.

The requirements for this feature come from four places: (a) map, (b) mem, (c) making user code that manipulates traces less sensitive to changes in the source code, (d) possible future type systems for traces.

The original design, from python-metaprob, was a feature called with-address (working in conjunction with this). with-address worked in the old side-effecty interpreter, but does not currently work in the compositional interpreter.

Make it easier for users to find things in traces.

Values might get different paths as the source code changes, but this would not matter so much if values were found (and changed) via some kind of search instead of by manually tracking down the path and putting the path in the code. Such a feature would be reminiscent of jquery in Javascript.

If this worked well enough, we might not have to worry as much about sub-issue 1 above.

One requirement comes from the tutorial, where @alex-lew has written the following:

(define [sex-adr height-adr salary-adr] (addresses-of tr))

The intent is clear - one wants to give descriptive names to paths, so that the places where they're used are easier to understand - but the execution is problematic, in that addresses-of doesn't provide any guarantee of the order in which the paths occur in the result list.
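One way to picture the search-based lookup suggested in this issue is the sketch below. It assumes, purely for illustration, that a trace is a nested Clojure map whose nodes may carry a `:value` key; `addresses-where` is a hypothetical function, not an existing Metaprob primitive. Because it returns a set of paths selected by a predicate, the caller never depends on the order in which paths are listed.

```clojure
;; Illustration only: assumes traces are nested maps with optional :value keys.
(defn addresses-where
  "Returns the set of paths in the nested-map trace `tr` whose :value
  satisfies `pred`."
  [tr pred]
  (letfn [(walk [node path]
            (concat
              ;; Report this node's path if it holds a matching value.
              (when (and (contains? node :value) (pred (:value node)))
                [path])
              ;; Recurse into keyed subtraces, extending the path.
              (mapcat (fn [[k sub]]
                        (when (map? sub)
                          (walk sub (conj path k))))
                      (dissoc node :value))))]
    (set (walk tr []))))

(def example-trace
  {"sex"    {:value :female}
   "height" {:value 1.7}
   "job"    {"salary" {:value 50000}}})

;; Find every address holding a numeric value, regardless of ordering.
(addresses-where example-trace number?)
```

Binding descriptive names to fixed, known paths (for example `(def height-adr ["height"])`) is another way to keep user code readable without relying on the order of a result list.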

Defensive programming around lazy maps is unnecessary

Follow-up work from this comment on #46.

There are a few places, for example in state.clj, where we do some defensive programming around laziness in maps. e.g. for map m

(doseq [entry m] true)

This is unnecessary as the only place laziness shows up in Clojure is in lazy sequences. We should remove those guards.
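The claim can be checked with a small plain-Clojure experiment; the function name `laziness-demo` is made up for this example. A lazy sequence defers its work until it is consumed, but once its elements are poured into a map, the map holds fully computed entries and there is nothing left to force.

```clojure
(defn laziness-demo
  "Counts how many times the mapping function has run before and after a lazy
  sequence of key/value pairs is poured into a map."
  []
  (let [calls      (atom 0)
        lazy-pairs (map (fn [i] (swap! calls inc) [i (* i i)]) (range 5))
        before     @calls              ; the lazy seq has not been consumed yet
        m          (into {} lazy-pairs) ; building the map realizes every pair
        after      @calls
        _          (get m 3)            ; lookups do no further work
        after-get  @calls]
    {:before before :after after :after-get after-get :m m}))

(laziness-demo)
```

The counter only changes while the map is being built, which is why a guard such as `(doseq [entry m] true)` on an existing map has nothing to realize.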

(trace ...) should allow paths

E.g. (trace '("a" "b") 7 '("a" "c") 9)

This kind of thing comes up in @alex-lew's tutorial. It was implemented in python-metaprob, but never implemented in clojure-metaprob, so I will flag this as a bug instead of an enhancement. It might be interesting to look at how it was implemented in python-metaprob, although the easy implementation is just to use trace-merge.
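A merge-based approach along the lines suggested above might look like the following sketch. It assumes traces are nested Clojure maps with values stored under `:value`, and the names `deep-merge`, `single-path-trace`, and `path-trace` are hypothetical stand-ins rather than the real `trace` or `trace-merge` implementations.

```clojure
;; Illustration only: assumes traces are nested maps with values under :value.
(defn deep-merge
  "Recursively merges two nested maps; on a non-map conflict, `b` wins."
  [a b]
  (if (and (map? a) (map? b))
    (merge-with deep-merge a b)
    b))

(defn single-path-trace
  "Builds a trace containing just one value `v` at `path`."
  [path v]
  (assoc-in {} (concat path [:value]) v))

(defn path-trace
  "Accepts alternating paths and values, builds one single-path trace per
  pair, and merges them into a single trace."
  [& paths-and-values]
  (reduce deep-merge
          {}
          (map (fn [[path v]] (single-path-trace path v))
               (partition 2 paths-and-values))))

(path-trace '("a" "b") 7 '("a" "c") 9)
;; => {"a" {"b" {:value 7}, "c" {:value 9}}}
```

Note that this toy `deep-merge` would also merge two stored values if both happened to be maps; a real implementation would need to treat `:value` entries as opaque.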

HMM forward filtering and backward sampling

VKM: "HMM forward filtering + backward sampling, on a real-world example" https://github.com/probcomp/metaprob-clojure/milestones/5

Rename primitives for clojure compatibility

Most primitives (see builtin.clj) got their names from the Python version of metaprob. I was hesitant to change them lest we lose python compatibility. But now, having dropped that requirement, we could rename them.

For example, the arithmetic primitives are add, sub, mul, div, le, gte and so on. They could be + - * / < >= and so on.

A few other clojure inconsistencies are: length/count, tuple/vector.

More generally, there has been no overall review of the language design now that we are dropping the Python compatibility requirement.

project.clj and deps.edn should be self-describing

Per @jar398's comments on #53, project.clj and deps.edn should be self-describing. Here's what he wrote:

The only change I would make (besides adding installation instructions for obtaining clj and clojure commands) would be to make the project.clj and deps.edn files more self-describing, and maybe point to one another. E.g. project.clj could say something to the effect "to add a new dependency please update deps.edn". I would try not to assume familiarity with the tools, where a simple comment or hyperlink could direct the reader to the right place.
— comment

Redesign trace address assignment and trace waiving

Work in progress. Current: with-address, &this, and opaque. Perhaps these could be cleaned up, generalized, and maybe combined.

Get rid of unnecessary side effects in interpreter (subscore)

The subscore is computed iteratively and is updated by side effect. The code could be written applicatively using define and tail recursion and/or appropriate clojure primitives, anticipating possible parallel evaluation in the future.

It's important to figure out how to do this cleanly, since the code is meant to be pedagogical (#5).

Make closures created by the interpreter be callable from clojure

((block (define [fun _ _]
          (infer :procedure (gen [x] (gen [y] (+ x y))) 
                 :inputs [3] 
                 :output-trace? true))
        fun)
 4)
ClassCastException clojure.lang.Atom cannot be cast to clojure.lang.IFn  metaprob.examples.all/eval9041 (form-init4440039299882098981.clj:113)

Here the inner gen procedure is created by the metacircular interpreter. Currently such procedures are not callable from clojure; formerly this was impossible because there was no way to maneuver the lexical environment into the clojure copy of the code. But now we have the environment and it should be possible to make this work.

What's going on with metaprob.examples.all-test?

Right now all it contains is:

(defn foo [] 'hello)

I'm guessing that the idea was that it would at some point replace metaprob.examples.main?

Set up continuous integration

The only CI system I know about is travis-ci. Perhaps there are others. Here's what would be needed for travis:

Agree to the travis-ci.org terms of service. This would be a contract between travis-ci and MIT, and I don't have authority to agree to these on behalf of MIT. So we'd have to find an authorized officer of MIT to do this, or have someone authorize me.
Make the repository public, or else pay money for the service. I don't know whether VKM is willing to do this.
Make the repository contents "open source" whatever that means.

Consider organizing trace primitives into parallel mutable/immutable suites

Introduce separate functional/side-effecty operators e.g. trace-set vs. trace-set!

trace-set! could be the one that mutates a trace, while trace-set is functional (creates a new trace with added or changed value).
trace-set-subtrace and trace-clear would have similar partners.
This would follow clojure’s naming convention for ‘transient collections’
Should pair, range, map, concat, etc. return immutable lists? Or should they have parallel immutable variants?
I’ve already changed range to always return an immutable list; considering doing so for map and some others.
Clojure sequences are immutable, and Metaprob immutable sequence-looking traces are represented as clojure sequences. Therefore, increased use of immutability improves both clojure compatibility and performance.
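The parallel suites proposed above could be sketched as follows. The names `trace-set` and `trace-set!` come from this issue, but the bodies are assumptions: this sketch represents an immutable trace as a nested Clojure map and a mutable trace as an atom holding such a map, which may not match the actual representation.

```clojure
;; Illustration only: immutable trace = nested map, mutable trace = atom of one.
(defn trace-set
  "Functional version: returns a new trace with value `v` stored at `path`,
  leaving `tr` unchanged."
  [tr path v]
  (assoc-in tr (concat path [:value]) v))

(defn trace-set!
  "Mutating version: updates the trace held in `tr-atom` in place and returns
  the atom. Built on top of the functional version."
  [tr-atom path v]
  (swap! tr-atom trace-set path v)
  tr-atom)

(def immutable-tr {"a" {:value 1}})
(def updated-tr (trace-set immutable-tr ["b" "c"] 2))  ; immutable-tr is untouched

(def mutable-tr (atom {"a" {:value 1}}))
(trace-set! mutable-tr ["b" "c"] 2)                    ; mutable-tr now holds the update
```

Defining the mutating operator in terms of the functional one keeps the two suites consistent by construction, at the cost of routing every mutation through `swap!`.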

Need design & then review.

Alternative design: all basic traces are mutable, but we also allow the use of clojure references (Vars or whatever) as traces. So we have one suite of trace primitives, but they also work on references. (I am waving my hands. The devil is in the details.)

Alternative design: all traces are mutable.

Implement mem

Reorganize source file tree

Goal: make the structure immediately clear(er) to new people arriving at the source code.

Put all clojure (non-metaprob) source (trace, sequence, builtin_impl) in one subdirectory of src/metaprob, say kernel or core
- Rename builtin_impl.clj and/or split it up into smaller files
- Not sure where syntax.clj goes, kernel I guess?
Put all metaprob source (prelude, infer, distributions) in another subdirectory, say metaprob or prelude, or maybe leave it where it is
Rename builtin.clj to kernel.clj maybe, leave it in the src/metaprob directory?

Need explanation of when values are stored in trace, and justification for policy

For (block 1 2) there is no output trace.

If (define f (gen [x y] 7)), then for (f 1 2) there is no output trace.

For (+ 1 2) there is an output trace that tells you that a function called + returned the value 3; but there are no subtraces for 1 or 2.

For (+ (uniform 0 1) 2), the trace says uniform returned whatever (say u), and + returned u+2. This means that in MH both of these calls are possible modulation points - meaning that we might have uniform returning u but (+ (uniform 0 1) 2) return v+2 where v not= u. This seems weird: the program is explicitly specifying a constraint (the value of the outer expression is 2 more than the value of the inner expression), but the program execution is failing to obey it.

It would not be too hard to document the rule - which is pretty simple, only the values returned by primitive procedure calls get recorded (also inf procedures can do anything they like including record a value) - but an explanation of what the interpreter does is not going to be very satisfying without some rationale for the policy. I don't understand that rationale.

In interaction guide, (run-tests 'metaprob.trace-test) fails

Need a require, or a refresh, or something in order to ensure that the test module loads before attempting to run tests. Fix interaction.md. The file examples.md is in better shape and it's probably just a matter of copying the correct incantations from examples.md to interaction.md. (Another option is to reorganize.)

Prettyprinter should be smarter with 'locatives'

pprint is printing {:value Atomxxx ...} or something similar for the output trace for the coin flips example. Should give useful output instead.

Best way to fix: make locatives go away completely (#25).

Cache parse tree traces in some top-level structure

This is an efficiency concern only, I think:

Currently the traces that play the role of parse trees for the purpose of the meta-circular interpreter are computed every time the corresponding gen-expression is evaluated. For example:

(define fun (gen [x] (gen [y] (add x y))))
(fun 3)
(fun 17)

Each call to fun creates a new copy of the parse tree.

Now that we have immutable traces, we can make parse trees immutable, which allows them to occur as constant (quoted) structure in the clojure code (I think?). I think we just need to move the parse tree generation from run time to compile time.

Public release

VKM says: “Public release with a README file that explains the examples and core language briefly” (after examples are debugged) (not a tutorial)

I think the requirements and process need to be spelled out in a bit more detail.

Better trace datatype documentation

Make nature of traces clearer (optional value + optional keyed subtraces)
Provide examples for trace-get, trace-subtrace, and other primitives

Thanks @Schaechtle

Semantics of define differs between the clojure macros and the metacircular interpreter

Here is a bit of technical debt to pay off some time. I don't think this is likely to bite anyone soon, but eventually it will, so I wanted to record the issue.

The issue is that the clojure-metaprob block macro (with its define keyword) only approximates what python-metaprob did, using the primitives available in clojure (let and letfn). The python-metaprob equivalent was internal definitions written using =, which augments the lexical environment by mutation. But the meta-circular interpreter is faithful to python-metaprob. This leads to differences in behavior, for example:

> (define loser 
    (gen []
      (define bar 13)
      (define foo (gen [] bar))
      (define bar 17)
      (foo)))
#'metaprob.examples.all/loser
> (loser)
13
> (infer :procedure loser :output-trace? true)
[17 {} 0]

The block/define macro is pretty hairy, and creates its illusion of environment modification by segmenting the sequence of definitions into let bindings and letfn bindings - very different from creating an environment up front and mutating it with each succeeding define.
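The difference between the two behaviors can be reproduced in plain Clojure without Metaprob. In the sketch below, `loser-let` mimics a translation into `let` bindings, where the closure captures the first binding of `bar`, and `loser-env` mimics an environment that is created up front and mutated by each definition. Both function names and the atom-based environment are assumptions for illustration only.

```clojure
;; Illustration only: two plain-Clojure readings of the `loser` example.

(defn loser-let
  "Sequential let bindings: `foo` closes over the first `bar`; the second
  binding of `bar` merely shadows it for later code."
  []
  (let [bar 13
        foo (fn [] bar)
        bar 17]
    (foo)))

(defn loser-env
  "One environment mutated by each definition: `foo` looks `bar` up at call
  time, so it sees the most recent definition."
  []
  (let [env (atom {})]
    (swap! env assoc :bar 13)
    (swap! env assoc :foo (fn [] (:bar @env)))
    (swap! env assoc :bar 17)
    ((:foo @env))))

(loser-let) ; => 13
(loser-env) ; => 17
```

These match the two results shown above: 13 from the macro-based definition and 17 from the metacircular interpreter.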

I'm not sure what to recommend regarding a fix. Implementing the python-metaprob semantics in clojure would be extremely painful and would create implementation and documentation complexity far in excess of its utility. But implementing the block macro's semantics in the metacircular interpreter will also be difficult (albeit much less so). A third alternative would be to abandon block/define in favor of clojure let and letfn, with changes to the meta-circular interpreter to implement the clojure syntax.