Code Monkey home page Code Monkey logo

ocamlzar's Introduction

Zar for OCaml: formally verified sampling from discrete probability distributions.

See the related paper (appeared at PLDI'23), Zar Coq development, blog post and opam package.

Why use Zar?

Probabilistic Choice

A basic operation in randomized algorithms is probabilistic choice: for some p ∈ [0,1], execute action a1 with probability p or action a2 with probability 1-p (i.e., flip a biased coin to determine the path of execution). A common method for performing probabilistic choice is as follows:

if Random.float 1.0 < p then a1 else a2

where p is a float in the range [0,1] and Random.float 1.0 draws a uniform random float from the range [0,1). While good enough for many applications, this approach is not always correct due to floating point rounding error. We can only expect a1 to be executed with probability p + ϵ for some small error term ϵ, which technically invalidates any correctness guarantees of our overall system that depend on the correctness of its probabilistic choices.

Zar provides an alternative that is formally proved (in Coq) to execute a1 with probability p (where num and denom are integers such that p = num / denom):

let coin = Zar.coin num denom in (* Build coin sampler. *)
if coin#gen () then a1 else a2

The expression Zar.coin num denom builds a sampler object that flips a coin with bias p = num / denom. Internally, the coin is constructed as a stream transformer of type bool Seq.t -> bool Seq.t (see OCaml's lazy sequence library) that transforms an input source of fair coin flips (i.e., uniformly distributed random bits) into an output stream of biased coin flips. The coin transformer is applied to a default source of random bits based on the OCaml Random module, and then wrapped in a stateful sampler object that provides a simplified interface for consuming elements from the stream. The following code is equivalent:

let bit_stream = Seq.forever Random.bool |> Seq.memoize in
let coin_stream = bit_stream |> Zar.coin_transformer num denom in
let coin = new Zar.sampler coin_stream in
if coin#gen () then a1 else a2

You're free to supply your own stream of random bits instead, but remember that the coin will have the correct output distribution only when the input stream is uniformly distributed. We also recommend ensuring that the input stream is persistent (see the Seq module documentation for discussion of persistent vs. ephemeral sequences).

Uniform Sampling

Another common operation is to randomly draw from a finite collection of values with equal (uniform) probability of each. An old trick for drawing an integer uniformly from the range [0, n) is to generate a random integer from [0, RAND_MAX] and take the modulus wrt. n:

k = rand() % n // Assign to k a random integer from [0,n).
// Do something with k.

but this method suffers from modulo bias when n is not a power of 2, causing some values to occur with higher probability than others (see, e.g., this article for more information on modulo bias). Zar provides a uniform sampler that is guaranteed for any integer 0 < n to generate samples from the range [0,n) with probability 1/n each:

let die = Zar.die n in
let k = die#gen () in (* Draw k uniformly from [0,n). *)
(* Do something with k. *)

Although the OCaml function Random.int is ostensibly free from modulo bias, our implementation guarantees it by a formal proof of correctness in Coq.

Finite Distributions

The coin and die samplers are special cases of a more general construction for finite probability distributions that we provide here. Given a list of nonnegative integer weights weights such that 0 < weightsᵢ for some i (at least one of the weights is nonzero), we can draw an integer k from the range [0, |weights|) with probability weightsₖ / ∑ⱼweightsⱼ (the corresponding weight of k normalized by the sum of all weights):

let findist = Zar.findist weights in
let k = findist#gen () in
(* Do something with k. *)

For example, Zar.findist [1; 3; 2] builds a sampler that draws integers from the set {0, 1, 2} with Pr(0) = 1/6, Pr(1) = 3/6 = 1/2, and Pr(2) = 2/6 = 1/3.

Trusted Computing Base

The samplers provided by Zar have been implemented and verified in Coq and extracted to OCaml for execution. Validity of the correctness proofs is thus dependent on the correctness of Coq's extraction mechanism, the OCaml compiler and runtime, and a small amount of OCaml shim code (viewable here and thoroughly tested with QCheck here),

Proofs of Correctness

The samplers are implemented as choice-fix (CF) trees (an intermediate representation used in the Zar compiler) and compiled to interaction trees that implement them via reduction to sequences of fair coin flips. See Section 3 of the paper for details and the file ocamlzar.v for their implementations and proofs of correctness.

Correctness is two-fold. For biased coin with bias p, we prove:

  • coin_itree_correct: the probability of producing true according to the formal probabilistic semantics of the constructed interaction tree is equal to p, and

  • coin_true_converges: when the source of random bits is uniformly distributed, the proportion of true samples generated by the coin converges to p as the number of samples goes to +∞.

The equidistribution result is dependent on uniform distribution of the Boolean values generated by OCaml's Random.bool function. See the paper for a more detailed explanation.

Similarly, Theorem die_itree_correct proves semantic correctness of the n-sided die, and Corollary die_eq_n_converges that for any m < n the proportion of samples equal to m converges to 1 / n.

Theorem findist_itree_correct proves semantic correctness of findist samplers, and Corollary findist_eq_n_converges that for any weight vector weights and integer 0 <= i < |weights|, the proportion of samples equal to i converges to weightsᵢ / ∑ⱼweightsⱼ.

Usage

See zar.mli for the top-level interface.

Zar.bits () produces a stream of uniformly distributed random bits.

Zar.self_init () initializes the PRNG for Zar.bits (currently just calls Random.self_init).

Zar.init n initializes the PRNG for Zar.bits with a given seed.

Biased Coin

Zar.coin_transformer num denom builds a stream transformer that when applied to a stream of uniformly distributed random bits generates bool samples with Pr(True) = num/denom. Requires 0 <= num < denom and 0 < denom.

Zar.coin_stream num denom composes Zar.coin_transformer num denom with the default source of uniformly distributed random bits.

Zar.coin num denom builds a sampler object over the stream produced by Zar.coin_stream num denom.

N-sided Die

Zar.die_transformer n builds a stream transformer that when applied to a stream of uniformly distributed random bits generates int samples with Pr(m) = 1/n for integer m where 0 <= m < n .

Zar.die_stream n composes Zar.die_transformer n with the default source of uniformly distributed random bits.

Zar.die n builds a sampler object over the stream produced by Zar.die_stream n.

Finite Distribution

Zar.findist_transformer weights builds a stream transformer from list of nonnegative integer weights weights (where 0 < weightsᵢ for some i) that when applied to a stream of uniformly distributed random bits generates int samples with Pr(i) = weightsᵢ / ∑ⱼweightsⱼ for integer 0 <= i < |weights|.

Zar.findist_stream weights composes Zar.findist_transformer weights with the default source of uniformly distributed random bits.

Zar.findist weights builds a sampler object over the stream produced by Zar.findist_stream weights.

Performance and Limitations

The samplers here are optimized for sampling performance at the expense of build time. Thus, this library may not be ideal if your use case involves frequent rebuilding due to changes in the samplers' parameters (e.g., the coin's bias or the number of sides of the die).

The size of the in-memory representation of a coin with bias p = num / denom is proportional to denom (after bringing the fraction to reduced form). The size of an n-sided die is proportional to n, and the size of a finite distribution to the sum of its weights. The formal results we provide are partial in the sense that they only apply to samplers that execute without running out of memory. I.e., we do not provide any guarantees against stack overflow or out-of-memory errors when, e.g., n is too large.

Acknowledgments

Thanks to mooreryan for comments and code contributions.

ocamlzar's People

Contributors

bagnalla avatar mooreryan avatar

Stargazers

Seb Mondet avatar Sora Morimoto avatar Thomas Gazagnaire avatar Mika Illouz avatar  avatar

Watchers

 avatar  avatar

Forkers

mooreryan

ocamlzar's Issues

Potential bugs with random seeding

Hi, I have run into a couple of potential bugs (or, they could definitely be my misunderstanding of what should be happening!)

  • For Zar.take, the first value (end of the list) is always the same number (because it is always taking the first of the stream). This makes sense, but maybe not the most useful for sampling...eg you would need to do something like die |> Zar.drop 1 |> Zar.take 10 or something to get a random sample.
  • The random sampling seems to not respect the random seeding in a way that I would expect. If you look at the example program below, ...
    • test_it_1 function generate the first 14 values differently and then all for int lists will end with the same set of numbers.
    • test_it_2 is also a bit weird in which gives two separate streams
    • Given that the source of randomness looks to be this line: let rec bits () : bool stream = SCons (Random.bool (), bits), which uses Random module, I would expect re-initializing the Random module with the same seed will produce the same stream of random values, but that is not the case.
let sexp_of_int_list ints = 
  let open Sexplib.Sexp in
  List (List.map (fun i -> Atom (Int.to_string i)) ints)

let print_int_list ints = 
  ints |> sexp_of_int_list |> Sexplib.Sexp.to_string_mach |> print_endline

let test_it_1 () = 
  print_endline "test_it_1";
  Random.init 1234321; 
  let die = Zar.die 100 in
  die |> Zar.take 75 |> print_int_list;
  Random.init 1234321; 
  die |> Zar.take 75 |> print_int_list;
  Random.init 1234321; 
  let die = Zar.die 100 in
  die |> Zar.take 75 |> print_int_list;
  Random.init 1234321; 
  die |> Zar.take 75 |> print_int_list;
  ()

let test_it_2 () = 
  print_endline "test_it_2";
  Random.init 44553; 
  let die = Zar.die 100 in
  die |> Zar.take 75 |> print_int_list;
  Random.init 44553; 
  die |> Zar.take 75 |> print_int_list;
  Random.init 44553; 
  let die = Zar.die 100 in
  die |> Zar.take 75 |> print_int_list;
  Random.init 44553; 
  die |> Zar.take 75 |> print_int_list;
  ()


let () = 
  test_it_1 ();
  test_it_2 ();
  ()

Output

test_it_1                        
(68 44 70 17 64 24 21 65 51 50 18 98 5 62 36 80 22 70 40 31 62 4 38 57 10 32 6 65 22 0 92 48 25 28 74 82 97 46 36 16 29 10 34 94 48 20 19 56 96 78 40 11 0 76 50 37 46 36 14 13 40 20 8 39 45 79 85 3 15 9 1 83 35 63 73)
(68 44 70 17 64 24 21 65 51 50 18 98 5 62 36 80 22 70 40 31 62 4 38 57 10 32 6 65 22 0 92 48 25 28 74 82 97 46 36 16 29 10 34 94 48 20 19 56 96 78 40 11 0 76 50 37 46 36 14 13 40 54 69 72 89 92 51 57 54 50 91 67 81 86 73)
(68 44 70 17 64 24 21 65 51 50 18 98 5 62 36 80 22 70 40 31 62 4 38 57 10 32 6 65 22 0 92 48 25 28 74 82 97 46 36 16 29 10 34 94 48 20 19 56 96 78 40 11 0 76 50 37 46 36 14 13 40 20 8 39 45 79 85 3 15 9 1 83 35 63 73)
(68 44 70 17 64 24 21 65 51 50 18 98 5 62 36 80 22 70 40 31 62 4 38 57 10 32 6 65 22 0 92 48 25 28 74 82 97 46 36 16 29 10 34 94 48 20 19 56 96 78 40 11 0 76 50 37 46 36 14 13 40 54 69 72 89 92 51 57 54 50 91 67 81 86 73)
test_it_2
(84 73 73 20 86 39 20 28 51 52 38 23 30 88 33 4 50 88 6 40 33 76 34 29 84 64 8 38 45 9 57 19 12 83 95 25 78 58 21 10 8 18 44 62 41 83 37 23 4 77 27 79 15 50 54 5 74 7 89 21 15 51 93 97 90 7 30 57 69 61 55 54 69 24 11)
(20 86 39 20 28 51 52 38 23 30 88 33 4 50 88 6 40 33 76 34 29 84 64 8 48 40 57 34 5 37 68 95 2 78 56 17 71 48 44 64 25 53 92 1 28 46 34 63 66 86 24 8 39 68 16 38 94 60 57 11 72 7 67 97 44 18 25 28 90 56 24 55 37 29 11)
(84 73 73 20 86 39 20 28 51 52 38 23 30 88 33 4 50 88 6 40 33 76 34 29 84 64 8 38 45 9 57 19 12 83 95 25 78 58 21 10 8 18 44 62 41 83 37 23 4 77 27 79 15 50 54 5 74 7 89 21 15 51 93 97 90 7 30 57 69 61 55 54 69 24 11)
(20 86 39 20 28 51 52 38 23 30 88 33 4 50 88 6 40 33 76 34 29 84 64 8 48 40 57 34 5 37 68 95 2 78 56 17 71 48 44 64 25 53 92 1 28 46 34 63 66 86 24 8 39 68 16 38 94 60 57 11 72 7 67 97 44 18 25 28 90 56 24 55 37 29 11)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.