juliamath / sobol.jl Goto Github PK

generation of Sobol low-discrepancy sequence (LDS) for the Julia language

License: Other

Julia 100.00%

sobol-sequence math julia monte-carlo low-discrepancy-sequence quasi-monte-carlo

sobol.jl's Introduction

The Sobol module for Julia

This module provides a free Julia-language Sobol low-discrepancy-sequence (LDS) implementation. This generates "quasi-random" sequences of points in N dimensions which are equally distributed over an N-dimensional hypercube.

The advantage of an LDS over truly random (or pseudo-random) numbers is that an LDS (which is not random) tends to be more evenly distributed for finite numbers of points. This is used in quasi-Monte Carlo methods in order to accelerate convergence compared to traditional Monte Carlo methods employing random sequences.

It can be installed using the Julia package manager via Pkg.add("Sobol").

Algorithm

This is an independent implementation, originally by Steven G. Johnson, of the algorithm for generation of Sobol sequences in up to 21201 dimensions described in:

P. Bratley and B. L. Fox, Algorithm 659: Implementing Sobol's quasirandom sequence generator, ACM Trans. Math. Soft. 14 (1), pp. 88-100 (1988), doi:10.1145/42288.214372.
S. Joe and F. Y. Kuo, Remark on algorithm 659: Implementing Sobol's quasirandom sequence generator, ACM Trans. Math. Soft 29 (1), 49-57 (2003), doi:10.1145/641876.641879.

Originally implemented in C in 2007 as part of the NLopt library for nonlinear optimization, the code was subsequently converted by Ken-B into pure Julia with roughly the same performance.

It is important to emphasize that SGJ's implementation was based on the mathematical description of the algorithms only, and was done without reference to the Fortran code from the Trans. Math. Soft. (TOMS) papers. The reason is that TOMS code is not free/open-source software (it falls under restrictive ACM copyright terms). (SGJ did re-use a table of primitive polynomials and coefficients from the TOMS code, but since this is merely a tabulation of mathematical facts it is not copyrightable.) SGJ's implementation in NLopt, along with this Julia translation, is free/open-source software under the MIT ("expat") license.

Direction numbers used were derived from the file http://web.maths.unsw.edu.au/~fkuo/sobol/new-joe-kuo-6.21201

Technically, we implement a 32-bit Sobol sequence. After 2³²-1 points, the sequence terminates, and subsequently our implementation returns pseudo-random numbers generated by the Mersenne Twister algorithm. In practical applications, however, this point is rarely reached.

Usage

To initialize a Sobol sequence s in N dimensions (0 < N < 21201), use the SobolSeq constructor:

using Sobol
s = SobolSeq(N)

Then

x = next!(s)

returns the next point (a Vector{Float64}) in the sequence; each point lies in the hypercube [0,1]^N. You can also compute the next point in-place with

next!(s, x)

where x should be a Vector of length N of some floating-point type (e.g. Float64, Float32, or BigFloat).

You can also use a SobolSeq as an iterator in Julia:

for x in SobolSeq(N)
   ...
end

Note, however, that the loop will never terminate unless you explicitly call break (or similar) in the loop body at some point of your choosing.

We also provide a different SobolSeq constructor to provide an N-dimensional Sobol sequence rescaled to an arbitrary hypercube:

s = SobolSeq(lb, ub)

where lb and ub are arrays (or other iterables) of length N, giving the lower and upper bounds of the hypercube, respectively. For example, SobolSeq([-1,0,0],[1,3,2]) generates points in the box [-1,1]×[0,3]×[0,2]. (Although the generated points are Float64 by default, in general the precision is determined by that of lb and ub.)

If you know in advance the number n of points that you plan to generate, some authors suggest that better uniformity can be attained by first skipping the initial portion of the LDS. In particular, we skip 2ᵐ−1 for the largest m where 2ᵐ-1 ≤ n (see Joe and Kuo, 2003 for a similar suggestion). This facility is provided by:

skip(s, n)

Skipping exactly n elements is also possible:

skip(s, n, exact=true)

skip returns s, so you can simply do s = skip(SobolSeq(N)) or similar.

Example

Here is a simple example, generating 1024 points in two dimensions and plotting them with the PyPlot package. Note the highly uniform, nonrandom distribution of points in the [0,1]×[0,1] unit square!

using Sobol
using PyPlot
s = SobolSeq(2)
p = reduce(hcat, next!(s) for i = 1:1024)'
subplot(111, aspect="equal")
plot(p[:,1], p[:,2], "r.")

Author

This module was written by Steven G. Johnson.

sobol.jl's People

Contributors

Stargazers

Watchers

Forkers

ken-b tkelman johnadders rsoaresp cortner cc7768 yakir12 moritzdrechselgrau ludoro cscherrer jiahao fagan2888

sobol.jl's Issues

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

[PkgEval] Sobol may have a testing issue on Julia 0.3 (2014-08-16)

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.

On Julia 0.3

On 2014-08-15 the testing status was Tests pass.
On 2014-08-16 the testing status changed to Package doesn't load.

Tests pass. means that PackageEvaluator found the tests for your package, executed them, and they all passed.

Package doesn't load. means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with using failed.

This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.

Test log:

>>> 'Pkg.add("Sobol")' log
INFO: Cloning cache of Sobol from git://github.com/stevengj/Sobol.jl.git
INFO: Installing BinDeps v0.3.0
INFO: Installing MathProgBase v0.2.5
INFO: Installing NLopt v0.1.1
INFO: Installing SHA v0.0.2
INFO: Installing Sobol v0.1.0
INFO: Installing URIParser v0.0.2
INFO: Building NLopt
INFO: Package database updated
INFO: METADATA is out-of-date a you may not have the latest version of Sobol
INFO: Use `Pkg.update()` to get the latest versions of your packages

>>> 'using Sobol' log
ERROR: type: ccall: expected Symbol, got Array{Any,1}
 in include at ./boot.jl:245
 in include_from_node1 at ./loading.jl:128
 in reload_path at loading.jl:152
 in _require at loading.jl:67
 in require at loading.jl:54
 in include at ./boot.jl:245
 in include_from_node1 at ./loading.jl:128
 in reload_path at loading.jl:152
 in _require at loading.jl:67
 in require at loading.jl:51
 in include at ./boot.jl:245
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:285
 in _start at ./client.jl:354
 in _start_3B_1699 at /home/idunning/julia03/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.3/NLopt/src/NLopt.jl, in expression starting on line 375
while loading /home/idunning/pkgtest/.julia/v0.3/Sobol/src/Sobol.jl, in expression starting on line 3
while loading /home/idunning/pkgtest/.julia/v0.3/Sobol/testusing.jl, in expression starting on line 1


>>> test log
no tests to run
>>> end of log

add test for values > typemax(UInt32)

from julia-users:
https://groups.google.com/forum/#!topic/julia-users/Jpj_Zb7EmcQ

Fixed in 28c1860. Should we add a test?

I haven't looked into the issue itself, but could have a look at the end of the week.

Skipping on powers of 2

I believe your "skip" function should be altered to actually skip one less than it currently does. As a simple benchmark, if I take a vector of floats and convert each to boolean by testing with >=0.5, I would like it to be the case that a set of 2^n samples covers each binary vector exactly once. This works if I skip 2^n - 1 times:

s = SobolSeq(4); for u=1 : 15;next!(s);end

sort(map(bs ->
        sum(map(x->x[2] >= 0.5 ? x[1] : 0, zip(2 .^ [0 1 2 3], bs))),
        map(x->next!(s), 1:16)
        ))

returns all numbers from 0 to 15 exactly once. If I instead run

s = SobolSeq(4);for u=1:16;next!(s);end

sort(map(bs ->
        sum(map(x->x[2] >= 0.5 ? x[1] : 0, zip(2 .^ [0 1 2 3], bs))),
        map(x->next!(s), 1:16)
        ))

then I get a repeat of 12, and no instance of 8. But skipping the 2^n - 1 seems to produce correct results for all cases I tried.

The advice of "skip the largest power of 2 below your desired one" seems quite bad, actually. Suppose I only want to produce 15 values of 4-bits -- I would hope to get no repeats, and one missing; maybe something less perfect than that. But it's actually quite terrible!

julia> s = SobolSeq(4);skip(s,15)

julia> sort(map(bs ->
        sum(map(x->x[2] >= 0.5 ? x[1] : 0, zip(2 .^ [0 1 2 3], bs))),
        map(x->next!(s), 1:15)
        ))
15-element Array{Int64,1}:
  4
  5
  5
  6
  6
  7
  7
  8
  8
  9
  9
 10
 10
 11
 11

That is to say, every vector produced has (bit 2 XOR bit 3) true! That should only be true in half of them! This is the result of only skipping 8. If you skip 15, as I suggest, then you get better results. (And, again: this seems true at various higher numbers as well.)

next() performance regression in 0.5-rc3

There seems to be a bit of a performance dip (10x) in 0.5-rc3 compared to 0.4.6, especially for scaled Sobol Sequences

e.g.
The benchmark on 0.5-rc3

S=SobolSeq(1,4,6)
@benchmark next(S)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 9
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 96.00 bytes
allocs estimate: 1
minimum time: 2.81 μs (0.00% GC)
median time: 2.91 μs (0.00% GC)
mean time: 2.90 μs (0.00% GC)
maximum time: 6.16 μs (0.00% GC)

While on 0.4.6

S=SobolSeq(1,4,6)
@benchmark next(S)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 251
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 80.00 bytes
allocs estimate: 1
minimum time: 306.00 ns (0.00% GC)
median time: 321.00 ns (0.00% GC)
mean time: 329.61 ns (1.27% GC)
maximum time: 6.40 μs (93.92% GC)

Any ideas?

Start the sequence at zero and add scrambling

First of all, thank youfor writing Sobol.jl. I have been learning about Quasi Monte Carlo (QMC) recently and I'm so glad that there is a Sobol module in Julia. I want to suggest a fairly significant API change, but I hope to convince you that it's a very good idea.

1) Include the initial $(0,0,...)$ point in the Sobol sequence

The first value in the Sobol sequence is zero. This recent paper by Art Owen shows that the common practice of dropping the $(0,0,...)$ point (as done in Sobol.jl) ruins the digital net property of Sobol and makes its convergence worse. What makes Sobol a "digital net" is the fact that if you take the first $2^n$ Sobol points (in any dimension) you can partition your unit interval / square / hypercube into $2^k (k \le n)$ even slices and each slice will get its fair share $2^(n-k)$ of the points. This is the key to its value in QMC. If you skip the $(0,0,...)$ and draw $2^n$ points, the result is not a digital net. Let me illustrate:

using PyPlot

for x in 0:(1/4):1
    for y in 0:(1/4):1
        plot([x,x],[0,1],"C1--")
        plot([0,1],[y,y],"C1--")
    end
end

dim = 2
seq = SobolSeq(dim)

# Include the (0,0) point, as per Owen (2021).
plot([0],[0],"ko")

for j in 1:(N-1)
    local x,y = next!(seq)
    plot([x],[y],"C0o")
end

# The 'extra' point.
x,y = next!(seq)
plot([x],[y],"ro")

Notice how Sobol.jl leaves the bottom-left square empty and instead puts two points in another square. So that ruins the digital net, and the convergence. You can fix this by skipping the next $2^n - 1$ so that the next $2^n$ points do make a digital net. I suspect that this might be the origin of the common practice of skipping $2^n - 1$ points. This also addresses the concern that $(0,0,...)$ is a corner case that could lead to weird results. But as Owen's paper shows (Figure 1), an even better solution is to scramble the points.

2) Add support for scrambled Sobol points

Back in the 90's Owen proposed scrambling digital nets to create a hybrid method that brings the best features of MC and QMC. In Randomized QMC (RQMC) you scramble the points in a way that preserves the digital net but creates a random instance and removes the corner points like $(0,0,...)$. Figure 1 of Owen and Rudolf (2021) has a nice illustration of MC vs QMC vs RQMC.

Since Owen's proposal, several authors have proposed various ways to scramble the points and have shown empirically that RQMC has better convergence than QMC. But more importantly, RQMC recovers a feature from MC that is lost in QMC: The ability to measure uncertainty. In MC, if you draw $N$ points to compute some integral, you can repeat the experiment many times to measure the variance of that result. This is lost in QMC because the points are not random. But RQMC lets you do something similar by generating multiple scrambles.

A scrambled Sobol is already implemented in scipy.stats.qmc.Sobol and in Owen's R package (rsobol). Unfortunately I was not able to figure out either method works, so I could not translate them to Julia. It doesn't look complicated at all; I just don't understand SciPy, R, or Sobol direction vectors well enough. They don't seem to be doing exactly the same thing, but they both seem to be doing something short and clever with the direction vectors.

I should mention that the R code from Owen is BSD-licensed, so if you can read R, you can just take his method and convert it to Julia.

Notice that the SciPy and R implementations both default to scrambled points and you have to explicitly ask for unscrambled points.

3) Don't produce Sobol points one at a time, draw $2^n$ at a time

At this point I hope I have convinced you that drawing anything other than a power of 2 worth of Sobol points ruins the digital net, and misses what makes Sobol powerful. In Owen's article he says

Using 1000 points of a Sobol’ sequence may well be less accurate than using 512.

The SciPy API strongly encourages you to draw $2^n$ points, and while they allow you to draw a different number, they give you a big warning to not do that. Owen's R code goes even farther --- it doesn't have an option to draw anything but a power of two. The input parameters are the exponent and the dimensionality. So I think Sobol.jl should do the same thing. Ditch the next!() method and instead draw a power of two.

I realize that this was a long request and that I'm suggesting breaking the API, but I hope I have made a convincing case that this would turn Sobol.jl into a very powerful tool for anyone computing high-dimensional integrals with RQMC methods. Not coincidentally, this is precisely what I am hoping to do with it.

Let me know what you think.

References:

iterator interface

Thanks for this great library, I find it very useful. However, I am not sure that the library supports the iteration interface of Julia. Eg Sobol.SobolSeq returns not an iterator, but an iterator and a state. So, for example,

import Sobol

function collect_take(iter, n)
    v = Vector{Float64}(0)
    for (i, x) in enumerate(iter)
        push!(v, x[1])
        if (i >= n)
            break
        end
    end
    v
end
double_take(iter, n) = vcat(collect_take(iter,5), collect_take(iter,5))
double_take(Sobol.SobolSeq(1), 5)

returns the first 10 elements (instead of the first 5, twice). Cf, for example,

double_take(countfrom(1),5)

Also, idioms like

[x[1] for x in take(Sobol.SobolSeq(1),5)]

don't work.

Comment

Maybe this is kind of neat for sub-random number generation:
https://en.wikipedia.org/wiki/Low-discrepancy_sequence#Additive_recurrence
You can use the golden ratio and some other ones:
https://en.wikipedia.org/wiki/Silver_ratio
https://en.wikipedia.org/wiki/Plastic_number
Luckily the wikipedia entries have the values in hexadecimal for coding and to check if the final digit in 32 or 64 bit is odd, which is better than even.

Unable to load on Julia nightly -- LoadError: syntax: expression too large

This was working fine on 1.5.0-DEV.854 (2020-05-05) (9 days old) but today gives an error:

$ /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.41 (2020-05-15)
 _/ |\__'_|_|_|\__'_|  |  Commit 8370133584 (0 days old master)
|__/                   |

julia> using Sobol
[ Info: Precompiling Sobol [ed01d8cd-4d21-5b2a-85b4-cc3bdc58bad4]
ERROR: LoadError: LoadError: syntax: expression too large
Stacktrace:
 [1] top-level scope at /Users/me/.julia/packages/Sobol/UKDQ8/src/soboldata.jl:2171
 [2] include at ./Base.jl:368 [inlined]
 [3] include(::String) at /Users/me/.julia/packages/Sobol/UKDQ8/src/Sobol.jl:1
 [4] top-level scope at /Users/me/.julia/packages/Sobol/UKDQ8/src/Sobol.jl:5
 [5] include(::Module, ::String) at ./Base.jl:368
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:331 [inlined]
 [8] eval(::Expr) at ./client.jl:448
 [9] top-level scope at ./none:3
in expression starting at /Users/me/.julia/packages/Sobol/UKDQ8/src/soboldata.jl:2171
in expression starting at /Users/me/.julia/packages/Sobol/UKDQ8/src/Sobol.jl:1
ERROR: Failed to precompile Sobol [ed01d8cd-4d21-5b2a-85b4-cc3bdc58bad4] to /Users/me/.julia/compiled/v1.6/Sobol/ohkOh_nZr7I.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1328
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923

julia>

Shouldn't this be the position of rightmost zero in base2?

https://github.com/stevengj/Sobol.jl/blob/bc88e65341786898e2be250f53b447d9871ccd7e/src/Sobol.jl#L65

The function trailing_zeros seems to be different from what's mentioned in some of the papers I found. Also see https://web.maths.unsw.edu.au/~fkuo/sobol/joe-kuo-notes.pdf#page=2&zoom=auto,-207,420 — some test values are given below eqn(5).
I didn't understand your implementation entirely, so I might be wrong!

Skipping an exact number of elements

In my application, I draw parameters from a SobolSeq and do some expensive simulations with them. I distribute the calculations across machines, so each machine is supposed to use a certain range of Sobol numbers.

Lets say, I want Sobol numbers n:(n+m). How do I get them other than

[next!(s) for 1:n+m][n+1:end]

Is there any benefit in using the existing skip function? Does it do something clever, or does it also compute and throw away the unwanted ones?

Is there interest in adding this function to the package?

function skip_exact(s, n)
  n_eff = 2 ^ floor(Int, log(n) / log(2))
  Δ = n - n_eff
  skip(s, n)
  for i in 1:Δ
    next!(s)
  end
  s
end

# want to get the nth element
n = 10

s1 = SobolSeq(1)
x1 = [next!(s1)[1] for i in 1:n][n]

s2 = SobolSeq(1)
skip_exact(s2, n-1)
x2 = next!(s2)[1]

@assert x1 == x2

use StaticArrays

Might be nice to use StaticArrays (i.e. return an SVector, not a Vector), since dimension of a SobolSeq is part of the type already.

However, for very high dimensions, a Vector is probably more appropriate, so the array type should probably be an additional type parameter.

benchmark/compare against Burkardt's implementation

It would be interesting to compare against this implementation, both for correctness and performance.

Increasing dimension of Sobol sequences

Not necessarily an issue, but this seems like the best place to put this.

There seems to be tables for higher dimensional Sequences available online. Would this be interesting to incorporate?

I'm interested in using Sobol sequences in financial applicaitons where the dimension needs to be very large, but from what I've read quasi-monte-carlo does poorly in high dimension. Perhaps it is unnecessary to code a Sobol sequence generator for such high dimensional problems?