Code Monkey home page Code Monkey logo

mersenne-twister's Introduction

A fast Mersenne Twister in C++

This is an implementation of the fast pseudo-random number generator (PRNG) MT19937, colloquially called the Mersenne Twister. It was given this name because it has a period of 2^19937 - 1, which is a Mersenne prime.

The code here is actually ~1.6 times faster (on Intel CPUs) than the reference implementation mt19937ar.c (see below).

The Mersenne Twister is highly regarded for its performance and high quality pseudo-random numbers. In spite of this, it is not suited for cryptographic code, because one only needs to observe 624 iterates to predict all future ones. It was designed with statistical simulations in mind, and should therefore be quite good for Monte Carlo simulations, probabilistic algorithms and so on.

You can read more about the Mersenne Twister on Wikipedia.

UPDATE

I removed the srand and rand C standard library drop-in functions, because I believe they contained errors.

OLDER UPDATE

All prior versions with loop unrolling had a bug that caused numbers to differ significantly from the reference implementation. This has now been fixed, and the tests have been expanded to test 2000 consecutive numbers and numbers at doubling index positions up to over four billion. Thanks to Mikael Leetmaa for letting me know about this!

I've also fixed an out-of-bounds read in the MT array. Thanks to Nyall Dawson for finding this bug!

Performance

This implementation is very fast. It runs faster than the reference, non-SIMD, implementation in the original paper (and the more recent code in mt19937ar.c) on the computers I've checked with (all Intel CPUs of different generations). To be sure, please run the test program by typing make check. See below for numbers.

The original optimization trick I did was to unroll the loop in generate_number() three times to avoid the relatively expensive modulus operations. The mod instructions were used to have the array index wrap around, but was alleviated with the loops and some simple arithmetic. This is a well known trick.

However, I tried unrolling each loop even more, since the loop counters can be factorized. The idea was to fill the CPU's instruction pipeline and avoid flushing it. It worked fine on my Intel Core i7 computer, and increased performance from around 186M numbers/sec to 204M numbers/sec. It may not work as well on other architectures, though.

I was tipped by Michel Valin of another neat trick to choose which value in the matrix to use. The previous code checked if y was odd with a bitwise AND, and used that to index into an array to choose between zero and a magic value. However, there's a really cool trick that can be used instead: Just put the y value in a signed variable, shift left 31 then shift right 31 again. What happens then is that, because the variable is signed, it will use the SARL x86 instruction so that the LSB is effectively copied to all other bit positions. The final step is then to bitwise AND the result with the magic value. In other words, previous code:

static int MATRIX[] = {0, 0x12345678};
// ...
uint32_t foo = MATRIX[y & 1] // 0 if y is even, 0x12345678 if it's odd

changes to

uint32_t foo = ((int32_t(y) << 31) >> 31) & 0x12345678

With -ftree-vectorize, it seems that this trick does wonders and speeds up the code even more.

Finally, note that people have done SIMD and CUDA implementations. If you are looking for even more speed, I suggest you check them out.

Compilation and usage

To build the example, just type

$ make clean check

You'll see if this implementation runs faster than the reference non-SIMD Mersenne Twister.

On an older Intel Core i7 (my machine), using clang 4 (I think) the output looks like this:

$ ./test-mt 20
Testing Mersenne Twister with reference implementation
  * Pass 1/2  OK
  * Pass 2/2  OK

Timing our implementation (best times over 20 passes) ... 
  1.0321990s 
  0.9729490s ..................
  min=0.972949s max=1.0322s mean=1.00114s stddev=0.0143048s
  193.8 million — 205.6 million numbers/second

Timing reference mt19937ar.c (best times over 20 passes) ... 
  1.1132160s .
  1.1116460s ..
  1.0994660s .
  1.0944240s ............
  min=1.09442s max=1.14412s mean=1.12278s stddev=0.0126344s
  174.8 million — 182.7 million numbers/second

1.12485 times faster than the reference (ratio of best runs)

On an Intel Xeon with gcc 6.3:

Testing Mersenne Twister with reference implementation
  * Pass 1/2  OK
  * Pass 2/2  OK

Timing our implementation (best times over 20 passes) ...
  0.5661380s
  0.5654360s
  0.5652670s .
  0.5649580s ...
  0.5641450s ...........
  min=0.564145s max=0.569173s mean=0.565719s stddev=0.00129429s
  351.4 million — 354.5 million numbers/second

Timing reference mt19937ar.c (best times over 20 passes) ...
  0.9026930s
  0.9001340s
  0.8963510s .
  0.8963100s .
  0.8959060s ..
  0.8956320s .
  0.8955830s ......
  0.8953400s .
  min=0.89534s max=0.902693s mean=0.896933s stddev=0.00172019s
  221.6 million — 223.4 million numbers/second

1.58707 times faster than the reference (ratio of best runs)

You can pass the number of iterations to perform on the command line, e.g.

$ ./test-mt 100

This is quite important to let the CPU throttle up to get the best numbers. For each iteration, the time is printed if it's better than seen before. If it isn't better, a dot is printed.

To actually use the code, include the header and cpp file into your project. Then

namespace mt {
  #include "mersenne-twister.h"
}

// ...

mt::seed(1234);
printf("a pseudo-random number: %d\n", mt::rand_u32());

Also look at the Makefile here as well, it contains a few optimization flags that you may want to use.

Portability

The code should be portable, although I have only tried on Intel CPUs. I don't know if the speed holds up on other CPUs (and I doubt it). If you're not on UNIX, you can pretty easily port the code.

The MT19937 algorithm is inherently 32-bit, so you only get 32-bit values.

Bugs

Please report any bugs to the author.

Author and license

Written by Christian Stigen Larsen

Distributed under the modified BSD license.

2015-02-17, 2017-12-06

References

mersenne-twister's People

Contributors

cslarsen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

mersenne-twister's Issues

Is there some possibility to easily implement efficient method discard() ?

Do you think there is some possibility to easily implement efficient discard() ? I mean something analogous to std::mt19937::discard(). Any hints?

UPDATE: anyway it seems the original std::mt19937::discard() is not efficient at all (just generates in loop and discards the result), so this is not really the issue any more...

Bug with MT_UNROLL_MORE

The following code is bugged when MT_UNROLL_MORE is set.

    // i = [0 ... 226]
    while (i < DIFF)
    {
        /*
         * We're doing 226 = 113*2, an even number of steps, so we can safely
         * unroll one more step here for speed:
         */
        UNROLL(i + PERIOD);

#ifdef MT_UNROLL_MORE
        UNROLL(i + PERIOD);
#endif
    }

With MT_UNROLL_MORE, the last iteration begins with i set to 226. The loop enters and completes two more iterations, ending with i = 228, completing one more iteration than it should've. Interestingly, G++ is smart enough to produce a warning about this.

With MT_UNROLL_MORE, the following code fixes the bug and produces the same output as without MT_UNROLL_MORE:

    while (i < DIFF - 2)
    {
        /*
         * We're doing 226 = 113*2, an even number of steps, so we can safely
         * unroll one more step here for speed:
         */
        UNROLL(i + PERIOD);

#ifdef MT_UNROLL_MORE
        UNROLL(i + PERIOD);
#endif
    }
    UNROLL(i + PERIOD);

Out of bounds read in mersenne-twister.cpp

I've recently run a Coverity scan over a project which includes this library. It's flagged mersenne-twister.cpp line 76 as an out of bounds read for MT. I've double checked and it looks like an error to me, as in the for loop in line 66 i ranges from [0, 226]. Line 74 increments i by one, so the max value of i will be 227. Then, in line 76, MT 227 + 397 is read, which is outside the bounds of MT.

can't use your realisation

try to run your code like in readme example:

#include <cstdint>
#include <random>  
#include <iostream>
namespace mt {
#include "mersenne-twister.h"
}
int main () {
mt::seed(1234);
std::cout << mt::rand_u32() << std::endl;
return 0;
}

out:

/tmp/ccp47hIW.o: In function `main':
ownTest.cpp:(.text+0xa): undefined reference to `seed'
ownTest.cpp:(.text+0xf): undefined reference to `rand_u32'
collect2: error: ld returned 1 exit status

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.