Fast Random Numbers in C++

I really enjoy working with random numbers. Over time I have worked with quite some implementations, some of them heavily flawed. Some time ago I took the time and tried to find the best one for C++. As usual, there is no best one for all use cases, but there are reasonable choices for each use case.

For more important and critical numbers, I suppose it can be a good choice to go with a cryptographic hash, like SHA-256 or some better one, generate a good one time salt and append it with an incrementing number. Then extract the necessary amount of bytes from the hash result to map to your target range. As the hash output is supposed to be distributed evenly, it can act as a pseudo entropy source based on a fixed state. If the numbers are really critical, you should be using a cryptography library instead of rolling your own.

However, in most cases you do not need cryptographically strong random numbers. You also don't want some huge code dependency just for some dice throws.

1. Linear Congruential Generator

The simplest solution is a LCG which can be implemented very easily. It has huge but well understood flaws, while its main advantage is simplicity.

Listing 1: See Numerical Recipes

uint32_t lcg_ranqd1(uint32_t *state) {
      return (*state = *state * 1664525 + 1013904223);
}

If we only do a multiplication with a modulo, skipping the increment, it is called a Multiplicative Congruential Generator (MCG).

Listing 2: See Lehmer RNG

uint32_t lcg_minstd_cpp11(uint32_t *state) {
      return (*state = (uint64_t) *state * 48271 % 0x7FFFFFFF);
}

2. Permuted Congruential Generator

I strongly suggest using this one over a plain LCG as the quality of the generated values is drastically improved. See PCG website and PCG Family Paper (Melissa E. O'Neill, 2014). I have had some good success with these generators, since I have started using them a few years ago. They offer very good quality/performance ratio, useful features, and a standard library compatible C++ implementation.

3. Efficient Number Generations

To actually generate numbers efficiently, you have to avoid the usual standard library uniform distributions, because they can be relatively slow.

3.1. Booleans

This is the simple one. Just make sure you take a good bit. If you use a LCG, make sure to not use the first bit of its state, as it will probably be the most weak. I personally have been rolling with the fourth bit for PCG, though you might have a different favorite. Some people also prefer a sign check, which is actually just testing the most significant bit instead.

template <class rng_t>
inline bool bool_dist(rng_t &rng) {
    return rng() & 0b1000;
}

3.2. Integers

Luckily, there are a lot of ways of generating bounded random integers. This is where you will have to decide if you want a really fast generation or a good generation. I personally like using the following one, it even optimizes nicely on 32 bit platforms. However, it is biased very slightly.

template <class rng_t>
inline uint32_t int_dist_fast(rng_t &rng, uint32_t range) {
    uint32_t x = rng();
    uint64_t m = uint64_t(x) * uint64_t(range);
    return m >> 32;
}

template <class rng_t, class T>
inline T int_dist_fast(rng_t &rng, T min, T max) {
    assert(max >= min);
    return min + int_dist_fast(rng, max - min + 1);
}

I have been using the suggested benchmark winner for my unbiased generator. It is not that much slower, as the added special case is very rare.

template <class rng_t>
inline uint32_t int_dist_unbiased(rng_t &rng, uint32_t range) {
    uint32_t x = rng();
    uint64_t m = uint64_t(x) * uint64_t(range);
    uint32_t l = uint32_t(m);
    if (l < range) [[unlikely]] {
        uint32_t t = -range;
        if (t >= range) [[unlikely]] {
            t -= range;
            if (t >= range) [[unlikely]] {
                t %= range;
            }
        }
        while (l < t) {
            x = rng();
            m = uint64_t(x) * uint64_t(range);
            l = uint32_t(m);
        }
    }
    return m >> 32;
}

template <class rng_t, class T>
inline T int_dist_unbiased(rng_t &rng, T min, T max) {
    assert(max >= min);
    return min + int_dist_unbiased(rng, max - min + 1);
}

3.3. Floating Point

Floats are much harder due to their arbitrary precision. If the values do not have to be perfect, you can still be efficient however. Ignoring denormalized values and ignoring the variable resolution, we can come up with a really fast generator. This one makes some clever use of the precision bits to generate in the range 1.0 to 2.0, so we subtract one.

inline float uint32_to_float(uint32_t i) {
    union {
        uint32_t i;
        float f;
    } o = {i};
    return o.f;
}

template <class rng_t>
inline float float_dist_fast(rng_t &rng) {
    return uint32_to_float(0x3F800000u | (rng() >> 9)) - 1.0f;
}

template <class rng_t>
inline float float_dist_fast(rng_t &rng, float max) {
    return float_dist_fast(rng) * max;
}

template <class rng_t>
inline float float_dist_fast(rng_t &rng, float min, float max) {
    return min + float_dist_fast(rng) * max;
}

You can do the same for doubles with different constants. Just need to be careful to really get 64 bits of random out of your generator.

template <class rng_t>
inline double double_dist_fast(rng_t &rng) {
    uint64_t v = ((uint64_t) rng() << 32) ^ rng();
    uint64_t r = (v & 0xFFFFFFFFFFFFFull) | 0x3FF0000000000000ull;
    return *(double *) (&r) - 1.0;
}