12

Question: How can I generate a stream of unique random numbers in Go?

Namely, I want to guarantee there's no duplication in array a using math/rand and/or standard Go library utilities.

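package main

import (
    "math/rand"
    "time"
)

// RandomNumberGenerator returns a PRNG seeded from the current time.
func RandomNumberGenerator() *rand.Rand {
    return rand.New(rand.NewSource(time.Now().UnixNano()))
}

func main() {
    rng := RandomNumberGenerator()
    const N = 10000
    a := make([]int, N)
    for i := 0; i < N; i++ {
        a[i] = rng.Int() // nothing here prevents duplicates
    }
}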

There are questions and solutions on how to generate a series of random numbers in Go, for example, here.

But right now I want to generate a series of random numbers in which no value repeats an earlier one. Is there a standard/recommended way to do it in Go?

My guess is to (1) use a permutation or (2) keep track of previously generated numbers and regenerate a value if it has been generated before.

But solution (1) sounds like overkill if I only want a few numbers, and (2) sounds very time-consuming if I end up generating a long series of random numbers, because of collisions, and I guess it's also very memory-consuming.


Use Case: To benchmark a Go program with 10K, 100K, 1M pseudo-random numbers that contain no duplicates.

cookieisaac
  • If you want to guarantee a unique random series with only the std library, you'll need to implement a full cycle PRNG. If predictability isn't as much of a concern there are simpler Linear congruential generators you can use. – JimB Oct 07 '16 at 21:12
  • See: How to generate unique random string in a length range using Golang?: http://stackoverflow.com/questions/38418171/how-to-generate-unique-random-string-in-a-length-range-using-golang –  Oct 07 '16 at 21:50
  • But these are (pseudo-)random numbers; what do you mean by unique? Random just means random, not unique! E.g. 99999 is a random number, and with a true RNG the next number may be 99999 again by chance (it is random, isn't it?). –  Oct 07 '16 at 21:52
  • @Amd I do see your point, and I know that by requiring the numbers to be unique, they wouldn't be purely random any more. But I'm not trying to be a cryptologist here. All I need to do is to benchmark a Go program with 10K, 100K, 1M pseudo-random numbers that contain no duplicates. – cookieisaac Oct 07 '16 at 22:19
  • If it is just a benchmark, why not just count 1, 2, 3, ...? That is unique. Congruential generators use (a*x+b)%n, which in the simplest case is just counting in fixed steps, e.g. 1, 11, 21, 31, ... See also: Mersenne Twister. I hope this helps. –  Oct 07 '16 at 22:25
  • @Amd I want to verify the behavior under sequential input, and random input....any pre-set formula seems too deterministic to be convincingly random especially with sequential count... – cookieisaac Oct 07 '16 at 22:54
  • @cookieisaac Can you not generate 1,2,3... sequence and then just shuffle it? If the magnitude of the numbers is irrelevant, you just want a random order, this is probably the simplest solution that absolutely guarantees no repetition. – biziclop Oct 07 '16 at 23:41
  • @biziclop technically I could, that is my guess(1). However, I want to get N numbers from [-2^31, 2^31), then I have to shuffle 2^32 numbers and retrieve only the first N numbers. That is too much of an overkill when N << 2^32. (N is roughly 100 thousand and 2^32 is roughly 4 billion ) – cookieisaac Oct 08 '16 at 00:06
  • @cookieisaac Yes, if you aren't only interested in the order, shuffling won't help you. – biziclop Oct 08 '16 at 09:24
  • "*To benchmark a Go program with 10K, 100K, 1M pseudo-random number that has no duplications.*" Benchmarking or testing? If you're benchmarking, assume it works and just time how long it takes to get N numbers. If you're *testing* then you're testing the wrong thing. Random number generators can have duplicates, they're random. What you want to test instead is whether they're [distributed evenly](https://stackoverflow.com/a/2130691/14660). – Schwern May 28 '18 at 01:53
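The full-cycle generator suggested in the first comment above can be sketched with a linear congruential generator modulo 2^32. This is only an illustration, not anything from the standard library: the type name `fullCycle` and the seed are hypothetical, and the multiplier and increment are the commonly published "Numerical Recipes" constants. Because the increment is odd and the multiplier minus one is divisible by 4, the generator has full period, so the first N outputs are distinct for any N up to 2^32.

```go
package main

import "fmt"

// fullCycle is a linear congruential generator modulo 2^32.
// With an odd increment and a multiplier congruent to 1 mod 4 it has
// full period: every uint32 is visited exactly once before it repeats.
type fullCycle struct {
	state uint32
}

// next advances the generator; uint32 arithmetic wraps modulo 2^32.
func (g *fullCycle) next() int32 {
	g.state = g.state*1664525 + 1013904223
	return int32(g.state) // reinterpret as a value in [-2^31, 2^31)
}

func main() {
	g := &fullCycle{state: 42} // hypothetical seed; any uint32 works
	seen := make(map[int32]struct{})
	for i := 0; i < 100000; i++ {
		seen[g.next()] = struct{}{}
	}
	fmt.Println(len(seen)) // 100000: no value repeated
}
```

The trade-off is quality: the sequence is fully predictable and its low-order bits cycle with short periods, so it may look less random than math/rand output even though it never repeats.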

7 Answers

3

You should absolutely go with approach 2. Let's assume you're running on a 64-bit machine, and thus generating 63-bit integers (int is 64 bits wide, but rand.Int never returns negative numbers). Even after you have generated 4 billion numbers, the chance that the next number duplicates an earlier one is only about 1 in 2 billion (roughly 2^32 values used out of 2^63). Thus, you'll almost never have to regenerate, and practically never have to regenerate twice.

Try, for example:

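// UniqueRand hands out random ints and never returns the same value twice.
type UniqueRand struct {
    generated map[int]bool // values already returned
}

func (u *UniqueRand) Int() int {
    if u.generated == nil {
        // Writing to a nil map panics, so allocate on first use.
        u.generated = make(map[int]bool)
    }
    for {
        i := rand.Int()
        if !u.generated[i] {
            u.generated[i] = true
            return i
        }
    }
}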
joshlf
  • after choosing 4 billion numbers from a 64bit range, you have over a 25% chance of collision (https://en.wikipedia.org/wiki/Birthday_problem#Probability_table) – JimB Oct 07 '16 at 21:18
  • I'm trying to generate around 40000 unique int32, and my observation is that somehow I always manage to run into collisions using rand.Int() – cookieisaac Oct 07 '16 at 21:36
  • @JimB - you have a 25% chance of having a *single* collision. I'm talking about the probability that, in any given generation event, the new number you generate is one you've generated in the past. – joshlf Oct 07 '16 at 22:07
  • @cookieisaac It may be that rand.Int doesn't have very good randomness properties. You may want to check out `crypto/rand` instead. – joshlf Oct 07 '16 at 22:08
  • @joshlf Actually, it's a 25% chance of having *at least* one collision. – pjs Oct 07 '16 at 22:29
  • @pjs Close enough ;) – joshlf Oct 08 '16 at 04:21
  • @joshlf Not in my prob & stats class it ain't! – pjs Oct 08 '16 at 04:40
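The two probabilities being debated in these comments are different quantities, and both can be estimated in a few lines. The sketch below uses the standard birthday approximation 1 - exp(-n^2 / 2M) for "at least one collision among n draws" and n/M for "the next draw repeats an earlier one"; the function names are made up for illustration, and both formulas assume uniformly distributed draws.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that n uniform draws from a
// space of 2^bits values contain at least one duplicate, using the birthday
// approximation 1 - exp(-n^2 / 2M).
func collisionProbability(n float64, bits int) float64 {
	m := math.Pow(2, float64(bits))
	return -math.Expm1(-n * n / (2 * m))
}

// nextDrawDuplicate is the chance that one more draw repeats any of the
// n distinct values already seen.
func nextDrawDuplicate(n float64, bits int) float64 {
	return n / math.Pow(2, float64(bits))
}

func main() {
	fmt.Printf("40000 draws, 31 bits: %.3f\n", collisionProbability(40000, 31))
	fmt.Printf("4e9 draws, 63 bits:   %.3f\n", collisionProbability(4e9, 63))
	fmt.Printf("next draw after 4e9:  %.2e\n", nextDrawDuplicate(4e9, 63))
}
```

The first line corresponds to the 40000-value case mentioned above: with only 31 usable bits a duplicate somewhere in the series is far from rare, which is consistent with the collisions the asker observed and does not by itself indicate a weak generator.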
1

I had a similar task: picking elements from an initial slice by random unique index, e.g. getting 1k random unique elements from a slice of 10k elements.

Here is a simple, straightforward solution:

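import (
    "math/rand"
    "time"
)

// rng is seeded once; reseeding on every call wastes time and can repeat
// sequences when calls land on the same clock tick.
var rng = rand.New(rand.NewSource(time.Now().UnixNano()))

func getRandomElements(array []string) []string {
    randomElementsCount := 1000
    if randomElementsCount > len(array) {
        // Without this cap the retry loop below would never terminate.
        randomElementsCount = len(array)
    }
    result := make([]string, 0, randomElementsCount)
    existingIndexes := make(map[int]struct{}, randomElementsCount)

    for i := 0; i < randomElementsCount; i++ {
        idx := randomIndex(len(array), existingIndexes)
        result = append(result, array[idx])
    }

    return result
}

// randomIndex returns an index in [0, size) that is not yet in
// existingIndexes and records it there.
func randomIndex(size int, existingIndexes map[int]struct{}) int {
    for {
        idx := rng.Intn(size)
        if _, exists := existingIndexes[idx]; !exists {
            existingIndexes[idx] = struct{}{}
            return idx
        }
    }
}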
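When the source slice is small enough to permute in full (10k elements easily is), the bookkeeping map can be avoided. The following is an alternative sketch, not the code above: `pickUnique` is a hypothetical helper built on `rand.Perm`, which returns a random permutation of 0..n-1, so its first k entries are k distinct indexes.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickUnique returns k elements of items taken from distinct random positions.
// The first k entries of a random permutation are k different indexes, so no
// retry loop or lookup map is needed.
func pickUnique(items []string, k int) []string {
	if k > len(items) {
		k = len(items) // cannot pick more unique elements than exist
	}
	result := make([]string, 0, k)
	for _, idx := range rand.Perm(len(items))[:k] {
		result = append(result, items[idx])
	}
	return result
}

func main() {
	items := []string{"a", "b", "c", "d", "e", "f"}
	fmt.Println(pickUnique(items, 3))
}
```

The cost is one temporary slice of len(items) ints per call, which is a reasonable trade when k is a sizable fraction of the input, as in the 1k-of-10k case.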
Martin Zinovsky
0

1- Fast positive and negative int32 unique pseudo random numbers in 296ms using std lib:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    const n = 1000000
    rand.Seed(time.Now().UTC().UnixNano())
    duplicate := 0
    mp := make(map[int32]struct{}, n)
    var r int32
    t := time.Now()
    for i := 0; i < n; {
        r = rand.Int31()
        if i&1 == 0 {
            r = -r
        }
        if _, ok := mp[r]; ok {
            duplicate++
        } else {
            mp[r] = zero
            i++
        }
    }
    fmt.Println(time.Since(t))
    fmt.Println("len: ", len(mp))
    fmt.Println("duplicate: ", duplicate)
    positive := 0
    for k := range mp {
        if k > 0 {
            positive++
        }
    }
    fmt.Println(`n=`, n, `positive=`, positive)
}

var zero = struct{}{}

output:

296.0169ms
len:  1000000
duplicate:  118
n= 1000000 positive= 500000

2- Just fill the map[int32]struct{}:

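for i := int32(0); i < n; i++ {
    m[i] = zero
}

Go randomizes map iteration order, so reading the keys back does not return them in insertion order (the order is unspecified, not a uniform shuffle):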

for k := range m {
    fmt.Print(k, " ")
}

And this just takes 183ms for 1000000 unique numbers, no duplicate (The Go Playground):

package main

import (
    "fmt"
    "time"
)

func main() {
    const n = 1000000
    m := make(map[int32]struct{}, n)
    t := time.Now()
    for i := int32(0); i < n; i++ {
        m[i] = zero
    }
    fmt.Println(time.Since(t))
    fmt.Println("len: ", len(m))
    //  for k := range m {
    //      fmt.Print(k, " ")
    //  }
}

var zero = struct{}{}

3- Here is a simple but slow approach (it takes 22s for 200000 unique numbers), so you may want to generate the numbers once and save them to a file:

package main

import "time"
import "fmt"
import "math/rand"

func main() {
    dup := 0
    t := time.Now()
    const n = 200000
    rand.Seed(time.Now().UTC().UnixNano())
    var a [n]int32
    var exist bool
    for i := 0; i < n; {
        r := rand.Int31()
        exist = false
        for j := 0; j < i; j++ {
            if a[j] == r {
                dup++
                fmt.Println(dup)
                exist = true
                break
            }
        }
        if !exist {
            a[i] = r
            i++
        }
    }
    fmt.Println(time.Since(t))
}
  • Thanks for the elaborate answer. Here are a few comments: Code snippet <1> is the same idea as @joshlf's answer, and thanks for benchmarking the result. Code snippet <2> is a cool hack that I wasn't aware of before. However, it doesn't suit my current use case: for any given fixed N, the generated array will always contain the same values, which sort of defeats the purpose of having a pseudo-random number generator. Code snippet <3> inherits the same flaw as snippet <2> if I save it to a file, and is slower as well. – cookieisaac Oct 08 '16 at 18:44
  • @cookieisaac You're welcome, in `map[int32]struct{}` using the empty struct ``struct{}`` consumes zero bytes of memory, and is 30ms faster, see: http://dave.cheney.net/2014/03/25/the-empty-struct –  Oct 08 '16 at 19:17
0

Temporary workaround based on @joshlf's answer

type UniqueRand struct {
    generated map[int]bool // values already returned
    rng       *rand.Rand   // underlying random number generator
    scope     int          // exclusive upper bound; <= 0 means unbounded
}

// NewUniqueRand returns a generator of unique random numbers.
// If N is less than or equal to 0, values span the full range of rand.Int.
// If N is greater than 0, values are drawn from [0, N).
// Once all N values have been returned, Int returns -1 from then on.
func NewUniqueRand(N int) *UniqueRand {
    s1 := rand.NewSource(time.Now().UnixNano())
    r1 := rand.New(s1)
    return &UniqueRand{
        generated: map[int]bool{},
        rng:       r1,
        scope:     N,
    }
}

func (u *UniqueRand) Int() int {
    if u.scope > 0 && len(u.generated) >= u.scope {
        return -1
    }
    for {
        var i int
        if u.scope > 0 {
            i = u.rng.Intn(u.scope) // uniform in [0, scope)
        } else {
            i = u.rng.Int()
        }
        if !u.generated[i] {
            u.generated[i] = true
            return i
        }
    }
}

Client side code

func TestSetGet2(t *testing.T) {
    const N = 10000
    for _, mask := range []int{0, -1, 0x555555, 0xaaaaaa, 0x333333, 0xcccccc, 0x314159} {
        rng := NewUniqueRand(2*N)
        a := make([]int, N)
        for i := 0; i < N; i++ {
            a[i] = (rng.Int() ^ mask) << 1
        }

        //Benchmark Code
    }
}
cookieisaac
0

I see two reasons for wanting this. You want to test a random number generator, or you want unique random numbers.

You're Testing A Random Number Generator

My first question is why? There's plenty of solid random number generators available. Don't write your own, it's basically dabbling in cryptography and that's never a good idea. Maybe you're testing a system that uses a random number generator to generate random output?

There's a problem: there's no guarantee random numbers are unique. They're random. There's always a possibility of collision. Testing that random output is unique is incorrect.

Instead, you want to test the results are distributed evenly. To do this I'll reference another answer about how to test a random number generator.

You Want Unique Random Numbers

From a practical perspective you don't need guaranteed uniqueness; you only need to make collisions so unlikely that they're not a concern. This is what UUIDs are for. They're 128 bit Universally Unique IDentifiers. There are a number of ways to generate them for particular scenarios.

UUIDv4 is basically just a 122 bit random number which has some ungodly small chance of a collision. Let's approximate it.

n = how many random numbers you'll generate
M = size of the keyspace (2^122 for a 122 bit random number)
P = probability of collision

P ≈ n^2 / (2M)

Solving for n...

n = sqrt(2MP)

Setting P to something absurd like 1e-12 (one in a trillion), we find you can generate about 3.2 trillion UUIDv4s with a 1 in a trillion chance of collision. You're 1000 times more likely to win the lottery than have a collision in 3.2 trillion UUIDv4s. I think that's acceptable.
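The arithmetic behind that figure can be checked directly. This is just the formula n = sqrt(2MP) from above evaluated in floating point; the function name `maxDraws` is made up for the example, and the approximation only holds for small P.

```go
package main

import (
	"fmt"
	"math"
)

// maxDraws returns roughly how many uniform draws from a keyspace of 2^bits
// values keep the collision probability at or below p, using n = sqrt(2MP).
func maxDraws(bits int, p float64) float64 {
	m := math.Pow(2, float64(bits))
	return math.Sqrt(2 * m * p)
}

func main() {
	fmt.Printf("UUIDv4 (122 bits), P = 1e-12: n ≈ %.2e\n", maxDraws(122, 1e-12))
	fmt.Printf("int32 (32 bits),   P = 1e-12: n ≈ %.2e\n", maxDraws(32, 1e-12))
}
```

The second line shows why the same argument does not carry over to 32-bit values: at that probability target the budget is far below a single draw.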

Here's a UUIDv4 library in Go to use and a demonstration of generating 1 million unique random 128 bit values.

package main

import (
    "github.com/frankenbeanies/uuid4"
)

func main() {
    // i < 1000000 yields exactly 1 million values
    for i := 0; i < 1000000; i++ {
        uuid := uuid4.New().Bytes()

        _ = uuid // use the uuid; Go rejects unused variables
    }
}
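Since the question asks for standard-library utilities, the same idea can be sketched without a third-party package. This is an assumption-laden illustration, not the library's implementation: `newUUIDv4` is a hypothetical helper that reads 16 bytes from crypto/rand and sets the version and variant bits the way RFC 4122 describes for version 4.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 fills 16 bytes from the operating system's CSPRNG and sets the
// version and variant bits as RFC 4122 describes for version 4 UUIDs,
// leaving 122 random bits.
func newUUIDv4() ([16]byte, error) {
	var u [16]byte
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // version 4
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return u, nil
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```

A [16]byte is comparable in Go, so the values can be used directly as map keys if a benchmark needs to double-check uniqueness.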
Schwern
  • Neither was the reason when I asked the question. The use case is that I improved a previous B+ tree delete/insertion algorithm for input with no duplicates. I want to make claims such as 'This algorithm improves on the previous version by X% for case Y at size Z'. The benchmark scenarios include sequential and random input. To construct the tree for the testing as well as to construct the input data set, I need a stream of "random" numbers that have no duplicates. – cookieisaac May 29 '18 at 16:57
  • @cookieisaac The point still stands. You don't need all the extra work and memory to guarantee uniqueness, you just need all but guaranteed uniqueness. `rand.Int63` (from `math/rand`) producing 1 million numbers has a roughly 1 in 18 million chance of producing a duplicate. That's [about 5 sigma or the likelihood of dying on an aircraft flight](https://en.wikipedia.org/wiki/Orders_of_magnitude_(probability)). For the purposes of a benchmark, this is fine. If you use `crypto/rand` you can make it even more unlikely. – Schwern May 29 '18 at 18:52
  • That project was wrapped up a while ago, but I remember there was some requirement stopping me from using `Int63`, so I had to use `Int` (32-bit). With `int`(32), I always (for some reason) ran into duplicates even for 100K samples, which crashed the algorithm. A larger integer size surely helps dodge collisions, but it unnecessarily doubles the memory footprint of the tree. So I resorted to sampling a million unique numbers from the int32 range `[-2,147,483,648 to 2,147,483,647]` – cookieisaac May 31 '18 at 18:11
  • @cookieisaac Ahh, if you're restricted to just 32 bit numbers then yes, generating 1 million 32 bit numbers is practically guaranteed to produce a collision. There's about a 50% chance at about 77,000. – Schwern May 31 '18 at 18:38
0

I'm typing this on a phone, so please forgive the lack of code or incorrect formatting.

This is how I'd do it:

Generate the list of ordered unique numbers.

Choose any two random indices and swap their elements.

Continue swapping for a certain number of iterations.

The slice you're left with is your random unique list.

Notes:

It's simple and memory use is proportional to size

The list can be generated and randomised at any time, even pre-compile using go generate

When you want a number, you get the next element in the list.

You have full control of its properties.

Carl
0

You can generate a 12-digit number from UnixNano in Go's time package:

uniqueNumber:=time.Now().UnixNano()/(1<<22)
println(uniqueNumber)

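Note that this is a timestamp rather than a random number: the values only increase, and two calls within the same 2^22 ns (about 4 ms) window return the same value.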

benyamin