Gentoo Archives: gentoo-user

From: Matti Nykyri <matti.nykyri@×××.fi>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG)
Date: Sat, 28 Jun 2014 13:08:27
Message-Id: 8BE9EEAF-9909-489B-9219-5EDA6D7224C0@iki.fi
In Reply to: Re: [gentoo-user] Re: OT: Mapping random numbers (PRNG) by Matti Nykyri
1 On Jun 28, 2014, at 0:13, Matti Nykyri <matti.nykyri@×××.fi> wrote:
2
3 >> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@×××××.com> wrote:
4 >>
5 >> Matti Nykyri <Matti.Nykyri@×××.fi> schrieb:
6 >>
7 >>> If you are looking a mathematically perfect solution there is a simple
8 >>> one even if your list is not in the power of 2! Take 6 bits at a time of
9 >>> the random data. If the result is 62 or 63 you will discard the data and
10 >>> get the next 6 bits. This selectively modifies the random data but keeps
11 >>> the probabilities in correct balance. Now the probability for index of
12 >>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
13 >>
14 >> Why not do just something like this?
15 >>
16 >> index = 0;
17 >> while (true) {
18 >> index = (index + get_6bit_random()) % 62;
19 >> output << char_array[index];
20 >> }
21 >>
22 >> Done, no bits wasted. Should have perfect distribution also. We also don't
23 >> have to throw away random data just to stay within unaligned boundaries. The
24 >> unalignment is being taken over into the next loop so the "error" corrects
25 >> itself over time (it becomes distributed over the whole set).
26 >
27 > Distribution will not be perfect. The same original problem persists. Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64. Now the addition changes this so that index 0 to 1 reflects to previous character and not the original index.
28 >
29 > The distribution of like 10GB of data should be quite even but not on a small scale. The next char will depend on previous char. It is 100% more likely that the next char is the same or one index above the previous char then any of the other ones in the series. So it is likely that you will have long sets of same character.
30 >
31 > Random means that for next char the probability is always even, 1/62. And like mentioned in Dilbert it is impossible to say that something is random but possible to say that it isn't.
32 >
33 > If wasting 6bit of data seems large, do this:
34 >
35 > index = get_6bit_random();
36 > while (index > 61) {
37 > index <<= 1;
38 > index |= get_1bit_random();
39 > index &= 0x3F;
40 > }
41 > return index;
42 >
43 > It will waste 1 bit at a time until result is less than 62. This will slightly change probabilities though :/
44
45 Sorry this example is really flawed :( If next6bit is over 61 there are only two possible values for it: 62 or 63 -> that is 0x3E and 0x3F. So you see that only one bit changes. But that bit is random! So least significant bit is random and does not need to be discarded :)
46
47 index = get_6bit_random();
48 while (index > 61) {
49 index <<= 5;
50 index |= get_5bit_random();
51 index &= 0x3F;
52 }
53 return index;