Gentoo Archives: gentoo-user

From: Matti Nykyri <matti.nykyri@×××.fi>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG)
Date: Sun, 29 Jun 2014 06:25:39
Message-Id: BED5D48C-BEBE-4F26-8843-0897E93FB67F@iki.fi
In Reply to: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) by Kai Krakow
1 On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@×××××.com> wrote:
2 >
3 > Matti Nykyri <matti.nykyri@×××.fi> schrieb:
4 >
5 >>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@×××××.com> wrote:
6 >>>
7 >>> Matti Nykyri <Matti.Nykyri@×××.fi> schrieb:
8 >>>
9 >>>> If you are looking a mathematically perfect solution there is a simple
10 >>>> one even if your list is not in the power of 2! Take 6 bits at a time of
11 >>>> the random data. If the result is 62 or 63 you will discard the data and
12 >>>> get the next 6 bits. This selectively modifies the random data but keeps
13 >>>> the probabilities in correct balance. Now the probability for index of
14 >>>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
15 >>>
16 >>> Why not do just something like this?
17 >>>
18 >>> index = 0;
19 >>> while (true) {
20 >>> index = (index + get_6bit_random()) % 62;
21 >>> output << char_array[index];
22 >>> }
23 >>>
24 >>> Done, no bits wasted. Should have perfect distribution also. We also
25 >>> don't have to throw away random data just to stay within unaligned
26 >>> boundaries. The unalignment is being taken over into the next loop so the
27 >>> "error" corrects itself over time (it becomes distributed over the whole
28 >>> set).
29 >>
30 >> Distribution will not be perfect. The same original problem persists.
31 >> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be 1/64.
32 >> Now the addition changes this so that index 0 to 1 reflects to previous
33 >> character and not the original index.
34 >>
35 >> The distribution of like 10GB of data should be quite even but not on a
36 >> small scale. The next char will depend on previous char. It is 100% more
37 >> likely that the next char is the same or one index above the previous char
38 >> then any of the other ones in the series. So it is likely that you will
39 >> have long sets of same character.
40 >
41 > I cannot follow your reasoning here - but I'd like to learn. Actually, I ran
42 > this multiple times and never saw long sets of the same character, even no
43 > short sets of the same character. The 0 or 1 is always rolled over into the
44 > next random addition. I would only get sets of the same character if rand()
45 > returned zero multiple times after each other - which wouldn't be really
46 > random. ;-)
47
48 In your example that isn't true. You will get the same character if 6bit random number is 0 or if it is 62! This is what makes the flaw!
49
50 You will also get the next character if random number is 1 or 63.
51
52 That is why the possibility for 0 and 1 (after modulo 62) is twice as large compared to all other values (2-61).
53
54 By definition random means that the probability for every value should be the same. So if you have 62 options and even distribution of probability the probability for each of them is 1/62.
55
56 > Keep in mind: The last index will be reused whenever you'd enter the
57 > function - it won't reset to zero. But still that primitive implementation
58 > had a flaw: It will tend to select characters beyond the current offset, if
59 > it is >= 1/2 into the complete set, otherwise it will prefer selecting
60 > characters before the offset.
61
62 If you modify the sequence so that if looks random it is pseudo random.
63
64 > In my tests I counted how ofter new_index > index and new_index < index, and
65 > it had a clear bias for the first. So I added swapping of the selected index
66 > with offset=0 in the set. Now the characters will be swapped and start to
67 > distribute that flaw. The distribution, however, didn't change.
68
69 Try counting how of often new_index = index and new_index = (index + 1) % 62 and new_index = (index + 2) % 62. With your algorithm the last one should be significantly less then the first two in large sample.
70
71 > Of course I'm no mathematician, I don't know how I'd calculate the
72 > probabilities for my implementation because it is sort of a recursive
73 > function (for get_rand()) when looking at it over time:
74 >
75 > int get_rand() {
76 > static int index = 0;
77 > return (index = (index + get_6bit_rand()) % 62);
78 > }
79 >
80 > char get_char() {
81 > int index = get_rand();
82 > char tmp = chars[index];
83 > chars[index] = chars[0];
84 > return (chars[0] = tmp);
85 > }
86 >
87 > However, get_char() should return evenly distributes results.
88 >
89 > What this shows, is, that while distribution is even among the result set,
90 > the implementation may still be flawed because results could be predictable
91 > for a subset of results. Or in other words: Simply looking at the
92 > distribution of results is not an indicator for randomness. I could change
93 > get_rand() in the following way:
94 >
95 > int get_rand() {
96 > static int index = 0;
97 > return (index = (index + 1) % 62);
98 > }
99 >
100 > Results would be distributed even, but clearly it is not random.
101 >
102 > --
103 > Replies to list only preferred.
104 >
105 >

Replies

Subject Author
[gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG) Kai Krakow <hurikhan77@×××××.com>