Gentoo Archives: gentoo-user

From: Kai Krakow <hurikhan77@×××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG)
Date: Sun, 29 Jun 2014 12:46:33
Message-Id: rvd58b-icj.ln1@hurikhan77.spdns.de
In Reply to: Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) by Matti Nykyri
1 Matti Nykyri <matti.nykyri@×××.fi> schrieb:
2
3 > On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@×××××.com> wrote:
4 >>
5 >> Matti Nykyri <matti.nykyri@×××.fi> schrieb:
6 >>
7 >>>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@×××××.com> wrote:
8 >>>>
9 >>>> Matti Nykyri <Matti.Nykyri@×××.fi> schrieb:
10 >>>>
11 >>>>> If you are looking a mathematically perfect solution there is a simple
12 >>>>> one even if your list is not in the power of 2! Take 6 bits at a time
13 >>>>> of the random data. If the result is 62 or 63 you will discard the
14 >>>>> data and get the next 6 bits. This selectively modifies the random
15 >>>>> data but keeps the probabilities in correct balance. Now the
16 >>>>> probability for index of 0-61 is 1/62 because the probability to get
17 >>>>> 62-63 out of 64 if 0.
18 >>>>
19 >>>> Why not do just something like this?
20 >>>>
21 >>>> index = 0;
22 >>>> while (true) {
23 >>>> index = (index + get_6bit_random()) % 62;
24 >>>> output << char_array[index];
25 >>>> }
26 >>>>
27 >>>> Done, no bits wasted. Should have perfect distribution also. We also
28 >>>> don't have to throw away random data just to stay within unaligned
29 >>>> boundaries. The unalignment is being taken over into the next loop so
30 >>>> the "error" corrects itself over time (it becomes distributed over the
31 >>>> whole set).
32 >>>
33 >>> Distribution will not be perfect. The same original problem persists.
34 >>> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be
35 >>> 1/64. Now the addition changes this so that index 0 to 1 reflects to
36 >>> previous character and not the original index.
37 >>>
38 >>> The distribution of like 10GB of data should be quite even but not on a
39 >>> small scale. The next char will depend on previous char. It is 100% more
40 >>> likely that the next char is the same or one index above the previous
41 >>> char then any of the other ones in the series. So it is likely that you
42 >>> will have long sets of same character.
43 >>
44 >> I cannot follow your reasoning here - but I'd like to learn. Actually, I
45 >> ran this multiple times and never saw long sets of the same character,
46 >> even no short sets of the same character. The 0 or 1 is always rolled
47 >> over into the next random addition. I would only get sets of the same
48 >> character if rand() returned zero multiple times after each other - which
49 >> wouldn't be really random. ;-)
50 >
51 > In your example that isn't true. You will get the same character if 6bit
52 > random number is 0 or if it is 62! This is what makes the flaw!
53 >
54 > You will also get the next character if random number is 1 or 63.
55 >
56 > That is why the possibility for 0 and 1 (after modulo 62) is twice as
57 > large compared to all other values (2-61).
58
59 Ah, now I get it.
60
61 > By definition random means that the probability for every value should be
62 > the same. So if you have 62 options and even distribution of probability
63 > the probability for each of them is 1/62.
64
65 Still, the increased probability for single elements should hit different
66 elements each time. So for large sets it will distribute - however, I now
67 get why it's not completely random by definition.
68
69 >> In my tests I counted how ofter new_index > index and new_index < index,
70 >> and it had a clear bias for the first. So I added swapping of the
71 >> selected index with offset=0 in the set. Now the characters will be
72 >> swapped and start to distribute that flaw. The distribution, however,
73 >> didn't change.
74 >
75 > Try counting how of often new_index = index and new_index = (index + 1) %
76 > 62 and new_index = (index + 2) % 62. With your algorithm the last one
77 > should be significantly less then the first two in large sample.
78
79 I will try that. It looks like a good approach.
80
81 --
82 Replies to list only preferred.

Replies

Subject Author
Re: [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG) Matti Nykyri <Matti.Nykyri@×××.fi>