Gentoo Archives: gentoo-user

From:	Kai Krakow <hurikhan77@×××××.com>
To:	gentoo-user@l.g.o
Subject:	[gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG)
Date:	Sun, 29 Jun 2014 12:46:33
Message-Id:	`rvd58b-icj.ln1@hurikhan77.spdns.de`
In Reply to:	Re: [gentoo-user] Re: Re: OT: Mapping random numbers (PRNG) by Matti Nykyri

1	Matti Nykyri <matti.nykyri@×××.fi> schrieb:
2
3	> On Jun 29, 2014, at 0:28, Kai Krakow <hurikhan77@×××××.com> wrote:
4	>>
5	>> Matti Nykyri <matti.nykyri@×××.fi> schrieb:
6	>>
7	>>>> On Jun 27, 2014, at 0:00, Kai Krakow <hurikhan77@×××××.com> wrote:
8	>>>>
9	>>>> Matti Nykyri <Matti.Nykyri@×××.fi> schrieb:
10	>>>>
11	>>>>> If you are looking a mathematically perfect solution there is a simple
12	>>>>> one even if your list is not in the power of 2! Take 6 bits at a time
13	>>>>> of the random data. If the result is 62 or 63 you will discard the
14	>>>>> data and get the next 6 bits. This selectively modifies the random
15	>>>>> data but keeps the probabilities in correct balance. Now the
16	>>>>> probability for index of 0-61 is 1/62 because the probability to get
17	>>>>> 62-63 out of 64 if 0.
18	>>>>
19	>>>> Why not do just something like this?
20	>>>>
21	>>>> index = 0;
22	>>>> while (true) {
23	>>>> index = (index + get_6bit_random()) % 62;
24	>>>> output << char_array[index];
25	>>>> }
26	>>>>
27	>>>> Done, no bits wasted. Should have perfect distribution also. We also
28	>>>> don't have to throw away random data just to stay within unaligned
29	>>>> boundaries. The unalignment is being taken over into the next loop so
30	>>>> the "error" corrects itself over time (it becomes distributed over the
31	>>>> whole set).
32	>>>
33	>>> Distribution will not be perfect. The same original problem persists.
34	>>> Probability for index 0 to 1 will be 2/64 and for 2 to 61 it will be
35	>>> 1/64. Now the addition changes this so that index 0 to 1 reflects to
36	>>> previous character and not the original index.
37	>>>
38	>>> The distribution of like 10GB of data should be quite even but not on a
39	>>> small scale. The next char will depend on previous char. It is 100% more
40	>>> likely that the next char is the same or one index above the previous
41	>>> char then any of the other ones in the series. So it is likely that you
42	>>> will have long sets of same character.
43	>>
44	>> I cannot follow your reasoning here - but I'd like to learn. Actually, I
45	>> ran this multiple times and never saw long sets of the same character,
46	>> even no short sets of the same character. The 0 or 1 is always rolled
47	>> over into the next random addition. I would only get sets of the same
48	>> character if rand() returned zero multiple times after each other - which
49	>> wouldn't be really random. ;-)
50	>
51	> In your example that isn't true. You will get the same character if 6bit
52	> random number is 0 or if it is 62! This is what makes the flaw!
53	>
54	> You will also get the next character if random number is 1 or 63.
55	>
56	> That is why the possibility for 0 and 1 (after modulo 62) is twice as
57	> large compared to all other values (2-61).
58
59	Ah, now I get it.
60
61	> By definition random means that the probability for every value should be
62	> the same. So if you have 62 options and even distribution of probability
63	> the probability for each of them is 1/62.
64
65	Still, the increased probability for single elements should hit different
66	elements each time. So for large sets it will distribute - however, I now
67	get why it's not completely random by definition.
68
69	>> In my tests I counted how ofter new_index > index and new_index < index,
70	>> and it had a clear bias for the first. So I added swapping of the
71	>> selected index with offset=0 in the set. Now the characters will be
72	>> swapped and start to distribute that flaw. The distribution, however,
73	>> didn't change.
74	>
75	> Try counting how of often new_index = index and new_index = (index + 1) %
76	> 62 and new_index = (index + 2) % 62. With your algorithm the last one
77	> should be significantly less then the first two in large sample.
78
79	I will try that. It looks like a good approach.
80
81	--
82	Replies to list only preferred.

Replies

Subject	Author
Re: [gentoo-user] Re: Re: Re: OT: Mapping random numbers (PRNG)	Matti Nykyri <Matti.Nykyri@×××.fi>

Report Message

Find on MARC Find on Google Groups