1 |
On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote: |
2 |
> On Thu, Jun 5, 2014 at 9:56 PM, <meino.cramer@×××.de> wrote: |
3 |
> > Hi, |
4 |
> > |
5 |
> > I am experimenting with the C code of the ISAAC pseudo random number generator |
6 |
> > (http://burtleburtle.net/bob/rand/isaacafa.html). |
7 |
> > |
8 |
> > Currently the implementation creates (on my embedded linux) 32 bit |
9 |
> > hexadecimal output. |
10 |
> |
11 |
> So it's a 32 bit integer. |
12 |
> |
13 |
> > From this I want to create random numbers in the range of [a-Za-z0-9] |
14 |
> > *without violating randomness* and (if possible) without throwing |
15 |
> > away bits of the output. |
16 |
> |
17 |
> You mean *characters* int the range [A-Za-z0-9]? |
18 |
|
19 |
Well this isn't as simple problem as it sounds. A random 32 bit integer |
20 |
has 32 bits of randomness. If you take a divison reminder of 62 from this |
21 |
integer you will get only 5,95419631039 bits of randomness |
22 |
(log(62)/log(2)). So you are wasting 81,4% of your random data. Which is |
23 |
quite much and usually random data is quite expensive. You can save your |
24 |
precious random data by taking only 6 bit from your 32 bit integer and |
25 |
dividing it by 62. Then you will be wasting only 0,8% of random data. |
26 |
Another problem is alignment, but that is about mathematical correctness. |
27 |
|
28 |
> > How can I do this mathemtically (in concern of the quality of output) |
29 |
> > correct? |
30 |
> |
31 |
> The easiest thing to do would be: |
32 |
|
33 |
The easiest is not mathematically correct though. Random data will stay |
34 |
random only if you select and modify it so that randomness is preserved. |
35 |
If you take devison reminder of 62 from 32 bit integer there are 69 273 |
36 |
667 possibilities of the reminder to be 3 or less. For the reminder to 4 |
37 |
or more the number of possibilities is 69 273 666. In mathematically |
38 |
ideal case the probability for every index of the list should be same: |
39 |
1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for |
40 |
index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for |
41 |
indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%. |
42 |
|
43 |
If you wish not to waste those random bits the probabilities will get |
44 |
worse. With 6 bits of random the probability for index 0-1 will be 2/64 |
45 |
and for 2-63 it will be 1/64. This is a very significant change because |
46 |
first and second index will appear twice as much as the rest. If you add |
47 |
2 characters to your list you will perfect alignment and you can take 6 |
48 |
bits of data without it modifying probabilities. |
49 |
|
50 |
If you are looking a mathematically perfect solution there is a simple |
51 |
one even if your list is not in the power of 2! Take 6 bits at a time of |
52 |
the random data. If the result is 62 or 63 you will discard the data and |
53 |
get the next 6 bits. This selectively modifies the random data but keeps |
54 |
the probabilities in correct balance. Now the probability for index of |
55 |
0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. |
56 |
|
57 |
> ------------------------------------------------------------------------------- |
58 |
> #include <time.h> |
59 |
> #include <stdio.h> |
60 |
> #include <stdlib.h> |
61 |
> |
62 |
> #define N (26+26+10) |
63 |
> |
64 |
> static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', |
65 |
> 'K', 'L', 'M', |
66 |
> 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', |
67 |
> 'X', 'Y', 'Z', |
68 |
> 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', |
69 |
> 'k', 'l', 'm', |
70 |
> 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', |
71 |
> 'x', 'y', 'z', |
72 |
> '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; |
73 |
> |
74 |
> int |
75 |
> next_character() |
76 |
> { |
77 |
> // Use the correct call for ISAAC instead of rand() |
78 |
> unsigned int idx = rand() % N; |
79 |
> return S[idx]; |
80 |
> } |
81 |
|
82 |
so modify the next_char function: |
83 |
|
84 |
char next_character() |
85 |
{ |
86 |
static unsigned int rand = 0; //(sizeof(int) = 32) |
87 |
static char bit_avail = 0; |
88 |
char result = 0; |
89 |
char move_bits = 0; |
90 |
char bits_moved = 0; |
91 |
|
92 |
do { |
93 |
if (!bits_avail) { |
94 |
// Use the correct call for ISAAC instead of rand() |
95 |
rand = rand(); |
96 |
|
97 |
bit_avail = 32; |
98 |
} |
99 |
|
100 |
move_bits = bits_avail >= 6 ? 6 : bits_avail; |
101 |
result <<= move_bits; |
102 |
result = (result | rand & (0xFF >> (8 - move_bits))) & 0x3F; |
103 |
bits_avail -= move_bits; |
104 |
bits_moved += move_bits; |
105 |
rand >>= move_bits; |
106 |
|
107 |
} while (bits_moved != 6 && result > 61); |
108 |
|
109 |
return result; |
110 |
} |
111 |
|
112 |
This function will give perfect distribution of 1/62 probability for |
113 |
every index. It will waste 6 bits with the probability of 1/32 (2/64). |
114 |
|
115 |
> int |
116 |
> main(int argc, char* argv[]) |
117 |
> { |
118 |
> // Use the correct call for initializing the ISAAC seed |
119 |
> srand((unsigned int)time(NULL)); |
120 |
> for (int i = 0; i < 20; i++) // --std=c99 |
121 |
> printf("%c\n", next_character()); |
122 |
> return 0; |
123 |
> } |
124 |
> ------------------------------------------------------------------------------- |
125 |
> |
126 |
> If the ISAAC RNG has a good distribution, then the next_character() |
127 |
> function will give a good distribution among the set [A-Za-z0-9]. |
128 |
> |
129 |
> Unless I missunderstood what you meant with "create random numbers in |
130 |
> the range of [a-Za-z0-9]". |
131 |
|
132 |
-- |
133 |
-Matti |