Having 16 (well, actually only 12 according to the SBCL hackers) general-purpose
registers on AMD64 versus x86's 8 GPRs is reason enough for me to run AMD64.

I believe the major performance bottleneck in modern computers is memory access
latency. Not just RAM, by the way; I'm talking about the whole memory hierarchy:
register -> cache -> RAM -> hard drive
(ordered by speed of access: leftmost is fastest, rightmost is slowest)

Processor calculation speeds have improved at a faster rate than memory access
speeds. Your processor is essentially just sitting there idling (starved) as it
waits for instructions and data to be fed to it. Having more registers means you
have a larger scratch-pad to perform calculations on. Suppose you had 2
registers and wanted to compute (w + x) - (y + z). You'd do something like this:

; (w + x)
load w to register1
load x to register2
add register1 and register2, then store the result in register1
save register1 to memory cell ABCD

; (y + z)
load y to register1
load z to register2
add register1 and register2, then store the result in register1

; (w + x) - (y + z)
load the contents of memory cell ABCD to register2
subtract register1 from register2, then store the result in register1

So if I've done everything correctly, I now have the result in register1. Now
suppose you had 8 registers instead of just 2. You could then do something like
the following:

load w to register1
load x to register2
load y to register3
load z to register4
add register1 and register2, store the result in register5
add register3 and register4, store the result in register6
subtract register6 from register5, store the result in register7

and we still have one register left unused.

The first example?
5 loads
1 save

The second example?
4 loads
0 saves

Even though the additional save and load are going to cache memory (hopefully),
they are still a performance penalty, and those penalties add up.

This is a trivial example, but it might give you an idea of why registers are
important.

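To make the counting concrete, here is a minimal Python sketch of a toy
load/store machine that replays the two sequences above and counts memory
accesses. Everything in it (the `run` function, the op names, the register
names r1..r7, and the sample values) is made up for illustration; it says
nothing about real x86 or AMD64 instructions or their timing.

```python
"""Toy register machine: count memory traffic for (w + x) - (y + z)."""


def run(program, memory):
    """Execute (op, *args) tuples; return (registers, loads, stores)."""
    regs = {}
    loads = stores = 0
    for op, *args in program:
        if op == "load":      # load <memory cell> into <register>
            cell, dst = args
            regs[dst] = memory[cell]
            loads += 1
        elif op == "save":    # save <register> to <memory cell>
            src, cell = args
            memory[cell] = regs[src]
            stores += 1
        elif op == "add":     # dst = a + b (registers only)
            a, b, dst = args
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":     # dst = a - b (registers only)
            a, b, dst = args
            regs[dst] = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, loads, stores


# First example: only 2 registers, so (w + x) is spilled to cell ABCD.
TWO_REGISTERS = [
    ("load", "w", "r1"),
    ("load", "x", "r2"),
    ("add", "r1", "r2", "r1"),
    ("save", "r1", "ABCD"),
    ("load", "y", "r1"),
    ("load", "z", "r2"),
    ("add", "r1", "r2", "r1"),
    ("load", "ABCD", "r2"),
    ("sub", "r2", "r1", "r1"),   # result ends up in r1
]

# Second example: enough registers to keep every intermediate value.
EIGHT_REGISTERS = [
    ("load", "w", "r1"),
    ("load", "x", "r2"),
    ("load", "y", "r3"),
    ("load", "z", "r4"),
    ("add", "r1", "r2", "r5"),
    ("add", "r3", "r4", "r6"),
    ("sub", "r5", "r6", "r7"),   # result ends up in r7
]

if __name__ == "__main__":
    values = {"w": 10, "x": 5, "y": 3, "z": 4}   # arbitrary sample inputs
    for name, program, result_reg in (
        ("2 registers", TWO_REGISTERS, "r1"),
        ("8 registers", EIGHT_REGISTERS, "r7"),
    ):
        # Copy the inputs so the spill in one run cannot leak into the next.
        regs, loads, stores = run(program, dict(values))
        print(f"{name}: result={regs[result_reg]} loads={loads} stores={stores}")
```

Both programs compute the same value; the only difference the sketch reports
is the extra load and save that the 2-register version needs for the spill.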
So to continue, I'd thought for a while that AMD64 running in 32-bit mode might
be using virtual registers or register windowing, so that in 32-bit mode all 16
GPRs would still be used by processes competing for the CPU, i.e. process 1
uses registers 0-7 and process 2 uses registers 8-15. Sadly, this is not the
case.

So knowing all this, you make the call. I'd use AMD64 because it is the future.
Well, that and because they could always use additional testers.

When 64-bit Flash and proprietary codecs are released, this will hopefully all
be irrelevant.

Brandon Edens

On Mon, Apr 24, 2006 at 12:12:08PM -0300, Allan Spagnol Comar wrote:

> Hi all, thanks for all the advice!
>
> I will try running some 32-bit binary programs (until now I was using
> just source programs).
>
> I got curious about chroot environments. Is there any literature that I
> can find (like manuals and samples)? I am a newbie when it comes to cross
> or dual platforms.
>
> Duncan, you really are a little radical, but I like some of your
> thoughts. One thing that makes me curious is what the advantages of
> using Gentoo x86 on an AMD64 would be, since I would not be able to
> make optimizations with compilation flags?
>
> Thanks again, Allan
> --
> An application asked:
> "Requires Windows 9x, NT4 or better",
> so I've installed Linux
>
> --
> gentoo-amd64@g.o mailing list
>