On Tue, Apr 25, 2006 at 09:28:47PM -0700, Duncan wrote:

> One, the 3.x series simply wasn't designed with AMD64 as a major arch; the
> support for AMD64 in 3.x was in many ways simply bolted on, and it showed.
> GCC3 simply wasn't designed to be able to take full advantage of the
> optimizations possible for AMD64, as opposed to x86. The rewrite for 4.0
> was done with AMD64 in mind, and much better optimizations became possible.

As I figured after reading some of the optimization tree stuff. I can imagine
there were a bunch of AMD64 optimizations coded as a mish-mash of conditional
statements embedded in the existing x86 code. No good.

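A toy sketch in C of the contrast being described, for illustration only. This
is not GCC's actual code: the enum, the struct, and both function names are
made up. The only facts relied on are that x86 has 8 general-purpose registers
and 4-byte pointers, while AMD64 has 16 and 8-byte pointers.

```c
/* Hypothetical targets; the names are illustrative, not GCC's. */
enum target { TARGET_X86, TARGET_AMD64 };

/* "Bolted on" style: shared code starts from the x86 answer and
 * patches in the newer target with a conditional. Every routine
 * that cares about the target grows another branch like this. */
int gp_registers_bolted_on(enum target t)
{
    int n = 8;              /* x86 baseline */
    if (t == TARGET_AMD64)
        n = 16;             /* special case added afterwards */
    return n;
}

/* Modular style: each target describes itself in one table entry,
 * and shared code only ever reads the description. */
struct target_desc {
    const char *name;
    int gp_registers;
    int pointer_bytes;
};

static const struct target_desc targets[] = {
    [TARGET_X86]   = { "x86",    8, 4 },
    [TARGET_AMD64] = { "amd64", 16, 8 },
};

int gp_registers_modular(enum target t)
{
    return targets[t].gp_registers;
}
```

Both functions return the same numbers. The difference is where a new target
gets added: in the first style, inside every function that branches on the
target; in the second, as one more row in the table.
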
> Two, the reorganization for 4.x gave GCC a much better organized and more
> modular hierarchy in general -- one where it is possible to optimize to
> greater efficiency because all that spaghetti code that was 3.x is gone,
> and it's now far easier for dependent optimizations to be made up and down
> the hierarchical chain without risking a serious miscompile regression due
> to all the spaghetti code that 3.x had become. That's of course across all
> archs, but it made optimizing for AMD64 that much easier, as it was no
> longer treated as a special case of x86 in terms of branches off that
> spaghetti code. (IOW, there will be improvements for x86 as well, but
> they won't be quite as dramatic, both because it was already quite
> optimized, and because it was designed in as a major target, while AMD64
> was bolted on, from the GCC3 perspective.)
>
> Of course (and this is point three) the goal for 4.0 was simply to clean
> out the spaghetti code and get the rewrite and new framework in place with
> as few regressions (both in optimization and in downright miscompiles) as
> possible. As such, it didn't advance the concept or optimization much
> anyway, because that wasn't the goal, and any such changes introduced for
> 4.0 just complicated the verification process, in terms of ensuring there
> were no serious regressions, which /was/ the goal. In that regard, 4.1 is
> the 4.x series finally coming into its own. The improvements made
> possible by the overall rearchitecting in 4.0 finally begin to appear in
> 4.1. The promise of 4.x is now delivered.
>
> Together, those three points mean a HUGE step for GCC's AMD64 support, 3.x
> to 4.1.x. It's the first time it has been possible, and the differences
> really /are/ noticeable.

I'm actually thinking more along the lines of the Cell processor and similar
technologies. My general notion is that the hardware designers want to shift
some of the burden of "keeping up with Moore's law" off to the compilers. In
that light it'd make sense that the GCC developers would want to shift
optimizations higher up in the analysis of source.

> (Recall my earlier posting to the effect that xorg's composite rendering,
> with xorg-7.0 (modular-X), as compiled by gcc-4.1, is actually practical
> now -- it doesn't slow down the system to the point of unusability. BTW,
> while I'm not running xorg-7.1 due to stability issues this early in the
> release cycle, I played with it a bit, and the improvements to EXA, to the
> point that it can replace XAA, are dramatic! Configuring 2D rendering to
> use EXA on xorg-7.1, there is now virtually /zero/, that's right, /zero/
> additional CPU cost to turning composite on! I was literally ASTOUNDED!
> I couldn't have imagined it possible! The significance in terms of
> bringing transparency etc. to the X desktop is tremendous! I had
> thought that there'd always be an additional cost, and that only those
> with the latest video cards (and slaveryware drivers) and just-introduced
> CPUs would be able to run with the bells and whistles turned
> on, and that we'd have to grow into it, but I was apparently and happily
> very very wrong! At least for those with Radeon 92xx series cards -- I've
> a 9250 -- even running merged framebuffer with dual 1600x1200 monitors,
> the thing had such a low CPU cost that I literally couldn't
> tell the difference, either in responsiveness or in the CPU activity
> graphs, between composite with all the goodies on, and composite toggled
> off altogether. As I said, I couldn't have dreamed that was technically
> possible! Of course, that's compiling with gcc-4.1.0. How it works when
> compiled with 3.4.6, I really don't know, nor am I eager to personally
> find out, tho I'm certainly open to reading the experiences of others.)

I can definitely believe it. I saw that video of those two developers giving
the "wobbly window" demonstration. They were using some old Thinkpads as their
development platform. I think one of them had crummy on-board graphics, barely
supporting 3D, with 32MB of GPU RAM. It's funny how Microsoft is pounding the
IT sector with information about Vista Capable and the like, drumming up PC
sales while xorg goes GL and proves you don't need a Vista Capable sticker to
get the same sort of eye-candy effects. What kind of programmers do they have
over at Redmond anyway?!

> Back to GCC. Looking forward, I see a number of additional significant
> improvements marked out for gcc 4.2 and 4.3. With the now clean code and
> modular framework of 4.x, its promise of making additional optimizations
> (and compiling speed improvements, let's not forget them) possible
> continues to be delivered. However, from 4.1, the improvements for AMD64
> will probably simply be incremental once again, because 4.1 is where a
> reasonably optimized gcc for amd64 was finally delivered. It's the giant
> step. Beyond that, improvements will continue, but should be much smaller
> in comparison.

You've inspired me. I'll see about getting gcc 4.1 running on my laptop AND
finally learning how to correctly slot packages. To think I've gone this long
without properly reading the documentation. *DUCK*

Thanks for all the details, Duncan. I appreciate it.

> As for specific CFLAGS/CXXFLAGS, I posted mine with a fairly detailed
> explanation of why I chose them, probably about a month to six weeks ago
> (as a follow-on to that xorg 7.0 post mentioned above). I'd suggest looking
> it up in the archives if you want the details, and the bit of further
> discussion that followed. I'll repeat here briefly.

Again, thank you.

Brandon Edens