On 12/10/03 02:15:02, Robin H. Johnson wrote:
> On Tue, Dec 09, 2003 at 11:17:23PM +0100, John Nilsson wrote:
> > >That's the express purpose that genflags was created for, to provide
> > >users with a known good set of high-performance CFLAGS so they didn't
> > >need to mess around with it too much.
> > Still there is no room for improvement when dealing with system wide
> > optimization.
> Your wording here is unclear, I think you mean to say that there IS
> room for improvement over system wide constant CFLAGS?

English is not my native language. Writing this GLEP made me realize I
have problems expressing my thoughts in ANY language =). If this GLEP
is going to survive, comments on formulations and wording (and
spelling) are greatly appreciated.

Yes, I meant that it is next to impossible to find a system-wide
optimization beyond "-march=<arch> -O2 -pipe" that fits the majority of
users.

> > >strip-flags to remove problematic flags on a per ebuild basis is the
> > >best solution. I do agree that unstable gcc settings are a big
> > >problem, eg in a recent bug it turned out the submitter's system (an
> > >older Pentium I) couldn't handle -O3 without flaking out. Reduce it
> > >to -O2 and the box went fine (both for compiling and already compiled
> > >packages).
> > This is a bug in GCC. While a workaround may be a quick solution for
> > Gentoo, one shouldn't base the whole system on bugs.
> No it isn't a bug in GCC, it's a bug with the user's specific hardware.
> I have an older Pentium I system that runs just fine with -O3 and the
> user's specified CFLAGS. I didn't force everybody to use -O2, I just got
> that user to change his own system down to -O2.

I misunderstood you. Still, this is a bug in a specific CPU. You can't
guarantee stability in any case if the hardware is broken.

> > >again, genflags was created for this. I've considered a sequel to
> > >genflags based on the genetic optimization of compiler flags as
> > >mentioned on Slashdot a while ago, but for lack of time, I'm not even
> > >looking at doing it now.
> > You might want to check:
> > http://www.coyotegulch.com/potential/gccga/gccga.html
> This is the original item I was referencing, but you still run into the
> problem that you need to run things on a system basis to get effective
> results.

Yeah, I had the page open when I read your mail, so I thought I'd spare
you the trouble of looking it up =)

> http://www.coyotegulch.com/acovea/index.html is the rest of the article,
> > http://www.rocklinux.net/packages/ccbench.html
> This basically brute forces the genetic algorithms, with absolutely no
> thought as to the net effects on the results of the given flags, eg, on
> my home server (an AthlonXP 2400+), it returns these results:
> gcc -O3 -march=athlon -fomit-frame-pointer -funroll-loops
> -frerun-loop-opt -funroll-all-loops -fschedule-insns
>
> Of that, '-frerun-loop-opt' and '-fschedule-insns' are redundant as they
> are implied by -O3.
>
> -fomit-frame-pointer and I can't debug code properly anymore, and if I
> try to use -funroll-all-loops to compile mysql, even with its
> --with-low-memory option, gcc wants 600 MB of memory to compile its
> sql_yacc.cc.

I had the same reaction. ccbench was what made me realize that any kind
of system-wide optimization is only guesswork (and often bad guesswork).

> > I meant by evolution: the process of users submitting patches to improve
> > individual ebuilds.
> What improves the performance of a given application on one machine does
> NOT necessarily improve it on another machine.

True, but you would be in a much better position to test that than
we are now.

> Read the gcc manpage and see:
> -fprofile-arcs
> -fbranch-probabilities
> (also read http://gcc.gnu.org/news/profiledriven.html)
>
> Just adding these to ccbench doubles the amount of time taken to
> test (as you must compile with -fprofile-arcs, run, compile with
> -fbranch-probabilities, run again). It also provides some extremely
> interesting and varying results. The bubblesort test for example,
> improves between +15% and +300% depending on the other compiler flags.
> Towers of Hanoi goes from -20% to +50%.
>
> If users submitted _good_ non-interactive testcases for every ebuild, it
> wouldn't be difficult to apply -fprofile-arcs/branch-probabilities and or
> acovea to most packages at all, apart from the massive increase in
> compile time.

Couldn't one save the profile data in the Portage tree once a generic
use case was found?
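
To make the question concrete, here is a rough Python sketch of the
two-pass build described above (compile with -fprofile-arcs, run,
compile with -fbranch-probabilities, run again). The source file, binary
name, workload arguments and baseline flags are all made up for
illustration, and the have_profile shortcut simply assumes that stored
profile data could be reused as-is, which is exactly the open question.

```python
import shlex

# Example baseline only; any per-system baseline would do here.
BASE_CFLAGS = ("-march=athlon", "-O2", "-pipe")


def pgo_steps(source, binary, workload, base_cflags=BASE_CFLAGS,
              have_profile=False):
    """Return the commands of the two-pass build as argument lists.

    If have_profile is True, the instrumented build and the training
    run are skipped, on the (unverified) assumption that usable profile
    data has already been shipped alongside the source.
    """
    instrumented = ["gcc", *base_cflags, "-fprofile-arcs",
                    "-o", binary, source]
    optimized = ["gcc", *base_cflags, "-fbranch-probabilities",
                 "-o", binary, source]
    run = [binary, *workload]
    if have_profile:
        return [optimized, run]
    return [instrumented, run, optimized, run]


# Hypothetical test case: file name and workload arguments are invented.
steps = pgo_steps("bubblesort.c", "./bubblesort", ["--size", "10000"])
for cmd in steps:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

With stored profile data the list shrinks from four commands to two,
which is where the compile-time saving would come from, if the data
turns out to be reusable at all.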

> > >Stable and high-performance is a per-system definition, as evidenced
> > >by the bug I mentioned with -O3.
> > And should as such be fixed... in gcc. If gcc can't optimize correctly
> > knowing the cache size of the cpu, gcc is broken. Fix gcc.
> Again, it isn't a gcc bug, it's an issue with a specific machine (not
> even a class of systems or cpus).
>
> Let's take a tangent on this whole issue for a moment. Ignoring the
> implementation concerns, the end goal of your GLEP is this:
> The basic gain you want, is for the support of per-package CFLAG
> modifications (inside the ebuilds), for the purpose of performance
> optimization.
>
> Do I have this correct?

Yes, please ignore the implementation details; they were only provided as
an alternative example scenario and are very open for discussion =)

The goal is not the speed as such, but the testability of it. I want to
move from the current situation, where you have absolutely no knowledge
of the optimization results, to a situation where you would actually be
able to give evidence of improvements or the reverse.

Reusability of CFLAGS, if you wish =)
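
As an illustration of what "evidence" could look like, here is a small
Python sketch that times a non-interactive test case and reports each
set of flags relative to a baseline. It is only a sketch under my own
assumptions: the function names are invented, the median of a few runs
is an arbitrary choice, and the timings in the example are placeholder
numbers, not measurements.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3):
    """Median wall-clock seconds for running cmd (an argument list)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_change(baseline, candidate):
    """Percent speed change versus baseline; positive means faster."""
    return (baseline / candidate - 1.0) * 100.0


def report(timings, baseline_key):
    """Map each flag set to its percent change against the baseline."""
    base = timings[baseline_key]
    return {flags: round(relative_change(base, seconds), 1)
            for flags, seconds in timings.items()}


# Placeholder timings in seconds, NOT real measurements. In practice
# each value would come from time_command() on a binary built with
# the given flags.
example = {"-O2": 2.0, "-O3 -funroll-loops": 1.6}
print(report(example, "-O2"))
```

A table like this, submitted along with a per-package flag change, is
the kind of thing that could be checked on another machine rather than
taken on faith.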

/John