A good reply - answer point by point

On Thu, 2003-08-14 at 20:30, Chris Gianelloni wrote:
> On Wed, 2003-08-13 at 18:49, William Kenworthy wrote:

> Great. I read the article and found no mention of the USE flags
> employed. I think you should have honestly posted any information on
> things you changed.
>
No longer available, but I don't think it's relevant to the tests we
did. Interesting for some, maybe, but not a performance issue.
This was originally set up as a simple test of three users relatively
new to the distro installing for the first time; i.e., if I moved from
RH to gentoo, where would I start? Read through make.conf, which is
where the docs say to go, set the system up that way, and so on. Given
both the time limits and this approach, extensive tuning and changes
that are not part of the initial install were not done. USE flags are
for turning package options on and off, not for tuning performance
(well, at least not directly), so I just used the minimum to get the
install working for the tests. Superficially, this should be faster,
as it wouldn't install too much extra.

> > 2. gentoo-sources 2.4.20 was used - Mandrake came with a newer
> > kernel than gentoo's recommended one (still does), and debian was a
> > dog's breakfast because stable is so old. We actually tried to put
> > the gentoo kernel on mandrake/debian when tracking down the IDE
> > cable problem, but it got too hard - not the way some posts tried
> > to imply.
>
> Were preemption and low latency turned on? Was the kernel compiled with
> the >gcc31 selection for the CPU? Better yet, why not post the .config
> from the 3 kernels?
They were on; the configs are no longer available, other than some
notes I took at the time. Again, not really relevant in the original
context of the test - next time, as it seems people are interested.
>
> > 3. optimisations were EXACTLY as recommended by the make.conf
> > entries, which were supported by the cflags from the forum for this
> > cpu: a 2G celery (P4-based core). I am not sure now, but I believe
> > I ran prelink as well (to match mandrake) - need to find and check
> > the notes.
> > 4. Gnumeric's problems have been identified and come down to the
> > particular version - it was fixed in the upcoming stable release
> > even before this was found, but the project was unaware that what
> > they believed was a slightly slower mod in this version could be so
> > bad on particular data sets - i.e., 30-odd mins in 1.0.13, but less
> > than 30s in 1.0.19 on my laptop.
>
> I hope you only used optimizations listed in the forums for the actual
> version of GCC you're running. From the sounds of it, you did not,
> since you used pentium3 and the pentium4 problems were fixed in the
> most recent stable GCC.
Not fixed in the version at the time of the tests. Also, in my current
make.conf there is still that huge all-capitals warning saying don't
use pentium4 - nothing about any safe gcc version for the P4.

> You also should have definitely used a "default"
> Gentoo install with no changes made. The default profile setup would
> have been used instead. Your optimizations could have been researched
> from GCC rather than taking the word of a bunch of "armchair compiler
> experts" on the forums. No offense meant to anyone, but you mention
> below that you do much scientific work, yet followed a very poor
> scientific model and research documentation for this article, which is
> why it has been torn apart so adamantly. Had you given out all of the
> information, even if it were simply links to the files from within the
> article, it would have given your article much more credibility.
>
I actually did quite a lot of looking at this. The flags used are a bit
different to what I use on my own pentiums and athlons (I use
-fomit-frame-pointer, for instance) - we limited ourselves to what the
user would see reading make.conf, as suggested in the documents.
I would make the point that this is not an exhaustive, no-holds-barred
competition. The criteria could be described as: "This was originally
set up as a simple test of 3 relatively new to the distro users
installing for the first time. i.e., if I moved from RH to gentoo,
where would I start: read through make.conf which is where the docs say
to go and set the system up that way and so on."

I agree it is not a "scientific test" - it was not meant to be - but a
simple "this one looks faster than that one when I do the same work I
do every day" comparison, not a special performance suite. We are not
trying to be a Microsoft and come up with an unreal figure to bolster
our sales.

> > There seem to be quite a few myths about this test, and people upset
> > that months were not spent tuning gentoo and every effort made to
> > cripple the competition! (One person even suggested the faulty IDE
> > cable should have been left in the debian box, as that was the way
> > it was delivered!) Read the article, and if you need extra
> > information to reproduce it, email me or the author (Indy). It is
> > reproducible - if you can obtain the same hardware - and I would be
> > very interested if someone has this and the time to really go into
> > why these results occurred in more detail than I had the chance to.
>
> The same machine should have been used for the testing, rather than
> three machines. This alone is reason enough to discount your data.
> Three different machines WILL have three different levels of
> performance.
>
A couple of people have mentioned this, but from personal experience I
can say yes, you may see some small variations (other than from
actually faulty hardware), but they should be fairly close if using the
same software. We were not using the same software, so I would expect
that to submerge any variations in hardware.

> > and why was this the result? Daniel Robbins suggested on this list
> > that gentoo-sources may be the problem, but tests on another machine
> > (we had the trial machines for only a couple of days, all of which
> > time was used to build gentoo, right up until I ctrl-c'd the OO
> > build so we could do the tests before handing the hardware back)
> > showed that turning off pre-empt and low-latency had zero effect,
> > but changing to an open-mosix kernel 2.4.20 was ~10% slower (no
> > thread export). It seemed to come
>
> I agree with Daniel on some of this. The default Gentoo kernel is not
> the fastest out there; it is the most feature-rich, to meet the
> various needs of our user base. I do agree that this kernel should
> have been used rather than any other. Also, preempt and low-latency
> are interactivity improvements, not raw performance increases. Their
> modifications are not easily quantifiable. If you want to test them, I
> suggest you look into ConTest
> (http://members.optusnet.com.au/ckolivas/kernel/), which was designed
> for testing this sort of thing.
>
I don't use ConTest in my day-to-day work, but I often use Gnumeric,
GIMP and OO - the intention was not to test for pure numbers but, for
me at least: if I wait 30 mins to load my spreadsheet under Gentoo, how
much longer will poor debianites have to wait... and the answer was:
not as long! So the focus shifted to what's gone wrong with gentoo.

ConTest will be handy when tuning for the next one (depending on time),
but I doubt it will be used for the actual benchmarks, as it is
irrelevant to everyday use.
> > down to the fact we used -O3 instead of -O2 (think spider might have
> > suggested this?) - in effect over-optimised, and we didn't have a
> > chance to correct it. From my perspective, most of the "he should
> > have used ...
>
> No, you definitely "should have used" -O2 rather than -O3. Also,
> -fomit-frame-pointer and -mfpmath=sse would have given dramatic
> improvements. I'm not going to go into any other optimizations,
> because the rest are essentially very specific to the
> hardware/software being used. I think these are the only "sensible"
> extra defaults that can be used on a machine with SSE.
Couldn't use them, as they were not listed as recommended in make.conf.

Keep in mind that pentium3 implies extra flags - the following is from
an email on the gcc list:

    -march=pentium3 -mcpu=i686 -msse
    -march=pentium4 -mcpu=pentium4 -msse2

so SSE is implied by pentium3, and SSE2 is where the invalid code was
being generated, hence the warning to stay away from pentium4!
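
For reference, the kind of make.conf fragment this implies (a sketch
only - the exact comments and defaults varied between make.conf
versions of the time, and the email above confirms only that -O3 and
pentium3 were used):

```shell
# /etc/make.conf (sketch) - the "safe" settings as the file's own
# comments suggested them for this CPU at the time: a P4-core Celeron
# built as pentium3 because of the pentium4 warning.  -msse need not
# be added by hand, since -march=pentium3 already implies it.
CFLAGS="-march=pentium3 -O3 -pipe"
CXXFLAGS="${CFLAGS}"
```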

>
> > may actually have made performance even worse! And besides the time
> > issue, these were supposedly the safe, recommended flags, so we went
> > with them. Please note that even Mandrake made only a slight gain on
> > debian, so 386/586/686 does not make a lot of difference in
> > real-world tasks (the original aim of the tests) - the tests did
> > tasks that particular
>
> 386, 586, 686 make little difference compared to 386, 586, pentium4,
> which is how it should have been.
>
> > people used linux for in their day-to-day work - no special tests,
> > so no special bias. Yes, I could choose tests that make gentoo
> > shine, or debian, or WindowsXP. But I don't do those tests every
> > day, whilst that spreadsheet was/is used as part of my normal work.
> > And it's the same with the other tests.
>
> I actually agreed with most of your tests. You had a hard time, being
> very time-constrained. Honestly, were I in your position, I would not
> have made this report at all unless I had a MUCH longer time to test
> things. You should look into the kinds of testing that many of the
> hardware sites out there use. They tend to take WEEKS on a single
> article. It doesn't take their full attention that entire time. After
> all, there's only so much interaction you need to do when running a
> script which performs hundreds of actions and logs results to a file.
>
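
That unattended style of testing can be sketched in a few lines of
shell (a minimal illustration, not the scripts the hardware sites
actually use; the sleep is a stand-in for the real workload, e.g.
opening the spreadsheet):

```shell
#!/bin/sh
# Minimal sketch of an unattended benchmark loop: run a workload
# several times and log the wall-clock seconds of each run to a file.
LOG=bench.log
: > "$LOG"                       # truncate any previous log
for run in 1 2 3 4 5; do
    start=$(date +%s)
    sleep 1                      # stand-in for the real workload
    end=$(date +%s)
    echo "run $run: $((end - start))s" >> "$LOG"
done
cat "$LOG"
```

Averaging several runs, and discarding the first (cold-cache) one, is
the usual refinement.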
A bit late, as you can only find this out after you start the test -
it's not cricket to say "whoops, I'm not winning, go ahead without
me!" So gentoo didn't perform up to my hopes. At least we now have a
discussion as to why, how to improve it, and perhaps a chance to reel
in some of the hype.

> > So how many gentoo systems out there have every possible
> > optimisation in the book, and are actually running slower than
> > ideal? This is a real
>
> I use quite a few optimizations, which I benchmarked on my machine
> with my application/data set, and it is the fastest I was able to come
> up with. I have actually turned OFF quite a few of the optimizations
> recommended by many of the "armchair compiler experts" out there,
> because they either provided little to no improvement or actually
> decreased performance. I really don't care if something is 0.001%
> faster if it takes 400% as long to compile - especially being a
> developer and compiling quite a bit of stuff several times over.
>
> > problem, and I will be interested in how the cflags projects around
> > handle this, as most seem to aim at setting the maximum possible
> > flags, not actually tuning the system for the ones that work
> > best/most stably. A live benchmark test might be more appropriate.
>
> I agree 100% here.
Thanks, hopefully this article has kicked this idea along a bit.
>
> > Most posts on irc and lists have settled down to "he doesn't know
> > what he's doing" (I do), or the tests were unfair to gentoo (they
> > weren't, but then the same criteria were met by all 3 systems, with
> > some question marks over debian because of its mix - some packages
> > had to be compiled locally, not binary) - but the thrust of the
> > article was not that gentoo was a dud, but that this was the result
> > within the criteria and time we were given, not what we expected,
> > so we need to find out why. Also note that this was not
> > intentionally a debian/mandrake/gentoo distro test.
>
> Not being able to tune Gentoo essentially means you did not
> participate in the "Gentoo Approach", but rather kludged it together
> fairly untuned and pitted it against a tuned binary installation and
> debian.
>
> > We may be getting a P4 hyperthreaded system to play with, but under
> > different rules, where I get to do a bit of tuning first. I have
> > already built the core system on another machine using gcc-3.2.3 and
> > "-march=pentium4 -O3 -pipe -fomit-frame-pointer". I note that the
> > pentium4 warning still appears in make.conf, though I believe it no
> > longer applies to this gcc.
>
> It does not apply to the newest stable GCC, so you are correct.
>
> > A while ago I emailed this list and asked for information on tests
> > and settings for HT P4s, without a reply. So again, has anyone done
> > any tests on an HT P4 and is willing to support the flags they chose
> > as being "the best"? In particular, does -ffast-math give a
> > measurable gain?
>
> There is not much in the way of HT, as it is looked at as an SMP
> machine under Linux. All you really do is enable SMP and make sure
> you use ACPI in the kernel. The default Gentoo kernel does not have
> many of the HT scheduling changes which have gone into the making of
> the 2.6_test kernels. There are backports for these, but I would
> consider that going a bit overboard; hand-patching your kernel
> sources might yield better results on all three systems, but should
> be left alone - after all, you want to test the results of the three
> systems, not of your hand-made kernel. If you were to decide to use
> another kernel, I would say to use the latest vanilla kernel, and
> possibly the latest 2.6_test kernel, on each distribution, using the
> exact same .config, to see how much difference the kernel makes to
> performance. You should not use -ffast-math in anything as a default,
> as it causes math errors which should not be introduced into a stable
> system.
>
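
One concrete reason for that advice (a toy illustration, not
-ffast-math itself): -ffast-math lets the compiler reassociate
floating-point arithmetic, and FP addition is not associative, so the
same expression can yield different results depending on evaluation
order:

```shell
# Floating-point addition is not associative - exactly the kind of
# rearrangement -ffast-math permits the compiler to make silently.
awk 'BEGIN {
    a = (0.1 + 0.2) + 0.3    # one evaluation order
    b = 0.1 + (0.2 + 0.3)    # the reassociated order
    if (a == b) print "equal"; else print "different"
}'
# prints "different"
```

Harmless for a game, potentially serious for scientific work, which is
why it is off by default.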
> > Most of my machines have been built as scientific stations, so
> > accuracy is more important than ultimate speed, so this is one I
> > have never tested. I am not interested in the -O9 -max-everything
> > kiddies who have been so vocal, but in reasoned choices.
>
> The -O9 kiddies are the "armchair compiler experts" I spoke of
> earlier. They have zero real knowledge of compilers and optimizations
> at all, but have "heard from a friend" or "read on a forum" about it,
> so they think they know it all. I will gladly admit that I know
> little about compilers, but I have taken the time to do actual
> benchmarks on my system to test my various theories and have chosen
> what I feel to be the best combinations for my own needs.
I wish more would do as you have done - I don't think too many follow
this approach. Most seem to just do the recommended thing, or go for
the max. I did a few key apps, made my choices, and have stayed with
them. If a really useful cflags project gets up, it would be nice to
run it regularly, and perhaps find that the new gcc just emerged gives
a measured 5% speed-up if you use the new -supercharger flag!

I accept there can be a large number of criticisms made of the tests,
but most can be countered by the criteria we set for ourselves.

Thanks for the time you spent on this.

--
William Kenworthy <billk@×××××××××.au>

--
gentoo-dev@g.o mailing list