On Wed, 2003-08-13 at 18:49, William Kenworthy wrote:
> I'll stick my hand up and say I was the person who installed gentoo for
> this test. For those who made the previous posts (mostly crap, and who
> don't seem to have read the article very well - though it could have been
> more informative), perhaps a few facts may help:
>
> 1. was fully bootstrapped and compiled as stage 1/2/3 on the machine -
> not a binary install

Great. I read the article and found no mention of the USE flags
employed. I think you should have posted, in full, everything you
changed from the defaults.
|
> 2. gentoo-sources 2.4.20 was used - Mandrake came with a newer kernel
> than gentoo's recommended one (still does), debian was a dogs breakfast
> because stable is so old. We actually tried to put the gentoo kernel on
> mandrake/debian when tracking down the ide cable prob, but got too hard
> - not the way some posts tried to imply)

Were preemption and low latency turned on? Was the kernel compiled with
the >gcc31 selection for the CPU? Better yet, why not post the .config
from all three kernels?
|
> 3. optimisations were EXACTLY as recommended by both the make.conf
> entries, which were supported by the cflags from the forum for this cpu:
> a 2G celery (P4 based core) I am not sure now, but I believe I ran
> prelink as well (to match mandrake) - need to find and check the notes.
> 4. Gnumeric's problems have been identified and come down to the
> particular version - is fixed in the upcoming stable release even before
> this was found, but the project was unaware that what they believed was
> a slightly slower mod in this version, could be so bad on particular
> data sets - i.e., 30 odd mins in 1.0.13, but is less than 30s on 1.0.19
> on my laptop

I hope you only used optimizations listed in the forums for the actual
version of GCC you're running. From the sounds of it, you did not, since
you used pentium3 and the pentium4 problems were fixed in the most
recent stable GCC. You also should have used a "default" Gentoo install
with no changes made, so that the default profile setup was what got
tested. Your optimizations could have been researched from GCC's own
documentation rather than taking the word of a bunch of "armchair
compiler experts" on the forums. No offense meant to anyone, but you
mention below that you do much scientific work, yet you followed a very
poor scientific method and documented your research poorly for this
article, which is why it has been torn apart so thoroughly. Had you
given out all of the information, even if it were simply links to the
files from within the article, it would have given your article much
more credibility.
|
> There seems to be quite a few myths about this test and people upset
> that months were not spent tuning gentoo and every effort made to
> cripple the competition! (one person even suggested the faulty ide cable
> should have been left in the debian box, as that was the way it was
> delivered!) Read the article, and if you need extra information to
> reproduce it, email me or the author (Indy). It is reproducible - if
> you can obtain the same hardware - I would be very interested if someone
> has this and the time to really go into the why these results occurred
> in more detail than I had the chance to.

The same machine should have been used for all of the testing, rather
than three separate machines. This alone is reason enough to discount
your data. Three different machines WILL have three different levels of
performance.
|
> and why was this the result? Daniel Robbins suggested on this list that
> gentoo-sources may be the problem, but tests on another machine (we had
> the trial machines for only a couple of days, all of which time was used
> to build gentoo right up until I ctrl-c'd the OO build so we could do
> the tests before handing the hardware back) showed that turning off
> pre-empt and low-latency had zero effect, but changing to an open-mosix
> kernel 2.4.20 was ~10% slower (no thread export). It seemed to come

I agree with Daniel on some of this. The default Gentoo kernel is not
the fastest out there; it is the most feature-rich, to meet the various
needs of our user base. I do agree that this kernel should have been
used rather than any other. Also, preempt and low-latency are
interactivity improvements, not raw performance improvements, and their
effects are not easily quantified. If you want to test them, I suggest
you look into ConTest
(http://members.optusnet.com.au/ckolivas/kernel/) which was designed for
testing this sort of thing.
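To make the throughput-versus-interactivity distinction concrete, here
is a minimal Python sketch. It is purely illustrative: it is not ConTest
and no substitute for it, and the function name and defaults are made up
for this example. It asks for a short sleep many times over and records
how late each wakeup arrives. Run while the machine is under load, one
would expect a preemptible or low-latency kernel to shrink the worst
case here, while a pure number-crunching benchmark might not move at
all.

```python
import statistics
import time


def wakeup_overshoot_ms(samples=50, interval_s=0.001):
    """Request a sleep of interval_s repeatedly; return how late each wakeup was (ms)."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Time spent beyond the requested sleep, converted to milliseconds.
        overshoots.append((elapsed - interval_s) * 1000.0)
    return overshoots


if __name__ == "__main__":
    late = wakeup_overshoot_ms()
    print(f"median overshoot: {statistics.median(late):.3f} ms")
    print(f"worst overshoot:  {max(late):.3f} ms")
```

The worst-case figure is the interesting one for interactivity; the
median mostly reflects timer granularity.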
|
> down to the fact we used -O3 instead of -O2 (think spider might have
> suggested this ?)- in effect over-optimised, and we didn't have a chance
> to correct. From my perspective, most of the "he should have used ...

No, you definitely "should have used" -O2 rather than -O3. Also,
-fomit-frame-pointer and -mfpmath=sse would have given dramatic
improvements. I'm not going to go into any other optimizations because
the rest are essentially very specific to the hardware/software being
used. I think these are the only "sensible" extra defaults that can be
used on a machine with SSE.
|
> may actually have made performance even worse! And besides the time
> issue, these were supposedly the safe, recommended flags so we went with
> them. Please note that even Mandrake made only a slight gain on debian,
> so 386/586/686 does not make a lot of difference in real world tasks
> (the original aim of the tests) - the tests did tasks that particular

The difference between 386, 586, and 686 is small compared to the
difference between 386, 586, and pentium4, which is the comparison it
should have been.
|
> people used linux for in their day-to-day work - no special tests, so no
> special bias. Yes, I could choose tests that make gentoo shine, or
> debian, or windowsXP. But I don't do those tests every day, whilst that
> spreadsheet was/is used as part of my normal work. And it's the same
> with the other tests.

I actually agreed with most of your tests. You had a hard time, being so
constrained for time. Honestly, were I in your position, I would not
have made this report at all unless I had a MUCH longer time to test
things. You should look into the kinds of testing that many of the
hardware sites out there use. They tend to take WEEKS on a single
article. It doesn't take their full attention that entire time. After
all, there's only so much interaction you need to do when running a
script which performs hundreds of actions and logs results to a file.
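A rough sketch of that kind of unattended script, in Python. Everything
here is an assumption made for illustration: the function names, the CSV
layout, and the placeholder "noop" task, which you would replace with
the real spreadsheet, compile, or rendering jobs. It runs each task
several times, logs every run to a file, and reports the mean.

```python
import csv
import statistics
import subprocess
import sys
import time


def time_command(argv):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_suite(tasks, repeats, log_path):
    """Time every task `repeats` times, log each run to CSV, return per-task means."""
    means = {}
    with open(log_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["task", "run", "seconds"])
        for name, argv in tasks.items():
            times = []
            for run in range(1, repeats + 1):
                seconds = time_command(argv)
                times.append(seconds)
                writer.writerow([name, run, f"{seconds:.6f}"])
            means[name] = statistics.mean(times)
    return means


if __name__ == "__main__":
    # Placeholder workload (hypothetical): starts a Python interpreter that does nothing.
    tasks = {"noop": [sys.executable, "-c", "pass"]}
    for name, mean in run_suite(tasks, repeats=3, log_path="results.csv").items():
        print(f"{name}: mean {mean:.3f}s over 3 runs")
```

Keeping the per-run rows, rather than only the averages, is what lets
readers check the spread for themselves.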
|
> So how many gentoo systems out there have every possible optimisation in
> the book, and are actually running slower than ideal? This is a real

I use quite a few optimizations, which I benchmarked on my machine with
my application/data set, and they are the fastest combination I was able
to come up with. I have actually turned OFF quite a few of the
optimizations recommended by many of the "armchair compiler experts" out
there because they either provided little to no improvement or actually
decreased performance. I really don't care if something is 0.001%
faster if it takes 400% as long to compile, especially as a developer
who compiles quite a bit of stuff several times over.
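The trade-off is easy to put numbers on. A small Python sketch follows;
the 10-minute build and 60-second job are hypothetical figures picked
only for illustration, and the function name is made up. Only the
0.001% and 400% come from the paragraph above.

```python
def break_even_runs(base_compile_s, compile_factor, run_s, speedup_fraction):
    """Runs needed before the extra compile time is repaid by faster runs."""
    extra_compile_s = base_compile_s * (compile_factor - 1.0)
    saved_per_run_s = run_s * speedup_fraction
    if saved_per_run_s <= 0:
        # No runtime gain means the extra compile time is never repaid.
        return float("inf")
    return extra_compile_s / saved_per_run_s


if __name__ == "__main__":
    # Hypothetical: 600 s baseline build, 60 s job; 4.0 = "400% as long",
    # 0.00001 = "0.001% faster".
    runs = break_even_runs(base_compile_s=600, compile_factor=4.0,
                           run_s=60, speedup_fraction=0.00001)
    print(f"break-even after about {runs:,.0f} runs")
```

With those made-up figures the job would have to run roughly three
million times before the slower build paid for itself, and every
recompile resets the count.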
|
> problem, and I will be interested in how the cflags projects around
> handle this, as most seem to aim at setting the maximum possible flags:
> not actually tune the system for the ones that work best/most stably. A
> live benchmark test might be more appropriate.

I agree 100% here.
|
> Most posts on irc and lists have settled down to "he doesn't know what
> he's doing" (I do), or the tests were unfair to gentoo (they weren't, but
> then the same criteria were met by all 3 systems, but with some question
> marks over debian because of its mix - some packages had to be compiled
> locally, not binary) - but the thrust of the article was not that gentoo
> was a dud, but that this was the result within the criteria and time we
> were given, not what we expected, so we need to find out why. Also note
> that this was not intentionally a debian/mandrake/gentoo distro test.

Not being able to tune Gentoo essentially means you did not participate
in the "Gentoo Approach" but rather kludged it together fairly untuned
and pitted it against a tuned binary installation and Debian.
|
> We may be getting a P4 hyperthreaded system to play with, but under
> different rules, where I get to do a bit of tuning first. I have
> already built the core system on another machine using gcc-3.2.3,
> "-march=pentium4 -O3 -pipe -fomit-frame-pointer" I note that the
> pentium4 warning still appears in make.conf, though I believe it no
> longer applies to this gcc.

It does not apply to the newest stable GCC, so you are correct.
|
> A while ago I emailed this list and asked for information on tests and
> settings for HT P4's, without a reply. So again, has anyone done any
> tests on a HT P4 and is willing to support the flags they chose as being
> "the best"? In particular, does -ffast-math give a measurable gain?

There is not much to do for HT, as Linux treats it as an SMP machine.
All you really do is enable SMP and make sure you use ACPI in the
kernel. The default Gentoo kernel does not have many of the HT
scheduling changes which have gone into the 2.6_test kernels. There are
backports for these, but I would consider that going a bit overboard:
hand-patching your kernel sources would yield better results on all
three systems, so the kernels should be left alone. After all, you're
wanting to test the results of the three systems, not of your hand-made
kernel. If you were to decide to use another kernel, I would say to use
the latest vanilla kernel, and possibly the latest 2.6_test kernel, on
each distribution with the exact same .config to see how much difference
the kernel makes to performance. You should not use -ffast-math in
anything as a default, as it can cause math errors which should not be
introduced into a stable system.
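For a feel of where such errors come from: as far as I understand it,
one of the liberties -ffast-math grants the compiler is to treat
floating-point arithmetic as if it were associative, so it may reorder
additions. The Python sketch below is only an illustration of that one
effect. Python itself is not compiled with these flags, the helper name
is made up, and the reordering is done by hand rather than by a
compiler.

```python
def sum_left_to_right(values):
    """Add strictly in the given order, with no reordering."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    # Same three numbers, two different orders of addition.
    in_order = sum_left_to_right([1e16, 1.0, -1e16])   # (1e16 + 1.0) - 1e16
    reordered = sum_left_to_right([1e16, -1e16, 1.0])  # (1e16 - 1e16) + 1.0
    print(in_order, reordered)      # 0.0 1.0
    print(in_order == reordered)    # False
```

In the first order the 1.0 is lost to rounding when added to 1e16; in
the second it survives. For scientific work, that kind of silent change
in results is a poor trade for a little speed.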
|
> Most of my machines have been built as scientific stations, so accuracy
> is more important than ultimate speed, so this is one I have never
> tested. I am not interested in the -O9 -max-everything kiddies who have
> been so vocal, but reasoned choices.

The -O9 kiddies are the "armchair compiler experts" I spoke of earlier.
They have no real knowledge of compilers and optimizations, but have
"heard from a friend" or "read on a forum" about it, so they think they
know it all. I will gladly admit that I know little about compilers,
but I have taken the time to do actual benchmarks on my system to test
my various theories and have chosen what I feel to be the best
combinations for my own needs.
|
--
Chris Gianelloni
Developer, Gentoo Linux