On Wednesday 08 September 2004 14:03, Corvus Corax wrote:
> On Wed, 8 Sep 2004 13:29:01 +0200, Paul de Vrieze <pauldv@g.o> wrote:
> > ...
> >
> > To do this for programs, one would need to have a realistic suite of
> > "tests" that simulate the real-world use of the application. Of
> > course that also allows -fprofile-arcs to be used.
> >
> > Paul
>
> depending on the type of program - for easy command-line converter
> tools, a simple "time" command would be sufficient (I used that to
> determine potential (in-code) optimizations for my motiontrack stuff)
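
For a non-interactive converter tool this really is as simple as it sounds. A minimal sketch, with `tr` as a hypothetical stand-in for the converter and a generated throwaway input file:

```shell
# Hypothetical stand-in for a command-line converter: generate some
# input, then let `time` report how long one full conversion takes.
seq 1 100000 > input.txt

# `time` prints real and user/sys times to stderr.
time tr '0-9' 'A-J' < input.txt > output.txt

# Sanity check: the conversion actually produced output.
wc -l output.txt
```

The measured wall-clock time is only meaningful if the input is large enough that startup cost does not dominate, which is exactly the point made below.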
|
Timing is not the issue; the issue is what you have the program do. Merely
initializing and then exiting is not a truthful representation of normal
behaviour.
|
> however for libraries like QT or gtk which affect on-screen performance
> of gui programs - both run-time and load - this will get much harder.

Certainly, for GUIs you need some kind of scripting that simulates actual
user actions. (Fortunately, interactive behaviour is not the bottleneck in
most interactive applications, as most user actions lead to almost trivial
computations.)
|
> Maybe one could program a test-suite for each library that fires each
> function once and times them, along with a flag saved before startup to
> determine load time - but it would have to be done for every huge
> library.

This is handwork, and the different functions would also need different
weights based on, among other things, how frequently they are called. In
other words it is a very large amount of work, and completely specific to
the library/application at hand.
|
>
> And finally the timing of interactive programs itself - well, usually
> most time goes while waiting for user input anyway, but there are
> timing-critical tasks, too; imagine pattern searches or other big db
> operations - or file load/save in openoffice, picture effects in gimp
> and such.
> those could maybe be timed by doing them on a really huge data blob,
> where the single operation takes so long that the user can measure it
> manually with a stopwatch.
|
This is not interesting for Gentoo to offer. If the user wants to do
manual timing he always can. What would be interesting is automated
timing. That is unfortunately not really easy with current packages. It
is easiest for applications with test suites: with relatively small
effort these test suites could double as "representative application
use" for timing and for arc profiling (a newer option of gcc).
|
> If the operation takes 40 seconds, and you can gain 3 seconds by
> optimisations, it is a bluntly measurable improvement. However I don't
> like that idea; maybe one can time the operation by watching tmp files
> in the background or something like this.
>
> Or the maintainer could go into the code and insert some debug lines to
> print timing information to stderr or such. But this would be way too
> much work for most maintainers and most software, isn't it?
|
Well, I agree that there is much that can be done to improve application
performance. However, even with cflags (which in many cases do not make
a huge difference), these are all application-specific optimizations.
They are things that application providers should do, not
packagers/distributors. We don't have the knowledge or the time to try to
find out for each specific cpu/application combination what the "fastest"
cflags are.
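
For reference, the user-facing knob in Gentoo is the global CFLAGS setting in make.conf. The values below are purely illustrative, not recommendations; as argued above, the "fastest" flags depend on the specific cpu/application combination:

```shell
# /etc/make.conf - illustrative values only.
# -O2 is the usual safe default; -march selects the instruction set
# (here k8, i.e. an opteron); -pipe only affects compile speed.
CFLAGS="-O2 -march=k8 -pipe"
CXXFLAGS="${CFLAGS}"
```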
|
Also, the review at the website still has its issues. While it is quite
well known that -O3 is in many cases slower than -O2, it is also true
that the internal architecture of CPUs matters. The fact is that gcc has
had support and testing on amd64 machines for a lot longer than on xeons
with 64-bit extensions. Scheduling on the two different cpus is likely to
have different optimal strategies, and it is unlikely that the xeon
64-bit scheduler is as optimized as the amd64 scheduler. The x86_64
compiler also defaults to generating amd64-optimal code (amd64 used to be
the only such processor), so it is not strange that this code performs
much faster on an opteron than on a xeon.
|
This leads to the observation that while one can still base a buying
decision on the benchmarks, one cannot actually say that the opteron IS
faster than the xeon - only that the opteron can execute opteron-optimized
code faster than a xeon can. It is also known that the pentium4
architecture (which the xeon shares) is highly dependent on good
scheduling, so the observation is not really a surprise.
|
Paul

ps. This does not mean that the opteron is not faster than the xeon, just
that this test does not give a reasonable indication of it.

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@g.o
Homepage: http://www.devrieze.net