Gentoo Archives: gentoo-dev

From: Paul de Vrieze <pauldv@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] Do we want optimal performance?
Date: Wed, 08 Sep 2004 13:19:11
Message-Id: 200409081517.06195.pauldv@gentoo.org
In Reply to: Re: [gentoo-dev] Do we want optimal performance? by Corvus Corax
1 On Wednesday 08 September 2004 14:03, Corvus Corax wrote:
2 > On Wed, 8 Sep 2004 13:29:01 +0200,
3 >
4 > Paul de Vrieze <pauldv@g.o> wrote:
5 > > ...
6 > >
7 > > To do this for programs, one would need to have a realistic suite of
8 > > "tests" that simulate the real world use of the application. Of
9 > > course that also allows -fprofile-arcs to be used.
10 > >
11 > > Paul
12 >
13 > Depending on the type of program: for simple command line converter
14 > tools, a plain "time" command would be sufficient (I used that to
15 > find potential in-code optimizations for my motiontrack stuff).
16
17 Timing is not the issue; the issue is what you have the program do.
18 Initializing and then dying is not really a truthful representation of
19 normal behaviour.
20
21 > however for libraries like Qt or gtk which affect on-screen performance
22 > of gui programs - both run-time and load - this will get much harder.
23
24 Certainly, for GUIs you need some kind of scripting that simulates actual
25 user actions. (Fortunately, the interactive behaviour is not the
26 bottleneck in most interactive applications, as most user actions lead to
27 almost trivial computations.)
28
29 > Maybe one could program a test-suite for each library that fires each
30 > function once and times them, along with a flag saved before startup to
31 > determine load time - but it would have to be done for every huge
32 > library.
33
34 This is manual work, and the different functions also need different
35 weights based on, among other things, how often they are called. In other
36 words, it is a great deal of work, and completely specific to the
37 library/application at hand.
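
To illustrate the weighting point, here is a minimal Python sketch. The
function names (draw_widget, load_theme), their bodies and the call-frequency
weights are all hypothetical stand-ins, not taken from any real library; a
real suite would need measured call frequencies per application.

```python
import time


def time_call(func, repeats=5):
    """Return the best wall-clock time (seconds) over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def weighted_score(timings, weights):
    """Average the per-function timings, weighted by call frequency."""
    total_weight = sum(weights[name] for name in timings)
    return sum(timings[name] * weights[name] for name in timings) / total_weight


# Hypothetical stand-ins for library entry points.
workload = {
    "draw_widget": lambda: sum(range(10_000)),
    "load_theme": lambda: sorted(range(50_000), reverse=True),
}
# Assumed relative call frequencies: drawing happens far more often.
weights = {"draw_widget": 1000, "load_theme": 1}

timings = {name: time_call(fn) for name, fn in workload.items()}
score = weighted_score(timings, weights)
print(f"weighted time per call: {score:.6f}s")
```

With these weights the rarely called load_theme barely moves the score, which
is exactly why firing each function once and timing it says little.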
38
39 >
40 > And finally the timing of interactive programs itself - well, usually
41 > most time is spent waiting for user input anyway, but there are
42 > timing-critical tasks, too: imagine pattern searches or other big db
43 > operations - or file load/save in openoffice, picture effects in gimp
44 > and such.
45 > Those could maybe be timed by running them on a really huge data blob,
46 > where a single operation takes so long that the user can measure it
47 > manually with a stopwatch.
48
49 This is not interesting for Gentoo to offer. If the user wants to do
50 manual timing, he always can. What would be interesting is automated
51 timing. Unfortunately, this is not really easy with current packages. It
52 is easiest for applications with test suites. With relatively small
53 effort, these test suites could double as "representative application
54 use" for timing and arc profiling (a newer option of gcc).
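
As a rough sketch of what automated timing of a test suite could look like
(in Python, with a trivial one-liner standing in for a package's real test
suite; the command and the run count are assumptions for illustration only):

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run `argv` `runs` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,  # a failing test suite should not be timed silently
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in for a real test suite invocation (hypothetical workload).
test_suite = [sys.executable, "-c", "sum(i * i for i in range(200000))"]

elapsed = time_command(test_suite)
print(f"median run time: {elapsed:.3f}s")
```

The same command could be run once against a build made with -fprofile-arcs
to collect the arc counts, and then repeated against builds with different
cflags to compare the medians.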
55
56 > If the operation takes 40 seconds, and you can gain 3 seconds by
57 > optimisations, it is a plainly measurable improvement. However, I don't
58 > like that idea; maybe one can time the operation by watching tmp files
59 > in the background or something like this.
60 >
61 > Or the maintainer could go into the code and insert some debug lines to
62 > print timing information to stderr or such. But this would be way too
63 > much work for most maintainers and most software, wouldn't it?
64
65 Well, I agree that there is much that can be done to improve application
66 performance. However, even with cflags (which in many cases do not make
67 a huge difference), these are all application-specific optimizations.
68 These are things that application providers should do, not
69 packagers/distributors. We don't have the knowledge or the time to try to
70 find out for each specific cpu/application combination what the "fastest"
71 cflags are.
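
To give a feeling for the size of that search, a small Python sketch that
counts the builds involved. The flag selection is a tiny sample of real gcc
options, and the cpu and package counts are made-up round numbers, not
Gentoo statistics:

```python
from itertools import product

# A deliberately small sample of gcc options.
opt_levels = ["-O1", "-O2", "-O3", "-Os"]
toggles = ["-funroll-loops", "-fomit-frame-pointer", "-ffast-math"]


def flag_sets(levels, switches):
    """Yield one optimisation level combined with every subset of switches."""
    for level in levels:
        for mask in product([False, True], repeat=len(switches)):
            yield [level] + [s for s, on in zip(switches, mask) if on]


combos = list(flag_sets(opt_levels, toggles))

# Hypothetical counts, chosen only to show how quickly the product grows.
cpus, packages = 5, 1000
total_builds = len(combos) * cpus * packages

print(len(combos), "flag sets,", total_builds, "builds to benchmark")
```

Even with only three on/off switches this gives 32 flag sets per package,
and every one of those builds would still need a representative workload to
be timed against.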
72
73 Also, the review at the website still has its issues. While it is well
74 known that -O3 is in many cases slower than -O2, it is also true that the
75 internal architecture of CPUs matters. The fact is that gcc has had support
76 and testing on amd64 machines for a lot longer than on xeons with
77 64-bit extensions. Scheduling on the two different cpus is likely to have
78 different optimal strategies. It is unlikely that the xeon 64-bit
79 scheduler is as optimized as the amd64 scheduler. The x86_64 compiler
80 also defaults to generating amd64-optimal code (amd64 used to be the only
81 processor), so it is not strange that this code performs much faster on
82 an opteron than on a xeon.
83
84 This leads to the observation that, while one can still base a buying
85 decision on the benchmarks, one cannot actually say that the opteron IS
86 faster than the xeon, only that the opteron can execute opteron-optimized
87 code faster than a xeon can. It is also known that the pentium4
88 architecture (which the xeon has) is highly dependent on good scheduling,
89 so the observation is not really a surprise.
90
91 Paul
92
93 ps. This does not mean that the opteron is not faster than the xeon, just
94 that this test does not give a reasonable indication of it.
95
96 --
97 Paul de Vrieze
98 Gentoo Developer
99 Mail: pauldv@g.o
100 Homepage: http://www.devrieze.net