On Wednesday 08 September 2004 14:03, Corvus Corax wrote:
> On Wed, 8 Sep 2004 13:29:01 +0200, Paul de Vrieze <pauldv@g.o> wrote:
> > ...
> >
> > To do this for programs, one would need to have a realistic suite of
> > "tests" that simulate the real-world use of the application. Of
> > course that also allows -fprofile-arcs to be used.
> >
> > Paul
>
> depending on the type of program - for easy command-line converter
> tools, a simple "time" command would be sufficient (I used that to
> determine potential (in-code) optimizations for my motiontrack stuff)
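
For a non-interactive converter tool this really is as simple as it sounds. A minimal sketch, with `tr` as a hypothetical stand-in for the converter and a generated throwaway input file:

```shell
# Hypothetical stand-in for a command-line converter: generate some
# input, then let `time` report how long one full conversion takes.
seq 1 100000 > input.txt

# `time` prints real and user/sys times to stderr.
time tr '0-9' 'A-J' < input.txt > output.txt

# Sanity check: the conversion actually produced output.
wc -l output.txt
```

The measured wall-clock time is only meaningful if the input is large enough that startup cost does not dominate, which is exactly the point made below.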
|
Timing is not the issue; the issue is what you have the program do. Merely
initializing and then exiting is not a truthful representation of normal
behaviour.
|
> however for libraries like QT or gtk which affect on-screen performance
> of gui programs - both run-time and load - this will get much harder.

Certainly, for GUIs you need some kind of scripting that simulates actual
user actions. (Fortunately, interactive behaviour is not the bottleneck in
most interactive applications, as most user actions lead to almost trivial
computations.)
|
> Maybe one could program a test-suite for each library that fires each
> function once and times them, along with a flag saved before startup to
> determine load time - but it would have to be done for every huge
> library.

This is handwork, and the different functions would also need different
weights based on, among other things, how frequently they are called. In
other words it is a very large amount of work, and completely specific to
the library/application at hand.
|
>
> And finally the timing of interactive programs itself - well, usually
> most time goes while waiting for user input anyway, but there are
> timing-critical tasks, too; imagine pattern searches or other big db
> operations - or file load/save in openoffice, picture effects in gimp
> and such.
> those could maybe be timed by doing them on a really huge data blob,
> where the single operation takes so long that the user can measure it
> manually with a stopwatch.
|
This is not interesting for Gentoo to offer. If the user wants to do
manual timing he always can. What would be interesting is automated
timing. That is unfortunately not really easy with current packages. It
is easiest for applications with test suites: with relatively small
effort these test suites could double as "representative application
use" for timing and for arc profiling (a newer option of gcc).
|
> If the operation takes 40 seconds, and you can gain 3 seconds by
> optimisations, it is a bluntly measurable improvement. However I don't
> like that idea; maybe one can time the operation by watching tmp files
> in the background or something like this.
>
> Or the maintainer could go into the code and insert some debug lines to
> print timing information to stderr or such. But this would be way too
> much work for most maintainers and most software, isn't it?
|
Well, I agree that there is much that can be done to improve application
performance. However, even with cflags (which in many cases do not make
a huge difference), these are all application-specific optimizations.
They are things that application providers should do, not
packagers/distributors. We don't have the knowledge or the time to try to
find out for each specific cpu/application combination what the "fastest"
cflags are.
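
For reference, the user-facing knob in Gentoo is the global CFLAGS setting in make.conf. The values below are purely illustrative, not recommendations; as argued above, the "fastest" flags depend on the specific cpu/application combination:

```shell
# /etc/make.conf - illustrative values only.
# -O2 is the usual safe default; -march selects the instruction set
# (here k8, i.e. an opteron); -pipe only affects compile speed.
CFLAGS="-O2 -march=k8 -pipe"
CXXFLAGS="${CFLAGS}"
```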
|
Also, the review at the website still has its issues. While it is quite
well known that -O3 is in many cases slower than -O2, it is also true
that the internal architecture of CPUs matters. The fact is that gcc has
had support and testing on amd64 machines for a lot longer than on xeons
with 64-bit extensions. Scheduling on the two different cpus is likely to
have different optimal strategies, and it is unlikely that the xeon
64-bit scheduler is as optimized as the amd64 scheduler. The x86_64
compiler also defaults to generating amd64-optimal code (amd64 used to be
the only such processor), so it is not strange that this code performs
much faster on an opteron than on a xeon.
|
This leads to the observation that while one can still base a buying
decision on the benchmarks, one cannot actually say that the opteron IS
faster than the xeon - only that the opteron can execute opteron-optimized
code faster than a xeon can. It is also known that the pentium4
architecture (which the xeon shares) is highly dependent on good
scheduling, so the observation is not really a surprise.
|
Paul

ps. This does not mean that the opteron is not faster than the xeon, just
that this test does not give a reasonable indication of it.

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@g.o
Homepage: http://www.devrieze.net