[gentoo-amd64] Re: System becomes almost unusable when compling c++ code - gentoo-amd64

From:	Duncan <1i5t5.duncan@×××.net>
To:	gentoo-amd64@l.g.o
Subject:	[gentoo-amd64] Re: System becomes almost unusable when compling c++ code
Date:	Fri, 03 Aug 2007 09:35:38
Message-Id:	`pan.2007.08.03.09.33.24@cox.net`
In Reply to:	[gentoo-amd64] System becomes almost unusable when compling c++ code by Shaochun Wang

Shaochun Wang <scwang@××××××.cn> posted 20070803065913.GA23254@localhost,
excerpted below, on  Fri, 03 Aug 2007 14:59:13 +0800:

> Every time I compile C++ code, e.g. app-i18n/scim-qtimm, my desktop
> system becomes almost not
> interactive. I have already set PORTAGE_NICENESS="15" in /etc/make.conf.
>
> Any suggestion?

Do you have a single-core, single-CPU system, or a multi-core or
multi-CPU one?

In any case, unless you are running folding@home or the like, something
truly idle-only that you want the emerge to get higher priority than, you
should consider PORTAGE_NICENESS=19.  The reason is that +19 is the
lowest priority there is, and as far as I can tell the scheduler treats
it more or less as idle priority, giving the rest of the system slightly
higher responsiveness (lower latency), while giving the idle task
somewhat longer time slices.  The effect can actually be to /speed/ /up/
compiles compared with a positive niceness below 19, or even with normal
scheduling, due to the longer timeslices.
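
As a minimal sketch of that suggestion, the make.conf line would look
something like this (the comment is illustrative, only the variable and
value come from the advice above):

```shell
# /etc/make.conf (fragment)
# Run emerge, and everything it spawns, at the lowest CPU priority.
PORTAGE_NICENESS="19"
```

The setting only affects Portage itself; anything you start by hand
would still need its own nice value.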

What clock tick setting are you using, and what is your preemption
setting?  Particularly if you are on a single-core, single-CPU system, a
higher clock tick setting (Timer frequency), certainly not 100 (that's
for servers), probably 300 or 1000, will increase responsiveness at the
cost of tasks taking longer (shorter timeslices, so more overhead
processing timeslices each second).  Similarly with preemption.  You'll
want that set to Preemptible Kernel (Low-Latency Desktop) or at least
Voluntary Kernel Preemption (Desktop).  Also be sure the Preempt The Big
Kernel Lock option is toggled ON.
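
If you aren't sure what your current kernel was built with, something
along these lines should answer the tick and preemption questions.  It is
only a sketch: the helper name show_sched_config is made up for
illustration, /proc/config.gz exists only if the kernel was built with
CONFIG_IKCONFIG_PROC, and /usr/src/linux/.config assumes the usual Gentoo
symlink points at the sources of the running kernel.

```shell
# Print the enabled timer-frequency and preemption options from a kernel
# config.  Accepts a plain .config or a gzipped copy (/proc/config.gz).
show_sched_config() {
    cfg=$1
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -E '^CONFIG_(HZ|PREEMPT)'
}

# Try the usual locations and stop at the first readable one.
for cfg in /proc/config.gz /usr/src/linux/.config; do
    if [ -r "$cfg" ]; then
        show_sched_config "$cfg" || echo "no HZ/PREEMPT options set in $cfg"
        break
    fi
done
```

Only options that are actually set are printed, so the "is not set"
comment lines in the config are filtered out.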

Conversely, lower settings, No Forced Preemption (Server) or Voluntary
Kernel Preemption (Desktop), and a 100, 250 or 300 Hz tick rate, should
work better for multi-core or multi-CPU SMP systems, because they can
spread the load a bit more.  FWIW, dual Opteron 242 (so dual single
cores) here, I'm running voluntary preemption, BKL preemption, and a
300 Hz tick frequency.  That's the highest I'd recommend for a general
purpose multi-core or multi-CPU system, tho as above, I'd recommend a
1000 Hz tick and full preemption for single CPU/core systems.
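
In kernel .config terms, the two setups described above would come out
roughly as follows.  This is a sketch using the 2.6-era option names; the
menuconfig labels map onto them, but check your own kernel version, as
the names can differ.

```shell
# Kernel .config fragment: general purpose multi-core / multi-CPU box
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_PREEMPT_BKL=y
CONFIG_HZ_300=y
CONFIG_HZ=300

# A single-CPU, single-core desktop would instead use:
# CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL=y
# CONFIG_HZ_1000=y
# CONFIG_HZ=1000
```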

You may also wish to play with the MAKEOPTS setting, typically -jX, where
X is somewhere between the number of CPUs/cores plus one and 150% of the
number of CPUs/cores.  Thus, a single core/CPU system's recommended
setting is -j2.  However, you may find -j1 increases your
responsiveness.  Or you can try -j2, but add -l1 or the like.  With GNU
make (but not all others, so you may have to remove the -lX portion for
some merges), that'll tell it to allow up to two jobs (the -j2) but ONLY
start a second one if the load average is below 1 (the -l1).  That's
generally fairly effective.
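
A rough way to work that out for a given machine is sketched below.  The
helper name suggest_makeopts is hypothetical, and setting the load limit
equal to the core count is just an assumption that generalizes the
-j2 -l1 single-core case; tune both numbers to taste.

```shell
# Derive a MAKEOPTS value from a core count: cores+1 jobs, and a GNU make
# load limit (-l) so extra jobs only start while the load average is low.
suggest_makeopts() {
    n=$1
    echo "-j$((n + 1)) -l${n}"
}

# Count online processors, falling back to 1 if getconf can't tell us.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "MAKEOPTS=\"$(suggest_makeopts "$cores")\""
```

The printed line can be pasted into /etc/make.conf; for a single core it
gives the -j2 -l1 combination mentioned above.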

There are other things to consider as well.  How do you usually compile,
in a terminal window (xterm, konsole, gterm, etc) or at the text
console?  I've noted that at least with konsole and with composite
rendering (real transparency) turned on, non-niced CPU usage goes thru
the roof at times trying to keep the konsole updated.  Keeping the
konsole window from being displayed, whether by minimizing it, shading
it, or switching to a different desktop workspace, eliminates the issue.
If I'm /really/ planning on going to town (say a new KDE release came out
and I have 100 plus packages of mostly C++ to compile), I'll turn off
composite rendering as well.  When there are rapid display updates, such
as when compile output is scrolling by, it makes a big difference if X
and the X clients don't have to do all that extra work drawing and
compositing areas normally hidden by other windows.  (It should be noted
that typical composite overhead is <5% of a single CPU here, more like 2%
unless I've a huge bunch of windows open, and that's on a dual 1600x1200
display.  Radeon 9250, FWIW, one of the last Radeons for which there are
decent freedomware drivers, tho the reverse engineering effort on the
r300 and r400 series is progressing nicely.)

Then there's the standard stuff, but it'd tend to affect more than simply
C++ compiling.  Make sure your SATA/PATA/SCSI chipsets are running their
correct drivers with DMA enabled, not just a generic, no-DMA
compatibility mode.  That's a big one, but if it affected you, you'd
probably notice it elsewhere as well.
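
For a PATA disk on the old IDE driver, hdparm can report the DMA flag.
The sketch below wraps that in a hypothetical check_dma helper; /dev/hda
is only an example device name, it normally needs root, and disks behind
libata (/dev/sd*) generally won't answer this query, so look at the boot
messages for those instead.

```shell
# Report the DMA flag for an IDE/PATA disk, if hdparm and the device exist.
check_dma() {
    dev=$1
    if ! command -v hdparm >/dev/null 2>&1; then
        echo "skipped: hdparm not installed"
    elif [ ! -b "$dev" ]; then
        echo "skipped: $dev is not a block device"
    else
        hdparm -d "$dev"    # shows the using_dma flag, on or off
    fi
}

check_dma /dev/hda
```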

If you have lots of memory (2 gigs or better, 4 gigs is nice, I have 8
but that's overkill), strongly consider setting up your PORTAGE_TMPDIR
(/var/tmp by default) on tmpfs.  Having all those temporary files
typically used during a compile and merge written to memory only, instead
of having to wait for hard disk access that is several orders of
magnitude slower, DRAMATICALLY speeds up compiles, while at the SAME
time improving general system responsiveness during the merge, because
disk access slows down the /entire/ system, especially when whatever else
you are working on is trying to access the disk at the same time.
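
One way to set that up, as a sketch: put /tmp on tmpfs and point Portage
at it.  The fstab line is shown as a comment, and the 4G size is only an
example value; pick one that fits your RAM and the largest packages you
build.

```shell
# /etc/make.conf (fragment)
# Build under /tmp, which is assumed here to be a tmpfs mount.
PORTAGE_TMPDIR="/tmp"

# Matching /etc/fstab entry (example size, adjust to your memory):
#   tmpfs   /tmp   tmpfs   size=4G,mode=1777   0 0
```

Portage does its work in a portage/ subdirectory of PORTAGE_TMPDIR, so
nothing else about the build layout needs to change.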

To give you an example of what things /can/ be like, with my now aging
dual Opteron 242 setup here (soon to be upgraded to dual-core Opteron
290s), with 8 gigs memory (as I said, overkill, 4 gig would be fine),
/tmp on tmpfs (with PORTAGE_TMPDIR=/tmp), a 4-disk RAID-6 system (two
parity stripes, so effectively 2-way striped), and RAID-0 $PORTDIR and
ccache, I routinely run MAKEOPTS="-j1000" (not that it ever gets there,
but some builds don't like the unlimited -j, no number), and run five or
more parallel emerges at the same time (using emerge -pt to get a tree
output and -a for verification, so I can set up non-conflicting parallel
emerges).  That's usually for a KDE update where I have 100 or so C++ KDE
packages to merge, so it's mostly C++.  Because the config and certain
other sections aren't parallelized and thus do only a single job, even
with that, my load average seldom rises above 20 or 25.  Or, on the
kernel, which is C not C++ but parallelizes VERY well, I'll get a load
average of several /hundred/.  It's fun to see it go that high! =8^)

Still, even with a 300-500 load average on the kernel, or a 20ish load
average on C++ (the most I seem to hit is 30), while I do get a bit of
lag on the mouse, and the panel clock and ksysguard displays sometimes
freeze for 10 seconds at a time, it's still surprising to me it's not
/entirely/ unusable.  As well, I can be playing an Internet radio stream
the entire time (and no, I don't have anything set real-time, either),
with few if any dropouts at all.  That's REALLY astounding to me!  Up to
a 500 load average, yet the scheduler continues to work well enough to
give the network, player and audio system all the time they need to
prevent both dropped network packets and dropped audio data!  (It's
obvious the scheduler prioritizes both the IP stack and the audio system
without intervention, and equally obvious the KDE panel doesn't get the
same auto prioritization, as clearly, 10 second updates as on the panel
simply wouldn't cut it on the network or audio stream.  Be that as it
may, it's still fascinating to watch, seeing the load average climb to
several hundred when I deliberately compile a kernel with -j1000, without
a single hitch or skip in the audio playback at all!)

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
gentoo-amd64@g.o mailing list
