Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: oom killer problems
Date: Thu, 29 Sep 2005 07:18:11
Message-Id: pan.2005.09.29.07.14.54.858287@cox.net
In Reply to: [gentoo-amd64] oom killer problems by "Hemmann, Volker Armin"
Hemmann, Volker Armin posted
<200509282235.32195.volker.armin.hemmann@××××××××××××.de>, excerpted
below, on Wed, 28 Sep 2005 22:35:32 +0200:

> Hi,
> when I try to emerge kdepim-3.4.2 with the kdeenablefinal use-flag I get
> a lot of oom-kills.
> I got them with 512 MB, so I upgraded to 1 gig and still have them. What
> puzzles me is that I have a lot of swap free when it happens. Could
> someone please tell me why the oom-killer becomes active when there is
> still a lot of free swap?
> I am just a user, so using easy words would be much appreciated ;)
>
[snip]
>
> kernel is 2.6.13-r2
> I have 1 GB of RAM and approximately 1 GB of swap.
>
> I emerged kdepim without kdeenablefinal, so there is no big pressure;
> I am just curious.

There's an answer of sorts to the "lots of swap left" question below.
However, that's theory, so I'll cover the practical stuff first and
leave that aspect for later.

kdeenablefinal requires HUGE amounts of memory, no doubt about it.
I've not had serious issues with my gig of memory (dual Opterons, as
you seem to have) using kdeenablefinal here, but I've been doing things
rather differently than you probably have, and any one of those
differences may be the reason I haven't hit the memory issue with the
severity you have.

1. I have swap entirely disabled.

Here was my reasoning (apart from the issue at hand). I was reading an
explanation of some of the aspects of the kernel VMM (virtual memory
manager) on LWN (Linux Weekly News, lwn.net), when I suddenly realized
that I could probably do without all the complexity they were
describing by turning off swap, since I'd recently upgraded to a gig
of RAM. I reasoned that I normally ran a quarter to a third of that in
application memory, so even if I doubled normal use at times, I'd
still have a third of a gig of free memory available for cache.
Further, I reasoned that if something should use all that memory and
STILL run out, it was likely a runaway process gobbling all the memory
available, and that I might as well have it activate the OOM killer at
a gig, without dragging the system down any further, than at 2 gig (or
whatever) after a swap storm had bogged the system down so badly I
couldn't do anything about it anyway. For the most part, I've been
quite happy with my decision, altho now that suspend is starting to
look like it'll work for dual-CPU systems (suspend to RAM sort of
worked, for the first time here, early in the .13 rcs, but they
reverted it for the .13 release, as it needed more work), I may enable
swap again, if only to get suspend-to-disk functionality.

Of course, I'm not saying disabling swap is the right thing for you,
but I've been happy with it here. Anyway: a gig of RAM, swap disabled,
and the VMM complexity that's part of managing swap disabled along
with it. It's possible that's a factor, tho I'm guessing the stuff
below is more likely.
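
If you want to try it yourself, it only takes something like this (a
sketch; the fstab device below is a placeholder, not a real partition
name):

    # as root: turn off all active swap immediately
    swapoff -a
    # then comment out the swap line in /etc/fstab so it stays off
    # across reboots, e.g.:
    #   /dev/hdXN   none   swap   sw   0 0
    # free(1) should now report zero swap
    free -m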

2. Possibly the biggest factor is the KDE packages used. I'm using the
split ebuilds, NOT the monolithic category packages. It's possible
that's the difference. Further, I don't have all the split packages
that compose kdepim-meta merged. I have kmail and knode merged, with
dependencies of course, but I don't have a handheld to worry about
syncing to, so I skipped all those split ebuilds that form part of
kdepim-meta (and are part of the monolithic ebuild), except where
kmail, knode, etc. had them as dependencies. Thus, no kitchensync,
korn, kandy, kdepim-kresources, etc.

There are therefore two possibilities here. One is that one of the
individual apps I skipped requires more memory. The other is that the
monolithic ebuild you used does several things at once (possibly due
to your jobs setting, see below) where the split ebuilds do them in
series, thereby limiting the maximum memory required at any given
moment.
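
At the command line, the difference is just which atoms you hand to
portage (package names from memory here, so check them against your
tree):

    # monolithic: all of kdepim in one big build
    emerge --ask kde-base/kdepim
    # split: only the apps you actually want, built separately
    emerge --ask kde-base/kmail kde-base/knode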

3. I'm NOT using unsermake. For some reason, it hasn't worked for me
since KDE 3.2 or so. I've tried different versions, but always hit
either an error or, despite my settings, an ebuild that didn't seem to
register unsermake and thus used the normal make system. Unsermake is
better at parallelizing the various jobs, making more efficient use of
multiple CPUs, but, given the memory required by enablefinal, it
likely also causes higher memory stress than ordinary GNU make does.
If you are using it and it's otherwise working for you, that may be
the difference.

The rest of the possibilities may or may not apply. You didn't include
the output of emerge info, so I can't compare the relevant info from
your system to mine. However, I suspect they /do/ apply, for reasons
which should become clear as I present them below.
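
If you do follow up, this is all it takes to capture the relevant
settings (the grep line is just a convenience to pull out the settings
this thread cares about):

    # dump toolchain and portage settings, suitable for pasting
    emerge --info
    # or just the most relevant lines:
    emerge --info | grep -E '^(CFLAGS|CXXFLAGS|MAKEOPTS|USE)='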

4. It appears (from the snipped stuff) you are running dual CPUs (or a
single dual-core CPU). How many jobs do you have portage configured
for? With my dual-CPU system, I originally had four set, but after
seeing what compiling KDE with kdeenablefinal did to my memory
resources, even a gig, I decided I had better reduce that to three!
If you have four or more parallel jobs set, THAT could very possibly
be your problem, right there. You can probably do four or more jobs
OR kdeenablefinal, but not BOTH, at least not while running X and KDE
at the same time!

I should mention that I sometimes run multiple emerges (each with
three jobs) in parallel. I *DID* run into OOM issues when trying to do
that with kmail and another large KDE package. Kmail is of course part
of kdepim, and my experience DOES confirm that it's one of the largest
in memory requirements with kdeenablefinal set. I could emerge small
things in parallel with it, stuff like kworldwatch, say, but nothing
major, like konqueror. Thus, I can say almost certainly that six jobs
will trigger the OOM killer when some of them are kmail, and I'd
speculate that five jobs would do it at some point in the kmail
compilation. Four jobs may or may not work, but three did, for me,
under the conditions explained in the other six points, of course.

(Note that the unsermake thing could compound the issue here, because,
as I said, it's better at finding things to run in parallel than the
normal make system is.)
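
The jobs setting lives in /etc/make.conf. Mine amounts to this (the
old CPUs+1 rule of thumb would say -j3 for a dual-CPU box anyway; I
had been running -j4):

    # /etc/make.conf
    # three parallel make jobs: enough to keep two CPUs busy without
    # letting kdeenablefinal builds eat all the RAM at once
    MAKEOPTS="-j3"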

5. I'm now running gcc-4.0.1, and have been compiling KDE with
gcc-4.0.0-preX or later since kde-3.4.0. gcc-4.x is still
package.mask-ed on Gentoo, because some packages still don't compile
with it. Of course, that's easily worked around, because Gentoo slots
gcc, so I have the latest gcc-3.4.x installed in addition to gcc-4.x
and can (and do) easily switch between them using gcc-config.
However, the fact that gcc-4 is still masked on Gentoo means you
probably aren't running it, while I am, and that's what I compile KDE
with. The 4.x version is different enough from 3.4.x that memory use
can be expected to be rather different as well. It's quite possible
that the kdeenablefinal stuff requires even more memory with gcc-3.x
than it does with the 4.x I've been successfully using.
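
Switching between slotted compilers looks like this (the profile names
are examples; the list command shows what you actually have
installed):

    # list the installed compiler profiles
    gcc-config -l
    # make one of them active, then pick up the new environment
    gcc-config x86_64-pc-linux-gnu-3.4.4
    source /etc/profile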

6. It's also possible something else in the configuration affects
compile-time memory usage. There are CFLAGS, of course, and I'm also
running newer (and still masked, AFAIK) versions of binutils and
glibc, with patches specifically for gcc-4.

7. I don't do my kernels thru Gentoo, preferring instead to use the
kernel straight off of kernel.org. You say kernel 2.6.13-r2, the r2
indicating a Gentoo revision, but you don't say /which/ Gentoo kernel
you are running. The VMM is complex enough, and has a wide enough
variety of patches circulating for it, that it's possible you hit a
bug that wasn't in the mainline kernel.org kernel that I'm running.
Or... it may be some other factor in our differing kernel configs.
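
For the record, this is enough to identify a kernel exactly (the
/usr/src/linux symlink is the Gentoo convention for the active source
tree):

    # version string and build info of the running kernel
    uname -r
    cat /proc/version
    # which source tree /usr/src/linux points at
    ls -l /usr/src/linux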

...

Now to the theory. Why would OOM trigger when you had all that free
swap? There are two possible explanations I'm aware of, and maybe
others that I'm not.

1. "Memory allocation" is a verb as well as a noun.

We know that enablefinal uses lots of memory. The USE flag description
mentions that, and we've discovered it to be /very/ true. If you run
ksysguard on your panel and monitor memory with it, as I do (or keep a
top session running on another VT if compiling at the text console),
you're also aware that memory use during compile sessions,
particularly KDE compile sessions with enablefinal set, varies VERY
drastically! From my observations, each "job" will at times eat more
and more memory, until, with kmail in particular, multiple jobs are
taking well over 200 MB of memory apiece! (See why I mentioned
parallel jobs above? At 200, possibly 300+ MB apiece, multiple
parallel jobs eat up the memory VERY fast!) After grabbing more and
more memory for a while, a job will suddenly complete and release it
ALL at once. The memory usage graph will suddenly drop multiple
hundreds of megabytes -- for ONE job!
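
If you want to watch the same thing at the text console, something as
simple as this on a spare VT does the job:

    # refresh free(1) every two seconds while the emerge runs
    watch -n2 free -m
    # or just leave a top session running and watch the RES column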

Well, during the memory usage increase phase, each job will allocate
more and more memory, a chunk at a time. It's possible (tho not
likely, from my observations of this particular usage pattern) that an
app could want X MB of memory all at once in order to complete its
task. Until it gets that memory it can't go any further, and since the
task it's trying to do is half complete, it can't release any memory
either without losing what it has already done. If the allocation
request is big enough (or several of them arrive in parallel and
together are big enough), it can trigger the OOM killer even with what
looks like quite a bit of free memory left, because all available
cache and other memory that can be freed has already been freed, and
no app can continue to the point of being able to release memory
without grabbing some memory first. If one of them wants a LOT of
memory, and the OOM killer isn't killing it off first (there are
various OOM killer algorithms out there, some using different factors
for picking the app to die than others), stuff will start dying to
allow the app wanting all that memory to get it.
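
One related knob, not something your log shows but worth knowing about
while we're on allocation-time failures, is the kernel's overcommit
policy:

    # how the kernel answers large allocation requests:
    # 0 = heuristic overcommit (the default), 1 = always grant,
    # 2 = strict accounting against swap plus a fraction of RAM
    cat /proc/sys/vm/overcommit_memory
    cat /proc/sys/vm/overcommit_ratio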

Of course, it could also very plainly be a screwed-up VMM or OOM
killer. These things aren't exactly simple to get right... and if gcc
took an unexpected optimization that has side effects...

2. There is memory and there is "memory", and then there is 'memory'
and "'memory'" and '"memory"' as well. <g>

There is of course the obvious difference between real/physical and
swap/virtual memory, with real memory being far faster (while at the
same time being slower than L2 cache, which is slower than L1 cache,
which is slower than the registers, which can be accessed at full CPU
speed, but that's beside the point for this discussion).

That's only the tip of the iceberg, however. From the software's
perspective, that division mainly affects locked memory vs swappable
memory. The kernel is always locked memory -- it cannot be swapped,
even drivers that are never used, which is the reason it makes sense
to keep your kernel as small as possible, leaving more room in real
memory for programs to use. Depending on your kernel and its
configuration, various forms of RAMDISK (ramfs vs tmpfs vs ...) may be
locked (or not). Likewise, some kernel patches and configs make it
easier or harder for applications to lock memory as well. Maybe a
complicating factor here is that you had a lot of locked memory, and
the compile process required more locked memory than was left? I'm
not sure how much locked memory a normal process on a normal kernel
can have, if any, but given both that and the fact that the kernel you
were running is unknown, it's a possibility.
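
The per-process limit, at least, is easy to check (ulimit is a shell
builtin):

    # maximum amount of memory a process may lock into RAM, in kB
    ulimit -l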

Then there are the "memory zones". Fortunately, amd64 is less
complicated in this respect than x86. However, various memory zones do
still exist, and not only do some things require memory in a specific
zone, but it can be difficult to transfer in-use memory from one zone
to another, even where it COULD be placed in a different zone. Up
until earlier this year, going thru the backing store (swap) was often
the /only/ way to move memory between zones! However, as I said,
amd64 is less complicated in this respect than x86, so memory zones
weren't likely the issue here -- unless something was going wrong, of
course.

Finally, there's the "contiguous memory" issue. Right after boot, your
system has lots of free memory in large blobs of contiguous pages.
It's easy to get contiguous memory allocated in blocks of 256, 512,
and 1024 pages at once. As uptime increases, however, memory gets
fragmented thru normal use. A system that has been up awhile will
have far fewer 1024-page blocks immediately available for use, and
fewer 512 and 256 page blocks as well. Total memory available may be
the same, but if it's all in 1 and 2 page blocks, it'll take some
serious time to move stuff around to allocate a 1024-page contiguous
block -- if it's even possible to do at all. Given the type of memory
access patterns I've observed during KDE merges with enablefinal on,
and while I'm not technically skilled enough to verify my suspicions,
of the possibilities listed here I believe this to be the most likely
culprit, the reason the OOM killer was activating even while swap (and
possibly even main memory) was still free.
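
You can watch this fragmentation happen on a live system (2.6 kernels
export it; each column is the count of free blocks of a given
power-of-two order, from single pages on the left up to 1024-page
blocks on the right):

    # free contiguous blocks per zone, by allocation order
    cat /proc/buddyinfo
    # zeros piling up in the right-hand columns mean the big
    # contiguous blocks are gone, even when plenty of total memory
    # is still free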

I'm sure there are other variations on the theme as well, other
memory-type restrictions, and it may have been one of /those/ that
just so happened to come up short at the time you needed it. In any
case, as should be quite plain by now, a raw "available memory" number
doesn't give /anything/ /even/ /close/ to the entire picture, at the
detail needed to fully grok why the OOM killer was activating when
overall memory wasn't apparently in short supply at all.

I should also mention those numbers I snipped. I know enough to just
begin to make a bit of sense out of them, but not enough to
/understand/ them, at least to the point of understanding what they
say is wrong. You can see the contiguous memory block figures for
each of the DMA and normal memory zones: 4 kB pages, so the 1024-page
blocks are 4 MB. I just don't understand enough about the internals
to grok either them or this log snip, however. I know the general
theories, and hopefully explained them well enough, but I don't know
how they apply concretely. Perhaps someone else does.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html


--
gentoo-amd64@g.o mailing list
