On Thursday 29 September 2005 09:14, Duncan wrote:
> Hemmann, Volker Armin posted
> <200509282235.32195.volker.armin.hemmann@××××××××××××.de>, excerpted
>

> kdeenablefinal requires HUGE amounts of memory, no doubt about it. I've
> not had serious issues with my gig of memory (dual Opterons as you seem to
> have), using kdeenablefinal here, but I've been doing things rather
> differently than you probably have, and any one of the things I've done
> differently may be the reason I haven't had the memory issue to the
> severity you have.
>

Yeah, but on my 32-bit system even 512 MB was enough to build kdepim with
kdeenablefinal.

>
> The rest of the possibilities may or may not apply. You didn't include
> the output of emerge info, so I can't compare the relevant info from
> your system to mine. However, I suspect they /do/ apply, for reasons
> which should be clear as I present them, below.
>
> 4. It appears (from the snipped stuff) you are running dual CPU (or a
> single dual-core CPU). How many jobs do you have portage configured for?
> With my dual-CPU system, I originally had four set, but after seeing what
> KDE compiling with kdeenablefinal did to my memory resources, even a gig,
> I decided I'd better reduce that to three! If you have four or more
> parallel jobs set, THAT could very possibly be your problem, right there.
> You can probably do four or more jobs OR kdeenablefinal, but not BOTH, at
> least not BOTH while running X and KDE at the same time!
>

No, single CPU, single core.

Here is my emerge info:
Portage 2.0.52-r1 (default-linux/amd64/2005.1, gcc-3.4.4, glibc-2.3.5-r1,
2.6.13-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.13-gentoo-r2 x86_64 AMD Athlon(tm) 64 Processor 3200+
Gentoo Base System version 1.12.0_pre8
ccache version 2.4 [disabled]
dev-lang/python: 2.3.5, 2.4.2
sys-apps/sandbox: 1.2.13
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6
sys-devel/binutils: 2.16.1
sys-devel/libtool: 1.5.20
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="amd64 ~amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -msse3 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=k8 -O2 -fweb -ftracer -fpeel-loops -msse3 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo/"
LC_ALL="de_DE@euro"
LINGUAS="de"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 S3TC X acpi alsa audiofile avi bash-completion berkdb bitmap-fonts
bluetooth bzip2 cairo cdparanoia cdr cpudetection crypt curl dvd dvdr dvdread
emboss emul-linux-x86 encode exif ffmpeg fftw foomaticdb fortran ftp gif gimp
glitz glut glx gnokii gpm gstreamer gtk gtk2 icq id3 imagemagick imlib irmc
jabber java javascrip jp2 jpeg jpeg2k kde kdeenablefinal kdepim lame lesstif
libwww lm_sensors lzo lzw lzw-tiff mad matroska mjpeg mmap mng motif mp3 mpeg
mpeg2 mplayer mysql ncurses nls no-old-linux nocd nosendmail nowin nptl
nsplugin nvidia offensive ogg openal opengl oscar pam pdflib perl player png
posix python qt quicktime rar readline reiserfs scanner sdl sendfile
sharedmem sms sndfile sockets spell ssl stencil-buffer subtitles svg sysfs
tcpd tga theora tiff transcode truetype truetype-fonts type1 type1-fonts
unicode usb userlocales v4l v4l2 vcd videos visualization vorbis wmf xanim
xine xml xml2 xpm xrandr xsl xv xvid xvmc yv12 zlib zvbi linguas_de
userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CTARGET, LANG, LDFLAGS, PORTDIR_OVERLAY

As you can see, MAKEOPTS is at -j2.

>
> (Note that the unsermake thing could compound the issue here, because as I
> said, it's better at finding things to run in parallel than the normal
> make system is.)
>
> 5. I'm now running gcc-4.0.1, and have been compiling kde with
> gcc-4.0.0-preX or later since kde-3.4.0. gcc-4.x is still package.mask-ed
> on Gentoo, because some packages still don't compile with it. Of course,
> that's easily worked around because Gentoo slots gcc, so I have the latest
> gcc-3.4.x installed, in addition to gcc-4.x, and can (and do) easily
> switch between them using gcc-config. However, the fact that gcc-4 is
> still masked for Gentoo means you probably aren't running it, while I am,
> and that's what I compile kde with. The 4.x version is different enough
> from 3.4.x that memory use can be expected to be rather different as well.
> It's quite possible that the kdeenablefinal stuff requires even more
> memory with gcc-3.x than it does with the 4.x I've been successfully
> using.

Hm, I read some stuff on AnandTech showing that apps compiled with gcc4 are
a LOT slower than apps compiled with 3.4 on the amd64 platform. So I'll
stay away from it until I see some numbers that convince me otherwise
- and until I can be sure that almost everything builds with it ;)

>
> 7. I don't do my kernels thru Gentoo, preferring instead to use the
> kernel straight off of kernel.org. You say kernel 2.6.13-r2, the r2
> indicating a Gentoo revision, but you don't say /which/ Gentoo kernel you
> are running. The VMM is complex enough and has a wide enough variety of
> patches circulating for it, that it's possible you hit a bug that wasn't
> in the mainline kernel.org kernel that I'm running. Or... it may be some
> other factor in our differing kernel configs.

Yes, I did say, at the bottom of my mail:
kernel is 2.6.13-r2

> ...
>
> Now to the theory. Why would OOM trigger when you had all that free swap?
> There are two possible explanations I am aware of and maybe others that
> I'm not.
>
> 1. "Memory allocation" is a verb as well as a noun.
>
> We know that enablefinal uses lots of memory. The USE flag description
> mentions that and we've discovered it to be /very/ true. If you run
> ksysguard on your panel as I do, and monitor memory using it as I do (or
> run a VT with a top session running if compiling at the text console), you
> are also aware that memory use during compile sessions, particularly KDE
> compile sessions with enablefinal set, varies VERY drastically! From my
> observations, each "job" will at times eat more and more memory, until
> with kmail in particular, multiple jobs are taking well over 200MB of
> memory apiece! (See why I mentioned parallel jobs above? At 200,
> possibly 300+ MB apiece, multiple parallel jobs eat up the memory VERY
> fast!) After grabbing more and more memory for awhile, a job will
> suddenly complete and release it ALL at once. The memory usage graph will
> suddenly drop multiple hundreds of megabytes -- for ONE job!

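To put rough numbers on the point about parallel jobs, here is a minimal
sketch in Python. It assumes each compile job peaks at the 200 to 300 MB
quoted above and that all jobs peak at the same moment (a worst case). The
function name is made up for illustration, and the comparison ignores what
X, KDE and the kernel themselves need.

```python
def peak_job_memory_mb(jobs, per_job_mb):
    """Worst-case memory if every parallel job peaks at the same time."""
    return jobs * per_job_mb


if __name__ == "__main__":
    # The per-job figures (200 and 300 MB) and the RAM sizes are the ones
    # mentioned in this thread; treating them as simultaneous peaks is an
    # assumption.
    for ram_mb in (512, 1024):
        for jobs in (2, 3, 4):
            low = peak_job_memory_mb(jobs, 200)
            high = peak_job_memory_mb(jobs, 300)
            verdict = "fits" if high <= ram_mb else "may not fit"
            print(f"RAM {ram_mb} MB, -j{jobs}: {low}-{high} MB -> {verdict}")
```

Even at -j2 the upper estimate is 600 MB, which is already more than 512 MB
of RAM before anything else is counted.
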
I watched the memory consumption with gkrellm2.
At first there were several hundred MB free, dropping fast to ~150 MB free,
which then dropped more slowly to 20-50 MB free. It stayed 'locked' there
for some time, until suddenly the oom-killer kicked in (I did not watch
gkrellm continuously; even with a 3200+, kdepim takes longer to build than
I can watch gkrellm without a break). But the behaviour was the same with
512 MB or 1 GB of RAM.

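For anyone who would rather log these figures than watch a graph, a small
sketch that reads them from /proc/meminfo could look like this. The field
names (MemTotal, MemFree, Cached, SwapTotal, SwapFree) are standard on
Linux; the helper names and the choice of fields are only an illustration,
not what gkrellm does internally.

```python
def parse_meminfo(text):
    """Return a dict of field name -> numeric value from /proc/meminfo text.

    Most fields are reported in kB; a few (e.g. HugePages counters) are
    plain counts, so the unit is not checked here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Pick out the numbers relevant to an OOM investigation, in MB."""
    wanted = ("MemTotal", "MemFree", "Cached", "SwapTotal", "SwapFree")
    return {k: fields[k] // 1024 for k in wanted if k in fields}


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(summarize(parse_meminfo(f.read())))
    except OSError:
        print("no /proc/meminfo here (not Linux?)")
```

Running it in a loop during the build would show whether swap really stays
free right up to the point where the oom-killer fires.
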
> Well, during the memory usage increase phase, each job will allocate more
> and more memory, a chunk at a time. It's possible (tho not likely from my
> observations of this particular usage pattern) that an app could want X MB
> of memory all at once, in order to complete the task. Until it gets
> that memory it can't go any further, the task it is trying to do is half
> complete so it can't release any memory either, without losing what it has
> already done. If the allocation request is big enough, (or you have
> several of them in parallel all at the same time that together are big
> enough), it can cause the OOM to trigger even with what looks like quite a
> bit of free memory left, because all available cache and other memory that
> can be freed has already been freed, and no app can continue to the point
> of being able to release memory, without grabbing some memory first. If
> one of them is wanting a LOT of memory, and the OOM killer isn't killing
> it off first (there are various OOM killer algorithms out there, some
> using different factors for picking the app to die than others), stuff
> will start dying to allow the app wanting all that memory to get it.
>
> Of course, it could also be very plainly a screwed up VMM or OOM killer,
> as well. These things aren't exactly simple to get right... and if gcc
> took an unexpected optimization that has side effects...
>
> 2. There is memory and there is "memory", and then there is 'memory' and
> "'memory'" and '"memory"' as well. <g>
>
> There is of course the obvious difference between real/physical and
> swap/virtual memory, with real memory being far faster (while at the same
> time being slower than L2 cache, which is slower than L1 cache, which is
> slower than the registers, which can be accessed at full CPU speed, but
> that's beside the point for this discussion).
>
> That's only the tip of the iceberg, however. From the software's
> perspective, that division mainly affects locked memory vs swappable
> memory. The kernel is always locked memory -- it cannot be swapped, even
> drivers that are never used, which is why it makes sense to keep your
> kernel as small as possible, leaving more room in real memory for programs
> to use. Depending on your kernel and its configuration, various forms of
> RAMDISK, ramfs vs tmpfs vs ... may be locked (or not). Likewise, some
> kernel patches and configs make it easier or harder for applications to
> lock memory as well. Maybe a complicating factor here is that you had a
> lot of locked memory and the compile process required more locked memory
> than was left? I'm not sure how much locked memory a normal process on a
> normal kernel can have, if any, but given both that and the fact that the
> kernel you were running is unknown, it's a possibility.

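On the question of how much memory a normal process may lock: on Linux the
per-process limit is RLIMIT_MEMLOCK, the same value `ulimit -l` reports. A
minimal sketch to print it, using Python's standard resource module, is
below. The helper names are made up, and whether this limit played any role
in the OOM case discussed here is purely an assumption.

```python
import resource


def format_limit(value):
    """Render an rlimit value given in bytes; RLIM_INFINITY means no limit."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("locked-memory limit: soft", format_limit(soft),
          "hard", format_limit(hard))
```
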
I don't use ramdisks, and the only tmpfs user is udev - with ~180 kB used.

>
> Then there are the "memory zones". Fortunately, amd64 is less complicated
> in this respect than x86. However, various memory zones do still exist,
> and not only do some things require memory in a specific zone, but it can
> be difficult to transfer in-use memory from one zone to another, even
> where it COULD be placed in a different zone. Up until earlier this
> year, it was often impossible to transfer memory between zones without
> using the backing store (swap). That was the /only/ way possible!
> However, as I said, amd64 is less complicated in this respect than x86, so
> memory zones weren't likely the issue here -- unless something was going
> wrong, of course.
>
> Finally, there's the "contiguous memory" issue. Right after boot, your
> system has lots of free memory, in large blobs of contiguous pages. It's
> easy to get contiguous memory allocated in blocks of 256, 512, and 1024
> pages at once. As uptime increases, however, memory gets fragmented thru
> normal use. A system that has been up awhile will have far fewer 1024
> page blocks immediately available for use, and fewer 512 and 256 page
> blocks as well. Total memory available may be the same, but if it's all in
> 1 and 2 page blocks, it'll take some serious time to move stuff around to
> allocate a 1024 page contiguous block -- if it's even possible to do at
> all. Given the type of memory access patterns I've observed during kde
> merges with enablefinal on, and while I'm not technically skilled enough
> to verify my suspicions, of the possibilities listed (which are the ones
> I know of) I believe this to be the most likely culprit, the reason the
> OOM killer was activating even while swap (and possibly even main memory)
> was still free.
>
> I'm sure there are other variations on the theme, however, other memory
> type restrictions, and it may have been one of /those/ that it just so
> happened came up short at the time you needed it. In any case, as should
> be quite plain by now, a raw "available memory" number doesn't give
> /anything/ /even/ /close/ to the entire picture, at the detail needed to
> fully grok why the OOM killer was activating, when overall memory wasn't
> apparently in short supply at all.
>
> I should also mention those numbers I snipped. I know enough to just
> begin to make a bit of sense out of them, but not enough to /understand/
> them, at least to the point of understanding what they are saying is
> wrong. You can see the contiguous memory block figures for each of the
> DMA and normal memory zones. 4kB pages, so the 1024 page blocks are 4MB.
> I just don't understand enough about the internals to grok either them or
> this log snip, however. I know the general theories and hopefully
> explained them well enough, but don't know how they apply concretely.
> Perhaps someone else does.
>

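The contiguous-block figures can also be read on a running system from
/proc/buddyinfo, which lists, per zone, how many free blocks exist at each
order (order n is 2**n pages, so order 10 is the 1024-page, 4 MB block
mentioned above). The following is only a sketch: the function names are
invented here, the 4 kB page size is taken from the mail above, and whether
/proc/buddyinfo is present on a given kernel is an assumption.

```python
PAGE_KB = 4  # 4 kB pages, as stated in the quoted mail


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, list index = order."""
    zones = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        # Expected shape: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        zones[(int(parts[1]), parts[3])] = [int(c) for c in parts[4:]]
    return zones


def free_kb(counts):
    """Total free memory in a zone: a block of order n holds 2**n pages."""
    return sum(c * (2 ** order) * PAGE_KB for order, c in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free block, or None if none is free."""
    orders = [order for order, c in enumerate(counts) if c > 0]
    return max(orders) if orders else None


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            data = f.read()
    except OSError:
        data = ""
    for (node, zone), counts in parse_buddyinfo(data).items():
        print(node, zone, free_kb(counts), "kB free, largest order",
              largest_free_order(counts))
```

A zone can show plenty of free kB while its largest free order is small,
which is exactly the fragmentation case described above.
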
Thanks for your time - I will try vanilla kernel.org kernels this weekend and
if there is any difference, I will post again.

Glück Auf
Volker

--
gentoo-amd64@g.o mailing list