Gentoo Archives: gentoo-amd64

From: Bernhard Auzinger <e0026053@×××××××××××××××××.at>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Re: Re: Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite
Date: Thu, 09 Feb 2006 19:08:37
Message-Id: 200602092009.41427.e0026053@student.tuwien.ac.at
In Reply to: [gentoo-amd64] Re: Re: Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite by Duncan <1i5t5.duncan@cox.net>
1 May I put my oar into your optimisation discussion?
2
3 It's funny, Duncan. On the one hand you are saving every byte of CPU cache. On
4 the other hand, you are happy to have forked bash processes in your main memory.
5 But how do you control that? I mean, how do you get the code of your forked
6 bash processes out of the CPU cache again, to keep it free for kernel code?
7
8 A long time ago . . ., I was testing some CFLAGS on my own programs. I wrote a
9 fast Fourier transform myself, only to see the "impressive" difference
10 between -Os, -O3 and some other optimisation flags. I fed it a large amount
11 of input. But no matter how hard I tried to make it faster by changing the
12 flags, it didn't work. The difference was marginal, and not every flag brings
13 an improvement for every program. The only thing that changed a lot was the
14 time gcc needed to perform those optimisations.
15
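For what it's worth, a comparison like that is easy to script. A minimal sketch, assuming a hypothetical source file fft.c and input file input.dat (placeholders, not my real code). time_runs is a helper name made up here, and `date +%s` only resolves whole seconds, so the runs need to be long enough to matter:

```shell
# time_runs N CMD [ARGS...]
# Run CMD N times and print the total elapsed wall-clock time in seconds.
time_runs() {
    local runs="$1"
    shift
    local start end i
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program's output; stop early if a run fails.
        "$@" >/dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Example use (commented out because fft.c and input.dat are placeholders):
#
# for opt in -Os -O2 -O3; do
#     echo "compile $opt: $(time_runs 1 gcc "$opt" -o "fft$opt" fft.c -lm) s"
#     echo "run     $opt: $(time_runs 10 "./fft$opt" input.dat) s"
# done
```

Timing the gcc call as well as the resulting binary shows both sides: how long the optimisation takes and how much it buys.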
16 Bernhard
17 On Thursday 09 February 2006 01:17, Duncan wrote:
18 > Simon Stelling posted <43EA568D.6020307@g.o>, excerpted below, on
19 >
20 > Wed, 08 Feb 2006 21:37:33 +0100:
21 > > Duncan wrote:
22 > >> I should really create a page listing all the little Gentoo admin
23 > >> scripts I've come up with and how I use them. I'm sure a few folks
24 > >> anyway would likely find them useful.
25 > >>
26 > >> The idea behind most of them is to create shortcuts to having to type in
27 > >> long emerge lines, with all sorts of arbitrary command line parameters.
28 > >> The majority of these fall into two categories, ea* and ep*, short for
29 > >> emerge --ask <additional parameters> and emerge --pretend ... . Thus, I
30 > >> have epworld and eaworld, the pretend and ask versions of emerge -NuDv
31 > >> world, epsys and easys, the same for system, eplog <package>, emerge
32 > >> --pretend --log --verbose (package name to be added to the command line
33 > >> so eplog gcc, for instance, to see the changes between my current and
34 > >> the new version of gcc), eptree <package>, to use the tree output, etc.
35 > >
36 > > Interesting. But why do you use scripts and not simple aliases? Every
37 > > time you launch your script the HD performs a seek (which is very
38 > > expensive in time), copies the script into memory and then forks a whole
39 > > bash process to execute a one-liner. Using alias, which is a bash
40 > > built-in, wouldn't fork a process and therefore be much faster.
41 >
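To make the two options concrete, here is a minimal sketch of the same kind of one-liner as an alias, as a shell function, and as a standalone script. The command lines are the ones described above for epworld and eaworld. The EMERGE variable is an assumption added only so the function can be dry-run with something harmless like echo.

```shell
# Alias: expanded by the interactive shell itself, so no script file is read
# and no extra bash process is started. emerge still runs as a new process.
alias epworld='emerge --pretend -NuDv world'

# Function: also held in the shell's memory, but it can take arguments.
# EMERGE is a hypothetical override for dry runs; it defaults to emerge.
eaworld() {
    "${EMERGE:-emerge}" --ask -NuDv world "$@"
}

# Script variant, saved as an executable file named eaworld somewhere on
# PATH. Running it starts a new bash, which then replaces itself with emerge:
#
#   #!/bin/bash
#   exec emerge --ask -NuDv world "$@"
```

In all three cases emerge itself has to be loaded and started, so the difference under discussion is only the extra shell and the read of the script file.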
42 > My thinking, which is possibly incorrect (your input appreciated), is that
43 > file-based scripts get pulled into cache the first time they are executed,
44 > and will remain there (with a gig of memory) pretty much until I'm done
45 > doing my upgrades. At the same time, they are simply in cache, not
46 > something in bash's memory, so if the memory is needed, it will be
47 > reclaimed. As well, after I'm done and on to other tasks, the cached
48 > commands will eventually be replaced by other data, if need be.
49 >
50 > Aliases (and bash-functions) are held in memory. That's not as flexible
51 > as cache in terms of being knocked out of memory if the memory is needed
52 > by other things. Sure, that memory may be flushed to disk-based swap, but
53 > that's disk based the same as the actual script files I'm using, so
54 > reading it back into main memory if it's faulted out will take something
55 > comparable to the time it'd take to read in the script file again anyway.
56 > That's little gain, with the additional overhead and therefore loss of
57 > having to manage the temp-copy in swapped memory, if it comes to that.
58 >
59 > Actually, there are some details here that may affect things. I don't
60 > know enough about the following factors to be able to evaluate how they
61 > balance out, but the real reason I chose individual scripts is below.
62 >
63 > One, here anyway, tho not on most systems, I'm running four SATA disks in
64 > RAID. The swap is actually not on the RAID, as the kernel manages it like
65 > RAID on its own, provided all four swap areas are set to the same priority
66 > (they are), which means swap is running on the equivalent of
67 > four-way-striped RAID-0. Meanwhile, the scripts, as part of my main
68 > system, are on RAID-6 for redundancy, so with the same four disks backing
69 > the RAID-6 as the swap, I've only effectively two-way-striped storage
70 > there, the other two disk stripes being parity. Thus, retrieval from the
71 > 4-way-striped swap should in theory be more efficient than from the
72 > 2-way-striped regular storage. OTOH, the granularity of the stripe
73 > in either case, against the size of the one or two-line script, likely
74 > means that it'll be pulled from a single stripe (at the speed of
75 > reading from a single disk, tho there are parallelizing opportunities
76 > not available on a single disk). It's also likely that the swap will be
77 > more optimally managed for fast retrieval than the location on the regular
78 > filesystem is. Balanced against that we have the overhead of maintaining
79 > the swap tracking.
80 >
81 > That's assuming it would swap that out to the dedicated swap in the first
82 > place. I'm not familiar with Linux's VM, but given that the aliases and
83 > functions would be file-based in either case, it's possible it would
84 > simply drop the data from main memory, relying on the fact that the
85 > data is clean file-backed data and could be read-in directly from the
86 > files again, if necessary, rather than bothering with actually creating a
87 > temporary copy of the /same/ data in swap, taking time to do so when it
88 > could just read it back in from the file.
89 >
90 > Another aspect is the effect of data vs metadata caching. Again, I'm not
91 > familiar with how Linux manages this, and indeed, it may differ between
92 > filesystems, but the idea is that if the file metadata is still cached,
93 > even if the file itself isn't, it's a single disk seek and read to read
94 > the data back in, as opposed to multiple seeks and reads, following the
95 > logical directory structure to fetch each directory table in the
96 > hierarchy until it reaches the entry that actually has the file location,
97 > before it can read the file itself, to read the file initially, or if the
98 > location metadata has been flushed as well. (Back several years ago on
99 > MSWormOS, one of the first things I always did after a reinstall was set
100 > the system to server profile, which kept a far larger metadata cache, on
101 > the theory that the metadata was usually smaller than the data, and for
102 > dirs, sharable among many data files, so I'd rather spend cache memory on
103 > metadata than data. The other choices were the default desktop profile,
104 > and laptop, a much smaller metadata cache. I originally learned about
105 > these as a result of reading about a bug in the original 95 as shipped,
106 > that swapped some entries in the registry, and therefore cached FAR less
107 > metadata than it should have. I don't know where these tweaks are located
108 > on Linux, or how to go about adjusting them safely.)
109 >
110 > Basically, therefore, I don't believe aliases to be a big positive, and
111 > possibly somewhat of a negative, as opposed to scripts, because the
112 > scripts will be cached in most cases after initial use anyway, yet they
113 > have the advantage of not having to be maintained or tracked in memory
114 > when I'm doing other tasks and the system needs that cache.
115 >
116 > Given that I don't believe it's a big positive, I prefer the
117 > administrative convenience and maintainability of separate scripts.
118 >
119 > There /is/ a third alternative, that I came across recently, that I think
120 > is a good idea. If you'd comment, perhaps it would help me sort out the
121 > implications.
122 >
123 > The idea, simply put, is "bash command theming", single scripts that can
124 > be invoked that will "theme" a command prompt for the tasks at hand. I
125 > didn't read the entire article I saw covering this, but skimmed it enough
126 > to get the gist. A single invokable script for each set of tasks, say
127 > perl programming, bash programming, working with portage, etc, that would
128 > set up a specific set of aliases and functions for that task. Invoking
129 > the script with the "off" parameter would erase that set of aliases and
130 > bash functions, thereby recovering the memory, and do any related cleanup
131 > like resetting the path if necessary to exclude any task specific
132 > commands. Taking this a step further, a variable could be set up that
133 > would list the theme or themes that were active, that the theme-setup
134 > script could read and automatically deactivate the previous theme while
135 > switching to the new one. One could even share functionality between
136 > themes, sourcing common files, which would check the active theme and
137 > adjust their behavior based on the active theme.
138 >
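A rough sketch of how such a theme switcher might look, assuming it is sourced into the current shell (a child process cannot change its parent's aliases or functions). Every name here (theme, ACTIVE_THEME, theme_portage_on, theme_portage_off, EMERGE) is hypothetical, and functions stand in for aliases:

```shell
# Name of the currently active theme; empty when none is active.
ACTIVE_THEME=""

# The "portage" theme: defines its helpers on activation ...
theme_portage_on() {
    epworld() { "${EMERGE:-emerge}" --pretend -NuDv world "$@"; }
    eaworld() { "${EMERGE:-emerge}" --ask -NuDv world "$@"; }
}

# ... and removes them again on deactivation.
theme_portage_off() {
    unset -f epworld eaworld
}

# theme NAME [off]
# Activate theme NAME, first deactivating any other active theme,
# or deactivate NAME when the second argument is "off".
theme() {
    local name="$1" action="${2:-on}"
    if ! command -v "theme_${name}_on" >/dev/null; then
        echo "theme: unknown theme '$name'" >&2
        return 1
    fi
    if [ "$action" = off ]; then
        # Only tear down the theme if it is the one that is active.
        [ "$ACTIVE_THEME" = "$name" ] || return 0
        "theme_${name}_off"
        ACTIVE_THEME=""
        return 0
    fi
    if [ -n "$ACTIVE_THEME" ]; then
        "theme_${ACTIVE_THEME}_off"
    fi
    "theme_${name}_on"
    ACTIVE_THEME="$name"
}
```

After `theme portage`, epworld and eaworld exist in the shell; `theme portage off` removes them again, so nothing task-specific stays in the namespace. Showing $ACTIVE_THEME in the prompt would be one way to help those who forget which theme is active.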
139 > This alias and function theming wouldn't be quite as modular (tho with
140 > sourcing it could be) as the individual scripts, but would maintain the
141 > performance advantages (if any) of the alias/function idea, while at the
142 > same time allowing the memory reclamation of the cached-script option. It
143 > sounds really good, but I'm not yet convinced the benefits would be worth
144 > the additional effort of setting up those themes, since the solution I
145 > have works.
146 >
147 > One VERY NICE benefit of the themes idea is that it would directly
148 > address any namespace pollution concerns. It has a direct appeal to
149 > programmers and anyone else that's ever had to deal with such issues, for
150 > that reason alone. One single command on the path to invoke the theme,
151 > possibly even an eselect-like command shared among themes, with
152 > everything else off-path and out of the namespace unless that theme is
153 > invoked! /VERY/ appealing indeed. OTOH, there are those who'll never
154 > remember the theme they have active at the moment, and be constantly
155 > confused. For these folks, it'd be a nightmare!
156 >
157 > > man emerge:
158 > > --oneshot (-1)
159 > >
160 > > IIRC --oneshot has a short form since 2.0.52 was released.
161 >
162 > Learn new things every day. Thanks! I remember how pleased I was to have
163 > --newuse, and even more so when I discovered -N, so very nice!
164 >
165 > >> ... Deep breath... <g>
166 > >>
167 > >> All that as a preliminary explanation to this: Along with the above, I
168 > >> have a set of efetch functions, that invoke the -f form, so just do the
169 > >> fetch, not the actual compile and merge, and esyn (there's already an
170 > >> esync function in something or other I have merged so I just call it
171 > >> esyn), which does emerge sync, then updates the esearch db, then
172 > >> automatically fetches all the packages that an eaworld would want to
173 > >> update, so they are ready for me to merge at my leisure.
174 > >
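The three steps described for esyn could be chained roughly as follows. This is only a sketch of the description above, not Duncan's actual script: it uses the current `--sync` spelling and `eupdatedb` from the esearch package, and the EMERGE and EUPDATEDB variables are hypothetical overrides added so it can be dry-run.

```shell
# esyn: sync the tree, refresh the esearch database, then prefetch the
# sources for everything a world update would want to merge.
# Each step runs only if the previous one succeeded.
esyn() {
    "${EMERGE:-emerge}" --sync &&
    "${EUPDATEDB:-eupdatedb}" &&
    "${EMERGE:-emerge}" --fetchonly -NuDv world
}
```

Written as a function it could be called esync without putting a file on disk, but it would then shadow the existing esync command for that shell.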
175 > > I'm a bit confused now. You use *functions* to do that? Or do you mean
176 > > scripts? By the way: with alias you could name your custom "script"
177 > > esync because it doesn't place a file on the harddisk.
178 >
179 > Scripts. I was using "functions" in the generic sense here. I did
180 > realize before I sent that it had a dual meaning, but figured it wasn't
181 > important enough a distinction to go back and correct, or explain.
182 > Unfortunately, every time I decide to skip something like that, I get
183 > called on it, which doesn't help my posts get any shorter! =8^)
184 >
185 > >> I choose -Os, optimize for size, because a modern CPU and the various
186 > >> cache levels are FAR faster than main memory.
187 > >
188 > > Given the fact that two CPUs, only differing in L2 Cache size, have
189 > > nearly the same performance, I doubt that the performance increase is
190 > > very big. Some interesting figures:
191 > >
192 > > Athlon64 something (forgot what, but shouldn't matter anyway) with 1 MB
193 > > L2-cache is 4% faster than an Athlon64 of the same frequency but with
194 > > only 512kB L2-cache. The bigger the cache sizes you compare get, the
195 > > smaller the performance increase. Since you run a dual Opteron system
196 > > with 1 MB L2 cache per CPU I tend to say that the actual performance
197 > > increase you experience is about 3%. But then I didn't take into account
198 > > that -Os leaves out a few optimizations which would be included by -O2,
199 > > the default optimization level, which actually makes the code a bit
200 > > slower when compared to -O2. So, the performance increase you really
201 > > experience shrinks to about 0-2%. I'd tend to proclaim that -O2 is even
202 > > faster for most of the code, but that's only my feeling.
203 >
204 > Interesting, indeed. I'd counter that it likely has to do with how many
205 > tasks are being juggled as well, plus the number of kernel/user context
206 > switches, of course. I wonder under what load, and with what task-type,
207 > the above 4% difference was measured.
208 >
209 > Of course, the definitive way to end the argument would be to do some
210 > profiling and get some hard numbers, but I don't think either you or I
211 > consider it an important enough factor in our lives to go to /that/ sort
212 > of trouble. <g>
213 >
214 > > Beside that I should mention that -Os sometimes still has problems with
215 > > huge packages like glibc.
216 >
217 > Interestingly enough, while Gentoo's glibc ebuilds stripflags to -O2, I
218 > did try it with all that stripflags logic disabled. For glibc, it /does/
219 > seem to slow things down, or did back with gcc-3.3 (IIRC) anyway. I tried
220 > the same glibc both ways. I would have tried tinkering further, but
221 > decided it wasn't worth complicating debugging and the like, since glibc
222 > is loaded by virtually everything, and I'd never be able to tell if it was
223 > my funny tweaks to glibc, or some actual issue with whatever package.
224 > Besides, that's an awfully costly package, in terms of recompile time, not
225 > to mention system stability, to be experimenting with. I /can/ say,
226 > however, that it didn't crash or cause any other issues I could see or
227 > attribute to it.
228 >
229 > OTOH, I haven't tried it with xorg-modular yet, but the monolithic xorg
230 > builds seemed to perform better with -Os. I tried one of them (6.8??)
231 > both ways too. I ended up routinely killing the stripflags logic, but I
232 > was modifying other portions of the ebuild as well (so it compiled only
233 > the ATI video driver, and only installed the 100-dpi fonts, not 75-dpi,
234 > among other things), so that was just one of several modifications I was
235 > making, tho the only real performance affecting one. Performance in X was
236 > better, but it DID take longer to switch to a VT, when I tried that. In
237 > fact, at one point, the switch to VT functionality broke, but someone
238 > mentioned it was broken in general at that point for certain drivers,
239 > anyway, so I'm not sure my optimizations had anything to do with it.
240 >
241 > >> Of course, this is theory, and the practical case can and will differ
242 > >> depending on the instructions actually being compiled. In particular,
243 > >> streaming media apps and media encoding/decoding are likely to still
244 > >> benefit from the traditional loop elimination style optimizations,
245 > >> because they run thru so much data already, that cache is routinely
246 > >> trashed anyway, regardless of the size of your instructions. As well,
247 > >> that type of application tends to have a LOT of looping instructions to
248 > >> optimize!
249 > >>
250 > >> By contrast, something like the kernel will benefit more than usual
251 > >> from size optimization. First, it's always memory locked and as such
252 > >> can't be swapped, and even "slow" main memory is still **MANY**
253 > >> **MANY** times faster than swap, so a smaller kernel means more other
254 > >> stuff fits into main memory with it, and isn't swapped as much. Second,
255 > >> parts of the
256 > >
257 > > Funny to hear this from somebody with 4 GB RAM in his system. I don't
258 > > know how bloated your kernel is, but even if -Os would reduce the size
259 > > of my kernel to **half**, which is totally impossible, it wouldn't
260 > > be enough to load the mail I am just answering into RAM. So, basically,
261 > > this reasoning is just ridiculous.
262 >
263 > I won't argue with that. BTW, still at a gig, much to my frustration! I
264 > put off upgrading memory when I decided my disk was in danger of going bad
265 > and I ended up deciding to go 4-disk SATA based RAID. Then I upgraded my
266 > stereo near Christmas... Now the CC is almost paid off again, so I'm
267 > looking at that memory upgrade again.
268 >
269 > Much to my frustration, memory prices don't seem to be dropping much
270 > lately!
271 >
272 > > You are referring a lot to the gcc manpage, but obviously you missed
273 > > this part:
274 > >
275 > > -fomit-frame-pointer
276 > >     Don't keep the frame pointer in a register for functions that
277 > >     don't need one. This avoids the instructions to save, set up
278 > >     and restore frame pointers; it also makes an extra register
279 > >     available in many functions. It also makes debugging
280 > >     impossible on some machines.
281 > >
282 > >     On some machines, such as the VAX, this flag has no effect,
283 > >     because the standard calling sequence automatically handles
284 > >     the frame pointer and nothing is saved by pretending it
285 > >     doesn't exist. The machine-description macro
286 > >     "FRAME_POINTER_REQUIRED" controls whether a target machine
287 > >     supports this flag.
288 > >
289 > >     Enabled at levels -O, -O2, -O3, -Os.
290 > >
291 > > I have to say that I am a bit disappointed now. You seemed to be one of
292 > > those people who actually inform themselves before sticking new flags
293 > > into their CFLAGS.
294 >
295 > ??
296 >
297 > I'm not sure which way you mean this. It was in my CFLAGS list, but I
298 > didn't discuss it as it's fairly common (from my observation, nearly as
299 > common as -pipe) and seems fairly non-controversial on Gentoo. Did you
300 > miss it in my CFLAGS and are saying I should be using it, or did you see
301 > it and are saying it's unnecessary and redundant because it's enabled by
302 > the -Os?
303 >
304 > If the latter, yes, but as mentioned above in the context of glibc, -Os is
305 > sometimes stripped. In that case, the redundancy of having the basic
306 > -fomit-frame-pointer is useful, unless it's also stripped, but as I said,
307 > it seems much less controversial than some flags and is often
308 > specifically allowed where most are stripped.
309 >
310 > Or, are you saying I should avoid it due to the debugging implications? I
311 > don't quite get it.
312 >
313 > >> !!! Relying on the shell to locate gcc, this may break
314 > >> !!! DISTCC, installing gcc-config and setting your current gcc
315 > >> !!! profile will fix this
316 > >>
317 > >> Another warning, likewise to stderr and thus not in the eis output.
318 > >> This one is due to the fact that eselect, the eventual systemwide
319 > >> replacement for gcc-config and a number of other commands, uses a
320 > >> different method to set the compiler than gcc-config did, and portage
321 > >> hasn't been adjusted to full compatibility just yet. Portage finds the
322 > >> proper gcc just fine for itself, but there'd be problems if distcc was
323 > >> involved, thus the warning.
324 > >
325 > > Didn't know about this. Have you filed a bug yet on the topic? Or is
326 > > there already one?
327 >
328 > There is one. I don't recall if I filed it or if it was already there,
329 > but both JH and the portage folks know about the issue. IIRC, the portage
330 > folks decided it was their side that needed changed, but that required
331 > changes to the distcc package, and I don't know how that has gone since I
332 > don't use distcc, except that I was slightly surprised to see the warning
333 > in portage 2.1 still.
334 >
335 > >> MAKEOPTS="-j4"
336 > >>
337 > >> The four jobs is nice for a dual-CPU system -- when it works.
338 > >> Unfortunately, the unpack and configure steps are serialized, so the
339 > >> jobs option does little good, there. To make most efficient use of the
340 > >> available cycles when I have a lot to merge, therefore, I'll run as
341 > >> many as five merges in parallel. I do this quite regularly with KDE
342 > >> upgrades like the one to 3.5.1, where I use the split KDE ebuilds and
343 > >> have something north of 100 packages to merge before KDE is fully
344 > >> upgraded.
345 > >
346 > > I really wonder how you would parallelize unpacking and configuring a
347 > > package.
348 >
349 > That's what was nice about configcache, which was supposed to be in the
350 > next portage, but I haven't seen or heard anything about it for awhile,
351 > and the next portage, 2.1, is what I'm using. configcache seriously
352 > shortened that stage of the build, leaving more of it parallelized, but...
353 >
354 > I was using it for awhile, patching successive versions of portage, but it
355 > broke about the time sandbox split, the dev said he wasn't maintaining the
356 > old version since it was going in the new portage, and I tried updating
357 > the patch but eventually ran into what I think were unrelated issues but
358 > decided to drop that in one of my troubleshooting steps and never picked
359 > it up again.
360 >
361 > I'd certainly like to have it back again, tho. If it's working in 2.1,
362 > I've not seen it documented or seen any hints in the emerge output, as
363 > were there before. You seen or heard anything?
364 >
365 > BTW, what is your opinion on -ftracer? Several devs I've noticed use it,
366 > but the manpage says it's not that useful without active profiling, which
367 > means compiling, profiling, and recompiling, AFAIK. It's possible the
368 > devs running it do that, but I doubt it, and otherwise, I don't see that
369 > it should be that useful? I don't know if you run it, but since I've got
370 > your attention, I thought I'd ask what you think about it. Is there
371 > something of significance I'm missing, or are they, or are they actually
372 > doing that compile/profile/recompile thing? It just doesn't make sense to
373 > me. I've seen it in several user posted CFLAGS as well, but I'll bet a
374 > good portion of them are simply because they saw it in a dev's CFLAGS and
375 > decided it looked useful, not because they understand any implications
376 > stated in the manpage. (Not that I always do either, but... <g>)
377 >
378 > --
379 > Duncan - List replies preferred. No HTML msgs.
380 > "Every nonfree program has a lord, a master --
381 > and if you use the program, he is your master." Richard Stallman in
382 > http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
383 --
384 gentoo-amd64@g.o mailing list
