Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Re: Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite
Date: Thu, 09 Feb 2006 00:20:28
Message-Id: pan.2006.02.09.00.17.14.495666@cox.net
In Reply to: Re: [gentoo-amd64] Re: Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite by Simon Stelling
Simon Stelling posted <43EA568D.6020307@g.o>, excerpted below, on
Wed, 08 Feb 2006 21:37:33 +0100:

> Duncan wrote:

>> I should really create a page listing all the little Gentoo admin scripts
>> I've come up with and how I use them. I'm sure a few folks anyway would
>> likely find them useful.
>>
>> The idea behind most of them is to create shortcuts to having to type in
>> long emerge lines, with all sorts of arbitrary command line parameters.
>> The majority of these fall into two categories, ea* and ep*, short for
>> emerge --ask <additional parameters> and emerge --pretend ... . Thus, I
>> have epworld and eaworld, the pretend and ask versions of emerge -NuDv
>> world, epsys and easys, the same for system, eplog <package>, emerge
>> --pretend --log --verbose (package name to be added to the command line so
>> eplog gcc, for instance, to see the changes between my current and the new
>> version of gcc), eptree <package>, to use the tree output, etc.
>
> Interesting. But why do you use scripts and not simple aliases? Every time you
> launch your script the HD performs a seek (which is very expensive in time),
> copies the script into memory and then forks a whole bash process to execute a
> one-liner. Using alias, which is a bash built-in, wouldn't fork a process and
> therefore be much faster.

My thinking, which is possibly incorrect (your input appreciated), is that
file-based scripts get pulled into cache the first time they are executed,
and will remain there (with a gig of memory) pretty much until I'm done
doing my upgrades. At the same time, they are simply in cache, not
something in bash's memory, so if the memory is needed, it will be
reclaimed. As well, after I'm done and on to other tasks, the cached
commands will eventually be replaced by other data, if need be.

Aliases (and bash functions) are held in memory. That's not as flexible
as cache in terms of being knocked out of memory if the memory is needed
by other things. Sure, that memory may be flushed to disk-based swap, but
that's disk-based the same as the actual script files I'm using, so
reading it back into main memory if it's faulted out will take something
comparable to the time it'd take to read in the script file again anyway.
That's little gain, with the additional overhead and therefore loss of
having to manage the temp-copy in swapped memory, if it comes to that.
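
For concreteness, here's roughly what one of those wrappers looks like as
a standalone script versus the alias form you suggest. The flags are the
-NuDv set spelled out long-form; the file name and location are just
illustrative, not my exact setup:

    #!/bin/bash
    # ~/bin/eaworld (illustrative): ask-mode world update
    exec emerge --ask --newuse --update --deep --verbose world

    # the alias equivalent, e.g. in ~/.bashrc:
    alias eaworld='emerge --ask --newuse --update --deep --verbose world'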

Actually, there are some details here that may affect things. I don't
know enough about the following factors to be able to evaluate how they
balance out, but the real reason I chose individual scripts is below.

One, here anyway, tho not on most systems, I'm running four SATA disks in
RAID. The swap is actually not on the RAID, as the kernel manages it like
RAID on its own, provided all four swap areas are set to the same priority
(they are), which means swap is running on the equivalent of
four-way-striped RAID-0. Meanwhile, the scripts, as part of my main
system, are on RAID-6 for redundancy, so with the same four disks backing
the RAID-6 as the swap, I've only effectively two-way-striped storage
there, the other two disk stripes being parity. Thus, retrieval from the
4-way-striped swap should in theory be more efficient than from the
2-way-striped regular storage. OTOH, the granularity of the stripe
in either case, against the size of the one- or two-line script, likely
means that it'll be pulled from a single stripe (at the speed of
reading from a single disk, tho there are parallelizing opportunities
not available on a single disk). It's also likely that the swap will be
more optimally managed for fast retrieval than the location on the regular
filesystem is. Balanced against that we have the overhead of maintaining
the swap tracking.
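
For reference, the equal-priority bit is just a pri= option on each swap
line in /etc/fstab; the device names below are illustrative, not my
actual partitions:

    /dev/sda2   none   swap   sw,pri=1   0 0
    /dev/sdb2   none   swap   sw,pri=1   0 0
    /dev/sdc2   none   swap   sw,pri=1   0 0
    /dev/sdd2   none   swap   sw,pri=1   0 0

With the priorities equal, the kernel round-robins pages across all four
areas, which is what gives the RAID-0-like striping effect.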

That's assuming it would swap that out to the dedicated swap in the first
place. I'm not familiar with Linux's VM, but given that the aliases and
functions would be file-based in either case, it's possible it would
simply drop the data from main memory, relying on the fact that the
data is clean file-backed data and could be read in directly from the
files again, if necessary, rather than bothering with actually creating a
temporary copy of the /same/ data in swap, taking time to do so when it
could just read it back in from the file.

Another aspect is the effect of data vs. metadata caching. Again, I'm not
familiar with how Linux manages this, and indeed, it may differ between
filesystems, but the idea is that if the file metadata is still cached,
even if the file itself isn't, getting the data back in is a single disk
seek and read. If the location metadata has been flushed as well (or on
the initial read), it takes multiple seeks and reads instead, walking the
logical directory structure and fetching each directory table in the
hierarchy until the entry that actually holds the file's location is
reached, before the file itself can be read. (Back several years ago on
MSWormOS, one of the first things I always did after a reinstall was set
the system to the server profile, which kept a far larger metadata cache,
on the theory that the metadata was usually smaller than the data and,
for dirs, sharable among many data files, so I'd rather spend cache
memory on metadata than data. The other choices were the default desktop
profile, and a laptop profile with a much smaller metadata cache. I
originally learned about these as a result of reading about a bug in the
original 95 as shipped, which swapped some entries in the registry and
therefore cached FAR less metadata than it should have. I don't know
where these tweaks are located on Linux, or how to go about adjusting
them safely.)
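
If the knob is what I think it is, it's vm.vfs_cache_pressure, which
biases reclaim between the dentry/inode (metadata) caches and the page
cache. Something like the following, tho I haven't experimented with it
myself, so take the value as an example only:

    # lower than the default of 100 = keep metadata caches around longer
    echo 50 > /proc/sys/vm/vfs_cache_pressure
    # or persistently, in /etc/sysctl.conf:
    vm.vfs_cache_pressure = 50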

Basically, therefore, I don't believe aliases to be a big positive, and
possibly somewhat of a negative, as opposed to scripts, because the
scripts will be cached in most cases after initial use anyway, yet they
have the advantage of not having to be maintained or tracked in memory
when I'm doing other tasks and the system needs that cache.

Given that I don't believe it's a big positive, I prefer the
administrative convenience and maintainability of separate scripts.

There /is/ a third alternative, one I came across recently, that I think
is a good idea. If you'd comment, perhaps it would help me sort out the
implications.

The idea, simply put, is "bash command theming": single scripts that can
be invoked to "theme" a command prompt for the tasks at hand. I didn't
read the entire article I saw covering this, but skimmed it enough to get
the gist. A single invokable script for each set of tasks, say perl
programming, bash programming, working with portage, etc., that would
set up a specific set of aliases and functions for that task. Invoking
the script with the "off" parameter would erase that set of aliases and
bash functions, thereby recovering the memory, and do any related cleanup
like resetting the path if necessary to exclude any task-specific
commands. Taking this a step further, a variable could be set up listing
the theme or themes currently active, which the theme-setup script could
read so it automatically deactivates the previous theme while switching
to the new one. One could even share functionality between themes,
sourcing common files that check which theme is active and adjust their
behavior accordingly.
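
As a rough sketch of the idea (the name porttheme and the specific
aliases are made up for illustration; note that such a script has to be
sourced, or written as a shell function, so the aliases land in the
current shell rather than a throwaway subshell):

    # sourced as:  . porttheme on    or    . porttheme off
    case "$1" in
        on)
            alias eaworld='emerge --ask -NuDv world'
            alias epworld='emerge --pretend -NuDv world'
            ACTIVE_THEME=portage
            ;;
        off)
            unalias eaworld epworld 2>/dev/null
            unset ACTIVE_THEME
            ;;
    esac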

This alias and function theming wouldn't be quite as modular (tho with
sourcing it could be) as the individual scripts, but would maintain the
performance advantages (if any) of the alias/function idea, while at the
same time allowing the memory reclamation of the cached-script option. It
sounds really good, but I'm not yet convinced the benefits would be worth
the additional effort of setting up those themes, since the solution I
have works.

One VERY NICE benefit of the themes idea is that it would directly
address any namespace pollution concerns. It has a direct appeal to
programmers and anyone else who's ever had to deal with such issues, for
that reason alone. One single command on the path to invoke the theme,
possibly even an eselect-like command shared among themes, with
everything else off-path and out of the namespace unless that theme is
invoked! /VERY/ appealing indeed. OTOH, there are those who'll never
remember the theme they have active at the moment, and be constantly
confused. For these folks, it'd be a nightmare!

> man emerge:
> --oneshot (-1)
>
> IIRC --oneshot has a short form since 2.0.52 was released.

Learn new things every day. Thanks! I remember how pleased I was to have
--newuse, and even more so when I discovered -N, so very nice!
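
For the record, then, the long and short spellings side by side (libfoo
is just a stand-in package name):

    emerge --oneshot --ask libfoo
    emerge -1a libfoo            # same thing, short form
    emerge -avNuD world          # -N being the --newuse short form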

>> ... Deep breath... <g>
>>
>> All that as a preliminary explanation to this: Along with the above, I
>> have a set of efetch functions, that invoke the -f form, so just do the
>> fetch, not the actual compile and merge, and esyn (there's already an
>> esync function in something or other I have merged so I just call it
>> esyn), which does emerge sync, then updates the esearch db, then
>> automatically fetches all the packages that an eaworld would want to
>> update, so they are ready for me to merge at my leisure.
>
> I'm a bit confused now. You use *functions* to do that? Or do you mean
> scripts? By the way: with alias you could name your custom "script"
> esync because it doesn't place a file on the harddisk.

Scripts. I was using "functions" in the generic sense here. I did
realize before I sent it that the word had a dual meaning, but figured it
wasn't important enough a distinction to go back and correct, or explain.
Unfortunately, every time I decide to skip something like that, I get
called on it, which doesn't help my posts get any shorter! =8^)
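
To spell it out, the esyn script boils down to roughly the following
sequence (a sketch of what I described above, not the literal file;
eupdatedb is the esearch database updater):

    #!/bin/bash
    # sync the tree, refresh the esearch index, then pre-fetch
    # everything an eaworld run would want to update
    emerge --sync && \
    eupdatedb && \
    emerge --fetchonly --newuse --update --deep world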

>> I choose -Os, optimize for size, because a modern CPU and the various
>> cache levels are FAR faster than main memory.
>
> Given the fact that two CPUs, only differing in L2 Cache size, have
> nearly the same performance, I doubt that the performance increase is
> very big. Some interesting figures:
>
> Athlon64 something (forgot what, but shouldn't matter anyway) with 1 MB
> L2-cache is 4% faster than an Athlon64 of the same frequency but with only 512kB
> L2-cache. The bigger the cache sizes you compare get, the smaller the
> performance increase. Since you run a dual Opteron system with 1 MB L2
> cache per CPU I tend to say that the actual performance increase you
> experience is about 3%. But then I didn't take into account that -Os
> leaves out a few optimizations which would be included by -O2, the
> default optimization level, which actually makes the code a bit slower
> when compared to -O2. So, the performance increase you really experience
> shrinks to about 0-2%. I'd tend to proclaim that -O2 is even faster for
> most of the code, but that's only my feeling.

Interesting, indeed. I'd counter that it likely has to do with how many
tasks are being juggled as well, plus the number of kernel/user context
switches, of course. I wonder under what load, and with what task-type,
the above 4% difference was measured.

Of course, the definitive way to end the argument would be to do some
profiling and get some hard numbers, but I don't think either you or I
consider it an important enough factor in our lives to go to /that/ sort
of trouble. <g>

> Beside that I should mention that -Os sometimes still has problems with
> huge packages like glibc.

Interestingly enough, while Gentoo's glibc ebuilds strip the flags down
to -O2, I did try it with all that stripflags logic disabled. For glibc,
it /does/ seem to slow things down, or did back with gcc-3.3 (IIRC)
anyway. I tried the same glibc both ways. I would have tried tinkering
further, but decided it wasn't worth complicating debugging and the like,
since glibc is loaded by virtually everything, and I'd never be able to
tell if it was my funny tweaks to glibc, or some actual issue with
whatever package. Besides, that's an awfully costly package, in terms of
recompile time, not to mention system stability, to be experimenting
with. I /can/ say, however, that it didn't crash or cause any other
issues I could see or attribute to it.

OTOH, I haven't tried it with xorg-modular yet, but the monolithic xorg
builds seemed to perform better with -Os. I tried one of them (6.8??)
both ways too. I ended up routinely killing the stripflags logic, but I
was modifying other portions of the ebuild as well (so it compiled only
the ATI video driver, and only installed the 100-dpi fonts, not 75-dpi,
among other things), so that was just one of several modifications I was
making, tho the only real performance-affecting one. Performance in X
was better, but it DID take longer to switch to a VT, when I tried that.
In fact, at one point, the switch-to-VT functionality broke, but someone
mentioned it was broken in general at that point for certain drivers
anyway, so I'm not sure my optimizations had anything to do with it.

>> Of course, this is theory, and the practical case can and will differ
>> depending on the instructions actually being compiled. In particular,
>> streaming media apps and media encoding/decoding are likely to still
>> benefit from the traditional loop elimination style optimizations,
>> because they run thru so much data already, that cache is routinely
>> trashed anyway, regardless of the size of your instructions. As well,
>> that type of application tends to have a LOT of looping instructions to
>> optimize!
>>
>> By contrast, something like the kernel will benefit more than usual
>> from size optimization. First, it's always memory locked and as such
>> can't be swapped, and even "slow" main memory is still **MANY**
>> **MANY** times faster than swap, so a smaller kernel means more other
>> stuff fits into main memory with it, and isn't swapped as much. Second,
>> parts of the
>
> Funny to hear this from somebody with 4 GB RAM in his system. I don't
> know how bloated your kernel is, but even if -Os would reduce the size
> of my kernel to **the half**, which is totally impossible, it wouldn't
> be enough to load the mail I am just answering into RAM. So, basically,
> this reasoning is just ridiculous.

I won't argue with that. BTW, still at a gig, much to my frustration! I
put off upgrading memory when I decided my disk was in danger of going bad
and I ended up deciding to go 4-disk SATA based RAID. Then I upgraded my
stereo near Christmas... Now the CC is almost paid off again, so I'm
looking at that memory upgrade again.

Much to my frustration, memory prices don't seem to be dropping much
lately!

> You are referring a lot to the gcc manpage, but obviously you missed
> this part:
>
> -fomit-frame-pointer
> Don't keep the frame pointer in a register for functions that
> don't need one. This avoids the instructions to save, set up
> and restore frame pointers; it also makes an extra register
> available in many functions. It also makes debugging
> impossible on some machines.
>
> On some machines, such as the VAX, this flag has no effect,
> because the standard calling sequence automatically handles
> the frame pointer and nothing is saved by pretending it
> doesn't exist. The machine-description macro
> "FRAME_POINTER_REQUIRED" controls whether a target machine
> supports this flag.
>
> Enabled at levels -O, -O2, -O3, -Os.
>
> I have to say that I am a bit disappointed now. You seemed to be one of
> those people who actually inform themselves before sticking new flags
> into their CFLAGS.

??

I'm not sure which way you mean this. It was in my CFLAGS list, but I
didn't discuss it as it's fairly common (from my observation, nearly as
common as -pipe) and seems fairly non-controversial on Gentoo. Did you
miss it in my CFLAGS and are saying I should be using it, or did you see
it and are saying it's unnecessary and redundant because it's enabled by
-Os?

If the latter, yes, but as mentioned above in the context of glibc, -Os is
sometimes stripped. In that case, the redundancy of having the basic
-fomit-frame-pointer is useful, unless it's also stripped, but as I said,
it seems much less controversial than some flags and is often
specifically allowed where most are stripped.
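
In other words, the make.conf line looks something like this (the -march
value is just illustrative for this box; the point is that
-fomit-frame-pointer stays listed even tho -Os already implies it, as
insurance for ebuilds that strip -Os back out):

    # /etc/make.conf (illustrative, not my exact flags)
    CFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer"
    CXXFLAGS="${CFLAGS}"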

Or, are you saying I should avoid it due to the debugging implications? I
don't quite get it.

>> !!! Relying on the shell to locate gcc, this may break
>> !!! DISTCC, installing gcc-config and setting your current gcc
>> !!! profile will fix this
>>
>> Another warning, likewise to stderr and thus not in the eis output.
>> This one is due to the fact that eselect, the eventual systemwide
>> replacement for gcc-config and a number of other commands, uses a
>> different method to set the compiler than gcc-config did, and portage
>> hasn't been adjusted to full compatibility just yet. Portage finds the
>> proper gcc just fine for itself, but there'd be problems if distcc was
>> involved, thus the warning.
>
> Didn't know about this. Have you filed a bug yet on the topic? Or is
> there already one?

There is one. I don't recall if I filed it or if it was already there,
but both JH and the portage folks know about the issue. IIRC, the portage
folks decided it was their side that needed to be changed, but that
required changes to the distcc package, and I don't know how that has
gone since I don't use distcc, except that I was slightly surprised to
see the warning in portage 2.1 still.
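
For anyone actually hitting that warning, the gcc-config fix it suggests
amounts to the following (the profile name here is only an example;
gcc-config -l shows what's really installed):

    gcc-config -l                           # list installed profiles
    gcc-config x86_64-pc-linux-gnu-3.4.5    # select one (example name)
    source /etc/profile                     # pick up the new environment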

>> MAKEOPTS="-j4"
>>
>> The four jobs is nice for a dual-CPU system -- when it works.
>> Unfortunately, the unpack and configure steps are serialized, so the
>> jobs option does little good, there. To make most efficient use of the
>> available cycles when I have a lot to merge, therefore, I'll run as
>> many as five merges in parallel. I do this quite regularly with KDE
>> upgrades like the one to 3.5.1, where I use the split KDE ebuilds and
>> have something north of 100 packages to merge before KDE is fully
>> upgraded.
>
> I really wonder how you would parallelize unpacking and configuring a
> package.

That's what was nice about configcache, which was supposed to be in the
next portage, but I haven't seen or heard anything about it for a while,
and the next portage, 2.1, is what I'm using. configcache seriously
shortened that stage of the build, leaving more of it parallelized, but...

I was using it for a while, patching successive versions of portage, but
it broke about the time sandbox was split out. The dev said he wasn't
maintaining the old version since it was going into the new portage, and
I tried updating the patch myself, but eventually ran into what I think
were unrelated issues, dropped it as one of my troubleshooting steps, and
never picked it up again.

I'd certainly like to have it back again, tho. If it's working in 2.1,
I've not seen it documented or seen any hints in the emerge output, as
there were before. You seen or heard anything?
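
As for the parallel merges themselves, there's nothing fancy to them:
pre-fetch everything, then give each of several terminals its own slice
of the pretend output. A rough illustration only, not my actual
procedure, with placeholder package names:

    # fetch everything up front so the parallel jobs don't fight over
    # the network
    emerge --fetchonly --newuse --update --deep world

    # terminal 1
    emerge --oneshot kde-base/konsole kde-base/kate
    # terminal 2, and so on
    emerge --oneshot kde-base/kwin kde-base/kicker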

BTW, what is your opinion on -ftracer? I've noticed several devs use it,
but the manpage says it's not that useful without active profiling, which
means compiling, profiling, and recompiling, AFAIK. It's possible the
devs running it do that, but I doubt it, and otherwise, I don't see that
it should be that useful. I don't know if you run it, but since I've got
your attention, I thought I'd ask what you think about it. Is there
something of significance I'm missing, or are they actually doing that
compile/profile/recompile thing? It just doesn't make sense to me. I've
seen it in several user-posted CFLAGS as well, but I'll bet a good
portion of them are simply there because they saw it in a dev's CFLAGS
and decided it looked useful, not because they understood any
implications stated in the manpage. (Not that I always do either,
but... <g>)
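
For reference, the compile/profile/recompile cycle the manpage alludes to
looks roughly like this with the GCC 4.x option names (the 3.x series
used -fprofile-arcs / -fbranch-probabilities instead); the source file
and workload are obviously just placeholders:

    gcc -O2 -fprofile-generate -o myapp myapp.c      # instrumented build
    ./myapp < typical-input.dat                      # representative run
    gcc -O2 -fprofile-use -ftracer -o myapp myapp.c  # rebuild with profile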

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html


--
gentoo-amd64@g.o mailing list
