Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Old topic comes round again: -Ox optimizations.
Date: Thu, 03 Nov 2005 08:50:44
The topic of the usefulness of various -Ox optimizations comes up every so
often.  This will likely be old news to Gentoo old hands, but it
should be useful for newbies, at least.  Anyway, I found this commentary
by Linus interesting:

The thread wanders quite a bit, but this subthread started with someone
displaying a listing of their kernels ordered by size back thru 2.2.x,
noting that the size had increased from ~ half a MB back then to ~ 1.5 MB.

Somebody then asked (among other things) if the compiler used was the
same, listing the size of the /same/ kernel compiled with gcc-2.95.x
against the gcc-4.1.0-snapshot they are running.  The 4.1 compiled kernel
was MUCH larger.

Someone replied that the comparison wasn't fair: if you don't ask gcc to
optimize for size, don't blame it for not doing so.  He then compared
kernels compiled with 4.1 and the normal options against kernels compiled
with the kernel's "embedded" option enabled, answering the new questions
from "make oldconfig" appropriately after switching to "embedded".  Among
other things, he added -Os to the kernel's gcc command line, and he
probably dropped symbols as well.  He didn't say what else he chose from
the options, but the resulting core kernel (allnoconfig) ended up /very/
similar in size to the old 2.5 kernel he compared against (I believe the
first one to have allnoconfig as an option).

A few posts down, Linus then posted this comment, from the bottom of the
linked post above:


And we should probably make -Os the default. Apparently Fedora 
already does that by just forcibly hacking the Kconfig files.

With modern CPU's, instructions are almost "free". The real cost is in 
cache misses, and that tends to be doubly true of system software that 
tends to have a lot more cache misses than "normal" programs (because 
people try hard to batch up system calls like write etc, so by the time 
the kernel is called, the L1 cache is mostly flushed already - possibly 
the L2 too. And interrupts may be in the "fast path", but they'd sure as 
hell better not happen so often that they stay cached very well etc etc).

So -Os probably performs better in real life, and likely only performs 
worse on micro-benchmarks. Sadly, micro-benchmarks are often very 
instructive in many other ways.


(Dave Jones later confirms that Fedora does indeed normally run -Os.)
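
To put rough numbers on that argument, here is a toy cost model in C.  It is
only a sketch: the function name estimated_cycles, the instruction counts,
the miss counts and the 200-cycle miss penalty are all made up for
illustration, not taken from the thread or from any measurement.  It simply
charges a fixed cost per instruction plus a fixed penalty per
instruction-cache miss.

```c
#include <stdio.h>

/* Toy cost model: every instruction costs base_cpi cycles, and every
 * instruction-cache miss adds miss_penalty cycles on top.  All inputs
 * are hypothetical illustration values, not measurements. */
static unsigned long estimated_cycles(unsigned long instructions,
                                      unsigned long base_cpi,
                                      unsigned long icache_misses,
                                      unsigned long miss_penalty)
{
    return instructions * base_cpi + icache_misses * miss_penalty;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Hypothetical speed-optimized path: fewer instructions, more misses. */
    unsigned long fast = estimated_cycles(1000, 1, 20, 200);
    /* Hypothetical size-optimized path: more instructions, fewer misses. */
    unsigned long small = estimated_cycles(1100, 1, 5, 200);

    printf("speed-optimized: %lu cycles\n", fast);
    printf("size-optimized:  %lu cycles\n", small);
    return 0;
}
#endif
```

Built with -DDEMO_MAIN, this prints 5000 cycles for the first case and 2100
for the second.  With these invented inputs the larger instruction count is
swamped by the saved misses, which is the shape of the argument above.  Real
code would have to be measured.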

(If you want to view the entire thread, use the "Go to the topic" link
near the top left of the page.)

So... certainly for the kernel, and probably for glibc as well (tho I
believe Gentoo strips -Os from glibc compiles unless you hack that portion
out of the ebuild, in your own overlay or whatever), -Os is likely to be
the best choice.  For most of userland, -Os may still be best, but by a
smaller margin: performance should be similar to -O2, which trades some
size for moderate speed optimizations in what amounts to a wash.  The
exceptions would likely be media encoders and the like, where the working
set is large and the data is streamed, and there -O3 may make sense.  In
the general case, however, -O3 likely does NOT make sense, because its
cost in size almost always far outweighs the speed it gains over -O2.
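
Since flags can get filtered along the way (as with glibc above), it can be
handy to check what a given compile actually used.  The following is a
minimal sketch, relying on gcc's predefined macros __OPTIMIZE__ (set
whenever gcc optimizes) and __OPTIMIZE_SIZE__ (set in addition for -Os).
The function name opt_mode and the DEMO_MAIN guard are my own, for
illustration only.

```c
#include <stdio.h>

/* Report which optimization mode the compiler says this file was built
 * with.  GCC predefines __OPTIMIZE__ whenever it optimizes, and
 * __OPTIMIZE_SIZE__ in addition when optimizing for size (-Os). */
static const char *opt_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("optimized for: %s\n", opt_mode());
    return 0;
}
#endif
```

Saved as, say, mode.c (a hypothetical file name), it could be built once
with "gcc -Os -DDEMO_MAIN mode.c -o mode-Os" and once with
"gcc -O2 -DDEMO_MAIN mode.c -o mode-O2", and the two results compared with
the binutils "size" tool.  A program this small will show next to no
difference; the same comparison on real code is where -Os, -O2 and -O3
start to diverge.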

IMO (and for the kernel anyway, Linus's as well)...

Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
