Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Problems configuring gentoo
Date: Sat, 04 Oct 2008 15:30:25
Message-Id: pan.2008.10.04.15.30.13@cox.net
In Reply to: Re: [gentoo-amd64] Problems configuring gentoo by Richard Freeman
1 Richard Freeman <rich0@g.o> posted 48E759DB.6080905@g.o,
2 excerpted below, on Sat, 04 Oct 2008 07:56:11 -0400:
3
4 > Ben de Groot wrote:
5 >>
6 >> -Os optimizes for size, while -O2 optimizes for speed. There is no need
7 >> at all to use -Os on a modern desktop machine, and it will run
8 >> comparatively slower than -O2 optimized code, which is probably not
9 >> what you want.
10 >>
11 >>
12 > There are a couple of schools of thought on that, and I think
13 > performance can depend a great deal on what program you're talking
14 > about.
15 >
16 > On any machine, memory is a limited resource. Oh sure, you could just
17 > "spend a little more on decent RAM", but you could also "spend a little
18 > more on a decent CPU" or whatever. For a given amount of money you can
19 > only buy so much hardware, so any dollar spent on RAM is a dollar not
20 > spent on something else.
21
22 I agree, but stress the limits of the L1/L2(/L3?) caches more than
23 general system memory. You can generally buy more system memory fairly
24 easily, but Lx cache sizes aren't so easy to change, and are likely to
25 remain quite limited resources for some time.
26
27 So I generally see the benefits of -Os over -O2, with a few exceptions.
28 -freorder-blocks-and-partition (only in CFLAGS, it doesn't work on
29 C++/CXXFLAGS) can increase the raw size but manages cache better because it
30 separates code into hot and cold blocks, giving the hot code a better
31 chance at staying in-cache (according to the gcc manpage). There are a
32 couple of other similar flags.
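
For illustration, here is a minimal make.conf-style sketch of keeping that
flag out of the C++ flags. The COMMON_FLAGS helper variable is only an
illustrative name, and -Os is simply the level discussed above, not a
recommendation:

```shell
# make.conf-style fragment (sketch; COMMON_FLAGS is an illustrative helper name)
COMMON_FLAGS="-Os"

# C code gets the hot/cold block partitioning ...
CFLAGS="${COMMON_FLAGS} -freorder-blocks-and-partition"

# ... while C++ does not, since the post reports it doesn't work there.
CXXFLAGS="${COMMON_FLAGS}"
```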
33
34 Of course, to /really/ get performance, one would need to compile with
35 code profiling instrumentation turned on, run the program as you would
36 normally (but profiled) for a while to generate some profiling history,
37 then recompile using that history to help optimize things. This BTW is
38 one of the reasons I wonder about -ftracer when I see it in someone's
39 CFLAGS. The gcc manpage says it helps other optimization, but then links
40 it to -fprofile-use. How much help it is without the profiling isn't
41 covered, but given the increase in size and the effect of that on caches,
42 it's likely not worth it without the profiling. How many people compile
43 first for profiling, run the program to generate profiles, then recompile
44 using the profile data? Right, not so many, at least for most apps. In
45 that case, why do they have -ftracer in their general CFLAGS?
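
The compile, run, recompile cycle described above could look roughly like
the following sketch. It uses gcc's -fprofile-generate and -fprofile-use
options; the pgo_build and run helpers, the DRY_RUN switch, and the file
names are hypothetical and only there to make the three steps explicit:

```shell
# Sketch of a profile-guided build; helper and file names are hypothetical.
CC="${CC:-gcc}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pgo_build() {
    src="$1"
    out="$2"
    # 1. Build with instrumentation; running the binary records profile data.
    run "$CC" -O2 -fprofile-generate "$src" -o "$out" || return 1
    # 2. Exercise the program the way it is normally used.
    run "./$out" || return 1
    # 3. Rebuild so gcc can optimize using the recorded profile.
    run "$CC" -O2 -fprofile-use "$src" -o "$out"
}
```

Something like `DRY_RUN=1 pgo_build hello.c hello` would just print the
three commands, which is a cheap way to check the sequence before running
it for real.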
46
47 That said, I recently switched to -O2 from my long time -Os. Much of the
48 difference in gcc-3 was due to -funit-at-a-time and similar
49 optimizations, enabled by default early on for -Os, but not for -O2 until
50 gcc-4.something, I believe. Modern gcc is more cache-usage-performance
51 aware than gcc-3 was, and I think most of the remaining differences are
52 like the -freorder-blocks-and-partition thing, they affect CPU cache
53 usage negatively enough that you don't want them enabled except for old
54 machines, embedded, and perhaps the now popular netbook/atom type
55 applications.
56
57 Speaking of which... I just got my Acer Aspire One (32-bit Atom N270
58 CPU), and intend to do a 32-bit chroot on my main machine and create
59 binpkgs to merge to the AA1. Any idea what sort of CFLAGS to use on it?
60 I know it doesn't have all that fancy branch prediction and prefetch
61 stuff of a normal modern x86_(32/64) CPU. One suggestion I've seen is
62 -march=i686, and I'll probably do -Os for it, but what about stuff like
63 -fweb -frename-registers, etc? It does support up through SSE3 at least,
64 so I can enable that too.
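
As a starting point only, the settings mentioned above might be written
into the 32-bit chroot's make.conf roughly like this. The CHOST value and
FEATURES="buildpkg" (to produce the binpkgs) are assumptions about the
setup, and -fweb/-frename-registers are left out because they are still
the open question:

```shell
# Hypothetical make.conf fragment for the 32-bit Atom chroot (a sketch, not tested on the AA1)
CHOST="i686-pc-linux-gnu"        # assumed 32-bit x86 target triplet

# -Os and -march=i686 as discussed above; -msse3 enables SSE through SSE3.
CFLAGS="-Os -march=i686 -msse3"
CXXFLAGS="${CFLAGS}"

# Assumption: have Portage build binary packages for merging on the netbook.
FEATURES="buildpkg"
```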
65
66 --
67 Duncan - List replies preferred. No HTML msgs.
68 "Every nonfree program has a lord, a master --
69 and if you use the program, he is your master." Richard Stallman