Gentoo Archives: gentoo-amd64

From: "Sami Näätänen" <sn.ml@××××××××××××.fi>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Re: CFLAGS question from a AMD64 newbie
Date: Tue, 09 Dec 2008 20:34:43
Message-Id: 200812092234.39964.sn.ml@keijukammari.fi
In Reply to: [gentoo-amd64] Re: CFLAGS question from a AMD64 newbie by Duncan <1i5t5.duncan@cox.net>
On Tuesday 09 December 2008 18:07:38 Duncan wrote:
> Sami Näätänen <sn.ml@××××××××××××.fi> posted
> 200812091423.30562.sn.ml@××××××××××××.fi, excerpted below, on Tue, 09 Dec
> 2008 14:23:30 +0200:
> > My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> > with 4GB of memory. No overclocking etc. Want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> > (Sorry if this has been up lately, but I just switched to 64bit env
> > so...)
> >
> >
> > Here is mine and some explanation of why (And I use ~arch system with
> > gcc 4.3)
>
> Well, you say you want stable, but then say you use ~arch, so I see
> you're not too stick in the mud. =:^)

Well, stable binaries, as I said in my (at least a little) clarifying second
post. :)

> Here's mine, for a dual Opteron 290:
>
> CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
> -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
> -ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"
>
> CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
> -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
> -ftree-vectorize -fdirectives-only"
>
> You can look them up in the gcc manpage, or look back a year or so when I
> explained most of them, altho that was a couple gcc versions ago and they
> weren't quite the same.
>
> But my basic strategy is this: Because memory is so much slower than
> cache on a modern processor, in general it should pay to optimize for
> size even if it costs a few CPU cycles once in a while. Thus, until
> fairly recently I used -Os, but with gcc-4.3, decided to switch to -O2
> since gcc is getting smarter about such optimizations with -O2 now, and
> the few additional size optimizations with -Os now tend to be at the
> expense of cache (think -freorder-blocks-and-partition). In any case, I
> certainly don't want -O3 or too much loop unrolling and inlining, at the
> expense of cache.
>
> -frename-registers and -fweb are useful for taking advantage of the
> additional registers x86_64 has. -fdirectives-only is there because it
> works better with ccache, which I use. You know about -ftree-vectorize,
> and -combine is discussed elsewhere on-thread. -fmerge-all-constants
> isn't strictly C standard, but I've had absolutely zero issues with it,
> and it's going to help with cache. -freorder-blocks-and-partition won't
> work on most C++ code, thus (along with -combine) the reason I split
> CFLAGS and CXXFLAGS, but it tells gcc to keep hot code together so it
> stays in cache better. The various -fgcse-* options make gcc more
> aggressive about global common subexpression elimination (gcse) under
> various conditions. This shouldn't add to size and may in fact reduce
> size by reducing instruction count (or moving it out of loops, size
> neutral), but it can increase compile time, the reason a few of them are
> enabled at -O3 only, by default.
>
> -combine is the one that causes the most problems, handled per
> trouble-package as mentioned in the other thread using
> /etc/portage/env/* files. The -freorder-blocks-and-partition can in some
> cases as well, but if you don't have either of those in CXXFLAGS, you'll
> avoid a lot of the problem right there. Those are the only C(XX)FLAGS I
> have had issues with lately. The others have worked just fine.
>
> With quad-core you will likely be interested in upping your MAKEOPTS job
> count as well. Just be aware that it too can cause issues at times.
> Again, however, it's easily worked around per-package as you come across
> them using the env/* files to set MAKEOPTS=-j1 or whatever.

Yeah, I forgot to mention that too. I in fact like to use -j <num cores>, as
then there is no need for renicing in most cases and the system stays smooth.

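For anyone wanting to try the same thing, here is a rough sketch of how the
-j <num cores> value could be worked out. It assumes the coreutils `nproc`
tool is available, and the variable names `jobs` and `makeopts_line` are only
for illustration. As far as I know make.conf does not run commands, so the
script just prints the literal line to paste in.

```shell
#!/bin/sh
# Sketch only: derive a job count from the number of available cores.
# Assumes coreutils `nproc` is installed; falls back to 1 if it is not.
jobs="$(nproc 2>/dev/null || echo 1)"

# Build the literal line that would be pasted into the config file,
# e.g. MAKEOPTS="-j4" on a quad core.
makeopts_line="MAKEOPTS=\"-j${jobs}\""
echo "${makeopts_line}"
```

A package that breaks with parallel make can still be forced back to
MAKEOPTS=-j1 on its own, as Duncan describes above.
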
> Since you mentioned running ~arch, and assuming your PM is still portage,
> you may also want to take a look at emerge's --jobs and --load-average
> options, for parallel emerges, if you haven't already. If you use them
> you'll probably find --keep-going useful as well, so it doesn't stop just
> because one of the parallel merges failed.

Well, I've been a paludis man for quite a while now; it has much better
dependency handling.

> Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
> tmpfs. With 4 gig memory it should speed things up dramatically, and the
> worst-case is that it uses swap, sending to disk what would be 100%
> guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.

Yeah, I have
3GB tmpfs for /var/tmp/paludis and
1GB tmpfs for /tmp
to speed things up in normal operation. And as memory seems to be quite
cheap, I might upgrade to 8GB. After all, there is no such thing as too much
memory... (Actually there can be, but then one has the wrong HW to use that
memory ;) )
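
In case it helps, the two mounts above could be written as /etc/fstab entries
roughly like the ones this sketch prints. The helper name `tmpfs_fstab_line`
is made up for illustration, and mode or ownership options are left out on
the assumption that the tmpfs defaults are acceptable; adjust as needed for
your build user.

```shell
#!/bin/sh
# Sketch only: print /etc/fstab entries for size-capped tmpfs mounts.
# tmpfs_fstab_line is a hypothetical helper, not part of any package manager.
tmpfs_fstab_line() {
    # $1 = mount point, $2 = size cap as accepted by tmpfs size= (e.g. 3G)
    printf 'tmpfs\t%s\ttmpfs\tsize=%s\t0 0\n' "$1" "$2"
}

# The two mounts described above.
tmpfs_fstab_line /var/tmp/paludis 3G
tmpfs_fstab_line /tmp 1G
```

The size= value is only an upper bound; tmpfs uses memory as files are
written, and pages can still be pushed out to swap as Duncan notes.
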
