Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: CFLAGS question from a AMD64 newbie
Date: Tue, 09 Dec 2008 16:07:53
Message-Id: pan.2008.12.09.16.07.38@cox.net
In Reply to: [gentoo-amd64] CFLAGS question from a AMD64 newbie by "Sami Näätänen"
Sami Näätänen <sn.ml@××××××××××××.fi> posted
200812091423.30562.sn.ml@××××××××××××.fi, excerpted below, on Tue, 09 Dec
2008 14:23:30 +0200:

> My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> with a 4GB of memory. No overclocking etc. Want this to be stable. :)
>
> I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> (Sorry if this has been up lately, but I just switched to 64bit env
> so...)
>
>
> Here is mine and some explanation of why (And I use ~arch system with
> gcc 4.3)

Well, you say you want stable, but then say you use ~arch, so I see
you're not too stick in the mud. =:^)

Here's mine, for a dual Opteron 290:

CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"

CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only"

You can look them up in the gcc manpage, or look back a year or so in the
list archives, when I explained most of them, although that was a couple of
gcc versions ago and the options weren't quite the same.

But my basic strategy is this: Because memory is so much slower than
cache on a modern processor, in general it should pay to optimize for
size even if it costs a few CPU cycles once in a while. Thus, until
fairly recently I used -Os, but with gcc-4.3 I decided to switch to -O2,
since gcc is getting smarter about such optimizations with -O2 now, and
the few additional size optimizations with -Os now tend to be at the
expense of cache (think -freorder-blocks-and-partition). In any case, I
certainly don't want -O3 or too much loop unrolling and inlining, at the
expense of cache.
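
One way to see which optimizations actually differ between -O2 and -Os on a given compiler is to ask gcc itself. This is a minimal sketch, assuming gcc 4.3 or later (where -Q --help=optimizers is available); the helper name opt_diff is made up for illustration. The result depends on the installed gcc version, so no output is shown.

```shell
#!/bin/sh
# opt_diff LEVEL_A LEVEL_B: print the optimizer flags whose enabled/disabled
# state differs between two -O levels. opt_diff is an illustrative name.
opt_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    gcc -Q --help=optimizers "$1" > "$a" 2>/dev/null
    gcc -Q --help=optimizers "$2" > "$b" 2>/dev/null
    # diff exits 1 when the files differ; that is the expected case here.
    diff "$a" "$b"
    rm -f "$a" "$b"
    return 0
}

# Only run the comparison when gcc is actually installed.
if command -v gcc >/dev/null 2>&1; then
    opt_diff -O2 -Os
fi
```

Lines starting with "<" show the state at the first level, lines starting with ">" the state at the second.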

-frename-registers and -fweb are useful for taking advantage of the
additional registers x86_64 has. -fdirectives-only is there because it
works better with ccache, which I use. You know about -ftree-vectorize,
and -combine is discussed elsewhere on-thread. -fmerge-all-constants
isn't strictly C standard, but I've had absolutely zero issues with it,
and it's going to help with cache. -freorder-blocks-and-partition tells
gcc to keep hot code together so it stays in cache better, but it won't
work on most C++ code, which (along with -combine) is the reason I split
CFLAGS and CXXFLAGS. The various -fgcse-* options make gcc more
aggressive about global common subexpression elimination (gcse) under
various conditions. This shouldn't add to size and may in fact reduce
size by reducing instruction count (or moving instructions out of loops,
size neutral), but it can increase compile time, which is why gcc leaves
them off by default or, in the case of -fgcse-after-reload, enables them
only at -O3.
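
The CFLAGS/CXXFLAGS split can also be written with a shared variable, so the common part lives in one place. This is only a sketch: the variable name COMMON_FLAGS is an assumption and not part of the setup quoted above, and the flag list is simply copied from this post (gcc 4.3 era).

```shell
# make.conf-style fragment. COMMON_FLAGS is a convenience name chosen for
# this example; the flags themselves are the ones listed in the post.
COMMON_FLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only"

# C only: these two flags are the ones kept out of C++ builds.
CFLAGS="${COMMON_FLAGS} -freorder-blocks-and-partition -combine"
CXXFLAGS="${COMMON_FLAGS}"
```

The point of the layout is that a flag added to COMMON_FLAGS reaches both languages, while the C-only flags can never leak into CXXFLAGS.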

-combine is the one that causes the most problems, handled per
trouble-package as mentioned in the other thread using
/etc/portage/env/* files. -freorder-blocks-and-partition can cause
problems in some cases as well, but if you don't have either of those in
CXXFLAGS, you'll avoid a lot of the problem right there. Those are the
only C(XX)FLAGS I have had issues with lately. The others have worked
just fine.
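
A rough sketch of what such a per-package override might contain. Everything beyond the flag names is an assumption: the strip_flag helper is hypothetical, and the sketch assumes the env file is sourced by bash with the global CFLAGS already set. On setups where env files are parsed like make.conf rather than sourced by a shell, the reduced flag list would have to be written out literally instead.

```shell
# strip_flag FLAG "FLAG LIST": print the list with FLAG removed.
# Hypothetical helper, not something portage provides.
strip_flag() {
    out=""
    for f in $2; do
        [ "$f" = "$1" ] || out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

# Stand-in for the global value that would normally come from make.conf.
CFLAGS="-march=opteron-sse3 -pipe -O2 -freorder-blocks-and-partition -combine"

# Drop the two usual troublemakers for this one package only.
CFLAGS=$(strip_flag -combine "$CFLAGS")
CFLAGS=$(strip_flag -freorder-blocks-and-partition "$CFLAGS")
echo "$CFLAGS"
```

Filtering the existing value, rather than retyping the whole list, means the override keeps following the global flags when those change.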

With quad-core you will likely be interested in upping your MAKEOPTS job
count as well. Just be aware that it too can cause issues at times.
Again, however, those are easily worked around per-package as you come
across them, using the env/* files to set MAKEOPTS=-j1 or whatever.
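
As a starting point for the job count, a common rule of thumb (an assumption here, not something stated above) is one job per core plus one. make.conf does not run shell commands, so this sketch computes the number in a shell and prints the line to paste in:

```shell
# Count online CPUs; fall back to 1 if getconf cannot tell us.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# "cores + 1" is a rule of thumb, not a portage requirement.
makeopts_line=$(printf 'MAKEOPTS="-j%s"' "$((cores + 1))")
echo "$makeopts_line"
```

For a package that breaks under parallel make, the per-package env file would carry MAKEOPTS="-j1" instead.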

Since you mentioned running ~arch, and assuming your PM is still portage,
you may also want to take a look at emerge's --jobs and --load-average
options, for parallel emerges, if you haven't already. If you use them
you'll probably find --keep-going useful as well, so emerge doesn't
stop just because one of the parallel merges failed.
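
To make those options the default rather than typing them each time, they can go into EMERGE_DEFAULT_OPTS in make.conf. That variable is not mentioned above, and the numbers are placeholders for a quad-core box, not tested recommendations.

```shell
# make.conf-style fragment; the values are illustrative placeholders.
# --jobs:         how many packages emerge may build at once
# --load-average: don't start new builds while system load is above this
# --keep-going:   carry on with the remaining packages after a failure
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=5.0 --keep-going"

# One-off equivalent on the command line:
#   emerge --jobs=4 --load-average=5.0 --keep-going --update --deep world
```

Keep in mind that --jobs multiplies with the MAKEOPTS job count, so the load-average cap is what stops the two from swamping the machine together.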

Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
tmpfs. With 4 GB of memory it should speed things up dramatically, and
the worst case is that it uses swap, sending to disk only what would
have been 100% guaranteed to go to disk anyway if you had PORTAGE_TMPDIR
on disk.
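
A minimal sketch of the tmpfs setup, with the assumptions flagged: the mount point /mnt/portage-tmp and the 3G size cap are made up for illustration. The snippet only prints the two config lines; it does not mount or edit anything.

```shell
# Hypothetical mount point and size cap; adjust both to taste.
tmp_mount="/mnt/portage-tmp"
tmp_size="3G"

# Line for /etc/fstab: a tmpfs capped at $tmp_size.
fstab_line="tmpfs  $tmp_mount  tmpfs  size=$tmp_size,noatime  0 0"
# Line for make.conf: portage then builds under $tmp_mount/portage.
conf_line="PORTAGE_TMPDIR=\"$tmp_mount\""

printf '%s\n%s\n' "$fstab_line" "$conf_line"
```

The size= option is only an upper bound; a tmpfs uses memory for what is actually stored in it, so an idle build directory costs next to nothing.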

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
