Sami Näätänen <sn.ml@××××××××××××.fi> posted
200812091423.30562.sn.ml@××××××××××××.fi, excerpted below, on Tue, 09 Dec
2008 14:23:30 +0200:
|
> My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> with a 4GB of memory. No overclocking etc. Want this to be stable. :)
>
> I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> (Sorry if this has been up lately, but I just switched to 64bit env
> so...)
>
>
> Here is mine and some explanation of why (And I use ~arch system with
> gcc 4.3)
|
Well, you say you want stable, but then say you use ~arch, so I see
you're not too much of a stick-in-the-mud. =:^)
|
Here's mine, for a dual Opteron 290:
|
CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only -freorder-blocks-and-partition
-combine"
|
CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
-fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
-ftree-vectorize -fdirectives-only"
|
You can look them up in the gcc manpage, or look back a year or so to when
I explained most of them, although that was a couple of gcc versions ago
and they weren't quite the same.
|
But my basic strategy is this: Because memory is so much slower than
cache on a modern processor, in general it should pay to optimize for
size even if it costs a few CPU cycles once in a while.  Thus, until
fairly recently I used -Os, but with gcc-4.3 I decided to switch to -O2,
since gcc is getting smarter about such optimizations at -O2 now, and
the few additional size optimizations in -Os now tend to come at the
expense of cache (think -freorder-blocks-and-partition).  In any case, I
certainly don't want -O3, or too much loop unrolling and inlining, at the
expense of cache.
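
To see what actually differs between two -O levels on a given gcc,
something like the following should work.  It is only a sketch: the
function name opt_diff is made up for illustration, and it assumes gcc 4.3
or newer (for -Q --help=optimizers) plus mktemp and diff on the PATH.

```shell
# Hypothetical helper: list the optimizer flags whose enabled/disabled
# state differs between two optimization levels, e.g. -O2 and -Os.
opt_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # -Q --help=optimizers prints each optimizer flag with its state at
    # the given level instead of compiling anything.
    gcc -Q "$1" --help=optimizers > "$a" 2>/dev/null
    gcc -Q "$2" --help=optimizers > "$b" 2>/dev/null
    diff "$a" "$b"
    rm -f "$a" "$b"
}
```

Running "opt_diff -O2 -Os" would then print only the flags that change
state between the two levels.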
|
-frename-registers and -fweb are useful for taking advantage of the
additional registers x86_64 has.  -fdirectives-only is there because it
works better with ccache, which I use.  You know about -ftree-vectorize,
and -combine is discussed elsewhere on-thread.  -fmerge-all-constants
isn't strictly C standard, but I've had absolutely zero issues with it,
and it's going to help with cache.  -freorder-blocks-and-partition tells
gcc to keep hot code together so it stays in cache better, but it won't
work on most C++ code, which (along with -combine) is the reason I split
CFLAGS and CXXFLAGS.  The various -fgcse-* options make gcc more
aggressive about global common subexpression elimination (gcse) under
various conditions.  This shouldn't add to size and may in fact reduce it
by reducing instruction count (or by moving instructions out of loops,
which is size-neutral), but it can increase compile time, which is why
these options are either enabled only at -O3 or not at all, by default.
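
One way to keep the two flag lists from drifting apart is to put the
shared flags in a variable of their own and build both lists from it.  A
minimal sketch, assuming make.conf variable expansion; COMMON_FLAGS is
just a name chosen here for illustration, not something portage looks at.

```shell
# Sketch for make.conf.  COMMON_FLAGS is an illustrative name, not a
# variable portage itself reads.
COMMON_FLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize -fdirectives-only"

# C gets the two extra flags that tend to break C++ builds.
CFLAGS="${COMMON_FLAGS} -freorder-blocks-and-partition -combine"
# C++ gets only the shared set.
CXXFLAGS="${COMMON_FLAGS}"
```

The resulting values are the same as the two lists given earlier; only
the bookkeeping changes.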
|
-combine is the one that causes the most problems.  I handle it per
problem package using /etc/portage/env/* files, as mentioned in the other
thread.  -freorder-blocks-and-partition can cause problems in some cases
as well, but if you don't have either of those in CXXFLAGS, you'll avoid
a lot of the problem right there.  Those are the only C(XX)FLAGS I have
had issues with lately.  The others have worked just fine.
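
As a rough illustration of the per-package approach: this sketch assumes
that portage sources a file under /etc/portage/env/ named after the
category and package as shell when that package builds, so the file can
drop a flag from the global list.  The path some-category/some-package is
a placeholder, and strip_flag is a hypothetical helper, not something
portage provides.

```shell
# Sketch of /etc/portage/env/some-category/some-package (placeholder path).
# strip_flag is a hypothetical helper: it prints its remaining arguments
# with every exact match of the first argument removed.
strip_flag() {
    drop=$1
    shift
    out=""
    for flag in "$@"; do
        [ "$flag" = "$drop" ] || out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

# Unquoted on purpose so the flag list splits into separate words.
CFLAGS=$(strip_flag -combine ${CFLAGS:-})
```

The same call with -freorder-blocks-and-partition as the first argument
would handle the other occasional troublemaker.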
|
With quad-core you will likely be interested in upping your MAKEOPTS job
count as well.  Just be aware that it too can cause issues at times.
Again, however, it's easily worked around per-package as you come across
them, using the env/* files to set MAKEOPTS=-j1 or whatever.
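
As a starting point for a quad-core, a make.conf line might look like the
sketch below.  The cores-plus-one figure is a common rule of thumb assumed
here, not a measured optimum, and the per-package path in the comment is a
placeholder.

```shell
# make.conf sketch: cores + 1 is a rule of thumb, not a tuned value.
MAKEOPTS="-j5"

# In a per-package env file for something that breaks on parallel builds
# (placeholder path: /etc/portage/env/some-category/some-package), the
# override would simply be:
#   MAKEOPTS="-j1"
```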
|
Since you mentioned running ~arch, and assuming your PM is still portage,
you may also want to take a look at emerge's --jobs and --load-average
options, for parallel emerges, if you haven't already.  If you use them
you'll probably find --keep-going useful as well, so it doesn't stop just
because one of the parallel merges failed.
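
To make those options the default rather than typing them each time, they
can go in EMERGE_DEFAULT_OPTS in make.conf.  A minimal sketch; the numbers
are guesses for a quad-core, not recommendations.

```shell
# make.conf sketch: up to 4 packages building at once, no new jobs started
# while the load average is above 4.0, and a failed package does not stop
# the rest of the run.
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=4.0 --keep-going"
```

The same three options can also be passed directly on the emerge command
line for a one-off run.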
|
Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
tmpfs.  With 4 GB of memory it should speed things up dramatically, and
the worst case is that it uses swap, sending to disk what would be 100%
guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.
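
A sketch of what that might look like.  The mount point /mnt/portage-tmp
and the 3G size are assumptions picked for illustration; any tmpfs mount
with enough room would do.

```shell
# make.conf sketch.  Assumes a tmpfs is already mounted at the
# hypothetical mount point /mnt/portage-tmp, for example via an fstab
# entry along the lines of:
#   tmpfs  /mnt/portage-tmp  tmpfs  size=3G  0 0
# The 3G cap is an arbitrary pick for a 4 GB machine; tmpfs only uses
# memory for what is actually stored in it.
PORTAGE_TMPDIR="/mnt/portage-tmp"
```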
|
-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman