On Tuesday 09 December 2008 18:07:38 Duncan wrote:
> Sami Näätänen <sn.ml@××××××××××××.fi> posted
> 200812091423.30562.sn.ml@××××××××××××.fi, excerpted below, on Tue, 09 Dec
> 2008 14:23:30 +0200:
> > My system is an Intel quad core core2 with a 2.4 GHz clock speed coupled
> > with 4GB of memory. No overclocking etc. Want this to be stable. :)
> >
> > I'm just curious what people use as their stable CFLAGS in amd64 Gentoo?
> > (Sorry if this has been up lately, but I just switched to 64bit env
> > so...)
> >
> > Here is mine and some explanation of why (And I use ~arch system with
> > gcc 4.3)
>
> Well, you say you want stable, but then say you use ~arch, so I see
> you're not too stick in the mud. =:^)

Well, stable binaries, as I said in my clarifying (at least a little) second
post. :)

> Here's mine, for a dual Opteron 290:
>
> CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
> -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
> -ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"
>
> CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
> -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
> -ftree-vectorize -fdirectives-only"
>
> You can look them up in the gcc manpage, or look back a year or so when I
> explained most of them, altho that was a couple gcc versions ago and they
> weren't quite the same.
>
> But my basic strategy is this: Because memory is so much slower than
> cache on a modern processor, in general it should pay to optimize for
> size even if it costs a few CPU cycles once in a while. Thus, until
> fairly recently I used -Os, but with gcc-4.3, decided to switch to -O2
> since gcc is getting smarter about such optimizations with -O2 now, and
> the few additional size optimizations with -Os now tend to be at the
> expense of cache (think -freorder-blocks-and-partition). In any case, I
> certainly don't want -O3 or too much loop unrolling and inlining, at the
> expense of cache.
>
> -frename-registers and -fweb are useful for taking advantage of the
> additional registers x86_64 has. -fdirectives-only is there because it
> works better with ccache, which I use. You know about -ftree-vectorize
> and -combine is discussed elsewhere on-thread. -fmerge-all-constants
> isn't strictly C standard, but I've had absolutely zero issues with it,
> and it's going to help with cache. -freorder-blocks-and-partition won't
> work on most C++ code, thus (along with -combine) the reason I split
> CFLAGS and CXXFLAGS, but it tells gcc to keep hot code together so it
> stays in cache better. The various -fgcse-* options make gcc more
> aggressive about global common subexpression elimination (gcse) under
> various conditions. This shouldn't add to size and may in fact reduce size
> by reducing instruction count (or moving it out of loops, size neutral), but
> it can increase compile time, the reason a few of them are enabled at -O3
> only, by default.
>
> -combine is the one that causes the most problems, handled per
> trouble-package as mentioned in the other thread using /etc/portage/env/*
> files. The -freorder-blocks-and-partition can in some cases as well, but if
> you don't have either of those in CXXFLAGS, you'll avoid a lot of the
> problem right there. Those are the only C(XX)FLAGS I have had issues with
> lately. The others have worked just fine.
>
> With quad-core you will likely be interested in upping your MAKEOPTS job
> count as well. Just be aware that it too can cause issues at times.
> Again, however, it's easily worked around per-package as you come across
> them using the env/* files to set MAKEOPTS=-j1 or whatever.

Yeah, forgot to mention that too. I in fact like to use -j <num cores>, as then
there is no need for renicing in most cases and the system stays smooth.
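
As a rough sketch of the "-j <num cores>" idea: in a config file that is sourced by bash (paludis' bashrc is, as far as I know; portage's make.conf does not do command substitution, so there the number would be written out by hand), the job count could be derived from the core count. nproc comes from GNU coreutils, and the variable name "cores" is made up purely for illustration:

```shell
# Count the available cores (nproc is part of GNU coreutils).
cores="$(nproc)"
# One make job per core, i.e. -j <num cores>.
MAKEOPTS="-j${cores}"
export MAKEOPTS
echo "MAKEOPTS=${MAKEOPTS}"
```

On the quad core above this should come out as -j4; a package that breaks with parallel make would still need a per-package -j1 override.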

> Since you mentioned running ~arch, and assuming your PM is still portage,
> you may also want to take a look at emerge's --jobs and --load-average
> options, for parallel emerges, if you haven't already. If you use them
> you'll probably find --keep-going useful as well, so it doesn't stop just
> because one of the parallel merges failed.

Well, I've been a paludis man for quite a while: much better dependency
handling.

> Finally, if you haven't already, consider pointing PORTAGE_TMPDIR at a
> tmpfs. With 4 gig memory it should speed things up dramatically, and the
> worst-case is that it uses swap, sending to disk what would be 100%
> guaranteed to go to disk if you had PORTAGE_TMPDIR on disk.

Yeah, I have a 3GB tmpfs for /var/tmp/paludis and a 1GB tmpfs for /tmp to
speed things up in normal operation. And as memory seems to be quite cheap, I
might change to 8GB. After all, there is no such thing as too much memory...
(Actually there can be, but then one has the wrong HW to use that memory ;) )
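
For reference, a tmpfs layout like that could look roughly like the following /etc/fstab lines. Only the sizes and mount points come from the description above; the mode option is an assumption, and the build directory may need different ownership or permissions depending on which user the package manager builds as:

```
# <fs>   <mountpoint>        <type>  <opts>              <dump/pass>
tmpfs    /var/tmp/paludis    tmpfs   size=3G             0 0
tmpfs    /tmp                tmpfs   size=1G,mode=1777   0 0
```

The size= values are upper limits, not reservations, so an idle tmpfs costs next to nothing.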