Gentoo Archives: gentoo-amd64

From: Branko Badrljica <brankob@××××××××××.com>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Re: CFLAGS question from a AMD64 newbie
Date: Tue, 09 Dec 2008 16:47:38
Message-Id: 493EADA3.3090907@avtomatika.com
In Reply to: [gentoo-amd64] Re: CFLAGS question from a AMD64 newbie by Duncan <1i5t5.duncan@cox.net>
1 Duncan wrote:
2 >
3 >
4 > Well, you say you want stable, but then say you use ~arch, so I see
5 > you're not too stick in the mud. =:^)
6 >
7 > Here's mine, for a dual Opteron 290:
8 >
9 > CFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
10 > -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
11 > -ftree-vectorize -fdirectives-only -freorder-blocks-and-partition -combine"
12 >
13 > CXXFLAGS="-march=opteron-sse3 -pipe -O2 -frename-registers -fweb
14 > -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload
15 > -ftree-vectorize -fdirectives-only"
16 >
17 > You can look them up in the gcc manpage, or look back a year or so when I
18 > explained most of them, altho that was a couple gcc versions ago and they
19 > weren't quite the same.
20 >
21 >
22 <SNIP>
23
24
25 Been there, done practically that, but it didn't make one quark of
26 difference overall, except for throwing gcc into a coma now and then,
27 lengthening compile times and causing odd (but rare) bugs.
28
29 I tried to time several C programs of mine and found that plain -O1
30 worked substantially better than plain -O2.
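
A comparison like this can be scripted. What follows is only a rough sketch, not the exact procedure used for the timings above: the helper names flags_for and bench_levels, the MARCH variable and the output file names are all made up for illustration, and -march=native assumes gcc 4.2 or newer.

```shell
#!/bin/bash
# Sketch only: MARCH, the helper names and the output file names are
# illustrative assumptions, not taken from the original post.
MARCH="${MARCH:-native}"   # -march=native needs gcc 4.2+; otherwise name the CPU

# Print the flag set for one optimization level, e.g. "-march=native -O1 -pipe".
flags_for() {
    printf '%s\n' "-march=${MARCH} $1 -pipe"
}

# Build the given C source once per level and time each resulting binary.
bench_levels() {
    local src=$1; shift
    local level out
    for level in "$@"; do
        out="./bench${level}"          # e.g. ./bench-O1
        # Word splitting of the flags_for output is intended here.
        gcc $(flags_for "$level") -o "$out" "$src" || return 1
        echo "== ${level} =="
        time "$out" > /dev/null
    done
}
```

Something like `bench_levels myprog.c -O1 -O2 -O3` would then print one timing per level; run it several times on an idle box before believing any difference.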
31
32 After that, I said sod it all and used plain vanilla CFLAGS on a new gcc,
33 with the right -march. Works fine, with the same speed, faster compiles and
34 far fewer headaches on average.
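
For illustration, a "plain vanilla" make.conf setup could look roughly like the sketch below. The exact values are an assumption, not a copy of a real file: -march=native needs gcc 4.2 or newer, and on older compilers the CPU has to be named explicitly instead.

```shell
# make.conf sketch; the values are illustrative assumptions.
# Pick the CPU with -march and otherwise stay with plain -O2.
CFLAGS="-march=native -O2 -pipe"
# Reuse the same flags for C++ so the two never drift apart.
CXXFLAGS="${CFLAGS}"
```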
35
36 In my experience, exotic CFLAGS can make a difference, but this varies
37 wildly from one part of a program to another, so unless one knows exactly
38 what he is doing, he might be better off trusting the compiler to take a
39 sane path with -O2. Besides that, portage doesn't have an option to compile
40 just some part of the code with other, non-default CFLAGS...
41
42
43
44 > But my basic strategy is this: Because memory is so much slower than
45 > cache on a modern processor, in general it should pay to optimize for
46 > size even if it costs a few CPU cycles once in awhile.
47 True, but he is asking about a P4, which was notorious for its long
48 pipeline and the headache after a cache miss, so for him -O2 or even -O3
49 might be better in _some_ cases.
50 But even so, IMVHO it is simply not worth the time and effort to fiddle
51 with this. I'd use the golden default with the right -march here as well
52 and be done with it.