Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: First Impressions
Date: Wed, 27 Sep 2006 09:14:06
Message-Id: efdf6p$pib$1@sea.gmane.org
In Reply to: [gentoo-amd64] First Impressions by Daniel Iliev
Daniel Iliev <danny@××××××××.com> posted 451A110B.2080401@××××××××.com,
excerpted below, on Wed, 27 Sep 2006 08:50:03 +0300:

> So let me start a with 2 newbie questions caused by my first impressions
> from the x86_64 world:
>
> 1) I use CFLAGS="-march=athlon64 -mfpmath=sse -msse -msse2 -msse3
> -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic". Portage complains
> with *red letters* about the fpic flag. Every time I emerge something it
> says that "fpic breaks things", but I haven't met a single breakage so
> far. Is that a bug? Actually there was an ebuild which could not be
> compiled if mysql was compiled w/o "fpic". I'm not 100% sure but AFAIR
> it was dev-perl/DBD-mysql.
>
> 2) I see too many flags that are disabled by the profile - the kind with
> the parenthesis around them, like "(-3dnow)". Why? As I mentioned above
> I enable some of these through my CFLAGS - e.g. (-mmx), (-mmxext),
> (-sse) and (-sse2) and everything works perfect.

It seems that you missed some of the Gentoo/AMD64 documentation.
Many/most of your questions are answered there. Unfortunately, I'm not
aware of a simple, easy-to-use list of everything in one spot, so it's
a matter of reading a bit of documentation here, a bit more there, etc.

The main Gentoo/AMD64 project page. (This would be the logical place for
such a list, but it's really more a project page. It links some of the
docs, tho those links aren't as easy to find as they could be.)
http://amd64.gentoo.org

Gentoo/AMD64 FAQ:
http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml

Gentoo/AMD64 HOWTOs. (There's one on -fPIC here, tho the explanation is
a bit developer-centric.)
http://www.gentoo.org/proj/en/base/amd64/howtos/index.xml

A brief direct answer to your questions follows:

* The sse etc CFLAGS are arch dependent. Unlike x86, where the
mmx/sse/other-extension instructions were added as the arch matured, on
amd64 they are part of the definition of the arch itself. All x86_64
(amd64) CPUs have mmx, sse and sse2. Thus, -march=athlon64 already
tells gcc these (plus 3dnow) are available to use where it wants/needs
to, and -mfpmath=sse is the default on x86_64 anyway. The explicit
-mmmx/-msse/-msse2/-m3dnow flags therefore don't give gcc any more
information than it already has. (-msse3 is the one exception: SSE3 is
not part of the baseline arch and the earliest amd64 CPUs lack it, so
keep that flag only if your CPU actually has it.)

* -fomit-frame-pointer isn't needed on 64-bit amd64 either, as it's turned
on for all -O levels on archs (including amd64) where doing so doesn't
interfere with debugging. (See the gcc manpage, under -O optimization.)
You may wish to continue to specify it for stuff that's compiled for
32-bit, however, including parts of gcc, a version of glibc, a version of
the (portage) sandbox library, etc.

* Generally speaking, -fPIC is required on amd64 for ALL LIBRARIES, but
the ebuilds normally take care of it. Under certain circumstances (like
unsupported CFLAGS), the configure scripts will turn it off by mistake;
see the above-mentioned -fPIC HOWTO link for details. The solution isn't
to add it to your CFLAGS, tho, as that means it will be used for
executable applications as well as libraries, and /some/ applications
/do/ break with it. Not many, but some, and if it's in your CFLAGS, bugs
you file WILL be closed as INVALID or the like, due to CFLAG abuse. If
something doesn't work without it, then THAT'S a bug and should be filed
as such (unless it's due to use of CFLAGS gcc doesn't support and warns
about, thus triggering the configure script detection problem discussed
above and in the HOWTO).

* The profile "disabled" USE flags are simply hard-locked either on or
off by the profile, so they aren't something you can set. It does NOT
mean whatever the USE flag controls is actually disabled. Sometimes, as
with the multilib USE flag, it can mean it's /enabled/. It just means
that the profile is set up to control it, generally for a pretty good
reason. In the particular cases you mention, the way Gentoo uses the SSE
and similar USE flags is 32-bit specific, enabling 32-bit specific
assembler code in the ebuild, for instance. As already mentioned, the
AMD64 arch by definition already has these features, so no 64-bit USE
flags are necessary, and enabling the 32-bit USE flags would cause
breakage since it activates 32-bit specific code in many instances. Thus
the amd64 profiles have a /very/ good reason to hard-lock these USE
flags "off". Multilib is the example of a USE flag hard-locked ON by a
profile: the normal AMD64 profiles are all multilib and thus lock this
flag ON (tho it's still shown as disabled), while 64-bit-only profiles
lock it OFF.

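To make the CFLAGS points above concrete, here is a minimal sketch of
what the make.conf lines could shrink to. It is an illustration under
assumptions, not a recommended setup: -O2 is only a placeholder for
whichever optimization level you settle on, and the gcc command in the
comment is just one way to check what -march=athlon64 implies.

```shell
# make.conf fragment (illustrative sketch only).
# -march=athlon64 already implies MMX, SSE, SSE2 and 3DNow!, so the
# explicit -mmmx/-msse/-msse2/-m3dnow switches are dropped.
# -fomit-frame-pointer is left out because 64-bit amd64 gets it at every
# -O level, and -fpic is left out because the ebuilds add it for
# libraries themselves.
# One way to see which feature macros the -march setting defines:
#   gcc -march=athlon64 -dM -E - </dev/null | grep -E '__(MMX|SSE|SSE2)__'
CFLAGS="-march=athlon64 -O2 -pipe"   # -O2 is a placeholder assumption
CXXFLAGS="${CFLAGS}"
```
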
A couple of other notes:

Portage now supports per-package CFLAGS and certain other variables as
controlled by the environment (as long as they are used in an ebuild.sh
phase, not the python phase, since execution is via a bashrc hook).
Create /etc/portage/env/<category> as a directory, populated with files
named for the package or package-version. The contents of these files
will be sourced into the ebuild.sh execution environment for every phase
that uses ebuild.sh. CFLAGS and similar variables found in these files
REPLACE the default make.conf CFLAGS (that is, they don't add to them,
they replace them entirely). You can use this mechanism to set specific
CFLAGS for specific packages, and could thus set -fomit-frame-pointer
and other 32-bit x86 specific CFLAGS here if desired, avoiding them in
your regular make.conf.

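As a rough illustration of the replace-not-append behavior just
described, the sketch below simulates it in a scratch directory rather
than touching the real /etc/portage. The package name app-misc/foo and
both sets of flags are hypothetical, and the plain "source the file"
step only approximates what the hook does.

```shell
#!/bin/sh
# Sketch only: simulate a per-package override in a scratch directory.
# app-misc/foo and the flag values below are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/etc/portage/env/app-misc"

# Stand-in for the global default that make.conf would provide.
CFLAGS="-march=athlon64 -Os -pipe"
CXXFLAGS="${CFLAGS}"

# The per-package file is plain shell. Whatever it assigns replaces the
# default outright, so it has to repeat every flag it still wants.
cat > "$root/etc/portage/env/app-misc/foo" <<'EOF'
CFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
EOF

# Approximation of the hook: source the file into the build environment.
. "$root/etc/portage/env/app-misc/foo"

echo "CFLAGS for app-misc/foo: $CFLAGS"
rm -rf "$root"
```

Note that -Os from the default does not survive: the file's CFLAGS line
is the whole story for that package.
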
You may wish to read a bit of the archives for this list, in particular
the recent threads on gcc 4.1.1 CFLAGS, where I discuss mine.
Specifically, it's likely -O3 actually performs /worse/ in many
instances than -O2 or even -Os (my choice). The reasoning is this: CPU
cycles are fairly cheap in a modern processor, while the expense of
waiting on main memory in the case of a cache miss is MUCH HIGHER, due to
the fact that main memory is clocked so much slower than cache. Smaller
code fits in cache better and is thus often faster than larger code, even
when the smaller code isn't as theoretically CPU cycle efficient.

While there will certainly be applications where -O3 is beneficial, I
believe that if you do actual comparisons, you will find -O2 or -Os
faster on a system-wide basis. Of course, it's up to you, and much
virtual ink has been spilled discussing this issue. If you've actually
done speed comparisons on AMD64 or can point to some, I'd certainly be
interested, as I've honestly not cared enough about it to do my own.
This is just my general take in the absence of specific hard data to
the contrary.

Rather than optimizing for CPU cycles (-O3), I choose to optimize for
better register usage (-frename-registers etc, registers being at full
CPU speed and therefore faster even than L1 cache), size (-Os, disabling
loop unrolling), whole and multiple unit optimization (-funit-at-a-time,
-combine) and hot/cold partitioning (-freorder-blocks-and-partition, tho
it can't be used on C++ code, etc). A few of my flags fail on a very few
specific packages, which is another use for the package-specific CFLAGS
mechanism above.

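For illustration only, the flags named above can be collected into one
CFLAGS string, with the ones that don't apply to C++ filtered out for
CXXFLAGS. This is a sketch assembled from the flags mentioned in this
message, not a copy of anyone's actual make.conf. It is also a
standalone script that prints values to paste, since make.conf itself
doesn't take shell loops. Dropping -combine for C++ rests on the gcc
documentation describing it as supported for C only.

```shell
#!/bin/sh
# Sketch: build a size-oriented CFLAGS from the flags named in the text,
# then derive CXXFLAGS by dropping the flags that don't apply to C++.
CFLAGS="-march=athlon64 -Os -pipe -frename-registers -funit-at-a-time -combine -freorder-blocks-and-partition"

CXXFLAGS=""
for flag in $CFLAGS; do
    case "$flag" in
        # Not usable for C++ code, so skip these two.
        -combine|-freorder-blocks-and-partition) ;;
        # Everything else carries over unchanged, space-separated.
        *) CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }$flag" ;;
    esac
done

echo "CFLAGS=\"$CFLAGS\""
echo "CXXFLAGS=\"$CXXFLAGS\""
```
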
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman