Daniel Iliev <danny@××××××××.com> posted 451A110B.2080401@××××××××.com,
excerpted below, on Wed, 27 Sep 2006 08:50:03 +0300:

> So let me start a with 2 newbie questions caused by my first impressions
> from the x86_64 world:
>
> 1) I use CFLAGS="-march=athlon64 -mfpmath=sse -msse -msse2 -msse3
> -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic". Portage complains
> with *red letters* about the fpic flag. Every time I emerge something it
> says that "fpic breaks things", but I haven't met a single breakage so
> far. Is that a bug? Actually there was an ebuild which could not be
> compiled if mysql was compiled w/o "fpic". I'm not 100% sure but AFAIR
> it was dev-perl/DBD-mysql.
>
> 2) I see too many flags that are disabled by the profile - the kind with
> the parenthesis around them, like "(-3dnow)". Why? As I mentioned above
> I enable some of these through my CFLAGS - e.g. (-mmx), (-mmxext),
> (-sse) and (-sse2) and everything works perfect.

It seems you missed some of the Gentoo/AMD64 documentation.
Most of your questions are answered there. Unfortunately, I'm not
aware of a single easy-to-use list of everything in one spot, so it's
a matter of reading a bit of documentation here, a bit more there, etc.

The main Gentoo/AMD64 project page. (This would be the logical place for
such a list, but it's more of a project page. It does link some of the
docs, tho those links aren't as easy to find as they could be.)
http://amd64.gentoo.org

Gentoo/AMD64 FAQ:
http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml

Gentoo/AMD64 HOWTOs. (There's one on -fPIC here, tho the explanation is
a bit developer-centric.)
http://www.gentoo.org/proj/en/base/amd64/howtos/index.xml

Brief direct answers to your questions follow:

* The sse etc CFLAGS are arch dependent. Unlike x86, where the
mmx/sse/other-extension instructions were added as the arch matured, on
amd64 they are part of the definition of the arch itself. All x86_64
(amd64) CPUs have mmx, sse and sse2. Thus, -march=athlon64 already
tells gcc these are available to use where it wants/needs to, and
-mfpmath=sse is the 64-bit default, so spelling those flags out gives gcc
no more information than it already has. (-msse3 is the one exception:
sse3 is not part of the base arch and the early Athlon 64s lack it, so
only add it if your CPU actually has it.)

* -fomit-frame-pointer isn't needed on 64-bit amd64 either, as it's turned
on at all -O levels on archs (including amd64) where doing so doesn't
interfere with debugging. (See the gcc manpage, under -O optimization.)
You may wish to continue to specify it for stuff that's compiled for
32-bit, however, including parts of gcc, a version of glibc, a version of
the (portage) sandbox library, etc.

* Generally speaking, -fPIC is required on amd64 for ALL LIBRARIES, but the
ebuilds normally take care of it. Under certain circumstances (like
unsupported CFLAGS), the configure scripts will turn it off by mistake;
see the above-mentioned -fPIC HOWTO link for details. The solution isn't
to add it to your CFLAGS, tho, as that means it will be used for executable
applications as well as libraries, and /some/ applications /do/ break with
it. Not many, but some, and if it's in your CFLAGS, you WILL have bugs
you file closed as INVALID or the like, due to CFLAG abuse. If there's
something not working without it, then THAT'S a bug and should be filed as
such (unless it's due to use of CFLAGS gcc doesn't support and warns
about, thus triggering the configure script detection problem discussed
above and in the HOWTO).

* The profile "disabled" USE flags (the ones shown in parentheses) are
simply hard-locked either on or off by the profile, so they aren't a USE
flag option you can set. It does NOT mean whatever the USE flag controls
is actually disabled. Sometimes, as with the multilib USE flag, it can
mean it's /enabled/. It just means that the profile is set up to control
it, generally for a pretty good reason. In the particular cases you
mention, the way Gentoo uses the SSE and similar USE flags is 32-bit
specific, enabling 32-bit specific assembler code in the ebuild, for
instance. As already mentioned, the AMD64 arch by definition already has
these features activated, so no 64-bit USE flags are necessary, and
enabling the 32-bit USE flags would cause breakage, since in many
instances they activate 32-bit specific code. Thus the amd64 profiles
have a /very/ good reason to hard-lock these USE flags "off". An example
where a USE flag is hard-locked ON by a profile would be multilib. The
normal AMD64 profiles are all multilib and thus lock this flag ON (tho
it's still shown as disabled), while 64-bit-only profiles lock it OFF.

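To make the CFLAGS points above concrete, here's a minimal sketch of a
shell helper that trims the flags from your original line. The function
name is made up for illustration, and the drop list is only my reading of
the points above, not anything portage itself does. Note that -msse3 is
deliberately left alone, on the assumption that -march=athlon64 doesn't
imply it.

```shell
# Hypothetical helper (not part of portage): drop flags that are either
# already implied on amd64 or shouldn't be set globally.
trim_amd64_cflags() {
    local flag out=""
    for flag in $1; do
        case "$flag" in
            # implied by -march=athlon64 / the 64-bit defaults
            -mmmx|-msse|-msse2|-m3dnow|-mfpmath=sse) ;;
            # already turned on by -O on amd64
            -fomit-frame-pointer) ;;
            # left to the ebuilds, not set globally
            -fpic|-fPIC) ;;
            # everything else is kept, in order
            *) out="${out:+$out }$flag" ;;
        esac
    done
    printf '%s\n' "$out"
}

trim_amd64_cflags "-march=athlon64 -mfpmath=sse -msse -msse2 -msse3 -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic"
# prints: -march=athlon64 -msse3 -O3 -pipe
```

The result is what would go in make.conf. Whether to keep -O3 is a
separate question.
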
A couple of other notes:

Portage now supports per-package CFLAGS and certain other variables as
controlled by the environment (as long as they are used in an ebuild.sh
phase, not the python phase, since execution is via a bashrc hook).
Create /etc/portage/env/<category> as a directory, and populate it with
files named for the package or package-version. The contents of these
files will be sourced into the ebuild.sh execution environment for every
phase that uses ebuild.sh. CFLAGS and similar variables found in these
files REPLACE (that is, they don't add to, they replace entirely) the
default make.conf CFLAGS. You can use this mechanism to set specific
CFLAGS for specific packages, and could thus set -fomit-frame-pointer and
other 32-bit x86 specific CFLAGS here if desired, keeping them out of your
regular make.conf.

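As a rough illustration of the replace-not-append behavior, the sketch
below fakes the mechanism in a scratch directory. PORTAGE_ENV_DIR is just
a stand-in name for /etc/portage/env, sys-libs/glibc is only an example
package, and the final "." line stands in for whatever the real hook
does. It is not the actual portage code.

```shell
# Scratch directory standing in for /etc/portage/env (assumption: the
# real files live under /etc/portage/env/<category>/<package>).
PORTAGE_ENV_DIR=$(mktemp -d)
mkdir -p "$PORTAGE_ENV_DIR/sys-libs"

# Example per-package file: plain shell variable assignments.
cat > "$PORTAGE_ENV_DIR/sys-libs/glibc" <<'EOF'
CFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
EOF

# The make.conf defaults, as they'd be set before the file is sourced.
CFLAGS="-march=athlon64 -Os -pipe"
CXXFLAGS="${CFLAGS}"

# Sourcing the file overwrites the defaults instead of adding to them.
. "$PORTAGE_ENV_DIR/sys-libs/glibc"
echo "$CFLAGS"
# prints: -march=athlon64 -O2 -pipe -fomit-frame-pointer
```

Since the file replaces the variable outright, anything you still want
from make.conf (here -march and -pipe) has to be repeated in it.
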
You may wish to read a bit of the archives for this list, in particular
the recent threads on gcc 4.1.1 CFLAGS, where I discuss mine.
Specifically, it's likely -O3 actually performs /worse/ in many
instances than -O2 or even -Os (my choice). The reasoning is this: CPU
cycles are fairly cheap in a modern processor, while the expense of
waiting on main memory in the case of a cache miss is MUCH HIGHER, due to
the fact that main memory is clocked so much slower than cache. Smaller
code fits in cache better and is thus often faster than larger code, even
when the smaller code isn't as theoretically CPU cycle efficient. While
there will certainly be some applications where -O3 is beneficial, I
believe if you do actual comparisons, you will find -O2 or -Os faster on a
system-wide basis.

Of course, it's up to you, and much virtual ink has been spilled
discussing this issue. If you've actually done speed comparisons on AMD64
or can point to some, I'd certainly be interested, as I've honestly not
cared enough about it to do my own, but that's my general take in the
absence of specific hard data to the contrary. Rather than optimizing for
CPU cycles (-O3), I choose to optimize for better register usage
(-frename-registers etc, registers being at full CPU speed and therefore
faster even than L1 cache), size (-Os, disabling loop unrolling), whole
and multiple unit optimization (-funit-at-a-time, -combine) and hot/cold
partitioning (-freorder-blocks-and-partition, tho it can't be used on C++
code, etc). A few of my flags fail on a very few specific packages, which
is another use for the package-specific CFLAGS stuff above.

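For reference, a make.conf fragment along those lines might look like the
sketch below. It only strings together the flags named above for a gcc
4.1-era system and isn't a tested recommendation. Leaving -combine out of
CXXFLAGS is my assumption that it only applies to C.

```shell
# C flags: size, register usage, whole-unit and hot/cold partitioning.
CFLAGS="-march=athlon64 -Os -pipe -frename-registers -funit-at-a-time -combine -freorder-blocks-and-partition"

# C++ flags: same base, minus the partitioning flag (which can't be used
# on C++ code) and minus -combine (assumed here to be C-only).
CXXFLAGS="-march=athlon64 -Os -pipe -frename-registers -funit-at-a-time"
```
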
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
gentoo-amd64@g.o mailing list