Daniel Iliev <danny@...> posted 451A110B.2080401@...,
excerpted below, on Wed, 27 Sep 2006 08:50:03 +0300:
> So let me start with 2 newbie questions caused by my first impressions
> from the x86_64 world:
> 1) I use CFLAGS="-march=athlon64 -mfpmath=sse -msse -msse2 -msse3
> -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic". Portage complains
> with *red letters* about the fpic flag. Every time I emerge something it
> says that "fpic breaks things", but I haven't met a single breakage so
> far. Is that a bug? Actually there was an ebuild which could not be
> compiled if mysql was compiled w/o "fpic". I'm not 100% sure but AFAIR
> it was dev-perl/DBD-mysql.
> 2) I see too many flags that are disabled by the profile - the kind with
> the parenthesis around them, like "(-3dnow)". Why? As I mentioned above
> I enable some of these through my CFLAGS - e.g. (-mmx), (-mmxext),
> (-sse) and (-sse2) and everything works perfect.
It seems that you missed some of the Gentoo/AMD64 documentation.
Many/most of your questions are answered there. Unfortunately, I'm not
aware of a simple, easy-to-use list of everything in one spot, so it's a
matter of reading a bit of documentation here, a bit more there, etc.
The main Gentoo/AMD64 project page. (This would be the logical place for
such a list, but it's more the project page, tho it links some of the
docs, it's just not as easy to find those links as it could be.)
Gentoo/AMD64 HOWTOs. (There's one on -fPIC here, tho the explanation is
a bit developer-centric.)
A brief direct answer to your questions follows:
* The sse etc CFLAGS are arch dependent. Unlike x86 where the
mmx/sse/other-extensions instructions were added as the arch matured, on
amd64, they are part of the definition of the arch itself. All x86_64
(amd64) CPUs will have mmx/sse/sse2, etc. Thus, -march=athlon64 already
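tells gcc these are available to use where it wants/needs to. The
-mmmx/-msse/-msse2/-m3dnow flags therefore don't give gcc any more
information than it already has. (-msse3 may be the exception: as far as I
know it isn't part of the base arch definition, so check that your CPU
actually supports it.)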
* -fomit-frame-pointer isn't needed on 64-bit amd64 either, as it's turned
on for all -O levels on archs (including amd64) where doing so doesn't
interfere with debugging. (See the gcc manpage, under -O optimization.)
You may wish to continue to specify it for stuff that's compiled for
32-bit, however, including parts of gcc, a version of glibc, a version of
the (portage) sandbox library, etc.
* Generally speaking, -fPIC is required on amd64 for ALL LIBRARIES, but the
ebuilds normally take care of it. Under certain circumstances (like
unsupported CFLAGS), the configure scripts will turn it off by mistake; see
the above mentioned -fPIC HOWTO for details. The solution isn't to add it
to your CFLAGS, however, as that means it will be used for executable
applications as well as libraries, and /some/ applications /do/ break with
it. Not many, but some, and if it's in your CFLAGS, bugs you file WILL be
closed as INVALID or the like, due to CFLAG abuse. If there's
something not working without it, then THAT'S a bug and should be filed as
such (unless it's due to use of CFLAGS gcc doesn't support and warns
about, thus triggering the configure script detection problem discussed
above and in the HOWTO).
* The profile "disabled" USE flags are simply hard-locked either on or
off by the profile, so aren't a USE flag option. It does NOT mean whatever
the USE flag controls is actually disabled. Sometimes, as with the
multilib USE flag, it can mean it's /enabled/. It just means that the
profile is set up to control it, generally for a pretty good reason. In
the particular cases you mention, the way Gentoo uses the SSE and similar
USE flags is 32-bit specific, enabling 32-bit specific assembler code in
the ebuild, for instance. As already mentioned, the AMD64 arch by
definition already has these features activated, so no 64-bit USE flags
are necessary, and enabling the 32-bit USE flags will cause breakage since
it activates 32-bit specific code in many instances. Thus the amd64
profiles have a /very/ good reason to hard-lock these USE flags "off". An
example where a USE flag is hard-locked ON by a profile would be multilib.
The normal AMD64 profiles are all multilib and thus lock this flag ON (tho
it's still shown as disabled), while 64-bit-only profiles lock it OFF.
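Putting the CFLAGS points above together, a trimmed make.conf might look like the sketch below. It is only an illustration built from the flags quoted at the top of this mail: the redundant -mmmx/-msse/-msse2/-m3dnow, -mfpmath=sse (the default on amd64, as far as I know), -fomit-frame-pointer and -fpic are dropped, and everything else is left as the original poster had it. Keeping -msse3 assumes that the CPU supports it and that -march=athlon64 does not imply it. The CXXFLAGS line is an assumption too, since only CFLAGS was quoted.

```shell
# make.conf excerpt (illustrative sketch, not a recommendation)
# -march=athlon64 already covers mmx/sse/sse2/3dnow on amd64.
# -msse3 is kept on the ASSUMPTION that this CPU supports it; drop it if unsure.
# -fpic is deliberately absent: the ebuilds add -fPIC for libraries themselves.
CFLAGS="-march=athlon64 -msse3 -O3 -pipe"
# ASSUMPTION: reuse the same flags for C++ (the original mail only quoted CFLAGS).
CXXFLAGS="${CFLAGS}"
```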
A couple of other notes:
Portage now supports per-package CFLAGS and certain other variables as
controlled by the environment (as long as they are used in an ebuild.sh
phase, not the python phase, since execution is via a bashrc hook).
Create /etc/portage/env/<category> as a directory, populated with package
or package-version files. The contents of these files will be sourced
into the ebuild.sh execution environment for every phase that uses
ebuild.sh. CFLAGS and similar variables as found in these files REPLACE
(that is, they don't add to, they replace entirely) the default make.conf
CFLAGS. You can use this mechanism to specify specific CFLAGS for
specific packages, and could thus set -fomit-frame-pointer and other
32-bit x86 specific CFLAGS here if desired, avoiding them in your regular
make.conf CFLAGS.
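The per-package mechanism described above can be sketched roughly as follows. The category and package names are hypothetical placeholders, and the script writes into a scratch directory (ENV_ROOT, a variable made up for this example) so it can be tried safely; on a live system the target would be /etc/portage/env. The flag values are illustrative only.

```shell
# Scratch root so the sketch can be run without touching the real /etc;
# on a live system this would be /etc/portage/env (ENV_ROOT is a made-up name).
ENV_ROOT="${ENV_ROOT:-$(mktemp -d)}"
CATEGORY="app-misc"      # hypothetical category
PACKAGE="example-pkg"    # hypothetical package name

# One directory per category, one file per package (or package-version).
mkdir -p "${ENV_ROOT}/${CATEGORY}"

# The file is sourced into the ebuild.sh environment, so it is plain
# bash variable assignments. Quoted 'EOF' keeps ${CFLAGS} literal in the file.
cat > "${ENV_ROOT}/${CATEGORY}/${PACKAGE}" <<'EOF'
# These REPLACE the make.conf values, so list every flag you want.
CFLAGS="-march=athlon64 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
EOF
```

Because the values replace rather than extend the make.conf settings, the file has to repeat -march and the -O level along with the package-specific extra.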
You may wish to read a bit of the archives for this list, in particular,
the recent threads on gcc 4.1.1 CFLAGS, where I discuss mine.
Specifically, it's likely -O3 is actually /worse/ performing in many
instances than -O2 or even -Os (my choice). The reasoning is this: CPU
cycles are fairly cheap in a modern processor, while the expense of
waiting on main memory in the case of a cache miss is MUCH HIGHER, due to
the fact that main memory is clocked so much slower than cache. Smaller
code fits in cache better and is thus often faster than larger code, even
when the smaller code isn't as theoretically CPU cycle efficient. While
there will certainly be certain applications where -O3 is beneficial, I
believe if you do actual comparisons, you will find -O2 or -Os faster on a
system-wide basis. Of course, it's up to you and much virtual ink has
been spilled discussing this issue, but that's just my take on things. If
you've actually done speed comparisons on AMD64 or can point to some, I'd
certainly be interested, as I've honestly not cared enough about it to do
my own, but that's my general take in the absence of specific hard data to
the contrary. Rather than optimizing for CPU cycles (-O3), I choose to
optimize for better register usage (-frename-registers etc; registers run
at full CPU speed and are therefore faster even than L1 cache), size
(-Os, disabling loop unrolling), whole and multiple unit optimization
(-funit-at-a-time, -combine) and hot/cold partitioning
(-freorder-blocks-and-partition, tho it can't be used on C++ code, etc). A
few of my flags fail on a very few specific packages, another use for the
package specific CFLAGS stuff above.
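As a rough illustration of that approach, the sketch below assembles only the flags named in the paragraph above, plus -march=athlon64 and -pipe from the original question. It is NOT the actual flag set discussed in the archived threads, and COMMON_FLAGS is a helper variable made up for this example. The flags are gcc 4.1-era ones, and placing -combine on the C side only is an assumption based on my understanding that gcc supports it for C alone.

```shell
# make.conf excerpt (illustrative sketch for gcc 4.1.x, assumptions as noted)
# Flags shared by C and C++ builds; COMMON_FLAGS is a made-up helper name.
COMMON_FLAGS="-march=athlon64 -Os -pipe -frename-registers -funit-at-a-time"

# C only: -freorder-blocks-and-partition can't be used on C++ code,
# and -combine is ASSUMED here to be a C-only option.
CFLAGS="${COMMON_FLAGS} -combine -freorder-blocks-and-partition"

# C++ gets the shared flags without the C-only additions.
CXXFLAGS="${COMMON_FLAGS}"
```

Packages that fail with one of these flags can then be given a plainer set through the per-package files described earlier.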
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman