Gentoo Archives: gentoo-performance

From: Jan Jitse Venselaar <J.J.Venselaar@×××××××.nl>
To: gentoo-performance@g.o
Subject: Re: [gentoo-performance] Re: getting athlon-xp/geforce4 system "on the top"
Date: Wed, 07 May 2003 15:28:55
In Reply to: [gentoo-performance] Re: getting athlon-xp/geforce4 system "on the top" by MAL
MAL wrote:
> Koby Boy wrote:
>
>> On Tue, 2003-05-06 at 03:24, eNTi wrote:
>>
>>> CFLAGS="-mcpu=athlon-xp -O3 -pipe -m3dnow -msse -mmmx -Wall
>>> -fomit-frame-pointer" / CHOST="i686-pc-linux-gnu"
>>
>>
>> I've got an Athon-XP 2400 (2 GHz) but I've over clocked it to the
>> equivalent of an Athlon-XP 2600 (2.2GHz). My CFLAGS look like this:
>>
>> CFLAGS="-mcpu=athlon-xp -march=athlon-xp -O3 -fforce-addr
>> -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop
>> -frerun-loop-opt -falign-functions=4 -foptimize-sibling-calls
>> -fexpensive-optimizations -pipe -m3dnow -mmmx -msse -mfpmath=sse,387"
>
>
> -march=athlon-xp implies:
> -mcpu=athlon-xp
>
> -march=athlon-xp implies:
> -m3dnow
> -mmmx
> -msse
>
> -O3 implies -O2 which implies:
> -frerun-loop-opt
> -frerun-cse-after-loop
> -frerun-loop-opt
> -falign-functions
> -fexpensive-optimizations
>
> -mfpmath=sse,387 is still unstable.. asking for trouble!
>
> Lastly, many people have expressed that -funroll-loops slows down more
> code than it speeds up.
>
> I'd be surprised if you see much improvement over '-O3 -pipe', in 99% of
> apps, not worth it's time in debugging broken code :/
>
> man gcc *g*
>
> MAL
>
>
> --
> gentoo-performance@g.o mailing list

Actually, -mfpmath=sse,387 works for most programs, but on my Athlon-XP it is slower than -mfpmath=387, at least in my benchmark, which is LAME encoding. -funroll-loops slows it down as well. Setting -falign-functions to something bigger than 4 (the default, I believe) does make a small positive difference. -falign-loops and -falign-jumps should also be set to 5 or something like that for optimal speed.

I also tried -malign-double and -m128bit-long-double, which speed up LAME some more, but they break ABI compatibility and make the code larger.

I know that running only LAME isn't the best way to benchmark, but I think LAME exercises a large set of functions, and if LAME runs fast, other programs shouldn't be slow.

As for debugging vs. performance, I'm just a person who likes to tweak performance to the utmost, and I think some breakage and debugging just adds to the fun.

FCA
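
For anyone wanting to repeat this kind of comparison, here is a rough sketch of a timing helper, not the script used for the numbers above. It runs a command several times and reports the fastest wall-clock time in milliseconds, which helps filter out noise from other processes. The helper name best_of, the binary names lame-387 and lame-sse387, and sample.wav are hypothetical; it also assumes GNU date (for the %N nanosecond format).

```shell
#!/bin/sh
# best_of RUNS CMD [ARGS...]
# Run CMD the given number of times and print the fastest wall-clock
# time in milliseconds. Assumes GNU date, which supports %N (nanoseconds).
# Variables are global because plain POSIX sh has no "local".
best_of() {
    runs=$1
    shift
    best=-1
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1          # discard the program's own output
        end=$(date +%s%N)
        elapsed=$(( (end - start) / 1000000 ))
        if [ "$best" -lt 0 ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Hypothetical usage: two LAME builds compiled with different CFLAGS,
# encoding the same input file.
# echo "387:     $(best_of 5 ./lame-387 sample.wav /dev/null) ms"
# echo "sse,387: $(best_of 5 ./lame-sse387 sample.wav /dev/null) ms"
```

Taking the best of several runs rather than the average is a design choice: for a CPU-bound job like encoding, the fastest run is usually the one least disturbed by the rest of the system.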