List Archive: gentoo-performance
Headers:
To: gentoo-performance@g.o
From: Jan Jitse Venselaar <J.J.Venselaar@...>
Subject: Re: Re: getting athlon-xp/geforce4 system "on the top"
Date: Wed, 07 May 2003 17:31:22 +0200
MAL wrote:
> Koby Boy wrote:
> 
>> On Tue, 2003-05-06 at 03:24, eNTi wrote:
>>
>>> CFLAGS="-mcpu=athlon-xp -O3 -pipe -m3dnow -msse -mmmx -Wall 
>>> -fomit-frame-pointer" / CHOST="i686-pc-linux-gnu"
>>
>>
>> I've got an Athlon-XP 2400 (2 GHz), but I've overclocked it to the
>> equivalent of an Athlon-XP 2600 (2.2 GHz).  My CFLAGS look like this:
>>
>> CFLAGS="-mcpu=athlon-xp -march=athlon-xp -O3 -fforce-addr
>> -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop
>> -frerun-loop-opt -falign-functions=4 -foptimize-sibling-calls
>> -fexpensive-optimizations -pipe -m3dnow -mmmx -msse -mfpmath=sse,387"
> 
> 
> -march=athlon-xp implies:
>   -mcpu=athlon-xp
> 
> -march=athlon-xp implies:
>   -m3dnow
>   -mmmx
>   -msse
> 
> -O3 implies -O2 which implies:
>   -frerun-loop-opt
>   -frerun-cse-after-loop
>   -falign-functions
>   -fexpensive-optimizations
> 
> -mfpmath=sse,387 is still unstable.. asking for trouble!
> 
> Lastly, many people have expressed that -funroll-loops slows down more 
> code than it speeds up.
> 
> I'd be surprised if you see much improvement over '-O3 -pipe' in 99% of
> apps; it's not worth the time spent debugging broken code :/
> 
> man gcc *g*
> 
> MAL
> 
> 
> -- 
> gentoo-performance@g.o mailing list

Actually, -mfpmath=sse,387 works for most programs, but on my Athlon-XP
it is slower than -mfpmath=387, at least in my benchmark, which is Lame
encoding. -funroll-loops slows it down too.

Setting -falign-functions to something bigger than 4 (the default, I
believe) does make a small positive difference. -falign-loops and
-falign-jumps should also be set to 5 or something like that for
optimal speed.

I also tried -malign-double and -m128bit-long-double, which speed up
Lame some more, but break ABI compatibility and make the code larger.
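
To make the above concrete, here is a rough sketch of how those flags
might be collected in make.conf. It is not my actual file: the variable
names BASE_FLAGS, FP_FLAGS and ALIGN_FLAGS are made up for readability,
and the value 16 for -falign-functions is only an assumed example of
"something bigger than 4". The ABI-breaking flags are left commented out.

```shell
# Hypothetical make.conf fragment (a sketch under the assumptions above).
CHOST="i686-pc-linux-gnu"

# -march=athlon-xp already implies -mcpu=athlon-xp, -mmmx, -msse and
# -m3dnow, as MAL pointed out, so they are not repeated here.
BASE_FLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer"

# Plain x87 math: faster than sse,387 in the Lame test described above.
FP_FLAGS="-mfpmath=387"

# Alignment tweaks. The 16 is an assumed value, not a measured optimum;
# the 5s are the "5 or something like that" from the text.
ALIGN_FLAGS="-falign-functions=16 -falign-loops=5 -falign-jumps=5"

# These sped up Lame further but break ABI compatibility and grow the
# code, so enable them only if you accept that risk:
# ABI_FLAGS="-malign-double -m128bit-long-double"

CFLAGS="${BASE_FLAGS} ${FP_FLAGS} ${ALIGN_FLAGS}"
CXXFLAGS="${CFLAGS}"
```

-funroll-loops is deliberately absent, since it slowed the benchmark down.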

I know that running only Lame isn't the best way to benchmark, but I
think Lame exercises a large set of functions, and if Lame runs fast,
other programs shouldn't be slow.
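
For anyone who wants to repeat this kind of comparison, a minimal timing
helper could look like the sketch below. It assumes bash and GNU date
(for the %N nanosecond format); the function name best_of and the file
names in the usage comment are hypothetical. It only times a command; it
does not rebuild anything with different CFLAGS for you.

```shell
#!/bin/bash
# best_of N CMD [ARGS...]
# Run CMD N times and print the fastest wall-clock time in milliseconds.
# Taking the best of several runs reduces noise from other processes.
best_of() {
    local runs=$1
    shift
    local best="" start end elapsed i
    for ((i = 0; i < runs; i++)); do
        start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
        "$@" >/dev/null 2>&1           # discard the command's own output
        end=$(date +%s%N)
        elapsed=$(( (end - start) / 1000000 ))
        if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
    done
    echo "$best"
}

# Hypothetical usage, after building Lame with one set of CFLAGS:
#   best_of 3 lame sample.wav /dev/null
# Rebuild with the next set of flags and run the same line again.
```

Use the same input file for every run, otherwise the numbers are not
comparable.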

As for debugging vs. performance, I'm just a person who likes to tweak
performance to the utmost, and I think some breakage and debugging just
adds to the fun.

FCA


--
gentoo-performance@g.o mailing list

References:
getting athlon-xp/geforce4 system "on the top"
-- eNTi
Re: getting athlon-xp/geforce4 system "on the top"
-- Koby Boy
Re: getting athlon-xp/geforce4 system "on the top"
-- MAL
Updated Jun 17, 2009
