Gentoo Archives: gentoo-performance

From: Jan Jitse Venselaar <J.J.Venselaar@×××××××.nl>
To: gentoo-performance@g.o
Subject: Re: [gentoo-performance] Re: getting athlon-xp/geforce4 system "on the top"
Date: Wed, 07 May 2003 15:28:55
Message-Id: 3EB926CA.9010602@phys.uu.nl
In Reply to: [gentoo-performance] Re: getting athlon-xp/geforce4 system "on the top" by MAL
1 MAL wrote:
2 > Koby Boy wrote:
3 >
4 >> On Tue, 2003-05-06 at 03:24, eNTi wrote:
5 >>
6 >>> CFLAGS="-mcpu=athlon-xp -O3 -pipe -m3dnow -msse -mmmx -Wall
7 >>> -fomit-frame-pointer" / CHOST="i686-pc-linux-gnu"
8 >>
9 >>
10 >> I've got an Athlon-XP 2400 (2 GHz) but I've overclocked it to the
11 >> equivalent of an Athlon-XP 2600 (2.2GHz). My CFLAGS look like this:
12 >>
13 >> CFLAGS="-mcpu=athlon-xp -march=athlon-xp -O3 -fforce-addr
14 >> -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop
15 >> -frerun-loop-opt -falign-functions=4 -foptimize-sibling-calls
16 >> -fexpensive-optimizations -pipe -m3dnow -mmmx -msse -mfpmath=sse,387"
17 >
18 >
19 > -march=athlon-xp implies:
20 > -mcpu=athlon-xp
21 >
22 > -march=athlon-xp implies:
23 > -m3dnow
24 > -mmmx
25 > -msse
26 >
27 > -O3 implies -O2 which implies:
28 > -frerun-loop-opt
29 > -frerun-cse-after-loop
30 > -frerun-loop-opt
31 > -falign-functions
32 > -fexpensive-optimizations
33 >
34 > -mfpmath=sse,387 is still unstable.. asking for trouble!
35 >
36 > Lastly, many people have expressed that -funroll-loops slows down more
37 > code than it speeds up.
38 >
39 > I'd be surprised if you see much improvement over '-O3 -pipe' in 99% of
40 > apps; it's not worth the time spent debugging broken code :/
41 >
42 > man gcc *g*
43 >
44 > MAL
45 >
46 >
47 > --
48 > gentoo-performance@g.o mailing list
49 Actually, -mfpmath=sse,387 works for most programs, but it is
50 slower for me than -mfpmath=387 on my Athlon-XP, at least in my
51 benchmark, which is LAME encoding. -funroll-loops slows it down.
52 Setting -falign-functions to something bigger than 4 (the default, I
53 believe) does make a small positive difference. -falign-loops and
54 -falign-jumps should also be set to 5 or something like that for
55 optimal speed.
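
For illustration, the settings described above might be collected in /etc/make.conf roughly as follows. This is only a sketch: the base flags (-O3, -pipe, -fomit-frame-pointer) and CHOST are taken from earlier in the thread, and the value 16 for -falign-functions is an assumed stand-in for "something bigger than 4", not a measured optimum.

```shell
# Sketch of a make.conf fragment for an Athlon-XP, based on the thread.
# -march=athlon-xp already covers -mcpu, -mmmx, -msse and -m3dnow.
# -mfpmath=387 was faster than sse,387 in the LAME benchmark.
# -falign-functions=16 is an assumed value ("bigger than 4"); tune it.
# -funroll-loops is deliberately left out, since it slowed LAME down.
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -mfpmath=387 -falign-functions=16 -falign-loops=5 -falign-jumps=5"
CXXFLAGS="${CFLAGS}"
```

Any change here only affects packages built afterwards, so the benchmark program has to be re-emerged before comparing timings.
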
56 I also tried -malign-double and -m128bit-long-double, which speed up
57 LAME some more, but break ABI compatibility and make the code larger.
58 I know that running only LAME isn't the best way to benchmark, but I
59 think that LAME exercises a large set of functions, and if LAME runs
60 fast, other programs shouldn't be slow.
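
A benchmark like this is easier to compare across flag sets if the same command is timed several times and averaged. The helper below is a minimal sketch of that idea, not the exact procedure used for the numbers above: the function name bench and the file names in the usage comment are made up for illustration, and it assumes GNU date (for the %N nanosecond field).

```shell
# bench RUNS command [args...]
# Runs the command RUNS times, discards its output, and prints the
# average wall-clock time per run in milliseconds.
# Returns non-zero (and prints nothing) if any run fails.
# Assumes GNU date, which supports %N (nanoseconds).
bench() {
    runs=$1
    shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1 || return 1
        end=$(date +%s%N)
        # Accumulate elapsed time for this run, converted from ns to ms.
        total=$((total + (end - start) / 1000000))
        i=$((i + 1))
    done
    echo $((total / runs))
}

# Hypothetical usage (test.wav and test.mp3 are placeholder names):
#   bench 5 lame test.wav test.mp3
```

Rebuilding LAME with each candidate CFLAGS and running the same bench line on the same input file gives numbers that can be compared directly.
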
61 As for debugging vs. performance, I'm just a person who likes to tweak
62 performance to the utmost, and I think some breakage and debugging just
63 adds to the fun.
64
65 FCA