Gentoo Archives: gentoo-performance

From:	Jan Jitse Venselaar <J.J.Venselaar@×××××××.nl>
To:	gentoo-performance@g.o
Subject:	Re: [gentoo-performance] Re: getting athlon-xp/geforce4 system "on the top"
Date:	Wed, 07 May 2003 15:28:55
Message-Id:	`3EB926CA.9010602@phys.uu.nl`
In Reply to:	[gentoo-performance] Re: getting athlon-xp/geforce4 system "on the top" by MAL

MAL wrote:
> Koby Boy wrote:
>
>> On Tue, 2003-05-06 at 03:24, eNTi wrote:
>>
>>> CFLAGS="-mcpu=athlon-xp -O3 -pipe -m3dnow -msse -mmmx -Wall
>>> -fomit-frame-pointer" / CHOST="i686-pc-linux-gnu"
>>
>>
>> I've got an Athlon-XP 2400 (2 GHz), but I've overclocked it to the
>> equivalent of an Athlon-XP 2600 (2.2 GHz). My CFLAGS look like this:
>>
>> CFLAGS="-mcpu=athlon-xp -march=athlon-xp -O3 -fforce-addr
>> -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop
>> -frerun-loop-opt -falign-functions=4 -foptimize-sibling-calls
>> -fexpensive-optimizations -pipe -m3dnow -mmmx -msse -mfpmath=sse,387"
>
>
> -march=athlon-xp implies:
> -mcpu=athlon-xp
>
> -march=athlon-xp also implies:
> -m3dnow
> -mmmx
> -msse
>
> -O3 implies -O2, which implies:
> -frerun-loop-opt
> -frerun-cse-after-loop
> -falign-functions
> -fexpensive-optimizations
>
> -mfpmath=sse,387 is still unstable; you're asking for trouble!
>
> Lastly, many people have reported that -funroll-loops slows down more
> code than it speeds up.
>
> I'd be surprised if you saw much improvement over '-O3 -pipe' in 99% of
> apps; it's not worth the time spent debugging broken code :/
>
> man gcc
>
> MAL
>
>
> --
> gentoo-performance@g.o mailing list
Actually, -mfpmath=sse,387 works for most programs, but on my Athlon-XP
it is slower than -mfpmath=387, at least in my benchmark, which is LAME
encoding. -funroll-loops also slows it down.
Setting -falign-functions to something bigger than 4 (the default, I
believe) does make a small positive difference. -falign-loops and
-falign-jumps should also be set to 5 or so for optimal speed.
I also tried -malign-double and -m128bit-long-double, which speed up
LAME some more, but they break ABI compatibility and make the code larger.
I know that running only LAME isn't the best way to benchmark, but I think
LAME exercises a large set of functions, so if LAME runs fast, other
programs shouldn't be slow.
As for debugging vs. performance, I'm just someone who likes to tweak
performance to the utmost, and a bit of breakage and debugging just adds
to the fun, I think.

FCA
