MAL wrote:
> Koby Boy wrote:
>
>> On Tue, 2003-05-06 at 03:24, eNTi wrote:
>>
>>> CFLAGS="-mcpu=athlon-xp -O3 -pipe -m3dnow -msse -mmmx -Wall
>>> -fomit-frame-pointer" / CHOST="i686-pc-linux-gnu"
>>
>> I've got an Athlon-XP 2400 (2 GHz) but I've overclocked it to the
>> equivalent of an Athlon-XP 2600 (2.2 GHz). My CFLAGS look like this:
>>
>> CFLAGS="-mcpu=athlon-xp -march=athlon-xp -O3 -fforce-addr
>> -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop
>> -frerun-loop-opt -falign-functions=4 -foptimize-sibling-calls
>> -fexpensive-optimizations -pipe -m3dnow -mmmx -msse -mfpmath=sse,387"
>
> -march=athlon-xp implies:
> -mcpu=athlon-xp
>
> -march=athlon-xp also implies:
> -m3dnow
> -mmmx
> -msse
>
> -O3 implies -O2 which implies:
> -frerun-loop-opt
> -frerun-cse-after-loop
> -falign-functions
> -fexpensive-optimizations
>
> -mfpmath=sse,387 is still unstable... asking for trouble!
>
> Lastly, many people have reported that -funroll-loops slows down more
> code than it speeds up.
>
> I'd be surprised if you see much improvement over '-O3 -pipe' in 99% of
> apps; it's not worth the time spent debugging broken code :/
>
> man gcc *g*
>
> MAL
>
> --
> gentoo-performance@g.o mailing list

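The redundancy pointed out in the quoted message can be checked mechanically. The following is a minimal sketch, not a tool from the thread: the helper name strip_implied is made up for illustration, and the IMPLIED list is copied from the quoted message rather than queried from any particular GCC version, so it is only as accurate as that list.

```shell
#!/bin/sh
# Flags the quoted message lists as implied by -march=athlon-xp and by -O3
# (via -O2). Assumption: copied from the message, not asked of the compiler.
IMPLIED="-mcpu=athlon-xp -m3dnow -mmmx -msse -frerun-loop-opt -frerun-cse-after-loop -falign-functions -fexpensive-optimizations"

# strip_implied FLAG...: print only the flags that are not in $IMPLIED.
# Hypothetical helper. A flag carrying an explicit value, such as
# -falign-functions=4, is kept because it overrides the default
# instead of merely repeating it.
strip_implied() {
    out=""
    for flag in "$@"; do
        case " $IMPLIED " in
            *" $flag "*) ;;                      # listed as implied: drop it
            *) out="${out:+$out }$flag" ;;       # anything else: keep it
        esac
    done
    printf '%s\n' "$out"
}

# Koby Boy's CFLAGS from the quoted message, passed as separate words.
strip_implied -mcpu=athlon-xp -march=athlon-xp -O3 -fforce-addr \
    -fomit-frame-pointer -funroll-loops -frerun-cse-after-loop \
    -frerun-loop-opt -falign-functions=4 -foptimize-sibling-calls \
    -fexpensive-optimizations -pipe -m3dnow -mmmx -msse -mfpmath=sse,387
```

What remains is the set of flags that actually changes something relative to -march=athlon-xp -O3, which is the part worth benchmarking.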
Actually, -mfpmath=sse,387 works for most programs, but on my Athlon-XP it
is slower than -mfpmath=387, at least in my benchmark, which is LAME
encoding. -funroll-loops slows it down as well. Setting -falign-functions
to something bigger than 4 (the default, I believe) makes a small positive
difference. -falign-loops and -falign-jumps should also be set to 5 or
something like that for optimal speed.
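
Put together, the settings described in this paragraph might look like the make.conf fragment below. It is only a sketch: -falign-functions=16 is an assumed value (the text only says "bigger than 4"), and the 5 for -falign-loops and -falign-jumps is taken verbatim from the text. GCC's manual describes the -falign-* argument as a byte count, so it may be worth checking what 5 actually does on your compiler version.

```shell
# /etc/make.conf fragment (sketch, not a tested recommendation)
CHOST="i686-pc-linux-gnu"

# -march=athlon-xp, -O3 and -pipe come from the thread.
# -mfpmath=387 instead of sse,387, and no -funroll-loops, as reported above.
# -falign-functions=16 is an ASSUMED value; the 5s are quoted from the text.
CFLAGS="-march=athlon-xp -O3 -pipe -mfpmath=387 -falign-functions=16 -falign-loops=5 -falign-jumps=5"
```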

I also tried -malign-double and -m128bit-long-double, which speed up
LAME some more, but they break ABI compatibility and make the code larger.

I know that running only LAME isn't the best way to benchmark, but I think
LAME exercises a large set of functions, and if LAME runs fast, other
programs shouldn't be slow.
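
For anyone who wants to repeat this kind of comparison, a timing helper along the following lines may be enough. It is a rough sketch: the function names elapsed and best_of are made up, the LAME paths in the usage comment are hypothetical, and it assumes a date that supports +%s (GNU and BSD date do), so results are in whole seconds only.

```shell
#!/bin/sh
# elapsed CMD...: run CMD once, discard its output, print wall-clock seconds.
# Assumption: `date +%s` is available; resolution is one second.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# best_of N CMD...: run CMD N times and print the smallest elapsed time,
# which filters out some noise from other processes.
best_of() {
    runs=$1
    shift
    best=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        t=$(elapsed "$@")
        if [ -z "$best" ] || [ "$t" -lt "$best" ]; then
            best=$t
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Usage (hypothetical paths, one LAME build per CFLAGS set):
#   best_of 3 ./lame-O3/lame test.wav /dev/null
#   best_of 3 ./lame-tuned/lame test.wav /dev/null
```

With one-second resolution the input file should be long enough that an encode takes tens of seconds, otherwise small differences between flag sets disappear in the rounding.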

As for debugging vs. performance, I'm just a person who likes to tweak
performance to the utmost, and I think some breakage and debugging just
adds to the fun.

FCA