Mike Frysinger wrote:
> On Saturday 14 October 2006 04:49, Sebastian Bergmann wrote:

>> CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"

> here's a good reason why gentoo-wiki is not official ... this is wrong. the
> duo cpu's are not based on the pentium4 which is what the prescott is

That was put there by me. The thing is, while the Core CPUs have more
in common with the Pentium-M micro-architecture, -march=pentium-m heavily
favors generating x87 over SSE/SSE2 instructions, since on a Pentium-M
doing SSE/SSE2 was somewhere in the neighbourhood of 30% slower. Core
chips have improved decoding and use micro-op fusion to combine up to
four SSE instructions. Also, with code doing fp to int conversion or
single precision division, SSE scalar ops are the win on Core (though
not as big a win as on NetBurst) since x87 instructions have to write
data to memory and read it again to reduce precision.

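To make the fp-to-int and single-precision division cases concrete, here
is a minimal C sketch of the kind of kernel where the x87 vs. SSE scalar
choice shows up. The function names (sum_truncated, divide_all) and the
DEMO_MAIN guard are made up for illustration and are not taken from any
benchmark; the idea is only to compile the same file with each flag set
and compare the generated assembly or timings.

```c
#include <stddef.h>
#include <stdio.h>

/* Float-to-int conversion: C requires truncation toward zero.
 * Typically SSE scalar code does this with a single cvttss2si, while
 * x87 code without SSE3 has to switch the rounding mode around fistp. */
long sum_truncated(const float *v, size_t n)
{
    size_t i;
    long total = 0;

    for (i = 0; i < n; i++)
        total += (int)v[i];
    return total;
}

/* Single-precision division: with SSE math this is a plain divss.
 * With x87 math the result is computed in a wider register and only
 * becomes a float once it is stored to memory. */
void divide_all(float *out, const float *v, size_t n, float d)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = v[i] / d;
}

#ifdef DEMO_MAIN   /* hypothetical guard: build with -DDEMO_MAIN to run */
int main(void)
{
    float v[4] = { 1.9f, 2.5f, -3.7f, 4.0f };
    float out[4];

    divide_all(out, v, 4, 2.0f);
    printf("sum_truncated = %ld, out[0] = %f\n",
           sum_truncated(v, 4), out[0]);
    return 0;
}
#endif
```

On a 32-bit x86 toolchain, something like
'gcc -O2 -march=prescott -mfpmath=sse -S kernel.c' versus
'gcc -O2 -march=pentium-m -S kernel.c' (kernel.c being a placeholder
file name) should show the different instruction choices in the .s
output.
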
Based on that I've been doing benchmarks with GCC 4.1 and trunk, and I
usually find '-march=prescott -mfpmath=sse' to do a bit better than
'-march=pentium-m -mfpmath=sse -msse3', and just plain '-march=prescott'
to be near identical to plain '-march=pentium-m' (for those ebuilds that
call strip-flags ;), though the latter is on average <=1% faster.
'-march=pentium-m -msse3' has actually been the worst performer, though
I have no idea why it's slower than just '-march=pentium-m'. To be
honest I don't really trust GCC's SSE3 support in its current state.

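When comparing flag sets like these it helps to confirm what a given
combination actually turned on, since -march, -mfpmath and -msse3
interact. A small sketch, assuming GCC's predefined macros
__SSE_MATH__, __SSE2_MATH__ and __SSE3__; the helper names and the
DEMO_MAIN guard are hypothetical:

```c
#include <stdio.h>

/* Returns 1 if this translation unit was built with scalar FP math on
 * the SSE unit (GCC defines __SSE_MATH__ / __SSE2_MATH__ in that case),
 * 0 otherwise (x87 math, or a non-x86 target). */
int built_with_sse_fpmath(void)
{
#if defined(__SSE_MATH__) || defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if SSE3 instructions were enabled at compile time
 * (for example via -msse3 or -march=prescott), 0 otherwise. */
int built_with_sse3(void)
{
#ifdef __SSE3__
    return 1;
#else
    return 0;
#endif
}

#ifdef DEMO_MAIN   /* hypothetical guard: build with -DDEMO_MAIN to run */
int main(void)
{
    printf("fpmath=sse: %d, sse3: %d\n",
           built_with_sse_fpmath(), built_with_sse3());
    return 0;
}
#endif
```

Building this once per flag set and running it next to the benchmark
is a cheap sanity check that, say, '-march=pentium-m -msse3' really
differs from plain '-march=pentium-m' only in SSE3 availability and not
in the FP unit used.
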
I've looked long and hard for an official answer to this but no one
seems to be able to give a concrete reason why one is better than the
other, other than "it's based on the Pentium-M". It _is_, but it's
still a very different animal. Until I get an official answer, I
thought it'd be best to document both.


--de.