Miguel Sousa Filipe wrote:
>
> A good CFLAGS would be something not very aggressive, something like:
> -march=<cpuType> -O3 or -O2 and at most -fomit-frame-pointer.
> (Scientific workloads can speed up considerably with: -ffast-math)

You must be careful with compiler flags. -ffast-math should never be
used globally, because it relaxes floating-point rules that some
programs rely on. Applications that benefit from it specify it
themselves. The most realistic choice is -O2.

The remark about processors being optimized for i386 is a bit off the
mark. What -march actually does is allow the compiler to use additional
instructions that are available (and faster) on those processors. The
mmx{,2} and sse{,2,3} extensions in particular are very beneficial to
certain kinds of applications and can give big speedups. That said, many
applications that are sensitive to this, such as mplayer, do run-time
CPU detection and select different code (often written in assembler)
depending on the CPU they are running on.

In general it is not wise to use -O3. Most distributions compile with
-O2, so -O3 is less well tested and may expose bugs in both the
compiler and (more importantly) the applications.

Besides the instructions that are available to the compiler (controlled
by -march), there is also the issue of instruction scheduling. Put
simply, all modern processors can execute multiple instructions in
parallel when those instructions do not conflict. The details differ
between architectures, and how much parallelism is achieved depends
heavily on the order in which the instructions are scheduled. This is
influenced by the -mtune flag (set implicitly by -march). A binary
distribution most likely uses this optimization as well; Gentoo,
however, can tune for the precise processor in use.

Paul

--
gentoo-performance@g.o mailing list