Hello,

On Sun, 27 Apr 2014 07:23:11 -0400 Rich Freeman wrote:
> And yet, in the same paragraph you mention -O3, which is
> tantamount to just setting a flag and walking away. That turns
> on 14 things you probably don't really need.

Why 14 things? According to the gcc-4.8.2 manual, -O3 enables the
following on top of everything enabled by -O2:
-finline-functions, -funswitch-loops, -fpredictive-commoning,
-fgcse-after-reload, -ftree-vectorize, -fvect-cost-model,
-ftree-partial-pre, -fipa-cp-clone.
Some of these options trigger others, but these 8 flags are
sufficient to mimic -O3 completely.
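The -O2/-O3 delta changes between gcc releases, so the list can be checked against the installed compiler with `gcc -Q -O2 --help=optimizers` and `gcc -Q -O3 --help=optimizers`. Below is a minimal Python sketch that diffs the two outputs. It assumes the usual one-option-per-line `[enabled]`/`[disabled]` format, and the helper names are hypothetical. It reports only options that have their own -f flag. Options that print a value instead of `[enabled]` are skipped.

```python
from __future__ import annotations

import shutil
import subprocess


def enabled_flags(help_text: str) -> set[str]:
    """Return the -f flags marked [enabled] in `gcc -Q --help=optimizers` output."""
    flags = set()
    for line in help_text.splitlines():
        parts = line.split()
        # Assumed shape: "-fname   [enabled]" or "-fname   [disabled]".
        # Options that print a value (e.g. "-fvect-cost-model=  dynamic") are skipped.
        if len(parts) >= 2 and parts[0].startswith("-f") and parts[-1] == "[enabled]":
            flags.add(parts[0])
    return flags


def o3_over_o2(o2_text: str, o3_text: str) -> list[str]:
    """Flags enabled at -O3 but not at -O2, sorted for stable output."""
    return sorted(enabled_flags(o3_text) - enabled_flags(o2_text))


def query_gcc(level: str, gcc: str = "gcc") -> str:
    """Ask the installed gcc which optimizers are enabled at a given -O level."""
    result = subprocess.run(
        [gcc, "-Q", level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main() -> None:
    if shutil.which("gcc") is None:
        print("gcc not found; nothing to compare")
        return
    try:
        o2, o3 = query_gcc("-O2"), query_gcc("-O3")
    except subprocess.CalledProcessError as err:
        # e.g. a "gcc" that is really another compiler without --help=optimizers
        print(f"gcc rejected the query: {err}")
        return
    for flag in o3_over_o2(o2, o3):
        print(flag)


if __name__ == "__main__":
    main()
```

On a gcc-4.8 system the printed list should be close to the 8 flags above. Newer releases add more flags at -O3.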

From my experience, only three of them are harmful:
-finline-functions and -fipa-cp-clone bloat code size significantly,
hurting performance due to more CPU cache misses.
-ftree-vectorize may be used on amd64 (the performance change is in
the range of -3% to +5%), but it is a complete menace on x86: lots of
ICEs (internal compiler errors), lots of segfaults due to stack
misalignment, and even some code that runs but is miscompiled. While
some (but not all) of the stack alignment issues may be fixed with
-mstackrealign, doing so turns the performance gain negative.
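Following the conclusions above, a selective -O3 can be assembled by hand. It starts from -O2 plus the eight extra flags, drops the two code-bloating ones, and also drops -ftree-vectorize on x86. The Python sketch below only builds the flag string, for example for a CFLAGS line. The function name and the "amd64"/"x86" labels are illustrative assumptions. The flag list is the gcc-4.8.2 one quoted in the message.

```python
# Flags the gcc-4.8.2 manual lists as enabled by -O3 on top of -O2.
O3_EXTRA = [
    "-finline-functions", "-funswitch-loops", "-fpredictive-commoning",
    "-fgcse-after-reload", "-ftree-vectorize", "-fvect-cost-model",
    "-ftree-partial-pre", "-fipa-cp-clone",
]

# Flags reported as harmful in the tests described in the message.
CODE_BLOAT = {"-finline-functions", "-fipa-cp-clone"}  # cache misses, any arch
X86_UNSAFE = {"-ftree-vectorize"}                       # ICEs/segfaults on x86


def selective_o3(arch: str) -> str:
    """Return "-O2" followed by the -O3 extras not reported as harmful.

    `arch` is a hypothetical label, either "amd64" or "x86".
    """
    if arch not in ("amd64", "x86"):
        raise ValueError(f"unknown arch: {arch}")
    skip = set(CODE_BLOAT)
    if arch == "x86":
        skip |= X86_UNSAFE
    kept = [flag for flag in O3_EXTRA if flag not in skip]
    return " ".join(["-O2", *kept])


if __name__ == "__main__":
    for arch in ("amd64", "x86"):
        print(f'{arch}: CFLAGS="{selective_o3(arch)}"')
```

With -ftree-vectorize dropped on x86, -fvect-cost-model has nothing to act on. It is kept only so the output stays a plain subset of the quoted list.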

All other -O3 options have either no effect or give measurable
performance improvements of several percent.

Tests were done using multimedia packages (mplayer, ffmpeg, x264)
and scientific ones (root, pythia, geant, BLAS libraries).

Best regards,
Andrew Savchenko