Richard Freeman <rich0@g.o> posted 48E759DB.6080905@g.o,
excerpted below, on Sat, 04 Oct 2008 07:56:11 -0400:

> Ben de Groot wrote:
>>
>> -Os optimizes for size, while -O2 optimizes for speed. There is no need
>> at all to use -Os on a modern desktop machine, and it will run
>> comparatively slower than -O2 optimized code, which is probably not
>> what you want.
>>
>>
> There are a couple of schools of thought on that, and I think
> performance can depend a great deal on what program you're talking
> about.
>
> On any machine, memory is a limited resource. Oh sure, you could just
> "spend a little more on decent RAM", but you could also "spend a little
> more on a decent CPU" or whatever. For a given amount of money you can
> only buy so much hardware, so any dollar spent on RAM is a dollar not
> spent on something else.

I agree, but I'd stress the limits of the L1/L2(/L3?) caches more than
general system memory. You can generally buy more system memory fairly
easily, but Lx cache sizes aren't so easy to change, and are likely to
remain quite limited resources for some time.

So I generally see the benefits of -Os over -O2, with a few exceptions.
-freorder-blocks-and-partition (only in CFLAGS; it doesn't work for C++/
CXXFLAGS) can increase the raw size but manages cache better because it
separates code into hot and cold blocks, giving the hot code a better
chance at staying in-cache (according to the gcc manpage). There are a
couple of other similar flags.

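As an illustration of keeping a C-only flag out of CXXFLAGS, a
make.conf-style fragment might look like the following. This is only a
sketch: COMMON_FLAGS is a helper variable name chosen here for
illustration, and the -Os base level simply follows the preference stated
above.

```shell
# Flags shared by the C and C++ compilers. COMMON_FLAGS is an assumed
# helper name, not something taken from the post.
COMMON_FLAGS="-Os"

# C only: add the hot/cold partitioning flag on top of the shared flags.
CFLAGS="${COMMON_FLAGS} -freorder-blocks-and-partition"

# C++: leave the flag out, since it reportedly does not work there.
CXXFLAGS="${COMMON_FLAGS}"
```

Building both variables from one shared base keeps the two sets of flags
from drifting apart when the common part changes.
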
Of course, to /really/ get performance, one would need to compile with
code profiling instrumentation turned on, run the program as you would
normally (but profiled) for a while to generate some profiling history,
then recompile using that history to help optimize things. This BTW is
one of the reasons I wonder about -ftracer when I see it in someone's
CFLAGS. The gcc manpage says it helps other optimizations, but then links
it to -fprofile-use. How much help it is without the profiling isn't
covered, but given the increase in size and the effect of that on caches,
it's likely not worth it without the profiling. How many people compile
first for profiling, run the program to generate profiles, then recompile
using the profile data? Right, not so many, at least for most apps. In
that case, why do they have -ftracer in their general CFLAGS?

That said, I recently switched to -O2 from my long-time -Os. Much of the
difference in gcc-3 was due to -funit-at-a-time and similar
optimizations, enabled by default early on for -Os, but not for -O2 until
gcc-4.something, I believe. Modern gcc is more cache-usage-performance
aware than gcc-3 was, and I think most of the remaining differences are
like the -freorder-blocks-and-partition thing: they affect CPU cache
usage negatively enough that you don't want them enabled except for old
machines, embedded, and perhaps the now popular netbook/Atom type
applications.

Speaking of which... I just got my Acer Aspire One (32-bit Atom N270
CPU), and intend to set up a 32-bit chroot on my main machine and create
binpkgs to merge to the AA1. Any idea what sort of CFLAGS to use on it?
I know it doesn't have all that fancy branch prediction and prefetch
stuff of a normal modern x86_(32/64) CPU. One suggestion I've seen is
-march=i686, and I'll probably do -Os for it, but what about stuff like
-fweb, -frename-registers, etc? It does support up through SSE3 at
least, so I can enable that too.

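Written down as a make.conf-style fragment for the 32-bit chroot, the
options floated above might look like this. It is a sketch, not a tuned
recommendation: the CHOST value and the use of FEATURES="buildpkg" to
produce the binpkgs are assumptions added here, and the flag set only
records what was mentioned (-Os, -march=i686, SSE3).

```shell
# Assumed 32-bit x86 toolchain tuple for the chroot.
CHOST="i686-pc-linux-gnu"

# -Os and -march=i686 as discussed; -msse3 allows SSE3 (and older SSE)
# instructions to be used.
CFLAGS="-Os -march=i686 -msse3"
CXXFLAGS="${CFLAGS}"

# Assumption: have Portage build a binary package for everything it
# merges, so the packages can be installed on the netbook afterwards.
FEATURES="buildpkg"
```

-fweb and -frename-registers are deliberately left out, since whether they
help on this CPU is exactly the open question.
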
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman