1 |
neysx 07/06/27 13:28:13 |
2 |
|
3 |
Modified: metadoc.xml |
4 |
Added: gcc-optimization.xml |
5 |
Log: |
6 |
mv co-guide.xml gcc-optimization.xml
7 |
|
8 |
Revision Changes Path |
9 |
1.185 xml/htdocs/doc/en/metadoc.xml |
10 |
|
11 |
file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?rev=1.185&view=markup |
12 |
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?rev=1.185&content-type=text/plain |
13 |
diff : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?r1=1.184&r2=1.185 |
14 |
|
15 |
Index: metadoc.xml |
16 |
=================================================================== |
17 |
RCS file: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v |
18 |
retrieving revision 1.184 |
19 |
retrieving revision 1.185 |
20 |
diff -u -r1.184 -r1.185 |
21 |
--- metadoc.xml 27 Jun 2007 06:04:17 -0000 1.184 |
22 |
+++ metadoc.xml 27 Jun 2007 13:28:13 -0000 1.185 |
23 |
@@ -1,9 +1,9 @@ |
24 |
<?xml version="1.0" encoding="UTF-8"?> |
25 |
-<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v 1.184 2007/06/27 06:04:17 nightmorph Exp $ --> |
26 |
+<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v 1.185 2007/06/27 13:28:13 neysx Exp $ --> |
27 |
<!DOCTYPE metadoc SYSTEM "/dtd/metadoc.dtd"> |
28 |
|
29 |
<metadoc lang="en"> |
30 |
-<version>1.108</version> |
31 |
+<version>1.109</version> |
32 |
<members> |
33 |
<lead>neysx</lead> |
34 |
<member>cam</member> |
35 |
@@ -397,7 +397,7 @@ |
36 |
<file id="zsh">/doc/en/zsh.xml</file> |
37 |
<file id="change-chost">/doc/en/change-chost.xml</file> |
38 |
<file id="xfce-config">/doc/en/xfce-config.xml</file> |
39 |
- <file id="co-guide">/doc/en/co-guide.xml</file> |
40 |
+ <file id="gcc-optimization">/doc/en/gcc-optimization.xml</file> |
41 |
<file id="qa-autofailure">/proj/en/qa/autofailure.xml</file> |
42 |
<file id="qa-automagic">/proj/en/qa/automagic.xml</file> |
43 |
<file id="qa-backtraces">/proj/en/qa/backtraces.xml</file> |
44 |
@@ -803,9 +803,9 @@ |
45 |
<memberof>sysadmin_specific</memberof> |
46 |
<fileid>home-router-howto</fileid> |
47 |
</doc> |
48 |
- <doc id="co-guide"> |
49 |
+ <doc id="gcc-optimization"> |
50 |
<memberof>sysadmin_specific</memberof> |
51 |
- <fileid>co-guide</fileid> |
52 |
+ <fileid>gcc-optimization</fileid> |
53 |
</doc> |
54 |
<doc id="gentoo-dev-handbook"> |
55 |
<memberof>gentoodev</memberof> |
56 |
|
57 |
|
58 |
|
59 |
1.1 xml/htdocs/doc/en/gcc-optimization.xml |
60 |
|
61 |
file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/gcc-optimization.xml?rev=1.1&view=markup |
62 |
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/gcc-optimization.xml?rev=1.1&content-type=text/plain |
63 |
|
64 |
Index: gcc-optimization.xml |
65 |
=================================================================== |
66 |
<?xml version='1.0' encoding='UTF-8'?> |
67 |
|
68 |
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/gcc-optimization.xml,v 1.1 2007/06/27 13:28:13 neysx Exp $ --> |
69 |
|
70 |
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
71 |
|
72 |
<guide link="/doc/en/gcc-optimization.xml"> |
73 |
|
74 |
<title>Compilation Optimization Guide</title> |
75 |
|
76 |
<author title="Author"> |
77 |
<mail link="nightmorph@g.o">Joshua Saddler</mail> |
78 |
</author> |
79 |
|
80 |
<abstract> |
81 |
This guide provides an introduction to optimizing compiled code using safe, sane
CFLAGS and CXXFLAGS. It also describes the theory behind optimizing in
general.
84 |
</abstract> |
85 |
|
86 |
<!-- The content of this document is licensed under the CC-BY-SA license --> |
87 |
<!-- See http://creativecommons.org/licenses/by-sa/2.5 --> |
88 |
<license/> |
89 |
|
90 |
<version>1.0</version> |
91 |
<date>2007-06-26</date> |
92 |
|
93 |
<chapter> |
94 |
<title>Introduction</title> |
95 |
<section> |
96 |
<title>What are CFLAGS and CXXFLAGS?</title> |
97 |
<body> |
98 |
|
99 |
<p> |
100 |
CFLAGS and CXXFLAGS are environment variables that build systems conventionally
use to tell the GNU Compiler Collection, <c>gcc</c>, which switches to use when
compiling source code. CFLAGS are for code written in C, while CXXFLAGS are for
code written in C++.
104 |
</p> |
105 |
|
106 |
<p> |
107 |
They can be used to decrease the amount of debug messages for a program,
raise warning levels, and, of course, optimize the code produced.
109 |
The <uri |
110 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Invoking-GCC.html#Invoking-GCC">GNU |
111 |
gcc handbook</uri> maintains a complete list of available options and their |
112 |
purposes. |
113 |
</p> |
114 |
|
115 |
</body> |
116 |
</section> |
117 |
<section> |
118 |
<title>How are they used?</title> |
119 |
<body> |
120 |
|
121 |
<p> |
122 |
CFLAGS and CXXFLAGS can be used in two ways. First, they can be used
per-program, by passing them to <c>gcc</c> on the command line together with
the code you wish to compile. Note that <c>gcc</c> does not read these variables
from the environment itself; they must be expanded into its arguments.
</p>

<pre caption="Compiling a program directly">
$ <i>CFLAGS="-march=i686"</i>
$ <i>gcc ${CFLAGS} file.c</i>
</pre>
130 |
|
131 |
<p> |
132 |
However, this should not be done when installing packages found in the Portage |
133 |
tree. Instead, set your CFLAGS and CXXFLAGS in <path>/etc/make.conf</path>. This |
134 |
way all packages will be compiled using the options you specify. |
135 |
</p> |
136 |
|
137 |
<pre caption="CFLAGS in /etc/make.conf"> |
138 |
CFLAGS="-march=athlon64 -O2 -pipe" |
139 |
CXXFLAGS="${CFLAGS}" |
140 |
</pre> |
141 |
|
142 |
<p> |
143 |
As you can see, CXXFLAGS is set to use all the options present in CFLAGS. This |
144 |
is what you'll want almost without fail. You shouldn't ever need to specify |
145 |
additional options in CXXFLAGS. |
146 |
</p> |
147 |
|
148 |
<impo> |
149 |
Portage cannot use CFLAGS on a per-package basis, nor is there any supported |
150 |
method of forcing it to do so. The flags you set in <path>/etc/make.conf</path> |
151 |
will be used for <e>all</e> packages you install. |
152 |
</impo> |
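
<p>
To confirm which flags Portage will actually use, you can ask it directly. This
is only a quick check; the exact output layout of <c>emerge --info</c> may
differ between Portage versions.
</p>

<pre caption="Checking the flags Portage sees">
$ <i>emerge --info | grep -E '^(CFLAGS|CXXFLAGS)='</i>
</pre>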
153 |
|
154 |
</body> |
155 |
</section> |
156 |
<section> |
157 |
<title>Misconceptions</title> |
158 |
<body> |
159 |
|
160 |
<p> |
161 |
While CFLAGS and CXXFLAGS can be very effective means of getting source code to |
162 |
produce smaller and/or faster binaries, they can also impair the function of |
163 |
your code, bloat its size, slow down its execution time, or even cause |
164 |
compilation failures! |
165 |
</p> |
166 |
|
167 |
<p> |
168 |
CFLAGS are not a magic bullet; they will not automatically make your system run
any faster or make your binaries take up less space on disk. Adding more and
more flags in an attempt to optimize (or "rice") your system is a sure recipe
for failure. There is a point at which you will reach diminishing returns.
172 |
</p> |
173 |
|
174 |
<p> |
175 |
Despite the bragging you'll find on the internet, aggressive CFLAGS and CXXFLAGS |
176 |
are far more likely to harm your programs than do them any good. Keep in mind
that the flags exist in the first place because they are designed
to be used at specific places for specific purposes. Just because one particular
179 |
CFLAG is good for one bit of code doesn't mean that it is suited to compiling |
180 |
everything you will ever install on your machine! |
181 |
</p> |
182 |
|
183 |
</body> |
184 |
</section> |
185 |
<section> |
186 |
<title>Ready?</title> |
187 |
<body> |
188 |
|
189 |
<p> |
190 |
Now that you're aware of some of the risks involved, let's take a look at some |
191 |
sane, safe optimizations for your computer. These will hold you in good stead |
192 |
and will endear you to developers the next time you report a problem on <uri |
193 |
link="http://bugs.gentoo.org">Bugzilla</uri>. (Developers will usually request |
194 |
that you recompile a package with minimal CFLAGS to see if the problem persists. |
195 |
Remember, aggressive flags can ruin code.) |
196 |
</p> |
197 |
|
198 |
</body> |
199 |
</section> |
200 |
</chapter> |
201 |
|
202 |
<chapter> |
203 |
<title>Optimizing</title> |
204 |
<section> |
205 |
<title>The basics</title> |
206 |
<body> |
207 |
|
208 |
<p> |
209 |
The goal behind using CFLAGS and CXXFLAGS is to create code tailor-made to your |
210 |
system; it should function perfectly while being lean and fast, if possible. |
211 |
Sometimes these conditions are mutually exclusive, so we'll stick with |
212 |
combinations known to work well. Ideally, they are the best available for any |
213 |
CPU architecture. We'll mention the aggressive flags later so you know what to |
214 |
look out for. We won't discuss every option listed in the <c>gcc</c> manual
215 |
(there are hundreds), but we'll cover the basic, most common flags. |
216 |
</p> |
217 |
|
218 |
<note> |
219 |
Whenever you're not sure what a flag actually does, refer to the relevant |
220 |
chapter of the <uri |
221 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">gcc |
222 |
manual</uri>. If you're still stumped, try Google, or check out the <c>gcc</c> |
223 |
<uri link="http://gcc.gnu.org/lists.html">mailing lists</uri>. |
224 |
</note> |
225 |
|
226 |
</body> |
227 |
</section> |
228 |
<section> |
229 |
<title>-march</title> |
230 |
<body> |
231 |
|
232 |
<p> |
233 |
The first and most important option is <c>-march</c>. This tells the compiler |
234 |
what code it should produce for your processor <uri |
235 |
link="http://en.wikipedia.org/wiki/Microarchitecture">architecture</uri> (or |
236 |
<e>arch</e>); it says that it should produce code for a certain kind of CPU. |
237 |
Different CPUs have different capabilities, support different instruction sets, |
238 |
and have different ways of executing code. The <c>-march</c> flag will instruct |
239 |
the compiler to produce code specifically for your CPU, with all its |
240 |
capabilities, features, instruction sets, quirks, and so on. |
241 |
</p> |
242 |
|
243 |
<p> |
244 |
Even though the CHOST variable in <path>/etc/make.conf</path> specifies the |
245 |
general architecture used, <c>-march</c> should still be used so that programs |
246 |
can be optimized for your specific processor. |
247 |
</p> |
248 |
|
249 |
<p> |
250 |
What kind of CPU do you have? To find out, run the following command: |
251 |
</p> |
252 |
|
253 |
<pre caption="Examining CPU information"> |
254 |
$ <i>cat /proc/cpuinfo</i> |
255 |
</pre> |
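
<p>
That file can be long. As a rough shortcut on x86 systems, the two most relevant
lines can be picked out as shown below. The field names are an assumption and
vary between architectures.
</p>

<pre caption="Showing only the CPU model and feature flags">
$ <i>grep -m1 'model name' /proc/cpuinfo</i>
$ <i>grep -m1 '^flags' /proc/cpuinfo</i>
</pre>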
256 |
|
257 |
<p> |
258 |
Now let's see <c>-march</c> in action. This example is for an older Pentium III |
259 |
chip: |
260 |
</p> |
261 |
|
262 |
<pre caption="/etc/make.conf: Pentium III"> |
263 |
CFLAGS="-march=pentium3" |
264 |
CXXFLAGS="${CFLAGS}" |
265 |
</pre> |
266 |
|
267 |
<p> |
268 |
Here's another one for a 64-bit Sparc CPU. On Sparc, <c>gcc</c> has no
<c>-march</c> option, so <c>-mcpu</c> is used instead:
</p>

<pre caption="/etc/make.conf: Sparc">
CFLAGS="-mcpu=ultrasparc"
CXXFLAGS="${CFLAGS}"
274 |
</pre> |
275 |
|
276 |
|
277 |
<p> |
278 |
Also available are the <c>-mcpu</c> and <c>-mtune</c> flags. Either of these
should <e>only</e> be used when there is no available <c>-march</c> option.
What's the difference between them? <c>-march</c> is much more specific about
which processor features will be used when compiling code; it is a better
choice. <c>-mtune</c> only tunes the code for a particular CPU without using
instructions that other CPUs lack, so the result is more generic and less
optimized for your machine. On x86, <c>-mcpu</c> is a deprecated synonym for
<c>-mtune</c>. Whenever possible, use <c>-march</c>. For some less common
architectures such as PowerPC and Alpha, <c>-mcpu</c> must be used.
286 |
</p> |
287 |
|
288 |
<note> |
289 |
For more suggested <c>-march</c> settings, please read chapter 5 of the |
290 |
appropriate <uri link="/doc/en/handbook/">Gentoo Installation Handbook</uri> |
291 |
for your arch. Also, read the <c>gcc</c> manual's list of <uri |
292 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Submodel-Options.html#Submodel-Options">architecture-specific |
293 |
options</uri>, as well as more detailed explanations about the differences |
294 |
between <c>-march</c>, <c>-mcpu</c>, and <c>-mtune</c>. This is quite helpful |
295 |
for determining which <c>-march</c> setting you should use, especially since on |
296 |
some architectures, such as x86, <c>-mcpu</c> is deprecated and <c>-mtune</c> |
297 |
should be used instead. |
298 |
</note> |
299 |
|
300 |
</body> |
301 |
</section> |
302 |
<section> |
303 |
<title>-O</title> |
304 |
<body> |
305 |
|
306 |
<p> |
307 |
Next up is the <c>-O</c> option. This controls the overall level of
optimization. Higher levels make compilation take somewhat more time and can
use much more memory.
310 |
</p> |
311 |
|
312 |
<p> |
313 |
There are five <c>-O</c> settings: <c>-O0</c>, <c>-O1</c>, <c>-O2</c>, |
314 |
<c>-O3</c>, and <c>-Os</c>. You should use only one of them in |
315 |
<path>/etc/make.conf</path>. |
316 |
</p> |
317 |
|
318 |
<p> |
319 |
With the exception of <c>-O0</c>, the <c>-O</c> settings each activate
320 |
several additional flags, so be sure to read the gcc manual's chapter on <uri |
321 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">optimization |
322 |
options</uri> to learn which flags are activated at each <c>-O</c> level, as |
323 |
well as some explanations as to what they do. |
324 |
</p> |
325 |
|
326 |
<p> |
327 |
Let's examine each optimization level: |
328 |
</p> |
329 |
|
330 |
<ul> |
331 |
<li> |
332 |
<c>-O0</c>: This level (that's the letter "O" followed by a zero) turns off |
333 |
optimization entirely and is the default if no <c>-O</c> level is specified |
334 |
in CFLAGS or CXXFLAGS. Your code will not be optimized; it's not normally |
335 |
desired. |
336 |
</li> |
337 |
<li> |
338 |
<c>-O1</c>: This is the most basic optimization level. The compiler will try |
339 |
to produce faster, smaller code without taking much compilation time. |
340 |
It's pretty basic, but it should get the job done all the time. |
341 |
</li> |
342 |
<li> |
343 |
<c>-O2</c>: A step up from <c>-O1</c>. This is the <e>recommended</e> level |
344 |
of optimization unless you have special needs (such as <c>-Os</c>, as will |
345 |
be explained shortly). <c>-O2</c> will activate a few more flags in addition |
346 |
to the ones activated by <c>-O1</c>. With <c>-O2</c>, the compiler will |
347 |
attempt to increase code performance without compromising on size, and |
348 |
without taking too much compilation time. |
349 |
</li> |
350 |
<li> |
351 |
<c>-O3</c>: This is the highest level of optimization possible, and also the |
352 |
riskiest. It will take a longer time to compile your code with this option, |
353 |
and in fact it <e>should not be used system-wide with <c>gcc</c> 4.x</e>. |
354 |
The behavior of <c>gcc</c> has changed significantly since version 3.x. In |
355 |
3.x, <c>-O3</c> has been shown to lead to marginally faster execution times |
356 |
over <c>-O2</c>, but this is no longer the case with <c>gcc</c> 4.x. |
357 |
Compiling all your packages with <c>-O3</c> <e>will</e> result in larger |
358 |
binaries that require more memory, and will significantly increase the odds |
359 |
of compilation failure or unexpected program behavior (including errors). |
360 |
The downsides outweigh the benefits; remember the principle of diminishing |
361 |
returns. <b>Using <c>-O3</c> is not recommended for <c>gcc</c> 4.x.</b> |
362 |
</li> |
363 |
<li> |
364 |
<c>-Os</c>: This level will optimize your code for size. It activates all |
365 |
<c>-O2</c> options that don't increase the size of the generated code. It's |
366 |
useful for machines that have extremely limited disk storage space and/or |
367 |
have CPUs with small cache sizes. |
368 |
</li> |
369 |
</ul> |
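
<p>
If your <c>gcc</c> is new enough to support <c>--help=optimizers</c> (an
assumption: the option was added in 4.3, after the 4.1.2 release whose manual is
linked above), you can also ask the compiler itself which flags a level turns
on:
</p>

<pre caption="Listing the flags enabled at -O2">
$ <i>gcc -Q -O2 --help=optimizers | grep enabled</i>
</pre>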
370 |
|
371 |
<p> |
372 |
As previously mentioned, <c>-O2</c> is the recommended optimization level. If |
373 |
package compilations error out, check to make sure that you aren't using |
374 |
<c>-O3</c>. As a fallback option, try setting your CFLAGS and CXXFLAGS to a
lower optimization level, such as <c>-O1</c> or <c>-Os</c>, and recompile the
package.
377 |
</p> |
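
<p>
As an illustration, a fallback setting might look like this. The
<c>-march</c> value is simply carried over from the earlier example; use the one
that matches your CPU.
</p>

<pre caption="/etc/make.conf: falling back to -O1">
CFLAGS="-march=athlon64 -O1 -pipe"
CXXFLAGS="${CFLAGS}"
</pre>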
378 |
|
379 |
</body> |
380 |
</section> |
381 |
<section> |
382 |
<title>-pipe</title> |
383 |
<body> |
384 |
|
385 |
<p> |
386 |
A fun, safe flag to use is <c>-pipe</c>. This flag actually has no effect on the |
387 |
generated code, but it makes the compilation process faster. It tells the |
388 |
compiler to use pipes instead of temporary files during the different stages of |
389 |
compilation. |
390 |
</p> |
391 |
|
392 |
</body> |
393 |
</section> |
394 |
<section> |
395 |
<title>-fomit-frame-pointer</title> |
396 |
<body> |
397 |
|
398 |
<p> |
399 |
This is a very common flag designed to reduce generated code size. It is turned
on at all levels of <c>-O</c> (except <c>-O0</c>) on architectures where doing
so does not interfere with debugging (such as x86-64), but you may need to
activate it yourself by adding it to your flags. The GNU <c>gcc</c> manual does
not list every architecture on which <c>-O</c> turns it on, but on x86 you will
need to activate it explicitly. Be aware that using this flag makes debugging
hard to impossible.
406 |
</p> |
407 |
|
408 |
<p> |
409 |
In particular, it makes troubleshooting applications written in Java much |
410 |
harder, though Java is not the only code affected by using this flag. So while |
411 |
the flag can help, it can also make debugging harder. If you don't plan to do |
412 |
much debugging and haven't added any other debugging-related CFLAGS such as |
413 |
<c>-ggdb</c> (and you aren't installing packages with the <c>debug</c> USE |
414 |
flag), then try using <c>-fomit-frame-pointer</c>. |
415 |
</p> |
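
<p>
On x86, where the flag has to be added by hand, the result might look like the
following sketch (the <c>-march</c> value is only an example):
</p>

<pre caption="/etc/make.conf: adding -fomit-frame-pointer on x86">
CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
</pre>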
416 |
|
417 |
<impo> |
418 |
Do <e>not</e> combine <c>-fomit-frame-pointer</c> with the similar flag |
419 |
<c>-momit-leaf-frame-pointer</c>. Using the latter flag is discouraged, as |
420 |
<c>-fomit-frame-pointer</c> already does the job properly. Furthermore, |
421 |
<c>-momit-leaf-frame-pointer</c> has been shown to negatively impact code |
422 |
performance. |
423 |
<!-- |
424 |
source for this info: |
425 |
http://www.coyotegulch.com/products/acovea/aco5p4gcc40.html |
426 |
--> |
427 |
</impo> |
428 |
|
429 |
</body> |
430 |
</section> |
431 |
<section> |
432 |
<title>-msse, -msse2, -msse3, -mmmx, -m3dnow</title> |
433 |
<body> |
434 |
|
435 |
<p> |
436 |
These flags enable the <uri |
437 |
link="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</uri>, <uri |
438 |
link="http://en.wikipedia.org/wiki/SSE2">SSE2</uri>, <uri |
439 |
link="http://en.wikipedia.org/wiki/SSE3">SSE3</uri>, <uri
440 |
link="http://en.wikipedia.org/wiki/MMX">MMX</uri>, and <uri |
441 |
link="http://en.wikipedia.org/wiki/3dnow">3DNow!</uri> instruction sets for x86 |
442 |
and x86-64 architectures. These are useful primarily in multimedia, gaming, and |
443 |
other floating point-intensive computing tasks, though they also contain several |
444 |
other mathematical enhancements. These instruction sets are found in more modern |
445 |
CPUs. |
446 |
</p> |
447 |
|
448 |
<impo> |
449 |
Be sure to check if your CPU supports these by running <c>cat /proc/cpuinfo</c>. |
450 |
The output will include any supported additional instruction sets. Note that |
451 |
<b>pni</b> is just a different name for SSE3. |
452 |
</impo> |
453 |
|
454 |
<p> |
455 |
You normally don't need to add any of these flags to <path>/etc/make.conf</path> |
456 |
as long as you are using the correct <c>-march</c> (for example, |
457 |
<c>-march=nocona</c> implies <c>-msse3</c>). Some notable exceptions are newer |
458 |
VIA and AMD64 CPUs that support instructions not implied by <c>-march</c> (such |
459 |
as SSE3). For CPUs like these you'll need to enable additional flags where |
460 |
appropriate after checking the output of <c>cat /proc/cpuinfo</c>. |
461 |
</p> |
462 |
|
463 |
<note> |
464 |
You should check the <uri |
465 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options">list</uri> |
466 |
of x86 and x86-64-specific flags to see which of these instruction sets are |
467 |
activated by the proper CPU type flag. If an instruction is listed, then you |
468 |
don't need to specify it; it will be turned on by using the proper <c>-march</c> |
469 |
setting. |
470 |
</note> |
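
<p>
As a sketch of that workflow, assume an AMD64 CPU whose <c>flags</c> line lists
<b>pni</b> even though the chosen <c>-march</c> value does not imply SSE3. The
check and the resulting setting could look like this:
</p>

<pre caption="Checking for SSE3 and enabling it explicitly">
$ <i>grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -x -E 'mmx|sse|sse2|pni|3dnow'</i>

<comment>(In /etc/make.conf, only if pni was listed above)</comment>
CFLAGS="-march=athlon64 -msse3 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
</pre>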
471 |
|
472 |
</body> |
473 |
</section> |
474 |
</chapter> |
475 |
|
476 |
<chapter> |
477 |
<title>Optimization FAQs</title> |
478 |
<section> |
479 |
<title>But I get better performance with -funroll-loops -fomg-optimize!</title> |
480 |
<body> |
481 |
|
482 |
<p> |
483 |
No, you only <e>think</e> you do because someone has convinced you that more |
484 |
flags are better. Aggressive flags will only hurt your applications when used |
485 |
system-wide. Even the <c>gcc</c> <uri |
486 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">manual</uri> |
487 |
says that <c>-funroll-loops</c> makes code larger and may or may not make it
run faster, and that <c>-funroll-all-loops</c> usually makes programs run more
slowly. Yet for some reason, these two flags, along with
489 |
<c>-ffast-math</c>, <c>-fforce-mem</c>, <c>-fforce-addr</c>, and similar flags, |
490 |
continue to be very popular among ricers who want the biggest bragging rights. |
491 |
</p> |
492 |
|
493 |
<p> |
494 |
The truth of the matter is that they are dangerously aggressive flags. Take a |
495 |
good look around the <uri link="http://forums.gentoo.org">Gentoo Forums</uri> |
496 |
and <uri link="http://bugs.gentoo.org">Bugzilla</uri> to see what those flags |
497 |
do: nothing good! |
498 |
</p> |
499 |
|
500 |
<p> |
501 |
You don't need to use those flags globally in CFLAGS or CXXFLAGS. They will only |
502 |
hurt performance. They may make you sound like you have a high-performance |
503 |
system running on the bleeding edge, but they don't do anything but bloat your |
504 |
code and get your bugs marked INVALID or WONTFIX. |
505 |
</p> |
506 |
|
507 |
<p> |
508 |
You don't need dangerous flags like these. <b>Don't use them</b>. Stick to the |
509 |
basics: <c>-march</c>, <c>-O</c>, and <c>-pipe</c>. |
510 |
</p> |
511 |
|
512 |
</body> |
513 |
</section> |
514 |
<section> |
515 |
<title>What about -O levels higher than 3?</title> |
516 |
<body> |
517 |
|
518 |
<p> |
519 |
Some users boast about even better performance obtained by using <c>-O4</c>, |
520 |
<c>-O9</c>, and so on, but the reality is that <c>-O</c> levels higher than 3 |
521 |
have no effect. The compiler may accept CFLAGS like <c>-O4</c>, but it actually |
522 |
doesn't do anything with them. It only performs the optimizations for |
523 |
<c>-O3</c>, nothing more. |
524 |
</p> |
525 |
|
526 |
<p> |
527 |
Need more proof? Examine the <c>gcc</c> <uri |
528 |
link="http://gcc.gnu.org/viewcvs/trunk/gcc/opts.c?revision=124622&amp;view=markup">source
529 |
code</uri>: |
530 |
</p> |
531 |
|
532 |
<pre caption="-O source code"> |
533 |
if (optimize >= 3) |
534 |
{ |
535 |
flag_inline_functions = 1; |
536 |
flag_unswitch_loops = 1; |
537 |
flag_gcse_after_reload = 1; |
538 |
/* Allow even more virtual operators. */ |
539 |
set_param_value ("max-aliased-vops", 1000); |
540 |
set_param_value ("avg-aliased-vops", 3); |
541 |
} |
542 |
</pre> |
543 |
|
544 |
<p> |
545 |
As you can see, any value higher than 3 is treated as just <c>-O3</c>. |
546 |
</p> |
547 |
|
548 |
</body> |
549 |
</section> |
550 |
<section> |
551 |
<title>What about redundant flags?</title> |
552 |
<body> |
553 |
|
554 |
<p> |
555 |
Oftentimes CFLAGS and CXXFLAGS that are turned on at various <c>-O</c> levels |
556 |
are specified redundantly in <path>/etc/make.conf</path>. Sometimes this is done |
557 |
out of ignorance, but it is also done to avoid flag filtering or flag replacing. |
558 |
</p> |
559 |
|
560 |
<p> |
561 |
Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It |
562 |
is usually done because packages fail to compile at certain <c>-O</c> levels, or |
563 |
when the source code is too sensitive for any additional flags to be used. The |
564 |
ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace |
565 |
<c>-O</c> with a different level. |
566 |
</p> |
567 |
|
568 |
<p> |
569 |
The <uri |
570 |
link="http://devmanual.gentoo.org/ebuild-writing/functions/src_compile/build-environment/index.html">Gentoo |
571 |
Developer Manual</uri> outlines where and how flag filtering/replacing works. |
572 |
</p> |
573 |
|
574 |
<p> |
575 |
It's possible to circumvent <c>-O</c> filtering by redundantly listing the flags |
576 |
for a certain level, such as <c>-O3</c>, by doing things like: |
577 |
</p> |
578 |
|
579 |
<pre caption="Specifying redundant CFLAGS"> |
580 |
CFLAGS="-O3 -finline-functions -funswitch-loops" |
581 |
</pre> |
582 |
|
583 |
<p> |
584 |
However, <brite>this is not a smart thing to do</brite>. CFLAGS are filtered for |
585 |
a reason! When flags are filtered, it means that it is unsafe to build a package |
586 |
with those flags. Clearly, it is <e>not</e> safe to compile your whole system |
587 |
with <c>-O3</c> if some of the flags turned on by that level will cause problems |
588 |
with certain packages. Therefore, you shouldn't try to "outsmart" the developers |
589 |
who maintain those packages. <e>Trust the developers</e>. Flag filtering and |
590 |
replacing is done for your benefit! If an ebuild specifies alternative flags, |
591 |
then don't try to get around it. |
592 |
</p> |
593 |
|
594 |
<p> |
595 |
You will most likely continue to run into problems when you build a package with |
596 |
unacceptable flags. When you report your troubles on Bugzilla, the flags you use |
597 |
in <path>/etc/make.conf</path> will be readily visible and you will be told to |
598 |
recompile without those flags. Save yourself the trouble of recompiling by not |
599 |
using redundant flags in the first place! Don't just automatically assume that |
600 |
you know better than the developers. |
601 |
</p> |
602 |
|
603 |
</body> |
604 |
</section> |
605 |
<section> |
606 |
<title>What about LDFLAGS?</title> |
607 |
<body> |
608 |
|
609 |
<p> |
610 |
Don't use them. You may have heard that they can speed up application load times |
611 |
or reduce binary size, but in reality, LDFLAGS are more likely to make your |
612 |
applications stop working. They are not supported, and you can expect to have |
613 |
your bugs closed and marked INVALID if you report errors with packages while |
614 |
using LDFLAGS. At the very least you will have to recompile all affected |
615 |
packages without setting LDFLAGS. |
616 |
</p> |
617 |
|
618 |
</body> |
619 |
</section> |
620 |
</chapter> |
621 |
|
622 |
<chapter> |
623 |
<title>Resources</title> |
624 |
<section> |
625 |
<body> |
626 |
|
627 |
<p> |
628 |
The following resources are of some help in further understanding optimization: |
629 |
</p> |
630 |
|
631 |
<ul> |
632 |
<li> |
633 |
The <uri link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/">GNU gcc |
634 |
manual</uri> |
635 |
</li> |
636 |
<li> |
637 |
Chapter 5 of the <uri link="/doc/en/handbook/">Gentoo Installation |
638 |
Handbooks</uri> |
639 |
</li> |
640 |
<li><c>man make.conf</c></li> |
641 |
<li><uri link="http://en.wikipedia.org">Wikipedia</uri></li> |
642 |
<li> |
643 |
<uri link="http://www.coyotegulch.com/products/acovea/">Acovea</uri>, a |
644 |
benchmarking and test suite that can be useful for determining how different |
645 |
compiler flags interact and affect generated code, though its code |
646 |
suggestions are not appropriate for system-wide use. It is available in |
647 |
Portage: <c>emerge acovea</c>. |
648 |
</li> |
649 |
<li>The <uri link="http://forums.gentoo.org">Gentoo Forums</uri></li> |
650 |
</ul> |
651 |
|
652 |
</body> |
653 |
</section> |
654 |
</chapter> |
655 |
</guide> |
656 |
|
657 |
|
658 |
|