1 |
nightmorph 07/06/27 06:04:17 |
2 |
|
3 |
Modified: metadoc.xml |
4 |
Added: co-guide.xml |
5 |
Log: |
6 |
added compilation optimization guide, bug 68282 (at last). |
7 |
|
8 |
Revision Changes Path |
9 |
1.184 xml/htdocs/doc/en/metadoc.xml |
10 |
|
11 |
file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?rev=1.184&view=markup |
12 |
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?rev=1.184&content-type=text/plain |
13 |
diff : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/metadoc.xml?r1=1.183&r2=1.184 |
14 |
|
15 |
Index: metadoc.xml |
16 |
=================================================================== |
17 |
RCS file: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v |
18 |
retrieving revision 1.183 |
19 |
retrieving revision 1.184 |
20 |
diff -u -r1.183 -r1.184 |
21 |
--- metadoc.xml 3 Jun 2007 16:35:40 -0000 1.183 |
22 |
+++ metadoc.xml 27 Jun 2007 06:04:17 -0000 1.184 |
23 |
@@ -1,9 +1,9 @@ |
24 |
<?xml version="1.0" encoding="UTF-8"?> |
25 |
-<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v 1.183 2007/06/03 16:35:40 neysx Exp $ --> |
26 |
+<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/metadoc.xml,v 1.184 2007/06/27 06:04:17 nightmorph Exp $ --> |
27 |
<!DOCTYPE metadoc SYSTEM "/dtd/metadoc.dtd"> |
28 |
|
29 |
<metadoc lang="en"> |
30 |
-<version>1.107</version> |
31 |
+<version>1.108</version> |
32 |
<members> |
33 |
<lead>neysx</lead> |
34 |
<member>cam</member> |
35 |
@@ -397,6 +397,7 @@ |
36 |
<file id="zsh">/doc/en/zsh.xml</file> |
37 |
<file id="change-chost">/doc/en/change-chost.xml</file> |
38 |
<file id="xfce-config">/doc/en/xfce-config.xml</file> |
39 |
+ <file id="co-guide">/doc/en/co-guide.xml</file> |
40 |
<file id="qa-autofailure">/proj/en/qa/autofailure.xml</file> |
41 |
<file id="qa-automagic">/proj/en/qa/automagic.xml</file> |
42 |
<file id="qa-backtraces">/proj/en/qa/backtraces.xml</file> |
43 |
@@ -802,6 +803,10 @@ |
44 |
<memberof>sysadmin_specific</memberof> |
45 |
<fileid>home-router-howto</fileid> |
46 |
</doc> |
47 |
+ <doc id="co-guide"> |
48 |
+ <memberof>sysadmin_specific</memberof> |
49 |
+ <fileid>co-guide</fileid> |
50 |
+ </doc> |
51 |
<doc id="gentoo-dev-handbook"> |
52 |
<memberof>gentoodev</memberof> |
53 |
<memberof>project_devrel</memberof> |
54 |
|
55 |
|
56 |
|
57 |
1.1 xml/htdocs/doc/en/co-guide.xml |
58 |
|
59 |
file : http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/co-guide.xml?rev=1.1&view=markup |
60 |
plain: http://sources.gentoo.org/viewcvs.py/gentoo/xml/htdocs/doc/en/co-guide.xml?rev=1.1&content-type=text/plain |
61 |
|
62 |
Index: co-guide.xml |
63 |
=================================================================== |
64 |
<?xml version='1.0' encoding='UTF-8'?> |
65 |
|
66 |
<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/co-guide.xml,v 1.1 2007/06/27 06:04:17 nightmorph Exp $ --> |
67 |
|
68 |
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd"> |
69 |
|
70 |
<guide link="/doc/en/co-guide.xml"> |
71 |
|
72 |
<title>Compilation Optimization Guide</title> |
73 |
|
74 |
<author title="Author"> |
75 |
<mail link="nightmorph@g.o">Joshua Saddler</mail> |
76 |
</author> |
77 |
|
78 |
<abstract> |
79 |
This guide provides an introduction to optimizing compiled code using safe, sane
CFLAGS and CXXFLAGS. It also describes the theory behind optimizing in
general.
82 |
</abstract> |
83 |
|
84 |
<!-- The content of this document is licensed under the CC-BY-SA license --> |
85 |
<!-- See http://creativecommons.org/licenses/by-sa/2.5 --> |
86 |
<license/> |
87 |
|
88 |
<version>1.0</version> |
89 |
<date>2007-06-26</date> |
90 |
|
91 |
<chapter> |
92 |
<title>Introduction</title> |
93 |
<section> |
94 |
<title>What are CFLAGS and CXXFLAGS?</title> |
95 |
<body> |
96 |
|
97 |
<p> |
98 |
CFLAGS and CXXFLAGS are environment variables that build systems conventionally
use to tell the GNU Compiler Collection, <c>gcc</c>, what kinds of switches to
use when compiling source code. CFLAGS are for code written in C, while CXXFLAGS
are for code written in C++.
102 |
</p> |
103 |
|
104 |
<p> |
105 |
They can be used to decrease the amount of debug messages for a program, |
106 |
increase error warning levels, and, of course, to optimize the code produced. |
107 |
The <uri |
108 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Invoking-GCC.html#Invoking-GCC">GNU |
109 |
gcc handbook</uri> maintains a complete list of available options and their |
110 |
purposes. |
111 |
</p> |
112 |
|
113 |
</body> |
114 |
</section> |
115 |
<section> |
116 |
<title>How are they used?</title> |
117 |
<body> |
118 |
|
119 |
<p> |
120 |
CFLAGS and CXXFLAGS can be used in two ways. First, they can be used
per-program, by passing the flags directly to <c>gcc</c> along with the code
you wish to compile. Note that <c>gcc</c> itself does not read the CFLAGS
environment variable; it is the build system that passes the flags along.
</p>

<pre caption="Compiling a program directly">
$ <i>gcc -march=i686 file.c</i>
127 |
</pre> |
128 |
|
129 |
<p> |
130 |
However, this should not be done when installing packages found in the Portage |
131 |
tree. Instead, set your CFLAGS and CXXFLAGS in <path>/etc/make.conf</path>. This |
132 |
way all packages will be compiled using the options you specify. |
133 |
</p> |
134 |
|
135 |
<pre caption="CFLAGS in /etc/make.conf"> |
136 |
CFLAGS="-march=athlon64 -O2 -pipe" |
137 |
CXXFLAGS="${CFLAGS}" |
138 |
</pre> |
139 |
|
140 |
<p> |
141 |
As you can see, CXXFLAGS is set to use all the options present in CFLAGS. This |
142 |
is what you'll want almost without fail. You shouldn't ever need to specify |
143 |
additional options in CXXFLAGS. |
144 |
</p> |
145 |
|
146 |
<impo> |
147 |
Portage cannot use CFLAGS on a per-package basis, nor is there any supported |
148 |
method of forcing it to do so. The flags you set in <path>/etc/make.conf</path> |
149 |
will be used for <e>all</e> packages you install. |
150 |
</impo> |
151 |
|
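<p>
To check which flags Portage will actually use, one option is to filter the
output of <c>emerge --info</c>. This is only a sketch: it assumes the flags are
printed one per line in <c>NAME="value"</c> form, which may vary between Portage
versions.
</p>

<pre caption="Checking the flags Portage will use (sketch)">
$ <i>emerge --info | grep -E '^(CFLAGS|CXXFLAGS)='</i>
</pre>
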
152 |
</body> |
153 |
</section> |
154 |
<section> |
155 |
<title>Misconceptions</title> |
156 |
<body> |
157 |
|
158 |
<p> |
159 |
While CFLAGS and CXXFLAGS can be very effective means of getting source code to |
160 |
produce smaller and/or faster binaries, they can also impair the function of |
161 |
your code, bloat its size, slow down its execution time, or even cause |
162 |
compilation failures! |
163 |
</p> |
164 |
|
165 |
<p> |
166 |
CFLAGS are not a magic bullet; they will not automatically make your system run
any faster or make your binaries take up less space on disk. Adding more and
more flags in an attempt to optimize (or "rice") your system is a sure recipe
for failure. There is a point at which you will reach diminishing returns.
170 |
</p> |
171 |
|
172 |
<p> |
173 |
Despite the bragging you'll find on the internet, aggressive CFLAGS and CXXFLAGS |
174 |
are far more likely to harm your programs than do them any good. Keep in mind |
175 |
that the flags exist in the first place because they are designed
176 |
to be used at specific places for specific purposes. Just because one particular |
177 |
CFLAG is good for one bit of code doesn't mean that it is suited to compiling |
178 |
everything you will ever install on your machine! |
179 |
</p> |
180 |
|
181 |
</body> |
182 |
</section> |
183 |
<section> |
184 |
<title>Ready?</title> |
185 |
<body> |
186 |
|
187 |
<p> |
188 |
Now that you're aware of some of the risks involved, let's take a look at some |
189 |
sane, safe optimizations for your computer. These will hold you in good stead |
190 |
and will endear you to developers the next time you report a problem on <uri |
191 |
link="http://bugs.gentoo.org">Bugzilla</uri>. (Developers will usually request |
192 |
that you recompile a package with minimal CFLAGS to see if the problem persists. |
193 |
Remember, aggressive flags can ruin code.) |
194 |
</p> |
195 |
|
196 |
</body> |
197 |
</section> |
198 |
</chapter> |
199 |
|
200 |
<chapter> |
201 |
<title>Optimizing</title> |
202 |
<section> |
203 |
<title>The basics</title> |
204 |
<body> |
205 |
|
206 |
<p> |
207 |
The goal behind using CFLAGS and CXXFLAGS is to create code tailor-made to your |
208 |
system; it should function perfectly while being lean and fast, if possible. |
209 |
Sometimes these conditions are mutually exclusive, so we'll stick with |
210 |
combinations known to work well. Ideally, they are the best available for any |
211 |
CPU architecture. We'll mention the aggressive flags later so you know what to |
212 |
look out for. We won't discuss every option listed in the <c>gcc</c> manual
213 |
(there are hundreds), but we'll cover the basic, most common flags. |
214 |
</p> |
215 |
|
216 |
<note> |
217 |
Whenever you're not sure what a flag actually does, refer to the relevant |
218 |
chapter of the <uri |
219 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">gcc |
220 |
manual</uri>. If you're still stumped, try Google, or check out the <c>gcc</c> |
221 |
<uri link="http://gcc.gnu.org/lists.html">mailing lists</uri>. |
222 |
</note> |
223 |
|
224 |
</body> |
225 |
</section> |
226 |
<section> |
227 |
<title>-march</title> |
228 |
<body> |
229 |
|
230 |
<p> |
231 |
The first and most important option is <c>-march</c>. This tells the compiler |
232 |
what code it should produce for your processor <uri |
233 |
link="http://en.wikipedia.org/wiki/Microarchitecture">architecture</uri> (or |
234 |
<e>arch</e>); it says that it should produce code for a certain kind of CPU. |
235 |
Different CPUs have different capabilities, support different instruction sets, |
236 |
and have different ways of executing code. The <c>-march</c> flag will instruct |
237 |
the compiler to produce code specifically for your CPU, with all its |
238 |
capabilities, features, instruction sets, quirks, and so on. |
239 |
</p> |
240 |
|
241 |
<p> |
242 |
Even though the CHOST variable in <path>/etc/make.conf</path> specifies the |
243 |
general architecture used, <c>-march</c> should still be used so that programs |
244 |
can be optimized for your specific processor. |
245 |
</p> |
246 |
|
247 |
<p> |
248 |
What kind of CPU do you have? To find out, run the following command: |
249 |
</p> |
250 |
|
251 |
<pre caption="Examining CPU information"> |
252 |
$ <i>cat /proc/cpuinfo</i> |
253 |
</pre> |
254 |
|
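<p>
That output can be long. As a rough sketch, assuming GNU <c>grep</c> and an x86
Linux system where <path>/proc/cpuinfo</path> contains a <c>model name</c> field
(other architectures use different field names), you can print just the CPU
model:
</p>

<pre caption="Showing only the CPU model (sketch)">
$ <i>grep -m1 'model name' /proc/cpuinfo</i>
</pre>
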
255 |
<p> |
256 |
Now let's see <c>-march</c> in action. This example is for an older Pentium III |
257 |
chip: |
258 |
</p> |
259 |
|
260 |
<pre caption="/etc/make.conf: Pentium III"> |
261 |
CFLAGS="-march=pentium3" |
262 |
CXXFLAGS="${CFLAGS}" |
263 |
</pre> |
264 |
|
265 |
<p> |
266 |
Here's another one for a 64-bit Sparc CPU. <c>gcc</c> has no <c>-march</c>
option on Sparc, so <c>-mcpu</c> is used instead:
</p>

<pre caption="/etc/make.conf: Sparc">
CFLAGS="-mcpu=ultrasparc"
271 |
CXXFLAGS="${CFLAGS}" |
272 |
</pre> |
273 |
|
274 |
|
275 |
<p> |
276 |
Also available are the <c>-mcpu</c> and <c>-mtune</c> flags. Either of these |
277 |
should <e>only</e> be used when there is no available <c>-march</c> option. |
278 |
What's the difference between them? <c>-march</c> is much more specific about |
279 |
which processor features will be used when compiling code; it is a better |
280 |
choice. <c>-mcpu</c> will produce much more generic code less optimized for your |
281 |
machine. <c>-mtune</c> is even more generic than <c>-mcpu</c>. Whenever |
282 |
possible, use <c>-march</c>. For some less common architectures such as PowerPC |
283 |
and Alpha, <c>-mcpu</c> must be used. |
284 |
</p> |
285 |
|
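<p>
As an illustration of an architecture where <c>-mcpu</c> is required, a PowerPC
configuration might look like the following sketch. The <c>G4</c> value is only
an assumption about the CPU in use; check the <c>gcc</c> manual for the values
valid on your architecture.
</p>

<pre caption="/etc/make.conf: PowerPC (sketch)">
# Assumes a PowerPC G4; -mcpu takes the place of -march here
CFLAGS="-mcpu=G4"
CXXFLAGS="${CFLAGS}"
</pre>
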
286 |
<note> |
287 |
For more suggested <c>-march</c> settings, please read chapter 5 of the |
288 |
appropriate <uri link="/doc/en/handbook/index.xml">Gentoo Installation |
289 |
Handbook</uri> for your arch. Also, read the <c>gcc</c> manual's list of <uri |
290 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Submodel-Options.html#Submodel-Options">architecture-specific |
291 |
options</uri>, as well as more detailed explanations about the differences |
292 |
between <c>-march</c>, <c>-mcpu</c>, and <c>-mtune</c>. This is quite helpful |
293 |
for determining which <c>-march</c> setting you should use, especially since on |
294 |
some architectures, such as x86, <c>-mcpu</c> is deprecated and <c>-mtune</c> |
295 |
should be used instead. |
296 |
</note> |
297 |
|
298 |
</body> |
299 |
</section> |
300 |
<section> |
301 |
<title>-O</title> |
302 |
<body> |
303 |
|
304 |
<p> |
305 |
Next up is the <c>-O</c> flag. This controls the overall level of
optimization. Higher levels make compilation take somewhat more time and can
use much more memory.
308 |
</p> |
309 |
|
310 |
<p> |
311 |
There are five <c>-O</c> settings: <c>-O0</c>, <c>-O1</c>, <c>-O2</c>, |
312 |
<c>-O3</c>, and <c>-Os</c>. You should use only one of them in |
313 |
<path>/etc/make.conf</path>. |
314 |
</p> |
315 |
|
316 |
<p> |
317 |
With the exception of <c>-O0</c>, the <c>-O</c> settings each activate
318 |
several additional flags, so be sure to read the gcc manual's chapter on <uri |
319 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">optimization |
320 |
options</uri> to learn which flags are activated at each <c>-O</c> level, as |
321 |
well as some explanations as to what they do. |
322 |
</p> |
323 |
|
324 |
<p> |
325 |
Let's examine each optimization level: |
326 |
</p> |
327 |
|
328 |
<ul> |
329 |
<li> |
330 |
<c>-O0</c>: This level (that's the letter "O" followed by a zero) turns off |
331 |
optimization entirely and is the default if no <c>-O</c> level is specified |
332 |
in CFLAGS or CXXFLAGS. Your code will not be optimized; it's not normally |
333 |
desired. |
334 |
</li> |
335 |
<li> |
336 |
<c>-O1</c>: This is the most basic optimization level. The compiler will try |
337 |
to produce faster, smaller code without taking much compilation time. |
338 |
It's pretty basic, but it should get the job done all the time. |
339 |
</li> |
340 |
<li> |
341 |
<c>-O2</c>: A step up from <c>-O1</c>. This is the <e>recommended</e> level |
342 |
of optimization unless you have special needs (such as <c>-Os</c>, as will |
343 |
be explained shortly). <c>-O2</c> will activate a few more flags in addition |
344 |
to the ones activated by <c>-O1</c>. With <c>-O2</c>, the compiler will |
345 |
attempt to increase code performance without compromising on size, and |
346 |
without taking too much compilation time. |
347 |
</li> |
348 |
<li> |
349 |
<c>-O3</c>: This is the highest level of optimization possible, and also the |
350 |
riskiest. It will take a longer time to compile your code with this option, |
351 |
and in fact it <e>should not be used system-wide with <c>gcc</c> 4.x</e>. |
352 |
The behavior of <c>gcc</c> has changed significantly since version 3.x. In |
353 |
3.x, <c>-O3</c> has been shown to lead to marginally faster execution times |
354 |
over <c>-O2</c>, but this is no longer the case with <c>gcc</c> 4.x. |
355 |
Compiling all your packages with <c>-O3</c> <e>will</e> result in larger |
356 |
binaries that require more memory, and will significantly increase the odds |
357 |
of compilation failure or unexpected program behavior (including errors). |
358 |
The downsides outweigh the benefits; remember the principle of diminishing |
359 |
returns. <b>Using <c>-O3</c> is not recommended for <c>gcc</c> 4.x.</b> |
360 |
</li> |
361 |
<li> |
362 |
<c>-Os</c>: This level will optimize your code for size. It activates all |
363 |
<c>-O2</c> options that don't increase the size of the generated code. It's |
364 |
useful for machines that have extremely limited disk storage space and/or |
365 |
have CPUs with small cache sizes. |
366 |
</li> |
367 |
</ul> |
368 |
|
369 |
<p> |
370 |
As previously mentioned, <c>-O2</c> is the recommended optimization level. If |
371 |
package compilations error out, check to make sure that you aren't using |
372 |
<c>-O3</c>. As a fallback option, try setting your CFLAGS and CXXFLAGS to a |
373 |
lower optimization level, such as <c>-O1</c> or <c>-Os</c> and recompile the |
374 |
package. |
375 |
</p> |
376 |
|
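<p>
A sketch of that fallback, reusing the Pentium III example from earlier: keep
your usual line commented out, and use the lower level only while
troubleshooting.
</p>

<pre caption="/etc/make.conf: lowering the optimization level (sketch)">
# Usual setting
#CFLAGS="-march=pentium3 -O2 -pipe"
# Temporary fallback while a package fails to build
CFLAGS="-march=pentium3 -O1 -pipe"
CXXFLAGS="${CFLAGS}"
</pre>
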
377 |
</body> |
378 |
</section> |
379 |
<section> |
380 |
<title>-pipe</title> |
381 |
<body> |
382 |
|
383 |
<p> |
384 |
A fun, safe flag to use is <c>-pipe</c>. This flag actually has no effect on the |
385 |
generated code, but it makes the compilation process faster. It tells the |
386 |
compiler to use pipes instead of temporary files during the different stages of |
387 |
compilation. |
388 |
</p> |
389 |
|
390 |
</body> |
391 |
</section> |
392 |
<section> |
393 |
<title>-fomit-frame-pointer</title> |
394 |
<body> |
395 |
|
396 |
<p> |
397 |
This is a very common flag designed to reduce generated code size. It is turned
on at all levels of <c>-O</c> (except <c>-O0</c>) on architectures where doing
so does not interfere with debugging (such as x86-64); on other architectures
you need to activate it yourself by adding it to your flags. The GNU
<c>gcc</c> manual does not list every architecture on which <c>-O</c> turns it
on, but on x86 you will need to activate it explicitly. Be aware that using
this flag will make debugging hard to impossible.
404 |
</p> |
405 |
|
406 |
<p> |
407 |
In particular, it makes troubleshooting applications written in Java much |
408 |
harder, though Java is not the only code affected by using this flag. So while |
409 |
the flag can help, it can also make debugging harder. If you don't plan to do |
410 |
much debugging and haven't added any other debugging-related CFLAGS such as |
411 |
<c>-ggdb</c> (and you aren't installing packages with the <c>debug</c> USE |
412 |
flag), then try using <c>-fomit-frame-pointer</c>. |
413 |
</p> |
414 |
|
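<p>
On x86, where the flag must be enabled explicitly, this could look like the
following sketch (the <c>-march</c> value is just the Pentium III example used
earlier):
</p>

<pre caption="/etc/make.conf: adding -fomit-frame-pointer on x86 (sketch)">
CFLAGS="-march=pentium3 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
</pre>
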
415 |
<impo> |
416 |
Do <e>not</e> combine <c>-fomit-frame-pointer</c> with the similar flag |
417 |
<c>-momit-leaf-frame-pointer</c>. Using the latter flag is discouraged, as |
418 |
<c>-fomit-frame-pointer</c> already does the job properly. Furthermore, |
419 |
<c>-momit-leaf-frame-pointer</c> has been shown to negatively impact code |
420 |
performance. |
421 |
<!-- |
422 |
source for this info: |
423 |
http://www.coyotegulch.com/products/acovea/aco5p4gcc40.html |
424 |
--> |
425 |
</impo> |
426 |
|
427 |
</body> |
428 |
</section> |
429 |
<section> |
430 |
<title>-msse, -msse2, -msse3, -mmmx, -m3dnow</title> |
431 |
<body> |
432 |
|
433 |
<p> |
434 |
These flags enable the <uri |
435 |
link="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</uri>, <uri |
436 |
link="http://en.wikipedia.org/wiki/SSE2">SSE2</uri>, <uri |
437 |
link="http://en.wikipedia.org/wiki/SSE3">SSE3</uri>, <uri
438 |
link="http://en.wikipedia.org/wiki/MMX">MMX</uri>, and <uri |
439 |
link="http://en.wikipedia.org/wiki/3dnow">3DNow!</uri> instruction sets for x86 |
440 |
and x86-64 architectures. These are useful primarily in multimedia, gaming, and |
441 |
other floating point-intensive computing tasks, though they also contain several |
442 |
other mathematical enhancements. These instruction sets are found in more modern |
443 |
CPUs. |
444 |
</p> |
445 |
|
446 |
<impo> |
447 |
Be sure to check if your CPU supports these by running <c>cat /proc/cpuinfo</c>. |
448 |
The output will include any supported additional instruction sets. Note that |
449 |
<b>pni</b> is just a different name for SSE3. |
450 |
</impo> |
451 |
|
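<p>
A quick way to pick the relevant entries out of that output is sketched below.
It assumes GNU <c>grep</c> and an x86 kernel that reports these features under
the names shown; the list of names is only illustrative.
</p>

<pre caption="Listing supported instruction sets (sketch)">
$ <i>grep -o -w -E 'mmx|sse|sse2|pni|3dnow' /proc/cpuinfo | sort -u</i>
</pre>
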
452 |
<p> |
453 |
You normally don't need to add any of these flags to <path>/etc/make.conf</path> |
454 |
as long as you are using the correct <c>-march</c> (for example, |
455 |
<c>-march=nocona</c> implies <c>-msse3</c>). Some notable exceptions are newer |
456 |
VIA and AMD64 CPUs that support instructions not implied by <c>-march</c> (such |
457 |
as SSE3). For CPUs like these you'll need to enable additional flags where |
458 |
appropriate after checking the output of <c>cat /proc/cpuinfo</c>. |
459 |
</p> |
460 |
|
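<p>
For instance, on an AMD64 CPU whose <path>/proc/cpuinfo</path> output lists
<c>pni</c>, the extra flag could be added as in this sketch (the CPU and its
SSE3 support are assumptions for the sake of the example):
</p>

<pre caption="/etc/make.conf: enabling SSE3 explicitly (sketch)">
CFLAGS="-march=athlon64 -msse3 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
</pre>
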
461 |
<note> |
462 |
You should check the <uri |
463 |
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options">list</uri> |
464 |
of x86 and x86-64-specific flags to see which of these instruction sets are |
465 |
activated by the proper CPU type flag. If an instruction is listed, then you |
466 |
don't need to specify it; it will be turned on by using the proper <c>-march</c> |
467 |
setting. |
468 |
</note> |
469 |
|
470 |
</body> |
471 |
</section> |
472 |
</chapter> |
473 |
|
474 |
<chapter> |
475 |
<title>Optimization FAQs</title> |
476 |
<section> |
477 |
<title>But I get better performance with -funroll-loops -fomg-optimize!</title> |
478 |
<body> |
479 |
|
480 |
<p> |
481 |
No, you only <e>think</e> you do because someone has convinced you that more
flags are better. Aggressive flags will only hurt your applications when used
system-wide. Even the <c>gcc</c> <uri
link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html#Optimize-Options">manual</uri>
says that <c>-funroll-loops</c> makes code larger and may or may not make it
run faster, while <c>-funroll-all-loops</c> usually makes programs run more
slowly. Yet for some reason, these two flags, along with
<c>-ffast-math</c>, <c>-fforce-mem</c>, <c>-fforce-addr</c>, and similar flags,
continue to be very popular among ricers who want the biggest bragging rights.
489 |
</p> |
490 |
|
491 |
<p> |
492 |
The truth of the matter is that they are dangerously aggressive flags. Take a |
493 |
good look around the <uri link="http://forums.gentoo.org">Gentoo Forums</uri> |
494 |
and <uri link="http://bugs.gentoo.org">Bugzilla</uri> to see what those flags |
495 |
do: nothing good! |
496 |
</p> |
497 |
|
498 |
<p> |
499 |
You don't need to use those flags globally in CFLAGS or CXXFLAGS. They will only |
500 |
hurt performance. They may make you sound like you have a high-performance |
501 |
system running on the bleeding edge, but they don't do anything but bloat your |
502 |
code and get your bugs marked INVALID or WONTFIX. |
503 |
</p> |
504 |
|
505 |
<p> |
506 |
You don't need dangerous flags like these. <b>Don't use them</b>. Stick to the |
507 |
basics: <c>-march</c>, <c>-O</c>, and <c>-pipe</c>. |
508 |
</p> |
509 |
|
510 |
</body> |
511 |
</section> |
512 |
<section> |
513 |
<title>What about -O levels higher than 3?</title> |
514 |
<body> |
515 |
|
516 |
<p> |
517 |
Some users boast about even better performance obtained by using <c>-O4</c>, |
518 |
<c>-O9</c>, and so on, but the reality is that <c>-O</c> levels higher than 3 |
519 |
have no effect. The compiler may accept CFLAGS like <c>-O4</c>, but it actually |
520 |
doesn't do anything with them. It only performs the optimizations for |
521 |
<c>-O3</c>, nothing more. |
522 |
</p> |
523 |
|
524 |
<p> |
525 |
Need more proof? Examine the <c>gcc</c> <uri |
526 |
link="http://gcc.gnu.org/viewcvs/trunk/gcc/opts.c?revision=124622&amp;view=markup">source
527 |
code</uri>: |
528 |
</p> |
529 |
|
530 |
<pre caption="-O source code"> |
531 |
if (optimize >= 3) |
532 |
{ |
533 |
flag_inline_functions = 1; |
534 |
flag_unswitch_loops = 1; |
535 |
flag_gcse_after_reload = 1; |
536 |
/* Allow even more virtual operators. */ |
537 |
set_param_value ("max-aliased-vops", 1000); |
538 |
set_param_value ("avg-aliased-vops", 3); |
539 |
} |
540 |
</pre> |
541 |
|
542 |
<p> |
543 |
As you can see, any value higher than 3 is treated as just <c>-O3</c>. |
544 |
</p> |
545 |
|
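<p>
You can also check this yourself. The following sketch compiles the same file
(<path>file.c</path> is a placeholder for any C source file) to assembly at
<c>-O3</c> and <c>-O9</c> and compares the results. If both levels are treated
alike, <c>cmp</c> should report no differences.
</p>

<pre caption="Comparing -O3 and -O9 output (sketch)">
$ <i>gcc -O3 -S -o o3.s file.c</i>
$ <i>gcc -O9 -S -o o9.s file.c</i>
$ <i>cmp o3.s o9.s</i>
</pre>
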
546 |
</body> |
547 |
</section> |
548 |
<section> |
549 |
<title>What about redundant flags?</title> |
550 |
<body> |
551 |
|
552 |
<p> |
553 |
Oftentimes flags that are already turned on at various <c>-O</c> levels
are specified redundantly in <path>/etc/make.conf</path>. Sometimes this is done
out of ignorance, but it is also done to avoid flag filtering or flag replacing.
556 |
</p> |
557 |
|
558 |
<p> |
559 |
Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It |
560 |
is usually done because packages fail to compile at certain <c>-O</c> levels, or |
561 |
when the source code is too sensitive for any additional flags to be used. The |
562 |
ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace |
563 |
<c>-O</c> with a different level. |
564 |
</p> |
565 |
|
566 |
<p> |
567 |
The <uri |
568 |
link="http://devmanual.gentoo.org/ebuild-writing/functions/src_compile/build-environment/index.html">Gentoo |
569 |
Developer Manual</uri> outlines where and how flag filtering/replacing works. |
570 |
</p> |
571 |
|
572 |
<p> |
573 |
It's possible to circumvent <c>-O</c> filtering by redundantly listing the flags |
574 |
for a certain level, such as <c>-O3</c>, by doing things like: |
575 |
</p> |
576 |
|
577 |
<pre caption="Specifying redundant CFLAGS"> |
578 |
CFLAGS="-O3 -finline-functions -funswitch-loops" |
579 |
</pre> |
580 |
|
581 |
<p> |
582 |
However, <brite>this is not a smart thing to do</brite>. CFLAGS are filtered for |
583 |
a reason! When flags are filtered, it means that it is unsafe to build a package |
584 |
with those flags. Clearly, it is <e>not</e> safe to compile your whole system |
585 |
with <c>-O3</c> if some of the flags turned on by that level will cause problems |
586 |
with certain packages. Therefore, you shouldn't try to "outsmart" the developers |
587 |
who maintain those packages. <e>Trust the developers</e>. Flag filtering and |
588 |
replacing is done for your benefit! If an ebuild specifies alternative flags, |
589 |
then don't try to get around it. |
590 |
</p> |
591 |
|
592 |
<p> |
593 |
You will most likely continue to run into problems when you build a package with |
594 |
unacceptable flags. When you report your troubles on Bugzilla, the flags you use |
595 |
in <path>/etc/make.conf</path> will be readily visible and you will be told to |
596 |
recompile without those flags. Save yourself the trouble of recompiling by not |
597 |
using redundant flags in the first place! Don't just automatically assume that |
598 |
you know better than the developers. |
599 |
</p> |
600 |
|
601 |
</body> |
602 |
</section> |
603 |
<section> |
604 |
<title>What about LDFLAGS?</title> |
605 |
<body> |
606 |
|
607 |
<p> |
608 |
Don't use them. You may have heard that they can speed up application load times |
609 |
or reduce binary size, but in reality, LDFLAGS are more likely to make your |
610 |
applications stop working. They are not supported, and you can expect to have |
611 |
your bugs closed and marked INVALID if you report errors with packages while |
612 |
using LDFLAGS. At the very least you will have to recompile all affected |
613 |
packages without setting LDFLAGS. |
614 |
</p> |
615 |
|
616 |
</body> |
617 |
</section> |
618 |
</chapter> |
619 |
|
620 |
<chapter> |
621 |
<title>Resources</title> |
622 |
<section> |
623 |
<body> |
624 |
|
625 |
<p> |
626 |
The following resources are of some help in further understanding optimization: |
627 |
</p> |
628 |
|
629 |
<ul> |
630 |
<li> |
631 |
The <uri link="http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/">GNU gcc |
632 |
manual</uri> |
633 |
</li> |
634 |
<li> |
635 |
Chapter 5 of the <uri link="/doc/en/handbook/">Gentoo Installation |
636 |
Handbooks</uri> |
637 |
</li> |
638 |
<li><c>man make.conf</c></li> |
639 |
<li><uri link="http://en.wikipedia.org">Wikipedia</uri></li> |
640 |
<li> |
641 |
<uri link="http://www.coyotegulch.com/products/acovea/">Acovea</uri>, a |
642 |
benchmarking and test suite that can be useful for determining how different |
643 |
compiler flags interact and affect generated code, though its code |
644 |
suggestions are not appropriate for system-wide use. It is available in |
645 |
Portage: <c>emerge acovea</c>. |
646 |
</li> |
647 |
<li>The <uri link="http://forums.gentoo.org">Gentoo Forums</uri></li> |
648 |
</ul> |
649 |
|
650 |
</body> |
651 |
</section> |
652 |
</chapter> |
653 |
</guide> |