Patrick Lauer posted <1134650885.4634.57.camel@localhost>, excerpted
below, on Thu, 15 Dec 2005 13:48:05 +0100:

> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.

This really belongs on user, or perhaps on the appropriate purposed list,
desktop or hardened or whatever, not on devel. That said, some
comments... (I can't resist. <g>)

> Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> make things unstable, which is of course not what we want. The "easy"
> way out would be buying faster hardware, but that is usually not an
> option ;-)
>
> So ... what can be done to get the stable maximum out of your hardware?
>
> In my experience (x86 centric - do other arches have different
> "problems"?) the following is stable, but not necessarily the optimum:

The general rules are the same, but there are architectural differences
that often change the details. I /think/ it is MIPS that has extremely
slow i/o (I saw that mentioned in the split-kde-ebuilds debate; they said
it could cause compile times to double -- a big thing for something as big
as KDE). x86 (32-bit) has a relatively small number of CPU registers
compared to most other archs (amd64 in 64-bit mode doubles the
general-purpose registers from eight to sixteen, tho 32-bit mode keeps
the original eight for compatibility reasons), and this has a big effect
on register use strategy.

That said, in the general case, the -march switch normally chooses pretty
good defaults for the target arch. Straying far from those defaults,
other than to cover special cases or via the general -Ox optimization
switches, is therefore often counterproductive and/or problematic.

> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

A lot of folks don't realize the effect of cache memory on optimizations.
I'll be brief here, but particularly for things like the kernel that stay
in memory, -Os can at times work wonders, because it means more of the
working set stays in a cache closer to the CPU, and the additional speed
in retrieving that code far outweighs the optimizations sacrificed to
shrink it. Conversely, media streaming or encoding apps are constantly
throwing out old data and fetching new data, and the optimizations are
often more effective for them, so they work better with -O2 or even -O3.

There have been occasional problems with -Os, generally early in a gcc
release series, because it isn't used as much and so gets less testing.
However, I run -Os here (amd64) by default, and haven't seen any issues
that went away if I reverted to -O2, over the couple years I've been
running Gentoo. (Actually, that has been the case even when I've edited
ebuilds to remove their strip-flags calls and the like. Glibc and xorg
both call strip-flags, which removes -Os. xorg seemed to benefit here
from -Os after I removed the strip-flags call, while glibc worked but
seemed slower. Note that editing ebuilds means if it breaks, you get to
keep the pieces!)

For gcc, -pipe doesn't improve program optimization, but will make
compiling faster. -fomit-frame-pointer makes smaller applications if
you aren't debugging. Those are both common enough to be fairly safe.
-frename-registers and -fweb may also be useful. (-fweb ceases to be so on
gcc4, however, because it is implemented differently.) -funit-at-a-time
(new to gcc-3.4, so don't try it with gcc-3.3) may also be worth looking
into, altho it's already enabled by -Os. These latter flags are less
commonly used, however, thus less well tested, and may therefore cause
very occasional problems. (-funit-at-a-time was known to do so early in
the 3.4 cycle, but those issues should have been dealt with long ago by
now.) I consider those /reasonably/ conservative, and it's what I run.
If I were running a server, however, I'd probably only run -O2 and the
first two (-pipe and -fomit-frame-pointer).
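
As a rough sketch, the flags discussed above might land in
/etc/make.conf something like this. The -march=k8 value is only my
assumption for an Athlon 64 box; substitute whatever matches your own
CPU. Note that gcc spells the frame-pointer flag in the singular.

```shell
# /etc/make.conf fragment (a sketch, not a recommendation for every box)
# ASSUMPTION: -march=k8 is for an Athlon 64; use your own CPU family.

# Desktop variant: size-optimized plus the less common flags.
# -funit-at-a-time is listed for clarity, altho -Os already enables it.
CFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer -frename-registers -fweb -funit-at-a-time"

# Conservative server variant: uncomment to use instead.
# CFLAGS="-march=k8 -O2 -pipe -fomit-frame-pointer"

# Keep C++ builds on the same flags as C builds.
CXXFLAGS="${CFLAGS}"
```

After changing CFLAGS, only packages merged afterward pick up the new
flags; nothing already installed is rebuilt automatically.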

Do some research on -Os, in any case. It could be well worth your time.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

This suggestion does involve hardware, but not at a really heavy cost,
and the performance boost may be worth it. Consider running a RAID
system. I recently switched to RAID, a four-disk setup, raid1/mirrored
for /boot, raid6 (for redundancy) for most of the system, raid0/striped
(for speed) for /tmp, the portage dir, etc, stuff that was either
temporary anyway, or could easily be redownloaded. (Swap can also be
striped: create equal-sized partitions on each disk and give them equal
priority in fstab.) I was very pleasantly surprised at how much of a
difference it made!
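
For the striped swap, a minimal sketch of the fstab side follows. The
kernel allocates pages round-robin across swap areas that share the same
priority, which is what gives the striping effect. The device names are
hypothetical, and the helper function is just for illustration.

```shell
# Emit one fstab line per swap partition, all at the same priority,
# so the kernel spreads swap pages evenly across them.
swap_fstab_lines() {
    for part in "$@"; do
        printf '%s\tnone\tswap\tsw,pri=1\t0 0\n' "$part"
    done
}

# HYPOTHETICAL device names; substitute your own swap partitions.
swap_fstab_lines /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
```

The printed lines can be pasted into /etc/fstab by hand; the sketch
deliberately doesn't touch the file itself.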

Cost, as I said, is reasonable, particularly if you have disks lying
around or can buy them used. Even buying say three 80-gig drives and
doing what I did only with a raid5 is reasonable, at the price of hard
drives these days. Unfortunately, if your board is still PATA, you can
only run a single disk per IDE channel or it bogs down, so you may need to
buy a PCI IDE expansion board, which will add to the cost. If you have
onboard SATA and are buying new disks, so can buy SATA anyway (my case),
that should do just fine, as SATA runs a dedicated channel to each
drive. SCSI is a higher cost option, ruled out here, but SATA
works very nicely, certainly so for me.
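
To make the four-disk layout concrete, here is a dry-run sketch. It
assumes Linux software RAID managed with mdadm, which is only one way to
build such a setup, and the partition names are hypothetical. The
wrapper just prints the commands rather than running them.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# Drop the "echo" only once the device names match your system
# (the real commands need root and destroy existing data).
run() { echo "$@"; }

# HYPOTHETICAL partitions: sdX1 for /boot, sdX3 for the main system,
# sdX4 for scratch space (/tmp, the portage dir, and the like).
run mdadm --create /dev/md0 --level=1 --raid-devices=4 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1   # mirrored /boot
run mdadm --create /dev/md1 --level=6 --raid-devices=4 \
    /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3   # redundant main system
run mdadm --create /dev/md2 --level=0 --raid-devices=4 \
    /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4   # striped scratch space
```

With three disks instead of four, the middle array would use --level=5
and --raid-devices=3, matching the raid5 variant mentioned above.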

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks

Again, a reasonable new-hardware suggestion. When purchasing a new system
or considering an upgrade, more memory is often the most effective
optimization you can make (with the raid suggestion above very close to
it). Slower CPU and more memory, up to a gig or so, is almost always
better than the reverse, because hard drive access is WAYYY slower than
even cheap/slow memory. At a gig of memory, running with swap disabled is
actually a practical option, altho it might not be faster and there are
certain memory zone management considerations. Usual X/KDE desktop usage
will run perhaps a third of a gig. That leaves half to 2/3 gig for cache,
which is "comfortable". Naturally, if you take the RAID suggestion above,
this one isn't quite as critical, because drive latency will be lower so
reliance on swap isn't as painful, and a big cache not nearly as critical
to good performance. A gig to two gig can still be useful, but the
cost/performance tradeoff isn't as good, and the money will likely be
better spent elsewhere.

Note that with a gig of memory and a striped swap, I have swappiness upped
to 100 to force the most unused app memory to swap, and I literally can't
tell when it starts swapping at all, except by watching the used swap
graph on ksysguard. I see none at all of the slowdowns I had previously
associated with swapping, back when I had a single drive and a half-gig of
memory.
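
For anyone wanting to try the swappiness setting, a small sketch
follows. It only reads the current value; the commands that change it
need root, so they are left as comments. The value 100 is what I run,
not a general recommendation.

```shell
# Report the current swappiness (0-100) if the kernel exposes it.
swappiness_file=/proc/sys/vm/swappiness
if [ -r "$swappiness_file" ]; then
    echo "current swappiness: $(cat "$swappiness_file")"
else
    echo "no swappiness setting found at $swappiness_file"
fi

# To change it until the next reboot (as root):
#   sysctl -w vm.swappiness=100
# To make it stick across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 100
```

Higher values make the kernel more willing to swap out idle application
memory in favor of cache; lower values do the opposite.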

> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

I run reiserfs here on everything. However, some don't consider it
extremely stable. I keep second-copy partitions as backups of stuff I
want to ensure is safe, for that reason and others (fat-finger deleting,
anyone?). Bottom line, reiserfs is certainly safe "enough", if you have a
decent backup system in place, and you follow it regularly, as you should.
I can't see how anyone can reasonably disagree with that, filesystem
religious zealotry or not.

In any case, note that you can simply redownload your portage tree anyway,
and with the speed and size benefits of reiserfs (size only if you don't
have notail in your config), even those least likely to trust the
integrity of reiserfs should see the benefit of putting the portage tree
on it. /tmp and/or /var/tmp may equally benefit, for the same reasons. An
exception might be if you regularly put huge files (700 meg CD and
multi-gig DVD images to burn would be one example) on the partition. In
that case, jfs or xfs (don't remember which, but one's optimized for large
files) might be preferable.

As I said, I run reiserfs for everything here, but I also have backup
images of stuff I know I want to keep.

> Are there any application-specific tweaks

As I mentioned, -O3 is often best for multimedia stuff,
encoders/decoders/streamers and the like, while -O2, or often, -Os, is
better for most things.


--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html


--
gentoo-dev@g.o mailing list