Gentoo Archives: gentoo-dev

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: Optimizing performance
Date: Thu, 15 Dec 2005 14:53:47
Message-Id: pan.2005.12.15.14.43.40.649371@cox.net
In Reply to: [gentoo-dev] Optimizing performance by Patrick Lauer
1 Patrick Lauer posted <1134650885.4634.57.camel@localhost>, excerpted
2 below, on Thu, 15 Dec 2005 13:48:05 +0100:
3
4 > I was wondering if there are any sane ways to optimize the performance
5 > of a Gentoo system.
6
7 This really belongs on user, or perhaps on the appropriate special-purpose list,
8 desktop or hardened or whatever, not on devel. That said, some
9 comments... (I can't resist. <g>)
10
11 > Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
12 > make things unstable, which is of course not what we want. The "easy"
13 > way out would be buying faster hardware, but that is usually not an
14 > option ;-)
15 >
16 > So ... what can be done to get the stable maximum out of your hardware?
17 >
18 > In my experience (x86 centric - do other arches have different
19 > "problems"?) the following is stable, but not necessarily the optimum:
20
21 The general rules are the same, but there are architectural differences
22 that often change the details. I /think/ it was MIPS that has extremely
23 slow i/o (I saw that mentioned in the split-kde-ebuilds debate, they said
24 it could cause compile times to double -- a big thing for something as big
25 as KDE). x86 (32-bit) has a relatively small number of CPU registers,
26 compared to most other archs (amd64 in 64-bit mode increased the number
27 dramatically, tho it's the same for 32-bit mode for compatibility
28 reasons), and this has a big effect on register use strategy.
29
30 That said, in the general case, the -march switch normally chooses pretty
31 good defaults for the target arch. Modifying them a whole lot from that,
32 other than to cover special cases, or with the general -Ox optimization
33 switches, is therefore often counterproductive and/or problematic.
34
35 > - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
36 > average the best, -O3 is often slower and can cause bugs
37
38 A lot of folks don't realize the effect of cache memory on optimizations.
39 I'll be brief here, but particularly for things like the kernel that stay
40 in memory, -Os can at times work wonders, because it means more of the
41 working set stays in a cache closer to the CPU, and the additional speed
42 in retrieving that code far outweighs the compromises made to
43 optimizations to shrink it to size. Conversely, media streaming or
44 encoding apps are constantly throwing out old data and fetching new data,
45 and the optimizations are often more effective for them, so they work
46 better with -O2 or even -O3.
47
48 There have been occasional problems with -Os, generally because it isn't
49 used as much and gets less testing, mostly early in a gcc release series.
50 However, I run -Os here (amd64) by default, and haven't seen any issues
51 that went away if I reverted to -O2, over the couple years I've been
52 running Gentoo. (Actually, that has been the case, even when I've edited
53 ebuilds to remove their stripflags calls and the like. Glibc and xorg
54 both stripflags including -Os. xorg seemed to benefit here from -Os after
55 I removed the stripflags call, while glibc worked but seemed slower. Note
56 that editing ebuilds means if it breaks, you get to keep the pieces!)
57
58 For gcc, -pipe doesn't improve program optimization, but will make
59 compiling faster. -fomit-frame-pointer makes smaller applications if
60 you aren't debugging. Those are both common enough to be fairly safe.
61 -frename-registers and -fweb may also be useful. (-fweb ceases to be so on
62 gcc4, however, because it is implemented differently.) -funit-at-a-time
63 (new to gcc-3.4, so don't try it with gcc-3.3) may also be worth looking
64 into, altho it's already enabled by -Os. These latter flags are less
65 commonly used, however, thus less well tested, and may therefore cause
66 very occasional problems. (-funit-at-a-time was known to do so early in
67 the 3.4 cycle, but those issues should have been long ago dealt with by
68 now.) I consider those /reasonably/ conservative, and it's what I run.
69 If I were running a server, however, I'd probably only run -O2 and the
70 first two (-pipe and -fomit-frame-pointer).
71
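As a rough illustration of the flag sets described above, here is a minimal make.conf-style sketch. The -march=k8 value is only an assumed placeholder for an amd64 CPU, and the CFLAGS_SERVER / CFLAGS_DESKTOP helper names are hypothetical, used here just to keep the two sets apart.

```shell
# Sketch of /etc/make.conf flag settings (make.conf accepts shell-style assignments).
# ASSUMPTION: -march=k8 is a placeholder for an amd64 CPU; substitute your own target.
MARCH="-march=k8"

# Conservative set, as suggested above for a server: -O2 plus the two common flags.
# Note that gcc spells the flag -fomit-frame-pointer (no trailing "s").
CFLAGS_SERVER="-O2 ${MARCH} -pipe -fomit-frame-pointer"

# Size-optimized set along the lines described above. -funit-at-a-time is left out
# because the text notes that -Os already enables it (gcc 3.4 or later).
CFLAGS_DESKTOP="-Os ${MARCH} -pipe -fomit-frame-pointer -frename-registers -fweb"

# Pick one set; CXXFLAGS normally just follows CFLAGS.
CFLAGS="${CFLAGS_DESKTOP}"
CXXFLAGS="${CFLAGS}"
```

Keeping the set short makes it easier to rule the flags out when a package misbehaves: switch CFLAGS to the conservative set, rebuild the package, and see whether the problem goes away.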
72 Do some research on -Os, in any case. It could be well worth your time.
73
74 > - check that all IDE disks use DMA mode, otherwise they are limited to
75 > ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
76 > increasing the readahead with hdparm gives a huge throughput boost.
77
78 This suggestion does involve hardware, but not a real heavy cost, and the
79 performance boost may be worth it. Consider running a RAID system. I
80 recently switched to RAID, a four-disk setup, raid1/mirrored for /boot,
81 raid6 (for redundancy) for most of the system, raid0/striped (for speed)
82 for /tmp, the portage dir, etc, stuff that was either temporary anyway, or
83 could easily be redownloaded. (Swap can also be striped, set equal
84 partitions on each disk and set equal priority for them in fstab.) I was
85 very pleasantly surprised at how much of a difference it made!
86
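The striped-swap setup mentioned above comes down to giving every swap partition the same pri= value in fstab, so the kernel allocates swap pages round-robin across them. A small sketch that prints such entries follows; the function name swap_fstab_lines and the device names are hypothetical placeholders, not taken from the setup described.

```shell
# Print /etc/fstab swap entries that all share one priority, so the kernel
# spreads swap pages across the listed partitions.
# ASSUMPTION: the device names below are placeholders; use your own partitions.
swap_fstab_lines() {
    priority="$1"
    shift
    for dev in "$@"; do
        # fields: device, mount point, type, options, dump, pass
        printf '%s\tnone\tswap\tsw,pri=%s\t0 0\n' "$dev" "$priority"
    done
}

# One equally sized swap partition per disk, all at priority 1.
swap_fstab_lines 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
```

The output is meant to be reviewed and pasted into /etc/fstab by hand; the script itself does not touch any device or file.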
87 Cost, as I said, is reasonable, particularly if you have disks lying
88 around or can buy them used. Even buying say three 80-gig drives and
89 doing what I did only with a raid5 is reasonable, at the price of hard
90 drives these days. Unfortunately, if your board is still PATA, you can
91 only run a single disk per IDE channel or it bogs down, so you may need to
92 buy a PCI IDE expansion board which will add to the cost. If you have
93 onboard SATA and are buying new disks so can buy SATA anyway (my case),
94 that should do just fine, as SATA runs a dedicated channel to each
95 drive anyway. SCSI is a higher cost option, ruled out here, but SATA
96 works very nicely, certainly so for me.
97
98 > - kernel tweaks like setting swappiness or using a different I/O
99 > scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
100 > benchmarks
101
102 Again, a reasonable new-hardware suggestion. When purchasing a new system
103 or considering an upgrade, more memory is often the most effective
104 optimization you can make (with the raid suggestion above very close to
105 it). Slower CPU and more memory, up to a gig or so, is almost always
106 better than the reverse, because hard drive access is WAYYY slower than
107 even cheap/slow memory. At a gig of memory, running with swap disabled is
108 actually a practical option, altho it might not be faster and there are
109 certain memory zone management considerations. Usual X/KDE desktop usage
110 will run perhaps a third of a gig. That means half to 2/3 gig for cache,
111 which is "comfortable". Naturally, if you take the RAID suggestion above,
112 this one isn't quite as critical, because drive latency will be lower so
113 reliance on swap isn't as painful, and a big cache not nearly as critical
114 to good performance. A gig to two gig can still be useful, but the
115 cost/performance tradeoff isn't as good, and the money will likely be
116 better spent elsewhere.
117
118 Note that with a gig of memory and a striped swap, I have swappiness upped
119 to 100 to force the most unused app memory to swap, and I literally can't
120 tell when it starts swapping at all, except by watching the used swap
121 graph on ksysguard. I see none of the slowdowns I had previously
122 associated with swapping, back when I had a single drive and a half-gig of
123 memory.
124
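For anyone wanting to try the swappiness setting mentioned above, a minimal sketch follows. It only reads the current value and prints the line that would make the setting persistent; the variable names are hypothetical, and actually changing the value requires root.

```shell
# Show the current value, if the kernel exposes it (2.6 kernels default to 60).
swappiness_file=/proc/sys/vm/swappiness
if [ -r "$swappiness_file" ]; then
    echo "current vm.swappiness: $(cat "$swappiness_file")"
fi

# Line to add to /etc/sysctl.conf so the setting survives a reboot.
sysctl_line="vm.swappiness = 100"
echo "$sysctl_line"

# To apply it immediately instead, run as root:
#   sysctl -w vm.swappiness=100
```

A value of 100 only makes sense with plenty of memory and fast (here, striped) swap, as in the setup described; on a single slow disk it may well make things worse.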
125 > - using a "smarter" filesystem can dramatically improve performance at
126 > the potential cost of reliability. As data on FS reliability is hard to
127 > find from unbiased sources this becomes a religious issue ... migrating
128 > from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
129 > reiserfs sustainable?
130
131 I run reiserfs here on everything. However, some don't consider it
132 extremely stable. I keep second-copy partitions as backups of stuff I
133 want to ensure is safe, for that reason and others (fat-finger deleting,
134 anyone?). Bottom line, reiserfs is certainly safe "enough", if you have a
135 decent backup system in place, and you follow it regularly, as you should.
136 I can't see how anyone can reasonably disagree with that, filesystem
137 religious zealotry or not.
138
139 In any case, note that you can simply redownload your portage tree anyway,
140 and with the speed and size benefits of reiserfs (size only if you don't
141 have notail in your config), even the ones least likely to trust the
142 integrity of reiserfs should see the benefit of putting your portage tree
143 on it. /tmp and/or /var/tmp may equally benefit, for the same reasons. An
144 exception might be if you regularly put huge files (700 meg CD and
145 multi-gig DVD images to burn, would be one example) on the partition. In
146 that case, jfs or xfs (don't remember which, but one's optimized for large
147 files) might be preferable.
148
149 As I said, I run reiserfs for everything here, but I also have backup
150 images of stuff I know I want to keep.
151
152 > Are there any application-specific tweaks
153
154 As I mentioned, -O3 is often best for multimedia stuff,
155 encoders/decoders/streamers and the like, while -O2, or often, -Os, is
156 better for most things.
157
158
159 --
160 Duncan - List replies preferred. No HTML msgs.
161 "Every nonfree program has a lord, a master --
162 and if you use the program, he is your master." Richard Stallman in
163 http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
164
165
166 --
167 gentoo-dev@g.o mailing list
