On Thu, 2005-12-15 at 13:48 +0100, Patrick Lauer wrote:
> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

-O2 -march=$your_cpu_family -pipe -fomit-frame-pointer

-pipe
    Use pipes rather than temporary files for communication between
    the various stages of compilation. This fails to work on some
    systems where the assembler is unable to read from a pipe; but
    the GNU assembler has no trouble.

-O also turns on -fomit-frame-pointer on machines where doing so does
not interfere with debugging.

(However, x86 is not one of these machines, so if you are not a
developer doing debugging, you can turn it on yourself for a slight
additional speed increase.)

-fomit-frame-pointer
    Don't keep the frame pointer in a register for functions that
    don't need one. This avoids the instructions to save, set up and
    restore frame pointers; it also makes an extra register
    available in many functions.

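As a rough sketch, these flags could be set in /etc/make.conf along the
following lines. The -march value (pentium4) is only a placeholder for
illustration and is not from this thread; substitute your own CPU family.

```shell
# /etc/make.conf fragment (sketch). make.conf uses shell variable syntax.
# ASSUMPTION: "pentium4" is a placeholder CPU family, not a recommendation.
CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer"
# Reuse the same flags for C++ so the two never drift apart.
CXXFLAGS="${CFLAGS}"
```

Defining CXXFLAGS in terms of CFLAGS means a later change only has to be
made in one place.
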
> - don't do anything with ASFLAGS, LDFLAGS. This causes weird random
> breakage (e.g. LDFLAGS="-O1" causes prelink to fail with "absurd"
> errors) and doesn't give a noticeable performance boost

Correct.

Also, running prelink can improve program startup speed at the cost of
some disk space.

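For reference, a minimal sketch of a full prelink run and its undo. The
wrapper function names and the DRY_RUN variable are made up for this
example; the prelink options used are -a (all binaries), -m (conserve
address space), -R (randomize base addresses) and -u (undo). The real
commands need to be run as root.

```shell
# Hypothetical wrappers around prelink; names and DRY_RUN are illustrative.
# With DRY_RUN=1 the command is only printed, not executed.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Prelink everything listed in /etc/prelink.conf.
prelink_all() { run_or_print prelink -amR; }

# Undo prelinking everywhere, restoring the original binaries.
prelink_undo_all() { run_or_print prelink -au; }

DRY_RUN=1 prelink_all        # prints: prelink -amR
```

Keeping the undo next to the run makes it easy to back out if prelinked
binaries start misbehaving.
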
> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

I typically use the same hdparm settings as listed in the Handbook:

disc0_args="-d1 -A1 -m16 -u1 -a64 -c1"
cdrom0_args="-d1 -c1"

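To check whether DMA is actually on, the output of "hdparm -d" can be
inspected. The helper below is only a sketch: the function name dma_is_on
and the device /dev/hda are assumptions for illustration, and it expects
hdparm's usual "using_dma = 1 (on)" line on stdin.

```shell
# Hypothetical helper: succeeds if hdparm -d output on stdin reports DMA on.
dma_is_on() {
    grep -Eq 'using_dma[[:space:]]*=[[:space:]]*1'
}

# Usage on a real system (as root; /dev/hda is an assumed device name):
#   hdparm -d /dev/hda | dma_is_on || hdparm -d1 /dev/hda

# Self-contained demonstration with canned output:
sample=' using_dma     =  1 (on)'
if printf '%s\n' "$sample" | dma_is_on; then
    echo "DMA enabled"
fi
```

For the settings quoted above: -d1 enables DMA, -A1 the drive's
read-lookahead, -m16 sets multiple-sector I/O to 16 sectors, -u1 unmasks
interrupts during disk I/O, -a64 sets filesystem readahead to 64 sectors,
and -c1 enables 32-bit I/O.
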
> - kernel tweaks like preempt may increase the responsiveness of the
> system, but often reduce throughput and may have unexpected sideeffects
> like random audio stutter as well as random kernel crashes ;-)

This is especially true on non-x86 architectures.

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks except microbenchmarks (can create 1M files 10% faster!!!!! -
> yes, but how does it behave with a normal workload?)

CFQ is much worse for a desktop system. I tend to like deadline for
playing games. These kernel settings can probably make a bit more
difference than a new -fomg-itsofast-and-broken-math added to CFLAGS.

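On 2.6 kernels that support switching schedulers at runtime, the active
I/O scheduler is shown in brackets in /sys/block/<dev>/queue/scheduler
and can be changed by writing a name to that file. A minimal sketch; the
function name active_scheduler, the device hda and the swappiness value
are assumptions for illustration:

```shell
# Hypothetical helper: print the bracketed (active) scheduler from a line
# such as "noop anticipatory deadline [cfq]" read on stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a real system (as root; hda is an assumed device name):
#   active_scheduler < /sys/block/hda/queue/scheduler
#   echo deadline > /sys/block/hda/queue/scheduler
#   cat /proc/sys/vm/swappiness       # current swappiness (0-100)
#   sysctl -w vm.swappiness=30        # 30 is an arbitrary example value

# Self-contained demonstration with canned input:
echo "noop anticipatory deadline [cfq]" | active_scheduler
```

Both settings are lost on reboot unless they are reapplied from a boot
script or, for the scheduler, set with the elevator= kernel parameter.
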
> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

Well, reiserfs 3 isn't so bad on architectures where it doesn't vomit
all over itself immediately. Also, reiserfs loses much of its luster if
you're running ext3 with dir_index. There was a tip in the GWN about
turning on dir_index on an already formatted file system. If formatting
a new one, just use mkfs.ext2 -j -O dir_index /dev/$whatever to create
your file system.

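Enabling dir_index on an existing ext3 file system is commonly done with
tune2fs, followed by a forced e2fsck with -D to rebuild the directory
indexes, on an unmounted file system. This is a sketch and not
necessarily what the GWN tip said; the function name has_dir_index is
made up, and /dev/$whatever is the same placeholder as above.

```shell
# Hypothetical helper: succeeds if `tune2fs -l` output on stdin lists
# dir_index among the file system features.
has_dir_index() {
    grep -Eq '^Filesystem features:.*[[:space:]]dir_index([[:space:]]|$)'
}

# Usage on a real, unmounted file system (as root):
#   tune2fs -l /dev/$whatever | has_dir_index ||
#       { tune2fs -O dir_index /dev/$whatever && e2fsck -fD /dev/$whatever; }

# Self-contained demonstration with canned output:
sample='Filesystem features:      has_journal dir_index filetype sparse_super'
if printf '%s\n' "$sample" | has_dir_index; then
    echo "dir_index is on"
fi
```

Setting the feature flag alone only affects directories created
afterwards; the e2fsck -D pass is what indexes the existing ones.
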
> Are there any application-specific tweaks (e.g. "use the prefork MPM
> with apache2")? What is known to break things, what has usually
> beneficial behaviour? Are there any useful benchmarks that show the
> performance difference between different settings?

Well, turning on SBA and Fast Writes on Nvidia AGP cards always helps. As
for benchmarks, I think the issue is that it depends entirely on usage.
Having something that is 30% faster on paper isn't very useful if you
never do it the way the benchmark does. I wish I had more
numbers/examples here, but there isn't really much in the way of decent
benchmarks published and readily available. Hopefully some other people
will know of more of them than I do.

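With the proprietary NVIDIA driver, SBA and Fast Writes are normally
switched on through module options; both are off by default. A sketch
only: NVreg_EnableAGPSBA and NVreg_EnableAGPFW are the option names from
the driver README of this era, while the function name and the file
location mentioned in the comment are assumptions.

```shell
# Hypothetical helper: print a modprobe options line enabling AGP
# sideband addressing (SBA) and Fast Writes (FW) for the nvidia module.
nvidia_agp_options() {
    echo "options nvidia NVreg_EnableAGPSBA=1 NVreg_EnableAGPFW=1"
}

# On Gentoo of this era the line would typically go into a file under
# /etc/modules.d/ (assumed path), followed by modules-update and a
# reload of the nvidia module.
nvidia_agp_options
```

The options only take effect when the driver's own AGP support is in
use and the motherboard chipset supports the feature.
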
--
Chris Gianelloni
Release Engineering - Strategic Lead
x86 Architecture Team
Games - Developer
Gentoo Linux