I just switched our old SuSE-based server to Gentoo (2.6.14-hardened-r7)
and am experiencing some problems, the most annoying of which is
abysmally bad NFS performance when the server is even moderately loaded:
| msbethke ~ $ time (touch x; rm x)
|
| real    0m59.841s
| user    0m0.008s
| sys     0m0.036s
This is on a client, with the server unpacking a 6 GB gzip file.
The slowness is the same on SuSE- and Gentoo-based clients. The previous
installation handled the same thing without any problems, which I'd
certainly expect from a dual Xeon @ 3 GHz with 4 GB RAM, a Compaq
SmartArray 642 U320 host adapter and some 200 GB in a RAID5, connected
to the clients via GBit ethernet.

To test whether only open/create/remove operations are affected, I tried
dd:
| msbethke ~ $ time dd if=/dev/zero of=test bs=1M count=100
| dd: closing output file `test': Input/output error
|
| real    0m50.500s
| user    0m0.012s
| sys     0m1.136s
| msbethke ~ $ ll test
| -rw------- 1 msbethke users 104857600 2006-04-19 18:17 test
Definitely not good for GBit, but not so bad either, considering it will
have taken half a minute just to open that file. The file is complete
despite the I/O error, but the error is definitely related to the server
load: it never happens normally (and I get 9-11 s for the 100 MB).
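
To separate the open/create/remove latency from the raw write throughput,
something like the following could be run on a client while the server is
busy. This is only a sketch: the function name and the test file name are
made up for illustration, and the timing has one-second resolution.

```shell
#!/bin/sh
# Time COUNT create/remove cycles in DIR, so that metadata latency can be
# looked at separately from bulk data throughput (which dd measures).
# Usage: create_remove_loop DIR [COUNT]
create_remove_loop() {
    dir=$1
    count=${2:-10}
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # Hypothetical scratch file name; the PID keeps parallel runs apart.
        f="$dir/nfs-latency-test.$$.$i"
        touch "$f" && rm "$f" || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$count create/remove cycles in $((end - start)) s"
}
```

Calling it as "create_remove_loop ~ 10" in an NFS-mounted home directory, once
with the server idle and once under load, should show whether the per-file
operations alone account for the delay.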

I have 16 nfsd processes running, but the problem is there even if only a
single client is active. nfsstat on the server shows a huge number of
read operations (I have never used it before; is that too much for a server
that's been running under very moderate load from half a dozen clients
that were doing mostly word processing and programming?) but otherwise
it looks fine to me:
| # nfsstat -s
| Server rpc stats:
| calls          badcalls       badauth        badclnt        xdrcall
| 433459545      0              0              0              0
| [snip unused NFSv2]
| Server nfs v3:
| null           getattr        setattr        lookup         access         readlink
| 846        0%  6556617    1%  24798      0%  332257     0%  1258235    0%  855        0%
| read           write          create         mkdir          symlink        mknod
| 424885404 98%  148929     0%  11172      0%  30         0%  137        0%  10         0%
| remove         rmdir          rename         link           readdir        readdirplus
| 8882       0%  11         0%  6539       0%  2418       0%  889        0%  33264      0%
| fsstat         fsinfo         pathconf       commit
| 2919       0%  1209       0%  0          0%  19881      0%

On the client, however, I get some retransmissions and very strange
read/write values compared to the server's. I thought of a 32-bit overflow,
but the counter is obviously wider, since I can drive it beyond 2^32 on the
server. Here's the client:
| # nfsstat -c
| Client rpc stats:
| calls          retrans        authrefrsh
| 2691313        1493           0
| [snip unused NFSv2]
| Client nfs v3:
| null           getattr        setattr        lookup         access         readlink
| 0          0%  2292223   85%  848        0%  76917      2%  260072     9%  96         0%
| read           write          create         mkdir          symlink        mknod
| 3595       0%  48255      1%  930        0%  4          0%  5          0%  0          0%
| remove         rmdir          rename         link           readdir        readdirplus
| 1076       0%  0          0%  1265       0%  292        0%  0          0%  3055       0%
| fsstat         fsinfo         pathconf       commit
| 2067       0%  141        0%  0          0%  331        0%
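
For scale, the retransmission rate implied by the client rpc counters can be
worked out as below. The helper name retrans_pct is made up for this sketch.
With the figures above it comes to roughly 0.055% of all calls. The counters
are cumulative, so a low overall figure does not rule out bursts of
retransmissions while the server is loaded.

```shell
#!/bin/sh
# Percentage of RPC calls that had to be retransmitted.
# Usage: retrans_pct CALLS RETRANS
retrans_pct() {
    awk -v calls="$1" -v retrans="$2" \
        'BEGIN { if (calls > 0) printf "%.3f\n", 100 * retrans / calls; else print "n/a" }'
}

# "calls" and "retrans" from the nfsstat -c output above.
retrans_pct 2691313 1493
```

Comparing the value before and after a loaded period (rather than the running
total) would be the more telling measurement.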

I noticed a few things about the setup: the SA 642 adapter still has a
stone-age firmware, V1.30, but we never saw a need to upgrade as it
worked nicely with the kernel 2.4.21 cciss driver. Any known issues with
the 2.6 kernel with this one? I just flashed the latest version and will
try rebooting tonight.
Another one is that the kernel is still set to 250 Hz ticks, which was
fine on the P4/HT test system where I built it. Would this really suck
so badly on a real SMP machine? Anyway, 100 Hz should be fine for a
server.
And one parameter I haven't tried to tweak is the IO scheduler. I seem
to remember a recommendation to use noop for RAID5, as the cylinder
numbers are completely virtual anyway, so the actual head scheduling
should be left to the controller. Any opinions on this?
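
For anyone wanting to try the scheduler change, this is roughly how the
active scheduler can be read from sysfs on a 2.6 kernel. It is a sketch, not
something tested on this machine: the device name cciss!c0d0 is an assumption
about how the first SmartArray volume appears under /sys/block, and the
function name is made up.

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs "scheduler" file.
# The kernel marks the active one with brackets, e.g.
#   noop anticipatory deadline [cfq]
# Usage: active_scheduler FILE
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Assumed device name; adjust to whatever shows up under /sys/block.
SCHED_FILE='/sys/block/cciss!c0d0/queue/scheduler'
if [ -r "$SCHED_FILE" ]; then
    active_scheduler "$SCHED_FILE"
    # To switch at runtime (as root): echo noop > "$SCHED_FILE"
fi
```

The same choice can be made for all devices at boot with the elevator=noop
kernel parameter, which avoids having to redo the change after each reboot.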

cheers & TIA
Matthias
--
I prefer encrypted and signed messages. KeyID: FAC37665
Fingerprint: 8C16 3F0A A6FC DF0D 19B0 8DEF 48D9 1700 FAC3 7665