Hi Jeroen,

To make a long story short (I'll tell it anyway for the archives' sake),
the problem seems solved so far. After recompiling the kernel with 100Hz
ticks I get a slight increase in latency when doing the same things as
yesterday on the server, but it feels just as responsive as before.
Lacking any benchmarks from before the change, that's the best measure I
have. It might have been the firmware upgrade, but I doubt it. If I have
to reboot any time soon, maybe I'll do another test to verify it.

On Wednesday, 2006-04-19 at 20:42:09, you wrote:
> >The slowness is the same on SuSE and Gentoo based clients. The previous
> >installation handled the same thing without any problems, which I'd
> >certainly expect from a dual Xeon @ 3 GHz with 4 GB RAM, a Compaq
> >SmartArray 642 U320 host adapter and some 200 GB in a RAID5, connected
> >to the clients via GBit ethernet.
> >
> RAID-5? Ouch.
> RAID-10 offers a much better raw performance; since individual mirrors
> are striped, you get at least 4/3 the seek performance of a 4-disk

Yeah, but also at 2/3 the capacity. I know RAID5 isn't exactly
top-notch, but as long as the controller takes care of the checksumming
and distribution and the CPU doesn't have to, it's good enough for our
site. That's mostly students doing their exercises, web browsing, some
programming, usually all with small datasets. The biggest databases are
about two gigs and the disks write at just above 40 MB/s.

> >Definitely not good for GBit, but not so bad either considering it will
> >have taken half a minute just to open that file. The file is complete
> >despite the I/O error but the error is definitely related to the server
> >load, it never happens normally (and I get 9-11s for the 100 MB).
> >
> LoadAvg of over 10 for I/O only? That is a serious problem.
> I repeat, that is a *problem*, not bad performance.

Huh? No, 9 to 11 seconds, i.e. ~10 MB/s. I don't see how this
benchmark could possibly bring my load up that much; after all, it's
just one process on the client and one on the server.

> Since you say the box has 4GB of RAM, what happens when you do a linear
> read of 2 or 3 GB of data, first uncached and then cached?
> That should not be affected by the I/O subsystem at all.

Writing gives me said 40 MB/s; reading it back (dd to /dev/null in 1 MB
chunks) is 32 MB/s uncached (*slower* than writes? Hm, controller
caching maybe...), ~850 MB/s cached.

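For the archives, a sketch of the kind of dd benchmark I mean -- the
file name and size below are placeholders, not my exact commands, and
dropping the page cache via /proc only exists from kernel 2.6.16 on (on
older kernels, just use a file larger than RAM):

```shell
# Example dd benchmark; adjust TESTFILE to point at the RAID filesystem.
TESTFILE="${TESTFILE:-/tmp/ddtest}"
COUNT="${COUNT:-64}"                  # in MB; use more than RAM to defeat caching

# write test (the array manages ~40 MB/s here)
dd if=/dev/zero of="$TESTFILE" bs=1M count="$COUNT" conv=fsync

# on >= 2.6.16 the page cache can be dropped explicitly (needs root)
[ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches || true

dd if="$TESTFILE" of=/dev/null bs=1M  # first pass: uncached (~32 MB/s here)
dd if="$TESTFILE" of=/dev/null bs=1M  # second pass: from cache (~850 MB/s here)
```

dd prints its throughput summary on stderr after each run; the second
read only tells you something if the whole file still fits in RAM.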
> Also, test your network speed by running netperf or ioperf between
> client and server.
> Get some baseline values for maximum performance first!

I didn't test it, as the only thing I changed was the server software and
it was just fine before. And it *is* fine as long as the server disks
aren't busy. Theoretically it could be that the Broadcom NIC driver
started sucking donkey balls in kernel 2.6, but ssh and stuff are just
fine and speedy (~30 MB/s for a single stream of zeroes).

> And more bla I don't understand about NFS - what about the basics?
> Which versions are the server and client running?
> Since both could run either v2 or v3 and in-kernel or userspace, that's
> 4 x 4 = 16 possible combinations right there - and that is assuming they
> both run the *same* minor versions of the NFS software.

It's v3, that's why I snipped the unused v2 portions of nfsstat output.
Both server and client are in-kernel (the client could only be
userspace via FUSE, right?) and the latest stable versions:
nfs-utils-1.0.6-r6, gentoo-sources-2.6.15-r1 on the client and
hardened-sources-2.6.14-r7 on the server.

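In case someone digs this out of the archives later: the negotiated
version is easy to double-check. These are the standard nfs-utils and
portmap tools; the server-side commands are meant to be run there, not
here:

```shell
# Client side: the live mount options in /proc/mounts show the version in use.
grep ' nfs ' /proc/mounts || echo "no NFS mounts on this machine"

# Server side:
#   rpcinfo -p     # lists which NFS versions the portmapper advertises
#   nfsstat -s     # per-version server call counters -- the v2 columns stay 0
```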
> >And one parameter I haven't tried to tweak is the IO scheduler. I seem
> >to remember a recommendation to use noop for RAID5 as the cylinder
> >numbers are completely virtual anyway so the actual head scheduling
> >should be left to the controller. Any opinions on this?
> >
> I have never heard of the I/O scheduler being able to influence or get
> data directly from disks.
> In fact, as far as I know that is not even possible with IDE or SCSI,
> which both have their own abstraction layers.
> What you probably mean is the way the scheduler is allowed to interface
> with the disk subsystem - which is solely determined by the disk
> subsystem itself.

OK, that was a bit misleading. I meant that the scheduler sees the disk
as a flat file and assumes that offsets in that file correspond roughly
linearly to cylinders (which is what it relies on to implement things
like the elevator algorithm). Those assumptions are virtually always
right for simple drives but may not be for a RAID.

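On recent 2.6 kernels the scheduler can at least be inspected and
switched per-device at runtime via sysfs, so trying noop wouldn't even
need a reboot. The device name below is a guess for a cciss array (the
'/' in cciss/c0d0 becomes '!' under /sys/block), and writing needs root:

```shell
# Hypothetical device name -- adjust to whatever /sys/block actually lists.
SCHED_FILE="/sys/block/cciss!c0d0/queue/scheduler"

if [ -e "$SCHED_FILE" ]; then
    current=$(cat "$SCHED_FILE")   # the active scheduler is the one in [brackets]
    echo noop > "$SCHED_FILE"      # leave request ordering to the controller
else
    current="(no cciss device on this machine)"
fi
echo "$current"
```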
> I'd recommend reading the specs for the RAID controller - twice.
> Also dive into the module source if you're up for it - it can reveal a
> lot more than just plugging it in and adding disks.

Ugh... I read the O'Reilly book on Linux device drivers, so I know some
of the basics (up to kernel 2.4, that is), but I'd rather not touch the
200+ KB of the cciss source as my first real project, especially not
when my only test hardware is the production server...

cheers!
Matthias
--
I prefer encrypted and signed messages. KeyID: FAC37665
Fingerprint: 8C16 3F0A A6FC DF0D 19B0 8DEF 48D9 1700 FAC3 7665