Gentoo Archives: gentoo-server

From: Ramon van Alteren <ramon@××××××××××.nl>
To: gentoo-server@l.g.o
Subject: [gentoo-server] NFS share hangs, NFS server accessible
Date: Thu, 08 Dec 2005 18:06:29
Message-Id: 439875C7.3040204@vanalteren.nl
1 Hi,
2
3 I've been having problems with the NFS-mount on one of my loadbalancers
4 and I can't seem to find a good solution,
5 so I thought I might ask here for any (debugging) tips.
6
7 Currently I mount our tools&scripts repository over NFS on all our servers.
8 I'm using the following mount options for that:
9 timeo=16,intr,lock,rsize=16384,wsize=16384,tcp
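
For readers less familiar with these options, here is a sketch of how they might appear as an /etc/fstab entry. The server name "nfsserver" and the mount point "/mnt/tools" are placeholders, since the post names neither; only the export path and the option string come from the post.

```text
# <server>:<export>           <mount point>  <type>  <options>                                        <dump> <pass>
# "nfsserver" and "/mnt/tools" are hypothetical names.
nfsserver:/mnt/raid2/tools    /mnt/tools     nfs     timeo=16,intr,lock,rsize=16384,wsize=16384,tcp   0 0

# timeo=16      wait 1.6 seconds for a reply before retrying (the unit is tenths of a second)
# intr          allow signals to interrupt a process blocked on the mount
# lock          use NFS file locking
# rsize/wsize   read and write transfer sizes in bytes
# tcp           use TCP instead of UDP as the transport
```

Since neither "soft" nor "hard" is given, the mount is a hard mount: requests are retried indefinitely rather than failing with an error, which is consistent with processes hanging instead of erroring out.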
10
11 Originally these client-side mount options (apart from the tcp one) were not
12 present, which caused a lot of problems due to hanging nfs-shares and
13 uninterruptible processes. Hence the intr/timeo values.
14 With a hanging nfs mount I'm referring to an unresponsive mounted
15 filesystem which hangs any process trying to access it.
16 This includes tab-completion for bash etc.
17
18 Changing these nfs options has alleviated the problems over the cluster
19 immensely: where we used to have at least 2-5 servers hanging in limbo
20 every day, this has dropped considerably to maybe one per week.
21
22 With one exception though :-(
23
24 Our loadbalancer has troubles with this nfs-mount roughly every two or
25 three days.
26 The mount simply hangs. Because of the intr option I can actually kill
27 all processes using the filesystem and umount/mount the share. After
28 this procedure it will start to function normally.
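
The recovery procedure described above could be sketched roughly as follows. This is an illustration, not the exact commands used in the post: the function name is made up, and it assumes the share has an /etc/fstab entry so that "mount <mount point>" is sufficient, and that fuser (from psmisc) is installed.

```shell
#!/bin/sh
# Sketch: kill users of a hung NFS mount, then unmount and mount it again.
# $1 = mount point of the share (not given in the post).
remount_share() {
    mp="$1"
    # Kill every process using the mounted filesystem. The processes can
    # only be killed because the share is mounted with intr.
    # fuser exits non-zero when it finds no process, so its status is ignored.
    fuser -k -m "$mp"
    # Force the unmount, then mount again from the /etc/fstab entry.
    umount -f "$mp" && mount "$mp"
}
```

It would be called as root, for example "remount_share /mnt/tools" (a placeholder path).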
29
30 I'm very certain it isn't an nfs-server problem as I have the same
31 filesystem mounted on approx. 80 servers and on those machines the
32 filesystem is accessible. Testing access to the nfs-server confirms this:
33 I can mount the same share on a different mount-point without any problems.
34
35 The relevant entry in /etc/exports on the nfs-server is:
36 /mnt/raid2/tools/ 10.10.0.0/24(rw,async,no_root_squash)
37
38 I can't find anything relevant in the logfiles when this hang occurs,
39 tips on specific things to look for would be appreciated.
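
As one possible starting point, the Linux NFS client normally logs kernel messages of the form "nfs: server <name> not responding, still trying" and "nfs: server <name> OK" when requests time out and recover. A small filter for those lines might look like the sketch below; the function name is made up, and the absence of such lines would itself be a hint that the client never noticed a timeout.

```shell
#!/bin/sh
# Sketch: keep only the kernel's NFS timeout and recovery messages.
# Reads log lines on stdin and prints the matching ones.
nfs_timeout_lines() {
    grep -E 'nfs: server .* (not responding|OK)'
}

# Typical use on the affected client:
#   dmesg | nfs_timeout_lines
# Client-side RPC retransmission counters can be checked with:
#   nfsstat -c -r
```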
40
41 One of the reasons that this problem is highly annoying is that I run a
42 script from our tools share to sync the load-balancer config between my
43 primary and secondary loadbalancer. The moment the share hangs, this
44 script obviously doesn't work either.
45 Ergo I run the risk of having my load-balancers out-of-sync which I
46 would prefer to avoid.
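
One way to keep the cron job from piling up behind a hung share is to probe the mount with a time limit before running anything from it. The sketch below is only an illustration: both function names are made up, it assumes the "timeout" utility from GNU coreutils is available, and it relies on the intr mount option so that the blocked probe can actually be signalled.

```shell
#!/bin/sh
# Succeeds only if listing the directory finishes within the time limit.
# $1 = directory on the share, $2 = limit in seconds (default 5).
share_responsive() {
    timeout "${2:-5}" ls "$1" >/dev/null 2>&1
}

# Run a command only when the share answers; otherwise report and fail.
# $1 = directory on the share, remaining arguments = command to run.
run_if_responsive() {
    dir="$1"; shift
    if share_responsive "$dir"; then
        "$@"
    else
        echo "share $dir not responding, skipping: $*" >&2
        return 1
    fi
}
```

From cron this could be used as "run_if_responsive /mnt/tools /mnt/tools/sync-lb.sh", where both the mount point and the script name are placeholders. The non-zero exit status gives a monitoring system something to alert on when the sync is skipped.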
47
48 Maybe this script is causing the problem ?
49 It's run every 5 minutes from cron.
50 Or maybe the combination async with frequent access is causing the
51 problem ?
52
53 Googling turns up lots of info on nfs. I've read through a bunch of
54 howtos and performance advice.
55 Most of it is pretty outdated and refers to 2.2 linux kernels :-(
56 Some of it had some good advice, but none of it solved the problem so far.
57
58 All servers run gentoo-linux, 2.6.x kernels and a recent install of
59 mount-binaries
60 * equery belongs $(which mount) => sys-apps/util-linux-2.12r on client
61 * equery belongs $(which rpc.nfsd) => net-fs/nfs-utils-1.0.6-r6 on the server
62
63 The following options are passed to the nfs daemons at start:
64 # Number of servers to be started up by default
65 RPCNFSDCOUNT=128
66
67 Any help, advice or pointers on where to look would be much appreciated.
68 Thanx,
69
70 Ramon
71
72 -
73 Change what you're saying,
74 Don't change what you said
75
76 The Eels
77
78 --
79 gentoo-server@g.o mailing list