Gentoo Archives: gentoo-server

From:	Ramon van Alteren <ramon@××××××××××.nl>
To:	gentoo-server@l.g.o
Subject:	[gentoo-server] NFS share hangs, NFS server accessible
Date:	Thu, 08 Dec 2005 18:06:29
Message-Id:	`439875C7.3040204@vanalteren.nl`

1	Hi,
2
3	I've been having problems with the NFS-mount on one of my loadbalancers
4	and I can't seem to find a good solution,
5	so I thought I might ask here for any (debugging) tips.
6
7	Currently I mount our tools&scripts repository over NFS on all our servers.
8	I'm using the following mount options for that:
9	timeo=16,intr,lock,rsize=16384,wsize=16384,tcp
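
For reference, a minimal sketch of how these options might be passed on the command line. The server name "nfsserver" and the mount point /mnt/tools are placeholders (assumptions, not taken from this setup); only the option string and the export path come from this post. The function only prints the command, so it is safe to run anywhere.

```shell
#!/bin/sh
# Option string as quoted in this post.
# Note: timeo is given in tenths of a second, so timeo=16 means 1.6 s.
NFS_OPTS="timeo=16,intr,lock,rsize=16384,wsize=16384,tcp"

# nfs_mount_cmd SERVER MOUNTPOINT
# Prints (does not run) the mount command for the tools share.
# SERVER and MOUNTPOINT are hypothetical placeholders.
nfs_mount_cmd() {
    server="$1"
    mountpoint="$2"
    echo "mount -t nfs -o ${NFS_OPTS} ${server}:/mnt/raid2/tools ${mountpoint}"
}

nfs_mount_cmd nfsserver /mnt/tools
```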
10
11	Originally these client-side mount options (apart from the tcp one) were not
12	present, which caused a lot of problems due to hanging nfs-shares and
13	uninterruptible processes. Hence the intr/timeo values.
14	By a hanging nfs mount I mean an unresponsive mounted
15	filesystem which hangs any process trying to access it.
16	This includes tab-completion for bash etc.
17
18	Changing these nfs options has alleviated the problems over the cluster
19	immensely: where we used to have 2-5 servers hanging in limbo
20	every day, this has dropped to maybe one per week.
21
22	With one exception though :-(
23
24	Our loadbalancer has troubles with this nfs-mount roughly every two or
25	three days.
26	The mount simply hangs. Because of the intr option I can actually kill
27	all processes using the filesystem and umount/mount the share. After
28	this procedure it will start to function normally.
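
The recovery procedure described above can be sketched roughly as follows. This is only an illustration: the mount point /mnt/tools is a placeholder, the remount assumes an /etc/fstab entry for it, and the DRY_RUN switch is a hypothetical convenience so the commands can be inspected before anything is killed.

```shell
#!/bin/sh
# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remount_share MOUNTPOINT
# Kill every process using the filesystem, then unmount and remount it.
# MOUNTPOINT is a placeholder; "mount MOUNTPOINT" relies on /etc/fstab.
remount_share() {
    mp="$1"
    run fuser -k -m "$mp"   # kill all processes using the mounted filesystem
    run umount "$mp"
    run mount "$mp"
}
```

Calling it as `DRY_RUN=1 remount_share /mnt/tools` prints the three commands without touching the mount.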
29
30	I'm very certain it isn't an nfs-server problem, as I have the same
31	filesystem mounted on approx. 80 servers and on those machines the
32	filesystem is accessible. Testing access to the nfs-server confirms this:
33	I can mount the same share on a different mount-point without any problems.
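
One way to notice such a hang without blocking a shell is to probe the mount point under a time limit. A minimal sketch, assuming GNU coreutils `timeout` and `stat` are available on the client and that the share is mounted with intr so the probe can be killed; the function name and the 5-second default are arbitrary choices, and client-side attribute caching may delay detection.

```shell
#!/bin/sh
# check_mount MOUNTPOINT [SECONDS]
# Returns 0 if stat on the mount point completes within the limit,
# non-zero if it fails or has to be killed.
check_mount() {
    mp="$1"
    limit="${2:-5}"
    # SIGKILL is used so the probe can be interrupted on an intr mount.
    timeout -s KILL "$limit" stat -t "$mp" >/dev/null 2>&1
}

# Example: probe a (placeholder) mount point and report the result.
if check_mount /mnt/tools 5; then
    echo "mount responds"
else
    echo "mount hung or missing"
fi
```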
34
35	The relevant entry in /etc/exports on the nfs-server is:
36	/mnt/raid2/tools/ 10.10.0.0/24(rw,async,no_root_squash)
37
38	I can't find anything relevant in the logfiles when this hang occurs,
39	tips on specific things to look for would be appreciated.
40
41	One of the reasons that this problem is highly annoying is that I run a
42	script from our tools share to sync the load-balancer config between my
43	primary and secondary loadbalancer. The moment the share hangs, this
44	script obviously doesn't work either.
45	Ergo I run the risk of having my load-balancers out-of-sync which I
46	would prefer to avoid.
47
48	Maybe this script is causing the problem ?
49	It's run every 5 minutes from cron.
50	Or maybe the combination async with frequent access is causing the
51	problem ?
52
53	Googling turns up lots of info on nfs. I've read through a bunch of
54	howtos and performance advice.
55	Most of it is pretty outdated and refers to 2.2 linux kernels :-(
56	Some of it had some good advice, but none of it solved the problem so far.
57
58	All servers run gentoo-linux, 2.6.x kernels and a recent install of
59	mount-binaries
60	* equery belongs $(which mount) => sys-apps/util-linux-2.12r on the client
61	* equery belongs $(which rpc.nfsd) => net-fs/nfs-utils-1.0.6-r6 on the server
62
63	The following options are passed to the nfs daemons at start:
64	# Number of servers to be started up by default
65	RPCNFSDCOUNT=128
66
67	Any help, advice or pointers on where to look would be much appreciated.
68	Thanx,
69
70	Ramon
71
72	-
73	Change what you're saying,
74	Don't change what you said
75
76	The Eels
77
78	--
79	gentoo-server@g.o mailing list
