Hi,

I've been having problems with the NFS mount on one of my load balancers
and I can't seem to find a good solution,
so I thought I might ask here for any (debugging) tips.

Currently I mount our tools & scripts repository over NFS on all our servers.
I'm using the following mount options for that:
timeo=16,intr,lock,rsize=16384,wsize=16384,tcp

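For reference, this is roughly what the mount looks like as an /etc/fstab line. It is only a sketch: the server name "nfsserver" and the mount point /mnt/tools are placeholders made up for this mail, and fstab_line is a hypothetical helper. Only the options and the export path (/mnt/raid2/tools) are the real ones.

```shell
#!/bin/sh
# Placeholders (assumptions, not our real names): server host and local mount point.
NFS_SERVER="nfsserver"
MOUNT_POINT="/mnt/tools"

# The options from this mail. timeo is given in tenths of a second,
# so timeo=16 means 1.6 s before the first retransmission.
NFS_OPTS="timeo=16,intr,lock,rsize=16384,wsize=16384,tcp"

# Print the equivalent /etc/fstab line for the tools share.
fstab_line() {
    printf '%s:/mnt/raid2/tools %s nfs %s 0 0\n' \
        "$NFS_SERVER" "$MOUNT_POINT" "$NFS_OPTS"
}

fstab_line

# Manual equivalent (needs root):
#   mount -t nfs -o "$NFS_OPTS" "$NFS_SERVER:/mnt/raid2/tools" "$MOUNT_POINT"
```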
Originally these client-side mount options (apart from the tcp one) were not
present, which caused a lot of problems with hanging NFS shares and
uninterruptible processes. Hence the intr/timeo values.
By a hanging NFS mount I mean an unresponsive mounted
filesystem which hangs any process trying to access it.
This includes tab completion in bash etc.

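To make "hanging" concrete, a check along these lines could detect the condition without getting stuck itself. This is a sketch under assumptions: probe_mount is a hypothetical helper name, /mnt/tools is a placeholder mount point, and killing the blocked stat only works because the share is mounted with intr.

```shell
#!/bin/sh
# probe_mount DIR [SECONDS]
# Returns 0 if a stat of DIR completes successfully within the limit,
# non-zero if the stat fails or is still blocked when the limit expires.
probe_mount() {
    dir=$1
    limit=${2:-10}

    # Run the potentially blocking stat in the background.
    stat "$dir" >/dev/null 2>&1 &
    pid=$!

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Still blocked: give up and kill the probe (relies on intr).
            kill -9 "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # The stat finished on its own; hand back its exit status.
    wait "$pid"
}
```

It would be used as: probe_mount /mnt/tools 10 || echo "tools share is not responding"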
Changing these NFS options has alleviated the problems across the cluster
immensely: where we used to have 2-5 servers hanging in limbo
every day, this has dropped to maybe one per week.

With one exception though :-(

Our load balancer has trouble with this NFS mount roughly every two or
three days.
The mount simply hangs. Because of the intr option I can actually kill
all processes using the filesystem and umount/mount the share. After
this procedure it functions normally again.

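The recovery procedure, written out as a sketch. Assumptions: remount_share and run are hypothetical helper names, /mnt/tools is a placeholder mount point with an /etc/fstab entry, and fuser itself does not block on the hung mount (I'm not sure that always holds). With DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# remount_share MOUNTPOINT: kill users of the mount, then umount and mount again.
# Needs root when run for real.
remount_share() {
    mp=$1
    # fuser -k -m sends SIGKILL to every process using the filesystem;
    # "|| true" because fuser exits non-zero when it finds no such process.
    run fuser -k -m "$mp" || true
    # -f forces the unmount even though the server does not answer.
    run umount -f "$mp" && run mount "$mp"
}
```

A dry run to see what would be executed: DRY_RUN=1 remount_share /mnt/tools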
I'm very certain it isn't an NFS server problem, as I have the same
filesystem mounted on approx. 80 servers and on those machines the
filesystem is accessible. Testing access to the NFS server confirms this:
I can mount the same share on a different mount point without any problems.

The relevant entry in /etc/exports on the NFS server is:
/mnt/raid2/tools/ 10.10.0.0/24(rw,async,no_root_squash)

I can't find anything relevant in the log files when this hang occurs;
tips on specific things to look for would be appreciated.

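One thing I could watch besides the logs is the RPC retransmission counter on the client. As far as I know nfsstat -rc reports it, and the same numbers are exposed in /proc/net/rpc/nfs. The sketch below assumes the "rpc" line there holds the call count followed by the retransmission count; rpc_retrans is a hypothetical helper name.

```shell
#!/bin/sh
# rpc_retrans [FILE]: print "calls retrans" from the client RPC statistics.
# FILE defaults to /proc/net/rpc/nfs; the field layout is an assumption.
rpc_retrans() {
    awk '$1 == "rpc" { print $2, $3 }' "${1:-/proc/net/rpc/nfs}"
}
```

Logging this every minute on the load balancer and on one healthy server would show whether retransmissions climb before a hang.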
One of the reasons this problem is highly annoying is that I run a
script from our tools share to sync the load balancer config between my
primary and secondary load balancer. The moment the share hangs, this
script obviously doesn't work either.
Ergo I run the risk of having my load balancers out of sync, which I
would prefer to avoid.

Maybe this script is causing the problem?
It's run every 5 minutes from cron.
Or maybe the combination of async with frequent access is causing the
problem?

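To rule out cron runs piling up behind each other while the share hangs, the job could be wrapped in a simple lock. This is a sketch under assumptions: run_once, the lock path /var/run/lb-sync.lock and the script path /mnt/tools/sync-lb.sh are hypothetical names, not what we actually run. The lock has to live on local disk, not on the NFS share.

```shell
#!/bin/sh
# run_once LOCKDIR COMMAND...
# Runs COMMAND unless a previous run still holds LOCKDIR.
run_once() {
    lockdir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    # Release the lock; a run that gets killed leaves it behind on purpose,
    # so a stuck run shows up as a stale lock directory.
    rmdir "$lockdir"
    return $status
}
```

From cron this would be called as: run_once /var/run/lb-sync.lock /mnt/tools/sync-lb.sh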
Googling turns up lots of info on NFS. I've read through a bunch of
howtos and performance advice.
Most of it is pretty outdated and refers to 2.2 Linux kernels :-(
Some of it had good advice, but none of it has solved the problem so far.

All servers run Gentoo Linux, 2.6.x kernels and a recent install of the
mount binaries:
* equery belongs $(which mount) => sys-apps/util-linux-2.12r on the client
* equery belongs $(which rpc.nfsd) => net-fs/nfs-utils-1.0.6-r6 on the server

The following options are passed to the NFS daemons at start:
# Number of servers to be started up by default
RPCNFSDCOUNT=128

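To see whether 128 nfsd threads are actually enough, the server-side statistics could be checked. The sketch below assumes the "th" line in /proc/net/rpc/nfsd holds the thread count followed by the number of times all threads were busy, which is how I understand it for 2.6 kernels. nfsd_threads is a hypothetical helper name.

```shell
#!/bin/sh
# nfsd_threads [FILE]: print "threads all_busy_count" from the nfsd statistics.
# FILE defaults to /proc/net/rpc/nfsd; the field layout is an assumption.
nfsd_threads() {
    awk '$1 == "th" { print $2, $3 }' "${1:-/proc/net/rpc/nfsd}"
}
```

A second number that keeps growing would point at the server running out of threads, which does not match what I see on the other 80 clients.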
Any help, advice or pointers on where to look would be much appreciated.
Thanx,

Ramon

-
Change what you're saying,
Don't change what you said

The Eels

--
gentoo-server@g.o mailing list