Re: Stale NFS file handle Solaris 10 VCS 5

Vic_Engle · ‎02-17-2007

Hello List,

I'm having a problem with a three-node cluster that has several global filesystems mounted on each node. I have an NFS service group that can fail over from one node to another, and every time it fails over I get "Stale NFS file handle" errors on the clients.

On the VCS servers the raw and block devices mounted on each node have the same major/minor numbers and vxio has the same number assigned in /etc/name_to_major on each node.

I am using the commands "file" and "ls -l" to see the major/minor numbers of the devices.

Can anyone point me in the right direction?

Thanks,
Vic

Gene_Henriksen · ‎02-17-2007

Did you do this?
To keep the NFS daemons under VCS control:
◆ Disable SMF for nfsd and mountd: svccfg delete -f svc:/network/nfs/server:default
◆ Disable SMF for nfsmapid: svccfg delete -f svc:/network/nfs/mapid:default

Also, vxio and vxspec must have the same major numbers on every node.

Are you using the NFSRestart resource?

Vic_Engle · ‎02-17-2007

Thanks for the assist, Gene. I did delete the two NFS services during the initial setup, and vxio and vxspec have the same major number assigned on each node. I am not using the NFSRestart resource. Would that just restart NFS on the existing node? Right now I just want failover to be as transparent to clients as possible. Do I need the NFSRestart resource for failover?

Thanks,
Vic

Gene_Henriksen · ‎02-17-2007

The NFSRestart resource handles the recovery of NFS record locks. If you configure it, it keeps the record locks in a directory on shared storage, copies them to the failover server, and then restarts the NFS daemons so they recognize the locks held by clients.

In the "old" days, a stale file handle was usually caused by the issues you have already addressed. The file handle is composed, in part, of the device's major and minor numbers, and a variation in those between nodes would make the file handle unusable.
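
For anyone who wants to double-check the major numbers, here is a minimal sketch, assuming a POSIX-style shell. The function name driver_major is made up for illustration; it only reads a file in the "name number" format of /etc/name_to_major.

```shell
# Print the major number that a name_to_major-style file assigns to a driver.
# $1 = driver name (for example vxio or vxspec)
# $2 = file to read (optional, defaults to /etc/name_to_major)
# driver_major is a hypothetical helper, not a Solaris or VCS command.
driver_major() {
    while read name major; do
        if [ "$name" = "$1" ]; then
            echo "$major"
        fi
    done < "${2:-/etc/name_to_major}"
}
```

Run "driver_major vxio" and "driver_major vxspec" on each node and compare the output; the values should be identical across the cluster. The minor numbers of the volumes themselves still have to be compared separately, for example with "ls -lL" on the device nodes.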

I assume you are failing over the IP address also. Make sure that the IP resource comes up after the Share resource, so that clients cannot attempt to reconnect before the share is available.
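
One way to set up that ordering is a resource dependency, sketched below under some assumptions: the resource names are whatever your service group uses (nfs_ip and nfs_share in the usage note are placeholders), and the wrapper function link_ip_to_share is made up for illustration. In "hares -link", the first argument is the parent and the second is the child, and VCS brings the child online before the parent.

```shell
# Make the IP resource depend on the Share resource, so VCS onlines the share
# first and offlines the IP first.
# $1 = IP resource name, $2 = Share resource name (site-specific values)
# link_ip_to_share is a hypothetical wrapper around the VCS commands.
link_ip_to_share() {
    haconf -makerw &&            # open the cluster configuration for writing
    hares -link "$1" "$2" &&     # parent (IP) requires child (Share)
    haconf -dump -makero         # save the configuration and make it read-only
}
```

For example, "link_ip_to_share nfs_ip nfs_share", followed by "hares -dep nfs_ip" to confirm that the dependency is in place.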

Vic_Engle · ‎02-17-2007

Thanks again Gene. The problem appears to have been the IP address coming online before the share. So I linked the IP to the share and no more problems. I'll check out that restarter resource when I'm in the office but the link was all I needed for the testing I'm doing today.

Regards,
Vic

Gene_Henriksen · ‎02-17-2007

That is good. Glad I could help. The client was connecting before the share was available, so the file handle didn't relate to anything shared. Thanks for the points.
