01-31-2011 06:18 AM
Hi everybody,
I'm running VERITAS Cluster Server 5.1.00.2 with clustered NFS (NFSv4 enabled) on RedHat 5.6. When I mount an export with NFSv4 from a client and trigger a resource failover while a file is being copied to this share, the copy fails with an "Input/output error" message.
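For reference, the client side is a plain NFSv4 mount of the virtual IP, something like this (the address and export path are only an illustration, taken from the config I post further down):
mount -t nfs4 10.10.0.12:/471102 /mnt/nfs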
In /var/log/messages on the node I fail over to, I see:
kernel: nfs4_cb: server 10.31.1.12 not responding, timed out
The mentioned IP is that of the NFS client. As I could see in tcpdump, NFSv4 uses callbacks and initiates a connection from the NFS server to the client using the node IP rather than the virtual IP. Because the group moves to the other node during a failover, the callback source IP changes, which might be why the connection breaks.
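A capture along these lines makes the callback connection visible; the filter is only a sketch and assumes the callback channel uses an ephemeral port rather than 2049:
tcpdump -n -i bond0 host 10.31.1.12 and not port 2049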
Does anyone have a working installation with NFSv4 failover and can give me a hint on how to avoid the described problem?
If you need further information about the setup, please let me know.
Regards
Markus
01-31-2011 07:13 PM
Hi Markus,
Did you configure NFSRestart resources in your cluster? If yes, can you post the main.cf so we can verify the resource dependency tree?
In case you haven't configured an NFSRestart resource, it would be worth having a look:
https://sort.symantec.com/public/documents/sfha/5.1sp1/linux/productguides/pdf/vcs_bundled_agents_51sp1_lin.pdf
Check out page 133.
Gaurav
02-01-2011 05:34 AM
Hi Gaurav,
I use the NFSRestart resource. As recommended, I have one parallel group, NFS:
group NFS (
SystemList = { vcs-1-node-1 = 0, vcs-1-node-2 = 1 }
Parallel = 1
AutoStartList = { vcs-1-node-2, vcs-1-node-1 }
)
NFS NFS_NFS (
Nproc = 64
NFSv4Support = 1
)
NIC NFS_NIC (
Device = bond0
Mii = 0
NetworkHosts = { "10.10.0.1" }
)
Phantom NFS_Phantom (
)
Share NFS_Share_root (
PathName = "/cluster/nfs"
Client = "10.10.0.0/24"
OtherClients = { "10.10.1.0/24" }
Options = "ro, fsid=0"
NFSRes = NFS_NFS
)
NFS_Share_root requires NFS_NFS
// resource dependency tree
//
// group NFS
// {
// NIC NFS_NIC
// Phantom NFS_Phantom
// Share NFS_Share_root
// {
// NFS NFS_NFS
// }
// }
and multiple NFS service groups that do the mounting and exporting via NFS. Here, for example, is NFS-Service1:
group NFS-Service1 (
SystemList = { vcs-1-node-1 = 0, vcs-1-node-2 = 1 }
AutoStartList = { vcs-1-node-2, vcs-1-node-1 }
PreOnline @vcs-1-node-1 = 1
PreOnline @vcs-1-node-2 = 1
)
DiskGroup NFS-Service1_DiskGroup_DiskGroup_nfs1 (
DiskGroup = DiskGroup_nfs1
)
IP NFS-Service1_IP_10-10-0-12 (
Device = bond0
Address = "10.10.0.12"
NetMask = "255.255.255.224"
)
Mount NFS-Service1_Mount_Volume_cust_471102 (
MountPoint = "/cluster/nfs/471102"
BlockDevice = "/dev/vx/dsk/DiskGroup_nfs1/Volume_cust_471102"
FSType = vxfs
FsckOpt = "-y"
)
Mount NFS-Service1_Mount_Volume_dummy-nfs1 (
MountPoint = "/cluster/nfs/dummy-nfs1"
BlockDevice = "/dev/vx/dsk/DiskGroup_nfs1/Volume_dummy-nfs1"
FSType = vxfs
FsckOpt = "-y"
)
Mount NFS-Service1_Mount_Volume_lock (
MountPoint = "/cluster/nfs/lock-nfs1"
BlockDevice = "/dev/vx/dsk/DiskGroup_nfs1/Volume_lock"
FSType = vxfs
FsckOpt = "-y"
)
NFSRestart NFS-Service1_NFSRestart_NFSRestart (
NFSRes = NFS_NFS
LocksPathName = "/cluster/nfs/lock-nfs1"
NFSLockFailover = 1
)
Proxy NFS-Service1_Proxy_NFS (
TargetResName = NFS_NFS
)
Proxy NFS-Service1_Proxy_NIC (
TargetResName = NFS_NIC
)
Share NFS-Service1_Share_cust_471102-0 (
PathName = "/cluster/nfs/471102"
OtherClients = { "10.10.1.184/29" }
Options = "rw,no_root_squash,nohide"
NFSRes = NFS_NFS
)
Share NFS-Service1_Share_dummy-nfs1 (
PathName = "/cluster/nfs/dummy-nfs1"
Client = "10.10.0.0/16"
Options = "ro,no_root_squash,nohide"
NFSRes = NFS_NFS
)
Volume NFS-Service1_Volume_Volume_cust_471102 (
DiskGroup = DiskGroup_nfs1
Volume = Volume_cust_471102
)
Volume NFS-Service1_Volume_Volume_dummy-nfs1 (
DiskGroup = DiskGroup_nfs1
Volume = Volume_dummy-nfs1
)
Volume NFS-Service1_Volume_Volume_lock (
DiskGroup = DiskGroup_nfs1
Volume = Volume_lock
)
requires group NFS online local firm
NFS-Service1_IP_10-10-0-12 requires NFS-Service1_Proxy_NIC
NFS-Service1_IP_10-10-0-12 requires NFS-Service1_Share_cust_471102-0
NFS-Service1_IP_10-10-0-12 requires NFS-Service1_Share_dummy-nfs1
NFS-Service1_Mount_Volume_cust_471102 requires NFS-Service1_Volume_Volume_cust_471102
NFS-Service1_Mount_Volume_dummy-nfs1 requires NFS-Service1_Volume_Volume_dummy-nfs1
NFS-Service1_Mount_Volume_lock requires NFS-Service1_Volume_Volume_lock
NFS-Service1_NFSRestart_NFSRestart requires NFS-Service1_IP_10-10-0-12
NFS-Service1_NFSRestart_NFSRestart requires NFS-Service1_Mount_Volume_lock
NFS-Service1_Share_cust_471102-0 requires NFS-Service1_Mount_Volume_cust_471102
NFS-Service1_Share_cust_471102-0 requires NFS-Service1_Proxy_NFS
NFS-Service1_Share_dummy-nfs1 requires NFS-Service1_Mount_Volume_dummy-nfs1
NFS-Service1_Share_dummy-nfs1 requires NFS-Service1_Proxy_NFS
NFS-Service1_Volume_Volume_cust_471102 requires NFS-Service1_DiskGroup_DiskGroup_nfs1
NFS-Service1_Volume_Volume_dummy-nfs1 requires NFS-Service1_DiskGroup_DiskGroup_nfs1
NFS-Service1_Volume_Volume_lock requires NFS-Service1_DiskGroup_DiskGroup_nfs1
// resource dependency tree
//
// group NFS-Service1
// {
// NFSRestart NFS-Service1_NFSRestart_NFSRestart
// {
// Mount NFS-Service1_Mount_Volume_lock
// {
// Volume NFS-Service1_Volume_Volume_lock
// {
// DiskGroup NFS-Service1_DiskGroup_DiskGroup_nfs1
// }
// }
// IP NFS-Service1_IP_10-10-0-12
// {
// Share NFS-Service1_Share_dummy-nfs1
// {
// Mount NFS-Service1_Mount_Volume_dummy-nfs1
// {
// Volume NFS-Service1_Volume_Volume_dummy-nfs1
// {
// DiskGroup NFS-Service1_DiskGroup_DiskGroup_nfs1
// }
// }
// Proxy NFS-Service1_Proxy_NFS
// }
// Proxy NFS-Service1_Proxy_NIC
// Share NFS-Service1_Share_cust_471102-0
// {
// Mount NFS-Service1_Mount_Volume_cust_471102
// {
// Volume NFS-Service1_Volume_Volume_cust_471102
// {
// DiskGroup NFS-Service1_DiskGroup_DiskGroup_nfs1
// }
// }
// Proxy NFS-Service1_Proxy_NFS
// }
// }
// }
// }
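For completeness, this is roughly how I check what the Share resources actually exported on the active node (standard nfs-utils commands, nothing VCS-specific):
exportfs -v
showmount -e localhost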
02-03-2011 05:47 AM
Also make sure your major and minor numbers of the disks match between the systems.
02-03-2011 06:04 AM
Thanks for your replies. Today I received a hint from my friendly Veritas trainer, and he gave me the thought-provoking impulse. He also mentioned the major/minor numbers, so I checked on both systems and found them equal. But he also reminded me that minor numbers greater than 255 might cause trouble with NFS. By default VxVM uses minor numbers far larger than 255, so you have to reminor. You can check with: vxprint -g <DiskGroup> -vF %minor <Volume>
To reminor, use: vxdg -g <DiskGroup> reminor 100
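For example, with the disk group and one of the volumes from my config above (purely illustrative):
vxprint -g DiskGroup_nfs1 -vF %minor Volume_cust_471102
vxdg -g DiskGroup_nfs1 reminor 100
As far as I understand, the new minor numbers only take effect once the disk group is re-imported.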
Although it does not describe exactly my problem, I found this article helpful:
http://www.symantec.com/business/support/index?page=content&id=TECH148225&key=15107&actp=LIST
After that, the failover from one node to the other also works perfectly during a file transfer with NFSv3 and NFSv4.
02-08-2011 01:48 AM
After a few more tests I found that the minor number wasn't the root cause of my problem. It was rather a combination of circumstances that made the failover work after the minor-number tuning.
Now I've disabled all resources in VCS and failed over manually. The result is that the failover works fine when nfsd is restarted on the new node. Unfortunately VCS doesn't restart nfsd when the resource group fails over.
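To be precise, the manual step that makes it work is simply restarting the NFS services on the node that takes over, roughly like this (assuming the stock RHEL init scripts):
service nfs restart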
Does anyone have a hint on how to achieve the nfsd restart in VCS, or is this a bug?
02-10-2011 03:08 AM
Hi Markus,
You need to copy the triggers nfs_postoffline (and nfs_preonline if you want NFS locks to fail over) from /opt/VRTSvcs/bin/sample_triggers to /opt/VRTSvcs/bin/triggers.
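Roughly like this, run on every cluster node (I'm assuming the default install paths here, and the copies may need to carry the generic trigger names that VCS looks for, i.e. preonline and postoffline):
cp /opt/VRTSvcs/bin/sample_triggers/nfs_postoffline /opt/VRTSvcs/bin/triggers/postoffline
cp /opt/VRTSvcs/bin/sample_triggers/nfs_preonline /opt/VRTSvcs/bin/triggers/preonline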
In 5.0 these used to be in /opt/VRTSvcs/bin/triggers by default, but they run certain hares commands that are very inefficient for large configurations (more than 50 service groups), and the triggers could hang the had daemon for several seconds. So in, I think, 5.0RP2 they were moved out, as the majority of customers do not use NFS shares, and a note was added to the RP notes saying you need to copy them into place. It seems this information is missing from 5.1; I cannot find it in the bundled agents guide, release notes, or admin guide. All of the guides mention the triggers to some extent, but none explicitly say they need to be copied into place. This really needs to go into the bundled agents guide, so perhaps someone from Symantec who reads this can action it.
Mike
02-11-2011 08:37 AM
A few days ago I updated VCS to 5.1SP1. Since SP1 the NFSRestart resource has an additional attribute called "Lower" and the resource chain is a bit different. That made me hopeful that my problem had already been addressed, and in fact it has. The "new" NFSRestart resource does exactly what I figured out has to be done during a failover, which is restarting nfsd.
Thanks a lot for your help and advice.
02-11-2011 08:44 AM
Did SP1 copy the triggers into /opt/VRTSvcs/bin/triggers (see my previous comment), or had you already done this manually, or is it working without these triggers being there?
Mike
02-11-2011 08:57 AM
Hi Mike,
As far as I can see, SP1 comes without NFS triggers. I did try the triggers you mentioned with the previous version, but that didn't solve my problem. My current triggers directory looks like this:
ls -la /opt/VRTSvcs/bin/triggers
total 40
drwxrwxr-x 2 root sys 4096 Feb 9 16:46 .
drwxr-xr-x 52 root root 4096 Oct 5 04:30 ..
-rwxr----- 1 root root 2313 Oct 1 09:23 dump_tunables
-rwxr----- 1 root root 2319 Oct 1 09:23 globalcounter_not_updated
-rwxr--r-- 1 root sys 2295 Oct 5 04:30 postoffline
-rwxr--r-- 1 root sys 3574 Oct 5 04:30 postonline
-rwxr--r-- 1 root sys 7092 Oct 5 04:30 preonline
-rwxr----- 1 root root 7499 Oct 1 09:23 violation
My NFS service group has PreOnline disabled:
# hagrp -display NFS-Service1 -attr PreOnline
#Group Attribute System Value
NFS-Service1 PreOnline vcs-101010-1-node-1 0
NFS-Service1 PreOnline vcs-101010-1-node-2 0
Since SP1 you have to use two NFSRestart resources, one of them with the "Lower" attribute set. It is quite well described in the agent notes.
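As a rough sketch of what that looks like in main.cf (the resource names here are invented and the exact dependency order is described in the bundled agents guide, so treat this only as an outline):
NFSRestart NFS-Service1_NFSRestart_L (
NFSRes = NFS_NFS
Lower = 1
)
NFSRestart NFS-Service1_NFSRestart_U (
NFSRes = NFS_NFS
LocksPathName = "/cluster/nfs/lock-nfs1"
NFSLockFailover = 1
)
// if I read the guide correctly, the upper resource sits above the IP and the lower one below the Share resources
NFS-Service1_NFSRestart_U requires NFS-Service1_IP_10-10-0-12
NFS-Service1_Share_cust_471102-0 requires NFS-Service1_NFSRestart_L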
Regards
Markus