05-05-2012 02:05 AM
Guys,
I'm new to VCS and am trying to work out what the problem could be. We are testing a failover procedure on a cluster of two nodes, server1 and server2. It looks like a configuration problem, but I'm not sure what to change to make this work.
The scenario: we shut down server1 and wait to see whether it fails over to server2. It does not, because the mount point cannot be released. Here is the error:
2012/05/04 19:51:51 VCS NOTICE V-16-1-10322 System server1 (Node '0') changed state from RUNNING to LEAVING
2012/05/04 19:51:51 VCS NOTICE V-16-1-10300 Initiating Offline of Resource mgtip (Owner: unknown, Group: mpSG) on System serve1
2012/05/04 19:51:51 VCS NOTICE V-16-1-10300 Initiating Offline of Resource mpip (Owner: unknown, Group: mpSG) on System server1
2012/05/04 19:51:51 VCS NOTICE V-16-1-10300 Initiating Offline of Resource vol_mout (Owner: unknown, Group: mpSG) on System server1
2012/05/04 19:51:51 VCS NOTICE V-16-10031-5512 (server1) Mount:vol_mout:offline:Trying force umount with signal 15...
2012/05/04 19:51:51 VCS NOTICE V-16-10031-5512 (server) Mount:vol_mout:offline:Trying force umount with signal 9...
2012/05/04 19:51:52 VCS INFO V-16-1-10305 Resource mgtip (Owner: unknown, Group: mpSG) is offline on server1 (VCS initiated)
2012/05/04 19:51:52 VCS INFO V-16-1-10305 Resource mpip (Owner: unknown, Group: mpSG) is offline on server1 (VCS initiated)
2012/05/04 19:56:53 VCS INFO V-16-2-13003 (server1) Resource(vol_mout): Output of the timed out operation (offline)
umount: /opt/mwx: device is busy
umount: /opt/mwx: device is busy
/opt/mwx: 2239 2243 13679 14342c 14427c 14527c 14530c 14544c 14562c 28743c 31389 31508 31514 31530 31552 32123 32125
umount: /opt/mwx: device is busy
umount: /opt/mwx: device is busy
/opt/mwx: 2243 13679 14427c 14530c 14562c 28743c 31389 31514 31552
umount: /opt/mwx: device is busy
umount: /opt/mwx: device is busy
2012/05/04 19:56:53 VCS WARNING V-16-2-13011 (server1) Resource(nyx_mout): offline procedure did not complete within the expected time.
2012/05/04 19:56:53 VCS ERROR V-16-2-13063 (server1) Agent is calling clean for resource(vol_mout) because offline did not complete within the expected time.
2012/05/04 19:56:53 VCS NOTICE V-16-10031-5512 (server1) Mount:vol_mout:clean:Trying force umount with signal 9...
2012/05/04 19:57:45 VCS INFO V-16-1-10077 Received new cluster membership
05-05-2012 06:36 AM
The OS version, VCS version and a main.cf snippet would help us understand the issue correctly. Also, please enable debug logs for Mount with "hatype -modify Mount LogDbg -add 1 2 3 4 5" and provide the engine log and the Mount agent log.
05-06-2012 05:51 AM
Hello,
I see two simple possibilities here:
1. The application using that mount point is not shutting down cleanly and so leaves processes behind on it (check that your application offline script is working as expected).
2. Processes still exist even after the application is shut down. A force unmount should still be able to clean up the mount point, but that is not happening.
Can you give us the OS version and VCS version details, and also paste main.cf?
It is also possible that the Mount agent simply needs more time to clean up. As a test, you can increase the OfflineTimeout value for the Mount agent with:
hatype -modify Mount OfflineTimeout 600
This is only a test, but the result may point the way to a solution.
Gaurav
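A minimal sketch of how the timeout change suggested above is typically applied, since hatype -modify only works while the cluster configuration is writable. The function name is hypothetical, 600 seconds is just the test value from the post, and the sketch assumes it is run as root on a cluster node with the VCS commands in PATH.

```shell
#!/bin/sh
# Hypothetical helper: raise OfflineTimeout for the Mount resource type.
# Assumption: run as root on a cluster node with the VCS binaries in PATH.
set_mount_offline_timeout() {
    timeout="${1:-600}"        # seconds; 600 is only the test value suggested in the thread
    haconf -makerw             # open the configuration for writing (may complain if already writable)
    hatype -modify Mount OfflineTimeout "$timeout"   # type-level change: affects every Mount resource
    rc=$?
    haconf -dump -makero       # write main.cf and return the configuration to read-only
    # Print the value now in effect, but only if the modify step succeeded.
    [ "$rc" -eq 0 ] && hatype -value Mount OfflineTimeout
}

# Usage (not run automatically): set_mount_offline_timeout 600
```

Because this changes the type-level attribute, it applies to all Mount resources in the cluster, so it is worth reverting once the test is done.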
05-15-2012 09:00 AM
As said before, the VCS version and OS version information will help, as well as the main.cf (located in /etc/VRTSvcs/conf/config).
Check the processes listed from the logs, as noted by these two lines:
/opt/mwx: 2239 2243 13679 14342c 14427c 14527c 14530c 14544c 14562c 28743c 31389 31508 31514 31530 31552 32123 32125
/opt/mwx: 2243 13679 14427c 14530c 14562c 28743c 31389 31514 31552
The attempt to offline the processes got rid of some of them, but there are still processes running. Check those processes and see if they need to be shut down within VCS.
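One way to follow up on those PIDs, sketched under a few assumptions: the two lines look like fuser output, where a trailing letter marks how the process uses the file system (c means it is the process's current directory); both helper names are hypothetical; and the ps options are the ones found on Linux. The PIDs come from the log above and will differ on a live system.

```shell
#!/bin/sh
# Hypothetical helper: turn one fuser-style line into bare PIDs, one per line.
pids_from_fuser_line() {
    # Drop the "mountpoint:" prefix, split on spaces, strip trailing access letters (c, e, f, m, r).
    printf '%s\n' "$1" | sed 's/^[^:]*://' | tr -s ' ' '\n' | sed -e 's/[a-zA-Z]*$//' -e '/^$/d'
}

# Hypothetical helper: show parent, owner and command line for each PID that is still alive.
describe_pids() {
    for pid in "$@"; do
        ps -o pid=,ppid=,user=,args= -p "$pid" 2>/dev/null || echo "$pid: no longer running"
    done
}

# Second fuser line from the log: processes that survived the first kill attempt.
survivors=$(pids_from_fuser_line "/opt/mwx: 2243 13679 14427c 14530c 14562c 28743c 31389 31514 31552")
echo $survivors
describe_pids $survivors
```

The command lines reported by ps should show which application owns the remaining processes, and therefore which one may need its own VCS resource or a more thorough offline script.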
05-16-2012 09:12 PM
In my environment I am using SFHA 5.1 SP1 on SuSE Linux 11 SP1, and I was facing the same kind of issue. I downloaded the hotfix from the location below and applied it. After applying the hotfix, my issue was resolved.