
Possible Mount agent bug(s) in 5.0MP3

andyts
Level 2
We have installed SF HA 5.0MP3 on SPARC Solaris 10. There is a new service group with a single DiskGroup resource and a number of Mount resources. I created it yesterday and switched it back and forth several times without any problem. Today, however, the switch did not work: none of the filesystems could be unmounted. Upon investigation, I found the following in engine_A.log:

2009/05/08 09:45:46 VCS NOTICE V-16-1-10300 Initiating Offline of Resource hspnfs3_mnt_patches (Owner: unknown, Group: HSPNFS3_grp) on System hqub5503c
2009/05/08 09:45:46 VCS ERROR V-16-2-13064 (hqub5503c) Agent is calling clean for resource(hspnfs3_mnt_patches) because the resource is up even after offline completed.
2009/05/08 09:45:46 VCS ERROR V-16-2-13070 (hqub5503c) Resource(hspnfs3_mnt_patches) - clean not implemented.
2009/05/08 09:46:47 VCS ERROR V-16-2-13077 (hqub5503c) Agent is unable to offline resource(hspnfs3_mnt_patches). Administrative intervention may be required.

(The same sequence of messages was logged for every filesystem.)

Further, I found a document, http://seer.entsupport.symantec.com/docs/313851.htm, that describes the same situation. From this doc: "This is for cases where VCS service groups having DiskGroup resources configured with UnMountVolumes attribute set and the volumes are mounted outside of VCS control (this is not very common)."

In my case, UnMountVolumes is NOT set, and all filesystems had been mounted by VCS as part of switching the service group. This leads me to think of two possible bugs.

1. Under certain conditions, VCS fails to unmount filesystems that it mounted itself. I do not know what these conditions are.
2. VCS fails to force-unmount the filesystems after calling the "clean" procedure. This is because the /opt/VRTSvcs/bin/Mount/clean script does not pass the umount option `-o mntunlock=VCS', which I think it should. Maybe this should be conditional, something like "if VxFSMountLock is set".

As a workaround, I am going to disable VxFSMountLock in all Mount resources.

If extra information/debugging is needed, please let me know.

Thanks,
Andy
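
Since the same four messages repeat for every filesystem, a small script can help list which resources ended up stuck. The following is only a rough sketch in Python: the sample text is copied from the excerpt in this post, the regular expressions are my assumption about the line layout based on those four lines alone, and the function name stuck_resources is made up for illustration. It is not part of VCS.

```python
import re
from collections import defaultdict

# Sample lines copied from the engine_A.log excerpt quoted in this post.
SAMPLE_LOG = """\
2009/05/08 09:45:46 VCS NOTICE V-16-1-10300 Initiating Offline of Resource hspnfs3_mnt_patches (Owner: unknown, Group: HSPNFS3_grp) on System hqub5503c
2009/05/08 09:45:46 VCS ERROR V-16-2-13064 (hqub5503c) Agent is calling clean for resource(hspnfs3_mnt_patches) because the resource is up even after offline completed.
2009/05/08 09:45:46 VCS ERROR V-16-2-13070 (hqub5503c) Resource(hspnfs3_mnt_patches) - clean not implemented.
2009/05/08 09:46:47 VCS ERROR V-16-2-13077 (hqub5503c) Agent is unable to offline resource(hspnfs3_mnt_patches). Administrative intervention may be required.
"""

# Assumed layout: "<date> <time> VCS <SEVERITY> <message id> <text>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<sev>[A-Z]+) (?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)$"
)
# The resource name follows "Resource " or "resource(" in the sample lines.
RES_RE = re.compile(r"[Rr]esource\s*\(?\s*(?P<res>[A-Za-z0-9_.-]+)")

UNABLE_TO_OFFLINE = "V-16-2-13077"


def stuck_resources(log_text):
    """Return {resource: [message ids]} for resources that logged V-16-2-13077."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an engine log line in the assumed layout
        r = RES_RE.search(m.group("text"))
        if r:
            seen[r.group("res")].append(m.group("msgid"))
    return {res: ids for res, ids in seen.items() if UNABLE_TO_OFFLINE in ids}


if __name__ == "__main__":
    for res, ids in stuck_resources(SAMPLE_LOG).items():
        print(res, " -> ".join(ids))
```

To run it against a real log, read the file contents and pass them to stuck_resources() instead of SAMPLE_LOG.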
3 REPLIES

Eric_Gao
Level 4
Have you tried to unmount the file system manually? What is the error message?

Whenever online/clean fails, the output is dumped to the engine log, but in your case there is no output? Was the log file pasted here in full?

I am just wondering what the error message was when the agent tried to offline the file system.

andyts
Level 2
Manual mount and unmount were working fine. There were no extra messages related to this filesystem.

I ran more tests, trying to mount/unmount this and other filesystems in the same service group, and they all behaved the same way. The IState of the Mount resource would stay at "waiting to go online" (or offline) forever, and the resource would never change state to FAULTED. If I mounted or unmounted the filesystem manually, VCS recognized that.

The latest twist was that HAD eventually crashed on both nodes simultaneously, for no obvious reason. It was restarted by HASHADOW, and all problems disappeared. So maybe the problem was in the engine itself, but strangely, only the Mount agent and its resources were affected.

Regards,

Andy

Eric_Gao
Level 4
That makes sense. It really sounds to me like a had problem, but it could have something to do with the agent binaries as well.

Is the VCS software fully installed?

I checked here in a 5.0MP3 environment; you might compare the file sizes and timestamps below against yours to see whether any file looks wrong. There is no problem in my environment.

/opt/VRTSvcs/bin
bash #ls -ltr had*
-rwxr-x--- 1 root sys 5612716 Aug 30 2008 had
-rwxr-xr-x 1 root sys 1247184 Aug 30 2008 hadebug
-rwxr--r-- 1 root sys 1171944 Aug 30 2008 hadiscover
bash #


bash #pwd
/opt/VRTSvcs/bin/Mount
bash #ls -ltr
total 420
-rwxr--r-- 1 root sys 4735 Jun 11 2005 VCSSysArch.pm
-rwxr--r-- 1 root sys 35365 Jun 11 2005 VCSProc.pm
-rwxr--r-- 1 root sys 9404 Jun 11 2005 VcsDefines.pm
-rwxr--r-- 1 root sys 3250 Dec 31 2005 VCSMountNFS.pm
-rwxr--r-- 1 root sys 14215 Jan 31 2006 VCSSys.pm
-rwxr--r-- 1 root sys 20735 Jan 31 2006 nfshelper.pm
-rwxr--r-- 1 root sys 2174 Apr 21 2006 MountDiscovery.pl
-rwxr--r-- 1 root sys 2365 Sep 8 2006 info
-rwxr--r-- 1 root sys 6086 Apr 24 2008 offline
-rwxr--r-- 1 root sys 7345 May 31 2008 VCSMountRaw.pm
-rwxr--r-- 1 root sys 9197 May 31 2008 clean
-rwxr--r-- 1 root sys 7757 Jul 12 2008 online
-rwxr--r-- 1 root sys 62300 Aug 30 2008 MountAgent
-rwxr--r-- 1 root sys 12736 Aug 30 2008 MountDiscover.so
-rw-r--r-- 1 root sys 8885 Aug 30 2008 Mount.xml
drwxrwxr-x 2 root sys 512 Feb 3 17:37 actions
bash #
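
If comparing the listings by eye gets tedious, something like the following Python sketch could do it. It assumes the plain `ls -l` column layout shown above (permissions, links, owner, group, size, three date fields, name). The LOCAL listing is made up for illustration and is not from any real node, and the function names parse_ls and diff_listings are hypothetical. Keep in mind that matching size and timestamp is only a rough check, not proof that the files are identical.

```python
# Three lines copied from the Mount agent listing above.
REFERENCE = """\
-rwxr--r-- 1 root sys 6086 Apr 24 2008 offline
-rwxr--r-- 1 root sys 9197 May 31 2008 clean
-rwxr--r-- 1 root sys 7757 Jul 12 2008 online
"""

# Hypothetical listing from the node under suspicion (invented for this example).
LOCAL = """\
total 420
-rwxr--r-- 1 root sys 6086 Apr 24 2008 offline
-rwxr--r-- 1 root sys 9000 May 31 2008 clean
"""


def parse_ls(listing):
    """Map file name -> (size in bytes, date string) from `ls -l` style output."""
    files = {}
    for line in listing.splitlines():
        parts = line.split(None, 8)
        if len(parts) != 9 or not parts[4].isdigit():
            continue  # skips "total N", shell prompts and blank lines
        _perms, _links, _owner, _group, size, month, day, year_or_time, name = parts
        files[name] = (int(size), f"{month} {day} {year_or_time}")
    return files


def diff_listings(reference, local):
    """Return (name, reason) pairs for files that are missing or differ locally."""
    ref, loc = parse_ls(reference), parse_ls(local)
    problems = []
    for name in sorted(ref):
        if name not in loc:
            problems.append((name, "missing"))
        elif loc[name] != ref[name]:
            problems.append((name, f"expected {ref[name]}, found {loc[name]}"))
    return problems


if __name__ == "__main__":
    for name, reason in diff_listings(REFERENCE, LOCAL):
        print(f"{name}: {reason}")
```

Paste the full reference listing and your own `ls -l` output into the two strings to use it. Directory entries such as actions will usually differ in timestamp, so a mismatch there is not necessarily a problem.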