In VCS, disk group resource shows as offline, but everything is fine at the Veritas level.
Hi,
We have one RAC cluster with two nodes. On the slave node, the application suddenly went down. According to engine_A.log, VCS reported the disk group resource as offline, but in fact all the volumes and the disk group are accessible on both nodes.
We were able to get the application working again only after restarting the service group (offline, then online). The VRTSvcs and VRTSvxvm versions are listed below. Please let us know what could be causing this issue.
Relevant engine_A.log entry:
2011/02/20 11:04:14 VCS INFO V-16-1-10307 Resource oratruescp-vdg (Owner: unknown, Group: oratruescp-psg) is offline on HOST 2 (Not initiated by VCS)
$ pkginfo -l VRTSvcs
PKGINST: VRTSvcs
NAME: Veritas Cluster Server by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Veritas Cluster Server by Symantec
PSTAMP: Veritas-5.0MP1-11/29/06-17:15:00
INSTDATE: May 29 2008 11:08
STATUS: completely installed
FILES: 160 installed pathnames
22 shared pathnames
2 linked files
45 directories
83 executables
142180 blocks used (approx)
$ pkginfo -l VRTSvxvm
PKGINST: VRTSvxvm
NAME: Binaries for VERITAS Volume Manager by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0,REV=05.11.2006.17.55
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Virtual Disk Subsystem
PSTAMP: Veritas-5.0_MP1_RP3.2:2007-08-28
INSTDATE: May 29 2008 12:03
HOTLINE: http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
EMAIL: support@veritas.com
STATUS: completely installed
FILES: 965 installed pathnames
30 shared pathnames
13 linked files
106 directories
407 executables
391560 blocks used (approx)
The 5.0 CVMVolDg agent does a "dd" read of the volumes specified by the CVMVolume attribute to determine whether the resource is online (as opposed to just checking whether the diskgroup is imported, as the DiskGroup agent does), so if the read of any of these volumes fails, the resource will fault. I have not seen the "dd" read fail before, but I have seen it time out. If this is the issue, you will see something like this in engine_A.log:
Monitor timed out (you will see this 4 times at 1-minute intervals, assuming default type attributes)
Then I think you will see something like "Monitor timed out 4 times, so as FaultOnMonitorTimeouts=4, resource faulting"
Then you will see "Resource offline - Not initiated by VCS"
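To check whether your log shows this sequence, you could filter it for the resource name. This is only a rough sketch: the function name is my own, the "timed out" and "is offline" patterns are assumptions about the message wording (only the "is offline" line is confirmed by the log above), and it assumes a grep that supports -E (on Solaris, /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Sketch: list engine log lines for one resource that look like monitor
# timeouts or offline events. The match patterns are assumptions; adjust
# them to the wording your engine_A.log actually uses.

# scan_resource_events LOGFILE RESOURCE
scan_resource_events() {
    logfile=$1
    resource=$2
    # First keep only lines naming the resource, then keep the
    # timeout/offline messages among them (case-insensitive).
    grep -- "$resource" "$logfile" | grep -i -E 'timed out|is offline'
}
```

For example: scan_resource_events /var/VRTSvcs/log/engine_A.log oratruescp-vdg. If the timeouts are there, they should appear in the minutes just before the 11:04:14 offline message.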
I have seen this happen when a backup kicked in and affected performance so much that the dd reads timed out, particularly when lots of volumes are specified in the CVMVolume attribute, as the agent doesn't have time to read all the volumes.
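If you want to see how a read behaves while the backup is running, you can do a similar read by hand. This is a sketch of the idea, not the agent's actual code: the function name, the 1 MB block size and the single-block count are my assumptions, and you would point it at the raw volume device (normally /dev/vx/rdsk/<diskgroup>/<volume>).

```shell
#!/bin/sh
# Sketch: read a small amount of data from a volume with dd, roughly the
# kind of check described above. Block size and count are assumptions,
# not the values the CVMVolDg agent uses.

# probe_volume DEVICE [BLOCKS]
probe_volume() {
    device=$1
    blocks=${2:-1}
    # Discard the data; only the success or failure of the read matters.
    if dd if="$device" of=/dev/null bs=1024k count="$blocks" 2>/dev/null; then
        echo "OK $device"
    else
        echo "FAIL $device"
        return 1
    fi
}
```

Running it under time (for example, time probe_volume /dev/vx/rdsk/<diskgroup>/<volume>) during the backup window and again when the system is quiet should show whether the reads get anywhere near the monitor timeout.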
If you have more than one volume specified in CVMVolume attribute, I would recommend changing this so it contains just one volume.
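Something along these lines should do it. This is a sketch that only prints the commands so you can review them before running anything; the function name and the volume name vol01 are hypothetical, and it assumes oratruescp-vdg really is a CVMVolDg resource. Check how the attribute is currently set first, and test on a non-production cluster if you can.

```shell
#!/bin/sh
# Sketch: print (not run) the VCS commands to inspect CVMVolume and set it
# to a single volume. The resource and volume names are passed in by the
# caller; nothing here touches the cluster.

# print_cvmvolume_change RESOURCE VOLUME
print_cvmvolume_change() {
    res=$1
    vol=$2
    echo "hares -display $res -attribute CVMVolume"   # show current value
    echo "haconf -makerw"                             # open config for writing
    echo "hares -modify $res CVMVolume $vol"          # keep just one volume
    echo "haconf -dump -makero"                       # save and close config
}
```

For example, print_cvmvolume_change oratruescp-vdg vol01 prints the four commands in the order you would run them as root on one node.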
Mike