02-22-2011 12:38 PM
Hi,
We have one RAC cluster with two nodes. On the slave node, the application suddenly went down. According to engine_A.log, VCS saw the disk group resource as offline, but all the volumes and the disk group were in fact accessible on both nodes.
We were able to get the application working again only after restarting the service group (offline, then online). The VRTSvcs and VRTSvxvm versions are listed below. Please let us know what could have caused this.
engine_A.log entry:
2011/02/20 11:04:14 VCS INFO V-16-1-10307 Resource oratruescp-vdg (Owner: unknown, Group: oratruescp-psg) is offline on HOST 2 (Not initiated by VCS)
$ pkginfo -l VRTSvcs
PKGINST: VRTSvcs
NAME: Veritas Cluster Server by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Veritas Cluster Server by Symantec
PSTAMP: Veritas-5.0MP1-11/29/06-17:15:00
INSTDATE: May 29 2008 11:08
STATUS: completely installed
FILES: 160 installed pathnames
22 shared pathnames
2 linked files
45 directories
83 executables
142180 blocks used (approx)
$ pkginfo -l VRTSvxvm
PKGINST: VRTSvxvm
NAME: Binaries for VERITAS Volume Manager by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0,REV=05.11.2006.17.55
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Virtual Disk Subsystem
PSTAMP: Veritas-5.0_MP1_RP3.2:2007-08-28
INSTDATE: May 29 2008 12:03
HOTLINE: http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
EMAIL: support@veritas.com
STATUS: completely installed
FILES: 965 installed pathnames
30 shared pathnames
13 linked files
106 directories
407 executables
391560 blocks used (approx)
02-22-2011 01:57 PM
The 5.0 CVMVolDg agent does a "dd" read of the volumes specified in the CVMVolume attribute to determine whether the resource is online (as opposed to just checking that the disk group is imported, as the DiskGroup agent does). If the read of any of these volumes fails, the resource will fail. I have not seen the dd read fail before, but I have seen it time out. If this is the issue, you will see something like the following in engine_A.log:
"Monitor timed out" (you will see this 4 times at 1-minute intervals, assuming default type attributes)
Then I think you will see something like "Monitor timed out 4 times", and since FaultOnMonitorTimeouts=4, the resource faults.
Then you will see "Resource offline - Not initiated by VCS"
I have seen this happen when a backup kicked in and degraded performance so much that the dd reads timed out. This is more likely when many volumes are specified in the CVMVolume attribute, because the agent does not have time to read all of them.
If you have more than one volume specified in the CVMVolume attribute, I would recommend changing it so that it contains just one volume.
Mike
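To check whether the log shows this pattern, a rough sketch like the one below can count the timeout messages for a resource. The function name count_monitor_timeouts is made up for illustration, and the "timed out" match string is an assumption based on the wording above, so compare it against the actual messages in your own engine_A.log before relying on the count.

```shell
#!/bin/sh
# Count log lines that mention a given resource together with "timed out".
# The match text is an assumption; adjust it to the real message wording.
count_monitor_timeouts() {
    log_file=$1
    resource=$2
    # First keep only lines naming the resource (fixed-string match),
    # then count those that also contain "timed out", ignoring case.
    # grep -c still prints 0 when nothing matches; "|| true" only hides
    # its non-zero exit status in that case.
    grep -F -- "$resource" "$log_file" | grep -ci "timed out" || true
}

# Example, using the default VCS engine log location:
# count_monitor_timeouts /var/VRTSvcs/log/engine_A.log oratruescp-vdg
```

A count of around 4 shortly before the "Not initiated by VCS" entry would be consistent with the timeout sequence described above.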
02-23-2011 09:50 AM
Hi Mike,
Thanks for the quick response. You have described exactly the problem I faced. In my setup, all the shared volumes are listed in the CVMVolume attribute.
I am not sure why all the volumes were added to CVMVolume. I will check our other setups before removing any volumes from CVMVolume.
02-28-2011 08:20 PM
Hi,
As Mike explained, the CVMVolDg agent does a dd test to make sure the shared volumes are active.
We normally suggest adding all critical volumes to this list, so that any fault on a critical volume is acted on immediately.
If this issue keeps recurring, you can increase the MonitorTimeout value for the CVMVolDg type.
The commands below raise MonitorTimeout and MonitorInterval from 1 minute to 2 minutes.
# haconf -makerw
# hatype -modify CVMVolDg MonitorTimeout 120
# hatype -modify CVMVolDg MonitorInterval 120
# haconf -dump -makero
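Before settling on a timeout value, it may help to see how long a small raw read of each volume takes while the system is under load (for example during the backup window). The sketch below is only an approximation of a dd-based check: the function name check_volume_read is hypothetical, and the block size and count are assumptions, since the agent's exact dd arguments are not given in this thread.

```shell
#!/bin/sh
# Try a small read from each device given as an argument and report the result.
# bs=1024 count=1 is an assumed read size, not the agent's documented behaviour.
check_volume_read() {
    status=0
    for device in "$@"; do
        # Read one block and discard it; dd's own status text is suppressed.
        if dd if="$device" of=/dev/null bs=1024 count=1 2>/dev/null; then
            echo "read ok: $device"
        else
            echo "read FAILED: $device"
            status=1
        fi
    done
    # Non-zero if any read failed.
    return $status
}

# Example with placeholder names (VxVM raw volume device path);
# in ksh or bash, prefix the call with "time" to see how long it takes:
# time check_volume_read /dev/vx/rdsk/<diskgroup>/<volume>
```

If the reads take a large fraction of MonitorTimeout under load, that would support either raising the timeout or listing fewer volumes in CVMVolume.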
Regards
Srini
03-02-2011 04:31 AM
Hi Srini,
In our setup, all the volumes are listed in the attribute and they are around 10 TB in size. Do you still think that changing just the monitor interval and timeout is sufficient?
Mike, Srini,
Could you share any documentation that describes in detail the dd read performed on the volumes listed in the CVMVolume attribute?
Thanks in advance.