Solved: HA solution is not working when IP resource faults in the primary server

Sridhar_sri · ‎04-26-2009

Hi,
I have configured my cluster (using the VCS HA 5.0 package) as shown below.

main.cf file:

include "types.cf"
include "PRODUCTTypes.cf"
include "VVRTypes.cf"

cluster PRODUCT_HA_Cluster (
UserNames = { Administrator = JifCfeEffFfh }
ClusterAddress = "10.77.213.181"
Administrators = { Administrator }
)

system bundle-sunfirev490-2 (
)

system idu-test-sf440-1 (
)

group App (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
)

IP App_Ip (
Device = ce0
Address = "10.77.213.181"
)

PRODUCT PRODUCTAgent (
Enabled = 0
)

Mount product1_Mount (
MountPoint = "<<Mount point1>>"
BlockDevice = "/dev/vx/dsk/datadg/product1"
FSType = vxfs
FsckOpt = "-y"
)

Mount product2_Mount (
MountPoint = "<<Mount point 2>>"
BlockDevice = "/dev/vx/dsk/datadg/product2"
FSType = vxfs
FsckOpt = "-y"
)

Proxy App_NIC_proxy (
TargetResName = csgnic
)

RVGPrimary App_RVGPrimary (
RvgResourceName = App_RVG
AutoResync = 1
)

Volume product1 (
Volume = product1
DiskGroup = datadg
)

Volume product2 (
Volume = product2
DiskGroup = datadg
)

App_Ip requires App_NIC_proxy
PRODUCTAgent requires App_Ip
PRODUCTAgent requires product1_Mount
PRODUCTAgent requires product2_Mount
product1 requires App_RVGPrimary
product1_Mount requires product1
product2 requires App_RVGPrimary
product2_Mount requires product2

// resource dependency tree
//
// group App
// {
// PRODUCT PRODUCTAgent
// {
// Mount product1_Mount
// {
// Volume product1
// {
// RVGPrimary App_RVGPrimary
// }
// }
// Mount product2_Mount
// {
// Volume product2
// {
// RVGPrimary App_RVGPrimary
// }
// }
// IP App_Ip
// {
// Proxy App_NIC_proxy
// }
// }
// }

group App_Eval_IP (
SystemList = { idu-test-sf440-1 = 0, bundle-sunfirev490-2 = 1 }
AutoStartList = { idu-test-sf440-1, bundle-sunfirev490-2 }
)

IP App_Eval_IP_Resource (
Device = ce0
Address = "10.77.213.181"
)

// resource dependency tree
//
// group App_Eval_IP
// {
// IP App_Eval_IP_Resource
// }

group App_Rep (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
)

IP App_Rep_IP (
Device = ce0
Address = "10.77.213.181"
)

Proxy App_Rep_Nic_Proxy (
TargetResName = csgnic
)

RVG App_RVG (
RVG = PRODUCT_HA_RVG
DiskGroup = datadg
)

App_RVG requires App_Rep_IP
App_Rep_IP requires App_Rep_Nic_Proxy

// resource dependency tree
//
// group App_Rep
// {
// RVG App_RVG
// {
// IP App_Rep_IP
// {
// Proxy App_Rep_Nic_Proxy
// }
// }
// }

group App_datadg (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
Parallel = 1
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
)

DiskGroup datadg (
DiskGroup = datadg
)

// resource dependency tree
//
// group App_datadg
// {
// DiskGroup datadg
// }

group ClusterService (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
OnlineRetryInterval = 120
)

IP webip (
Device = ce0
Address = "10.77.213.181"
NetMask = "255.255.255.192"
)

NIC csgnic (
Device = ce0
)

NotifierMngr ntfr (
SmtpServer = "<<SMTP server>>"
SmtpRecipients = { "<<emailID>>" = Error }
)

VRTSWebApp VCSweb (
Critical = 0
AppName = cmc
InstallDir = "/opt/VRTSweb/VERITAS"
TimeForOnline = 5
RestartLimit = 3
)

VCSweb requires webip
ntfr requires csgnic
webip requires csgnic

// resource dependency tree
//
// group ClusterService
// {
// VRTSWebApp VCSweb
// {
// IP webip
// {
// NIC csgnic
// }
// }
// NotifierMngr ntfr
// {
// NIC csgnic
// }
// }

producttypes.cf file:
type product (
static int OfflineTimeout = 1000
static int OnlineTimeout = 1200
static str ArgList[] = { NMSROOT, holdOff }
static int AMF{} = { Mode=3, MonitorInterval=180, OfflineMonitorInterval=300, RegisterRetryLimit=5 }
str NMSROOT = "<<location>>"
int holdOff = 1200
)

In this setup I have changed the OnlineTimeout attribute to 1200 seconds for my customised agent resource only, because the processes in my agent's monitoring list take about 20 minutes to come up. The agent has no problems with monitoring or online operations. However, if the IP resource faults, my agent takes 20 minutes to bring down all of its processes. During that time the resources try to come online on the secondary server and fail, because the application is still shutting down on the primary server. I also tried setting the OfflineTimeout value to 1000 seconds, but it did not help.
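
A side note on the numbers above: 20 minutes is 1200 seconds, which is longer than the 1000-second OfflineTimeout in the type definition, so the offline operation may be timing out before the processes have finished stopping. A minimal sketch of the same type definition with a longer timeout follows. The value 1300 is only an illustrative assumption (anything comfortably above the real shutdown time would serve), and nothing in this thread confirms that this change alone fixes the failover problem.

```
// Assumption: 1300 is an illustrative value above the ~1200 s shutdown time.
// All other lines are copied unchanged from the type definition above.
type product (
    static int OfflineTimeout = 1300
    static int OnlineTimeout = 1200
    static str ArgList[] = { NMSROOT, holdOff }
    static int AMF{} = { Mode=3, MonitorInterval=180, OfflineMonitorInterval=300, RegisterRetryLimit=5 }
    str NMSROOT = "<<location>>"
    int holdOff = 1200
)
```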

Is there any way to make all the resources / service groups wait until my agent and its dependent resources have gone down on the primary server?

Only the ClusterService group was created at cluster installation time. I created the rest of the resources and service groups manually using the Veritas Cluster Manager Java Console.

Please provide your valuable solutions at the earliest.

Thanks in advance,
Sri

Gaurav_S · ‎04-27-2009

Hello,

You can use preonline triggers. Check the sample_triggers directory located in the /etc/VRTSvcs/conf directory.
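
A minimal sketch of how the trigger might be enabled, assuming the App group from the main.cf above is the one that should wait: setting the PreOnline group attribute to 1 makes VCS call the preonline trigger script before bringing the group online on a node, instead of onlining it directly. The fragment only adds the PreOnline line to the existing group definition; the trigger script itself is not shown and would have to be written for this setup.

```
// Assumption: App is the group whose online should be delayed.
// Only the PreOnline line is new; the rest is copied from the main.cf above.
group App (
    SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
    AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
    PreOnline = 1
    )
```

With this attribute set, the trigger script decides when the online may go ahead, for example by waiting until the application has finished stopping on the other node and then running hagrp -online -nopre for the group on the target system.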

Gaurav

Sridhar_sri · ‎04-27-2009

Hi,

Can you tell me exactly what you want me to look for here?

There are files such as sample_vvr, sample_vxtf and sample_nfs in this directory. How would they help to solve this?

There are no issues with my agent when taking the processes ONLINE / OFFLINE, etc.

Can you please clarify?

A solution has been tagged for this query by mistake. How can I untag it, gurus?

With Regards,
Sri

Kimberley · ‎04-27-2009

Hi Sri,

The gurus are at a loss on this one... I investigated and it turns out that there is no way to 'untag' a Solution. You found a glitch in the system, and we've added it to the fix list. Thanks so much for pointing this out!

Much appreciated,
Kimberley
