When IP resource faults, HA solution test case is not working

Sridhar_sri · 04-29-2009

As my earlier post was marked as the solution and I am unable to remove that mark, I am reposting it. Please delete this post.

Hi Gurus,
I have set up a cluster with the VCS HA 5.0 package. I created volumes on the hard disks in the Solaris servers and configured the replication network using VEA. I created the cluster service group with the Notifier and Cluster Management Console resources configured in it, and manually created two service groups, App_Rep and App. The App_Rep service group consists of resources used to monitor the health of the replication network. The App service group mounts the volumes and brings up the processes on them. To bring up the processes, I created a customised agent that performs the online, offline, clean and monitor actions through their respective Perl files. There is no problem with these resources; they behave as expected.

This is my main.cf file. As I can't find an option to attach the file (even in zipped format), I am pasting it here.

I have named the volumes product1 and product2. I need both volumes for my process to come up. Hence I am replicating both volumes to the secondary server, and I created the corresponding Volume and Mount resources using the Cluster Manager Java Console.

Here PRODUCTTypes.cf is the file created for my customised agent resource.

PRODUCTAgent is the name I gave my customised product resource in the Cluster Manager Java Console.

include "types.cf"
include "PRODUCTTypes.cf"
include "VVRTypes.cf"

cluster PRODUCT_HA_Cluster (
UserNames = { Administrator = JifCfeEffFfh }
ClusterAddress = "10.77.213.181"
Administrators = { Administrator }
)

system bundle-sunfirev490-2 (
)

system idu-test-sf440-1 (
)

group App (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
)

IP App_Ip (
Device = ce0
Address = "10.77.213.181"
)

PRODUCT PRODUCTAgent (
Enabled = 0
)

Mount product1_Mount (
MountPoint = "<<Mount point1>>"
BlockDevice = "/dev/vx/dsk/datadg/product1"
FSType = vxfs
FsckOpt = "-y"
)

Mount product2_Mount (
MountPoint = "<<Mount point 2>>"
BlockDevice = "/dev/vx/dsk/datadg/product2"
FSType = vxfs
FsckOpt = "-y"
)

Proxy App_NIC_proxy (
TargetResName = csgnic
)

RVGPrimary App_RVGPrimary (
RvgResourceName = App_RVG
AutoResync = 1
)

Volume product1 (
Volume = product1
DiskGroup = datadg
)

Volume product2 (
Volume = product2
DiskGroup = datadg
)

App_Ip requires App_NIC_proxy
PRODUCTAgent requires App_Ip
PRODUCTAgent requires product1_Mount
PRODUCTAgent requires product2_Mount
product1 requires App_RVGPrimary
product1_Mount requires product1
product2 requires App_RVGPrimary
product2_Mount requires product2

// resource dependency tree
//
// group App
// {
// PRODUCT PRODUCTAgent
// {
// Mount product1_Mount
// {
// Volume product1
// {
// RVGPrimary App_RVGPrimary
// }
// }
// Mount product2_Mount
// {
// Volume product2
// {
// RVGPrimary App_RVGPrimary
// }
// }
// IP App_Ip
// {
// Proxy App_NIC_proxy
// }
// }
// }

group App_Eval_IP (
SystemList = { idu-test-sf440-1 = 0, bundle-sunfirev490-2 = 1 }
AutoStartList = { idu-test-sf440-1, bundle-sunfirev490-2 }
)

IP App_Eval_IP_Resource (
Device = ce0
Address = "10.77.213.181"
)

// resource dependency tree
//
// group App_Eval_IP
// {
// IP App_Eval_IP_Resource
// }

group App_Rep (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
)

IP App_Rep_IP (
Device = ce0
Address = "10.77.213.181"
)

Proxy App_Rep_Nic_Proxy (
TargetResName = csgnic
)

RVG App_RVG (
RVG = PRODUCT_HA_RVG
DiskGroup = datadg
)

App_RVG requires App_Rep_IP
App_Rep_IP requires App_Rep_Nic_Proxy

// resource dependency tree
//
// group App_Rep
// {
// RVG App_RVG
// {
// IP App_Rep_IP
// {
// Proxy App_Rep_Nic_Proxy
// }
// }
// }

group App_datadg (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
Parallel = 1
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
)

DiskGroup datadg (
DiskGroup = datadg
)

// resource dependency tree
//
// group App_datadg
// {
// DiskGroup datadg
// }

group ClusterService (
SystemList = { bundle-sunfirev490-2 = 0, idu-test-sf440-1 = 1 }
AutoStartList = { bundle-sunfirev490-2, idu-test-sf440-1 }
OnlineRetryInterval = 120
)

IP webip (
Device = ce0
Address = "10.77.213.181"
NetMask = "255.255.255.192"
)

NIC csgnic (
Device = ce0
)

NotifierMngr ntfr (
SmtpServer = "<<SMTP server>>"
SmtpRecipients = { "<<emailID>>" = Error }
)

VRTSWebApp VCSweb (
Critical = 0
AppName = cmc
InstallDir = "/opt/VRTSweb/VERITAS"
TimeForOnline = 5
RestartLimit = 3
)

VCSweb requires webip
ntfr requires csgnic
webip requires csgnic

// resource dependency tree
//
// group ClusterService
// {
// VRTSWebApp VCSweb
// {
// IP webip
// {
// NIC csgnic
// }
// }
// NotifierMngr ntfr
// {
// NIC csgnic
// }
// }

This is the types file for my customised agent:

type product (
static int OfflineTimeout = 1000
static int OnlineTimeout = 1200
static str ArgList[] = { NMSROOT, holdOff }
static int AMF{} = { Mode=3, MonitorInterval=180, OfflineMonitorInterval=300, RegisterRetryLimit=5 }
str NMSROOT = "<<location>>"
int holdOff = 1200
)

In this setup, for my customised agent resource alone, I have set the OnlineTimeout attribute to 1200 seconds, because the processes monitored by my agent resource take 20 minutes to come up. There is no problem with my agent's monitor or online activities. However, if my IP resource faults, my agent takes 20 minutes to bring down all the processes. During this time, all the resources / service groups try to come up on the secondary server and fail, because they are still going down on the primary server. I also tried setting the OfflineTimeout value to 1000 seconds, but the problem remains.

Is there any way to make all the resources / service groups wait until my agent and its dependent resources in the App service group have gone down on the primary server?

Valuable comments are appreciated.

Thanks in advance,
Sri.

Gaurav_S · 04-29-2009

Hi Sri,

Did you check the option to use pre-online triggers, as mentioned before?

Thanks

Gaurav

Sridhar_sri · 04-30-2009

Hi Gaurav,
I have responded to your reply and am waiting for you to get back to me.

As I said earlier,

Can you tell me exactly what you want me to look for here? There were no issues with my agent when taking the process ONLINE / OFFLINE etc.

I have read the event trigger chapter in the VCS User's Guide 5.0, where I found the section on the preonline event trigger. This is the description given there:

Indicates that when the HAD should call a user-defined script before bringing a service group online in response to the hagrp -online command or a fault.
If the trigger does not exist, VCS continues to bring the group online. If the script returns 0 without an exit code, VCS runs the hagrp -online -nopre command, with the -checkpartial option if appropriate.
If you do want to bring the group online, define the trigger to take no action.
This event trigger is configurable.

Can you tell me exactly how I can use this to solve my issue? My doubt is whether it makes the other service groups wait until my App service group has gone completely offline. Can you give me some pointers after looking over the configuration pasted above?
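
For discussion, here is a rough sketch of how a preonline trigger might hold back the online until the group has finished going offline on the faulted node. It is not taken from the user guide and has not been run on a cluster. Assumptions to verify against the VCS 5.0 documentation: the trigger is installed as /opt/VRTSvcs/bin/triggers/preonline, it is enabled for the group with hagrp -modify App PreOnline 1, VCS passes the target system and the group name as the first two arguments and the faulted system as the fourth, and hagrp -state prints state strings containing OFFLINE or STOPPING. MAX_WAIT, POLL_INTERVAL and the helper function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of a preonline trigger (assumed path: /opt/VRTSvcs/bin/triggers/preonline).
# Uses POSIX sh syntax; on Solaris 10 run it with /usr/xpg4/bin/sh or ksh,
# because the legacy /bin/sh does not understand $(...) or $((...)).

MAX_WAIT=${MAX_WAIT:-1200}          # hypothetical: seconds to wait for the offline
POLL_INTERVAL=${POLL_INTERVAL:-10}  # hypothetical: seconds between state checks

# Print the state of a group on one system (thin wrapper around hagrp).
group_state() {
    hagrp -state "$1" -sys "$2"
}

# Succeed only if the state string says the group is offline and not still
# stopping. The exact strings printed by hagrp are an assumption here.
is_fully_offline() {
    case "$1" in
        *STOPPING*) return 1 ;;
        *OFFLINE*)  return 0 ;;
        *)          return 1 ;;
    esac
}

# Poll until the group is fully offline on the given system.
# Arguments: group, system, maximum wait in seconds, poll interval in seconds.
# Returns 0 once the group is offline, 1 if the maximum wait is reached first.
wait_until_offline() {
    w_group=$1
    w_sys=$2
    w_max=$3
    w_step=$4
    w_waited=0
    while :; do
        if is_fully_offline "$(group_state "$w_group" "$w_sys")"; then
            return 0
        fi
        if [ "$w_waited" -ge "$w_max" ]; then
            return 1
        fi
        sleep "$w_step"
        w_waited=$((w_waited + w_step))
    done
}

# Main part: runs only when called with arguments, as VCS would call it.
# Assumed argument order: $1 target system, $2 group, $4 faulted system.
if [ "$#" -ge 2 ]; then
    target_sys=$1
    grp=$2
    faulted_sys=${4:-}
    if [ -n "$faulted_sys" ]; then
        # Do not bring the group online if the other node never finished.
        wait_until_offline "$grp" "$faulted_sys" "$MAX_WAIT" "$POLL_INTERVAL" || exit 1
    fi
    # Continue the online without calling this trigger again.
    hagrp -online -nopre "$grp" -sys "$target_sys"
fi
```

If the wait times out, the sketch exits without bringing the group online, so the group stays offline on the target rather than starting while processes are still stopping on the other node. Whether HAD tolerates a trigger that blocks for up to 20 minutes is also something to check before relying on this.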

I believe the sample_triggers directory has moved to %VCS_HOME%\bin\sample_triggers in the latest VCS version; correct me if I am wrong. I got this location from the VCS User's Guide 5.0.

Your help is much appreciated.

Thanks in advance,
Sri
