02-09-2015 11:54 PM
The title of this thread was changed from "what's means of "restart mode"" to "Why the service group can't be token online automatically after fixing brain-split".
------------------------------------------------
We are running VCS 6.0.2 on RHEL 6.5. Our cluster configuration is:
Heartbeat links: eth3, eth4
Low-priority heartbeat link: not enabled
Fencing: not enabled
The cluster contains 2 servers: jarry-crf1, jarry-crf2.
Service groups:
TestGrp1 contains a "FileOnOff" resource and is a parallel group.
TestGrp2 contains a "FileOnOff" resource, depends on TestGrp1, and is a failover group.
Test steps:
1. Bring TestGrp1 online on both servers and TestGrp2 online on server "jarry-crf2".
2. Take down both heartbeat links on server "jarry-crf1" with the command "ifdown eth3; sleep 60; ifdown eth4".
3. Restore the heartbeat links with the command "ifup eth3; ifup eth4".
The "had" process was then restarted on server "jarry-crf2", and we found the following messages in engine_A.log:
2015/02/10 00:48:43 VCS NOTICE V-16-1-10433 Group TestGrp2 will not start automatically on System jarry-crf2 as the system is in restart mode.
2015/02/10 00:48:43 VCS NOTICE V-16-1-10433 Group TestGrp1 will not start automatically on System jarry-crf2 as the system is in restart mode.
2015/02/10 00:48:43 VCS NOTICE V-16-1-10445 Group TestGrp1 will not start automatically as atleast one system in the SystemList attribute of the group is in restart mode.
2015/02/10 00:48:47 VCS NOTICE V-16-1-10433 Group VCShmg will not start automatically on System jarry-crf2 as the system is in restart mode.
2015/02/10 00:48:47 VCS NOTICE V-16-1-10445 Group VCShmg will not start automatically as atleast one system in the SystemList attribute of the group is in restart mode.
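To see at a glance which groups are being held back, the notices above can be filtered by message ID. This is only a minimal sketch: it assumes the log line layout shown above, and the function name restart_mode_groups is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list "group system" pairs that the engine log reports as
# not auto-started because the system is in restart mode.
# Reads engine log text on stdin. The field positions assume the line
# layout seen in this thread:
#   date time VCS NOTICE V-16-1-10433 Group <group> ... on System <system> ...
restart_mode_groups() {
    awk '$5 == "V-16-1-10433" && $6 == "Group" { print $7, $14 }' | sort -u
}
```

It could be used as "restart_mode_groups < /var/VRTSvcs/log/engine_A.log", assuming the engine log is in that location on your system.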
My questions:
1. Why is the "had" process restarted after the heartbeat links are recovered?
2. What does "restart mode" mean, and how can we get the service groups out of "restart mode" so that they start automatically?
Thanks in advance!
02-10-2015 12:13 AM
Hello,
You took the heartbeat links down for 60 sec, which is longer than the LLT and GAB timeouts (roughly 15 sec for LLT and another 15 for GAB). In the logs you should see LLT reporting that its ticks timed out, followed by GAB timing out.
That is why the HAD process was restarted. I presume these logs are from the time HAD was restarting. The healthy node generates these messages because the group would have landed in the "autodisabled" state once LLT/GAB/HAD went down on node 2, which is why you get the above message. You can confirm the autodisabled flag in the output of "hastatus -sum".
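A quick way to check the flag is to filter the GROUP STATE rows of the summary. The sketch below assumes the column order printed by "hastatus -sum" (B, group, system, Probed, AutoDisabled, State); the function name autodisabled_groups is hypothetical.

```shell
#!/bin/sh
# Sketch only: print "group system" for every GROUP STATE row whose
# AutoDisabled column is Y. Reads "hastatus -sum" output on stdin.
# Assumed columns: B <group> <system> <Probed> <AutoDisabled> <State>
autodisabled_groups() {
    awk '$1 == "B" && $5 == "Y" { print $2, $3 }'
}
```

For example: "hastatus -sum | autodisabled_groups". No output means no group is autodisabled.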
The question now is:
After waiting for some time, does the group still fail to come online (does it stay in the autodisabled state), or does it come online once both nodes start talking to each other again?
What exactly is the objective of the test? Do you want to see failover or dependency behaviour?
Also, a recommendation: in production clusters, always use I/O fencing for data protection.
G
02-11-2015 02:10 AM
Hi Gaurav
Thanks for your support!
There is neither a coordination disk nor a coordination server in our product deployment, so we want to test how VCS behaves when both heartbeat links are broken.
Today we simplified the test scenario: only one parallel group, "TestGrp1", is involved.
Test steps:
1. Execute the command "hastatus -sum". The result is:
# hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  jarry-crf1           RUNNING              0
A  jarry-crf2           RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  ClusterService  jarry-crf1           Y          N               ONLINE
B  ClusterService  jarry-crf2           Y          N               OFFLINE
B  TestGrp1        jarry-crf1           Y          N               ONLINE
B  TestGrp1        jarry-crf2           Y          N               ONLINE
2. On server "jarry-crf1", execute the command "ifdown eth3; sleep 60; ifdown eth4; sleep 60;ifup eth3; ifup eth4".
3. On server "jarry-crf2", while the "HAD" process was being restarted (after had shut down and before had started again), we deleted the flag file of FileOnOff to take it offline intentionally.
4. Wait several minutes, then execute the command "hastatus -sum". The result is:
# hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  jarry-crf1           RUNNING              0
A  jarry-crf2           RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  ClusterService  jarry-crf1           Y          N               ONLINE
B  ClusterService  jarry-crf2           Y          N               OFFLINE
B  TestGrp1        jarry-crf1           Y          N               ONLINE
B  TestGrp1        jarry-crf2           Y          N               OFFLINE
"TestGrp1" on "jarry-crf2" is still OFFLINE. We want to know why TestGrp1 is not taken online automatically.
The reason we delete the FileOnOff flag file while HAD is restarting is this: our product deployment implements the "close" entry point. When the HAD process exits, the "close" entry point is called and our application exits along with HAD. After HAD is restarted, our application is not taken online automatically. Deleting the FileOnOff flag file simulates our application's behaviour.
02-11-2015 02:16 AM
The title of this thread was changed from "what's means of "restart mode"" to "Why the service group can't be token online automatically after fixing brain-split".
02-11-2015 04:15 AM
Hello,
Thanks for defining the case clearly.
I would say VCS is still behaving as expected here. The FileOnOff agent is a test agent with one required attribute. As I understand your case, the required attribute is defined in the VCS config, but you deleted the flag file manually. That means VCS would have tried to online the resource but could not find the file because it had been deleted. To confirm this theory, engine_A.log for the above test case should show that VCS tried to bring the resource online but could not find the file.
To overcome this, I would suggest planning the use of triggers, especially the "preonline" trigger. In the preonline trigger you can create the test file before the group is brought online.
More information about the preonline trigger can be found here:
https://sort.symantec.com/public/documents/sfha/6.1/solaris/productguides/html/vcs_admin/ch14s03s09.htm
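A rough sketch of such a trigger is shown here. Treat it as an outline under assumptions, not a tested script: I am assuming the trigger is installed as /opt/VRTSvcs/bin/triggers/preonline, that VCS passes the system name and group name as the first two arguments, and that the trigger has to continue the online itself with "hagrp -online -nopre". FLAG_FILE is a hypothetical path and should match the file your FileOnOff resource watches. Also note that the trigger only runs when something actually attempts to online the group.

```shell
#!/bin/sh
# Sketch of a preonline trigger (assumed location: /opt/VRTSvcs/bin/triggers/preonline).
# Assumed invocation by VCS: preonline <system> <group> [further arguments]
# HAGRP and FLAG_FILE can be overridden from the environment; the default
# FLAG_FILE is a made-up path used only for illustration.
HAGRP=${HAGRP:-/opt/VRTSvcs/bin/hagrp}
FLAG_FILE=${FLAG_FILE:-/tmp/testgrp1.flag}

preonline() {
    sys=$1
    group=$2
    # Prepare whatever the resource needs before it is brought online.
    : > "$FLAG_FILE" || return 1
    # Hand control back to VCS; without this step the group stays offline.
    "$HAGRP" -online -nopre "$group" -sys "$sys"
}

# Run only when called with the arguments VCS is assumed to supply.
if [ "$#" -ge 2 ]; then
    preonline "$@"
fi
```

As far as I know the trigger is only invoked for groups whose PreOnline attribute is set to 1, so that would need to be changed on the group as well.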
Hope this helps
G
02-11-2015 11:29 PM
Hello Gaurav
Maybe the FileOnOff agent makes this case confusing. Let me introduce the real scenario and try to describe my case more clearly.
We have an application named "CRF" running on the server. We configured one service group, "CrfGrp", with one resource for the application "CRF".
The configuration in main.cf is:
group CrfGrp (
    SystemList = { jarry-crf1 = 0, jarry-crf2 = 1 }
    Parallel = 1
    AutoStartList = { jarry-crf1, jarry-crf2 }
    )

    CrfMonitor CrfRes (
        )
We implemented the following script-based entry points for CrfRes:
online: start "CRF"
offline: stop "CRF"
monitor: use "ps" command to check whether "CRF" is running
clean: stop "CRF"
close: stop "CRF"
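For illustration, the monitor entry point is roughly of the following shape. This is a simplified sketch and not our actual script: PROC_NAME and monitor_code are placeholder names, and the exit codes follow the script-agent convention as we understand it (100 means offline, 110 means online).

```shell
#!/bin/sh
# Simplified sketch of a script-based monitor entry point.
# Assumed exit code convention for script agents:
#   100 = resource is offline, 110 = resource is online
# PROC_NAME is a placeholder for the real process name of the application.
PROC_NAME=${PROC_NAME:-CRF}

# Reads one process name per line on stdin and returns the monitor code
# for the process name given as the first argument (exact, literal match).
monitor_code() {
    if grep -qxF -- "$1"; then
        return 110
    fi
    return 100
}

# In the real entry point the process list would come from "ps", e.g.:
#   ps -e -o comm= | monitor_code "$PROC_NAME"
#   exit $?
```

Passing the process list on stdin keeps the check easy to try out by hand with a fake list.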
Test steps:
1. Execute the command "hastatus -sum". The result is:
# hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  jarry-crf1           RUNNING              0
A  jarry-crf2           RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  ClusterService  jarry-crf1           Y          N               ONLINE
B  ClusterService  jarry-crf2           Y          N               OFFLINE
B  CrfGrp          jarry-crf1           Y          N               ONLINE
B  CrfGrp          jarry-crf2           Y          N               ONLINE
2. On server "jarry-crf1", execute the command "ifdown eth3; sleep 60; ifdown eth4; sleep 60;ifup eth3; ifup eth4".
3. We can see that the "close" entry point of CrfRes is called at 01:59:33, so the application "CRF" is stopped by VCS at that point.
4. Several minutes later, HAD has restarted completely, but "CrfGrp" on "jarry-crf2" is still OFFLINE.
In summary, there are 3 phases:
before split-brain: CrfGrp is ONLINE on jarry-crf2
during split-brain: CrfGrp is ONLINE on jarry-crf2
split-brain fixed: CrfGrp is OFFLINE on jarry-crf2
Our question: why is "CrfGrp" not taken online automatically on jarry-crf2?
The related service group status, resource status and logs are listed below (captured after HAD had restarted completely):
# hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  jarry-crf1           RUNNING              0
A  jarry-crf2           RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  ClusterService  jarry-crf1           Y          N               ONLINE
B  ClusterService  jarry-crf2           Y          N               OFFLINE
B  CrfGrp          jarry-crf1           Y          N               ONLINE
B  CrfGrp          jarry-crf2           Y          N               OFFLINE

# hagrp -display CrfGrp
#Group   Attribute              System      Value
CrfGrp   AdministratorGroups    global
CrfGrp   Administrators         global
CrfGrp   Authority              global      0
CrfGrp   AutoFailOver           global      1
CrfGrp   AutoRestart            global      1
CrfGrp   AutoStart              global      1
CrfGrp   AutoStartIfPartial     global      1
CrfGrp   AutoStartList          global      jarry-crf1 jarry-crf2
CrfGrp   AutoStartPolicy        global      Order
CrfGrp   ClusterFailOverPolicy  global      Manual
CrfGrp   ClusterList            global
CrfGrp   ContainerInfo          global
CrfGrp   DisableFaultMessages   global      0
CrfGrp   Evacuate               global      1
CrfGrp   ExtMonApp              global
CrfGrp   ExtMonArgs             global
CrfGrp   FailOverPolicy         global      Priority
CrfGrp   FaultPropagation       global      1
CrfGrp   Frozen                 global      0
CrfGrp   GroupOwner             global
CrfGrp   GroupRecipients        global
CrfGrp   Guests                 global
CrfGrp   Load                   global      0
CrfGrp   ManageFaults           global      ALL
CrfGrp   ManualOps              global      1
CrfGrp   OnlineClearParent      global      0
CrfGrp   OnlineRetryInterval    global      0
CrfGrp   OnlineRetryLimit       global      0
CrfGrp   OperatorGroups         global
CrfGrp   Operators              global
CrfGrp   Parallel               global      1
CrfGrp   PreOnline              global      0
CrfGrp   PreOnlining            global      0
CrfGrp   PreSwitch              global      0
CrfGrp   PreSwitching           global      0
CrfGrp   PreonlineTimeout       global      300
CrfGrp   Prerequisites          global
CrfGrp   PrintTree              global      1
CrfGrp   Priority               global      0
CrfGrp   ProPCV                 global      0
CrfGrp   SourceFile             global      ./main.cf
CrfGrp   SysDownPolicy          global
CrfGrp   SystemList             global      jarry-crf1 0 jarry-crf2 1
CrfGrp   SystemZones            global
CrfGrp   TFrozen                global      0
CrfGrp   Tag                    global
CrfGrp   TriggerEvent           global      1
CrfGrp   TriggerPath            global
CrfGrp   TriggerResFault        global      1
CrfGrp   TriggerResRestart      global      0
CrfGrp   TriggerResStateChange  global      0
CrfGrp   TriggersEnabled        global
CrfGrp   TypeDependencies       global
CrfGrp   UserAssoc              global
CrfGrp   UserIntGlobal          global      0
CrfGrp   UserStrGlobal          global
CrfGrp   AutoDisabled           jarry-crf1  0
CrfGrp   AutoDisabled           jarry-crf2  0
CrfGrp   Enabled                jarry-crf1  1
CrfGrp   Enabled                jarry-crf2  1
CrfGrp   IntentOnline           jarry-crf1  1
CrfGrp   IntentOnline           jarry-crf2  0
CrfGrp   NumRetries             jarry-crf1  0
CrfGrp   NumRetries             jarry-crf2  0
CrfGrp   PCVAllowOnline         jarry-crf1  1
CrfGrp   PCVAllowOnline         jarry-crf2  1
CrfGrp   Probed                 jarry-crf1  1
CrfGrp   Probed                 jarry-crf2  1
CrfGrp   ProbesPending          jarry-crf1  0
CrfGrp   ProbesPending          jarry-crf2  0
CrfGrp   Restart                jarry-crf1  0
CrfGrp   Restart                jarry-crf2  0
CrfGrp   State                  jarry-crf1  |ONLINE|
CrfGrp   State                  jarry-crf2  |OFFLINE|
CrfGrp   UserIntLocal           jarry-crf1  0
CrfGrp   UserIntLocal           jarry-crf2  0
CrfGrp   UserStrLocal           jarry-crf1
CrfGrp   UserStrLocal           jarry-crf2
CrfGrp   VCSi3Info              jarry-crf1
CrfGrp   VCSi3Info              jarry-crf2

# hares -display CrfRes
#Resource  Attribute              System      Value
CrfRes     Group                  global      CrfGrp
CrfRes     Type                   global      CrfMonitor
CrfRes     AutoStart              global      1
CrfRes     Critical               global      1
CrfRes     Enabled                global      1
CrfRes     LastOnline             global      jarry-crf2
CrfRes     MonitorOnly            global      0
CrfRes     ResourceOwner          global
CrfRes     TriggerEvent           global      0
CrfRes     ArgListValues          jarry-crf1  ""
CrfRes     ArgListValues          jarry-crf2  ""
CrfRes     ConfidenceLevel        jarry-crf1  100
CrfRes     ConfidenceLevel        jarry-crf2  0
CrfRes     ConfidenceMsg          jarry-crf1
CrfRes     ConfidenceMsg          jarry-crf2
CrfRes     Flags                  jarry-crf1
CrfRes     Flags                  jarry-crf2
CrfRes     IState                 jarry-crf1  not waiting
CrfRes     IState                 jarry-crf2  not waiting
CrfRes     MonitorMethod          jarry-crf1  Traditional
CrfRes     MonitorMethod          jarry-crf2  Traditional
CrfRes     Probed                 jarry-crf1  1
CrfRes     Probed                 jarry-crf2  1
CrfRes     Start                  jarry-crf1  1
CrfRes     Start                  jarry-crf2  0
CrfRes     State                  jarry-crf1  ONLINE
CrfRes     State                  jarry-crf2  OFFLINE
CrfRes     ComputeStats           global      0
CrfRes     ContainerInfo          global      Type Name Enabled
CrfRes     ResContainerInfo       global      Type Name Enabled
CrfRes     ResourceRecipients     global
CrfRes     TriggerPath            global
CrfRes     TriggerResRestart      global      0
CrfRes     TriggerResStateChange  global      0
CrfRes     TriggersEnabled        global
CrfRes     dummy                  global
CrfRes     MonitorTimeStats       jarry-crf1  Avg 0 TS
CrfRes     MonitorTimeStats       jarry-crf2  Avg 0 TS
CrfRes     ResourceInfo           jarry-crf1  State Valid Msg TS
CrfRes     ResourceInfo           jarry-crf2  State Valid Msg TS

# hasys -display jarry-crf2
#System     Attribute           Value
jarry-crf2  AgentsStopped       0
jarry-crf2  AvailableCapacity   100
jarry-crf2  CPUThresholdLevel   Critical 90 Warning 80 Note 70 Info 60
jarry-crf2  CPUUsage            0
jarry-crf2  CPUUsageMonitoring  Enabled 0 ActionThreshold 0 ActionTimeLimit 0 Action NONE NotifyThreshold 0 NotifyTimeLimit 0
jarry-crf2  Capacity            100
jarry-crf2  ConfigBlockCount    299
jarry-crf2  ConfigCheckSum      42524
jarry-crf2  ConfigDiskState     CURRENT
jarry-crf2  ConfigFile          /etc/VRTSvcs/conf/config
jarry-crf2  ConfigInfoCnt       0
jarry-crf2  ConfigModDate       Thu 12 Feb 2015 02:19:30 AM EST
jarry-crf2  ConnectorState      Down
jarry-crf2  CurrentLimits
jarry-crf2  DiskHbStatus
jarry-crf2  DynamicLoad         0
jarry-crf2  EngineRestarted     0
jarry-crf2  EngineVersion       6.0.30.0
jarry-crf2  FencingWeight       0
jarry-crf2  Frozen              0
jarry-crf2  GUIIPAddr
jarry-crf2  HostUtilization     CPU 0 Swap 0
jarry-crf2  LLTNodeId           1
jarry-crf2  LicenseType         PERMANENT_SITE
jarry-crf2  Limits
jarry-crf2  LinkHbStatus        eth3 UP eth4 UP
jarry-crf2  LoadTimeCounter     0
jarry-crf2  LoadTimeThreshold   600
jarry-crf2  LoadWarningLevel    80
jarry-crf2  NoAutoDisable       0
jarry-crf2  NodeId              1
jarry-crf2  OnGrpCnt            1
jarry-crf2  PhysicalServer
jarry-crf2  ShutdownTimeout     600
jarry-crf2  SourceFile          ./main.cf
jarry-crf2  SwapThresholdLevel  Critical 90 Warning 80 Note 70 Info 60
jarry-crf2  SysInfo             Linux:jarry-crf2,#1 SMP Sun Jul 27 15:55:46 EDT 2014,2.6.32-431.29.2.el6.x86_64,x86_64
jarry-crf2  SysName             jarry-crf2
jarry-crf2  SysState            RUNNING
jarry-crf2  SystemLocation
jarry-crf2  SystemOwner
jarry-crf2  SystemRecipients
jarry-crf2  TFrozen             0
jarry-crf2  TRSE                0
jarry-crf2  UpDownState         Up
jarry-crf2  UserInt             0
jarry-crf2  UserStr
jarry-crf2  VCSFeatures         NONE
jarry-crf2  VCSMode             VCS

2015/02/12 01:59:33 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status =eth3, DOWN, eth4, DOWN; Current status =eth3, UP, eth4, UP.
2015/02/12 01:59:43 VCS NOTICE V-16-1-11022 VCS engine (had) started
2015/02/12 01:59:43 VCS NOTICE V-16-1-11027 VCS engine startup arguments=-restart
2015/02/12 01:59:43 VCS NOTICE V-16-1-11050 VCS engine version=6.0
2015/02/12 01:59:43 VCS NOTICE V-16-1-11051 VCS engine join version=6.0.30.0
2015/02/12 01:59:43 VCS NOTICE V-16-1-11052 VCS engine pstamp=6.0.300.000-GA-2013-01-10-16.00.01
2015/02/12 01:59:43 VCS NOTICE V-16-1-10114 Opening GAB library
2015/02/12 01:59:43 VCS NOTICE V-16-1-10619 'HAD' starting on: jarry-crf2
2015/02/12 01:59:43 VCS INFO V-16-1-10196 Cluster logger started
2015/02/12 01:59:43 VCS INFO V-16-1-10125 GAB timeout set to 30000 ms
2015/02/12 01:59:43 VCS NOTICE V-16-1-11057 GAB registration monitoring timeout set to 200000 ms
2015/02/12 01:59:43 VCS NOTICE V-16-1-11059 GAB registration monitoring action set to log system message
2015/02/12 01:59:43 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2015/02/12 01:59:48 VCS INFO V-16-1-10077 Received new cluster membership
2015/02/12 01:59:48 VCS NOTICE V-16-1-10112 System (jarry-crf2) - Membership: 0x3, DDNA: 0x0
2015/02/12 01:59:48 VCS NOTICE V-16-1-10322 System (Node '0') changed state from UNKNOWN to INITING
2015/02/12 01:59:48 VCS NOTICE V-16-1-10086 System (Node '0') is in Regular Membership - Membership: 0x3
2015/02/12 01:59:48 VCS NOTICE V-16-1-10086 System jarry-crf2 (Node '1') is in Regular Membership - Membership: 0x3
2015/02/12 01:59:48 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2015/02/12 01:59:48 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2015/02/12 01:59:48 VCS NOTICE V-16-1-10453 Node: 0 changed name from: '' to: 'jarry-crf1'
2015/02/12 01:59:48 VCS NOTICE V-16-1-10322 System jarry-crf1 (Node '0') changed state from INITING to RUNNING
2015/02/12 01:59:48 VCS NOTICE V-16-1-10322 System jarry-crf2 (Node '1') changed state from CURRENT_DISCOVER_WAIT to REMOTE_BUILD
2015/02/12 01:59:48 VCS NOTICE V-16-1-10464 Requesting snapshot from node: 0
2015/02/12 01:59:48 VCS NOTICE V-16-1-10465 Getting snapshot. snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2015/02/12 01:59:48 VCS NOTICE V-16-1-10181 Group ClusterService AutoRestart set to 1
2015/02/12 01:59:48 VCS NOTICE V-16-1-10181 Group CrfGrp AutoRestart set to 1
2015/02/12 01:59:48 VCS NOTICE V-16-1-10181 Group VCShmg AutoRestart set to 1
2015/02/12 01:59:48 VCS INFO V-16-1-10466 End of snapshot received from node: 0. snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2015/02/12 01:59:48 VCS WARNING V-16-1-10030 UseFence=NONE. Hence do not need fencing
2015/02/12 01:59:48 VCS NOTICE V-16-1-10467 Replaying broadcast queue. snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2015/02/12 01:59:48 VCS NOTICE V-16-1-10322 System jarry-crf2 (Node '1') changed state from REMOTE_BUILD to RUNNING
2015/02/12 01:59:48 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/CrfMonitor/CrfMonitorAgent for resource type CrfMonitor successfully started at Thu Feb 12 01:59:48 2015
2015/02/12 01:59:48 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/NIC/NICAgent for resource type NIC successfully started at Thu Feb 12 01:59:48 2015
2015/02/12 01:59:48 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/NotifierMngr/NotifierMngrAgent for resource type NotifierMngr successfully started at Thu Feb 12 01:59:48 2015
2015/02/12 01:59:48 VCS NOTICE V-16-1-10016 Agent /opt/VRTSvcs/bin/HostMonitor for resource type HostMonitor successfully started at Thu Feb 12 01:59:48 2015
2015/02/12 01:59:48 VCS WARNING V-16-1-11141 LLT heartbeat link status changed. Previous status =UNKNOWN; Current status =eth3, UP, eth4, UP.
2015/02/12 01:59:48 VCS INFO V-16-6-15015 (jarry-crf2) hatrigger:/opt/VRTSvcs/bin/triggers/sysjoin is not a trigger scripts directory or can not be executed
2015/02/12 01:59:48 VCS INFO V-16-1-10297 Resource ntfr (Owner: Unspecified, Group: ClusterService) is online on jarry-crf2 (First probe)
2015/02/12 01:59:48 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group ClusterService
2015/02/12 01:59:48 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group ClusterService on all nodes
2015/02/12 01:59:48 VCS NOTICE V-16-1-10447 Group ClusterService is online on system jarry-crf2
2015/02/12 01:59:48 VCS WARNING V-16-6-15034 (jarry-crf2) violation:Offlining group ClusterService on system jarry-crf2
2015/02/12 01:59:48 VCS INFO V-16-1-50135 User root fired command: hagrp -offline -force ClusterService jarry-crf2 from localhost
2015/02/12 01:59:48 VCS NOTICE V-16-1-10167 Initiating manual offline of group ClusterService on system jarry-crf2
2015/02/12 01:59:48 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ntfr (Owner: Unspecified, Group: ClusterService) on System jarry-crf2
2015/02/12 01:59:48 VCS INFO V-16-6-15002 (jarry-crf2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation jarry-crf2 ClusterService successfully
2015/02/12 01:59:49 VCS INFO V-16-1-10305 Resource ntfr (Owner: Unspecified, Group: ClusterService) is offline on jarry-crf2 (VCS initiated)
2015/02/12 01:59:49 VCS NOTICE V-16-1-10446 Group ClusterService is offline on system jarry-crf2
2015/02/12 01:59:49 VCS NOTICE V-16-1-10438 Group ClusterService has been probed on system jarry-crf2
2015/02/12 01:59:49 VCS NOTICE V-16-1-10433 Group ClusterService will not start automatically on System jarry-crf2 as the system is in restart mode.
2015/02/12 01:59:49 VCS ERROR V-16-10031-10001 (jarry-crf2) CrfMonitor:CrfMonitorRes:monitor:Failed to get CRF monitor's PID.
2015/02/12 01:59:50 VCS INFO V-16-1-10304 Resource CrfRes (Owner: Unspecified, Group: CrfGrp) is offline on jarry-crf2 (First probe)
2015/02/12 01:59:50 VCS NOTICE V-16-1-10438 Group CrfGrp has been probed on system jarry-crf2
2015/02/12 01:59:50 VCS NOTICE V-16-1-10433 Group CrfGrp will not start automatically on System jarry-crf2 as the system is in restart mode.
2015/02/12 01:59:50 VCS NOTICE V-16-1-10445 Group CrfGrp will not start automatically as atleast one system in the SystemList attribute of the group is in restart mode.
2015/02/12 01:59:52 VCS NOTICE V-16-1-10438 Group VCShmg has been probed on system jarry-crf2
2015/02/12 01:59:52 VCS NOTICE V-16-1-10433 Group VCShmg will not start automatically on System jarry-crf2 as the system is in restart mode.
2015/02/12 01:59:52 VCS NOTICE V-16-1-10445 Group VCShmg will not start automatically as atleast one system in the SystemList attribute of the group is in restart mode.
02-12-2015 02:35 AM
Hello,
Thanks for the detailed information; this makes things much clearer.
So your question is: once VCS has recovered, shouldn't it start the resource and the group, given that a clean close of the process was called beforehand?
I came across the technote below, which says that VCS will not bring resources online when HAD is restarted by the "hashadow" process. This leads me to believe that this is the default behaviour of VCS.
http://www.symantec.com/docs/HOWTO79931
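Since VCS does not auto-start the group in this situation, one option is to bring it online by hand once the cluster has settled. The helper below is only a dry-run sketch: it prints a "hagrp -online" command for each system where the named group is probed, not autodisabled and OFFLINE, and runs nothing itself. The function name suggest_online is made up, and the column order is assumed to be the one "hastatus -sum" prints. For a failover group in a cluster without fencing, please verify first that the group is not already online on the other node.

```shell
#!/bin/sh
# Dry-run sketch: read "hastatus -sum" output on stdin and print (not run)
# a manual online command for every system where the given group is
# probed (Y), not autodisabled (N) and OFFLINE.
# Assumed columns: B <group> <system> <Probed> <AutoDisabled> <State>
suggest_online() {
    awk -v grp="$1" '
        $1 == "B" && $2 == grp && $4 == "Y" && $5 == "N" && $6 == "OFFLINE" {
            print "hagrp -online " $2 " -sys " $3
        }'
}
```

For example: "hastatus -sum | suggest_online CrfGrp". Review the printed command before running it.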
However, there are a couple of things I can suggest:
1. Mark the resource as "critical" and see whether that makes any difference to the group behaviour (this is just a test).
2. To solve this problem, as suggested before, you can use preonline triggers, which let you run your own scripts.
G
02-12-2015 06:46 PM
Hello Gaurav
Now we know this is the default behaviour of VCS.
For suggestion 1, our application resource is already marked as "critical", so what we see is the behaviour of a critical resource.
For suggestion 2, we will try it.
Thank you for your help!