Query on VCS & VxVM
Hi Gaurav,
I would like to verify a few troubleshooting points with you.
Can a service group fail over to another node because the node it is currently running on is heavily loaded? If yes, how can I see that this was the reason? Is there any clue in engine_A.log? For example, CPU load above 60%, too little physical memory, or not enough swap?
If a plex is in the NDEV state, this means that the disk is faulty or the paths to the disk are faulty, right?
If a service group fails over to another node, will engine_A.log show me the faulted resource that was the culprit? If yes, let's say that a logical volume resource faulted. Should I then investigate /var/adm/messages to see whether the issue is with the disk or with the paths to the disk (HBA)?
When I run the "hastart" command, it starts only had, hashadow, and the agents for the resources in main.cf, right?
hastart will not start LLT and GAB, right?
So if LLT and GAB are not started and I issue hastart, nothing happens?
Can a resource be online at the OS level but offline at the VCS level?
What happens when a resource agent disappears while the resource is online?
Thanks so much.
If only an agent is hung, there is no need to restart the entire cluster. You can run "haagent -stop", or kill -9 the agent process, and then run "haagent -start <agent>".
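For reference, the agent restart described above could look roughly like the following. This is a sketch under assumptions, not a tested procedure: the agent name Volume and the system name node1 are placeholders, the -force and -sys options are as described in the haagent manual page and should be checked against your VCS version, and run is a hypothetical helper that only prints the commands unless RUN=1 is set.

```shell
# Dry-run helper (hypothetical, not part of VCS): prints each command
# unless RUN=1 is set, so the sketch can be reviewed before it is run
# on a real cluster node.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Restart a single VCS agent on one system, leaving the rest of the
# cluster alone.
# $1 = agent (resource type) name, $2 = system name; both placeholders.
restart_agent() {
  run haagent -stop "$1" -force -sys "$2"   # stop the hung agent
  run haagent -start "$1" -sys "$2"         # start it again
  run haagent -display "$1"                 # check the agent attributes afterwards
}

restart_agent Volume node1
```

With RUN unset this only echoes the three haagent commands. On a cluster node, running it with RUN=1 exported would execute them for real.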
As mentioned many times before, the plex state depends on the fault. It will not necessarily be "DISABLED STALE"; there are many other plex states. If VCS is unable to recover the volumes, it will keep the group in a faulted state. You can then fix the plex states manually and, once done, start the group from VCS.
If the group fails over to the other node and VCS is able to recover the volumes, none of these steps will be required. Again, it depends on what the fault is and whether VCS can recover from it automatically or not.
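The manual path described above (inspect the plex states, fix them, then start the group from VCS) might be sketched as follows. The names datadg, appsg and node1 are placeholders, the sketch assumes the disk group is imported on that node, and the actual plex repair is deliberately left out because it depends on the state that vxprint reports. run is a hypothetical helper that only prints the commands unless RUN=1 is set.

```shell
# Dry-run helper (hypothetical, not part of VCS or VxVM): prints each
# command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# $1 = disk group, $2 = service group, $3 = system; all placeholders.
recover_group() {
  # 1. Look at the volume and plex states in the disk group.
  run vxprint -g "$1" -ht
  # 2. Fix the plex states manually here. The right commands depend on
  #    the fault, so none are suggested in this sketch.
  # 3. Clear the fault in VCS and bring the group online again.
  run hagrp -clear "$2" -sys "$3"
  run hagrp -online "$2" -sys "$3"
  # 4. Confirm the resulting group state.
  run hagrp -state "$2"
}

recover_group datadg appsg node1
```

If the group has already failed over and come online on the other node, only the first and last commands are of interest, as a check.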
G
Please, my friend...
We cannot answer all your 'what if' questions...
All the 'What If' questions and the theory are covered in the manuals as well as in the online documentation on SORT.
And in classroom training... You seem to confuse Storage Foundation issues with cluster administration.
Understand that failures at disk level will cause VCS to react in the way that it has been configured, just as with any other resource type that VCS is monitoring.
Plex states depend on what exactly is wrong at hardware level (covered in the VxVM documentation). There should be no reason for VCS agents to be 'hung'. If they are, you may want to log a call with Symantec Support.
Agents can be stopped and started with 'haagent' commands.
See http://sfdoccentral.symantec.com/sf/5.0/hpux/manpages/vcs/haagent_1m.html for command usage.