Solved: plex node

tanislavm · ‎07-01-2014

Hi, if I have a volume in a group, and this volume is a critical resource in a 2-node VCS cluster: the volume is made up of 2 plexes, and at a certain moment both plexes go into the stale state. What happens? Does the service group try to fail over to the other node? Will that succeed? If not, should I run vxplex -f attach and vxvol start on this node? Thanks so much.

Gaurav_S · ‎07-01-2014

Hi,

If a volume is a critical resource and both plexes go bad (stale state), the volume resource will definitely go into the faulted state, and since it is critical, VCS will fail the entire service group over to the other node.

The VCS Volume agent will attempt to start the volume on its own, and if it can, it will. If a plex ends up in any other state in the meantime because of underlying issues, the volume resource will fault and the entire group will fault. You would then need manual intervention to fix the plexes and bring the volume to the "Enabled Active" state.
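
For illustration, the manual intervention could look roughly like the sketch below. It assumes the underlying disks and paths are healthy again and that you know which plex holds current data. The names appdg, appvol and appvol-01 are hypothetical, and the run wrapper is only a helper for this example that prints each command (DRY_RUN=1, the default) instead of executing it. Check the Troubleshooting Guide for your version before running anything for real.

```shell
#!/bin/sh
# Sketch only: disk group, volume and plex names are hypothetical.
# With DRY_RUN=1 (the default) each command is printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_stale_volume DISKGROUP VOLUME GOOD_PLEX
recover_stale_volume() {
    dg=$1
    vol=$2
    plex=$3
    # Show volume and plex states (look for DISABLED, STALE, NDEV).
    run vxprint -g "$dg" -ht "$vol"
    # Mark the plex believed to hold current data as CLEAN.
    run vxmend -g "$dg" fix clean "$plex"
    # Start the volume; the other stale plex is resynchronized from the clean one.
    run vxvol -g "$dg" start "$vol"
    # Check the result.
    run vxprint -g "$dg" -ht "$vol"
}
```

Calling recover_stale_volume appdg appvol appvol-01 with DRY_RUN left at 1 only lists the commands, so the sequence can be reviewed first. Note that this uses vxmend fix clean followed by vxvol start rather than a forced plex attach; which approach fits depends on why the plexes went stale.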

G

tanislavm · ‎07-01-2014

Hi Gaurav,

I'd like to check a few troubleshooting points with you.

Can a service group fail over to the other node because the node it is currently running on is heavily loaded? If yes, how can I see that this was the reason? Is there any clue in engine_A.log? For example, CPU load above 60%, or too little physical memory and not enough swap?

If a plex is in the NDEV state, this means that either the disk or the paths to the disk are faulty, right?

If a service group fails over to the other node, will engine_A.log show me which faulted resource was the culprit? If yes, then let's say that a logical volume resource faulted. Should I next look in /var/adm/messages to see whether the issue is with the disk or with the paths to the disk (HBA)?

When I run the "hastart" command, it starts only had, hashadow and the agents for the resources in main.cf, right?

hastart will not start LLT and GAB, right?

So if LLT and GAB are not started and I issue hastart, nothing happens?

Can a resource be online at the OS level and offline at the VCS level?

What happens when a resource agent disappears while the resource is online?

Thanks so much.

Gaurav_S · ‎07-01-2014

Since this query is unrelated to the original post, I have moved it to a new thread. Please keep to one issue per forum discussion:

https://www-secure.symantec.com/connect/forums/query-vcs-vxvm

G

Marianne · ‎07-02-2014

I believe Gaurav has answered your question about 'the 2 plexes go into stale state'.

You may want to download the Storage Foundation as well as Cluster server manuals from https://sort.symantec.com/documents

The VCS Admin Guide covers resource faults and how they are handled in detail.

Storage Foundation / Volume Manager troubleshooting is covered in the relevant Admin Guide.

There should be no reason for 2 plexes to fail simultaneously. That is why we add mirroring: to avoid a Single Point of Failure (SPOF) that would prevent failover.

In case of such a 'catastrophic' failure, the 2nd node will attempt to online the Service Group, but if all the plexes of the volume are inaccessible, the Service Group will be faulted on both nodes and will remain offline.
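
As a rough sketch of the VCS side after such a failure: once the plexes have been repaired, the fault normally has to be cleared before the group can be brought online again. The group, resource and node names below (appsg, appvol_res, node1) are hypothetical, the log path is the usual default and may differ on your system, and the run wrapper is only an example helper that prints each command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: group, resource and node names are hypothetical.
# With DRY_RUN=1 (the default) each command is printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# clear_and_online GROUP RESOURCE NODE
clear_and_online() {
    grp=$1
    res=$2
    node=$3
    # Cluster-wide summary: which groups and resources are faulted, and where.
    run hastatus -sum
    # Look for the resource that faulted first (default log location assumed).
    run grep -i fault /var/VRTSvcs/log/engine_A.log
    # State of the suspect resource on each node.
    run hares -state "$res"
    # Clear the fault on the chosen node, then bring the group online there.
    run hares -clear "$res" -sys "$node"
    run hagrp -online "$grp" -sys "$node"
}
```

Running clear_and_online appsg appvol_res node1 in dry-run mode lists the steps in order. Clearing a fault before the storage problem is actually fixed will just lead to another fault when the group is onlined.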

Please also consider the Symantec Classroom training for Storage Foundation and Cluster Server.
