Forum Discussion

Tiger09
Level 4
9 years ago

After a network fluctuation, the disk group was imported on both nodes

Hi,

After a network fluctuation, the disk group (DG) was imported on both nodes. We tried to deport the DG from the passive node, but it only shows an "offlining" status.

I thought that to resolve this we would have to forcefully stop VCS on the passive node, but now I can see that the service group shows as offline and the DGs were deported successfully.

I have attached the engine_A.log file.

Can you help me find out how the disk group got deported on the passive node?

3 Replies

  • Do you have fencing configured? Do you have SCSI-3 compliant disks? Fencing prevents such issues from occurring and prevents data corruption.

    If fencing is not configured, then during a network partition each sub-cluster assumes that the nodes on the other side have failed, and each one tries to bring the service group online.

    You may have run into the same problem.

    Read more at https://www.veritas.com/community/articles/symantec-cluster-server-io-fencing-architecture-guide

    Your engine log is not attached.

    Regards,

    Sudhir

  • Hi,

    I couldn't find engine_A.log.

    I suspect that when the network was restored, VCS detected a concurrency violation and took action to offline the resources on the passive node.

    Alternatively, an error on the passive node caused a critical resource to fault, which took the whole service group offline.

  • You need to check the heartbeat connections carefully to make sure they are 100% independent, with no shared infrastructure.

    If the heartbeats share infrastructure anywhere along the connection path, you need to change this or deploy I/O fencing.

    You will be very lucky if the file systems were not corrupted as a result of the simultaneous import.

    Over the years I have seen two such cases where clusters were configured over a long distance and the heartbeats shared infrastructure with the normal network connection across town (despite the requirements being made very clear before installation).

    In both cases the file systems were corrupted when the network between the sites dropped, and data had to be restored from the last backup.
    Both customers implemented I/O fencing after the incident.

    Please take your time and read through these topics in the VCS Administrator's Guide:

    About communications, membership, and data protection in the cluster
    Administering I/O fencing
    Controlling VCS behavior
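
To make the fencing point in the replies concrete, here is a minimal Python sketch of what happens when the heartbeats between two nodes drop. It is a toy model, not Veritas code: the CoordinatorDisk class, the surviving_subclusters function and the use of three coordinator disks are assumptions for illustration only. Without fencing, each side decides the other has failed and brings the service group online, so both import the disk group. With fencing, each coordinator disk goes to whichever side reaches it first and ejects the other side's registration key, and only the side holding a majority of the disks keeps running.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinatorDisk:
    # Toy stand-in for a SCSI-3 coordinator disk: the node keys registered on it.
    # Hypothetical model, not the VCS/vxfen API.
    keys: set = field(default_factory=set)

    def race(self, contender: set, rival: set) -> bool:
        """Contender tries to eject the rival's keys. This only works while the
        contender's own keys are still registered, i.e. the rival was not first."""
        if not contender & self.keys:
            return False
        self.keys -= rival
        return True


def surviving_subclusters(a: set, b: set, fencing: bool, first_to_arrive=(0, 0, 0)):
    """Return the sub-clusters that would bring the service group (and so its disk
    group) online after a heartbeat partition splits the cluster into a and b.
    first_to_arrive[i] is 0 if side a reaches coordinator disk i first, 1 if b does."""
    sides = (a, b)
    if not fencing:
        # Each side assumes the other has failed and onlines the group: split brain.
        return list(sides)
    disks = [CoordinatorDisk(a | b) for _ in first_to_arrive]
    wins = [0, 0]
    for disk, first in zip(disks, first_to_arrive):
        second = 1 - first
        # Whoever arrives first ejects the other side's keys and wins this disk;
        # the late side then finds its own keys gone and wins nothing.
        for side, other in ((first, second), (second, first)):
            if disk.race(sides[side], sides[other]):
                wins[side] += 1
    majority = len(disks) // 2 + 1
    # Only a side holding a majority of coordinator disks may keep running.
    return [sides[i] for i in (0, 1) if wins[i] >= majority]


node1, node2 = {"node1"}, {"node2"}
print(surviving_subclusters(node1, node2, fencing=False))  # [{'node1'}, {'node2'}]
print(surviving_subclusters(node1, node2, fencing=True, first_to_arrive=(0, 1, 0)))  # [{'node1'}]
```

With fencing enabled, exactly one side survives whatever the arrival order, because an odd number of disks cannot be split evenly. In the real product the losing nodes are forced out of the cluster rather than simply dropped from a list; the fencing architecture guide linked in the first reply describes the actual behavior.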