After network fluctuation disk group is imported on both nodes

Tiger09
Level 4

Hi,

After a network fluctuation, the disk group was imported on both nodes. We tried to deport the DG from the passive node, but it only showed "offlining" status.

I thought that to resolve this we would have to stop VCS forcefully on the passive node, but now I can see that the service group shows offline there and the DGs have been deported successfully.

I have attached the engine_A.log file.

Can you help me find out how the disk group got deported on the passive node?

 

3 REPLIES

sudhir_h
Level 4
Employee

Do you have fencing configured? Do you have SCSI-3 compliant disks? Fencing prevents such issues from occurring and prevents data corruption.

If no fencing is configured, then during a network partition each node in a sub-cluster thinks that the other nodes in the cluster are lost, and it will try to bring the service group online.

You may have faced similar problems.
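
To make the partition behaviour above concrete, here is a toy Python model. It is only a sketch and not how VCS is implemented: the function name, the node names, and the rule used to pick the fencing winner are all assumptions made for illustration.

```python
def owners_after_partition(nodes, fencing):
    """Toy model: return the nodes that import the disk group after all
    heartbeat links fail and every node is isolated in its own sub-cluster."""
    if not fencing:
        # Without fencing, each isolated node assumes its peers are down
        # and brings the service group (and its disk group) online.
        return list(nodes)
    # With fencing, the sub-clusters race for the coordination points and
    # only one may continue. Picking min() is an arbitrary stand-in for
    # "whoever wins the race".
    return [min(nodes)]


# Hypothetical two-node cluster.
print(owners_after_partition(["nodeA", "nodeB"], fencing=False))
print(owners_after_partition(["nodeA", "nodeB"], fencing=True))
```

Without fencing both nodes end up as owners, which is the simultaneous import described in the question. With fencing the model leaves exactly one owner.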

 

Read more at https://www.veritas.com/community/articles/symantec-cluster-server-io-fencing-architecture-guide

Your engine log is not attached.

 

Regards,

Sudhir

starflyfly
Level 6
Employee Accredited Certified

Hi,

I didn't find engine_A.log.

I suspect that when the network was restored, VCS detected a violation and took action, trying to offline the resource on the passive node.

Or some error on the passive node caused a critical resource to fault, which took the whole service group offline.
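
One way to check either theory is to pull the relevant lines out of engine_A.log. The Python sketch below is a plain keyword filter and assumes nothing about the log format. The resource name `appdg_res`, the group name `app_sg`, and the sample lines are hypothetical placeholders, not real log output.

```python
def find_log_lines(lines, keywords):
    """Return (line_number, text) pairs for lines that contain any of the
    keywords, compared case-insensitively."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(k in text.lower() for k in lowered):
            hits.append((number, text.rstrip("\n")))
    return hits


# Hypothetical sample lines; real engine_A.log entries will look different.
sample = [
    "example: resource appdg_res going offline on nodeB\n",
    "example: heartbeat link restored\n",
    "example: Concurrency Violation detected for group app_sg\n",
]
for number, text in find_log_lines(sample, ["appdg_res", "violation"]):
    print(number, text)
```

On a real system you would pass an open file object instead of `sample`. The engine log is typically found under /var/VRTSvcs/log on Unix installs, but check the path on your own cluster.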

Marianne
Moderator
Partner    VIP    Accredited Certified

You need to carefully look at heartbeat connections to ensure that they are 100% independent - no shared infrastructure.

If heartbeats share infrastructure anywhere in the connection path, then you need to change this or deploy I/O fencing.
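
As a rough illustration of what "independent" means here, the Python sketch below lists the components of each heartbeat path and reports any overlap. The component names are hypothetical; in practice you would fill them in from your own cabling and network diagrams.

```python
def shared_components(path_a, path_b):
    """Return the components that appear in both heartbeat paths, sorted.
    An empty result means the two paths share nothing that was listed."""
    return sorted(set(path_a) & set(path_b))


# Hypothetical inventory of every device each heartbeat link passes through.
link1 = ["nodeA-nic1", "switch-1", "wan-router", "nodeB-nic1"]
link2 = ["nodeA-nic2", "switch-2", "wan-router", "nodeB-nic2"]

overlap = shared_components(link1, link2)
if overlap:
    print("Heartbeats share infrastructure:", overlap)
else:
    print("No shared components found in the listed paths")
```

In this made-up inventory both links cross the same router, so a single failure there drops both heartbeats at once. The check is only as good as the inventory you give it.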

You will be very lucky if filesystems did not get corrupted as a result of the simultaneous import.

I have seen 2 such cases over the years where clusters were configured over a long distance and the heartbeats shared infrastructure with the normal network connection across town (despite requirements being made very clear before installation).

In both cases filesystems got corrupted as a result of network drop between sites and data had to be restored from last backup.
Both customers implemented I/O fencing after this incident....

PLEASE take your time and read through these topics in VCS admin guide:

About communications, membership, and data protection in the cluster
Administering I/O fencing
Controlling VCS behavior