Solved: Membership: 0x3, DDNA: 0x0 ...Membership: 0x1, DDNA: 0x2... is in DDNA Membership - Membership: 0x1, Visible: 0x0

Zahid_Haseeb · ‎08-11-2010

I am seeing the errors below in the Event Viewer on the passive node. I found the Symantec document below and now understand what DDNA is, but I still cannot understand why HAD is being stopped.
http://seer.entsupport.symantec.com/docs/326730.htm

Errors faced in event viewer


Anoop_Kumar1 · ‎08-11-2010

I hope the technotes below will help you.

http://seer.entsupport.symantec.com/docs/326730.htm
http://seer.entsupport.symantec.com/docs/329483.htm

Zahid_Haseeb · ‎08-11-2010

Hi Anoop

Thanks for your kind reply. I have read both of your links, but I am still not convinced about why my HAD service stopped. Where can I check additional logs to verify my HAD?

Gaurav_S · ‎08-11-2010

This is also described in the VCS User's Guide:

Daemon Down Node Alive (DDNA)

Daemon Down Node Alive (DDNA) is a condition in which the VCS high
availability daemon (HAD) on a node fails, but the node is running. When HAD
fails, the hashadow process tries to bring HAD up again. If the hashadow process
succeeds in bringing HAD up, the system leaves the DDNA membership and
joins the regular membership.

In a DDNA condition, VCS does not have information about the state of service
groups on the node. So, VCS places all service groups that were online on the
affected node in the autodisabled state. The service groups that were online on
the node cannot fail over.

Manual intervention is required to enable failover of autodisabled service
groups. The administrator must release the resources running on the affected
node, clear resource faults, and bring the service groups online on another node.

To answer your query: because HAD is being killed on a node (the node is faulted), the DDNA membership changes. The DDNA messages are a result of the node fault, not the other way around.

So you should investigate why HAD is failing on that node. The DDNA messages indicate what is going on; they are not harmful in themselves.

Gaurav
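
For readers following along: the Membership and DDNA values in the thread title (for example 0x3 and 0x1) read naturally as bitmasks of node IDs. The sketch below assumes that bit n corresponds to node ID n, which matches the two-node picture in this thread but is an assumption here, not something quoted from the guide. The function name nodes_in_mask is made up for illustration.

```python
def nodes_in_mask(mask):
    """Return the node IDs whose bits are set in a membership bitmask.

    Assumption: bit n of the mask stands for node ID n.
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Regular membership 0x3 would mean nodes 0 and 1 are both members.
    print(nodes_in_mask(0x3))  # [0, 1]
    # Membership 0x1 with DDNA 0x2 would mean node 0 is a regular member
    # while node 1 is alive but its HAD is down.
    print(nodes_in_mask(0x1))  # [0]
    print(nodes_in_mask(0x2))  # [1]
```

Read this way, a change from Membership 0x3 to Membership 0x1 with a non-zero DDNA mask is consistent with HAD going down on one node while the node itself stays up.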

Zahid_Haseeb · ‎08-11-2010

Thanks, Gaurav, for your kind reply. Could you please give me a clue so I can move ahead with my investigation?

Anoop, the link you suggested says to check the syslog.log file to find the root cause of DDNA. But I am using Windows, not Linux. Could you please let me know which log I should check on Windows, the way we check syslog on Linux?

Marianne · ‎08-11-2010

Check the Event Viewer System and Application logs on nodes 0 and 1, as well as the logs in %VCS_HOME%\log. The most important cluster log file is engine_A.txt.

Extract from VCS admin guide:
To view log files
1. From the Control Panel, double-click Administrative Tools, then Event Viewer.
2. Review the System Log to view LLT and GAB errors.
3. Review the Application Log to view HAD errors.

The HAD service is represented by the Veritas High Availability Engine (had.exe) in Services.
There is also a hashadow.exe process that monitors had.exe and restarts HAD if it stops for any reason.
You should be able to find evidence in the Event Viewer Application log on node 1.

Also run 'hasys -display' from cmd on node 1.
Look for EngineRestarted in the output:
Indicates whether the VCS engine (HAD) was restarted by the hashadow process on a node in the cluster. The value 1 indicates that the engine was restarted; 0 indicates it was not restarted.
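
If you save the 'hasys -display' output to a file, a small script can pick out EngineRestarted for each node. This is only a sketch: the sample text and the node names NODE0 and NODE1 are made up for illustration, and it assumes the output is laid out as whitespace-separated System, Attribute, Value columns with a header line starting with '#'. Check that against what your own cluster prints.

```python
# Illustrative sample only; not captured from the cluster in this thread.
SAMPLE_OUTPUT = """\
#System    Attribute          Value
NODE0      EngineRestarted    0
NODE0      SysState           RUNNING
NODE1      EngineRestarted    1
NODE1      SysState           RUNNING
"""


def engine_restarted(display_output):
    """Map each system name to True if EngineRestarted is 1, else False."""
    result = {}
    for line in display_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and rows for other attributes.
        if len(fields) < 3 or line.startswith("#"):
            continue
        system, attribute, value = fields[0], fields[1], fields[2]
        if attribute == "EngineRestarted":
            result[system] = (value == "1")
    return result


if __name__ == "__main__":
    for system, restarted in engine_restarted(SAMPLE_OUTPUT).items():
        print(system, "restarted by hashadow" if restarted else "not restarted")
```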


Wally_Heim · ‎08-11-2010

Hi Zahid,

The hashadow_A.txt log located in the %vcs_home%\log folder will show you all the exact times that the hashadow process restarted HAD.

From there check those times in the System event logs. You should see event log entries from either GAB or LLT. Open these messages and check the binary data at the bottom to see what each entry is actually reporting.

We would need to know more about the exact errors that you are seeing in the event logs.

You might want to open a support case and provide a set of VxExplorer logs for Support to analyze.

Thanks,
Wally

Zahid_Haseeb · ‎08-11-2010

Hi Marianne

Hope you are well. I checked the System log of the affected node and found four GAB entries, which are below:

0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
0008: 00 00 00 00 12 00 07 40   .......@
0010: 04 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 47 41 42 20 49 4e 46 4f   GAB INFO
0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
0038: 32 30 30 33 36 20 50 6f   20036 Po
0040: 72 74 20 68 20 67 65 6e   rt h gen
0048: 20 20 20 34 37 35 34 30      47540
0050: 37 20 6d 65 6d 62 65 72   7 member
0058: 73 68 69 70 20 30 31 0a   ship 01.

0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
0008: 00 00 00 00 12 00 07 40   .......@
0010: 04 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 47 41 42 20 49 4e 46 4f   GAB INFO
0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
0038: 32 30 30 33 36 20 50 6f   20036 Po
0040: 72 74 20 61 20 67 65 6e   rt a gen
0048: 20 20 20 34 37 35 34 30      47540
0050: 35 20 6d 65 6d 62 65 72   5 member
0058: 73 68 69 70 20 30 31 0a   ship 01.

0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
0008: 00 00 00 00 12 00 07 40   .......@
0010: 04 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 47 41 42 20 49 4e 46 4f   GAB INFO
0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
0038: 32 30 30 34 30 20 50 6f   20040 Po
0040: 72 74 20 61 20 67 65 6e   rt a gen
0048: 20 20 20 34 37 35 34 30      47540
0050: 34 20 20 20 20 76 69 73   4    vis
0058: 69 62 6c 65 20 3b 31 0a   ible ;1.
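
The message text in these dumps is split across the right-hand column, so it can help to rebuild it from the hex bytes. The sketch below uses the first dump above and assumes the readable message is everything after the last zero byte of the binary header, which holds for the dumps in this thread but is not a documented format. The function name decode_event_data is made up for illustration.

```python
import re

# First dump from the post above, copied as-is.
DUMP = """\
0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
0008: 00 00 00 00 12 00 07 40   .......@
0010: 04 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 47 41 42 20 49 4e 46 4f   GAB INFO
0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
0038: 32 30 30 33 36 20 50 6f   20036 Po
0040: 72 74 20 68 20 67 65 6e   rt h gen
0048: 20 20 20 34 37 35 34 30      47540
0050: 37 20 6d 65 6d 62 65 72   7 member
0058: 73 68 69 70 20 30 31 0a   ship 01.
"""

# Offset, colon, then up to eight two-digit hex bytes; the text column is ignored.
ROW_RE = re.compile(r"\s*[0-9a-fA-F]{4}:((?:\s[0-9a-fA-F]{2}){1,8})")


def decode_event_data(dump):
    """Rebuild the text message carried in an Event Viewer binary-data dump."""
    raw = bytearray()
    for line in dump.splitlines():
        match = ROW_RE.match(line)
        if match:
            raw.extend(bytes.fromhex(match.group(1)))
    # Assumption: the message is whatever follows the last zero byte.
    message = bytes(raw).rpartition(b"\x00")[2]
    return message.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    print(decode_event_data(DUMP))
    # GAB INFO V-15-1-20036 Port h gen   475407 membership 01
```

Rows are assumed to carry eight bytes each, as in the dumps here; a shorter final row would need stricter parsing so that the text column is not mistaken for hex.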

Zahid_Haseeb · ‎08-11-2010

Thanks, all, for your kind replies.

My problem is resolved. I saw the error above (V-15-1-20036) in the System log in the Event Viewer. I found a Symantec document which clearly shows that the cluster node either got disconnected from the network or panicked.

http://seer.entsupport.symantec.com/docs/309080.htm

Zahid_Haseeb · ‎08-16-2010

I also found the error below in the engine log file in the %VCS_HOME%\log folder, which clearly shows that there is a problem with the heartbeat.

2010/08/10 16:12:54 VCS WARNING V-16-1-11155 LLT heartbeat link status changed. Previous status = 0xffffffff; Current status = 0x3.
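
To line entries like this one up against Event Viewer timestamps, it can help to split each engine log line into fields. The sketch below is based only on the single line quoted above; the pattern and the name parse_engine_line are assumptions for illustration, and other lines in engine_A.txt may not fit it.

```python
import re
from datetime import datetime

# Pattern inferred from the one line in this post:
# date, time, product, severity, message ID, free text.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<product>\S+) (?P<severity>[A-Z]+) "
    r"(?P<msgid>V-\d+-\d+-\d+) (?P<text>.*)$"
)


def parse_engine_line(line):
    """Return a dict of fields for one engine log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y/%m/%d %H:%M:%S")
    return record


if __name__ == "__main__":
    sample = (
        "2010/08/10 16:12:54 VCS WARNING V-16-1-11155 LLT heartbeat link "
        "status changed. Previous status = 0xffffffff; Current status = 0x3."
    )
    record = parse_engine_line(sample)
    print(record["ts"], record["severity"], record["msgid"])
```

With the timestamps parsed, the WARNING and ERROR entries can be filtered to the minutes around each HAD restart and compared with the GAB and LLT events in the System log.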
