Forum Discussion

Zahid_Haseeb
Moderator
14 years ago

Membership: 0x3, DDNA: 0x0 ...Membership: 0x1, DDNA: 0x2... is in DDNA Membership - Membership: 0x1, Visible: 0x0


I am seeing the errors below in the Event Viewer on the passive node. I found the Symantec document below and now understand what DDNA is, but I still do not understand why the Event Viewer shows HAD being stopped.
http://seer.entsupport.symantec.com/docs/326730.htm


Errors faced in event viewer





9 Replies


  • I hope the technotes below will help you.

    http://seer.entsupport.symantec.com/docs/326730.htm
    http://seer.entsupport.symantec.com/docs/329483.htm
  • Hi Anoop

    Thanks for your kind reply. I have read both of your links, but I am still not convinced about why my HAD service stopped. Where can I check additional logs to verify my HAD?
  • This is also described in the VCS User's Guide:

    Daemon Down Node Alive (DDNA)

    Daemon Down Node Alive (DDNA) is a condition in which the VCS high
    availability daemon (HAD) on a node fails, but the node is running. When HAD
    fails, the hashadow process tries to bring HAD up again. If the hashadow process
    succeeds in bringing HAD up, the system leaves the DDNA membership and
    joins the regular membership.

    In a DDNA condition, VCS does not have information about the state of service
    groups on the node. So, VCS places all service groups that were online on the
    affected node in the autodisabled state. The service groups that were online on
    the node cannot fail over.

    Manual intervention is required to enable failover of autodisabled service
    groups. The administrator must release the resources running on the affected
    node, clear resource faults, and bring the service groups online on another node.


    To comment on your query: since "had" is getting killed on some node (the node is faulted), the DDNA membership changes. So the DDNA messages are the result of the node fault, not the other way around.

    You should therefore investigate why had is faulting on that node. The DDNA messages indicate what is going on; they are not harmful themselves.

    Gaurav
  • Thanks, Gaurav, for your kind reply. Could you please give me a clue so I can move ahead with my investigation?

    Anoop, your suggested link told me to check the syslog.log file to find the root cause of DDNA. But I am using Windows, not Linux. Could you please let me know which log I should check on Windows, the way we check syslog on Linux?
  • Check the Event Viewer System and Application logs on nodes 0 and 1, as well as the logs in %VCS_HOME%\log. The most important cluster log file is engine_A.txt.

    Extract from VCS admin guide:
    To view log files
    1. From the Control Panel, double-click Administrative Tools, then Event Viewer.
    2. Review the System Log to view LLT and GAB errors.
    3. Review the Application Log to view HAD errors.

    The HAD service appears in Services as the Veritas High Availability Engine (had.exe).
    There is also a hashadow.exe process that monitors had.exe and restarts HAD if it stops for any reason.
    You should be able to find evidence in the Event Viewer Application Log on node 1.

    Also run 'hasys -display' from cmd on node 1.
    Look for EngineRestarted in the output:
    Indicates whether the VCS engine (HAD) was restarted by the hashadow process on a node in the cluster. The value 1 indicates that the engine was restarted; 0 indicates it was not restarted.
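
For anyone who wants to script the EngineRestarted check, here is a minimal Python sketch. It assumes the 'hasys -display' output has been saved as text and that each row has three whitespace-separated columns (system, attribute, value) under a header line starting with '#'. The node names and sample rows are hypothetical, and the function name engine_restarted is only illustrative, so adjust the parsing to whatever your version actually prints.

```python
def engine_restarted(display_output: str) -> dict:
    """Map each system name to True if its EngineRestarted attribute is 1."""
    result = {}
    for line in display_output.splitlines():
        fields = line.split()
        # Skip the '#' header line and any row that is not exactly three columns.
        if len(fields) != 3 or fields[0].startswith("#"):
            continue
        system, attribute, value = fields
        if attribute == "EngineRestarted":
            result[system] = (value == "1")
    return result


# Hypothetical sample text; real output contains many more attributes.
SAMPLE = """\
#System    Attribute          Value
NODE0      EngineRestarted    0
NODE1      EngineRestarted    1
NODE1      SysState           RUNNING
"""

if __name__ == "__main__":
    for system, restarted in engine_restarted(SAMPLE).items():
        print(system, "restarted by hashadow" if restarted else "not restarted")
```

A True value for a node only says that hashadow restarted HAD there; the reason still has to come from the logs.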
  • Hi Zahid,

    The hashadow_A.txt log located in the %vcs_home%\log folder will show you all the exact times that the hashadow process restarted HAD. 

    From there check those times in the System event logs.  You should see event log entries from either GAB or LLT.  Open these messages and check the binary data at the bottom to see what each entry is actually reporting.

    We would need to know more about the exact errors that you are seeing in the event logs.

    You might want to open a support case and provide a set of VxExplorer logs for Support to analyze.

    Thanks,
    Wally
  • Hi Marianne

    I hope you are well. I checked the System log on the affected node and found four GAB entries, shown below.




    0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
    0008: 00 00 00 00 12 00 07 40   .......@
    0010: 04 00 00 00 00 00 00 00   ........
    0018: 00 00 00 00 00 00 00 00   ........
    0020: 00 00 00 00 00 00 00 00   ........
    0028: 47 41 42 20 49 4e 46 4f   GAB INFO
    0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
    0038: 32 30 30 33 36 20 50 6f   20036 Po
    0040: 72 74 20 68 20 67 65 6e   rt h gen
    0048: 20 20 20 34 37 35 34 30      47540
    0050: 37 20 6d 65 6d 62 65 72   7 member
    0058: 73 68 69 70 20 30 31 0a   ship 01.


    0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
    0008: 00 00 00 00 12 00 07 40   .......@
    0010: 04 00 00 00 00 00 00 00   ........
    0018: 00 00 00 00 00 00 00 00   ........
    0020: 00 00 00 00 00 00 00 00   ........
    0028: 47 41 42 20 49 4e 46 4f   GAB INFO
    0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
    0038: 32 30 30 33 36 20 50 6f   20036 Po
    0040: 72 74 20 61 20 67 65 6e   rt a gen
    0048: 20 20 20 34 37 35 34 30      47540
    0050: 35 20 6d 65 6d 62 65 72   5 member
    0058: 73 68 69 70 20 30 31 0a   ship 01.


    0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
    0008: 00 00 00 00 12 00 07 40   .......@
    0010: 04 00 00 00 00 00 00 00   ........
    0018: 00 00 00 00 00 00 00 00   ........
    0020: 00 00 00 00 00 00 00 00   ........
    0028: 47 41 42 20 49 4e 46 4f   GAB INFO
    0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
    0038: 32 30 30 34 30 20 50 6f   20040 Po
    0040: 72 74 20 61 20 67 65 6e   rt a gen
    0048: 20 20 20 34 37 35 34 30      47540
    0050: 34 20 20 20 20 76 69 73   4    vis
    0058: 69 62 6c 65 20 3b 31 0a   ible ;1.
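
The binary data in these entries is easier to read once the hex bytes are turned back into text. The Python sketch below does that for a dump pasted from Event Viewer. It assumes eight bytes per line in the "offset: hex bytes   ascii" layout shown above, and that the message text starts at offset 0x28, as it does in these three dumps; other entries may use a different layout. The function name decode_event_data is only illustrative.

```python
import re

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def decode_event_data(dump: str, skip: int = 0x28) -> str:
    """Rebuild the bytes of an Event Viewer hex dump and return the text after `skip`."""
    data = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue
        _, rest = line.split(":", 1)
        # At most eight hex pairs per line; stop at the first token of the ASCII column.
        for token in rest.split()[:8]:
            if not HEX_BYTE.fullmatch(token):
                break
            data.append(int(token, 16))
    return bytes(data[skip:]).decode("ascii", errors="replace").strip()


# First dump from the post above, pasted as-is.
DUMP = """\
0000: 00 00 38 00 01 00 88 00   ..8...ˆ.
0008: 00 00 00 00 12 00 07 40   .......@
0010: 04 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 47 41 42 20 49 4e 46 4f   GAB INFO
0030: 20 56 2d 31 35 2d 31 2d    V-15-1-
0038: 32 30 30 33 36 20 50 6f   20036 Po
0040: 72 74 20 68 20 67 65 6e   rt h gen
0048: 20 20 20 34 37 35 34 30      47540
0050: 37 20 6d 65 6d 62 65 72   7 member
0058: 73 68 69 70 20 30 31 0a   ship 01.
"""

if __name__ == "__main__":
    print(decode_event_data(DUMP))
```

Run on the three dumps above, this gives one line per entry (two membership messages for ports h and a, and one "visible" message for port a), which is easier to line up against the hashadow restart times.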



  • I have also found the errors below in the engine log file in the %VCS Home% log folder, which clearly show that there is a problem with the heartbeat.

    2010/08/10 16:12:54 VCS WARNING V-16-1-11155 LLT heartbeat link status changed. Previous status = 0xffffffff; Current status = 0x3.
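
To collect every heartbeat status change from the engine log and compare the times with the hashadow restarts, something like the Python sketch below could help. It assumes each entry follows the layout of the line above (timestamp, "VCS", severity, message ID, text). The function name parse_heartbeat_change is only illustrative, and the sketch extracts the two status values without interpreting what they mean.

```python
import re
from datetime import datetime

# Layout assumed from the log line quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>\w+) (?P<msg_id>V-\d+-\d+-\d+) (?P<text>.*)$"
)
STATUS_RE = re.compile(
    r"Previous status = (0x[0-9a-fA-F]+); Current status = (0x[0-9a-fA-F]+)"
)


def parse_heartbeat_change(line: str):
    """Return the parsed fields of an LLT heartbeat status line, or None if it does not match."""
    entry = LINE_RE.match(line.strip())
    if entry is None:
        return None
    status = STATUS_RE.search(entry.group("text"))
    if status is None:
        return None
    return {
        "time": datetime.strptime(entry.group("ts"), "%Y/%m/%d %H:%M:%S"),
        "severity": entry.group("severity"),
        "msg_id": entry.group("msg_id"),
        "previous": int(status.group(1), 16),
        "current": int(status.group(2), 16),
    }


LINE = ("2010/08/10 16:12:54 VCS WARNING V-16-1-11155 LLT heartbeat link status "
        "changed. Previous status = 0xffffffff; Current status = 0x3.")

if __name__ == "__main__":
    print(parse_heartbeat_change(LINE))
```

Feeding each line of the engine log through this function and keeping the non-None results gives a timeline of link status changes.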