Forum Discussion

symsonu
11 years ago

NIC resource going down frequently



We are experiencing the faults shown below every now and then, and they affect our backups.

The setup: the BkupLan service group contains a NIC resource (bkup_nic) and a Phantom resource (bkup_p).

Two service groups, Ossfs and Sybase1, each have a Proxy resource (ossbak_p1 and syb1bak_p1) monitoring this NIC resource,
and each is configured with an IP resource (ossbak_ip and syb1bak_ip).

The issue is that the NIC resource faults with the message below, and in turn the Proxy and IP resources go down.
Then the NIC comes back, and the Proxy and IP resources come back too.
The configuration is also shown below.
Please advise why it goes offline/online so frequently.



2013/10/24 22:32:03 VCS WARNING V-16-10001-7506 (ossadm2) NIC:bkup_nic:monitor:Resource is offline. No Network Host could be reached

2013/10/24 22:32:03 VCS INFO V-16-2-13716 (ossadm2) Resource(bkup_nic): Output of the completed operation (monitor)
==============================================
Broken Pipe
==============================================

2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:03 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=bkup_nic, arg2=ONLINE
2013/10/24 22:32:03 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=bkup_nic
2013/10/24 22:32:03 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 bkup_nic ONLINE  successfully
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:20 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ossbak_ip (Owner: Unspecified, Group: Ossfs) on System ossadm2
2013/10/24 22:32:20 VCS ERROR V-16-1-10303 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys ossadm2
2013/10/24 22:32:20 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=ossbak_p1, arg2=ONLINE
2013/10/24 22:32:20 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=ossbak_p1
2013/10/24 22:32:20 VCS INFO V-16-6-0 (ossadm2) resfault:(resfault) Invoked with arg0=ossadm2, arg1=syb1bak_p1, arg2=ONLINE
2013/10/24 22:32:20 VCS INFO V-16-0 (ossadm2) resfault:(resfault.sh) Invoked with arg0=/ericsson/core/cluster/scripts/resfault.sh, arg1=ossadm2 ,arg2=syb1bak_p1
2013/10/24 22:32:20 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 ossbak_p1 ONLINE  successfully
2013/10/24 22:32:20 VCS INFO V-16-6-15002 (ossadm2) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resfault ossadm2 syb1bak_p1 ONLINE  successfully
2013/10/24 22:32:22 VCS INFO V-16-1-10305 Resource ossbak_ip (Owner: Unspecified, Group: Ossfs) is offline on ossadm2 (VCS initiated)
2013/10/24 22:33:04 VCS INFO V-16-1-10299 Resource bkup_nic (Owner: Unspecified, Group: BkupLan) is online on ossadm2 (Not initiated by VCS)
2013/10/24 22:33:04 VCS NOTICE V-16-1-10447 Group BkupLan is online on system ossadm2
2013/10/24 22:33:20 VCS INFO V-16-1-10299 Resource syb1bak_p1 (Owner: Unspecified, Group: Sybase1) is online on ossadm2 (Not initiated by VCS)
2013/10/24 22:33:20 VCS INFO V-16-1-10299 Resource ossbak_p1 (Owner: Unspecified, Group: Ossfs) is online on ossadm2 (Not initiated by VCS)
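
To quantify the flapping from the engine log, a small Python sketch can pair each "is FAULTED" line with the next "is online" line for the same resource. The regex assumes the line layout shown in the excerpt above, and outage_durations is a hypothetical helper, not a VCS tool. On this excerpt it reports 61 s for bkup_nic and 60 s for each Proxy, i.e. the NIC came back on the very next monitor cycle (MonitorInterval is 60 in the NIC type definition further down).

```python
import re
from datetime import datetime

# Matches engine-log lines such as
# "2013/10/24 22:32:03 VCS ERROR V-16-1-10303 Resource bkup_nic (...) is FAULTED ..."
# and "... Resource bkup_nic (...) is online on ossadm2 ...".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS \w+ V-[\d-]+ "
    r"Resource (?P<res>\S+) \(.*?\) is (?P<state>FAULTED|online)"
)


def outage_durations(lines):
    """Return {resource: [seconds from FAULTED to the next 'is online', ...]}."""
    faulted_at = {}  # resource -> timestamp of its open fault
    outages = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a FAULTED/online state line
        ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S")
        res, state = m["res"], m["state"]
        if state == "FAULTED":
            faulted_at[res] = ts
        elif res in faulted_at:
            down = ts - faulted_at.pop(res)
            outages.setdefault(res, []).append(int(down.total_seconds()))
    return outages
```

To scan a whole day, feed it the lines of the engine log (typically /var/VRTSvcs/log/engine_A.log). Outages that always last about one MonitorInterval suggest the failing condition had cleared by the time the next monitor ran.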


====================================================================================


ossadm2{root} # hastatus -summ

-- SYSTEM STATE
-- System               State                Frozen

A  ossadm1              RUNNING              0
A  ossadm2              RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  BkupLan         ossadm1              Y          N               ONLINE
B  BkupLan         ossadm2              Y          N               ONLINE
B  DDCMon          ossadm1              Y          N               ONLINE
B  DDCMon          ossadm2              Y          N               ONLINE
B  Oss             ossadm1              Y          N               OFFLINE
B  Oss             ossadm2              Y          N               ONLINE
B  Ossfs           ossadm1              Y          N               OFFLINE
B  Ossfs           ossadm2              Y          N               ONLINE
B  PrivLan         ossadm1              Y          N               ONLINE
B  PrivLan         ossadm2              Y          N               ONLINE
B  PubLan          ossadm1              Y          N               ONLINE
B  PubLan          ossadm2              Y          N               ONLINE
B  StorLan         ossadm1              Y          N               ONLINE
B  StorLan         ossadm2              Y          N               ONLINE
B  Sybase1         ossadm1              Y          N               ONLINE
B  Sybase1         ossadm2              Y          N               OFFLINE
ossadm2{root} # hagrp -resources BkupLan
bkup_nic
bkup_p
ossadm2{root} # hares -display bkup_nic
#Resource    Attribute             System     Value
bkup_nic     Group                 global     BkupLan
bkup_nic     Type                  global     NIC
bkup_nic     AutoStart             global     1
bkup_nic     Critical              global     0
bkup_nic     Enabled               global     1
bkup_nic     LastOnline            global     ossadm2
bkup_nic     MonitorOnly           global     0
bkup_nic     ResourceOwner         global
bkup_nic     TriggerEvent          global     0
bkup_nic     ArgListValues         ossadm1    Device    1       oce12   PingOptimize    1       1       NetworkHosts    1       127.0.0.1       Protocol   1       IPv4    NetworkType     1       ether   ExclusiveIPZone 1       0
bkup_nic     ArgListValues         ossadm2    Device    1       oce12   PingOptimize    1       1       NetworkHosts    1       127.0.0.1       Protocol   1       IPv4    NetworkType     1       ether   ExclusiveIPZone 1       0
bkup_nic     ConfidenceLevel       ossadm1    100
bkup_nic     ConfidenceLevel       ossadm2    100
bkup_nic     ConfidenceMsg         ossadm1
bkup_nic     ConfidenceMsg         ossadm2
bkup_nic     Flags                 ossadm1
bkup_nic     Flags                 ossadm2
bkup_nic     IState                ossadm1    not waiting
bkup_nic     IState                ossadm2    not waiting
bkup_nic     MonitorMethod         ossadm1    Traditional
bkup_nic     MonitorMethod         ossadm2    Traditional
bkup_nic     Probed                ossadm1    1
bkup_nic     Probed                ossadm2    1
bkup_nic     Start                 ossadm1    0
bkup_nic     Start                 ossadm2    0
bkup_nic     State                 ossadm1    ONLINE
bkup_nic     State                 ossadm2    ONLINE
bkup_nic     ComputeStats          global     0
bkup_nic     ExclusiveIPZone       global     0
bkup_nic     NetworkHosts          global     127.0.0.1
bkup_nic     NetworkType           global     ether
bkup_nic     PingOptimize          global     1
bkup_nic     Protocol              global     IPv4
bkup_nic     TriggerResStateChange global     0
bkup_nic     ContainerInfo         ossadm1    Type              Name            Enabled
bkup_nic     ContainerInfo         ossadm2    Type              Name            Enabled
bkup_nic     Device                ossadm1    oce12
bkup_nic     Device                ossadm2    oce12
bkup_nic     MonitorTimeStats      ossadm1    Avg       0       TS
bkup_nic     MonitorTimeStats      ossadm2    Avg       0       TS
bkup_nic     ResourceInfo          ossadm1    State     Stale   Msg             TS
bkup_nic     ResourceInfo          ossadm2    State     Stale   Msg             TS
ossadm2{root} # hares -display bkup_p
#Resource    Attribute             System     Value
bkup_p       Group                 global     BkupLan
bkup_p       Type                  global     Phantom
bkup_p       AutoStart             global     1
bkup_p       Critical              global     1
bkup_p       Enabled               global     1
bkup_p       LastOnline            global     ossadm1
bkup_p       MonitorOnly           global     0
bkup_p       ResourceOwner         global
bkup_p       TriggerEvent          global     0
bkup_p       ArgListValues         ossadm1    ""
bkup_p       ArgListValues         ossadm2    ""
bkup_p       ConfidenceLevel       ossadm1    100
bkup_p       ConfidenceLevel       ossadm2    100
bkup_p       ConfidenceMsg         ossadm1
bkup_p       ConfidenceMsg         ossadm2
bkup_p       Flags                 ossadm1
bkup_p       Flags                 ossadm2
bkup_p       IState                ossadm1    not waiting
bkup_p       IState                ossadm2    not waiting
bkup_p       MonitorMethod         ossadm1    Traditional
bkup_p       MonitorMethod         ossadm2    Traditional
bkup_p       Probed                ossadm1    1
bkup_p       Probed                ossadm2    1
bkup_p       Start                 ossadm1    1
bkup_p       Start                 ossadm2    1
bkup_p       State                 ossadm1    ONLINE
bkup_p       State                 ossadm2    ONLINE
bkup_p       ComputeStats          global     0
bkup_p       TriggerResStateChange global     0
bkup_p       ContainerInfo         ossadm1    Type              Name            Enabled
bkup_p       ContainerInfo         ossadm2    Type              Name            Enabled
bkup_p       MonitorTimeStats      ossadm1    Avg       0       TS
bkup_p       MonitorTimeStats      ossadm2    Avg       0       TS
bkup_p       ResourceInfo          ossadm1    State     Stale   Msg             TS
bkup_p       ResourceInfo          ossadm2    State     Valid   Msg             TS
ossadm2{root} # hagrp -resources Sybase1
sybasedg
sybmaster_mount
syblog_mount
sybdata_mount
pmsyblog_mount
pmsybdata_mount
fmsyblog_mount
fmsybdata_mount
dbdumps_mount
syb1_ip
syb1_p1
syb1bak_ip
syb1bak_p1
masterdataservice
masterdataservice_BACKUP
stop_sybase
ossadm2{root} # hares -display syb1bak_p1
#Resource    Attribute             System     Value
syb1bak_p1   Group                 global     Sybase1
syb1bak_p1   Type                  global     Proxy
syb1bak_p1   AutoStart             global     1
syb1bak_p1   Critical              global     0
syb1bak_p1   Enabled               global     1
syb1bak_p1   LastOnline            global     ossadm2
syb1bak_p1   MonitorOnly           global     0
syb1bak_p1   ResourceOwner         global
syb1bak_p1   TriggerEvent          global     0
syb1bak_p1   ArgListValues         ossadm1    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
syb1bak_p1   ArgListValues         ossadm2    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
syb1bak_p1   ConfidenceLevel       ossadm1    0
syb1bak_p1   ConfidenceLevel       ossadm2    0
syb1bak_p1   ConfidenceMsg         ossadm1
syb1bak_p1   ConfidenceMsg         ossadm2
syb1bak_p1   Flags                 ossadm1
syb1bak_p1   Flags                 ossadm2
syb1bak_p1   IState                ossadm1    not waiting
syb1bak_p1   IState                ossadm2    not waiting
syb1bak_p1   MonitorMethod         ossadm1    Traditional
syb1bak_p1   MonitorMethod         ossadm2    Traditional
syb1bak_p1   Probed                ossadm1    1
syb1bak_p1   Probed                ossadm2    1
syb1bak_p1   Start                 ossadm1    0
syb1bak_p1   Start                 ossadm2    0
syb1bak_p1   State                 ossadm1    ONLINE
syb1bak_p1   State                 ossadm2    ONLINE
syb1bak_p1   ComputeStats          global     0
syb1bak_p1   ResourceInfo          global     State     Stale   Msg             TS
syb1bak_p1   TargetResName         global     bkup_nic
syb1bak_p1   TargetSysName         global
syb1bak_p1   TriggerResStateChange global     0
syb1bak_p1   ContainerInfo         ossadm1    Type              Name            Enabled
syb1bak_p1   ContainerInfo         ossadm2    Type              Name            Enabled
syb1bak_p1   MonitorTimeStats      ossadm1    Avg       0       TS
syb1bak_p1   MonitorTimeStats      ossadm2    Avg       0       TS
ossadm2{root} # hares -display syb1bak_ip
#Resource    Attribute             System     Value
syb1bak_ip   Group                 global     Sybase1
syb1bak_ip   Type                  global     IP
syb1bak_ip   AutoStart             global     1
syb1bak_ip   Critical              global     0
syb1bak_ip   Enabled               global     1
syb1bak_ip   LastOnline            global     ossadm1
syb1bak_ip   MonitorOnly           global     0
syb1bak_ip   ResourceOwner         global
syb1bak_ip   TriggerEvent          global     0
syb1bak_ip   ArgListValues         ossadm1    Device    1       oce12   Address 1       10.41.78.138    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
syb1bak_ip   ArgListValues         ossadm2    Device    1       oce12   Address 1       10.41.78.138    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
syb1bak_ip   ConfidenceLevel       ossadm1    100
syb1bak_ip   ConfidenceLevel       ossadm2    0
syb1bak_ip   ConfidenceMsg         ossadm1
syb1bak_ip   ConfidenceMsg         ossadm2
syb1bak_ip   Flags                 ossadm1
syb1bak_ip   Flags                 ossadm2
syb1bak_ip   IState                ossadm1    not waiting
syb1bak_ip   IState                ossadm2    not waiting
syb1bak_ip   MonitorMethod         ossadm1    Traditional
syb1bak_ip   MonitorMethod         ossadm2    Traditional
syb1bak_ip   Probed                ossadm1    1
syb1bak_ip   Probed                ossadm2    1
syb1bak_ip   Start                 ossadm1    1
syb1bak_ip   Start                 ossadm2    0
syb1bak_ip   State                 ossadm1    ONLINE
syb1bak_ip   State                 ossadm2    OFFLINE
syb1bak_ip   Address               global     10.41.78.138
syb1bak_ip   ArpDelay              global     1
syb1bak_ip   ComputeStats          global     0
syb1bak_ip   ExclusiveIPZone       global     0
syb1bak_ip   IfconfigTwice         global     1
syb1bak_ip   NetMask               global     255.255.255.224
syb1bak_ip   Options               global
syb1bak_ip   PrefixLen             global     0
syb1bak_ip   ResourceInfo          global     State     Stale   Msg             TS
syb1bak_ip   RouteOptions          global
syb1bak_ip   TriggerResStateChange global     0
syb1bak_ip   ContainerInfo         ossadm1    Type              Name            Enabled
syb1bak_ip   ContainerInfo         ossadm2    Type              Name            Enabled
syb1bak_ip   Device                ossadm1    oce12
syb1bak_ip   Device                ossadm2    oce12
syb1bak_ip   MonitorTimeStats      ossadm1    Avg       0       TS
syb1bak_ip   MonitorTimeStats      ossadm2    Avg       0       TS




ossadm2{root} # hares -display ossbak_p1
#Resource    Attribute             System     Value
ossbak_p1    Group                 global     Ossfs
ossbak_p1    Type                  global     Proxy
ossbak_p1    AutoStart             global     1
ossbak_p1    Critical              global     0
ossbak_p1    Enabled               global     1
ossbak_p1    LastOnline            global     ossadm2
ossbak_p1    MonitorOnly           global     0
ossbak_p1    ResourceOwner         global
ossbak_p1    TriggerEvent          global     0
ossbak_p1    ArgListValues         ossadm1    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
ossbak_p1    ArgListValues         ossadm2    TargetResName     1       bkup_nic        TargetSysName   1       ""      TargetResName:Probed    1       1   TargetResName:State     1       2
ossbak_p1    ConfidenceLevel       ossadm1    0
ossbak_p1    ConfidenceLevel       ossadm2    0
ossbak_p1    ConfidenceMsg         ossadm1
ossbak_p1    ConfidenceMsg         ossadm2
ossbak_p1    Flags                 ossadm1
ossbak_p1    Flags                 ossadm2
ossbak_p1    IState                ossadm1    not waiting
ossbak_p1    IState                ossadm2    not waiting
ossbak_p1    MonitorMethod         ossadm1    Traditional
ossbak_p1    MonitorMethod         ossadm2    Traditional
ossbak_p1    Probed                ossadm1    1
ossbak_p1    Probed                ossadm2    1
ossbak_p1    Start                 ossadm1    0
ossbak_p1    Start                 ossadm2    0
ossbak_p1    State                 ossadm1    ONLINE
ossbak_p1    State                 ossadm2    ONLINE
ossbak_p1    ComputeStats          global     0
ossbak_p1    ResourceInfo          global     State     Stale   Msg             TS
ossbak_p1    TargetResName         global     bkup_nic
ossbak_p1    TargetSysName         global
ossbak_p1    TriggerResStateChange global     0
ossbak_p1    ContainerInfo         ossadm1    Type              Name            Enabled
ossbak_p1    ContainerInfo         ossadm2    Type              Name            Enabled
ossbak_p1    MonitorTimeStats      ossadm1    Avg       0       TS
ossbak_p1    MonitorTimeStats      ossadm2    Avg       0       TS
ossadm2{root} # hares -display ossbak_ip
#Resource    Attribute             System     Value
ossbak_ip    Group                 global     Ossfs
ossbak_ip    Type                  global     IP
ossbak_ip    AutoStart             global     1
ossbak_ip    Critical              global     0
ossbak_ip    Enabled               global     1
ossbak_ip    LastOnline            global     ossadm2
ossbak_ip    MonitorOnly           global     0
ossbak_ip    ResourceOwner         global
ossbak_ip    TriggerEvent          global     0
ossbak_ip    ArgListValues         ossadm1    Device    1       oce12   Address 1       10.41.78.137    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
ossbak_ip    ArgListValues         ossadm2    Device    1       oce12   Address 1       10.41.78.137    NetMask 1       255.255.255.224 Options 1       ""   ArpDelay        1       1       IfconfigTwice   1       1       RouteOptions    1       ""      PrefixLen       1       0       ExclusiveIPZone 1       0
ossbak_ip    ConfidenceLevel       ossadm1    0
ossbak_ip    ConfidenceLevel       ossadm2    100
ossbak_ip    ConfidenceMsg         ossadm1
ossbak_ip    ConfidenceMsg         ossadm2
ossbak_ip    Flags                 ossadm1
ossbak_ip    Flags                 ossadm2
ossbak_ip    IState                ossadm1    not waiting
ossbak_ip    IState                ossadm2    not waiting
ossbak_ip    MonitorMethod         ossadm1    Traditional
ossbak_ip    MonitorMethod         ossadm2    Traditional
ossbak_ip    Probed                ossadm1    1
ossbak_ip    Probed                ossadm2    1
ossbak_ip    Start                 ossadm1    0
ossbak_ip    Start                 ossadm2    1
ossbak_ip    State                 ossadm1    OFFLINE
ossbak_ip    State                 ossadm2    ONLINE
ossbak_ip    Address               global     10.41.78.137
ossbak_ip    ArpDelay              global     1
ossbak_ip    ComputeStats          global     0
ossbak_ip    ExclusiveIPZone       global     0
ossbak_ip    IfconfigTwice         global     1
ossbak_ip    NetMask               global     255.255.255.224
ossbak_ip    Options               global
ossbak_ip    PrefixLen             global     0
ossbak_ip    ResourceInfo          global     State     Stale   Msg             TS
ossbak_ip    RouteOptions          global
ossbak_ip    TriggerResStateChange global     0
ossbak_ip    ContainerInfo         ossadm1    Type              Name            Enabled
ossbak_ip    ContainerInfo         ossadm2    Type              Name            Enabled
ossbak_ip    Device                ossadm1    oce12
ossbak_ip    Device                ossadm2    oce12
ossbak_ip    MonitorTimeStats      ossadm1    Avg       0       TS
ossbak_ip    MonitorTimeStats      ossadm2    Avg       0       TS


ossadm2{root} # hatype -display NIC
#Type        Attribute              Value
NIC          AEPTimeout             0
NIC          ActionTimeout          30
NIC          AgentClass             TS
NIC          AgentDirectory
NIC          AgentFailedOn
NIC          AgentFile
NIC          AgentPriority          0
NIC          AgentReplyTimeout      130
NIC          AgentStartTimeout      60
NIC          AlertOnMonitorTimeouts 0
NIC          ArgList                Device      PingOptimize    NetworkHosts    Protocol        NetworkType     ExclusiveIPZone
NIC          AttrChangedTimeout     60
NIC          CleanRetryLimit        0
NIC          CleanTimeout           60
NIC          CloseTimeout           60
NIC          ConfInterval           600
NIC          ContainerOpts          RunInContainer      0       PassCInfo       1
NIC          EPClass                -1
NIC          EPPriority             -1
NIC          ExternalStateChange
NIC          FaultOnMonitorTimeouts 4
NIC          FaultPropagation       1
NIC          FireDrill              0
NIC          IMF                    Mode        0       MonitorFreq     1       RegisterRetryLimit      3
NIC          IMFRegList
NIC          InfoInterval           0
NIC          InfoTimeout            30
NIC          LevelTwoMonitorFreq    0
NIC          LogDbg
NIC          LogFileSize            33554432
NIC          MonitorInterval        60
NIC          MonitorStatsParam      Frequency   0       ExpectedValue   100     ValueThreshold  100     AvgThreshold    40
NIC          MonitorTimeout         120
NIC          NumThreads             10
NIC          OfflineMonitorInterval 60
NIC          OfflineTimeout         300
NIC          OfflineWaitLimit       0
NIC          OnlineClass            -1
NIC          OnlinePriority         -1
NIC          OnlineRetryLimit       0
NIC          OnlineTimeout          300
NIC          OnlineWaitLimit        2
NIC          OpenTimeout            60
NIC          Operations             None
NIC          RestartLimit           0
NIC          ScriptClass            TS
NIC          ScriptPriority         0
NIC          SourceFile             /etc/VRTSvcs/conf/config/types.cf
NIC          SupportedActions       device.vfd  clearNICFaultInZone
NIC          ToleranceLimit         0
NIC          TypeOwner
 

  • I am surprised this works at all. You should not specify 127.0.0.1 for NetworkHosts. It should either be left blank, in which case the broadcast address is pinged and a response is required from a host other than itself, or, as recommended, be populated with at least one IP that is always there, like a router or a DNS server (you should NOT specify host IPs of other nodes in the cluster).

    See this extract from the VCS Bundled Agents Guide:

     

    List of hosts on the network that are pinged to determine if the
    network connection is alive. You can use this attribute to help to save
    network capacity and reduce monitor time. Symantec recommends
    that you use the outgoing gateway routers for this value.
    Enter the IP address of the host, instead of the host name, to prevent
    the monitor from timing out. DNS causes the ping to hang. If more
    than one network host is listed, the monitor returns ONLINE if at least
    one of the hosts is alive.
    If an invalid network host address is specified or if there is mismatch
    in protocol of network host and Protocol attribute of the resource, the
    resource enters an UNKNOWN state. If you do not specify network
    hosts, the monitor tests the NIC by sending pings to the broadcast
    address on the NIC.

    Mike
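
To make the quoted rules concrete, here is a toy Python model of how the guide describes NetworkHosts being evaluated. It is not the agent's code: nic_state, the ping and broadcast_peer_replied callbacks, and the choice to treat anything that does not parse as an IP address as invalid are all assumptions made for illustration.

```python
import ipaddress
from typing import Callable


def nic_state(network_hosts: list[str], protocol: str,
              ping: Callable[[str], bool],
              broadcast_peer_replied: Callable[[], bool]) -> str:
    """Toy model of the NetworkHosts rules quoted from the Bundled Agents Guide.

    ping(host) and broadcast_peer_replied() are hypothetical stand-ins for
    real ICMP probes sent out of the monitored device.
    """
    if not network_hosts:
        # No NetworkHosts: ping the broadcast address on the NIC; only a
        # reply from some other host counts.
        return "ONLINE" if broadcast_peer_replied() else "OFFLINE"
    expected_version = 4 if protocol == "IPv4" else 6
    for host in network_hosts:
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            return "UNKNOWN"  # simplification: non-IP entries treated as invalid
        if addr.version != expected_version:
            return "UNKNOWN"  # mismatch with the Protocol attribute
    # ONLINE if at least one listed host answers.
    return "ONLINE" if any(ping(h) for h in network_hosts) else "OFFLINE"
```

With the thread's setting, nic_state(["127.0.0.1"], "IPv4", ...) returns ONLINE for any realistic ping callback, because loopback answers from inside the host whatever happens on oce12. In this model the NetworkHosts check can therefore never detect a problem on the backup LAN.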

  • Hello Symsonu,

     

    Yes, the loopback address should always be pingable.

    But by setting the loopback address as NetworkHosts you break the whole concept of network monitoring in VCS.

    You need to set an IP address on the same network to ensure that the network connection is healthy.

    If you set the loopback address and the network breaks, VCS won't take corrective action.

     

    Regarding the fault itself:

    Did you check the syslog for any NIC events?

    Setting the loopback address doesn't change anything if the NIC is down at the OS level.

    The loopback device is a virtual device that is NOT related to any NIC!

     

    eth4      Link encap:Ethernet  HWaddr 00:50:56:05:28:DD
              inet addr:192.168.50.101  Bcast:192.168.50.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:718589 errors:0 dropped:0 overruns:0 frame:0
              TX packets:735891 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:48433810 (46.1 Mb)  TX bytes:55185850 (52.6 Mb)

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:25664 errors:0 dropped:0 overruns:0 frame:0
              TX packets:25664 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:4940075 (4.7 Mb)  TX bytes:4940075 (4.7 Mb)
     

     

    In the example above, if eth4 is down at the OS level, VCS will take corrective action even though localhost is still reachable with the ping command.

     

    My guess is that you will see some link-down or NIC-down messages in the syslog around the time of the fault.

     

    Again, the NetworkHosts setting is used to detect network failures at a higher level.

    For example, if the connection from the NIC to the switch is healthy, the link will be up; but if the link from the switch to the remote host is broken, your network is still broken.

    NetworkHosts is used to detect this kind of failure.

     

    Thanks and kind regards,

    Dan
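
Following Dan's suggestion to check the syslog, this Python sketch pulls out the syslog lines that mention the NIC device within a few minutes of a fault. It assumes classic BSD-style timestamps ("Oct 24 22:31:58 host ...") that carry no year, so the year is borrowed from the fault time. The function name and the 5-minute window are illustrative choices, and because the wording of link-down messages depends on the OS and driver, it only filters on the device name (oce12 here).

```python
from datetime import datetime, timedelta


def syslog_lines_near(lines, device, fault_time, window=timedelta(minutes=5)):
    """Return syslog lines that mention `device` within +/- `window` of `fault_time`.

    Assumes BSD-style prefixes like "Oct 24 22:31:58 host ...": the first
    15 characters hold the timestamp, which has no year, so the year of
    fault_time is used.
    """
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(f"{fault_time.year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no parsable timestamp at the start of this line
        if device in line and abs(stamp - fault_time) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

For the fault in the question you would pass the syslog file's lines, "oce12" and datetime(2013, 10, 24, 22, 32, 3). On Solaris the syslog usually lands in /var/adm/messages, which is an assumption about this particular system.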


  • symsonu,

    Had you actually read what Mike posted, you might understand why, if you're going to set the NetworkHosts attribute for that NIC, it should be set to something on the 10.41.78.128/27 network, the subnet of 10.41.78.138/27 (i.e. something that would go out via the bkup_nic interface, or that can at least be reached from there), not the localhost IP on the loopback interface.

    The idea is that it tests whether the NIC is up by checking that it can ping/reach hosts on its network. Pinging the localhost IP (although "always reachable") is meaningless, as it doesn't reflect the connectivity state of the device you're trying to monitor (traffic to 127.0.0.1 never goes via the backup NIC). Hence Mike saying he was "surprised this works at all. You should not specify 127.0.0.1 for NetworkHosts".

    Please read Mike's response, and set an appropriate value for NetworkHosts, or leave it blank as suggested.

    regards,

    Grace
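
Grace's and Mike's criteria can be checked mechanically with Python's standard ipaddress module. This is a sketch: check_network_hosts is a hypothetical helper, the exclude set is where you would list the cluster nodes' own IPs and the service-group IPs (10.41.78.137 and 10.41.78.138), and 10.41.78.129 in the test is a made-up address, not this site's gateway.

```python
import ipaddress


def check_network_hosts(nic_address: str, netmask: str, candidates: list[str],
                        exclude: frozenset[str] = frozenset()) -> dict[str, str]:
    """Classify candidate NetworkHosts values for a NIC resource.

    nic_address/netmask describe the subnet the monitored NIC sits on.
    exclude lists addresses that belong to the cluster itself (node IPs,
    service-group IPs), which should not be used as NetworkHosts.
    """
    subnet = ipaddress.ip_interface(f"{nic_address}/{netmask}").network
    excluded = {ipaddress.ip_address(x) for x in exclude}
    verdicts: dict[str, str] = {}
    for host in candidates:
        addr = ipaddress.ip_address(host)  # raises ValueError for hostnames
        if addr.is_loopback:
            verdicts[host] = "reject: loopback traffic never uses the NIC"
        elif addr in excluded:
            verdicts[host] = "reject: address belongs to the cluster itself"
        elif addr in subnet:
            verdicts[host] = "ok: on the NIC's own subnet"
        else:
            verdicts[host] = "check: off-subnet, confirm it is reached via this NIC"
    return verdicts
```

With the address 10.41.78.138 and netmask 255.255.255.224 from the IP resources above, the subnet works out to 10.41.78.128/27. The sketch rejects 127.0.0.1 as loopback, accepts any non-excluded address inside that subnet, and marks anything outside it for a manual reachability check.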

     

  • bkup_nic     NetworkHosts          global     127.0.0.1

     

    As per the definition, this will always be pingable and should not fault, as the loopback address is always reachable.

     
