Forum Discussion

symsonu
12 years ago

Need to find root cause for service failure in Veritas cluster

Hello Guys, I need to find the root cause for the service failure in our Veritas cluster. Service groups did not fail over to the other node. Below are the logs as I can see all this st...
  • mikebounds
    12 years ago

    OK - I have enough for the sequence of events now:

     

    Solaris (mpathd) detects network outage:
    May 17 12:41:38 pk-ercoss1 in.mpathd[6024]: [ID 168056 daemon.error] All Interfaces in group pub_mnic have failed
     
    VCS detects network outage (which will be within 10 seconds of mpathd detection with default MonitorInterval = 10 for MultiNICB):
    2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss1
    
    2013/05/17 12:41:42 VCS ERROR V-16-1-10303 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is FAULTED (timed out) on sys pk-ercoss2
     
    Resource syb1_p1 in group Sybase1 faults on pk-ercoss2 so VCS faults the group:
    2013/05/17 12:41:50 VCS ERROR V-16-1-10303 Resource syb1_p1 (Owner: Unspecified, Group: Sybase1) is FAULTED (timed out) on sys pk-ercoss2
     
    Solaris (mpathd) detects network is fixed:
    May 17 12:41:57 pk-ercoss1 in.mpathd[6024]: [ID 620804 daemon.error] Successfully failed back to NIC oce9
     
    Group Sybase1 tries to fail over to pk-ercoss1, but the IP resource cannot come online because VCS has not yet detected that the network is fixed:
    2013/05/17 12:42:15 VCS ERROR V-16-10001-5004 (pk-ercoss1) IPMultiNICB:syb1_ip:online:Can not online. No interfaces available
     
    VCS detects network is fixed (which will be within 60 seconds of mpathd detection with default OfflineMonitorInterval = 60 for MultiNICB):
    2013/05/17 12:42:42 VCS INFO V-16-1-10299 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is online on pk-ercoss2 (Not initiated by VCS)
    
    2013/05/17 12:42:43 VCS INFO V-16-1-10299 Resource pub_mnic (Owner: Unspecified, Group: PubLan) is online on pk-ercoss1 (Not initiated by VCS)
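
The timing in the sequence above can be checked with a little arithmetic. The following is only a rough sketch: the timestamps are copied from the log lines quoted in this post, the year for the syslog (mpathd) lines is assumed to be 2013 to match the VCS engine log, and both logs are assumed to use the same clock. The variable names are made up for illustration.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"

# Timestamps taken from the log excerpts in this post.
# Assumption: the mpathd (syslog) lines are from 2013, like the VCS engine log.
events = {
    "mpathd_fail": "2013/05/17 12:41:38",        # All Interfaces in group pub_mnic have failed
    "vcs_fault": "2013/05/17 12:41:42",          # pub_mnic is FAULTED
    "mpathd_repair": "2013/05/17 12:41:57",      # Successfully failed back to NIC oce9
    "ip_online_attempt": "2013/05/17 12:42:15",  # syb1_ip: No interfaces available
    "vcs_online": "2013/05/17 12:42:42",         # pub_mnic is online
}
t = {name: datetime.strptime(stamp, FMT) for name, stamp in events.items()}


def seconds_between(start, end):
    """Whole seconds elapsed from event `start` to event `end`."""
    return int((t[end] - t[start]).total_seconds())


# Defaults for MultiNICB as stated in this post.
MONITOR_INTERVAL = 10
OFFLINE_MONITOR_INTERVAL = 60

fault_lag = seconds_between("mpathd_fail", "vcs_fault")
repair_lag = seconds_between("mpathd_repair", "vcs_online")
attempt_after_repair = seconds_between("mpathd_repair", "ip_online_attempt")

print(f"VCS saw the outage {fault_lag}s after mpathd (limit {MONITOR_INTERVAL}s)")
print(f"VCS saw the repair {repair_lag}s after mpathd (limit {OFFLINE_MONITOR_INTERVAL}s)")
print(f"IP online was attempted {attempt_after_repair}s after the repair, "
      f"{repair_lag - attempt_after_repair}s before VCS noticed it")
```

On these timestamps the online attempt for syb1_ip falls inside the window between the physical repair and the next offline monitor cycle of pub_mnic, which is consistent with the "No interfaces available" error.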
     

     

    As MultiNICB is a persistent resource, its state changes to online automatically once the NIC is repaired, so it does not require administrative intervention.

    Mike