Forum Discussion

sparmar
Level 3
14 years ago

Issues with failover in VCS - Bonding

Hello All

I wonder if you kind people can help me again.

I have a 2-node cluster on Sun x4240 servers (Redhat 5.4 x86), on which I have installed VCS v5.0.
There are only about 9 service groups created on them, spread out between them, which have been running fine for the last few months.

I have bonded the 2 network interfaces eth4 and eth8 into one called bond0, which is set as a low-priority link, and I have 2 separate heartbeats as well.

We recently had an issue with the network cards (Sun 10g niu), which we think may be down to the drivers, as they failed, reported NETDEV timeouts and fell off the network.

The issue is that none of the service groups failed over, as for some reason VCS thinks bond0 is still up.
I've replicated the error by pulling out the 2 network interfaces eth4 and eth8, which mimics the issue.



So there seem to be 2 issues here: the first is with the Sun 10G cards and the standard Redhat driver, and the other is the service groups not failing over as a result.


Has anyone seen this before?


My cluster setup is:
NIC resource is in each service group
Redhat 5.4 x86
VCS v5.0 RP3
Heartbeats on = eth1 and eth3 (100mb full duplex)
Here's the output from `had -version`:

Engine Version=5.0
PSTAMP: Veritas-5.0MP3-07/16/08-02:01:00

And the output from `rpm -qa | grep VRTSvcs`:

VRTSvcs-5.0.30.00-MP3_GENERIC
VRTSvcsvr-5.0.30.00-MP3_GENERIC
VRTSvcsag-5.0.30.00-MP3_RHEL5
VRTSvcsor-5.0.30.00-MP3_RHEL5
VRTSvcs-5.0.30.00-MP3_RHEL5
VRTSvcsdr-5.0.30.00-MP3_RHEL5
VRTSvcsmn-5.0.30.00-MP3_GENERIC
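
When reproducing the cable-pull test described above, it can help to see what the bonding driver itself reports for each slave, separately from what VCS reports for bond0. The following is only a minimal sketch: it assumes the usual layout of the Linux bonding status file (`Slave Interface:` and `MII Status:` lines under /proc/net/bonding/bond0), and the function name `bond_slave_status` is made up for illustration.

```shell
#!/bin/sh
# Print "<slave> <mii-status>" for each slave listed in a bonding status file.
# Assumes the usual /proc/net/bonding/<bond> layout; bond_slave_status is an
# illustrative name, not part of any tool.
bond_slave_status() {
  file="${1:-/proc/net/bonding/bond0}"
  awk -F': ' '
    /^Slave Interface:/ { slave = $2 }
    # The first "MII Status:" line belongs to the bond itself, so it is
    # skipped until a slave name has been seen.
    /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
  ' "$file"
}

# Only query the live system when the status file is actually present.
if [ -r /proc/net/bonding/bond0 ]; then
  bond_slave_status
fi
```

If both eth4 and eth8 still show as up here after the cables are pulled, the problem sits below VCS, in how the bond detects link loss.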

  • Have a look at the VCS Bundled Agents Guide; it has full details of the MultiNICA & MultiNICB resources.

    MultiNIC is basically used for redundancy of NIC cards (somewhat similar to what Solaris IPMP does), i.e. even if a NIC card fails, the services are unaffected.

     

    Guide can be found here:

     

    http://sfdoccentral.symantec.com

     

    Gaurav

  • Can you share the main.cf file? The things I would like to know are:

    a) Were there any recent changes made to the setup?

    b) Are you using a MultiNIC or a plain NIC resource? If MultiNIC, which one, A or B?

    c) How is the dependency set up? (main.cf would give the answer to this)

    d) Was there any upgrade done recently (MP3 or RP)? If yes, did you miss replacing the types.cf file from the /etc/VRTSvcs/conf/ directory?

     

    Gaurav

  • Hi Gaurav

     

    Many thanks for the reply.

     

    I'm using a plain NIC resource for each service group, which is set to never

    Entries in the main.cf would be like below:

    NIC NIC_oraprod (
        Device = bond0
        )

    The dependencies are similar to those below (these are generic names):

    IP_oraprod requires NIC_oraprod
    LSNR_oraprod_lsnr requires IP_oraprod
    LSNR_oraprod_lsnr requires ORA_oraprod
    Mount_oraprod_u01 requires Vol_oraprod_vol1
    Mount_oraprod_u02 requires Vol_oraprod_vol2
    ORA_oraprod requires Mount_oraprod_u01
    ORA_oraprod requires Mount_oraprod_u02
    Vol_oraprod_vol1 requires DG_oraprod
    Vol_oraprod_vol2 requires DG_oraprod

     

    There haven't been any upgrades or changes to the system since it was built, but I'm guessing we may have overlooked testing failover by removing the Ethernet cables.

    When we tested the failover of a service group when the system was built, we did an `ifdown bond0`, which worked fine; also, when we did a `service network stop`, all was OK.

     

    Any ideas on this issue would be greatly appreciated.

     

     

    Thanks

     

    Sparmar

     

     

     

  • OK, so the line of resource dependency is like this:

    NIC ---> IP ---> Listener

    i.e. Listener requires IP & IP requires NIC.

    Is any of these 3 resources marked as critical? If a critical resource fails, or if any resource on which a critical resource depends fails, or if any resource in the path of a critical resource fails, the service group should fail over.

    In your case, if none of the 3 resources is marked critical, it will not initiate a failover of the entire service group (again, this depends on the values of ManageFaults & FaultPropagation).

    I would suggest making one of the resources (let's say IP) critical & testing again. Also, for ManageFaults & FaultPropagation, have a look at the VCS User's Guide, page no. 388.

    Guide can be found here:

     

    http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/pdf/vcs_users.pdf

     

     

    Gaurav
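
The rule in the reply above (a fault triggers failover when a critical resource lies on the dependency path) can be sketched as a small model. This is not VCS code and it ignores ManageFaults and FaultPropagation. The dependency pairs are the generic ones posted earlier in the thread, while the choice of which resource is critical and the function names `affected` and `would_fail_over` are assumptions made purely for illustration.

```shell
#!/bin/sh
# Simplified model (not VCS code): does a fault in one resource reach a
# critical resource through the dependency tree?
# Each DEPS line is "parent child", read as "parent requires child".
DEPS="IP_oraprod NIC_oraprod
LSNR_oraprod_lsnr IP_oraprod
LSNR_oraprod_lsnr ORA_oraprod"

# Resources assumed to be marked Critical for this illustration only.
CRITICAL="LSNR_oraprod_lsnr"

# Print the faulted resource plus every resource that requires it,
# directly or indirectly.
affected() {
  seen=" $1 "
  changed=1
  while [ "$changed" -eq 1 ]; do
    changed=0
    while read -r parent child; do
      case "$seen" in
        *" $child "*)
          case "$seen" in
            *" $parent "*) ;;
            *) seen="$seen$parent "; changed=1 ;;
          esac
          ;;
      esac
    done <<EOF
$DEPS
EOF
  done
  echo $seen
}

# Print "yes" if any affected resource is in CRITICAL, otherwise "no".
would_fail_over() {
  for r in $(affected "$1"); do
    for c in $CRITICAL; do
      if [ "$r" = "$c" ]; then
        echo yes
        return 0
      fi
    done
  done
  echo no
}

would_fail_over NIC_oraprod   # prints: yes
```

On a live cluster the real setting can be read with `hares -value IP_oraprod Critical` rather than assumed as it is here.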

  • Hi Gaurav

     

    Thanks for the link, it's set to manage ALL, so I guess that's ok?

    Also all the resources are set to critical-enabled.

    We tried numerous things to get the service group to fail over, but it looks like the bond0 interfaces are not playing ball properly.

    The service group only fails over under the following scenarios:

    Server shutdown

    service network stop

    ifdown bond0

     

    And these are the scenarios which don't work:

    ifdown eth4 and ifdown eth8

    ifconfig eth4 down and ifconfig eth8 down

     

    Not sure where this leaves us. Are there any bugs reported for this situation?

    Is there a way that I can set up another type of NIC resource? You mentioned MultiNICA and B; what benefit would this give me, and is it feasible?

     

     

    Many thanks

    sparmar
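
One thing worth ruling out for the scenarios above, although nothing in this thread confirms it as the cause, is whether bond0 has any link monitoring configured. With the Linux bonding driver, a `miimon` value of 0 means MII link monitoring is off, so the bond may not react when its slaves lose link. The sketch below assumes the bonding driver's sysfs files `miimon` and `arp_interval` are available on this kernel; the function name `bond_monitoring` is made up for illustration.

```shell
#!/bin/sh
# Report whether a bond has any link monitoring enabled.
# Assumes the bonding driver's sysfs files miimon and arp_interval; the
# directory is a parameter so the function can be pointed at a test copy.
bond_monitoring() {
  dir="${1:-/sys/class/net/bond0/bonding}"
  # A missing file is treated the same as a value of 0.
  miimon=$(cat "$dir/miimon" 2>/dev/null || echo 0)
  arp=$(cat "$dir/arp_interval" 2>/dev/null || echo 0)
  if [ "${miimon:-0}" -gt 0 ] || [ "${arp:-0}" -gt 0 ]; then
    echo "link monitoring enabled (miimon=$miimon arp_interval=$arp)"
  else
    echo "no link monitoring configured (miimon=$miimon arp_interval=$arp)"
  fi
}

# Only query the live system when bond0 exposes its bonding directory.
if [ -d /sys/class/net/bond0/bonding ]; then
  bond_monitoring
fi
```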

  • I checked the VCS release notes for 5.0MP3 but didn't see any known issue for bonded interfaces.

    As I asked before, did you get the correct types.cf file in place? Can you compare the types.cf files in the /etc/VRTSvcs/conf & /etc/VRTSvcs/conf/config directories & update here if there is any difference between them?

     

    Gaurav
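
The comparison asked for above can be done with standard tools. This is a minimal sketch: the two directories are the ones named in the reply, and the function name `compare_types_cf` and its output wording are made up for illustration.

```shell
#!/bin/sh
# Compare types.cf in two directories and report the result.
# Defaults are the two directories named in the thread; compare_types_cf
# is an illustrative name.
compare_types_cf() {
  a="${1:-/etc/VRTSvcs/conf}/types.cf"
  b="${2:-/etc/VRTSvcs/conf/config}/types.cf"
  if [ ! -r "$a" ] || [ ! -r "$b" ]; then
    echo "missing"
    return 2
  fi
  if cmp -s "$a" "$b"; then
    echo "identical"
    return 0
  fi
  echo "different"
  # Show the differing lines; diff's own exit status is not used.
  diff "$a" "$b" || true
  return 1
}

# Only run against the live system when the installed copy is present.
if [ -r /etc/VRTSvcs/conf/types.cf ]; then
  compare_types_cf || true
fi
```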
