
Issues with failover in VCS - Bonding

sparmar
Level 3

Hello All

I wonder if you kind people can help me again.

I have a 2-node cluster on Sun x4240 servers (Red Hat 5.4 x86), on which I have installed VCS v5.0.
There are only about 9 service groups created on them, spread out between the two nodes, and they have been running fine for the last few months.

I have bonded the 2 network interfaces eth4 and eth8 into one called bond0, which is set as low priority, and I have 2 separate heartbeats as well.

We recently had an issue with the network cards (Sun 10G NIU), which we think may be caused by the drivers: the cards failed, reported NETDEV timeouts and fell off the network.

The issue is that none of the service groups failed over, because for some reason VCS thinks bond0 is still up.
I've replicated the error by unplugging the 2 network interfaces eth4 and eth8, which mimics the issue.



So there seem to be 2 issues here: the first is with the Sun 10G cards using the standard Red Hat driver, and the second is the service groups not failing over as a result.


Has anyone seen this before?


My Cluster set up is:
NIC resource is in each service group
Redhat 5.4 x86
VCS v5.0 RP3
Heartbeats on = eth1 and eth3 (100mb full duplex)
Here's the output from had -version:

Engine Version=5.0
PSTAMP: Veritas-5.0MP3-07/16/08-02:01:00

And the output from rpm -qa | grep VRTSvcs

VRTSvcs-5.0.30.00-MP3_GENERIC
VRTSvcsvr-5.0.30.00-MP3_GENERIC
VRTSvcsag-5.0.30.00-MP3_RHEL5
VRTSvcsor-5.0.30.00-MP3_RHEL5
VRTSvcs-5.0.30.00-MP3_RHEL5
VRTSvcsdr-5.0.30.00-MP3_RHEL5
VRTSvcsmn-5.0.30.00-MP3_GENERIC

1 ACCEPTED SOLUTION

Gaurav_S's reply pointing to the MultiNICA & MultiNICB resources in the VCS Bundled Agents guide. It appears in full as his final reply in the thread below.

8 REPLIES

Gaurav_S
Moderator
   VIP    Certified

Can you share the main.cf file? The things I would like to know are:

a) Were there any recent changes made to the setup?

b) Are you using a MultiNIC or a plain NIC resource? If MultiNIC, which one, A or B?

c) How is the dependency set up? (main.cf would answer this)

d) Was there any upgrade done recently (MP3 or RP)? If yes, did you miss replacing the types.cf file from the /etc/VRTSvcs/conf/ directory?

 

Gaurav

sparmar
Level 3

Hi Gaurav

 

Many thanks for the reply.

 

I'm using a plain NIC resource for each service group, which is set to never

Entries in the main.cf would be like below:

)
NIC NIC_oraprod (
Device = bond0
)

Dependencies are similar to those below (these are generic names):

IP_oraprod requires NIC_oraprod
LSNR_oraprod_lsnr requires IP_oraprod
LSNR_oraprod_lsnr requires ORA_oraprod
Mount_oraprod_u01 requires Vol_oraprod_vol1
Mount_oraprod_u02 requires Vol_oraprod_vol2
ORA_oraprod requires Mount_oraprod_u01
ORA_oraprod requires Mount_oraprod_u02
Vol_oraprod_vol1 requires DG_oraprod
Vol_oraprod_vol2 requires DG_oraprod

 

There haven't been any upgrades or changes to the system since it was built, but I'm guessing we may have overlooked testing failover by removing the ethernet cables.

When we tested the failover of a service group when the system was built, we did an `ifdown bond0`, which worked fine, and when we did a `service network stop` all was OK as well.

 

Any ideas on this issue would be greatly appreciated.

 

 

Thanks

 

Sparmar

 

 

 

Gaurav_S
Moderator
   VIP    Certified

OK, so the resource dependency chain is like this:

NIC ---> IP ---> Listener

i.e. Listener requires IP, and IP requires NIC.

Are any of these 3 resources marked as critical? If a critical resource fails, or if any resource that a critical resource depends on (anywhere in its dependency path) fails, the service group should fail over.

In your case, if none of the 3 resources is marked critical, a fault will not initiate a failover of the entire service group (this again depends on the values of ManageFaults & FaultPropagation).

I would suggest making one of the resources (let's say IP) critical and testing again. Also, for ManageFaults & FaultPropagation, have a look at the VCS User's Guide, page 388.

Guide can be found here:

 

http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/pdf/vcs_users.pdf
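
To make the Critical / ManageFaults / FaultPropagation rule described in this reply concrete, here is a deliberately simplified Python model of it, using the generic resource names posted earlier in the thread. It is only a sketch of the rule as worded above, not how the VCS engine is actually implemented: the function names are made up, and it assumes ManageFaults is either ALL or NONE and FaultPropagation is simply on or off.

```python
# Simplified model: a group fails over when the faulted resource, or any
# resource that depends on it (directly or indirectly), is marked critical.

# "X requires Y" lines from the thread, as parent -> list of children.
REQUIRES = {
    "IP_oraprod": ["NIC_oraprod"],
    "LSNR_oraprod_lsnr": ["IP_oraprod", "ORA_oraprod"],
    "Mount_oraprod_u01": ["Vol_oraprod_vol1"],
    "Mount_oraprod_u02": ["Vol_oraprod_vol2"],
    "ORA_oraprod": ["Mount_oraprod_u01", "Mount_oraprod_u02"],
    "Vol_oraprod_vol1": ["DG_oraprod"],
    "Vol_oraprod_vol2": ["DG_oraprod"],
}


def parents_of(resource, requires):
    """Return every resource that depends on `resource`, directly or not."""
    found = set()
    stack = [resource]
    while stack:
        child = stack.pop()
        for parent, children in requires.items():
            if child in children and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def group_fails_over(faulted, requires, critical,
                     manage_faults="ALL", fault_propagation=True):
    """Hypothetical helper: apply the rule to one faulted resource."""
    # Assumption: with ManageFaults != ALL or FaultPropagation off,
    # the fault is not propagated, so no failover is triggered.
    if manage_faults != "ALL" or not fault_propagation:
        return False
    path = {faulted} | parents_of(faulted, requires)
    return any(name in critical for name in path)


if __name__ == "__main__":
    print(group_fails_over("NIC_oraprod", REQUIRES, critical={"IP_oraprod"}))
    print(group_fails_over("NIC_oraprod", REQUIRES, critical=set()))
```

Note that this logic only starts from a resource that has already been reported as faulted. If the NIC agent still sees bond0 as up, nothing ever enters it, whatever the Critical settings are.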

 

 

Gaurav

sparmar
Level 3

Hi Gaurav

 

Thanks for the link. ManageFaults is set to ALL, so I guess that's OK?

Also all the resources are set to critical-enabled.

We tried numerous things to get the service group to fail over, but it looks like the bond0 interfaces are not playing ball properly.

The service group only fails over under the following scenarios:

Server shutdown

service network stop

ifdown bond0

 

And these are the scenarios which don't work:

ifdown eth4 and ifdown eth8

ifconfig eth4 down and ifconfig eth8 down
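
The split above (taking bond0 down is noticed, taking eth4 and eth8 down is not) suggests the bond device itself can still look "up" while both slaves are down. As a rough sketch only, the Python below parses text in the general layout of /proc/net/bonding/bond0 and reports whether any slave still has link. The SAMPLE text is made up for illustration and is not output from this cluster, and the function names are hypothetical.

```python
def parse_bond_status(text):
    """Return (bond_mii_status, {slave_name: mii_status}) from bonding text."""
    bond_status = None
    slaves = {}
    current_slave = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            # Every "MII Status" line after this belongs to this slave.
            current_slave = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:"):
            status = line.split(":", 1)[1].strip()
            if current_slave is None:
                bond_status = status  # status of the bond device itself
            else:
                slaves[current_slave] = status
    return bond_status, slaves


def bond_has_live_slave(text):
    """True if at least one slave interface reports MII status 'up'."""
    _, slaves = parse_bond_status(text)
    return any(status == "up" for status in slaves.values())


# Made-up sample in the general layout of /proc/net/bonding/bond0.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: None
MII Status: up

Slave Interface: eth4
MII Status: down

Slave Interface: eth8
MII Status: down
"""

if __name__ == "__main__":
    bond, slaves = parse_bond_status(SAMPLE)
    print("bond:", bond, "slaves:", slaves)
    print("any live slave:", bond_has_live_slave(SAMPLE))
```

On a live node the text would come from reading /proc/net/bonding/bond0. Whether the slaves are ever marked down there depends on the bonding driver's link monitoring settings (for example miimon), which the thread does not show.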

 

Not sure where this leaves us. Are there any bugs reported for this situation?

Is there a way that I can set up another type of NIC resource? You mentioned MultiNICA and B; what benefit would these give me, and is it feasible?

 

 

Many thanks

sparmar

Gaurav_S
Moderator
   VIP    Certified

I checked the VCS release notes for 5.0MP3 but didn't see any known issue for bonded interfaces.

As I asked before, did you get the correct types.cf file in place? Can you compare the types.cf files in the /etc/VRTSvcs/conf and /etc/VRTSvcs/conf/config directories, and update here if there is any difference between them?
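
If it helps, the comparison can be done with a few lines of Python. This is just a minimal sketch using the standard difflib module; `diff_files` is a made-up helper name, and the two paths in the usage note are the directories mentioned above.

```python
import difflib
from pathlib import Path


def diff_files(path_a, path_b):
    """Return unified-diff lines between two text files ([] if identical)."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",
        )
    )
```

For example, `diff_files("/etc/VRTSvcs/conf/types.cf", "/etc/VRTSvcs/conf/config/types.cf")` returns an empty list when the two files match, and the differing lines otherwise.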

 

Gaurav

sparmar
Level 3

Thanks

 

 

The types.cf files are the same.

Gaurav_S
Moderator
   VIP    Certified

Have a look at the VCS Bundled Agents guide; it has full details of the MultiNICA & MultiNICB resources.

MultiNIC is basically used for redundancy of NIC cards (somewhat similar to what Solaris IPMP does), i.e. even if a NIC card fails, the services are unaffected.

 

Guide can be found here:

 

http://sfdoccentral.symantec.com

 

Gaurav

sparmar
Level 3

Many thanks for all your help.