Forum Discussion

aryan3051984
13 years ago

VCS 5.1 SP1, NetBackup 7.1.0.2 & Linux RHEL 5.6

Hi All,

 

I am new to this forum. I need your help with an issue I am facing at a customer site.

I have a setup of 3 clustered nodes: 2 nodes at one location and 1 node at a remote location. Failover between the 2 local nodes is a normal failover, and failover to the remote node is via GCO.

The OS on these nodes is RHEL 5.6.

The VCS version is 5.1 SP1 and the NetBackup version is 7.1.0.2.

The NIC on the 2 local nodes is a bonded NIC.

We are trying to test a scenario in which the network links go down. When we pull out both production cables, the failover does not happen: ethtool shows no connectivity on the Ethernet adapters, but if you run ifconfig -a the bond still shows as UP.
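
For reference, this is how the state can be checked on the node (slave names eth0/eth1 are assumptions; /proc/net/bonding/bond0 lists the actual slaves):

    # Physical link per slave, as reported by the driver
    ethtool eth0 | grep "Link detected"
    ethtool eth1 | grep "Link detected"

    # MII status of the bond and each slave, as the bonding driver sees it
    cat /proc/net/bonding/bond0

    # Administrative state of the bond; this stays UP even with all cables pulled
    ifconfig bond0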

Also, if I shut down one of the nodes, the failover from one node to the other happens without any issues.

If I run ifdown on the bonded NIC, the failover is initiated, but the NBU services are not killed even after running bp.kill_all, so the nbu_server group does not go down on the active node and the failover does not complete.

Does anyone have the same setup, and has anyone faced issues like this?


7 Replies

  • How do you have your NIC resource configured in VCS?

    The NIC agent monitors an interface not just for up/down state, but for active traffic. Just unplugging the cable(s) from a NIC typically won't change its state to DOWN.

    Pasting in the segment from main.cf where your NIC is configured might help.

  • include "OracleASMTypes.cf"
    include "types.cf"
    include "Db2udbTypes.cf"
    include "HTCTypes.cf"
    include "NetBackupTypes.cf"
    include "OracleTypes.cf"
    include "SybaseTypes.cf"

    cluster clusterA (
        UserNames = { admin = gNOgNInKOjOOmWOiNL }
        ClusterAddress = "10.XX.XXX.XXX"
        Administrators = { admin }
        )

    remotecluster clusterB (
        ClusterAddress = "10.XX.XXX.XXX"
        )

    heartbeat Icmp (
        ClusterList = { clusterB }
        Arguments @clusterB = { "10.XX.XXX.XXX" }
        )

    system systemA (
        )

    system systemB (
        )

    group ClusterService (
        SystemList = { systemA = 0, systemB = 1 }
        AutoStartList = { systemA, systemB }
        OnlineRetryLimit = 3
        OnlineRetryInterval = 120
        )

    Application wac (
        StartProgram = "/opt/VRTSvcs/bin/wacstart"
        StopProgram = "/opt/VRTSvcs/bin/wacstop"
        MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
        RestartLimit = 3
        )

    IP webip (
        Device = bond0
        Address = "10.XX.XXX.XXX"
        NetMask = "255.255.255.0"
        )

    NIC backup_nic (
        Device = eth2
        NetworkHosts @systemA = { "10.XX.XXX.XXX" }
        NetworkHosts @systemB = { "10.XX.XXX.XXX" }
        )

    NIC public_nic (
        Device = bond0
        NetworkHosts @systemA = { "10.XX.XXX.X" }
        NetworkHosts @systemB = { "10.XX.XXX.X" }
        )

    wac requires webip
    webip requires public_nic

    // resource dependency tree
    //
    // group ClusterService
    // {
    //     NIC backup_nic
    //     Application wac
    //         {
    //         IP webip
    //             {
    //             NIC public_nic
    //             }
    //         }
    // }

    group nbu_group (
        SystemList = { systemA = 0, systemB = 1 }
        Frozen = 1
        AutoStart = 0
        ClusterList = { clusterA = 0, clusterB = 1 }
        Authority = 1
        AutoStartList = { systemA, systemB }
        )

    DiskGroup nbu_dg (
        DiskGroup = nbudg
        )

    HTC horcm0 (
        Critical = 0
        GroupName = MAS_HUR
        )

    IP nbu_ip (
        Device @systemA = bond0
        Device @systemB = bond0
        Address = "10.XX.XXX.XXX"
        NetMask = "255.255.255.0"
        )

    IP nbubk_ip (
        Device @systemA = eth2
        Device @systemB = eth2
        Address = "10.XX.XXX.XXX"
        NetMask = "255.255.252.0"
        )

    Mount nbu_mount (
        MountPoint = "/opt/VRTSnbu"
        BlockDevice = "/dev/vx/dsk/nbudg/nbuvol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    NetBackup nbu_server (
        Critical = 0
        ResourceOwner = unknown
        ServerName = NBU_Server
        ServerType = NBUMaster
        MonScript = NONE
        RSPFile = "/usr/openv/netbackup/bin/cluster/NBU_RSP"
        GroupName = nbu_group
        )

    Proxy p_backup_nic (
        TargetResName = backup_nic
        )

    Proxy p_public_nic (
        TargetResName = public_nic
        )

    Volume nbu_vol (
        DiskGroup = nbudg
        Volume = nbuvol
        )

    nbu_dg requires horcm0
    nbu_ip requires p_public_nic
    nbu_mount requires nbu_vol
    nbu_server requires nbu_ip
    nbu_server requires nbu_mount
    nbu_server requires nbubk_ip
    nbu_vol requires nbu_dg
    nbubk_ip requires p_backup_nic

    // resource dependency tree
    //
    // group nbu_group
    // {
    //     NetBackup nbu_server
    //         {
    //         IP nbu_ip
    //             {
    //             Proxy p_public_nic
    //             }
    //         Mount nbu_mount
    //             {
    //             Volume nbu_vol
    //                 {
    //                 DiskGroup nbu_dg
    //                     {
    //                     HTC horcm0
    //                     }
    //                 }
    //             }
    //         IP nbubk_ip
    //             {
    //             Proxy p_backup_nic
    //             }
    //         }
    // }

  • "ifconfig -a  the bond is still showing as UP"

    VCS cannot and will not register a fault as long as ifconfig reports the interface as UP.

    You could possibly add PingHostList and/or UseConnectionStatus as additional tests.
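
    As a sketch, the ping test can be pointed at the NIC resources through the NetworkHosts attribute already present in your main.cf; pick a target whose reachability actually depends on the pulled cable, e.g. a router beyond the local switch (the addresses below are placeholders, and the exact attribute names available on your platform should be confirmed in the Bundled Agents Reference Guide):

        haconf -makerw
        # Point the ping test at a host beyond the local switch
        hares -modify public_nic NetworkHosts "10.XX.XXX.X" -sys systemA
        hares -modify public_nic NetworkHosts "10.XX.XXX.X" -sys systemB
        haconf -dump -makero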

  • Hi Marianne,

    What I am actually trying to achieve is that if I remove the LAN cable, VCS should fault the NIC resource and nbu_group should move to the other node. But I am unable to achieve that.

    Also, when I run ifdown bond0, VCS tries to fail over, but it is not able to bring all the NBU processes down on the active node, so the failover does not complete. We even tried kill -9 and waited for an hour, but it did not work. The only option left was to shut the node down and then bring the resources up manually on the other node.

    We have a Symantec case open; they are trying to reproduce this in a lab. There has been no outcome for the last week, and we have an end-of-March deadline to hand over this project.

  • I see 2 issues here (neither of them a VCS problem):

    • The bonded NIC that stays UP
    • NBU that does not go down

    For the NBU issue, I would test as follows (see the command sketch after this list):

    1. Stop NBU manually using 'netbackup stop'. Since the NBU resource is not marked Critical, this will not cause a failover. Monitor from another window with bpps -x every couple of seconds to see whether the processes are terminating, and note how long it takes for all of them to go down. If there are active backups, it will take longer.
    2. Start NBU again, then use VCS to offline the nbu_server resource. Again, run bpps -x every couple of seconds to check whether the processes are going down.

    It is normal for NBU to take quite a while to go down, but 1 hour is certainly excessive. We normally increase the offline timeout based on the time it takes 'netbackup stop' to stop all the processes.
    As a matter of interest, which processes are not terminating?
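
    A rough sketch of that sequence (the standard NBU paths and the resource/system names from your main.cf are assumed; 600 seconds is only an example value):

        # 1. Stop NBU outside of VCS and watch the processes drain
        /usr/openv/netbackup/bin/goodies/netbackup stop
        watch -n 2 /usr/openv/netbackup/bin/bpps -x

        # 2. Start NBU again, then offline the resource through VCS
        /usr/openv/netbackup/bin/goodies/netbackup start
        hares -offline nbu_server -sys systemA
        watch -n 2 /usr/openv/netbackup/bin/bpps -x

        # If 'netbackup stop' needs several minutes, raise the offline timeout to match
        haconf -makerw
        hares -override nbu_server OfflineTimeout
        hares -modify nbu_server OfflineTimeout 600
        haconf -dump -makero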

  • There's actually a bit more to NIC monitoring than just reported link state.

    From the Bundled Agents Reference Guide:

    If the NIC maintains its connection status, the agent uses MII to determine the status of the resource. If the NIC does not maintain its connection status, the agent verifies that the NIC is configured. The agent then sends a ping to all the hosts that are listed in the NetworkHosts attribute. If the ping test is successful, it marks the NIC resource ONLINE. If the NetworkHosts attribute list is empty, or the ping test fails, the agent counts the number of packets that the NIC received. The agent compares the count with a previously stored value. If the packet count increases, the resource is marked ONLINE. If the count remains unchanged, the agent sends a ping to the broadcast address of the device to generate traffic on the network. The agent counts the number of packets that the NIC receives before and after the broadcast. If the count increases, the resource is marked ONLINE. If the count remains the same or decreases over a period of five broadcast cycles, the resource is marked OFFLINE.
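
    You can sample the same signals by hand (a sketch; eth2 is just the example interface from the main.cf above):

        # Link/MII status, where the driver reports it
        ethtool eth2 | grep "Link detected"

        # Received-packet counter, sampled twice; the agent marks the resource
        # ONLINE if the count increases
        grep "eth2:" /proc/net/dev; sleep 5; grep "eth2:" /proc/net/dev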

  • Hi All,

    Thanks to everyone who replied to this post. The issue has been resolved by Symantec.

    Backline support created an engineering binary for us, and it is working fine now.

    We tested both scenarios, pulling the LAN cable as well as bringing down the bond, and the failover happens without any issues.