Solved: Cluster Failover

H_Sharma · ‎02-15-2015

Hello Experts,

We were experiencing an issue on one of our clustered master servers (Windows 2008), so we moved the NetBackup services to Node1. Backups ran fine for 2-3 hours and then started failing with error code 98 on Node1. Finally we moved the NetBackup services to Node2, which was the original node, and backups were fine again.

(1) The following events were logged on Node1:

TLD(1) [7808] unable to read exit status from tldcd: Error number: (10060), stat = -3

TLD(1) [12200] Drive 7 (device 5) has not become ready. Last status: The requested resource is in use.

TLD(1) [13832] Drive 2 (device 2) has not become ready. Last status: The requested resource is in use.

Request for media ID L00087 is being rejected because mount requests are disabled (reason = robotic daemon going to DOWN state)

Please help.

(2) Whenever we move the NetBackup services from Node1 to Node2 or vice versa, we lose the activity job logs. For example, if 10 jobs are running on Node1 and we have to move the NetBackup services to Node2, then on Node2 we see only the new jobs, not the 10 jobs that were running on Node1. The same is happening on our other server as well.

Please help.

revarooo · ‎02-15-2015

As per your previous thread about db/media/errors - you had reservation issues - something else was using the drive.

nbrbutil -dump will confirm if NetBackup has a reservation on the drive or tapes already.

If not, power cycle the drives and also make sure that no other servers that do not use NetBackup have the drives zoned.

As for your jobs not passing across the nodes, what version of NBU is this?

Is netbackup\db\jobs on the shared disk that floats between the nodes?

The entire netbackup\db should be on the shared disk!!!!
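For anyone wanting to sanity-check this, here is a minimal sketch of the idea, not an official NetBackup tool. It only compares drive letters, so it assumes the shared cluster disk is presented as its own drive letter. The path and the `S:` drive letter below are hypothetical placeholders; substitute your own values. It will not detect mount points or junctions.

```python
from pathlib import PureWindowsPath


def on_shared_disk(path: str, shared_drive: str) -> bool:
    """Return True if `path` is on the drive letter of the shared cluster disk.

    This is a plain drive-letter comparison; it does not resolve
    junctions or volume mount points.
    """
    return PureWindowsPath(path).drive.upper() == shared_drive.upper()


# Hypothetical values, for illustration only.
JOBS_DB = r"C:\Program Files\Veritas\NetBackup\db\jobs"
SHARED_DRIVE = "S:"

if on_shared_disk(JOBS_DB, SHARED_DRIVE):
    print("jobs database is on the shared disk")
else:
    print("jobs database is NOT on the shared disk; job history will not follow a failover")
```

If the check reports that the jobs directory is on a local drive, that would match the symptom described in (2): each node keeps its own copy of the job history.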


Marianne · ‎02-15-2015

You need to troubleshoot the device errors while the master is on node 2. It sounds to me like a lack of SCSI-3 reservation, which is needed in a cluster environment. Node 1 is probably still holding reservations if backups were active during the failover. Read up about SCSI-3 reservation in Admin Guide 1. You can also check the vmoprcmd command for how to release a reservation from the command line.

Another possibility is incorrect device mappings due to a lack of persistent binding.

I agree with revarooo. The jobs database should be on shared storage that fails over along with all the other databases. It sounds like you need a health check on your cluster install.

