Storage Unit Groups failing with Oracle Applications Backups

CJACUNAD
Level 3
Partner Accredited

Hi,

We have a problem with storage unit groups (spanning two different media servers) when running Oracle backups. When one media server is down, the automatic backup schedules correctly select the only available storage unit, but all Default-Application-Backup jobs (user-directed application backups) keep trying to use the storage unit associated with the down media server ("Reason: Media server is currently not connected to master server, Media server: md01sbcc01") and never select the other storage unit available in the storage unit group. We already lowered the media server timeout to 60 seconds, but the jobs still do not switch media servers after that: they remain queued and never change their media server or storage unit.

Also, this message repeats constantly: "Info nbjm (pid=12447) Waiting in NetBackup scheduler work queue on server bkserv01"

Our master and media servers are on version 7.5.0.1, and this problem affects clients on different versions (6.5.x, 7.1.x, etc.).

What can we do to make this work correctly without having to remove the storage unit associated with the down media server?

Thanks for your help

Marianne
Moderator
Partner    VIP    Accredited Certified

Is there more than one Oracle-type policy for this client?

If there is, and the policy name is not hard-coded in the backup script, NBU on the master will assign the default-application-backups to the first policy it finds for the client that has an Application Backup schedule.
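To make the rule above concrete, here is a simplified model of how the policy is chosen when the script does not name one. The `Policy` class and `resolve_policy` function are hypothetical and only mirror the behavior described in this reply; they are not NetBackup code. In NetBackup for Oracle, the policy and schedule are normally named in the RMAN script by passing `NB_ORA_POLICY` and `NB_ORA_SCHED` to the channel (for example with `SEND`); check the NetBackup for Oracle guide for your version for the exact syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    name: str
    policy_type: str        # e.g. "Oracle"
    clients: list[str]
    schedules: list[str]    # schedule types, e.g. "Application Backup"


def resolve_policy(policies: list[Policy], client: str,
                   requested: Optional[str] = None) -> Optional[Policy]:
    """Hypothetical model of policy selection for a user-directed backup.

    If the script names a policy, only that policy is accepted; otherwise
    the first Oracle policy (in list order) that covers the client and has
    an Application Backup schedule is used.
    """
    candidates = [p for p in policies
                  if p.policy_type == "Oracle"
                  and client in p.clients
                  and "Application Backup" in p.schedules]
    if requested is not None:
        return next((p for p in candidates if p.name == requested), None)
    return candidates[0] if candidates else None
```

With two Oracle policies for the same client (one named here hypothetically as `other_ora_policy`), an unnamed request always lands on whichever comes first, regardless of which storage unit group the intended policy uses.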

This problem has been around for as long as I've known NBU: http://www.symantec.com/docs/TECH31742

CJACUNAD
Level 3
Partner Accredited

Hi,

Thanks for your answer, but the problem does not look like a script issue: the Application Backup schedule picks the correct storage unit group, but the storage unit group itself is not working as expected. It does not perform any failover or load balancing at all; every child job gets only the first storage unit in the list and never switches to the available storage unit in the group, even when no jobs are running on that available storage unit.

The log shows this:

20/2012 00:01:06 - Info nbjm (pid=6986) starting backup job (jobid=779731) for client bkp-dl585-1, policy hot_saludarp, schedule Default-Application-Backup
04/20/2012 00:01:06 - Info nbjm (pid=6986) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=779731, request id:{D5F6A1BA-8AA5-11E1-A219-802CF4265992})
04/20/2012 00:01:06 - requesting resource MediaServersCC01-D2D-BasesDatos-3
04/20/2012 00:01:06 - requesting resource bkserv01.NBU_CLIENT.MAXJOBS.bkp-dl585-1
04/20/2012 00:01:06 - requesting resource bkserv01.NBU_POLICY.MAXJOBS.hot_saludarp
04/20/2012 00:01:11 - awaiting resource MediaServersCC01-D2D-BasesDatos-3. Waiting for resources.
          Reason: Media server is currently not connected to master server, Media server: md01sbcc01,
          Robot Type(Number): NONE(N/A), Media ID: N/A, Drive Name: N/A,
          Volume Pool: bkp_hots, Storage Unit: D2D-BasesDatos-5, Drive Scan Host: N/A,
          Disk Pool: D2D-BasesDatos-5, Disk Volume: HPLSU7
04/20/2012 00:03:06 - Info nbjm (pid=6986) Waiting in NetBackup scheduler work queue on server bkserv01
04/20/2012 00:05:06 - Info nbjm (pid=6986) Waiting in NetBackup scheduler work queue on server bkserv01
04/20/2012 00:07:06 - Info nbjm (pid=6986) Waiting in NetBackup scheduler work queue on server bkserv01
04/20/2012 00:09:06 - Info nbjm (pid=6986) Waiting in NetBackup scheduler work queue on server bkserv01
04/20/2012 00:11:06 - Info nbjm (pid=6986) Waiting in NetBackup scheduler work queue on server bkserv01
04/20/2012 00:13:06 - Info nbjm (pid=6986) Waiting in NetBackup scheduler work queue on server bkserv01

The last message repeats indefinitely until the job is cancelled manually. This problem does not occur with the child jobs of Standard backups; only the parent job started by the automatic schedule selects the available storage unit in the group.
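When many user-directed jobs are queued like this, it can help to scan their job details for the pattern shown in the log above. A minimal sketch, assuming the job details have been copied out as plain text (for example from the Activity Monitor); `diagnose` and its result keys are hypothetical, and the regular expressions only match the message wording quoted in this log, which may differ in other NetBackup versions.

```python
import re

# Patterns copied from the job details quoted in this thread.
NOT_CONNECTED = re.compile(
    r"Reason: Media server is currently not connected to master server, "
    r"Media server: (?P<server>[^,\s]+)"
)
STORAGE_UNIT = re.compile(r"Storage Unit: (?P<stu>[^,\s]+)")
QUEUE_WAIT = re.compile(r"Waiting in NetBackup scheduler work queue")


def diagnose(job_details: str) -> dict:
    """Summarise a job's detail text (hypothetical helper).

    Returns the media server named in a 'not connected' wait reason, the
    first "Storage Unit:" value found (None when either is absent), and the
    number of scheduler-queue wait messages logged.
    """
    server = NOT_CONNECTED.search(job_details)
    stu = STORAGE_UNIT.search(job_details)
    return {
        "waiting_on_server": server.group("server") if server else None,
        "storage_unit": stu.group("stu") if stu else None,
        "queue_waits": len(QUEUE_WAIT.findall(job_details)),
    }
```

A job that reports a disconnected media server together with a growing `queue_waits` count matches the symptom described here: it is pinned to one storage unit rather than being re-evaluated against the rest of the group.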

 

Thanks

watsons
Level 6

What storage unit selection method is this STU group using?

The options are Prioritized, Failover, Round Robin, and Media Server Load Balancing.

In your situation, change it to Failover if it is currently set to one of the other three. The Failover option forces the job to look for an STU that is not down or out of media.
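The difference between the four methods is easiest to see side by side. Below is a simplified model based on how the NetBackup documentation describes them: Prioritized takes the first STU that is not busy, down, or out of media; Failover takes the first STU that is not down or out of media and waits if it is only busy; Round Robin takes the least recently selected STU. Everything here is hypothetical: the classes and functions are not NetBackup code, the flags are simplifications, and `media_server_load_balancing` is a crude stand-in that only counts running jobs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageUnit:
    name: str
    media_server: str
    up: bool = True          # media server connected and STU not marked down
    has_space: bool = True   # not out of media / disk not full
    busy: bool = False       # already running its maximum concurrent jobs

    def usable(self) -> bool:
        return self.up and self.has_space


def prioritized(group: list[StorageUnit]) -> Optional[StorageUnit]:
    # First STU in list order that is not busy, down, or out of media.
    return next((s for s in group if s.usable() and not s.busy), None)


def failover(group: list[StorageUnit]) -> Optional[StorageUnit]:
    # First STU in list order that is not down or out of media. If that STU
    # is merely busy, the job waits for it rather than moving down the list.
    return next((s for s in group if s.usable()), None)


class RoundRobin:
    def __init__(self) -> None:
        self.last_selected: dict[str, int] = {}  # STU name -> selection tick
        self.tick = 0

    def select(self, group: list[StorageUnit]) -> Optional[StorageUnit]:
        # Least recently selected STU among those that can take a job now.
        candidates = [s for s in group if s.usable() and not s.busy]
        if not candidates:
            return None
        choice = min(candidates, key=lambda s: self.last_selected.get(s.name, -1))
        self.tick += 1
        self.last_selected[choice.name] = self.tick
        return choice


def media_server_load_balancing(group: list[StorageUnit],
                                load: dict[str, int]) -> Optional[StorageUnit]:
    # Hypothetical stand-in: the usable STU whose media server has the fewest
    # running jobs. NetBackup's actual heuristics are not modeled here.
    candidates = [s for s in group if s.usable() and not s.busy]
    return min(candidates, key=lambda s: load.get(s.media_server, 0), default=None)
```

For example, with `D2D-BasesDatos-5` down and a second, hypothetical STU `stu_b` busy, Prioritized returns no STU (the job queues) while Failover returns `stu_b` and the job waits for it. In this model every method skips an STU whose media server is down, which is the behavior the original poster expected.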

If that still does not work, consider logging a call to Symantec to report this bug :)

CJACUNAD
Level 3
Partner Accredited

We tried Round Robin and Media Server Load Balancing, because the original goal is to balance the load between the media servers, and according to the documentation both methods first check whether a storage unit is available, just as Failover does. But for user-directed backups something goes wrong: they perform no load balancing or failover at all.

Will_Restore
Level 6

Check your schedules: is Override policy storage selection being used?
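The reason this matters: a schedule-level storage override replaces the policy's storage selection, so if the Application Backup schedule is pinned to a single storage unit, the group's selection method never comes into play. A minimal sketch of that precedence; `candidate_storage_units` and the group/STU names in the usage are hypothetical, except `D2D-BasesDatos-5`, which comes from the log above.

```python
from typing import Optional


def candidate_storage_units(policy_storage: str,
                            schedule_override: Optional[str],
                            groups: dict[str, list[str]]) -> list[str]:
    """Hypothetical model: which STUs a job from this schedule may use."""
    # The schedule-level override, when set, replaces the policy's storage.
    target = schedule_override or policy_storage
    # A storage unit group expands to its members; a plain STU is just itself.
    return groups.get(target, [target])
```

With `groups = {"stu_group_example": ["D2D-BasesDatos-5", "stu_b_example"]}`, a schedule without an override can use either member, but one that overrides to `D2D-BasesDatos-5` has only that STU to wait on when its media server is down.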