02-13-2015 03:05 AM
What is the expected behaviour when:
Whenever the NBU master has been restarted in our environment (a VCS cluster), we see:
a. All jobs are killed, so they have to restart from scratch.
b. Jobs are not killed cleanly, so, in particular for SAP backups, the Oracle database is left in backup mode, meaning that when the backup is restarted it fails.
So I am hoping there is some way to cleanly stop the NBU master (preferably integrated into the VCS cluster) so that:
I did look in the "Admin Volume 1" guide, and when shutting down NBU it says "make sure that no jobs are running". That will never be the case in our environment: for example, there are 32 active jobs at the moment, and we could potentially have over 100. I couldn't find any guidance in the docs on what to do if jobs are running.
Mike
02-13-2015 03:43 AM
So I am hoping there is some way to cleanly stop the NBU master (preferably integrated into the VCS cluster) so that:
This cannot happen: the daemons responsible for backups on the master have died. One option is to enable checkpointing in your policies, so that on restart backup jobs resume from the last checkpoint.
I do not know about database backups; I'm sure someone else will pitch in here.
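To illustrate why checkpointing helps, here is a toy Python model, not NetBackup code: `run_backup` and its parameters are hypothetical, and checkpoints are taken every N files for simplicity, whereas (as far as I know) the real policy setting is a time interval. An interrupted run keeps only the work up to the last recorded checkpoint, and the restart picks up from there instead of from the first file.

```python
from typing import List, Optional, Tuple


def run_backup(files: List[str], checkpoint: int, interval: int,
               interrupt_at: Optional[int] = None) -> Tuple[List[str], int, bool]:
    """Toy checkpoint-restart model (hypothetical, not NetBackup internals).

    Backs up files starting at index `checkpoint`, recording a checkpoint
    after every `interval` files. If the run is interrupted at index
    `interrupt_at` (e.g. the master restarts), it returns the last recorded
    checkpoint, so files written after that point are backed up again.
    Returns (files_written_this_run, checkpoint, completed).
    """
    written: List[str] = []
    for i in range(checkpoint, len(files)):
        if interrupt_at is not None and i == interrupt_at:
            return written, checkpoint, False  # interrupted: keep last checkpoint
        written.append(files[i])
        if (i + 1) % interval == 0:
            checkpoint = i + 1  # checkpoint covers files[0:i+1]
    return written, len(files), True


files = [f"file{n}" for n in range(10)]
first, cp, done = run_backup(files, 0, 4, interrupt_at=6)
second, cp, done = run_backup(files, cp, 4)  # restart from the checkpoint
```

With a checkpoint every 4 files and an interruption at file 6, the restart backs up file4 to file9 rather than all ten files; without checkpoints it would start again at file0.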
02-13-2015 05:26 AM
Can you elaborate on:
This cannot happen - the daemons responsible for backups on the Master have died
It is my understanding that the NBU master schedules the backup, but the actual backup runs on the media server, so once the media server has started writing, the NBU master's role is just to record the files backed up in the NBU catalog.
Looking at https://www-secure.symantec.com/connect/sites/default/files/NBU%207.x%20process%20flow%20QRC_1.pdf, it says:
11. bpbkar sends information about the backup image to bpbrm, which forwards it to bpdbm on the master server. This stream of metadata is sent throughout the backup and stored in the master server’s Image database.
12. When mounting and positioning of the media in the drive, or of the disk to be used, have been accomplished, the client backup process, bpbkar, will begin sending backup data to the bptm child process on the media server system. The bptm child process receives the image and stores it block by block into a shared memory segment on the media server. The parent bptm process retrieves the image from shared memory and directs it block by block to the allocated storage media.
13. When the backup has been completed, bptm will notify bpbrm, which in turn will notify the Job Manager, nbjm, that the job has finished. bptm will also notify nbjm that it is done with the media.
So this sounds like the list of files backed up is sent to the NBU master in step 1, before the backup starts, and in step 12 there is no communication with the NBU master except, I guess, progress updates. So I don't understand why the NBU media server cannot carry on writing if the NBU master is unavailable for a short period of time.
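One way to check that reading is to tabulate the quoted steps 11-13 by which host receives each message. The Python sketch below is only a paraphrase of the QRC text; the tuple layout and `master_links_during_transfer` are my own illustration. It shows that the bpbrm-to-bpdbm metadata stream in step 11 is a master-side link that stays open while the data is being written.

```python
from typing import List, Tuple

# (step, sender, receiver, receiver_host, during_data_transfer)
# Paraphrased from steps 11-13 of the NBU 7.x process-flow QRC quoted above.
FLOW: List[Tuple[int, str, str, str, bool]] = [
    (11, "bpbkar", "bpbrm", "media", True),   # image metadata to bpbrm
    (11, "bpbrm", "bpdbm", "master", True),   # forwarded "throughout the backup"
    (12, "bpbkar", "bptm", "media", True),    # backup data to the bptm child
    (13, "bptm", "bpbrm", "media", False),    # backup complete
    (13, "bpbrm", "nbjm", "master", False),   # job finished
    (13, "bptm", "nbjm", "master", False),    # done with the media
]


def master_links_during_transfer(flow):
    """Return (step, sender, receiver) for master-side links used while data is written."""
    return [(step, src, dst) for step, src, dst, host, during in flow
            if host == "master" and during]


print(master_links_during_transfer(FLOW))  # prints [(11, 'bpbrm', 'bpdbm')]
```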
Mike
02-13-2015 05:58 AM
Can you elaborate on:
This cannot happen - the daemons responsible for backups on the Master have died
The connections from bpbrm to bpdbm and from bptm to nbjm have gone; you can't just re-establish them on the fly.
bpbrm/bptm on the media server is then likely to exit (or end up in a hung state), as it can no longer communicate.
02-13-2015 06:07 AM
You can start/stop NetBackup in a VCS cluster using:
hares -offline nbu_server -sys {node}
hares -online nbu_server -sys {node}
If you want to take all resources offline, use:
hagrp -offline nbu_grp -sys {node}
hagrp -online nbu_grp -sys {node}
There is no method to preserve running jobs across a NetBackup restart. You can suspend file system backups before the restart and then resume them afterwards, but this is not possible for database backups.
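A pre-shutdown step along those lines might first split the active jobs into ones to suspend and ones that will have to be cancelled and rerun. This is a hypothetical Python sketch: the `Job` record, the policy-type labels, and the rule that only checkpoint-enabled, non-database jobs are suspended are assumptions, and it calls no NetBackup command; in practice the job list would come from something like bpdbjobs output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    job_id: int
    policy_type: str     # hypothetical labels such as "Standard", "SAP", "Oracle"
    checkpointing: bool  # whether the policy takes checkpoints


# Assumption: database policy types cannot be suspended and resumed.
DB_POLICY_TYPES = {"SAP", "Oracle"}


def pre_shutdown_plan(jobs: List[Job]) -> Dict[str, List[int]]:
    """Split active jobs into those to suspend and those to cancel before a restart."""
    plan: Dict[str, List[int]] = {"suspend": [], "cancel": []}
    for job in jobs:
        if job.policy_type not in DB_POLICY_TYPES and job.checkpointing:
            plan["suspend"].append(job.job_id)
        else:
            # Not resumable under the assumptions above; would need a rerun.
            plan["cancel"].append(job.job_id)
    return plan


print(pre_shutdown_plan([Job(1, "Standard", True), Job(2, "SAP", True)]))
```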
02-13-2015 07:47 AM
Thanks for the replies.
So to clarify:
If the NBU media server loses its connection to the NBU master then:
I know you can stop the NBU master using hares; this is what I did, and from ps output it looked as though the VCS NBU agent was using bp.kill, but I can't find any documentation confirming what the VCS NBU agent does, or whether it is configurable (like the VCS Oracle agent, where you can choose a shutdown option to do a clean or forced shutdown).
Mike
02-13-2015 10:26 AM
1. There is a timeout, but if there is a prolonged outage the backup will fail. bpbrm on the media server has to send metadata to bpdbm on the master, so any prolonged delay and the backups will fail.
2. See above: backups will fail if there is a prolonged outage. With a network outage of a few seconds, the backups may survive.
3. Yes, I believe so, though I have never tried it with restarting a master server's NBU daemons.
4. Yes
5. According to a previous post on here, yes.
6. If you shutdown a master server or interrupt comms between master and media then bptm, bpbrm on media server MAY hang.
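Answers 1 and 2 amount to a deadline: a short interruption is absorbed, a long one is not. The toy Python model below makes that concrete; `survives_outage`, the send interval, the wait-and-retry behaviour and the timeout value are all assumptions for illustration, since the actual timeouts involved aren't stated here.

```python
def survives_outage(outage_start: float, outage_end: float, timeout: float,
                    send_interval: float, duration: float) -> bool:
    """Toy model: does a backup survive a master outage?

    Assumptions: bpbrm sends metadata every `send_interval` seconds for
    `duration` seconds; a send issued while the master is unreachable
    waits until the outage ends, and the backup fails if any send has to
    wait longer than `timeout` seconds.
    """
    t = 0.0
    while t < duration:
        if outage_start <= t < outage_end and outage_end - t > timeout:
            return False  # this send would wait past the timeout
        t += send_interval
    return True


print(survives_outage(100, 105, timeout=300, send_interval=1, duration=1000))   # True
print(survives_outage(100, 1000, timeout=300, send_interval=1, duration=2000))  # False
```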
The VCS NBU agent is responsible for monitoring NBU processes AND for starting/stopping NBU cluster resources, such as the shared/floating disk that both cluster nodes need to use when either of them is active, the shared/floating cluster IP address, etc.
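Conceptually, the monitoring half of such an agent checks whether the expected daemons are running and reports the resource as online or offline. The Python sketch below is hypothetical: the daemon subset is illustrative rather than the list the NBU agent actually checks, and the 110/100 return values follow my understanding of the VCS script-agent monitor convention (110 online, 100 offline).

```python
from typing import Iterable, List, Tuple

# Illustrative subset of NBU master daemons; not the agent's real list.
REQUIRED_DAEMONS = frozenset({"bprd", "bpdbm", "nbpem", "nbjm"})
VCS_ONLINE, VCS_OFFLINE = 110, 100  # assumed script-agent monitor exit codes


def monitor(running: Iterable[str]) -> Tuple[int, List[str]]:
    """Return (exit_code, missing_daemons) for a list of running process names."""
    missing = sorted(REQUIRED_DAEMONS - set(running))
    return (VCS_OFFLINE if missing else VCS_ONLINE), missing


print(monitor(["bprd", "nbjm"]))  # (100, ['bpdbm', 'nbpem'])
```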
02-16-2015 04:48 AM
As for your DB backups, I guess you could have a pre-script that checks the database and modifies it accordingly, but that also comes with its own issues. It would be interesting to know whether, when the DB backup craters, there's scope for a post-script. Jim
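A pre-script along those lines might look for datafiles still in backup mode and end it before the new backup starts. The Python sketch below only builds the SQL; running it needs an Oracle connection, which is left out. `v$backup` and `ALTER DATABASE END BACKUP` are standard Oracle, but `end_backup_statements` and the decision to end backup mode whenever any datafile is still active are assumptions, and doing so while another backup is genuinely in progress would be wrong (one of the "own issues" above).

```python
from typing import List, Tuple

# Query a pre-script could run first (standard Oracle dynamic view).
CHECK_SQL = "SELECT file#, status FROM v$backup WHERE status = 'ACTIVE'"


def end_backup_statements(rows: List[Tuple[int, str]]) -> List[str]:
    """Return statements to take the database out of backup mode.

    `rows` are (file#, status) tuples as returned by CHECK_SQL.
    Hypothetical policy: if any datafile is still ACTIVE, end backup
    mode for the whole database (Oracle 10g and later syntax).
    """
    active = [file_no for file_no, status in rows if status == "ACTIVE"]
    if not active:
        return []
    return ["ALTER DATABASE END BACKUP"]


print(end_backup_statements([(1, "ACTIVE"), (2, "ACTIVE")]))
```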
05-12-2015 01:56 AM
Re-visiting this post ....
I found this post in an O-L-D article:
https://www-secure.symantec.com/connect/articles/netbackup-and-vcs#comment-2506161
Probably the best answer to what should happen to running jobs....