04-25-2014 12:11 AM
Hi,
we have several Basic Disk DSSUs connected to a media server (all systems Windows 2008 R2, NBU 7.5.0.1). Everything worked fine for years, but now we are suddenly getting error 129 on 3 of the 4 DSSUs. The disks are indeed full of NBU images; the relocation jobs run periodically without error, but find no images to relocate. Manual relocation also relocates nothing. All DSSUs are on separate physical disks.
bpimmedia shows that NBU knows that there are images on the disks.
Any ideas what to check?
04-25-2014 02:16 AM
NBU 7.5 up to 7.5.0.2 had various bugs, and Symantec strongly recommends being on 7.5.0.3 or later.
NBU 7.5.0.6 is the latest. No guarantee that it will fix your issue, but worth a try.
What is the HWM and LWM on the DSUs?
Do you have admin, bpdm and bptm log folders on your media server?
The bpdm log should tell us whether disk cleanup is attempted.
The admin log should give us information about the duplication/relocation job.
Is the assumption that all existing images on disk have already been duplicated, so there is nothing left to do?
These TNs explain how it should work:
Disk Staging Relocation Behavior: http://www.symantec.com/docs/TECH44719
Disk Staging Cleanup Behavior: http://www.symantec.com/docs/TECH66149
04-25-2014 02:46 AM
04-25-2014 02:53 AM
How long is it since anything was duplicated? Run a verify query in the Catalog section against copy 2 to see what the last ones were.
Have you had a change in anti-virus or similar that could be locking things?
Having said that, when I have seen such issues it was due to a locked process, so if you get the chance, reboot the Master and Media Servers and see whether it starts working again.
04-25-2014 03:51 AM
How long since anything was duplicated? - run a verify query on the catalog section against copy2 to see what the last ones were.
Hi,
not really sure what to make of this information, since only one media server and only 3/4 DSSUs on that server have the problem. I don't know how to filter that in the Catalog Query.
In the GUI, I ran a Catalog Query looking for primary copies on disks, noted one image which had its primary copy (copy 1) on one affected disk. Result of bpimagelist for that backup id:
c:\Program Files\Veritas\NetBackup\bin\admincmd>bpimagelist -backupid BWS0119_1397841569
IMAGE BWS0119 0 0 9 BWS0119_1397841569 HH_VMware 40 *NULL* root Cumulative-Inc 4 3 1397841569 156 1400519969 0 0 1553178 1187 1 2 0 HH_VMware_1397841569_INCR.f *NULL* *NULL* 0 2 0 0 0 *NULL* 0 0 1 0 0 1397286560 1397286560 *NULL* 0 0 0 *NULL* 589438 1 0 153140 0 0 *NULL* *NULL* 0 1397840687 0 0 *NULL* *NULL* 0 0 0 0
HISTO 0 0 0 0 0 0 0 0 0 0
FRAG 2 -1 150 0 2 6 51 0624L5 bws0112.naval.dom 65536 1347867 1397669233 2 0 *NULL* 1400519969 0 65539 0 0 0 1 0 1397843389 1 1 *NULL* *NULL* 0 0
FRAG 2 1 1553029 0 2 6 50 0624L5 bws0112.naval.dom 65536 1323598 1397669233 2 0 *NULL* 1400519969 0 65539 0 0 0 1 0 1397843389 1 1 *NULL* *NULL* 0 0
Doesn't that tell me that the only copy is copy2, on medium 0624L5 (a tape)?
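That reading can be cross-checked with a small script. The following is only a sketch: it assumes that, in this output format, the second field of a FRAG line is the copy number, the third the fragment number, the fourth the size in KB, the ninth the media ID and the tenth the media server. Those positions are inferred from the output posted here, not taken from the documentation, and the names `Fragment` and `parse_frag_lines` are made up for illustration.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    copy: int        # assumed: field 2 of a FRAG line
    fragment: int    # assumed: field 3 of a FRAG line
    kbytes: int      # assumed: field 4 of a FRAG line
    media_id: str    # assumed: field 9 of a FRAG line
    host: str        # assumed: field 10 of a FRAG line


def parse_frag_lines(output: str) -> List[Fragment]:
    """Collect the FRAG lines from bpimagelist output (hypothetical helper)."""
    frags = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[0] != "FRAG":
            continue  # skip IMAGE, HISTO and anything unexpected
        frags.append(Fragment(int(fields[1]), int(fields[2]), int(fields[3]),
                              fields[8], fields[9]))
    return frags


# The two FRAG lines exactly as posted above.
SAMPLE = """\
FRAG 2 -1 150 0 2 6 51 0624L5 bws0112.naval.dom 65536 1347867 1397669233 2 0 *NULL* 1400519969 0 65539 0 0 0 1 0 1397843389 1 1 *NULL* *NULL* 0 0
FRAG 2 1 1553029 0 2 6 50 0624L5 bws0112.naval.dom 65536 1323598 1397669233 2 0 *NULL* 1400519969 0 65539 0 0 0 1 0 1397843389 1 1 *NULL* *NULL* 0 0
"""

frags = parse_frag_lines(SAMPLE)
copies = sorted({f.copy for f in frags})
media = sorted({f.media_id for f in frags})
print("copies:", copies)   # copies: [2]
print("media:", media)     # media: ['0624L5']
```

If the field positions assumed here are right, only copy 2 is listed for this backup ID, and both fragments sit on media 0624L5.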
I rebooted both master and media server, still the same behaviour.
I am not aware of any changes in our environment - and if so, I think they would affect all DSSUs and not just a couple.
04-25-2014 03:54 AM
These are for the 12th and 18th of April - when was the last time a backup to this storage unit worked?
Run the verify for a recent period against copy 1, then change to copy 2 and see whether there is the same number of tape copies. If so, everything has been duplicated to tape; if not, there is an issue.
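The dates can be read straight from the epoch values in the bpimagelist output. A minimal sketch, assuming the suffix of the backup ID is the backup time and that 1400519969 is the expiration time; the meaning of 1397286560 is not stated in this thread, so it is only decoded here, not interpreted. `epoch_to_utc` is a made-up helper name.

```python
from datetime import datetime, timezone


def epoch_to_utc(ts: int) -> str:
    """Format a Unix timestamp as a UTC date/time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


backup_time = 1397841569   # suffix of backup ID BWS0119_1397841569
expiration = 1400519969    # assumed to be the image expiration
other_stamp = 1397286560   # timestamp-like IMAGE field, meaning not stated here

print(epoch_to_utc(backup_time))   # 2014-04-18 17:19:29
print(epoch_to_utc(other_stamp))   # 2014-04-12 07:09:20
print(epoch_to_utc(expiration))    # 2014-05-19 17:19:29

# Distance between backup time and assumed expiration, in days.
retention_days = (expiration - backup_time) / 86400
print(retention_days)              # 31.0
```

All times are UTC; local time on the servers will differ by the time zone offset.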
04-25-2014 04:25 AM
OK, I don't yet know why, but suddenly the affected DSSUs have been cleaned out (no more images on them at all). I once again stopped all processes on the media server in that timeframe, so maybe it was a hung process after all. The Catalog query for primary copies on the affected disks no longer shows any results, either.
I'll update you on Monday, after the big backup jobs this weekend.
Thank you both so far.
04-25-2014 04:34 AM
Maybe restarting NBU after creating the log folders kicked off the outstanding disk cleanups.
But I would not expect the disk to be 'cleaned out', just brought down to the LWM.
What is logged in the bpdm log?
Please copy it to bpdm.txt and upload it as a file attachment.
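To illustrate the difference between "cleaned out" and "down to LWM", here is a deliberately simplified model of watermark-based cleanup. It is an assumption-laden sketch, not NetBackup's actual algorithm: it assumes cleanup starts once usage reaches the high water mark, removes only images that have already been relocated, oldest first, and stops as soon as usage is at or below the low water mark. All names and numbers are hypothetical.

```python
from typing import List, Tuple

# (image name, size in GB, already relocated?) - listed oldest first
Image = Tuple[str, int, bool]


def cleanup_to_lwm(capacity_gb: int, images: List[Image],
                   hwm_pct: int, lwm_pct: int) -> Tuple[List[str], int]:
    """Return (names of removed images, GB still used) under the simplified model."""
    used = sum(size for _, size, _ in images)
    if used * 100 < hwm_pct * capacity_gb:
        return [], used               # below HWM: nothing to do
    removed = []
    for name, size, relocated in images:
        if used * 100 <= lwm_pct * capacity_gb:
            break                     # reached LWM: stop, do not empty the disk
        if not relocated:
            continue                  # never remove an image with no second copy
        removed.append(name)
        used -= size
    return removed, used


# Hypothetical 100 GB disk with HWM 90% and LWM 50%.
images = [("img1", 30, True), ("img2", 30, True),
          ("img3", 20, False), ("img4", 15, True)]
removed, used = cleanup_to_lwm(100, images, hwm_pct=90, lwm_pct=50)
print(removed, used)   # ['img1', 'img2'] 35
```

In this model the relocated image img4 stays on disk because the LWM has already been reached, which is why a completely empty DSSU is unexpected.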
04-25-2014 04:39 AM
I have seen that many times - once they start to go, only a reboot will stop them. It is a known bug, so you do need to upgrade as soon as possible. I will see if I can dig out the tech note.
04-25-2014 06:41 AM
I cannot find the note, but I found my support thread - it goes back a while and concerned an earlier version of NBU.
It is still worth upgrading anyway, but I have the feeling that it was caused by a lack of performance on the Master server: either memory or pagepool memory, just overloaded.
An NBU upgrade (to a version earlier than yours!) and a relocation of the Master to a new server prevented the issue from occurring again. We also set up all of the anti-virus exclusions for NBU at the time.
Too many changes in one go, I know, but based on your version maybe look at performance and anti-virus first.
Hope this helps
04-25-2014 11:27 AM
Try this:
The main function of nbdelete is to remove expired fragments from disk units. It can also be used to purge those image fragments from the database when the cleanup fails for one reason or another.
To force a cleanup of all disk pools and remove the references in the database, run
<install path>\veritas\netbackup\bin\admincmd\nbdelete -allvolumes -force
The command sounds brutal, but it is considered safe because it works only on expired images.
There are also switches to make it run only on designated storage units.
Run nbdelete -h to see all the other switches.
04-28-2014 06:46 AM
Everything seems back to normal now. Backups are stored on the DSSUs, relocation jobs run without problems.
I did issue "nbdelete -allvolumes -force" while poking around; maybe that was the reason the disks were cleaned out completely.
An upgrade to 7.6 is in the planning stages.
Thank you all.