Vault failing on NetBackup 7.1
Hi,
I use a vault to eject tapes at the end of the week. It failed last week, so I ejected the tapes manually. I have since restarted everything, including the tape library, but the vault has failed again this week. I am also logging the issue with the tape library supplier, as I'm not sure where the fault lies, but is anyone able to give me any help?
The details from the jobs look like this:
26/07/2013 13:17:10 - vault waiting for global lock
26/07/2013 13:17:10 - requesting resource appnbu-v.NBVAULT.MAXJOBS
26/07/2013 13:17:22 - granted resource appnbu-v.NBVAULT.MAXJOBS
26/07/2013 13:17:24 - vault global lock acquired
26/07/2013 13:17:24 - vault waiting for session ID lock
26/07/2013 13:17:24 - requesting resource appnbu-v.VAULT_CREATE_SESSION_ID.LOCK_TLD(0)_ONSITETAPES
26/07/2013 13:17:28 - granted resource appnbu-v.VAULT_CREATE_SESSION_ID.LOCK_TLD(0)_ONSITETAPES
26/07/2013 13:17:29 - vault session ID lock acquired
26/07/2013 13:17:29 - vault session ID lock released
26/07/2013 13:32:53 - duplication skipped
26/07/2013 13:32:53 - vault waiting for assign slot lock
26/07/2013 13:32:53 - requesting resource appnbu-v.VAULT_ASSIGN_SLOT.LOCK_TLD(0)_OnsiteTapes
26/07/2013 13:32:58 - granted resource appnbu-v.VAULT_ASSIGN_SLOT.LOCK_TLD(0)_OnsiteTapes
26/07/2013 13:32:58 - vault assign slot lock acquired
26/07/2013 13:33:00 - vault assign slot lock released
26/07/2013 13:33:00 - catalog backup skipped
26/07/2013 13:33:00 - vault waiting for assign slot lock
26/07/2013 13:33:00 - requesting resource appnbu-v.VAULT_ASSIGN_SLOT.LOCK_TLD(0)_OnsiteTapes
26/07/2013 13:33:10 - granted resource appnbu-v.VAULT_ASSIGN_SLOT.LOCK_TLD(0)_OnsiteTapes
26/07/2013 13:33:11 - vault assign slot lock acquired
26/07/2013 13:33:11 - vault assign slot lock released
26/07/2013 13:33:12 - before eject, waiting for media to be unmounted; sleeping for 180 seconds
26/07/2013 13:36:12 - starting eject operation
26/07/2013 13:36:12 - begin Eject/Report
26/07/2013 13:36:12 - connecting
26/07/2013 13:36:12 - connected; connect time: 00:00:00
26/07/2013 13:36:12 - vault waiting for eject lock
26/07/2013 13:36:12 - requesting resource appnbu-v.VAULT_EJECT.LOCK_0
26/07/2013 13:36:17 - granted resource appnbu-v.VAULT_EJECT.LOCK_0
26/07/2013 13:36:18 - vault eject lock acquired
26/07/2013 13:36:18 - suspend media for this session: failed to suspend 1 of 52 media at eject time
26/07/2013 13:36:18 - starting eject of 52 media
26/07/2013 13:40:34 - eject complete with status 287. 0 of 52 media ejected
26/07/2013 13:40:34 - vault eject lock released
26/07/2013 13:58:35 - vault global lock released
vault eject failed(287)
And in the problem reports I see this:
26/07/2013 | 13:32:37 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@main Vault Session FAILED [PRFL=0/OnsiteTapes/WeeklyEject SID=331 JID=2518878 EC=287] | |||
26/07/2013 | 13:32:37 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@main FAILed NB_EC=287 NB_MSG=vault eject failed | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@SuspendMedia^332: Suspend FAILed MEDIA=000726, HOST=appnbu-v. EMM Err=2001049 ( Media is already allocated ) | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@SuspendMedia^332 FAILed MM_EC=199 MM_MSG=the media is allocated for use | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@SuspendMedia^332: Leaving with DMN=2 SC=199 | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultRobot::suspendMediaToEject()^332: suspend failed for media 000726, ignoring... | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultRobot::suspendMediaToEject()^332 FAILed NB_EC=97 NB_MSG=requested media id is in use, cannot process request | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultRobot::suspendMediaToEject()^332: Leaving with DMN=1 SC=97 | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doEject()^332: there was a problem in doing suspending media to eject returned 97 | |||
26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doEject()^332 FAILed NB_EC=335 NB_MSG=failure occurred while suspending media for eject | |||
26/07/2013 | 13:36:32 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doStartEject()^332: Executing command: "C:\Program Files\Veritas\Volmgr\bin\vmchange.exe" -res -multi_eject -sc -rn 0 -rt tld -rh appnbu-v -vh appnbu-v -v --- -ml 000245:000256:000273:000276 | |||
26/07/2013 | 13:40:20 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doStartEject()^332: Eject Failed | |||
26/07/2013 | 13:40:34 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doEject()^332: There are problems in doing eject, returned 287 | |||
26/07/2013 | 13:40:34 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doEject()^332: Leaving with DMN=1 SC=335 | |||
26/07/2013 | 13:40:34 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VltSession::lock_and_operate^332 OP_STEP=eject_media FAILED | |||
26/07/2013 | 13:40:34 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VltSession::lock_and_operate^332 FAILed NB_EC=287 NB_MSG=vault eject failed | |||
26/07/2013 | 13:40:34 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VltSession::lock_and_operate^332: Leaving with DMN=1 SC=287 |
Any help would be much appreciated.
Thanks
Katie
One of the messages I saw in the full detail.log that you sent me suggested that the issue 'could' be with the robot inventory being out-of-date.
I say 'could' because with NBU a given error may have multiple causes; however, it is sensible to try the most obvious one first.
From what I saw, vault did try to eject a number of tapes using a standard NBU command, so for at least some of the tapes it would appear the issue is not vault itself.
Vault really only runs NBU commands (to do the dups, the catalog backup and the ejects), so provided it gets as far as running those commands, that pretty much clears vault as the cause.
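If it helps with sorting through the problem report, here is a rough Python sketch that pulls out the distinct error codes, the media that failed to suspend, and the command vault logged before the eject. It is only an illustration: it assumes the report lines are available as text in the same pipe-delimited layout as posted above, and the names `summarise`, `CODE_RE`, `MEDIA_RE` and `CMD_RE` are my own, not part of NetBackup.

```python
import re

# "NB_EC=287 NB_MSG=vault eject failed" or "MM_EC=199 MM_MSG=..." fragments.
CODE_RE = re.compile(r"\b(NB|MM)_EC=(\d+)\s+(?:NB|MM)_MSG=([^|]+)")
# Media ID from a failed-suspend line, e.g. "Suspend FAILed MEDIA=000726,".
MEDIA_RE = re.compile(r"Suspend FAILed MEDIA=(\w+)")
# The command text that follows "Executing command:" up to the next pipe.
CMD_RE = re.compile(r"Executing command:\s*(.+?)\s*\|")

# A few lines copied from the problem report in this thread.
SAMPLE = [
    r'26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@SuspendMedia^332: Suspend FAILed MEDIA=000726, HOST=appnbu-v. EMM Err=2001049 ( Media is already allocated ) | |||',
    r'26/07/2013 | 13:36:18 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@SuspendMedia^332 FAILed MM_EC=199 MM_MSG=the media is allocated for use | |||',
    r'26/07/2013 | 13:36:32 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VaultScsiRobot::doStartEject()^332: Executing command: "C:\Program Files\Veritas\Volmgr\bin\vmchange.exe" -res -multi_eject -sc -rn 0 -rt tld -rh appnbu-v -vh appnbu-v -v --- -ml 000245:000256:000273:000276 | |||',
    r'26/07/2013 | 13:40:34 | MasterCluster | Error | 0 | Retrieve | <16> vltrun@VltSession::lock_and_operate^332 FAILed NB_EC=287 NB_MSG=vault eject failed | |||',
]


def summarise(lines):
    """Return (codes, media, commands) found in problem-report lines.

    codes maps (prefix, code) -> first message seen for that code,
    media lists IDs that failed to suspend, commands lists logged commands.
    """
    codes = {}
    media = []
    commands = []
    for line in lines:
        code_match = CODE_RE.search(line)
        if code_match:
            key = (code_match.group(1), int(code_match.group(2)))
            codes.setdefault(key, code_match.group(3).strip())
        media_match = MEDIA_RE.search(line)
        if media_match and media_match.group(1) not in media:
            media.append(media_match.group(1))
        cmd_match = CMD_RE.search(line)
        if cmd_match:
            commands.append(cmd_match.group(1))
    return codes, media, commands


if __name__ == "__main__":
    codes, media, commands = summarise(SAMPLE)
    for (prefix, code), message in codes.items():
        print(f"{prefix}_EC={code}: {message}")
    print("Media that failed to suspend:", ", ".join(media))
    for command in commands:
        print("Logged command:", command)
```

Seeing the codes side by side makes it easier to separate the suspend failure on one tape (199/97) from the eject failure itself (287), and the logged command is the one vault handed to NBU for the eject.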
Martin