07-04-2011 09:02 AM
Hi folks,
I was wondering if any of you guys have faced this problem. My costumer have a Hitachi HNAS 3080 (OEM BlueArc Titan). For its backup solution, the BlueArc is connected to the SAN and sees the tape library and drives. In NetBackup, they are using SSO and also NDMP Direct backup method (to send the backups directly from the HNAS to the tape drives). This solution used to work and aparently it stopped a couple of months ago and no changed were made (at least no one was reported).
I have been reading about scsi reservation in the NAS applience, I check and it is active from the beginning. Also check the MSL6000 SCSI-FC bridge card parameters looking for a read ahead option and it was disabled.
Looking at the http://www.symantec.com/business/support/index?page=content&id=TECH43243 and nothing changes.
I've tryied freezen several tapes to see if it was really about the tape itself so I would force the backup to use a different set of tapes on each try, but no success.
Lastly, I asked symantec for a fix (etrack 2140608) about a simmilar issue (http://www.symantec.com/business/support/index?page=content&id=TECH139306) and didnt solve it.
NOTICE that if I do a incremental it finish OK because it doesn't have to use more than one tape, but if I try a full backup and the first tape get full then the backup fails. Also, the same NDMP full backup but in a Three Way configuration can be completed with no pleblems.
Here is the Job details info:
7/1/2011 5:59:55 PM - requesting resource xx-backup-hcart-robot-tld-0-xxx-nas 7/1/2011 5:59:55 PM - requesting resource xx-backup.NBU_CLIENT.MAXJOBS.xxx-nas 7/1/2011 5:59:55 PM - requesting resource xx-backup.NBU_POLICY.MAXJOBS.SRV_XXX-NAS_2 7/1/2011 5:59:55 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A 7/1/2011 6:06:08 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas - No drives are available 7/1/2011 6:06:16 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A 7/1/2011 6:07:15 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas - No drives are available 7/1/2011 6:07:27 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A 7/1/2011 10:35:38 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas - Maximum job count has been reached for the storage unit 7/1/2011 10:37:58 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A 7/2/2011 3:35:25 AM - granted resource xx-backup.NBU_CLIENT.MAXJOBS.xxx-nas 7/2/2011 3:35:25 AM - granted resource xx-backup.NBU_POLICY.MAXJOBS.SRV_XXX-NAS_2 7/2/2011 3:35:25 AM - granted resource 8132L4 7/2/2011 3:35:25 AM - granted resource HP.ULTRIUM4-SCSI.000 7/2/2011 3:35:25 AM - granted resource xx-backup-hcart-robot-tld-0-xxx-nas 7/2/2011 3:35:25 AM - estimated 5928792279 Kbytes needed 7/2/2011 3:35:26 AM - started process bpbrm (8340) 7/2/2011 3:35:26 AM - connecting 7/2/2011 3:35:26 AM - connected; connect time: 00:00:00 7/2/2011 3:35:27 AM - mounting 8132L4 7/2/2011 3:36:05 AM - mounted; mount time: 00:00:38 7/2/2011 3:36:10 AM - positioning 8132L4 to file 133 7/2/2011 3:38:30 AM - positioned 8132L4; position time: 00:02:20 7/2/2011 3:38:30 AM - begin writing 7/2/2011 4:05:09 AM - current media 8132L4 complete, requesting next resource Any 7/2/2011 4:05:10 AM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: xx-backup, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A 7/2/2011 4:06:03 AM - granted resource 2735L4 7/2/2011 4:06:03 AM - granted resource HP.ULTRIUM4-SCSI.000 7/2/2011 4:06:03 AM - granted resource xx-backup-hcart-robot-tld-0-xxx-nas 7/2/2011 4:06:03 AM - mounting 2735L4 7/2/2011 4:06:35 AM - mounted; mount time: 00:00:32 7/2/2011 4:06:41 AM - positioning 2735L4 to file 1 7/2/2011 4:06:46 AM - positioned 2735L4; position time: 00:00:05 7/2/2011 4:06:46 AM - begin writing 7/2/2011 9:19:59 AM - Error bptm(pid=8636) FREEZING media id 2735L4, External event caused rewind during write, all data on media is lost 7/2/2011 9:19:59 AM - Error ndmpagent(pid=6120) NDMP backup failed, path = /XXX-DATA 7/2/2011 9:19:59 AM - end writing; write time: 05:13:13 media write er 7/4/2011 7:29:36 AM - job 9835 was restarted as job 9872
Here you can see the bptm log at the fail time:
09:19:58.938 [8636.5108] <4> check_and_process_ndmpagent_backup_tasks: ndmpagent[0] reports EOM 09:19:58.938 [8636.5108] <2> write_data: status_to_return = -3, total_frag_kbytes 1010753984 09:19:58.938 [8636.5108] <2> write_backup: write_data() returned, exit_status = 0, CINDEX = 0, TWIN_INDEX = 0, backup_status = -3 09:19:58.938 [8636.5108] <2> signal_parent: set bpbrm media ready event (pid = 8340) 09:19:58.938 [8636.5108] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.19069) on drive index 0 09:19:59.829 [8636.5108] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.19099) on drive index 0 09:20:05.282 [8636.5108] <2> io_close: closing <>Veritas\NetBackup\db\media\tpreq\drive_HP.ULTRIUM4-SCSI.000, from bptm.c.19130 09:20:05.329 [8636.5108] <2> NdmpMediaSession_close_public_and_ndmpagent[0]: Saving ndmp public and ndmpagent sessions (save flag is set) 09:20:05.329 [8636.5108] <2> write_backup: block position check: actual 15793033, expected 15793034 09:20:05.329 [8636.5108] <16> write_backup: FREEZING media id 2735L4, External event caused rewind during write, all data on media is lost 09:20:05.329 [8636.5108] <2> send_MDS_msg: DEVICE_STATUS 1 2920 xx-backup 2735L4 4000463 HP.ULTRIUM4-SCSI.000 2000008 WRITE_ERROR 0 0 09:20:05.376 [8636.5108] <2> log_media_error: successfully wrote to error file - 07/02/11 09:20:05 2735L4 0 WRITE_ERROR HP.ULTRIUM4-SCSI.000 09:20:05.391 [8636.5108] <2> send_MDS_msg: MEDIADB 1 2920 2735L4 4000463 *NULL* 6 1309595731 1309595731 1317630931 0 0 0 0 5 4 0 9 1024 0 0 0 09:20:05.407 [8636.5108] <2> check_error_history: just tpunmount: called from bptm line 19179, EXIT_Status = 84 09:20:05.407 [8636.5108] <2> drivename_write: Called with mode 1 09:20:05.407 [8636.5108] <2> drivename_unlock: unlocked 09:20:05.407 [8636.5108] <2> drivename_checklock: Called 09:20:05.407 [8636.5108] <2> drivename_lock: lock established 09:20:05.407 [8636.5108] <2> drivename_unlock: unlocked
In addition, in https://www-secure.symantec.com/connect/forums/ndmp-import-issue-media-position-error i see the quote:
io_ioctl_ndmp (MTFSF) failed on media id 21908L
.. is a positioning error. The actual positioning is carried out by the o/s / tape drivers - nothing to do with NBU.
How can I look for a workaround if this is the case?
Looking at this http://www.symantec.com/business/support/index?page=content&id=TECH59178&actp=search&viewlocale=en_US&searchid=1309794327603 I've asked the costumer to log a case with the hardware (tapes) provider and they upgraded the library and tapes to the last firmware level but the problem is still comming. I've also upgraded the HNAS box to the lastest version.
The one thing that I didn't try was recabling the HP hw as read in https://www-secure.symantec.com/connect/forums/status-84-io-device-error-hardware-or-software.
Master Server: Windows 2003 x86,
NetBackup: 7.0.1
Solved! Go to Solution.
07-04-2011 09:44 AM
Have you seen this post?
https://www-secure.symantec.com/connect/forums/ndmp-job-failing-when-reaches-tape-fragmentation-size
07-04-2011 09:44 AM
Have you seen this post?
https://www-secure.symantec.com/connect/forums/ndmp-job-failing-when-reaches-tape-fragmentation-size
07-04-2011 10:01 AM
Hi Marianne,
I' glad to hear from you again. Yes, I saw the post you just mentioned and reviewed also the scsi reservation stuff on the Titan.
I can't find any reference to the Etrack 2196740 mentioned on that post so I'm not sure if logging a case would help (I'm going to do it just in case).
Also, I already suggested the costumer to move to 7.1, but they are doing some thing previously, so I'm trying to find another solution in between.
Thanks for you comments.
07-04-2011 10:06 PM
I see exactly the same error as in the post that I've referred to above:
External event caused rewind during write, all data on media is lost
There are other etracks mentioned by 'Backup Guy from Seattle' and CRZ.
07-05-2011 01:27 PM
Yes Marianne, I agree with you. I'm trying to push an update because the fix on the last etrack didn't help.
I give you the feedback as soon as the upgrade is done.
07-14-2011 10:25 AM
Hi Marianne,
Just to let you know that after the upgrade to NBU 7.1, the problem is gone.
Thanks again for your suggestions.
Regards,
SIMON
07-14-2011 07:54 PM
The problem is on your tape media.
Error Code 84
Action: Clean or repaired the drive or replace media and verify the drive configuration
Solution: Cleaning drive periodically