cancel
Showing results for 
Search instead for 
Did you mean: 

NDMP backup fails with status 84 when the 1st tape is full

Symontec
Level 5

Hi folks,

   I was wondering if any of you guys have faced this problem. My costumer have a Hitachi HNAS 3080 (OEM BlueArc Titan). For its backup solution, the BlueArc is connected to the SAN and sees the tape library and drives. In NetBackup, they are using SSO and also NDMP Direct backup method (to send the backups directly from the HNAS to the tape drives). This solution used to work and aparently it stopped a couple of months ago and no changed were made (at least no one was reported).

   I have been reading about scsi reservation in the NAS applience, I check and it is active from the beginning. Also check the MSL6000 SCSI-FC bridge card parameters looking for a read ahead option and it was disabled.

   Looking at the http://www.symantec.com/business/support/index?page=content&id=TECH43243 and nothing changes.

   I've tryied freezen several tapes to see if it was really about the tape itself so I would force the backup to use a different set of tapes on each try, but no success.

   Lastly, I asked symantec for a fix (etrack 2140608) about a simmilar issue (http://www.symantec.com/business/support/index?page=content&id=TECH139306) and didnt solve it.

   NOTICE that if I do a incremental it finish OK because it doesn't have to use more than one tape, but if I try a full backup and the first tape get full then the backup fails. Also, the same NDMP full backup but in a Three Way configuration can be completed with no pleblems.

    Here is the Job details info:

 7/1/2011 5:59:55 PM - requesting resource xx-backup-hcart-robot-tld-0-xxx-nas
7/1/2011 5:59:55 PM - requesting resource xx-backup.NBU_CLIENT.MAXJOBS.xxx-nas
7/1/2011 5:59:55 PM - requesting resource xx-backup.NBU_POLICY.MAXJOBS.SRV_XXX-NAS_2
7/1/2011 5:59:55 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup,
                 Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
                 Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A

7/1/2011 6:06:08 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas - No drives are available
7/1/2011 6:06:16 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup,
                 Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
                 Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A
               
7/1/2011 6:07:15 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas - No drives are available
7/1/2011 6:07:27 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup,
                 Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
                 Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A
              
7/1/2011 10:35:38 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas - Maximum job count has been reached for the storage unit
7/1/2011 10:37:58 PM - awaiting resource xx-backup-hcart-robot-tld-0-xxx-nas Reason: Drives are in use, Media Server: xx-backup,
                 Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
                 Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A
               
7/2/2011 3:35:25 AM - granted resource xx-backup.NBU_CLIENT.MAXJOBS.xxx-nas
7/2/2011 3:35:25 AM - granted resource xx-backup.NBU_POLICY.MAXJOBS.SRV_XXX-NAS_2
7/2/2011 3:35:25 AM - granted resource 8132L4
7/2/2011 3:35:25 AM - granted resource HP.ULTRIUM4-SCSI.000
7/2/2011 3:35:25 AM - granted resource xx-backup-hcart-robot-tld-0-xxx-nas
7/2/2011 3:35:25 AM - estimated 5928792279 Kbytes needed
7/2/2011 3:35:26 AM - started process bpbrm (8340)
7/2/2011 3:35:26 AM - connecting
7/2/2011 3:35:26 AM - connected; connect time: 00:00:00
7/2/2011 3:35:27 AM - mounting 8132L4
7/2/2011 3:36:05 AM - mounted; mount time: 00:00:38
7/2/2011 3:36:10 AM - positioning 8132L4 to file 133
7/2/2011 3:38:30 AM - positioned 8132L4; position time: 00:02:20
7/2/2011 3:38:30 AM - begin writing
7/2/2011 4:05:09 AM - current media 8132L4 complete, requesting next resource Any
7/2/2011 4:05:10 AM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: xx-backup,
                 Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
                 Volume Pool: Mensual_Ret-Trimestr, Storage Unit: xx-backup-hcart-robot-tld-0-xxx-nas, Drive Scan Host: N/A
             
7/2/2011 4:06:03 AM - granted resource 2735L4
7/2/2011 4:06:03 AM - granted resource HP.ULTRIUM4-SCSI.000
7/2/2011 4:06:03 AM - granted resource xx-backup-hcart-robot-tld-0-xxx-nas
7/2/2011 4:06:03 AM - mounting 2735L4
7/2/2011 4:06:35 AM - mounted; mount time: 00:00:32
7/2/2011 4:06:41 AM - positioning 2735L4 to file 1
7/2/2011 4:06:46 AM - positioned 2735L4; position time: 00:00:05
7/2/2011 4:06:46 AM - begin writing
7/2/2011 9:19:59 AM - Error bptm(pid=8636) FREEZING media id 2735L4, External event caused rewind during write, all data on media is lost
7/2/2011 9:19:59 AM - Error ndmpagent(pid=6120) NDMP backup failed, path = /XXX-DATA      
7/2/2011 9:19:59 AM - end writing; write time: 05:13:13
media write er
7/4/2011 7:29:36 AM - job 9835 was restarted as job 9872 

   

Here you can see the bptm log at the fail time:

 
09:19:58.938 [8636.5108] <4> check_and_process_ndmpagent_backup_tasks: ndmpagent[0] reports EOM
09:19:58.938 [8636.5108] <2> write_data: status_to_return = -3, total_frag_kbytes 1010753984
09:19:58.938 [8636.5108] <2> write_backup: write_data() returned, exit_status = 0, CINDEX = 0, TWIN_INDEX = 0, backup_status = -3
09:19:58.938 [8636.5108] <2> signal_parent: set bpbrm media ready event (pid = 8340)
09:19:58.938 [8636.5108] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.19069) on drive index 0
09:19:59.829 [8636.5108] <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.19099) on drive index 0
09:20:05.282 [8636.5108] <2> io_close: closing <>Veritas\NetBackup\db\media\tpreq\drive_HP.ULTRIUM4-SCSI.000, from bptm.c.19130
09:20:05.329 [8636.5108] <2> NdmpMediaSession_close_public_and_ndmpagent[0]: Saving ndmp public and ndmpagent sessions (save flag is set)
09:20:05.329 [8636.5108] <2> write_backup: block position check: actual 15793033, expected 15793034
09:20:05.329 [8636.5108] <16> write_backup: FREEZING media id 2735L4, External event caused rewind during write, all data on media is lost
09:20:05.329 [8636.5108] <2> send_MDS_msg: DEVICE_STATUS 1 2920 xx-backup 2735L4 4000463 HP.ULTRIUM4-SCSI.000 2000008 WRITE_ERROR 0 0
09:20:05.376 [8636.5108] <2> log_media_error: successfully wrote to error file - 07/02/11 09:20:05 2735L4 0 WRITE_ERROR HP.ULTRIUM4-SCSI.000
09:20:05.391 [8636.5108] <2> send_MDS_msg: MEDIADB 1 2920 2735L4 4000463 *NULL* 6 1309595731 1309595731 1317630931 0 0 0 0 5 4 0 9 1024 0 0 0
09:20:05.407 [8636.5108] <2> check_error_history: just tpunmount: called from bptm line 19179, EXIT_Status = 84
09:20:05.407 [8636.5108] <2> drivename_write: Called with mode 1
09:20:05.407 [8636.5108] <2> drivename_unlock: unlocked
09:20:05.407 [8636.5108] <2> drivename_checklock: Called
09:20:05.407 [8636.5108] <2> drivename_lock: lock established
09:20:05.407 [8636.5108] <2> drivename_unlock: unlocked
 

 

In addition, in https://www-secure.symantec.com/connect/forums/ndmp-import-issue-media-position-error i see the quote: 

io_ioctl_ndmp (MTFSF) failed on media id 21908L

.. is a positioning error.  The actual positioning is carried out by the o/s / tape drivers - nothing to do with NBU. 

How can I look for a workaround if this is the case?

Looking at this http://www.symantec.com/business/support/index?page=content&id=TECH59178&actp=search&viewlocale=en_US&searchid=1309794327603 I've asked the costumer to log a case with the hardware (tapes) provider and they upgraded the library and tapes to the last firmware level but the problem is still comming. I've also upgraded the HNAS box to the lastest version.

The one thing that I didn't try was recabling the HP hw as read in https://www-secure.symantec.com/connect/forums/status-84-io-device-error-hardware-or-software.

Any advices or ideas are welcome.

Master Server: Windows 2003 x86,

NetBackup: 7.0.1

1 ACCEPTED SOLUTION

Accepted Solutions

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Have you seen this post?

https://www-secure.symantec.com/connect/forums/ndmp-job-failing-when-reaches-tape-fragmentation-size

View solution in original post

6 REPLIES 6

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Have you seen this post?

https://www-secure.symantec.com/connect/forums/ndmp-job-failing-when-reaches-tape-fragmentation-size

Symontec
Level 5

Hi Marianne,

I' glad to hear from you again. Yes, I saw the post you just mentioned and reviewed also the scsi reservation stuff on the Titan.

I can't find any reference to the Etrack 2196740 mentioned on that post so I'm not sure if logging a case would help (I'm going to do it just in case).

Also, I already suggested the costumer to move to 7.1, but they are doing some thing previously, so I'm trying to find another solution in between.

Thanks for you comments.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I see exactly the same error as in the post that I've referred to above:

External event caused rewind during write, all data on media is lost

There are other etracks mentioned by 'Backup Guy from Seattle' and CRZ.

Symontec
Level 5

Yes Marianne, I agree with you. I'm trying to push an update because the fix on the last etrack didn't help.

I give you the feedback as soon as the upgrade is done.

Symontec
Level 5

Hi Marianne,

   Just to let you know that after the upgrade to NBU 7.1, the problem is gone.

   Thanks again for your suggestions.

Regards,

SIMON

falti_manullang
Level 5
Partner

The problem is on your tape media.

Error Code 84

Action: Clean or repaired the drive or replace media and verify the drive configuration

Solution: Cleaning drive periodically