
Veritas NBU6.5-Driver I/O Error

phuong_huynh
Level 3
Partner
Today, when I checked the Veritas NBU 6.5 backup status, I found that one policy shows the following error:
begin writing
Error bptm (pid=19808) cannot write image to media id 000054, drive index 1, I/O error
end writing
status code 84
 
Please give me some suggestions.
Thanks

RamNagalla
Moderator
Moderator
Partner    VIP    Certified

Hi,

We need more details:

1) What is the operating system of your media server?

2) What is your hardware? Tape library info?

3) Are you seeing this error for a particular media or drive, or is it random across drives and media?

4) Send us the output of the commands below:

scan

vmoprcmd -d

tpconfig -d

tpautoconf -t

5) Also the bptm logs and /usr/openv/netbackup/db/media/errors.

6) The detailed status of the failed job.

Also, did you check whether the tape is write-protected?

Does it give the error after writing some data, or without writing any data?
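A minimal sketch of collecting items 4 and 5 in one go on the media server, assuming the default /usr/openv install prefix. The function name collect_diag and the NBU_ROOT, VM_BIN and OUT overrides are hypothetical conveniences, not NetBackup settings; it is written for ksh or bash, since /bin/sh on Solaris 10 is the legacy Bourne shell.

```shell
#!/bin/ksh
# Gather the outputs requested in items 4 and 5 into one directory.
# Assumptions: default install prefix /usr/openv; NBU_ROOT, VM_BIN and OUT are
# hypothetical overrides for convenience and testing.
NBU_ROOT=${NBU_ROOT:-/usr/openv}
VM_BIN=${VM_BIN:-$NBU_ROOT/volmgr/bin}
OUT=${OUT:-/tmp/nbu_diag.$$}

collect_diag() {
    mkdir -p "$OUT" || return 1
    for cmd in "scan" "vmoprcmd -d" "tpconfig -d" "tpautoconf -t"; do
        set -- $cmd            # split "name args" into $1 and its arguments
        name=$1
        shift
        if [ -x "$VM_BIN/$name" ]; then
            "$VM_BIN/$name" "$@" > "$OUT/$name.txt" 2>&1
        else
            echo "$VM_BIN/$name not found" > "$OUT/$name.txt"
        fi
    done
    # Media error history: one line per WRITE_ERROR / TAPE_ALERT event
    cp "$NBU_ROOT/netbackup/db/media/errors" "$OUT/errors.txt" 2>/dev/null
    # bptm debug logs exist only if the bptm log directory was created beforehand
    if [ -d "$NBU_ROOT/netbackup/logs/bptm" ]; then
        cp "$NBU_ROOT"/netbackup/logs/bptm/log.* "$OUT"/ 2>/dev/null
    fi
    echo "$OUT"                # print where the files were written
}
```

Run it as root (for example `collect_diag`) and attach the directory it prints; the job details from item 6 still have to come from the Activity Monitor.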

phuong_huynh
Level 3
Partner

Hi,

1) what is your operation system of Media server?

The media server is running Solaris 10 SPARC.

2) what is your hardware? tape library info

Tape library: SL500

3) are you seeing this error for any particular media or Drive? or its random between the Drives and medias?

It is random across media.

4) let us know the output of below command

See the attached file "sl500_cmd_logs".

5) and also the logs of bptm and /usr/openv/netbackup/db/media/errors

See the attached file "errors".

6) detail status of the failed job.

See the attached pictures image005 and image006.

and also did you check if the tape is write protected.?

Yes, I have checked; the tape is not write-protected.

Thank you.

Marianne
Level 6
Partner    VIP    Accredited Certified

NetBackup relies on the OS for I/O. This means that NBU is merely reporting the error, and we are not going to get a lot of info by looking at NBU alone.

If you are seeing regular status 84 errors, then /usr/openv/netbackup/db/media/errors will help us determine whether the I/O errors are occurring on a particular tape drive or particular media.

You also need to enable the following logs on the media server:

Create /usr/openv/netbackup/logs/bptm folder

Add a VERBOSE entry to /usr/openv/volmgr/vm.conf and restart NBU on the media server.
Device-related messages and errors will now be logged to /var/adm/messages.
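The two logging steps above can be sketched as one idempotent function, assuming the default /usr/openv prefix; enable_verbose_logging and the NBU_ROOT override are hypothetical names. It creates vm.conf if it does not exist yet and adds VERBOSE only once. The restart commands are shown as comments because they stop running jobs.

```shell
#!/bin/ksh
# Enable bptm debug logging and Media Manager VERBOSE logging.
# Assumption: default install prefix /usr/openv; NBU_ROOT is a hypothetical override.
NBU_ROOT=${NBU_ROOT:-/usr/openv}

enable_verbose_logging() {
    # bptm writes debug logs only if this directory exists
    mkdir -p "$NBU_ROOT/netbackup/logs/bptm" || return 1
    conf="$NBU_ROOT/volmgr/vm.conf"
    # vm.conf may not exist by default; create it, then add VERBOSE only once
    # (plain grep + redirect, because Solaris /usr/bin/grep has no -q)
    touch "$conf" || return 1
    grep '^VERBOSE$' "$conf" >/dev/null 2>&1 || echo VERBOSE >> "$conf"
}

# Afterwards restart NetBackup on the media server so the Media Manager
# daemons pick up VERBOSE, for example:
#   /usr/openv/netbackup/bin/bp.kill_all
#   /usr/openv/netbackup/bin/bp.start_all
```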

Some helpful TechNotes:

http://www.symantec.com/docs/TECH169477

http://www.symantec.com/docs/TECH43243

Please see this extract from the above doc:

As an application, NetBackup has no direct access to a device, instead relying on the operating system (OS) to handle any communication with the device. This means that during a write operation NetBackup asks the OS to write to the device and report back the success or failure of that operation. If there is a failure, NetBackup will merely report that a failure occurred, and any troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes: OS configuration, a problem on the SCSI path, or a problem with the device.

 

**** PS **** Are you aware that support for NBU 6.5 ended in October last year?
PLEASE upgrade!

phuong_huynh
Level 3
Partner

Hi,

I have sent the logs you requested; please see the attached file. Also, I can't find vm.conf in the directory /usr/openv/volmgr.

 

 

RamNagalla
Moderator
Moderator
Partner    VIP    Certified

 

01:23:35.351 [19808] <2> write_data: attempting write error recovery, err = 5
01:23:35.351 [19808] <2> tape_error_rec: error recovery to block 10158002 requested
01:23:35.351 [19808] <2> tape_error_rec: attempting error recovery, delay 3 minutes before next attempt, tries left = 5
01:26:35.353 [19808] <2> io_ioctl: command (0)MTWEOF 0 from (overwrite.c.503) on drive index 1
01:26:35.354 [19808] <2> io_ioctl: MTWEOF failed during error recovery, I/O error
 
See the related TNs:

Both point to tape drive driver updates and to correcting the I/O issue at the hardware level.
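In the excerpt above, "err = 5" on the write_data line is the errno value EIO: the write system call itself returned an I/O error to bptm, which fits the point that NetBackup only reports what the OS returns. A minimal sketch for pulling the same kind of lines out of all bptm logs (the function name is hypothetical, and the match patterns are taken only from this excerpt):

```shell
#!/bin/ksh
# Print write-error-recovery and I/O error lines from bptm debug logs,
# prefixed with the log file name.
# On Solaris 10 run with AWK=nawk (or /usr/xpg4/bin/awk); /usr/bin/awk is the old awk.
AWK=${AWK:-awk}

bptm_io_errors() {
    # $@: one or more bptm logs, e.g. /usr/openv/netbackup/logs/bptm/log.*
    $AWK '/error recovery|I\/O error/ { print FILENAME ": " $0 }' "$@"
}
```

For example, `bptm_io_errors /usr/openv/netbackup/logs/bptm/log.*` lists every recovery attempt with its timestamp, which can then be matched against /var/adm/messages for the same minute.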

phuong_huynh
Level 3
Partner

I have 6 tape drives. How can I tell which tape drive is faulty? I checked the SL500 service LED warnings, but there are no errors.

Marianne
Level 6
Partner    VIP    Accredited Certified

... I don't find vm.conf in the directory /usr/openv/volmgr.

In older versions of NBU the vm.conf file does not exist by default.
Please create the file and insert 
VERBOSE
in the file. 
Save the file and restart NBU.

2013
Level 5

Please clean the tape drives, then run a backup and see how it goes.

Also check when they were last cleaned by running the command below:

/usr/openv/volmgr/bin/tpclean -L

I know the SL500 tape library has its own functionality for cleaning the tape drives, but you need to check what cleaning frequency is set for the drives:

/usr/openv/volmgr/bin/tpclean -F drive_name cleaning_frequency

Hope it helps!!!

mph999
Level 6
Employee Accredited

You will not necessarily see a warning light: the drive has no mechanical fault as such; it just can't read/write reliably.

 

mph999
Level 6
Employee Accredited

From the last 2 or 3 months of the error.txt file, I find:

(I ran this through a script, the file alone does not contain the information in this format)

 

HP.ULTRIUM4-SCSI.000 has had errors with 2 different tapes   (Total occurrences (errors) for this drive is 2)
HP.ULTRIUM4-SCSI.001 has had errors with 65 different tapes   (Total occurrences (errors) for this drive is 122)
HP.ULTRIUM4-SCSI.002 has had errors with 2 different tapes   (Total occurrences (errors) for this drive is 2)
HP.ULTRIUM4-SCSI.003 has had errors with 14 different tapes   (Total occurrences (errors) for this drive is 26)
HP.ULTRIUM4-SCSI.004 has had errors with 13 different tapes   (Total occurrences (errors) for this drive is 18)
HP.ULTRIUM4-SCSI.005 has had errors with 57 different tapes   (Total occurrences (errors) for this drive is 91)
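mph999's script is not shown in the thread; this is only a minimal sketch that produces a similar per-drive summary from /usr/openv/netbackup/db/media/errors. It assumes the line layout seen in the errors file posted later in this thread (date, time, media ID, drive index, event, drive name, optional TapeAlert flags) and counts every event, TAPE_ALERT included; which events the original script counted is not stated.

```shell
#!/bin/ksh
# Per-drive summary of the NetBackup media errors file.
# Assumed line layout: date time media_id drive_index event drive_name [flags]
# On Solaris 10 run with AWK=nawk; the old /usr/bin/awk lacks (a,b) in array.
AWK=${AWK:-awk}

errors_by_drive() {
    $AWK 'NF >= 6 {
        drive = $6; media = $3
        total[drive]++                       # every event counts as one occurrence
        if (!((drive, media) in seen)) {     # count each tape once per drive
            seen[drive, media] = 1
            tapes[drive]++
        }
    }
    END {
        for (d in total)
            printf "%s has had errors with %d different tapes (total occurrences: %d)\n", d, tapes[d], total[d]
    }' "${1:-/usr/openv/netbackup/db/media/errors}" | sort
}
```

To count write errors only, change the first pattern to `NF >= 6 && $5 == "WRITE_ERROR"`.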
 
Two drives show many more errors than the other drives. Assuming the drives are each used a similar amount, this suggests they may have some issues.
 
The tapes that errored in these two drives (001 and 005) did not show significantly higher numbers of errors in the other drives in which the same media had errors. In other words, your drives look worn, not your tapes.
 
You seem to have write errors.
 
It is a lot 'harder' for a drive to write to a tape than to read it; therefore, as a drive wears out, I would expect it to start failing with write errors before read errors, which is what we see.
 
If this issue started with no changes made to the environment, I would not expect it to be driver or firmware related.
 
Martin

phuong_huynh
Level 3
Partner

Hi all,

I have upgraded the firmware of the tape drives and the library and also patched the OS, but the error is not fixed.

Please give me advice.

Marianne
Level 6
Partner    VIP    Accredited Certified

Have you created vm.conf with VERBOSE entry yet?

Can you see that the Media Manager processes are running with -v?

Have you checked /var/adm/messages for hardware errors?
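One way to answer the -v question is to filter the process list. This is a minimal sketch, assuming the usual daemon names for a TLD robot (ltid, avrd, tldd) and that, after the restart with VERBOSE in vm.conf, they show -v on their command lines; the function name is hypothetical. It reads ps output on stdin so it can also be run against a saved listing.

```shell
#!/bin/ksh
# Label Media Manager daemon lines from "ps -ef" as verbose or not.
# Assumption: daemon names ltid, avrd and tldd; note that Solaris "ps -ef"
# may truncate long command lines.
AWK=${AWK:-awk}

check_vm_verbose() {
    $AWK '/(ltid|avrd|tldd)( |$)/ && !/awk/ {
        v = ($0 ~ / -v( |$)/) ? "verbose" : "NOT verbose"
        print v ": " $0
    }'
}

# Usage on the media server:  ps -ef | check_vm_verbose
```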

There is more to the data path than just the library and tape drives: there is also the HBA in the server, the cable(s) that go to a switch, the switch port(s), and the cables that go to each of the drives.

As I've said before, looking at NBU only is not going to tell us much. You need to troubleshoot at OS level. 
Switch logs may also help.

The error log is telling us that you are experiencing errors on basically all the drives and lots of tapes. Chances are slim that all of them are faulty. What is the common factor that links all drives to the OS? The hba comes to mind, right?
The HBA is also more than just a piece of hardware: there are firmware and drivers that must be checked along with the hardware. /var/adm/messages is a good starting point to look for device-related errors.
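A first pass over /var/adm/messages can be sketched as a keyword filter. The keywords are an assumption: common Solaris kernel message fragments for SCSI/tape (st) errors and for Fibre Channel HBA drivers (qlc, emlxs, fp, fctl). They are a starting point, not an exhaustive list, and the function name is hypothetical.

```shell
#!/bin/ksh
# Print lines from a Solaris messages file that look device related.
# The pattern list is an assumption; extend it for your HBA and drivers.
AWK=${AWK:-awk}

device_errors() {
    $AWK '/WARNING|Error for Command|Sense Key|[( ]st[0-9]+[):]|qlc|emlxs|fctl|fp\(/' \
        "${1:-/var/adm/messages}"
}

# Rotated copies (messages.0, messages.1, ...) can be checked the same way:
#   for f in /var/adm/messages*; do device_errors "$f"; done
```

Comparing the timestamps of these lines with the status 84 jobs shows whether the OS logged anything at the moment bptm failed.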

phuong_huynh
Level 3
Partner

I have created vm.conf with VERBOSE entry.

In the policy that I ran, the job failed when using tape drive IDs 001 and 003 (please see the attached file). But when I use the OS tar command on each drive, it works fine:

root@Nbmaster2 # tar cvf /dev/rmt/10 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # tar cvf /dev/rmt/7 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #

Marianne
Level 6
Partner    VIP    Accredited Certified
What is the status of the tape drives? Check with 'vmoprcmd -d'. Have you checked the bptm log and messages file for errors? The ability to write with the tar command confirms that the I/O errors are intermittent. Old firmware on an HBA is known for giving errors when the load is high.
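The tar test above wrote 24244 tape blocks; tar counts 512-byte blocks, so that is roughly 12 MB, far less than a backup writes. A sketch of a longer test, assuming a scratch tape and a drive that NetBackup is not using at the time: it writes a pre-generated file of random data to the drive's no-rewind device several times in a row. The function name, file paths and pass count are hypothetical. WARNING: it overwrites whatever is on the tape.

```shell
#!/bin/ksh
# Write a file of incompressible data to a tape device several times in a row,
# to load the drive for longer than a short tar test does.
# WARNING: this overwrites whatever is on the tape in the drive.
stress_write() {
    dev=$1       # no-rewind device of the suspect drive, e.g. /dev/rmt/10cbn
    src=$2       # pre-generated file of random (incompressible) data
    passes=$3    # number of back-to-back writes
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Each dd opens the device, writes 256 KB records and closes it; with a
        # no-rewind device the next pass starts after the filemark written at close.
        dd if="$src" of="$dev" bs=262144 || { echo "write failed on pass $i"; return 1; }
        i=$((i + 1))
    done
    echo "wrote $passes passes without error"
}

# Example (hypothetical paths):
#   dd if=/dev/urandom of=/var/tmp/rand.dat bs=1048576 count=1024
#   mt -f /dev/rmt/10cbn rewind
#   stress_write /dev/rmt/10cbn /var/tmp/rand.dat 20
```

Random data is used so that drive compression cannot shrink the amount actually written to tape.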

phuong_huynh
Level 3
Partner

The status of the tape drives is OK.

root@Nbmaster2 # /usr/openv/volmgr/bin/vmoprcmd -d

                                PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart    TLD                -                     No       -         0  
  1 hcart    TLD                -                     No       -         0  
  2 hcart    TLD                -                     No       -         0  
  3 hcart    TLD                -                     No       -         0  
  4 hcart  DOWN-TLD             -                     No       -         0  
  5 hcart    TLD                -                     No       -         0  

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment                   
  0 HP.ULTRIUM4-SCSI.000  No       -                                         
  1 HP.ULTRIUM4-SCSI.001  No       -                                         
  2 HP.ULTRIUM4-SCSI.002  No       -                                         
  3 HP.ULTRIUM4-SCSI.003  No       -                                         
  4 HP.ULTRIUM4-SCSI.004  No       -                                         
  5 HP.ULTRIUM4-SCSI.005  No       -                                         
root@Nbmaster2 #

I will suggest to the hardware team that they upgrade the HBA card firmware.

Marianne
Level 6
Partner    VIP    Accredited Certified
 4 hcart  DOWN-TLD             -                     No       -         0  

 4 HP.ULTRIUM4-SCSI.004  No       -                                         

Drive 004 is DOWN. Have you checked bptm log and messages files as suggested previously?
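A DOWN drive is easy to miss in the full vmoprcmd -d listing. A minimal sketch (hypothetical function name) that prints the index of every drive whose Control column starts with DOWN, based on the column layout of the DRIVE STATUS output posted above:

```shell
#!/bin/ksh
# Print the index of each drive whose Control column starts with DOWN
# in "vmoprcmd -d" output read from stdin.
# Assumption: DRIVE STATUS rows start with the drive number, then Type, then Control;
# ADDITIONAL DRIVE STATUS rows have "No"/"Yes" in that position and are skipped.
AWK=${AWK:-awk}

down_drives() {
    $AWK '$1 ~ /^[0-9]+$/ && $3 ~ /^DOWN/ { print $1 }'
}

# Usage:  /usr/openv/volmgr/bin/vmoprcmd -d | down_drives
# Once the reason is understood (bptm log, /var/adm/messages), a drive can be
# set UP again as done later in this thread:
#   /usr/openv/volmgr/bin/tpconfig -update -drive <index> -drstatus UP
```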

You need to do some 'homework' before suggesting a firmware upgrade.
Check the messages file (or a backup of the messages file) for boot messages ('who -b' will tell you when the server was last rebooted). You will find the HBA make and model along with the firmware and driver versions.
While you have the messages file open, look for hardware-related errors.
Look on the HBA vendor's website for known issues with those firmware and driver versions.
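The robotic path in the tpconfig -d output posted in this thread (/dev/sg/c1tw500104f000b88092l0) contains a WWN, which suggests the library is Fibre Channel attached. If so, Solaris 10's `fcinfo hba-port` should also list each HBA port with its firmware and driver details. A minimal filter sketch; the field labels are an assumption based on typical fcinfo output and may be worded differently on your update level, and the function name is hypothetical.

```shell
#!/bin/ksh
# Keep only the identification lines from "fcinfo hba-port" output on stdin.
# Assumption: labels as printed by fcinfo on Solaris 10; adjust if they differ.
AWK=${AWK:-awk}

hba_versions() {
    $AWK '/HBA Port WWN|Manufacturer|Model|Firmware Version|Driver Name|Driver Version/'
}

# Usage:
#   who -b                          # last boot time, to find the boot messages
#   fcinfo hba-port | hba_versions  # versions to compare with the vendor's notes
```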

Regarding drives not getting used, check for stuck/orphaned device allocations:
nbrbutil -dump
Check the 'MDS Allocation' section at the bottom of the output for any media or drive allocation that is not really in use, note its Allocation Key number, and release it with:
nbrbutil -releaseMDS <mdsAllocationKey>
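Because nbrbutil -dump output is long, a small filter helps to look only at that section. A sketch, assuming the section heading contains the text "MDS Allocation" as described above (the exact wording may differ between releases); the function name is hypothetical, and the release command stays commented out because it must only be used for allocations confirmed not in use.

```shell
#!/bin/ksh
# Print nbrbutil -dump output from the first line mentioning "MDS Allocation"
# to the end. Assumption: the heading wording; adjust the pattern if it differs.
mds_allocations() {
    sed -n '/MDS Allocation/,$p'
}

# Usage:
#   /usr/openv/netbackup/bin/admincmd/nbrbutil -dump | mds_allocations
# Then, only for an allocation no active job is using:
#   /usr/openv/netbackup/bin/admincmd/nbrbutil -releaseMDS <mdsAllocationKey>
```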
 

phuong_huynh
Level 3
Partner

/usr/openv/netbackup/db/media/errors

03/05/13 01:20:39 000144 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/05/13 01:20:44 000144 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/05/13 04:40:50 000017 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/05/13 04:40:55 000017 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 06:23:15 000141 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 06:23:20 000141 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:24:19 000051 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 21:24:24 000051 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:59:51 000130 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/06/13 21:59:56 000130 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/06/13 23:38:21 000010 2 TAPE_ALERT HP.ULTRIUM4-SCSI.002 0x10000000 0x00000000
03/07/13 08:43:50 000019 5 TAPE_ALERT HP.ULTRIUM4-SCSI.005 0x10000000 0x00000000
03/07/13 09:01:47 000143 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/07/13 09:01:52 000143 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
root@Nbmaster2 #

root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -update -drive 4 -drstatus UP
Updated drive < HP.ULTRIUM4-SCSI.004 > of type hcart in configuration
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -d
Id  DriveName           Type   Residence
      Drive Path                                                       Status
****************************************************************************
0   HP.ULTRIUM4-SCSI.000 hcart  TLD(0)  DRIVE=3
      /dev/rmt/8cbn                                                    UP
1   HP.ULTRIUM4-SCSI.001 hcart  TLD(0)  DRIVE=5
      /dev/rmt/10cbn                                                   UP
2   HP.ULTRIUM4-SCSI.002 hcart  TLD(0)  DRIVE=6
      /dev/rmt/11cbn                                                   UP
3   HP.ULTRIUM4-SCSI.003 hcart  TLD(0)  DRIVE=2
      /dev/rmt/7cbn                                                    UP
4   HP.ULTRIUM4-SCSI.004 hcart  TLD(0)  DRIVE=1
      /dev/rmt/6cbn                                                    UP
5   HP.ULTRIUM4-SCSI.005 hcart  TLD(0)  DRIVE=4
      /dev/rmt/9cbn                                                    UP

Currently defined robotics are:
  TLD(0)     robotic path = /dev/sg/c1tw500104f000b88092l0

EMM Server = Nbmaster2

root@Nbmaster2 #

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

It seems you are ignoring my advice to check the bptm log and /var/adm/messages.

I give up....

Good luck!

phuong_huynh
Level 3
Partner

I sent /var/adm/messages to the hardware team for review. They concluded that the error appears in the log only when writing data to tape with the NetBackup software; with OS commands, no error is found.

I just want to confirm that the Veritas configuration is correct, and whether this is a bug in Veritas NBU 6.5.

Anyway, thank you very much for your support.