Veritas NBU6.5-Driver I/O Error

phuong_huynh · ‎02-27-2013

Today, when I checked the Veritas NBU 6.5 backup status, I found that one policy shows an error, as follows:

begin writing

Error bptm (pid=19808) cannot write image to media id 000054, drive index 1, I/O error

end writing

status code 84

Please give me some suggestions.

Thanks

RamNagalla · ‎02-27-2013

Hi,

We would need more details:

1) What is the operating system of the media server?

2) What is your hardware? Tape library info?

3) Are you seeing this error for a particular media or drive, or is it random across drives and media?

4) Let us know the output of the commands below:

scan

vmoprcmd -d

tpconfig -d

tpautoconf -t

5) Also the bptm logs and /usr/openv/netbackup/db/media/errors

6) Detailed status of the failed job.

Also, did you check whether the tape is write-protected?

Does it give the error after writing some data, or without writing any data?

phuong_huynh · ‎02-27-2013

Hi,

1) What is the operating system of the media server?

The media server is running Solaris 10 SPARC.

2) What is your hardware? Tape library info?

Tape library: SL500

3) Are you seeing this error for a particular media or drive, or is it random across drives and media?

It is random across media.

4) Let us know the output of the commands below:

See the attached file "sl500_cmd_logs".

5) Also the bptm logs and /usr/openv/netbackup/db/media/errors

See the attached file "errors".

6) Detailed status of the failed job.

See the attached pictures image005 and image006.

Also, did you check whether the tape is write-protected?

Yes, I have checked; the tape is not write-protected.

Thank you.

Marianne · ‎02-27-2013

NetBackup relies on the OS for I/O. This means that NBU is merely reporting the error and that we are not going to get a lot of info by looking at NBU alone.

If you are seeing regular status 84's, then /usr/openv/netbackup/db/media/errors will help us determine if I/O errors are experienced on a particular tape drive or particular media.

You also need to enable the following logs on the media server:

Create the /usr/openv/netbackup/logs/bptm folder.

Add a VERBOSE entry to /usr/openv/volmgr/vm.conf and restart NBU on the media server.
Device-related messages and errors will now be logged to /var/adm/messages.

Some helpful TN's:

http://www.symantec.com/docs/TECH169477

http://www.symantec.com/docs/TECH43243

Please see this extract from the above doc:

As an application, NetBackup has no direct access to a device, instead relying on the operating system (OS) to handle any communication with the device. This means that during a write operation NetBackup asks the OS to write to the device and report back the success or failure of that operation. If there is a failure, NetBackup will merely report that a failure occurred, and any troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes; OS configuration, a problem on the SCSI path, or a problem with the device.

**** PS **** Are you aware of the fact that support for NBU 6.5 has ended in Oct last year?
PLEASE upgrade!

phuong_huynh · ‎02-28-2013

Hi,

I have sent the logs you requested; please see the attached file. Also, I can't find vm.conf in the directory /usr/openv/volmgr.

RamNagalla · ‎02-28-2013

01:23:35.351 [19808] <2> write_data: attempting write error recovery, err = 5

01:23:35.351 [19808] <2> tape_error_rec: error recovery to block 10158002 requested

01:23:35.351 [19808] <2> tape_error_rec: attempting error recovery, delay 3 minutes before next attempt, tries left = 5

01:26:35.353 [19808] <2> io_ioctl: command (0)MTWEOF 0 from (overwrite.c.503) on drive index 1

01:26:35.354 [19808] <2> io_ioctl: MTWEOF failed during error recovery, I/O error

See the related TNs:

http://www.symantec.com/business/support/index?page=content&id=TECH59178

http://www.symantec.com/business/support/index?page=content&id=TECH58452

Both point to updating the tape drive drivers and correcting the I/O issue at the hardware level.
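
For a long bptm log, the recovery sequence quoted above can be pulled out with a small filter. This is only a sketch: the line layout (time, [pid], <level>, function name, message) is assumed from the five lines quoted in this post, and recovery_events is a made-up helper name, not a NetBackup tool.

```python
import re

# Layout assumed from the quoted lines: "HH:MM:SS.mmm [pid] <level> func: message"
BPTM_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$")


def recovery_events(lines, pid=None):
    """Return (time, pid, function, message) for lines that mention
    error recovery or an I/O error, optionally limited to one bptm pid."""
    events = []
    for line in lines:
        match = BPTM_LINE.match(line.strip())
        if not match:
            continue  # skip anything that does not follow the assumed layout
        timestamp, line_pid, _level, func, message = match.groups()
        if pid is not None and int(line_pid) != pid:
            continue
        if "error recovery" in message or "I/O error" in message:
            events.append((timestamp, int(line_pid), func, message))
    return events
```

Feeding it the log file line by line, with pid=19808 from the failed job, would list each recovery attempt and the final MTWEOF failure in order.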

phuong_huynh · ‎02-28-2013

I have 6 tape drives. How can I tell which tape drive is faulty? I checked the service warning LED on the SL500, but it shows no errors.

Marianne · ‎02-28-2013

... I don't find vm.conf in the directory /usr/openv/volmgr.

In older versions of NBU the vm.conf file does not exist by default.
Please create the file and insert
VERBOSE
in the file.
Save the file and restart NBU.
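
If you prefer to script this step, a minimal sketch follows. It assumes only what is said above (the file may not exist, and it needs a line containing VERBOSE); ensure_verbose is a made-up helper name, and the restart of NBU is still a separate manual step.

```python
from pathlib import Path


def ensure_verbose(vm_conf: Path) -> bool:
    """Add a VERBOSE line to vm.conf if it is missing.

    Creates the file when it does not exist. Returns True if the file
    was changed, False if VERBOSE was already present.
    """
    lines = vm_conf.read_text().splitlines() if vm_conf.exists() else []
    if any(line.strip() == "VERBOSE" for line in lines):
        return False
    lines.append("VERBOSE")
    vm_conf.write_text("\n".join(lines) + "\n")
    return True


# Example (run as root on the media server):
#   ensure_verbose(Path("/usr/openv/volmgr/vm.conf"))
```

Existing entries in the file are kept as they are; the function only appends.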

2013 · ‎02-28-2013

Please go ahead and clean the tape drives, then run a backup and see how it goes.

Also check when they were last cleaned by running the command below:

/usr/openv/volmgr/bin/tpclean -L

I know the SL500 tape library has its own functionality for cleaning the tape drives, but you need to check what cleaning frequency is set for the drives:

/usr/openv/volmgr/bin/tpclean -F drive_name cleaning_frequency

Hope it helps!!!

mph999 · ‎02-28-2013

You will not necessarily see a warning light - the drive has no mechanical fault as such, it just can't read/write reliably.

mph999 · ‎02-28-2013

For the last 2 or 3 months, from the error.txt file, I find:

(I ran this through a script, the file alone does not contain the information in this format)

HP.ULTRIUM4-SCSI.000 has had errors with 2 different tapes (Total occurrences (errors) for this drive is 2)

HP.ULTRIUM4-SCSI.001 has had errors with 65 different tapes (Total occurrences (errors) for this drive is 122)

HP.ULTRIUM4-SCSI.002 has had errors with 2 different tapes (Total occurrences (errors) for this drive is 2)

HP.ULTRIUM4-SCSI.003 has had errors with 14 different tapes (Total occurrences (errors) for this drive is 26)

HP.ULTRIUM4-SCSI.004 has had errors with 13 different tapes (Total occurrences (errors) for this drive is 18)

HP.ULTRIUM4-SCSI.005 has had errors with 57 different tapes (Total occurrences (errors) for this drive is 91)

Two drives show far more errors than the other drives. Assuming the drives are each used a similar amount, this suggests they have some issues.

The tapes that errored in these two drives (001 and 005) did not show significantly higher numbers of errors in the other drives in which the same media had errors. In other words, your drives look worn, not your tapes.
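
The script itself is not shown, but a tally of this kind can be sketched as below. It is not the script used for the numbers above. The field order (date, time, media ID, drive index, error type, drive name) is assumed from the excerpt of /usr/openv/netbackup/db/media/errors posted later in this thread, and whether TAPE_ALERT lines should count as errors is a choice left to the caller.

```python
from collections import defaultdict


def tally_errors(lines, kinds=None):
    """Count errors per drive from media 'errors' file lines.

    Returns {drive_name: (distinct_tapes, total_entries)}.
    kinds, if given, is a set of error types to count, e.g. {"WRITE_ERROR"}.
    """
    tapes = defaultdict(set)
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Assumed layout: date time media_id drive_index error_type drive_name [...]
        if len(fields) < 6:
            continue
        media_id, error_type, drive_name = fields[2], fields[4], fields[5]
        if kinds is not None and error_type not in kinds:
            continue
        tapes[drive_name].add(media_id)
        totals[drive_name] += 1
    return {drive: (len(tapes[drive]), totals[drive]) for drive in sorted(totals)}


def report(tally):
    """Format the tally one line per drive."""
    return [
        f"{drive} has had errors with {n_tapes} different tapes (total {total})"
        for drive, (n_tapes, total) in tally.items()
    ]
```

A drive with many distinct tapes against it points at the drive; a tape that shows up against many drives points at the tape.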

You seem to have write errors.

It is a lot 'harder' for a drive to write to a tape than read it, therefore, as a drive wears out, I would expect it to start to fail with write errors before read, which is what we see.

If this issue started with no changes made to the environment, I would not expect it to be driver/firmware related.

Martin

phuong_huynh · ‎03-06-2013

Hi all,

I have upgraded the firmware for the tape drives and the library and also patched the OS, but the error is not fixed.

Please advise.

Marianne · ‎03-06-2013

Have you created vm.conf with VERBOSE entry yet?

Can you see that Media Manager processes are running with -v?

Have you checked /var/adm/messages for hardware errors?

There is more to the data path than just library and tape drives - there is also the hba in the server, cable(s) that go to a switch, switch port(s), cables that go to each of the drives.

As I've said before, looking at NBU only is not going to tell us much. You need to troubleshoot at OS level.
Switch logs may also help.

The error log is telling us that you are experiencing errors on basically all the drives and lots of tapes. Chances are slim that all of them are faulty. What is the common factor that links all drives to the OS? The hba comes to mind, right?
hba is also more than just a piece of hardware - there is firmware and drivers that must be checked along with the hardware. /var/adm/messages is a good starting point to look for device-related errors.

phuong_huynh · ‎03-06-2013

I have created vm.conf with the VERBOSE entry.

In the policy that I ran, the backup failed when using tape drives 001 and 003 (please see the attached file). But when I use the OS tar command on each of those drives, it works fine.

root@Nbmaster2 # tar cvf /dev/rmt/10 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # tar cvf /dev/rmt/7 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #

Marianne · ‎03-06-2013

What is the status of the tape drives? Check with 'vmoprcmd -d'. Have you checked the bptm log and the messages file for errors? The ability to write with the tar command confirms that the I/O errors are intermittent. Old firmware on an hba is known to give errors when load is high.

phuong_huynh · ‎03-07-2013

The status of the tape drives is OK.

root@Nbmaster2 # /usr/openv/volmgr/bin/vmoprcmd -d

                                PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control User      Label RecMID ExtMID Ready   Wr.Enbl. ReqId
0 hcart    TLD                -                     No       -         0
1 hcart    TLD                -                     No       -         0
2 hcart    TLD                -                     No       -         0
3 hcart    TLD                -                     No       -         0
4 hcart DOWN-TLD             -                     No       -         0
5 hcart    TLD                -                     No       -         0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
0 HP.ULTRIUM4-SCSI.000 No       -
1 HP.ULTRIUM4-SCSI.001 No       -
2 HP.ULTRIUM4-SCSI.002 No       -
3 HP.ULTRIUM4-SCSI.003 No       -
4 HP.ULTRIUM4-SCSI.004 No       -
5 HP.ULTRIUM4-SCSI.005 No       -
root@Nbmaster2 #

I will suggest to the hardware team that they upgrade the firmware on the hba card.

Marianne · ‎03-07-2013

 4 hcart  DOWN-TLD             -                     No       -         0  

 4 HP.ULTRIUM4-SCSI.004  No       -

Drive 004 is DOWN. Have you checked bptm log and messages files as suggested previously?
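
A DOWN drive is easy to miss when reading the table by eye. As a rough sketch, the check can be scripted. The column layout is assumed from the 'vmoprcmd -d' output pasted above (drive index first, control column third), and down_drives is a made-up helper name.

```python
def down_drives(vmoprcmd_output: str) -> list:
    """Return the indexes of drives whose Control column starts with DOWN."""
    down = []
    in_drive_status = False
    for line in vmoprcmd_output.splitlines():
        stripped = line.strip()
        if stripped == "DRIVE STATUS":
            in_drive_status = True
            continue
        if stripped == "ADDITIONAL DRIVE STATUS":
            break  # the second table has no Control column
        fields = stripped.split()
        # Assumed columns: Drv, Type, Control, ...
        if (
            in_drive_status
            and len(fields) >= 3
            and fields[0].isdigit()
            and fields[2].startswith("DOWN")
        ):
            down.append(int(fields[0]))
    return down
```

An empty list means no drive is reported DOWN in that output.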

You need to do some 'home work' before suggesting firmware upgrade.
Check messages file (or backup of messages file) for boot messages. (who -b will tell you when last the server was rebooted). You will find the hba make and model along with firmware and driver version.
While you have messages file open, look for hardware-related errors.
Look on hba vendor's web site for known issues with the firmware and driver versions.

About drives not getting used, check for stuck/orphaned device allocation:
nbrbutil -dump
Check the 'MDS Allocation' section at the bottom of the output for media or drive allocations that are not really in use, note the Allocation Key number, and release it with:
nbrbutil -releaseMDS <mdsAllocationKey>

phuong_huynh · ‎03-07-2013

/usr/openv/netbackup/db/media/errors

03/05/13 01:20:39 000144 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/05/13 01:20:44 000144 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/05/13 04:40:50 000017 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/05/13 04:40:55 000017 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 06:23:15 000141 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 06:23:20 000141 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:24:19 000051 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 21:24:24 000051 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:59:51 000130 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/06/13 21:59:56 000130 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/06/13 23:38:21 000010 2 TAPE_ALERT HP.ULTRIUM4-SCSI.002 0x10000000 0x00000000
03/07/13 08:43:50 000019 5 TAPE_ALERT HP.ULTRIUM4-SCSI.005 0x10000000 0x00000000
03/07/13 09:01:47 000143 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/07/13 09:01:52 000143 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
root@Nbmaster2 #

root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -update -drive 4 -drstatus UP
Updated drive < HP.ULTRIUM4-SCSI.004 > of type hcart in configuration
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -d
Id DriveName           Type   Residence
      Drive Path                                                       Status
****************************************************************************
0   HP.ULTRIUM4-SCSI.000 hcart TLD(0) DRIVE=3
      /dev/rmt/8cbn                                                    UP
1   HP.ULTRIUM4-SCSI.001 hcart TLD(0) DRIVE=5
      /dev/rmt/10cbn                                                   UP
2   HP.ULTRIUM4-SCSI.002 hcart TLD(0) DRIVE=6
      /dev/rmt/11cbn                                                   UP
3   HP.ULTRIUM4-SCSI.003 hcart TLD(0) DRIVE=2
      /dev/rmt/7cbn                                                    UP
4   HP.ULTRIUM4-SCSI.004 hcart TLD(0) DRIVE=1
      /dev/rmt/6cbn                                                    UP
5   HP.ULTRIUM4-SCSI.005 hcart TLD(0) DRIVE=4
      /dev/rmt/9cbn                                                    UP

Currently defined robotics are:
TLD(0)     robotic path = /dev/sg/c1tw500104f000b88092l0

EMM Server = Nbmaster2

root@Nbmaster2 #
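
To match NetBackup drive indexes to OS device paths, the 'tpconfig -d' listing above can be parsed as sketched below. The two-line record layout (drive line, then an indented path line with the status) is assumed from this output only, and parse_tpconfig is a made-up helper name.

```python
import re

# Drive line, e.g. "1   HP.ULTRIUM4-SCSI.001 hcart TLD(0) DRIVE=5"
DRIVE_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+DRIVE=(\d+)\s*$")
# Indented path line, e.g. "      /dev/rmt/10cbn        UP"
PATH_LINE = re.compile(r"^\s+(/dev/\S+)\s+(\S+)\s*$")


def parse_tpconfig(text: str) -> dict:
    """Map drive index -> name, type, residence, robot drive number, path, status."""
    drives = {}
    current = None
    for line in text.splitlines():
        match = DRIVE_LINE.match(line)
        if match:
            current = int(match.group(1))
            drives[current] = {
                "name": match.group(2),
                "type": match.group(3),
                "residence": match.group(4),
                "robot_drive": int(match.group(5)),
                "path": None,
                "status": None,
            }
            continue
        match = PATH_LINE.match(line)
        if match and current is not None:
            drives[current]["path"] = match.group(1)
            drives[current]["status"] = match.group(2)
    return drives
```

With the listing above, drive index 1 maps to /dev/rmt/10cbn and index 3 to /dev/rmt/7cbn, which are the same device instances (10 and 7) used in the earlier tar test.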

Marianne · ‎03-07-2013

Seems you are ignoring my advice to check bptm log and /var/adm/messages.

I give up....

Good luck!

phuong_huynh · ‎03-10-2013

I sent the /var/adm/messages log to the hardware team for review. They concluded that the error only appears in the log when data is written to tape using the NetBackup software; writing with OS commands produces no errors.

I just want to confirm whether the Veritas configuration is correct and whether this is a bug in Veritas 6.5.

Anyway, thank you for your support very much.
