02-27-2013 01:24 AM
02-27-2013 01:48 AM
Hi,
We need more details:
1) What operating system is your media server running?
2) What is your hardware (tape library details)?
3) Are you seeing this error on a particular medium or drive, or is it random across drives and media?
4) Send us the output of the following commands:
scan
vmoprcmd -d
tpconfig -d
tpautoconf -t
5) Also send the bptm log and /usr/openv/netbackup/db/media/errors.
6) Send the detailed status of the failed job.
Also, have you checked whether the tape is write-protected?
Does the error occur after some data has been written, or before any data is written?
02-27-2013 07:31 PM
Hi,
1) What operating system is your media server running?
The media server is running Solaris 10 (SPARC).
2) What is your hardware (tape library details)?
The tape library is an SL500.
3) Are you seeing this error on a particular medium or drive, or is it random across drives and media?
It is random across media.
4) Send us the output of the listed commands.
See the attached file "sl500_cmd_logs".
5) Also send the bptm log and /usr/openv/netbackup/db/media/errors.
See the attached file "errors".
6) Send the detailed status of the failed job.
See the attached images image005 and image006.
Also, have you checked whether the tape is write-protected?
Yes, I have checked; the tape is not write-protected.
Thank you.
02-27-2013 08:15 PM
NetBackup relies on the OS for I/O. This means that NBU is merely reporting the error, and we are not going to get much information by looking at NBU alone.
If you are regularly seeing status 84 errors, then /usr/openv/netbackup/db/media/errors will help us determine whether the I/O errors occur on a particular tape drive or on particular media.
You also need to enable the following logs on the media server:
Create the /usr/openv/netbackup/logs/bptm folder.
Add a VERBOSE entry to /usr/openv/volmgr/vm.conf and restart NBU on the media server.
Device-related messages and errors will now be logged to /var/adm/messages.
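A minimal Python sketch of the two steps above, assuming the default install paths shown there and that it runs as root on the media server. The function name enable_debug_logging is hypothetical, and restarting NBU is left as a manual step.

```python
import os


def enable_debug_logging(bptm_log_dir="/usr/openv/netbackup/logs/bptm",
                         vm_conf="/usr/openv/volmgr/vm.conf"):
    """Create the bptm log folder and make sure vm.conf contains a VERBOSE line.

    Returns True if vm.conf was changed, in which case NBU still has to be
    restarted on the media server for the change to take effect.
    """
    # Step 1: bptm only writes its legacy debug log if this folder exists.
    os.makedirs(bptm_log_dir, exist_ok=True)

    # Step 2: vm.conf may not exist at all on older NBU versions.
    content = ""
    if os.path.exists(vm_conf):
        with open(vm_conf) as f:
            content = f.read()
    if "VERBOSE" in (line.strip() for line in content.splitlines()):
        return False  # entry already present, nothing to do

    with open(vm_conf, "a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # keep VERBOSE on its own line
        f.write("VERBOSE\n")
    return True
```

Running it twice is harmless: the second call finds the existing VERBOSE line and leaves vm.conf untouched.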
Some helpful TechNotes:
http://www.symantec.com/docs/TECH169477
http://www.symantec.com/docs/TECH43243
Please see this extract from the doc above:
As an application, NetBackup has no direct access to a device, instead relying on the operating system (OS) to handle any communication with the device. This means that during a write operation NetBackup asks the OS to write to the device and report back the success or failure of that operation. If there is a failure, NetBackup will merely report that a failure occurred, and any troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes: OS configuration, a problem on the SCSI path, or a problem with the device.
**** PS **** Are you aware that support for NBU 6.5 ended in October last year?
PLEASE upgrade!
02-28-2013 01:02 AM
Hi,
I have sent the logs you requested; please see the attached file. Also, I can't find vm.conf in /usr/openv/volmgr.
02-28-2013 01:32 AM
02-28-2013 02:12 AM
I have 6 tape drives. How can I tell which tape drive is faulty? I checked the SL500 service LED warnings, but there are no errors.
02-28-2013 02:33 AM
... I don't find vm.conf in the directory /usr/openv/volmgr.
In older versions of NBU, the vm.conf file does not exist by default.
Please create the file and insert
VERBOSE
in the file.
Save the file and restart NBU.
02-28-2013 02:35 AM
Please go ahead and clean the tape drives, then run a backup and see how it goes.
Also check when each drive was last cleaned by running the following command:
/usr/openv/volmgr/bin/tpclean -L
I know the SL500 tape library has its own functionality for cleaning the tape drives, but you need to check what cleaning frequency is set for the drives:
/usr/openv/volmgr/bin/tpclean -F drive_name cleaning_frequency
Hope it helps!!!
02-28-2013 02:38 AM
You will not necessarily see a warning light: the drive has no mechanical fault as such, it just can't read/write reliably.
02-28-2013 02:40 AM
Over the last 2 or 3 months, this is what I find in the error.txt file:
(I ran this through a script; the file alone does not contain the information in this format.)
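The script mentioned above was not posted. As a rough sketch of that kind of summary (not the poster's actual script), the Python below groups entries from /usr/openv/netbackup/db/media/errors by drive and by media ID. The line layout is inferred from the sample entries quoted later in this thread, and summarize_errors is a hypothetical name.

```python
import sys
from collections import defaultdict


def summarize_errors(lines, error_types=None):
    """Group NBU media error entries by drive and by media ID.

    ASSUMPTION: each line looks like the samples in this thread:
      MM/DD/YY HH:MM:SS <media_id> <drive_index> <error_type> <drive_name> [extra]
    error_types: optional set such as {"WRITE_ERROR"}; None keeps every type.
    """
    media_per_drive = defaultdict(set)   # drive name -> media IDs that failed in it
    drives_per_media = defaultdict(set)  # media ID -> drives it failed in
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or unexpected line
        media_id, error_type, drive_name = fields[2], fields[4], fields[5]
        if error_types is not None and error_type not in error_types:
            continue
        media_per_drive[drive_name].add(media_id)
        drives_per_media[media_id].add(drive_name)
    return dict(media_per_drive), dict(drives_per_media)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_errors.py /usr/openv/netbackup/db/media/errors
    with open(sys.argv[1]) as f:
        per_drive, per_media = summarize_errors(f)
    for drive, media in sorted(per_drive.items()):
        print(f"{drive}: errors with {len(media)} different media")
    for media_id, drives in sorted(per_media.items()):
        if len(drives) > 1:
            print(f"{media_id}: errors in {len(drives)} different drives")
```

Read the result roughly like this: one tape failing in several drives points at the tape, one drive failing with many tapes points at the drive, and errors spread across most drives and tapes point at something they all share, such as the HBA, cabling or switch.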
03-06-2013 08:25 PM
Hi all,
I have upgraded the firmware for the tape drives and the library and also applied OS patches, but the error is not fixed.
Please advise.
03-06-2013 09:30 PM
Have you created vm.conf with a VERBOSE entry yet?
Can you see that the Media Manager processes are running with -v?
Have you checked /var/adm/messages for hardware errors?
There is more to the data path than just the library and tape drives: there is also the HBA in the server, the cable(s) that go to a switch, the switch port(s), and the cables that go to each of the drives.
As I've said before, looking at NBU only is not going to tell us much. You need to troubleshoot at OS level.
Switch logs may also help.
The error log is telling us that you are experiencing errors on basically all the drives and lots of tapes. Chances are slim that all of them are faulty. What is the common factor that links all drives to the OS? The HBA comes to mind, right?
The HBA is also more than just a piece of hardware: its firmware and drivers must be checked along with the hardware. /var/adm/messages is a good starting point for finding device-related errors.
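For a first pass over /var/adm/messages, a small filter like the one below can help. It is a sketch only: the search patterns are assumptions about typical Solaris SCSI/tape driver messages, not an authoritative list, and find_device_errors is a hypothetical name. Broad patterns such as "WARNING:" will also match unrelated messages, so review the hits by hand.

```python
import re
import sys

# ASSUMPTION: example patterns for Solaris SCSI/tape/HBA trouble; adjust them
# to what actually shows up in your own /var/adm/messages.
DEFAULT_PATTERNS = (
    r"WARNING:",
    r"Error for Command",
    r"transport failed",
    r"[Oo]ffline",
)


def find_device_errors(lines, patterns=DEFAULT_PATTERNS):
    """Return the lines that match any of the given regular expressions."""
    regex = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [line.rstrip("\n") for line in lines if regex.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_device_errors.py /var/adm/messages
    with open(sys.argv[1], errors="replace") as f:
        for hit in find_device_errors(f):
            print(hit)
```

Hits whose timestamps line up with the WRITE_ERROR entries in the NBU errors file are the most interesting ones to show the hardware team.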
03-06-2013 10:58 PM
I have created vm.conf with a VERBOSE entry.
In the policy I ran, the job failed on tape drives 001 and 003 (please see the attached file). But when I use the OS tar command on each drive, it works fine.
root@Nbmaster2 # tar cvf /dev/rmt/10 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # tar cvf /dev/rmt/7 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #
03-06-2013 11:41 PM
03-07-2013 12:32 AM
The status of the tape drives is OK.
root@Nbmaster2 # /usr/openv/volmgr/bin/vmoprcmd -d
PENDING REQUESTS
<NONE>
DRIVE STATUS
Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 hcart TLD - No - 0
1 hcart TLD - No - 0
2 hcart TLD - No - 0
3 hcart TLD - No - 0
4 hcart DOWN-TLD - No - 0
5 hcart TLD - No - 0
ADDITIONAL DRIVE STATUS
Drv DriveName Shared Assigned Comment
0 HP.ULTRIUM4-SCSI.000 No -
1 HP.ULTRIUM4-SCSI.001 No -
2 HP.ULTRIUM4-SCSI.002 No -
3 HP.ULTRIUM4-SCSI.003 No -
4 HP.ULTRIUM4-SCSI.004 No -
5 HP.ULTRIUM4-SCSI.005 No -
root@Nbmaster2 #
I will suggest to the hardware team that they upgrade the firmware on the HBA card.
03-07-2013 01:01 AM
4 hcart DOWN-TLD - No - 0 4 HP.ULTRIUM4-SCSI.004 No -
Drive 004 is DOWN. Have you checked the bptm log and messages files as suggested previously?
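A DOWN drive shows up only as DOWN-TLD in the Control column of vmoprcmd -d, which is easy to miss by eye. A minimal sketch that picks out DOWN drives from that output, assuming the column layout shown in this thread (down_drives is a hypothetical name):

```python
def down_drives(vmoprcmd_output):
    """Return {drive_index: drive_name} for drives whose Control column starts with DOWN.

    ASSUMPTION: layout as in the vmoprcmd -d output in this thread, i.e. a
    DRIVE STATUS section (index, type, control, ...) followed by an
    ADDITIONAL DRIVE STATUS section (index, drive name, ...).
    """
    section = None
    control = {}  # drive index -> Control column, e.g. "TLD" or "DOWN-TLD"
    names = {}    # drive index -> drive name
    for line in vmoprcmd_output.splitlines():
        stripped = line.strip()
        if stripped == "DRIVE STATUS":
            section = "status"
            continue
        if stripped == "ADDITIONAL DRIVE STATUS":
            section = "additional"
            continue
        fields = stripped.split()
        if not fields or not fields[0].isdigit():
            continue  # headers, <NONE>, shell prompts, blank lines
        if section == "status" and len(fields) >= 3:
            control[fields[0]] = fields[2]
        elif section == "additional" and len(fields) >= 2:
            names[fields[0]] = fields[1]
    return {int(idx): names.get(idx, "?")
            for idx, ctl in control.items() if ctl.startswith("DOWN")}
```

Its input is the text printed by /usr/openv/volmgr/bin/vmoprcmd -d, for example captured with Python's subprocess.run; for the output posted above it reports drive 4 (HP.ULTRIUM4-SCSI.004).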
You need to do some homework before suggesting a firmware upgrade.
Check the messages file (or an older copy of it) for boot messages (who -b tells you when the server was last rebooted). There you will find the HBA make and model along with its firmware and driver versions.
While you have messages file open, look for hardware-related errors.
Look on the HBA vendor's web site for known issues with those firmware and driver versions.
As for drives not being used, check for stuck/orphaned device allocations:
nbrbutil -dump
Check the 'MDS Allocation' section at the bottom of the output for media or drive allocations that are not really in use, note the Allocation Key number, and release each one with:
nbrbutil -releaseMDS <mdsAllocationKey>
03-07-2013 02:02 AM
/usr/openv/netbackup/db/media/errors
03/05/13 01:20:39 000144 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/05/13 01:20:44 000144 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/05/13 04:40:50 000017 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/05/13 04:40:55 000017 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 06:23:15 000141 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 06:23:20 000141 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:24:19 000051 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 21:24:24 000051 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:59:51 000130 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/06/13 21:59:56 000130 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/06/13 23:38:21 000010 2 TAPE_ALERT HP.ULTRIUM4-SCSI.002 0x10000000 0x00000000
03/07/13 08:43:50 000019 5 TAPE_ALERT HP.ULTRIUM4-SCSI.005 0x10000000 0x00000000
03/07/13 09:01:47 000143 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/07/13 09:01:52 000143 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
root@Nbmaster2 #
root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -update -drive 4 -drstatus UP
Updated drive < HP.ULTRIUM4-SCSI.004 > of type hcart in configuration
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -d
Id DriveName Type Residence
Drive Path Status
****************************************************************************
0 HP.ULTRIUM4-SCSI.000 hcart TLD(0) DRIVE=3
/dev/rmt/8cbn UP
1 HP.ULTRIUM4-SCSI.001 hcart TLD(0) DRIVE=5
/dev/rmt/10cbn UP
2 HP.ULTRIUM4-SCSI.002 hcart TLD(0) DRIVE=6
/dev/rmt/11cbn UP
3 HP.ULTRIUM4-SCSI.003 hcart TLD(0) DRIVE=2
/dev/rmt/7cbn UP
4 HP.ULTRIUM4-SCSI.004 hcart TLD(0) DRIVE=1
/dev/rmt/6cbn UP
5 HP.ULTRIUM4-SCSI.005 hcart TLD(0) DRIVE=4
/dev/rmt/9cbn UP
Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c1tw500104f000b88092l0
EMM Server = Nbmaster2
root@Nbmaster2 #
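To correlate the drive names in the errors file with OS device paths and SL500 drive positions, the tpconfig -d output can be parsed as sketched below. The layout (a drive line followed by a path/status line) is assumed from the output above; parse_tpconfig and base_device are hypothetical names. With the output above it maps HP.ULTRIUM4-SCSI.001 to /dev/rmt/10cbn and HP.ULTRIUM4-SCSI.003 to /dev/rmt/7cbn, so the earlier tar tests against /dev/rmt/10 and /dev/rmt/7 did exercise the same two drives that fail under NetBackup. On Solaris, the letters after the number only select compression, BSD behaviour and no-rewind; the physical drive is the same.

```python
import re

# Drive line, e.g. "1 HP.ULTRIUM4-SCSI.001 hcart TLD(0) DRIVE=5"
DRIVE_LINE = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+DRIVE=(\d+)")


def parse_tpconfig(output):
    """Map drive name -> id, type, residence, robot drive number, path, status."""
    drives = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        match = DRIVE_LINE.match(stripped)
        if match:
            current = match.group(2)
            drives[current] = {
                "id": int(match.group(1)),
                "type": match.group(3),
                "residence": match.group(4),
                "robot_drive": int(match.group(5)),
            }
        elif current and stripped.startswith("/dev/"):
            # The line after a drive line holds its device path and status.
            path, *rest = stripped.split()
            drives[current]["path"] = path
            drives[current]["status"] = rest[0] if rest else None
            current = None
    return drives


def base_device(path):
    """Strip Solaris st option letters, e.g. /dev/rmt/10cbn -> /dev/rmt/10."""
    return re.sub(r"(?<=\d)[lmhcu]?b?n?$", "", path)
```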
03-07-2013 02:19 AM
It seems you are ignoring my advice to check the bptm log and /var/adm/messages.
I give up...
Good luck!
03-10-2013 11:31 PM
I sent /var/adm/messages to the hardware team for review. They concluded that the error only appears in the log when data is written to tape with NetBackup; when OS commands are used, no error is found.
I just want to confirm whether the NetBackup configuration is correct and whether this is a bug in NetBackup 6.5.
Anyway, thank you very much for your support.