Re: Netbackup 5.1 MP6 error

joshluigp · ‎08-10-2007

Hello,

I clean the drives frequently, but the error does not occur on all policies; it only appears when I run this kind of backup.

Backups that don't take much time complete without errors. But these backups take approximately 2:40 hours per pool of disks, and error 84 appears sometimes.

The storage is direct-attached (DA). I don't have any switch between the server and the storage.

The strangest part is that it doesn't always appear. Sometimes one pool of disks backs up correctly and the error appears on the next pool of disks.

This error has happened with different media. This is the bptm log:

03:02:48.353 [15099] <2> write_data: received first buffer (65536 bytes), begin writing data
03:39:20.912 [14832] <2> send_brm_msg: MEDIA NOT READY
03:39:20.912 [14832] <2> write_data: attempting write error recovery, err = 5
03:39:20.912 [14832] <2> tape_error_rec: error recovery to block 7349881 requested
03:39:20.913 [14832] <2> tape_error_rec: attempting error recovery, delay 3 minutes before next attempt, tries left = 5
03:42:20.892 [14832] <2> io_ioctl: command (0)MTWEOF 0 from (overwrite.c.489) on drive index 4
03:42:20.892 [14832] <2> io_ioctl: MTWEOF failed during error recovery, I/O error
03:42:20.895 [14832] <2> tape_error_rec: absolute block position after error is 7346343
03:42:20.895 [14832] <2> set_job_details: Done
03:42:20.913 [14832] <16> write_data: cannot write image to media id DWH046, drive index 4, I/O error
03:42:20.925 [14832] <2> log_media_error: successfully wrote to error file - 08/09/07 03:42:20 DWH046 4 WRITE_ERROR
03:42:20.925 [14832] <2> check_error_history: called from bptm line 17599, EXIT_Status = 84
03:42:20.925 [14832] <2> check_error_history: using time window of 24 hours
03:42:20.925 [14832] <2> check_error_history: using drive error threshold of 20
03:42:20.947 [14832] <2> check_error_history: drive index = 4, media id = DWH046, time = 08/09/07 03:42:20, both_match = 0, media_match = 0, drive_match = 2
03:42:20.947 [14832] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/DWH046, from bptm.c.15695
03:42:20.948 [14832] <2> tpunmount: Check_for_waiting = 0, No_tpunmount_after_restore = 0, Media_Unmount_Delay = 0, MediaOffset = 33
03:42:20.948 [14832] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/DWH046
03:42:20.949 [14832] <2> TpUnmountWrapper: SCSI RELEASE
03:42:31.927 [14832] <2> send_brm_msg: ERROR 84
03:42:43.920 [14832] <2> mpx_terminate_exit: EXITING with status 84
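For anyone sifting through a much longer bptm log, here is a minimal Python sketch that splits lines of the shape shown above into fields and keeps only the higher-severity ones. The regular expression is inferred from this excerpt only, the function names are made up for illustration, and the cut-off of 16 is an assumption based on the one `<16>` line here being the actual failure.

```python
import re

# Shape inferred from the excerpt above:
# HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\] "
    r"<(?P<sev>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)

def parse_bptm_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["sev"] = int(rec["sev"])
    return rec

def error_lines(lines, min_sev=16):
    """Keep only records at or above min_sev (an assumed cut-off)."""
    recs = (parse_bptm_line(l) for l in lines)
    return [r for r in recs if r and r["sev"] >= min_sev]

# Three lines copied from the log in this post.
sample = [
    "03:42:20.892 [14832] <2> io_ioctl: MTWEOF failed during error recovery, I/O error",
    "03:42:20.913 [14832] <16> write_data: cannot write image to media id DWH046, drive index 4, I/O error",
    "03:42:43.920 [14832] <2> mpx_terminate_exit: EXITING with status 84",
]
for rec in error_lines(sample):
    print(rec["time"], rec["func"], rec["msg"])
```

Lines that were wrapped by the terminal would need to be joined first, otherwise the continuation halves simply fail to match and are skipped.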

I have a master server with NetBackup 5.1, on which I installed MP6, and an L180 library with 3 drives assigned. When I run a backup of many raw devices, I receive 84 errors.

The policy was created to back up 103 disks of 33 GB each from a DMX3000 using the emcp path. The policy is configured like this:

12 jobs per policy
media multiplexing 6

And the backup selection list looks something like this:

NEW_STREAM
/devices/pseudo/emcp@80:h,raw
NEW_STREAM
/devices/pseudo/emcp@81:h,raw
NEW_STREAM
/devices/pseudo/emcp@82:h,raw
NEW_STREAM
/devices/pseudo/emcp@83:h,raw
NEW_STREAM
/devices/pseudo/emcp@84:h,raw
NEW_STREAM
/devices/pseudo/emcp@85:h,raw
NEW_STREAM
/devices/pseudo/emcp@86:h,raw
NEW_STREAM
/devices/pseudo/emcp@87:h,raw
NEW_STREAM
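A list like this is tedious to type for 103 disks, so here is a rough Python sketch that generates the NEW_STREAM/path pairs. It assumes the emcp instance numbers are contiguous, which may not hold on a real host, and `selection_list` is just an illustrative name. The last two lines also work out how many drives the 12 jobs would keep busy at multiplexing 6, assuming every drive is filled up to the multiplexing limit.

```python
import math

def selection_list(first, last, template="/devices/pseudo/emcp@{n}:h,raw"):
    """Build NEW_STREAM / path pairs for emcp instances first..last."""
    lines = []
    for n in range(first, last + 1):
        lines.append("NEW_STREAM")
        lines.append(template.format(n=n))
    return lines

# Reproduces the eight entries shown above (instances 80 to 87).
entries = selection_list(80, 87)
print("\n".join(entries))

# Values from the policy settings above.
jobs_per_policy = 12
multiplexing = 6
drives_busy = math.ceil(jobs_per_policy / multiplexing)
print("drives kept busy:", drives_busy)
```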

The OS is Solaris 5.9.

And the logs from the OS look like this:

Aug 9 03:39:20 backup01 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Aug 9 03:39:20 backup01 scsi: [ID 107833 kern.notice] ASC: 0xc (write error), ASCQ: 0x0, FRU: 0x0
Aug 9 08:15:13 backup01 scsi: [ID 107833 kern.warning] WARNING: /pci@15d,700000/scsi@1/st@1,0 (st57):
Aug 9 08:15:13 backup01 Error for Command: write Error Level: Fatal
Aug 9 08:15:13 backup01 scsi: [ID 107833 kern.notice] Requested Block: 1836866 Error Block: 1836866
Aug 9 08:15:13 backup01 scsi: [ID 107833 kern.notice] Vendor: SEAGATE
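To compare these messages against a SCSI ASC/ASCQ table or a drive vendor's error list, it can help to pull the sense values out as numbers. The Python sketch below does that for lines of the shape shown above. The patterns are inferred from these two sample lines only, and `parse_sense` is a made-up helper name.

```python
import re

SENSE_KEY_RE = re.compile(r"Sense Key: (?P<key>.+)$")
ASC_RE = re.compile(
    r"ASC: (?P<asc>0x[0-9a-fA-F]+) \((?P<text>[^)]*)\), "
    r"ASCQ: (?P<ascq>0x[0-9a-fA-F]+)"
)

def parse_sense(lines):
    """Collect the sense key and ASC/ASCQ values from st driver messages."""
    result = {}
    for line in lines:
        m = SENSE_KEY_RE.search(line)
        if m:
            result["sense_key"] = m.group("key").strip()
        m = ASC_RE.search(line)
        if m:
            result["asc"] = int(m.group("asc"), 16)
            result["ascq"] = int(m.group("ascq"), 16)
            result["asc_text"] = m.group("text")
    return result

# The two sense lines copied from the OS log in this post.
sample = [
    "Aug 9 03:39:20 backup01 scsi: [ID 107833 kern.notice] Sense Key: Media Error",
    "Aug 9 03:39:20 backup01 scsi: [ID 107833 kern.notice] ASC: 0xc (write error), ASCQ: 0x0, FRU: 0x0",
]
sense = parse_sense(sample)
print(sense)
```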

Any ideas to help me out?

Thanks.

Regards,

sdo · ‎08-12-2007

To me, status 84 usually indicates bad tape media, or a failing tape drive. But as you say it only happens intermittently on this policy.

1) Is this a new "policy" that has never worked consistently, or has it been working for months/years and only recently has started failing? If only recently, what has changed in the environment?

2) Is the pool of tapes different to all other pools, i.e. is the pool of tapes being used by this policy specific to this policy, i.e. are the tapes from a different manufacturer, or a different batch? Is it possible that they are bad tapes? Are all the tapes new?

3) If all other backups are fine to the same tape drives then it's probably not the physical tape drive(s).

4) I notice that you have error "ASC: 0xc (write error), ASCQ: 0x0, FRU: 0x0", which looks quite similar to the error text referenced in point 1 of bperror below, i.e. "key = 0x0, asc = 0x0, ascq = 0x0" - and although the text is slightly different, could they be the same thing? But this is probably only relevant if your failing policy is a new policy - i.e. is this the first time you have tried such a policy?

5) Have you checked the HCL (Hardware Compatibility) document for your O/S and NetBackup version? It can be found here:

http://www.symantec.com/enterprise/support/documentation.jsp?pid=15143

...in the first dropdown/list box of "Show Document Types" choose "Compatibility" and then select for your versions.

6) The only other thing I can suggest right now is to work your way through the points raised below, i.e. the recommendations from bperror.

-----------------------------------------------------------------------------------------------------------------------

If we use "bperror" to show us the "-r"ecommended actions for resolving status 84:

$ bperror -S 84 -r
media write error

The system's device driver returned an I/O error while NetBackup was writing to removable media or a disk file.

Try the following:

1. For NetBackup Advanced Client only:
If the following message appears in the /usr/openv/netbackup/bptm log, and the values for key, asc, and ascq are all zero (0x0) as shown in this example message:
tape error occurred on extended copy command, key = 0x0, asc = 0x0, ascq = 0x0
your host-bus adapter and its driver are probably not supported by NetBackup Advanced Client. The host-bus adapters supported in the release are listed in the NetBackup Release Notes.

2. For additional information, check the following:
  * NetBackup Problems report to determine the device or media that caused the error
  * System and error logs for the system (UNIX)
  * Event Viewer Application and System logs (Windows)

3. If NetBackup was writing backups to a disk file, verify that the disk has enough space for the backup.
For a catalog backup to a disk path on a UNIX system, you may be trying to write a image greater than two gigabytes. File sizes greater than two gigabytes is a limitation on many UNIX file systems. Tape files do not have this limit.

4. If the media is tape or optical disk, check for:
  * A defective or dirty drive, in which case, clean it or have it repaired (refer to the tpclean command for robotic drives).
  * The wrong media type. Verify that the media matches the drive type you are using. On an optical drive, the platters may not be formatted correctly.
  * Defective media. If this is the case, use the bpmedia command to set the volume to the FROZEN state so it is not used for future backups.
  * Incorrect drive configuration. Verify the Media Manager and system configuration for the drive.
  For example, on UNIX the drive could be configured for fixed mode when it must be variable mode. See the Media Manager Device Configuration Guide for more information.
  This often results in the media being frozen with a message, "too many data blocks written, check tape/drive block size configuration."
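On the "defective drive" versus "defective media" question in point 4: the bptm log earlier in the thread shows an entry being written to the media error file (`08/09/07 03:42:20 DWH046 4 WRITE_ERROR`). Counting those entries per drive index and per media id is a quick way to see whether the failures follow one drive or one tape. The Python sketch below assumes every entry has exactly that five-field layout. Only the first sample entry comes from this thread; the other two are invented for illustration. On UNIX media servers the file is commonly /usr/openv/netbackup/db/media/errors, but treat that path as an assumption and check your own installation.

```python
from collections import Counter

def tally_errors(lines):
    """Count error entries per media id and per drive index."""
    by_media, by_drive = Counter(), Counter()
    for line in lines:
        parts = line.split()
        # Expected layout: date time media_id drive_index error_type
        if len(parts) < 5 or not parts[3].isdigit():
            continue  # skip blank or unexpected lines
        media_id, drive_index = parts[2], int(parts[3])
        by_media[media_id] += 1
        by_drive[drive_index] += 1
    return by_media, by_drive

# First entry is from the bptm log in this thread; the other two are
# hypothetical sample data, not real results.
sample = [
    "08/09/07 03:42:20 DWH046 4 WRITE_ERROR",
    "08/09/07 05:10:02 DWH051 4 WRITE_ERROR",
    "08/09/07 08:15:13 DWH063 2 WRITE_ERROR",
]
by_media, by_drive = tally_errors(sample)
print("per drive:", by_drive.most_common())
print("per media:", by_media.most_common())
```

If one drive index dominates while the media ids vary, the drive or its path is the more likely suspect; if one media id keeps recurring across drives, the tape is.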

Stumpr2 · ‎08-13-2007

Excellent post!

joshluigp · ‎08-13-2007

Hello,

This problem started a few months ago. The only recent changes we made were new OS patches and the upgrade from version 5.1 MP1 to MP6. The OS is Solaris 5.9.

We gave the correct permissions on the paths that NetBackup uses to back up the data disks.

We use the emcpower path to back up the disks.

The drives are used by other policies, and those backups complete correctly. Only with this kind of backup does the system return error 84.

The library has 6 drives: 3 assigned to the media server and the other 3 to the master, with only 2 drives assigned to this policy.

Does anyone know anything?

sdo · ‎08-16-2007

You mention O/S updates, NetBackup application software updates, and driver updates. What about firmware updates?

IMHO, all software producers nearly always develop and test on the latest firmware etc...

Have you checked whether there are firmware updates available for HBAs, tape drive units, SAN switches, network switches, NICs, library management units, etc.?

Message Edited by sdw303 on 08-16-2007 12:38 PM

Arella · ‎09-22-2008

sdw303 sounds like most of the Veritas support folks. The poor guy said he does not have any switches. Also, the responses are much like most Veritas folks' responses: generalized and careless. What has this world come to?!
