media error but no sense key info

nataras · 06-03-2007

Hi,

We are facing frequent media errors. The logs are below.

The backup application error logs are:

5/31/2007 7:41:37 PM tcppapp001-bip crmdbsp01-bvip Error 581318 Media Device FREEZING media id TP2079, External event caused rewind during write, all data on media is lost

5/31/2007 7:41:37 PM tcppapp001-bip crmdbsp01-bvip Error 581316 Backup INF - media write error (84), cannot continue with copy 2

------------------------------------------------------------------

Errors from /var/adm/messages on Solaris:

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):

May 22 03:10:17 tcppapp001 Error for Command: space Error Level: Fatal

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):

May 22 03:25:17 tcppapp001 Error for Command: rezero/rewind Error Level: Fatal

May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0

May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:

May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command

May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

------------------------------------------------------------------------

NetBackup log messages from bptm:

01:54:02.764 [17671] <2> io_terminate_tape: absolute block position prior to writing empty header is 125138, copy 1

01:54:02.764 [17671] <2> io_terminate_tape: block position check: actual 125138, expected 125137

01:54:02.764 [17671] <2> set_job_details: Done

01:54:02.765 [17671] <2> logconnections: BPDBM CONNECT FROM 172.16.0.145.49336 TO 172.16.3.123.13721

01:54:02.827 [17671] <16> io_terminate_tape: FREEZING media id TP2379, too many data blocks written, check tape/driver block size configuration

01:54:02.829 [17671] <2> log_media_error: successfully wrote to error file - 06/01/07 01:54:02 TP2379 3 WRITE_ERROR

01:54:02.830 [17671] <2> set_job_details: Done

01:54:02.833 [17671] <2> logconnections: BPDBM CONNECT FROM 172.16.0.145.49337 TO 172.16.3.123.13721

01:54:02.897 [17671] <16> terminate_twin: INF - media write error (84), cannot continue with copy 1

01:54:02.898 [17671] <2> nb_getsockconnected: Connect to intdbsp02-bvip on port 698

01:54:02.899 [17671] <2> logconnections: BPCD CONNECT FROM 172.16.0.145.698 TO 172.16.3.116.13782

01:54:03.093 [17671] <2> logconnections: BPDBM CONNECT FROM 172.16.0.145.49340 TO 172.16.3.123.13721

01:54:03.171 [17671] <2> check_error_history: called from bptm line 18284, EXIT_Status = 84

01:54:03.189 [17671] <2> check_error_history: drive index = 3, media id = TP2379, time = 06/01/07 01:54:03, both_match = 0, media_match = 0, drive_match = 1

01:54:03.189 [17671] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/TP2379

01:54:03.191 [17671] <2> TpUnmountWrapper: SCSI RELEASE

01:54:03.206 [17671] <2> bptm: EXITING with status 84 <----------

Any idea why the sense key info is showing all zeros? Can someone help me with this?

Regards,

Omar_Villa · 06-03-2007

From what I see, you have a hardware issue in your Sun box, or the tape drive you are using needs maintenance or replacement.

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):

May 22 03:10:17 tcppapp001 Error for Command: space Error Level: Fatal

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command

May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

As you can see, /pci@8,700000/lpfc@3/st@96 has a warning. Do the following to check whether the errors come from the same drive and whether it has issues.

First, get the WWN of this target (lpfc3t96). To do this, run the following:
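A quick way to check whether the warnings all come from one drive is to count them per device path. This is only a minimal sketch in POSIX shell, assuming the messages file has the same layout as the excerpt quoted above; the function name count_st_warnings is made up for illustration.

```sh
# Count kernel SCSI warnings per device path in a Solaris messages file.
# $1: path to the messages file (for example /var/adm/messages).
# Assumption: warning lines look like
#   ... WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):
count_st_warnings() {
    awk '
        /WARNING: \// {
            # The device path and the st instance follow the WARNING: token.
            for (i = 1; i <= NF; i++)
                if ($i == "WARNING:") key = $(i + 1) " " $(i + 2)
            sub(/:$/, "", key)      # drop the trailing colon
            count[key]++
        }
        END { for (k in count) print count[k], k }
    ' "$1" | sort
}
```

Running count_st_warnings /var/adm/messages prints one line per device path with the number of warnings in front. If a single path accounts for nearly all of them, that drive (or its link) is the first suspect.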

Get Drive WWN

1. Get the drive path

vmoprcmd -devconfig -l

vmoprcmd -autoconfig -t

2. Go to the drive path, /dev/rmt, and run e.g. ls -la 10cbn (the drive we are looking for) to get the lpfc address:

10cbn -> ../../devices/pci@8,700000/pci@3/lpfc@4/st@12,0:cbn

NOTE: This means lpfc4t12; it will be needed in the next steps.

3. Verify the lpfc assignment with:

grep \"lpfc\" /etc/path_to_inst

4. Open the /usr/sbin/lpfc/lputil utility

Go to Persistent Bindings

Select Display all nodes

Select the lpfc address, in this case lpfc4

You will then get the list of all bindings; search for the needed target, t12

You should get something like:

Mapped FCP Node 12 50-01-04-f0-00-79-09-3b 50-01-04-f0-00-01-02-03

5. Go to /kernel/drv/lpfc.conf, search for the lpfc4t12 path, and match it against the persistent binding you got from lputil, e.g.:

"500104f000010203:lpfc4t12"

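The name and WWN matching in steps 2, 3 and 5 can be sketched in POSIX shell as below. This is an illustration under stated assumptions, not a tested procedure for this site: the function names are hypothetical, the numbers after lpfc@ and st@ are copied as they appear (the naming used in this thread; the authoritative instance number is the one in /etc/path_to_inst), and path_to_inst is assumed to have its usual three columns of quoted physical path, instance number and quoted driver name.

```sh
# Step 2: turn a /dev/rmt link target into the lpfcNtM name used in this thread.
# Assumption: the numbers after lpfc@ and st@ are copied exactly as shown.
lpfc_target() {
    printf '%s\n' "$1" |
        sed -n 's|.*/lpfc@\([0-9a-f]*\)/st@\([0-9a-f]*\),.*|lpfc\1t\2|p'
}

# Step 3: list lpfc instances from a path_to_inst style file.
# $1: file to read (normally /etc/path_to_inst).
lpfc_instances() {
    awk '$3 == "\"lpfc\"" { gsub(/"/, "", $1); print "lpfc" $2, $1 }' "$1"
}

# Step 5: strip separators and lowercase a WWN so that the dashed form shown
# by lputil can be compared with the plain form in lpfc.conf.
normalize_wwn() {
    printf '%s\n' "$1" | tr -d ':-' | tr 'A-F' 'a-f'
}
```

For the link shown in step 2, lpfc_target "$(ls -l /dev/rmt/10cbn)" prints lpfc4t12, and normalize_wwn 50-01-04-f0-00-01-02-03 prints 500104f000010203, a full 16 hex digit WWN that can then be searched for in /kernel/drv/lpfc.conf with grep.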
After this, verify the drive's serial number against the local OS DB and vmglob with:

vmoprcmd -shmdrive | grep <drivename>

vmoprcmd -autoconfig -t (compare the path with the previous command's output to identify the drive)

vmglob -listall -java | grep <drivename> (the serial number must be the same on all the media servers that share the drive, and must match the output of the previous two commands)

If you see a different serial number, the fix depends on where you see it. If it is in the vmglob output, you can probably fix it with:

vmoprcmd -autoconfig "-replace_drive <drivename> -path <path>"

or, if you see it in the vmoprcmd -shmdrive output, you can fix it with:

vmoprcmd -devconfig "-update -drive <driveindex> -path <path>"

or, if it is in the vmoprcmd -autoconfig output, verify whether the drive was changed and ensure the OS can see it at the correct path.

After any change, restart ltid and resync the DBs:

vmoprcmd -stopltid

vmoprcmd -startltid

vmoprcmd -autoconfig -sync

Then check that vmglob shows the same serial number and that it matches the autoconfig and devconfig DBs.

Hope this helps.
