06-03-2007 01:42 AM
Hi,
We are facing frequent media errors. The logs are below.
The backup application error logs are:
5/31/2007 7:41:37 PM tcppapp001-bip crmdbsp01-bvip Error 581318 Media Device FREEZING media id TP2079, External event caused rewind during write, all data on media is lost
5/31/2007 7:41:37 PM tcppapp001-bip crmdbsp01-bvip Error 581316 Backup INF - media write error (84), cannot continue with copy 2
------------------------------------------------------------------
Errors from /var/adm/messages on Solaris:
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):
May 22 03:10:17 tcppapp001 Error for Command: space Error Level: Fatal
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):
May 22 03:25:17 tcppapp001 Error for Command: rezero/rewind Error Level: Fatal
May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:
May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
May 22 03:25:17 tcppapp001 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
------------------------------------------------------------------------
NetBackup log messages from bptm:
01:54:02.764 [17671] <2> io_terminate_tape: absolute block position prior to writing empty header is 125138, copy 1
01:54:02.764 [17671] <2> io_terminate_tape: block position check: actual 125138, expected 125137
01:54:02.764 [17671] <2> set_job_details: Done
01:54:02.765 [17671] <2> logconnections: BPDBM CONNECT FROM 172.16.0.145.49336 TO 172.16.3.123.13721
01:54:02.827 [17671] <16> io_terminate_tape: FREEZING media id TP2379, too many data blocks written, check tape/driver block size configuration
01:54:02.829 [17671] <2> log_media_error: successfully wrote to error file - 06/01/07 01:54:02 TP2379 3 WRITE_ERROR
01:54:02.830 [17671] <2> set_job_details: Done
01:54:02.833 [17671] <2> logconnections: BPDBM CONNECT FROM 172.16.0.145.49337 TO 172.16.3.123.13721
01:54:02.897 [17671] <16> terminate_twin: INF - media write error (84), cannot continue with copy 1
01:54:02.898 [17671] <2> nb_getsockconnected: Connect to intdbsp02-bvip on port 698
01:54:02.899 [17671] <2> logconnections: BPCD CONNECT FROM 172.16.0.145.698 TO 172.16.3.116.13782
01:54:03.093 [17671] <2> logconnections: BPDBM CONNECT FROM 172.16.0.145.49340 TO 172.16.3.123.13721
01:54:03.171 [17671] <2> check_error_history: called from bptm line 18284, EXIT_Status = 84
01:54:03.189 [17671] <2> check_error_history: drive index = 3, media id = TP2379, time = 06/01/07 01:54:03, both_match = 0, media_match = 0, drive_match = 1
01:54:03.189 [17671] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/TP2379
01:54:03.191 [17671] <2> TpUnmountWrapper: SCSI RELEASE
01:54:03.206 [17671] <2> bptm: EXITING with status 84 <----------
Any idea why the sense key is showing all zeros? Can someone help me with this?
Regards,
06-03-2007 09:19 PM
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.warning] WARNING: /pci@8,700000/lpfc@3/st@96,1 (st129):
May 22 03:10:17 tcppapp001 Error for Command: space Error Level: Fatal
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number:
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] Sense Key: Aborted Command
May 22 03:10:17 tcppapp001 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
As you can see, /pci@8,700000/lpfc@3/st@96 has a warning. Do the following to check whether this error comes from the same drive or whether the drive has issues.
First get the WWN of this target (lpfc3t96). To do this, run the following:
Get Drive WWN
1. Get the drive path
vmoprcmd -devconfig -l
vmoprcmd -autoconfig -t
2. Go to the drive path directory, /dev/rmt, and run ls -la on the drive you are looking for (for example, ls -la 10cbn). The link target shows the lpfc address:
10cbn -> ../../devices/pci@8,700000/pci@3/lpfc@4/st@12,0:cbn
NOTE: This means lpfc4t12; this will be needed in the next steps.
3. Verify the lpfc assignment with
grep "lpfc" /etc/path_to_inst
4. Open the /usr/sbin/lpfc/lputil utility
Go to Persistent Bindings
Select Display all nodes
Select the lpfc address, in this case lpfc4
You will then get the list of all the bindings; search for the needed target, t12
You should get something like:
Mapped FCP Node 12 50-01-04-f0-00-79-09-3b 50-01-04-f0-00-01-02-03
5. Go to /kernel/drv/lpfc.conf, search for the lpfc4t12 path, and match it with the persistent binding you got from lputil, e.g.:
"500104f000010203:lpfc4t12"
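The matching in steps 2 to 5 can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names (lpfc_target, normalize_wwn, binding_matches) are hypothetical, the lpfc<N>t<T> name is built literally from the device path the way this post does it, and the real instance number should still be confirmed against /etc/path_to_inst as in step 3.

```python
import re

# Matches the "lpfc@<N>/st@<T>," part of a Solaris device path.
_PATH_RE = re.compile(r"lpfc@([0-9a-fA-F]+)/st@([0-9a-fA-F]+),")
# A WWN written without separators is 16 hex digits.
_WWN_RE = re.compile(r"[0-9a-f]{16}")


def lpfc_target(device_path):
    """Return 'lpfc<N>t<T>' for a device path, following this thread's convention."""
    m = _PATH_RE.search(device_path)
    if m is None:
        raise ValueError(f"no lpfc@N/st@T component in {device_path!r}")
    return f"lpfc{m.group(1)}t{m.group(2)}"


def normalize_wwn(wwn):
    """Turn '50-01-04-...' into '500104...'; reject anything that is not 16 hex digits."""
    compact = wwn.replace("-", "").lower()
    if not _WWN_RE.fullmatch(compact):
        raise ValueError(f"not a 16-digit WWN: {wwn!r}")
    return compact


def binding_matches(conf_entry, wwn, target):
    """Check a '<wwn>:<target>' entry from lpfc.conf against lputil's WWN and the target."""
    conf_wwn, _, conf_target = conf_entry.strip().strip('"').partition(":")
    return normalize_wwn(conf_wwn) == normalize_wwn(wwn) and conf_target == target


if __name__ == "__main__":
    target = lpfc_target("../../devices/pci@8,700000/pci@3/lpfc@4/st@12,0:cbn")
    print(target)
    print(binding_matches('"500104f000010203:lpfc4t12"',
                          "50-01-04-f0-00-01-02-03", target))
```

A malformed WWN (for example one with a digit missing) raises ValueError rather than silently comparing unequal, so a typo in lpfc.conf is reported as such.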
After this, verify the drive's serial number against the local OS database and vmglob with:
vmoprcmd -shmdrive | grep <drivename>
vmoprcmd -autoconfig -t (compare the last command output path to get the drive)
vmglob -listall -java | grep <drivename> (the serial number must be the same on all the media servers that share the drive, and between the last two commands)
If you see a different serial number, the fix depends on where you see it. If it is in the vmglob output, you can probably fix it with:
vmoprcmd -autoconfig "-replace_drive <drivename> -path <path>"
If it is in the vmoprcmd -shmdrive output, you can fix it with:
vmoprcmd -devconfig "-update -drive <driveindex> -path <path>"
If it is in the vmoprcmd -autoconfig output, check whether the drive was replaced and make sure the OS can see it on the correct path.
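The decision above can be written down as a small lookup. This is a rough sketch, assuming you have already copied the serial number out of each of the three outputs by hand; the function name suggest_fixes, the source labels, and the serial values are hypothetical, and the command strings are the ones given in this post.

```python
from collections import Counter

# Action suggested in this post for each place a wrong serial number can show up.
FIXES = {
    "vmglob": 'vmoprcmd -autoconfig "-replace_drive <drivename> -path <path>"',
    "shmdrive": 'vmoprcmd -devconfig "-update -drive <driveindex> -path <path>"',
    "autoconfig": "check whether the drive was replaced and that the OS sees it on the correct path",
}


def suggest_fixes(serials):
    """Return (source, action) pairs for sources whose serial differs from the majority.

    serials maps each of 'vmglob', 'shmdrive', 'autoconfig' to the serial it reported.
    """
    if set(serials) != set(FIXES):
        raise ValueError(f"expected exactly these sources: {sorted(FIXES)}")
    majority, votes = Counter(serials.values()).most_common(1)[0]
    if votes == len(serials):
        return []  # all three agree, nothing to fix
    if votes == 1:
        raise ValueError("all three sources disagree; check the drive manually")
    return [(src, FIXES[src]) for src in serials if serials[src] != majority]


if __name__ == "__main__":
    # Hypothetical serial numbers, for illustration only.
    observed = {"vmglob": "SN0002", "shmdrive": "SN0001", "autoconfig": "SN0001"}
    for source, action in suggest_fixes(observed):
        print(f"{source}: {action}")
```

Treating the majority value as the correct one is an assumption of the sketch; if the drive was physically swapped, the OS view may be the one that is right.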
If anything changed, restart ltid and resync the databases:
vmoprcmd -stopltid
vmoprcmd -startltid
vmoprcmd -autoconfig -sync
Then check that vmglob shows the same serial number and that it matches the autoconfig and devconfig databases.
Hope this helps.