11-25-2014 05:44 PM
Hi,
I have a media server with 2 drives - A and B. It was running fine until recently, when Drive A started to keep encountering the error "drive index 0, Data error (cyclic redundancy check)". Small to medium backups are no problem, but large backups always start to slow down at around 1.3TB and then either drag on until the end or fail with the above error.
OS - Windows 2008 R2
Netbackup 7.5.0.4
Event log error - The device, \Device\Tape0, has a bad block or The device, \Device\Tape1, has a bad block.
Can you advise whether the tape drive is faulty or whether this is some other issue?
Thanks and regards,
Kwan
11-25-2014 08:59 PM
Looks like faulty drive, but logs will confirm.
Check if bptm log folder exists on media server under netbackup\logs. If not, create the folder.
We are looking for TapeAlert errors/messages in the log.
Add VERBOSE entry to volmgr\vm.conf and restart NBU Device Manager service. This will increase logging to Event Viewer System and Application log.
Check ..\netbackup\db\media\errors file.
Please post entries for last couple of days.
This will give some indication of where/what the problem is.
11-25-2014 11:33 PM
Hi Marianne,
Thanks for the reply. Here are the entries from the errors file.
11/03/14 09:33:41 0149L5 1 WRITE_ERROR IBM.ULT3580-HH5.004
11/03/14 09:33:51 0149L5 1 TAPE_ALERT IBM.ULT3580-HH5.004 0x00800000 0x00000000
11/10/14 12:46:47 0119L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/10/14 12:46:53 0119L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/13/14 07:54:18 0027L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/13/14 07:54:20 0027L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/17/14 12:34:49 0237L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/17/14 12:34:51 0237L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/20/14 15:46:00 0026L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/20/14 15:46:02 0026L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/22/14 09:39:27 0031L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/22/14 09:39:29 0031L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/23/14 17:49:57 0032L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/23/14 17:49:59 0032L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x34000000 0x00000000
11/24/14 04:39:21 0238L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/24/14 04:39:23 0238L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/24/14 12:21:57 0239L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/24/14 12:21:59 0239L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/26/14 14:22:34 0024L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
Regards,
Kwan
11-26-2014 12:31 AM
What is logged in your system messages or event logs at the times these Tape alerts are appearing?
I'd say almost for sure that this is hardware - the issue is coming from the tape drive, hence the tape alerts.
11-26-2014 12:35 AM
Hopefully Martin will be along soon...
Quite a busy day ahead - will check a bit later....
Docs that you can review in the meantime:
http://www.symantec.com/docs/TECH48603
http://www.t10.org/ftp/t10/document.02/02-142r0.pdf
11-26-2014 12:51 AM
Tape Alert Technote:
http://www.symantec.com/docs/TECH124594
11-26-2014 01:57 AM
Did somebody call me ... ;0)
The tape alerts translate to :
0x24000000 0x02000000
Flag 3: Hard error. Severity: Warning
Flag 6: Write failure. Severity: Critical
Flag 39: Diagnostics required. Severity: Warning
0x34000000 0x00000000
Flag 3: Hard error. Severity: Warning
Flag 4: Medium. Severity: Critical
Flag 6: Write failure. Severity: Critical
0x00800000 0x00000000
Flag 9: Write protect. Severity: Critical
We can ignore the last one (write protect) as it is not connected to this problem.
In summary, I would say you have a faulty drive. I have never in >10 years seen a drive throw a CRC error that didn't turn out to be faulty hardware.
Regards,
Martin
11-26-2014 03:45 AM
Running the errors file snippet through tperr.sh (available in the forum, just search for it) ...
0031L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0032L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0024L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
0026L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0027L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0119L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0237L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0238L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0239L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0149L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
IBM.ULT3580-HH5.004 has had errors with 1 different tapes (Total occurrences (errors) for this drive is 2)
IBM.ULT3580-HH5.002 has had errors with 9 different tapes (Total occurrences (errors) for this drive is 17)
So I think it is fairly clear that the drive, not the tapes, is having issues.
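For anyone without tperr.sh to hand, the same kind of tally can be roughed out in a few lines of Python. This is only a sketch, not tperr.sh itself: the function name tally_errors is made up for illustration, and it assumes every line in the errors file has the layout seen in the snippet posted above (date, time, media ID, drive index, error type, drive name).

```python
from collections import defaultdict

def tally_errors(lines):
    """Count error entries and distinct tapes per drive.

    Assumes whitespace-separated fields in the order:
    date, time, media ID, drive index, error type, drive name.
    """
    tapes_per_drive = defaultdict(set)
    entries_per_drive = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or unexpected lines
        media_id, drive_name = fields[2], fields[5]
        tapes_per_drive[drive_name].add(media_id)
        entries_per_drive[drive_name] += 1
    # Map drive name -> (number of distinct tapes, total entries)
    return {
        drive: (len(tapes_per_drive[drive]), entries_per_drive[drive])
        for drive in entries_per_drive
    }

# A few lines copied from the errors file posted in this thread
sample = """\
11/03/14 09:33:41 0149L5 1 WRITE_ERROR IBM.ULT3580-HH5.004
11/03/14 09:33:51 0149L5 1 TAPE_ALERT IBM.ULT3580-HH5.004 0x00800000 0x00000000
11/10/14 12:46:47 0119L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/10/14 12:46:53 0119L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/13/14 07:54:18 0027L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
""".splitlines()

for drive, (tapes, entries) in sorted(tally_errors(sample).items()):
    print(f"{drive}: {entries} entries across {tapes} different tapes")
```

A drive that shows errors across many different tapes points at the drive; a tape that shows errors across many different drives points at the tape.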
11-26-2014 03:46 AM
This is how you convert them:
0x24000000 0x02000000
Write out each hex digit in binary (so 2 = 0010, 4 = 0100)
0x24000000
0010 0100 0000 0000 0000 0000 0000 0000
0x02000000
0000 0010 0000 0000 0000 0000 0000 0000
Write them out, side by side:
0010 0100 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000 0000 0000 0000
From left to right, count the positions of the 1's - there is a 1 at the 3rd, 6th and 39th digits from the left, hence flags 3, 6 and 39
So we now look in a document such as this:
http://www.t10.org/ftp/t10/document.02/02-142r0.pdf
We see here that flags 3, 6 and 39 are:
Flag 3
Hard Error
The operation has stopped because an error has occurred while reading or writing data which the drive cannot correct.
Flag 6
Write Failure
The tape is from a faulty batch or the tape drive is faulty. Use a good tape to test the drive. If the problem persists, call the tape drive supplier helpline.
Flag 39
Diagnostics Required
The tape drive may have a fault. Check for availability of diagnostic information and run extended diagnostics if applicable. Check the tape drive user's manual for instructions on running extended diagnostic tests and retrieving diagnostic data.
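The manual conversion above can also be done with a short script. This is just a sketch of the same bit counting: the function name decode_tape_alert is made up for illustration, and the FLAG_NAMES table only holds the flags that came up in this thread, so anything else should be looked up in the T10 document.

```python
# Flag names limited to those mentioned in this thread;
# see the T10 TapeAlert document for the full list.
FLAG_NAMES = {
    3: "Hard error",
    4: "Medium",
    6: "Write failure",
    9: "Write protect",
    39: "Diagnostics required",
}

def decode_tape_alert(word1, word2):
    """Return the flag numbers set in the two hex words of a TAPE_ALERT entry.

    The two 32-bit words are placed side by side and the bits are
    counted from the left, starting at 1.
    """
    value = (int(word1, 16) << 32) | int(word2, 16)
    bits = format(value, "064b")  # 64 binary digits, zero-padded
    return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]

for words in [("0x24000000", "0x02000000"),
              ("0x34000000", "0x00000000"),
              ("0x00800000", "0x00000000")]:
    print(" ".join(words))
    for flag in decode_tape_alert(*words):
        print(f"  Flag {flag}: {FLAG_NAMES.get(flag, 'see T10 document')}")
```

For the three values in the errors file this gives flags 3, 6 and 39, then 3, 4 and 6, then 9.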
11-27-2014 05:19 PM
Hi All,
Thanks for your help. We ended up replacing both tape drives. Drive B encountered a media position error, so we opted to change both.
Thanks.