Solved: drive index 0, Data error (cyclic redundancy check)

dskwan · ‎11-25-2014

Hi,

I have a media server with 2 drives - A and B. It was running fine until recently, when Drive A started repeatedly hitting the error "drive index 0, Data error (cyclic redundancy check)". Small to medium backups complete without problems, but large backups always start to slow down at around 1.3TB and then either drag on until the end or fail with the above error.

OS - Windows 2008 R2

NetBackup 7.5.0.4

Event log error - The device, \Device\Tape0, has a bad block or The device, \Device\Tape1, has a bad block.

Can you advise whether this is a faulty tape drive or some other issue?

Thanks and regards,

Kwan

Marianne · ‎11-25-2014

Looks like faulty drive, but logs will confirm.

Check if bptm log folder exists on media server under netbackup\logs. If not, create the folder.
We are looking for TapeAlert errors/messages in the log.

Add VERBOSE entry to volmgr\vm.conf and restart NBU Device Manager service. This will increase logging to Event Viewer System and Application log.

Check ..\netbackup\db\media\errors file.
Please post entries for last couple of days.
This will give some indication of where/what the problem is.
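The first two steps above could be scripted roughly as follows. This is only a sketch: the function name `prepare_tape_logging` is made up for illustration, and the install folder (the one containing `netbackup` and `volmgr`) is site-specific, so it is passed in rather than guessed. The demo runs against a temporary folder, not a real install, and restarting the NBU Device Manager service is still a manual step.

```python
import tempfile
from pathlib import Path


def prepare_tape_logging(install_dir):
    """Create the bptm log folder and add VERBOSE to vm.conf if missing.

    install_dir is assumed to be the folder holding the netbackup and
    volmgr folders. Returns True if vm.conf was changed, meaning the
    NBU Device Manager service still has to be restarted by hand.
    """
    root = Path(install_dir)
    # bptm only writes its log if this folder exists.
    (root / "netbackup" / "logs" / "bptm").mkdir(parents=True, exist_ok=True)

    vm_conf = root / "volmgr" / "vm.conf"
    text = vm_conf.read_text() if vm_conf.exists() else ""
    if any(line.strip() == "VERBOSE" for line in text.splitlines()):
        return False  # already enabled, nothing to change

    vm_conf.parent.mkdir(parents=True, exist_ok=True)
    # Start on a fresh line in case the file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    vm_conf.write_text(text + separator + "VERBOSE\n")
    return True


# Demo against a throwaway folder standing in for the real install path.
with tempfile.TemporaryDirectory() as demo_dir:
    print(prepare_tape_logging(demo_dir))  # first call changes vm.conf
    print(prepare_tape_logging(demo_dir))  # second call finds VERBOSE already set
```

Take a copy of vm.conf before letting any script touch it on a production media server.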


dskwan · ‎11-25-2014

Hi Marianne,

Thanks for the reply. Here are the entries from the errors file.

11/03/14 09:33:41 0149L5 1 WRITE_ERROR IBM.ULT3580-HH5.004
11/03/14 09:33:51 0149L5 1 TAPE_ALERT IBM.ULT3580-HH5.004 0x00800000 0x00000000
11/10/14 12:46:47 0119L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/10/14 12:46:53 0119L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/13/14 07:54:18 0027L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/13/14 07:54:20 0027L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/17/14 12:34:49 0237L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/17/14 12:34:51 0237L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/20/14 15:46:00 0026L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/20/14 15:46:02 0026L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/22/14 09:39:27 0031L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/22/14 09:39:29 0031L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/23/14 17:49:57 0032L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/23/14 17:49:59 0032L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x34000000 0x00000000
11/24/14 04:39:21 0238L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/24/14 04:39:23 0238L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/24/14 12:21:57 0239L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/24/14 12:21:59 0239L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/26/14 14:22:34 0024L5 0 WRITE_ERROR IBM.ULT3580-HH5.002

Regards,

Kwan

revarooo · ‎11-26-2014

What is logged in your system messages or event logs at the times these Tape alerts are appearing?

I'd say almost for sure that this is hardware - the issue is coming from the tape drive, hence the tape alerts.

Marianne · ‎11-26-2014

Hopefully Martin will be along soon...

Quite a busy day ahead - will check a bit later....

Docs that you can review in the meantime:
http://www.symantec.com/docs/TECH48603
http://www.t10.org/ftp/t10/document.02/02-142r0.pdf


revarooo · ‎11-26-2014

Tape Alert Technote:

http://www.symantec.com/docs/TECH124594

mph999 · ‎11-26-2014

Did somebody call me ... ;0)

The tape alerts translate to :

0x24000000 0x02000000

Flag 3: Hard error. Severity: Warning
Flag 6: Write failure. Severity: Critical
Flag 39: Diagnostics required. Severity: Warning

0x34000000 0x00000000

Flag 3: Hard error. Severity: Warning
Flag 4: Medium. Severity: Critical
Flag 6: Write failure. Severity: Critical

0x00800000 0x00000000

Flag 9: Write protect. Severity: Critical

We can ignore the last one (write protect) as it is not connected to this problem.

In summary, I would say you have a faulty drive. I have never in >10 years seen a drive throw a CRC error that didn't turn out to be faulty hardware.

Regards,

Martin

mph999 · ‎11-26-2014

Running the errors file snippet through tperr.sh (available in the forum, just search for it) ...

0031L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0032L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0024L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 1)
0026L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0027L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0119L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0237L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0238L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0239L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)
0149L5 has had errors in 1 different drives (Total occurrences (errors) of this volume is 2)

IBM.ULT3580-HH5.004 has had errors with 1 different tapes (Total occurrences (errors) for this drive is 2)
IBM.ULT3580-HH5.002 has had errors with 9 different tapes (Total occurrences (errors) for this drive is 17)

So I think it is fairly clear that the drive, not the tapes, is having issues.
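For anyone without tperr.sh to hand, the same kind of per-drive tally can be sketched in a few lines of Python. This is not tperr.sh, just a rough approximation; the field layout (date, time, media ID, drive index, error type, drive name) is assumed from the errors file snippet posted above, and `summarise_errors` is a made-up name.

```python
from collections import defaultdict


def summarise_errors(lines):
    """Return {drive_name: (distinct_tapes, total_entries)}.

    Assumed layout of each line, based on the snippet in this thread:
    date time media_id drive_index error_type drive_name [alert words]
    """
    tapes_per_drive = defaultdict(set)
    entries_per_drive = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        media_id, drive_name = fields[2], fields[5]
        tapes_per_drive[drive_name].add(media_id)
        entries_per_drive[drive_name] += 1
    return {
        drive: (len(tapes_per_drive[drive]), entries_per_drive[drive])
        for drive in entries_per_drive
    }


# A few lines copied from the errors file earlier in the thread.
sample = """\
11/03/14 09:33:41 0149L5 1 WRITE_ERROR IBM.ULT3580-HH5.004
11/03/14 09:33:51 0149L5 1 TAPE_ALERT IBM.ULT3580-HH5.004 0x00800000 0x00000000
11/10/14 12:46:47 0119L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/10/14 12:46:53 0119L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/13/14 07:54:18 0027L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
""".splitlines()

for drive, (tapes, entries) in summarise_errors(sample).items():
    print(f"{drive}: errors with {tapes} different tapes, {entries} entries")
```

A drive that fails with many different tapes points at the drive; a tape that fails in several drives points at the tape.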

mph999 · ‎11-26-2014

This is how you convert them:

0x24000000 0x02000000

Write out each hex digit in binary (so 2 = 0010, 4 = 0100)
0x24000000
0010 0100 0000 0000 0000 0000 0000 0000

0x02000000
0000 0010 0000 0000 0000 0000 0000 0000

Write them out, side by side:

0010 0100 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000 0000 0000 0000

From left to right, count the positions of the 1s. There is a 1 at the 3rd, 6th and 39th digits from the left, hence flags 3, 6 and 39.

So we now look in a document such as this:

http://www.t10.org/ftp/t10/document.02/02-142r0.pdf

We see there that flags 3, 6 and 39 are:

Flag 3
Hard Error
The operation has stopped because an error has occurred while
reading or writing data which the drive cannot correct.

Flag 6
Write Failure
The tape is from a faulty batch or the tape drive is faulty.
Use a good tape to test the drive. If the problem persists,
call the tape drive supplier helpline.

Flag 39
Diagnostics Required
The tape drive may have a fault. Check for availability of
diagnostic information and run extended diagnostics if
applicable.
Check the tape drive user's manual for instructions on running
extended diagnostic tests and retrieving diagnostic data.
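The manual conversion above can also be done with a short script. This is a minimal sketch, assuming the two hex words from the errors file are read left to right as one 64-bit string with flag 1 as the leftmost bit; `decode_tape_alert` is a made-up name and the name table only covers the flags seen in this thread.

```python
def decode_tape_alert(first_word, second_word):
    """Return the TapeAlert flag numbers set in a pair of 32-bit hex words.

    Flag 1 is the leftmost bit of the first word and flag 64 is the
    rightmost bit of the second word.
    """
    # Put the two words side by side, first word on the left.
    combined = (int(first_word, 16) << 32) | int(second_word, 16)
    bits = format(combined, "064b")
    # Count positions from the left, starting at 1.
    return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]


# Partial table: only the flags that appear in this thread.
FLAG_NAMES = {
    3: "Hard error",
    4: "Medium",
    6: "Write failure",
    9: "Write protect",
    39: "Diagnostics required",
}

for words in [
    ("0x24000000", "0x02000000"),  # -> [3, 6, 39]
    ("0x34000000", "0x00000000"),  # -> [3, 4, 6]
    ("0x00800000", "0x00000000"),  # -> [9]
]:
    for flag in decode_tape_alert(*words):
        print(words, flag, FLAG_NAMES.get(flag, "see the TapeAlert document"))
```

Any flag not in the table still needs to be looked up in the TapeAlert document linked above.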

dskwan · ‎11-27-2014

Hi All,

Thanks for your help. We ended up replacing both tape drives. Drive B encountered a media position error, so we opted to change both.

Thanks.
