
drive index 0, Data error (cyclic redundancy check).

dskwan
Level 3

Hi,

 

I have a media server with 2 drives, A and B. It had been running fine until recently, when Drive A started repeatedly hitting the error "drive index 0, Data error (cyclic redundancy check)". Small to medium backups complete without problems, but large backups always start to slow down at around 1.3 TB and then either drag on until the end or fail with the above error.

 

OS - Windows 2008 R2

Netbackup 7.5.0.4

Event log error - The device, \Device\Tape0, has a bad block or The device, \Device\Tape1, has a bad block.

 

Can you advise whether this is a faulty tape drive or some other issue?

 

Thanks and regards,

Kwan

 


Marianne
Level 6
Partner    VIP    Accredited Certified

Looks like a faulty drive, but the logs will confirm.

Check whether the bptm log folder exists on the media server under netbackup\logs. If not, create the folder.
We are looking for TapeAlert errors/messages in the log.

Add a VERBOSE entry to volmgr\vm.conf and restart the NBU Device Manager service. This will increase logging to the Event Viewer System and Application logs.

Check the ..\netbackup\db\media\errors file.
Please post the entries for the last couple of days.
This will give some indication of where/what the problem is.

dskwan
Level 3

Hi Marianne,

Thanks for the reply. Here are the entries from the errors file.

11/03/14 09:33:41 0149L5 1 WRITE_ERROR IBM.ULT3580-HH5.004
11/03/14 09:33:51 0149L5 1 TAPE_ALERT IBM.ULT3580-HH5.004 0x00800000 0x00000000
11/10/14 12:46:47 0119L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/10/14 12:46:53 0119L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/13/14 07:54:18 0027L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/13/14 07:54:20 0027L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/17/14 12:34:49 0237L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/17/14 12:34:51 0237L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/20/14 15:46:00 0026L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/20/14 15:46:02 0026L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/22/14 09:39:27 0031L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/22/14 09:39:29 0031L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/23/14 17:49:57 0032L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/23/14 17:49:59 0032L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x34000000 0x00000000
11/24/14 04:39:21 0238L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/24/14 04:39:23 0238L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/24/14 12:21:57 0239L5 0 WRITE_ERROR IBM.ULT3580-HH5.002
11/24/14 12:21:59 0239L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000
11/26/14 14:22:34 0024L5 0 WRITE_ERROR IBM.ULT3580-HH5.002

Regards,

Kwan

revarooo
Level 6
Employee

What is logged in your system messages or event logs at the times these Tape alerts are appearing?

I'd say this is almost certainly hardware. The issue is coming from the tape drive, hence the tape alerts.

Marianne
Level 6
Partner    VIP    Accredited Certified

Hopefully Martin will be along soon...

Quite a busy day ahead - will check a bit later....

 

Docs that you can review in the meantime:
http://www.symantec.com/docs/TECH48603 
http://www.t10.org/ftp/t10/document.02/02-142r0.pdf

revarooo
Level 6
Employee

Tape Alert Technote:

 

http://www.symantec.com/docs/TECH124594

 

mph999
Level 6
Employee Accredited

Did somebody call me ... ;0)

The tape alerts translate to :

0x24000000 0x02000000

Flag 3: Hard error. Severity: Warning
Flag 6: Write failure. Severity: Critical
Flag 39: Diagnostics required. Severity: Warning

 

0x34000000 0x00000000

Flag 3: Hard error. Severity: Warning
Flag 4: Medium. Severity: Critical
Flag 6: Write failure. Severity: Critical

 

0x00800000 0x00000000

Flag 9: Write protect. Severity: Critical

We can ignore the last one (write protect) as it is not connected to this problem.

In summary, I would say you have a faulty drive.  I have never in >10 years seen a drive throw a CRC error that didn't turn out to be faulty hardware.

Regards,

Martin

 

 

mph999
Level 6
Employee Accredited

 


Running the errors file snippet through tperr.sh (available in the forum, just search for it) ...

0031L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0032L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0024L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 1)
0026L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0027L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0119L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0237L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0238L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0239L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)
0149L5 has had errors in 1 different drives   (Total occurrences (errors) of this volume is 2)


IBM.ULT3580-HH5.004 has had errors with 1 different tapes   (Total occurrences (errors) for this drive is 2)
IBM.ULT3580-HH5.002 has had errors with 9 different tapes   (Total occurrences (errors) for this drive is 17)

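So I think it is fairly clear that the drive (IBM.ULT3580-HH5.002), not the tapes, is having issues: it has failed with 9 different tapes, and none of those tapes has had errors in any other drive.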
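For anyone without tperr.sh to hand, the same per-drive tally can be sketched in a few lines of Python. This is not tperr.sh itself, only an illustration of the idea. It assumes the field layout seen in the errors file posted above (date, time, media ID, drive index, error type, drive name), and the function name `summarize_by_drive` is made up for this example.

```python
from collections import defaultdict

def summarize_by_drive(lines):
    """Return {drive_name: (distinct_tapes, total_entries)} for errors-file lines."""
    tapes = defaultdict(set)
    errors = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        # Assumed layout: date time media_id drive_index error_type drive_name [...]
        media_id, drive_name = fields[2], fields[5]
        tapes[drive_name].add(media_id)
        errors[drive_name] += 1
    return {drive: (len(tapes[drive]), errors[drive]) for drive in errors}

# First six entries from the errors file posted in this thread.
sample = [
    "11/03/14 09:33:41 0149L5 1 WRITE_ERROR IBM.ULT3580-HH5.004",
    "11/03/14 09:33:51 0149L5 1 TAPE_ALERT IBM.ULT3580-HH5.004 0x00800000 0x00000000",
    "11/10/14 12:46:47 0119L5 0 WRITE_ERROR IBM.ULT3580-HH5.002",
    "11/10/14 12:46:53 0119L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000",
    "11/13/14 07:54:18 0027L5 0 WRITE_ERROR IBM.ULT3580-HH5.002",
    "11/13/14 07:54:20 0027L5 0 TAPE_ALERT IBM.ULT3580-HH5.002 0x24000000 0x02000000",
]

for drive, (n_tapes, n_errors) in summarize_by_drive(sample).items():
    print(f"{drive}: errors with {n_tapes} different tapes, {n_errors} total occurrences")
```

A drive that fails with many different tapes points at the drive; a tape that fails in several different drives points at the tape.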

mph999
Level 6
Employee Accredited


This is how you convert them:

0x24000000 0x02000000

Write out each hex digit in binary (so 2 = 0010, 4 = 0100):

0x24000000
0010 0100 0000 0000 0000 0000 0000 0000

0x02000000
0000 0010 0000 0000 0000 0000 0000 0000

Write them out side by side:

0010 0100 0000 0000 0000 0000 0000 0000 0000 0010 0000 0000 0000 0000 0000 0000

From left to right, count the positions of the 1s. There is a 1 at the 3rd, 6th and 39th digits from the left, hence flags 3, 6 and 39.

So we now look in a document such as this:

http://www.t10.org/ftp/t10/document.02/02-142r0.pdf

We see here that flags 3, 6 and 39 are:

Flag 3
Hard Error
The operation has stopped because an error has occurred while
reading or writing data which the drive cannot correct.

Flag 6
Write Failure
The tape is from a faulty batch or the tape drive is faulty.
Use a good tape to test the drive; if the problem persists,
call the tape drive supplier helpline.

Flag 39
Diagnostics Required
The tape drive may have a fault. Check for availability of
diagnostic information and run extended diagnostics if
applicable.
Check the tape drive user's manual for instructions on running
extended diagnostic tests and retrieving diagnostic data.
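If you have a lot of these to convert, the manual steps above can be sketched in Python. This is just an illustration of the bit counting, not a NetBackup tool. The function name `decode_tape_alert` is made up for this example, and the name table only covers the flags that came up in this thread; look anything else up in the T10 document.

```python
# Names for the flags seen in this thread; other flags are in the T10 document.
FLAG_NAMES = {
    3: "Hard error",
    4: "Medium",
    6: "Write failure",
    9: "Write protect",
    39: "Diagnostics required",
}

def decode_tape_alert(word1, word2):
    """Return the TapeAlert flag numbers set in the two 32-bit hex words."""
    # Write both words out as 64 binary digits, side by side.
    bits = f"{int(word1, 16):032b}{int(word2, 16):032b}"
    # Flag N is set when the Nth digit from the left is a 1.
    return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]

for words in [
    ("0x24000000", "0x02000000"),
    ("0x34000000", "0x00000000"),
    ("0x00800000", "0x00000000"),
]:
    flags = decode_tape_alert(*words)
    names = [f"Flag {f}: {FLAG_NAMES.get(f, 'see T10 document')}" for f in flags]
    print(" ".join(words), "->", "; ".join(names))
```

The three value pairs are the ones from the errors file posted earlier in the thread.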

dskwan
Level 3

Hi All,

 

Thanks for your help. We ended up replacing both tape drives. Drive B encountered a media position error, so we opted to change both.

 

Thanks.