
Dell Quantum Ultrium LTO 4 Soft Write Errors BE 12.0 SP2

Josh-SA
Level 3

I have a Dell Quantum Ultrium 4 SAS-attached Tape Drive connected to a Dell PowerEdge 2950 Server. The OS is Windows Server 2008 Enterprise 32-bit and the Veritas program is V12.0 with all updates and latest drivers installed, including SP2.

 

The Tape Drive is brand new and the Media is also brand new. I am encountering numerous soft write errors. Just doing a Quick Erase results in around 30 soft write errors. A 1TB backup results in over 18000 soft write errors. I have tested the tape drive and media with various Dell utilities, including Full XTalk Diagnostics, and everything passes fine.

 

I have spent hours searching for a solution and tried both Dell and Symantec Drivers. I have tried with SCSI pass-through mode enabled and disabled. I have tried enabling and disabling single block read and write modes. Nothing makes any difference. The only thing I haven’t changed is default block and buffer sizes.

 

ANY assistance would be greatly appreciated!

19 REPLIES 19

dcox
Level 3

Josh, thanks for posting this; I am seeing the same thing (I thought I was alone), let me explain my scenario.

 

I have Windows Server 2008 x64 (all updates)

attached to a Quantum LTO-4 HH [firmware 2170] (using 1RU rack enclosure P/N: TC-L43CN-EY)

external SAS cable attached via LSI SAS3442E-R [firmware 1.26]

Backup Exec 12.5 SP1

 

I get ~9 soft errors on a quick erase.

I get ~4500 soft errors on a 250GB backup.

 

I also have another LTO4-HH drive.  It's running the "shipped" firmware (version not relevant) and seeing the same issues.  So, I can safely say "it's not the drive".

I'm only using quantum tapes, so "it shouldn't be the tapes" we can assume.

I also have a legal copy of BackupExec 12.0, and guess what, yeah same thing, same # of soft errors.

 

Let me further attest: I too have run the xTalk diagnostics (full diags) and everything tests fine.

I have attempted to adjust the drive properties in Backup Exec, just as you explained, no help.

I have also tested the Quantum and Symantec drivers, as well as kernel mode drivers, no help.

 

I am at a loss on this.

I have tried to reformat the machine and start from scratch, problem persists.

 

What controller card are you using?

 

I also have a second issue, if you could test to see if you have this too, it would be great.

Of course, it may be a 12.5 issue, I haven't tested this issue with 12.0

 

Tape drive is online, tape inserted.

Do a quick erase, all is ok.

Inventory drive, all is ok.

Eject tape, all is ok.

Inventory drive, all is ok.

Re-insert same tape, and now inventory drive, ERROR:  Hardware is offline (etc, etc).

run c:\program files\symantec\backup exec\bestop

run c:\program files\symantec\backup exec\bestart

Hardware is back online, and...

Inventory drive, all is ok.

 

Thanks

Duane

{Removed to prevent spam}

Message Edited by dcox on 01-20-2009 07:53 PM
Message Edited by dcox on 01-20-2009 08:02 PM
Message Edited by IanSee on 01-21-2009 03:44 PM

Josh-SA
Level 3

Hello Duane

 

I just did the test regarding your "second issue". I do not have the same problem - the drive stays online.

 

My Controller Card is a Dell SAS 5/E. Don't have more details on it at moment.

 

Thanks

Josh

dcox
Level 3

Interesting, although possibly pure coincidence (as there are several other common threads between our systems), it would appear that the "Dell SAS 5/E Adapter Controller" is the one and the same LSI SAS3442 that I am using.

I don't have a second controller to test, have you tried that?

 

Thanks for the "second issue" test, I suspect it may be related to 12.5 (I will test 12.0 myself today).

 

Thanks,

Duane

 

dcox
Level 3

I believe I've eliminated my second issue.

It was related to backup exec 12.5 and using the "symantec drivers".

I do not see that issue anymore with quantum driver v3.4

 

But I DO still see the soft errors...

 

I thought I had another controller to try- but at this time, I do not.

Josh-SA
Level 3

Hi

I wish I had another controller card but I do not. I may buy one if I get desperate enough...

Thanks

Josh

dcox
Level 3

Do you have any symantec support?  Have you opened a ticket?

Have you called and if so, what was the response?

Josh-SA
Level 3
No I did not buy "software assurance" at the time since I was foolishly under the impression that this only entitled me to upgrades but that support came "free". So now I cannot open a ticket...

dcox
Level 3

Yeah, I understand that...

Let me open a ticket and create a case tomorrow.

I don't expect to find a solution over the phone, but perhaps I can report this as a bug...

 

Josh-SA
Level 3
Thanks a stack! I suspect if they can solve your problem, the solution will be the same for me. Or perhaps, like you say, it is a bug in which case they will hopefully bring out a patch in the not-too-distant future. Thanks again...

David3133
Level 2

Well, you are off to at least a good start. The Xtalk diagnostics can utilize the TapeAlert specification (full specs are at the http://www.tapealert.org site if you want to understand what everything means). Did you actually run the tests to verify the tape drive isn't reporting a problem?

 

Anyway, since you mentioned SCSI pass-through, I won't hold back on geeking out.

Here are some things to examine to take it to the next level

* Log pages for the tape target device, so you can see the number of blocks read and written along with all the individual error counters. You say "soft error". That isn't complete; there are many types of errors that people bundle under the generic "soft error" label. What exactly are the sense ASC/ASCQ values?

 

 

* Since the drive is SAS attached, there is a lot of info you can look at that will help either eliminate possibilities or identify probabilities. Since you have a Dell, you probably have the LSI chipset, which means you can query the chip via the BIOS or add-on software to look at the transport.
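
For anyone who wants to look at the individual counters rather than a single "soft error" total, here is a minimal Python sketch of decoding the Write Error Counters log page (02h). It assumes you have already captured the raw LOG SENSE response bytes with some pass-through tool (not shown). The sample bytes and the `_param` helper are made up for illustration, and the parameter names follow the SPC standard's definitions as best understood here, so treat it as a starting point rather than a reference decoder.

```python
import struct

# Parameter codes of the Write Error Counters log page (page code 02h),
# as defined in the SCSI Primary Commands (SPC) standard.
WRITE_ERROR_COUNTERS = {
    0x0000: "errors corrected without substantial delay",
    0x0001: "errors corrected with possible delays",
    0x0002: "total rewrites or rereads",
    0x0003: "total errors corrected",
    0x0004: "total times correction algorithm processed",
    0x0005: "total bytes processed",
    0x0006: "total uncorrected errors",
}


def parse_write_error_page(raw: bytes) -> dict:
    """Decode a raw LOG SENSE response for page 02h into named counters."""
    page_code = raw[0] & 0x3F
    if page_code != 0x02:
        raise ValueError(f"expected log page 02h, got {page_code:02x}h")
    (page_length,) = struct.unpack_from(">H", raw, 2)
    end = min(len(raw), 4 + page_length)
    counters = {}
    offset = 4  # parameters start right after the 4-byte page header
    while offset + 4 <= end:
        # Each parameter: 2-byte code, 1 control byte, 1 length byte, value.
        code, _control, length = struct.unpack_from(">HBB", raw, offset)
        value = int.from_bytes(raw[offset + 4:offset + 4 + length], "big")
        counters[WRITE_ERROR_COUNTERS.get(code, f"parameter {code:04x}h")] = value
        offset += 4 + length
    return counters


def _param(code: int, value: int, length: int) -> bytes:
    """Build one log parameter (hypothetical helper for the sample below)."""
    return struct.pack(">HBB", code, 0, length) + value.to_bytes(length, "big")


# Made-up sample bytes, NOT captured from a real drive.
body = _param(0x0003, 18000, 4) + _param(0x0005, 10**12, 8) + _param(0x0006, 0, 4)
sample = bytes([0x02, 0x00]) + struct.pack(">H", len(body)) + body

for name, value in parse_write_error_page(sample).items():
    print(f"{name}: {value}")
```

If the "total uncorrected errors" counter stays at zero while the corrected counters climb, that would at least suggest the drive is recovering every error on its own.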

 

It could be some screwy mode page settings. Have you verified that they are appropriate? There is a lot that can go wrong. I copied the mode page dump below from a site to give you an idea of some of the settings that could be made. Since I don't own an Ultrium 4, I have no idea if the settings are all appropriate, but you can find that out by asking. Anyway, things like burst size, timeout limits, and write delay should be checked.


18000 soft write errors for 1TB isn't the end of the world; at least the errors are soft. What is the SAS queue depth? Is it possible you have a poor cable and are bouncing between 1.5 and 3.0 Gbit/sec? Can you get a reasonably similar frequency of errors when you repeat a smaller test? Are your SAS cables and connectors rather long, and did you try to save money by getting junk cables?
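
On the frequency question, the two backup figures already posted in this thread can be put on a common scale. This is only a rough sketch: both counts were reported as approximate ("over 18000" and "~4500"), and it assumes 1 TB = 1000 GB.

```python
def errors_per_gb(soft_errors: int, gigabytes: float) -> float:
    """Soft write errors per gigabyte written."""
    return soft_errors / gigabytes


# Figures as reported earlier in the thread (approximate).
reports = {
    "Josh-SA, 1 TB backup": (18000, 1000.0),  # assumes 1 TB = 1000 GB
    "dcox, 250 GB backup": (4500, 250.0),
}

rates = {name: errors_per_gb(errors, gb) for name, (errors, gb) in reports.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} soft errors per GB")
```

Both setups work out to roughly 18 soft errors per GB, on different servers, drives, and cables. A rate that uniform looks more like something systematic than a marginal cable, though two data points do not prove it.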

 

Disconnect-Reconnect                     : Page [02h] (Current)

 Buffer full ratio                       : 0 {R/O}

 Buffer empty ratio                      : 0 {R/O}

 Bus inactivity limit                    : 0 {R/O}

 Disconnect time limit                   : 0

 Connect time limit                      : 0 {R/O}

 Maximum burst size                      : 494

 Enable modify data pointers (EMDP)      : 0 {R/O}

 Fair arbitration                        : 0 {R/O}

 Disconnect immediate (DImm)             : 0 {R/O}

 Data transfer disconnect control (DTDC) : 0 {R/O}

 First burst size                        : 0 {R/O}

 

Data Compression                         : Page [0Fh] (Current)

 DCE                                     : 0 {R/O}

 DCC                                     : 0 {R/O}

 DDE                                     : 0 {R/O}

 RED                                     : 0 {R/O}

 Compression algorithm                   : 00000000h

 Decompression algorithm                 : 00000000h

 

Tape Control                             : Page [10h] (Current)

 Change active partition (CAP)           : 0

 Change active format (CAF)              : 0

 Active format                           : 8

 Active partition                        : 0 {R/O}

 Write buffer full ratio                 : 0 {R/O}

 Read buffer empty ratio                 : 0 {R/O}

 Write delay time                        : 45

 Data buffer recovery (DBR)              : 0 {R/O}

 Block identifiers supported (BIS)       : 1 {R/O}

 Report setmarks (RSMK)                  : 1

 Automatic velocity control (AVC)        : 0 {R/O}

 Stop on consecutive filemarks (SOCF)    : 0 {R/O}

 Recover buffer over (RBO)               : 0 {R/O}

 Recover error warning (REW)             : 0 {R/O}

 Gap size                                : 0 {R/O}

 EOD Defined                             : 0 {R/O}

 Enable EOD generation (EEG)             : 1 {R/O}

 Synchronize early warning (SEW)         : 1 {R/O}

 Soft write protect (SWP)                : 0 {R/O}

 Buffer size at early warning            : 000000h

 Data compression algorithm              : 00h

 Associated write protect (ASOCWP)       : 0 {R/O}

 Persistent write protect (PERSWP)       : 0 {R/O}

 Permanent write protect (PRMWP)         : 0 {R/O}

 

Medium Partition                         : Page [11h] (Current)

 Maximum additional partitions           : 1 {R/O}

 Additional partitions defined           : 0 {R/O}

 Fixed data partitions (FDP)             : 0 {R/O}

 Select data partitions (SDP)            : 0 {R/O}

 Initiator-defined partitions (IDP)      : 0 {R/O}

 Partition size unit-of-measure (PSUM)   : 2 {R/O}

 Partition on format (POFM)              : 0 {R/O}

 CLEAR                                   : 0 {R/O}

 ADDP                                    : 0 {R/O}

 Medium format recognition               : 03h

 Partition Units                         : 0 {R/O}
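
To compare a dump like the one above against another drive, or against what a vendor recommends, it may help to pull out only the fields that are not marked {R/O}, since those are the only ones you can actually change. Here is a minimal Python sketch, assuming the text layout shown above ("name : value {R/O}" under a "Page [xxh]" header); the function names are hypothetical and the sample is a few lines taken from that dump.

```python
import re

# "Tape Control    : Page [10h] (Current)" starts a new mode page.
PAGE_RE = re.compile(r"^\s*(?P<page>.+?)\s*:\s*Page \[(?P<code>[0-9A-Fa-f]{2})h\]")
# " Write delay time    : 45" or " Gap size    : 0 {R/O}" is one field.
FIELD_RE = re.compile(r"^\s*(?P<name>.+?)\s*:\s*(?P<value>\S+)\s*(?P<ro>\{R/O\})?\s*$")


def parse_mode_dump(text: str) -> dict:
    """Return {page name: {field name: (value, read_only)}}."""
    pages = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # the dump is double-spaced; skip blank lines
        header = PAGE_RE.match(line)
        if header:
            current = pages.setdefault(header["page"], {})
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field["name"]] = (field["value"], field["ro"] is not None)
    return pages


def changeable_fields(pages: dict) -> dict:
    """Keep only the fields that are not flagged read-only."""
    return {
        page: {name: value for name, (value, read_only) in fields.items() if not read_only}
        for page, fields in pages.items()
    }


sample = """\
Disconnect-Reconnect                     : Page [02h] (Current)
 Buffer full ratio                       : 0 {R/O}
 Maximum burst size                      : 494
Tape Control                             : Page [10h] (Current)
 Write delay time                        : 45
 Report setmarks (RSMK)                  : 1
 Gap size                                : 0 {R/O}
"""

for page, fields in changeable_fields(parse_mode_dump(sample)).items():
    print(page, fields)
```

Run against the full dump, this would list burst size and write delay among the changeable fields, which are the ones mentioned above as worth checking.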

 

 

 

David3133
Level 2

I forgot to add, don't waste money on a new controller until you make sure you have current firmware. The software drivers for the LSI-manufactured SAS chipsets are relatively stable. There are some "issues" with older firmware that can be the root cause. I would look into SAS firmware before going further.

 

dcox
Level 3
I'm running the latest firmware on both the controller and the tape drive. I understand your intentions are good, but I suspect the problem to be with the Backup Exec program itself. I'm suspecting the problem is with SCSI commands, but I'm not sure what software I can run to dig deeper... any recommendations?

dcox
Level 3

I forgot to mention, since I have two of these drives (Quantum LTO4-HH), I took the other drive and attached it via an internal cable to the internal connector on the same LSI controller. I wanted to eliminate my cable setup as the problem. I ran the same kinds of tests, quick erase and backup, and incurred the same number of errors. It's not the cables and it's not the drives; both scenarios accumulated the same number of errors.

 

Duane

 

marcusdempsey
Level 3

Hi,

I have exactly the same issue, where I experience a huge number of soft read/write errors: 2 x Quantum LTO4-HH drives enclosed in a PowerVault 114T enclosure. I have Backup Exec 12.5 SP1 and a Dell SAS 5/E Controller which the drives are connected to.

My firmware is currently up to date on both the drives and controller.  I'm currently talking to Dell about this issue but no luck as of yet.

Has anyone made any more progress on this?

Marcus 

thegoolsby
Level 6
Employee Accredited

Hello all.

We are working on getting this issue addressed. There is a public document that you can all subscribe to for this issue:

Large number of soft-write errors are reported when running tape operations.
http://support.veritas.com/docs/321554

Please make sure that you have the Verify option enabled on your backup jobs, even after we resolve this. That is the most accurate way to ensure that the data is being written properly to your backup media, and the most efficient way to be made aware of a failing drive or bad media.

Thanks to those who brought this to our attention. You can safely disregard these errors for the time being, it appears to be just a reporting issue on our part, and something we hope to have resolved soon.

-Collin

 

marcusdempsey
Level 3
Collin,

No pressure on you, but do you have an estimated time for resolving this issue? And will it be resolved as a hotfix/service pack?

Marcus

thegoolsby
Level 6
Employee Accredited
Marcus,

If I promised a date, Murphy's Law would strike and the patch wouldn't get released - so I unfortunately cannot do that.

I do feel comfortable saying that it shouldn't be long, and development has already begun working on the fix. It should come down as a hotfix in Live Update.

Whatever the final game plan ends up being, the tech note will get updated to reflect the change.

-Collin

phamen
Not applicable

Can we get an update on this please? It has been two weeks.

 

 

Thanks

Phamen

639102202
Not applicable
I am seeing the exact same scenario, same drive and Dell cards, tons of soft read errors; though my backups seem to be failing with:

E00084c7- Read/Write errors have occured
E00084ec - Read/Write errors have occured

The tape drive is about 1 month old and I have tried numerous brand new tapes.  BUE 12.5 just went to SP3.