
Backup Exec 2012 Offlining HP Tape Drive Randomly

Matt_Freestyle
Level 4

Hi All. First post here and I am really, really hoping for some help.

I am working on an environment that appears to have the fairly common problem of Backup Exec offlining an HP Autoloader at seemingly random times. Three weeks ago another tech fixed this issue by disabling 4 recommended HP services, and it worked fine. The other night, the issue reappeared despite nothing having changed - weird.

Here is the setup:

Server 2K8R2 BE 2012 SP1
HP Ultrium1760 DRV tape device
HP 1/8 G2 Autoloader
HP P212 SAS controller

Here is the (comprehensive) list of things I / we have tried to no avail:

1 - Replaced all hardware at a very early stage of the problem (has been ongoing for months now)
2 - Ensured Backup Exec's own drivers are being used for the device
3 - Set DB Maintenance to run way outside of the backup routine
4 - Un-installed and re-installed tape drive + autoloader
5 - Disabled HP services as recommended by Symantec
6 - Attempted to back up to disk - this works
7 - Opened 3 separate cases with Symantec, none of which fully resolved the issue, as it recurred
8 - Checked the ADAMM.log file and found this error just before the device offlined: 

[4608] 08/02/12 01:15:04.389 DeviceIo: 04:07:00:00 - Device error 1167 on "\\.\Tape0", SCSI cmd 4d, 1 total errors
[4608] 08/02/12 01:15:13.063 PvlDrive::DisableAccess() - ReserveDevice failed, offline device
Drive = 1033 "Tape drive 0001"
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

9 - Tried running SGMON with verbose device and media logging enabled - this didn't go well. The device didn't error, but none of the backup jobs completed.
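
For anyone reading ADAMM.log entries like the ones in item 8, the numbers can be decoded by hand. Below is a minimal Python sketch, assuming the "Device error" value is a decimal Windows system error code and the "SCSI cmd" value is a hexadecimal SCSI opcode (the thread does not state either). The lookup tables cover only the codes that appear in this thread, and `decode_device_io` is a hypothetical helper name, not part of Backup Exec.

```python
import re

# Windows system error codes that appear in this thread (values from winerror.h).
WIN32_ERRORS = {
    6: "ERROR_INVALID_HANDLE",
    31: "ERROR_GEN_FAILURE",
    1167: "ERROR_DEVICE_NOT_CONNECTED",
}

# SCSI operation codes that appear in this thread.
SCSI_OPCODES = {
    0x16: "RESERVE(6)",
    0x4D: "LOG SENSE",
}

# Matches the "DeviceIo" lines quoted from ADAMM.log in this thread.
DEVICE_IO = re.compile(
    r'Device error (?P<err>\d+) on "(?P<dev>[^"]+)", '
    r'SCSI cmd (?P<cmd>[0-9a-fA-F]{2}), (?P<total>\d+) total errors'
)


def decode_device_io(line):
    """Return a dict describing a DeviceIo error line, or None if it does not match."""
    match = DEVICE_IO.search(line)
    if match is None:
        return None
    err = int(match.group("err"))       # assumed to be decimal
    cmd = int(match.group("cmd"), 16)   # assumed to be hex
    return {
        "device": match.group("dev"),
        "error_code": err,
        "error_name": WIN32_ERRORS.get(err, "unknown"),
        "scsi_opcode": cmd,
        "scsi_command": SCSI_OPCODES.get(cmd, "unknown"),
        "total_errors": int(match.group("total")),
    }


if __name__ == "__main__":
    sample = (r'[4608] 08/02/12 01:15:04.389 DeviceIo: 04:07:00:00 - '
              r'Device error 1167 on "\\.\Tape0", SCSI cmd 4d, 1 total errors')
    print(decode_device_io(sample))
```

Under those assumptions, the first log line decodes to ERROR_DEVICE_NOT_CONNECTED on a LOG SENSE command, which would be consistent with the drive briefly dropping off the bus just before the reserve attempt failed.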

The only thing I think we haven't tried is running tracer.exe after seeing the device go offline.

I read in Symantec's documentation that they don't support SAS controllers with RAID enabled, and the P212 is one of those controllers. However, in the same document there was a list of tested controllers and the P212 is on that list, so I'd be a little cheesed if Symantec said it was a compatibility problem despite having a document that says it should work.

I've also checked all the usual places (event log, BE job / device logs etc.) for more info and there really isn't much to go on. HP's testing tools always come back with passes when run. I'm 99% sure that this isn't a hardware issue, but I have nowhere left to go with it.

Could someone please help me out? 

P.S. Just as an FYI - this Backup Exec instance came from an upgrade of 2010; it wasn't a brand new install. Don't know if that matters.

Thanks in advance,

Matt

1 ACCEPTED SOLUTION


Matt_Freestyle
Level 4

Hi All,

Sorry to drag this up from the grave. Also, sorry for not responding. The notification of replies started going into my junk for some reason!

Okay, so I never did get the issue resolved on the existing hardware. In the end HP diagnosed a "low level hardware issue". We shipped the customer an old server, whacked a new P212 RAID controller into it and connected the library up to it. So far, so good. It's been working for over a month now.

Seems the most likely cause of these SCSI reservation errors is as HP say, a low level hardware issue.

Thanks,

Matt.


31 REPLIES

Backup_Exec1
Level 6
Employee Accredited Certified
Hi,

Please ensure Backup Exec 2012 is fully patched with SP1a and the latest DDI. If you have not installed the latest DDI yet, please install it from the link below:

http://www.symantec.com/docs/TECH189571

Once you have done that, uninstall and reinstall the tape drive using tapeinst, then do a power cycle: power off the library and then the media server, power on the library, wait for it to initialize, and then power on the media server.

http://www.symantec.com/docs/TECH17931

Thanks

Larry_Fine
Moderator
   VIP   

Backup Exec tried to reserve the device and it was rejected.  Therefore BE took the device offline.

[4608] 08/02/12 01:15:04.389 DeviceIo: 04:07:00:00 - Device error 1167 on "\\.\Tape0", SCSI cmd 4d, 1 total errors
[4608] 08/02/12 01:15:13.063 PvlDrive::DisableAccess() - ReserveDevice failed, offline device

This issue is much more likely in a shared SAN, with multiple servers trying to access & share devices.  On a single-server SAS environment, this should not happen.  I know your P212 HBA is on that supported list, but that is where I would focus, as something is interfering with communication.  Are your HBA firmware and driver up to date?  I have heard of issues with HP software & services also.

Might you have another HBA to try?

CraigV
Moderator
Partner    VIP    Accredited

...Larry is thinking of the HP Storage Agents. If you have a ProLiant server that was installed via SmartStart, stop and disable this service.

Might also be worth your while to get hold of HP's Library and Tape Tools. Stop the BE services, and run the diagnostics against the drive to rule out hardware errors on it.

Thanks!

Matt_Freestyle
Level 4

Thanks for the replies guys, sorry for the delay in coming back.

Backup Exec is fully patched yes.

I was headed toward the HBA as well. I have just checked the server over again, and there is a second P212 controller controlling a separate set of disks in a RAID - could this have some sort of impact, do you think?

I have tried the tape tools, Craig; they all came back fine. The drive was replaced ages ago, back in Jan, because Symantec put the issue down to a hardware fault. HP didn't quibble, luckily, and just replaced it, but the issue has since reared its ugly head again.

I'm unsure of the server vendor, or whether it was installed via SmartStart if it is HP - will check that this afternoon and report back. Thanks for the replies so far, please keep them coming! Any ideas appreciated!!!

CraigV
Moderator
Partner    VIP    Accredited

No, it shouldn't have any connection to your issue if HDDs are connected to another RAID controller.

You don't perhaps have access to a dedicated SAS HBA for the drive? If so, you can always connect the drive to this and check again.

 

PS: Would the "clever" person who -1'd me please take the time to PM me and explain why...

Matt_Freestyle
Level 4

I must admit, I couldn't see any reason for the -1 either, Craig!

Unfortunately, Craig, a dedicated HBA isn't an option - we don't have one in the lab anywhere here and neither does the site I am working on. Also, I would have to down the server in production hours, and this too isn't an option. (Tricky, I know!)

I'll double check the firmware this afternoon as well whilst I'm at it. Annoyingly, the tape drive didn't offline Thurs or at all over the weekend when full backups were running. The only things that I changed were the DB maint times and the server's NIC power management settings. Can't see how the latter would change anything at all if the device isn't reserving correctly, but ho hum.

Thanks for the continued assistance.

Matt_Freestyle
Level 4

Chaps - we have a breakthrough! I checked the firmware version of the card and sure enough, it's out of date, massively out of date! Then I found this document from HP... http://tinyurl.com/ceft8fk

It basically explains how any firmware before 3.66 can have sporadic connectivity issues with tape drives. RESULT! Well, sort of. I need to get the firmware installed on site and then I will update this post with my findings. Fingers crossed this sorts it.
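
For checking several servers against that 3.66 threshold, here is a rough Python sketch. It assumes the controller firmware version is reported as dot-separated integers such as "3.66" and that each part compares numerically; how the installed version is read from the P212 is not covered here, and `needs_update` is a hypothetical helper name.

```python
# Firmware threshold taken from the HP document mentioned in this post:
# versions before 3.66 are described as affected.
MIN_FIXED_VERSION = "3.66"


def parse_version(text):
    """Turn a string like '3.66' into a tuple of ints, e.g. (3, 66)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(installed, minimum=MIN_FIXED_VERSION):
    """Return True if the installed firmware is older than the minimum version."""
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    # Example version strings are made up for illustration.
    for version in ("3.52", "3.66", "5.14"):
        status = "update needed" if needs_update(version) else "ok"
        print(version, status)
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "10.0" would sort before "3.66".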

Symantec support - if you read this.. Please please please KB it and use it for future reference, I bet a lot of people that are having these sorts of connectivity issues are suffering exactly the same problem!

CraigV
Moderator
Partner    VIP    Accredited

Good stuff...with regards to Symantec creating a KB of this, why not head to the Ideas section and add it in as an idea?

https://www-secure.symantec.com/connect/backup-and-recovery/ideas

Turls
Level 6

Matt, your information is included in the following Technote..

http://www.symantec.com/docs/TECH24414

Larry_Fine
Moderator
   VIP   

You are welcome.  (I suggested you update your HBA firmware and driver).

Matt_Freestyle
Level 4

Sorry Larry, I should have thanked you in my previous post. I wasn't expecting the thread to be marked as solved, as the fix hasn't been tested yet. I will post back if I have any issues.

Larry_Fine
Moderator
   VIP   

Sorry, I thought you marked it solved.  I was under the impression that only the OP could mark cases as solved?

Matt_Freestyle
Level 4

I think the Symantec Admin must have marked it as solved :(

CraigV
Moderator
Partner    VIP    Accredited

Nope, TAs, OPs and Admins can, and I did (beats a -1 hey?) as that is kind of what the OP was saying (ie. he found the solution!)...I've made the change to reflect the correct post.

Matt_Freestyle
Level 4

Hi Guys, I'm back!

Okay, so the issue isn't resolved. The customer is still getting fairly consistent problems with their backup; however, the drive is not continually offlining anymore after the firmware upgrade to the P212. Things are a little more steady, but still not great.

I cannot find any commonality in the problem; the drive appears to fail either at the beginning of a job or halfway through. When it fails halfway through, the error BE throws says the device isn't connected. ADAMM.log extract below:

 

[7144] 09/03/12 16:13:00.731 PvlDrive::OpenHandle() from device number
Drive = 1033 "Tape drive 0001"
DeviceName = \\.\Tape0
PrimaryName = \\.\Tape0
SecondaryName = \\?\scsi#sequential&ven_hp&prod_ultrium_4-scsi#5&1a9eeae2&0&070000#{53f5630b-b6bf-11d0-94f2-00a0c91efb8b}
ERROR = 0x00000006 (ERROR_INVALID_HANDLE)

 

When the failure occurs at the beginning of a job the failure in the log looks like this:

 

[22812] 08/31/12 13:00:34.438 B: Not caching B2D entity in Storage Manager: 'Test_B2d', key ID 1023, features 0x00000000, disk flags 0x0000001800000001
[24084] 08/31/12 18:00:16.633 DeviceIo: 04:07:00:00 - Device error 6 on "\\.\Tape0", SCSI cmd 16, 45 total errors
[24084] 08/31/12 18:00:21.663 PvlDrive::DisableAccess() - ReserveDevice failed, offline device
Drive = 1033 "Tape drive 0001"
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[24084] 08/31/12 18:00:21.692 PvlDrive::UpdateOnlineState()
Drive = 1033 "Tape drive 0001"
ERROR = The device is offline!

If the drive does go offline it now takes a full restart of the BE services to get it up and running again.
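
One way to look for a pattern across many nights is to pull every offline event out of ADAMM.log and tally them by hour of day. The Python sketch below assumes the timestamp layout seen in the extracts in this thread (`[thread id] MM/DD/YY HH:MM:SS.mmm`) and uses the "ReserveDevice failed, offline device" text as the marker. The function names are hypothetical, and nothing here is a Backup Exec tool.

```python
import re
from collections import Counter
from datetime import datetime

# Leading "[thread id] MM/DD/YY HH:MM:SS.mmm" as seen in the quoted log lines.
STAMP = re.compile(r"^\[\d+\]\s+(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d{3}\s")
MARKER = "ReserveDevice failed, offline device"


def offline_events(lines):
    """Return a datetime for every log line that records the drive going offline."""
    events = []
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.match(line)
        if match:
            events.append(datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S"))
    return events


def count_by_hour(events):
    """Count offline events per hour of day (0-23)."""
    return Counter(event.hour for event in events)


if __name__ == "__main__":
    # The two offline lines quoted in this thread, used as sample input.
    sample = [
        "[4608] 08/02/12 01:15:13.063 PvlDrive::DisableAccess() - ReserveDevice failed, offline device",
        "[24084] 08/31/12 18:00:21.663 PvlDrive::DisableAccess() - ReserveDevice failed, offline device",
    ]
    for hour, count in sorted(count_by_hour(offline_events(sample)).items()):
        print(f"{hour:02d}:00  {count}")
```

Run against a full log (for example by passing `open(path)` as `lines`), a cluster in one hour could then be compared with scheduled tasks such as DB maintenance or job start times.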

I have submitted a support case to HP in case of hardware fault, but HP Library and Tape Tools indicate the autoloader is healthy. 

I logged onto the autoloader interface and noticed it hasn't been powered down in a good while, and also that the system time was half an hour out of sync. Seems trivial, but could this be the cause of my woes?

Thanks in advance,

Matt

CraigV
Moderator
Partner    VIP    Accredited

...is the firmware on your backup device current?

Also, are your cables all undamaged?

Bruce_Thomson
Level 2

Hi Guys,

This is my first post here. We are suffering the exact problem mentioned in this thread. I am about to log another call with Symantec Support (has been logged previously by my colleague).

The hardware is as follows:

HP X1600G2 24TB StorageWorks Server

HP MSL4048 with 2 x HP LTO5 SAS drives (all on current firmware)

Dedicated P414 512MB SAS Controller (current firmware)

HP Insight agent services completely disabled

We have installed all available hotfixes, and SP1a for BES2012. The StorageWorks server is running Windows Storage Server 2008 R2 Std.

Larry_Fine
Moderator
   VIP   

re: Dedicated P414 512MB SAS Controller (current firmware)

Is the P414 a RAID controller?

I couldn't find it in Google.  If it is a RAID controller, it is not listed on http://www.symantec.com/docs/TECH70907, so it wouldn't be supported.

If reservation errors are the issue, I would suspect a hardware or configuration root cause.  BE cannot do anything about a reservation failure.

Bruce_Thomson
Level 2

Hi Larry

Sorry that was a typo... It's a P411 which is supported.