BEXEC 12.5 puts 2nd drive offline in MSL4048 after extended use.

A_ride_4_ever
Level 3
OK, here is a question I hope someone has an answer to. I have an MSL4048 LTO3 FC with 2 drives. What's happening is that after a few hours of using both drives at the same time I lose drive 2, while drive 1 stays online. When I lose drive 2 there are no hardware logs or reports of errors from the HBA or the library itself. Both drives were replaced less than a month ago, and the HBA is new. To get Backup Exec to use the 2nd drive again I have to restart the library and restart the Backup Exec services. It will then run for a while and drop drive 2 again. HP has replaced everything with new parts, and the HBA has been replaced as well. If anyone can come up with a viable solution there's a six pack of beer or Java and a really big thank you in it for you :)

I have included the adamm.txt so you can see what's going on.

1 ACCEPTED SOLUTION

CraigV
Moderator
Partner    VIP    Accredited
Hi there,

Came across this one again. It references high-polling on HP StorageWorks MSLs.
We've had something similar in our data centre using ARCserve 11.5, and this was the fix. Might point you in the right direction.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=uk&taskId=110&prodSeriesId=3936307&prodTypeId=12169&prodSeriesId=3936307&objectID=c01567338

If you make registry changes, don't forget to back it up first!

Laters!

10 REPLIES

CraigV
Moderator
Partner    VIP    Accredited
Hi there,

What is your licensing like? You need to have an additional license for Backup Exec 12.5 to use that second tape drive. BE 12.5 ships with a license for 1 drive...every additional drive = 1 additional license.
Have you looked at running the Device Wizard in Backup Exec to install the latest drivers?

Laters!

A_ride_4_ever
Level 3
I have the library expansion option, and I have updated the drivers from Symantec and the firmware for the library and HBA. HP's solution is to send new hardware, but their last solution only lasted for about 2 months before the problem started again. The only thing I haven't swapped out is the fiber interconnects, because I have never heard of fiber just going bad.

CraigV
Moderator
Partner    VIP    Accredited
OK...next question: how is your backup infrastructure set up?
Do you have SAN-based backups? Are you trying to stream multiple streams to both drives?

A_ride_4_ever
Level 3
The library is connected directly to a QLogic dual-port HBA. I can run a backup to drive 1 for 24 hrs w/o error, but within 2 hrs of starting another job to drive 2 it will go offline and take drive 1 offline with it. I have been speaking with HP about this. They thought it was a PNP error causing the drives to be redetected, and it seemed their solution worked: the library was running fine after they made some registry changes. However, over the weekend I ran our 2 larger jobs (42 and 55 hrs), and about halfway through drive 2 died and drive 1 followed. There are new errors showing up from the command interface now related to a power-on event. It might turn out to be a bad library controller, as I can write directly to the drives using TLL w/o any problems. Lastly, there are no multiple streams going to both drives. The six pack of whatever is still up for grabs :) Well, a 5-pack now; I have no self control.

CraigV
Moderator
Partner    VIP    Accredited
OK...so my only other option is to see if you can downgrade the firmware on the library.
See if that solves the issue. Lastly, just make sure that your cabling is indeed correct, and terminated accordingly.
No settings in BE that are stopping the job, or creating a situation where it takes the drives offline?

marcsonn
Not applicable
Partner
A_ride_4_ever, did you ever get a resolution on this? I've got the same configuration you have, and have the SAME problem: MSL 4048, and at least one of the drives goes offline after a short period of usage. I've been working with HP on this for almost three months. At least I've gotten it down to a process: library goes offline, stop BE services, power off library, power on library, wait 10 minutes, start BE services. For a while we couldn't figure out which combination would get things working again, but this seems to make it go. Once the library is offline and I've stopped the BE services, I run LT&T and it comes back with errors communicating with the library. After power cycling the library, it sees it just fine. From that point, I would think it has to be a communication problem and not a BE problem, but HP is not coming up with anything.
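
For anyone who wants that recovery process written down as a checklist, here is a rough Python sketch that just prints the steps in order. The service names in it are assumptions (check what `sc query` reports on your own media server), and the power cycle itself is still done by hand.

```python
from typing import List

# Assumed Backup Exec service names; verify them on your own media server.
ASSUMED_BE_SERVICES = [
    "BackupExecJobEngine",
    "BackupExecRPCService",
    "BackupExecDeviceMediaService",
]


def recovery_steps(services: List[str], wait_minutes: int = 10) -> List[str]:
    """Return the recovery sequence from the post above as ordered steps."""
    # Stop the services in the order given.
    steps = [f'net stop "{name}"' for name in services]
    # The library power cycle and the wait are manual steps.
    steps.append("MANUAL: power off the library")
    steps.append("MANUAL: power on the library")
    steps.append(f"MANUAL: wait {wait_minutes} minutes")
    # Start the services again in reverse order, so the last one stopped starts first.
    steps.extend(f'net start "{name}"' for name in reversed(services))
    return steps


if __name__ == "__main__":
    for number, step in enumerate(recovery_steps(ASSUMED_BE_SERVICES), start=1):
        print(f"{number}. {step}")
```

The function only builds the list, so it is safe to run anywhere; paste the `net stop` / `net start` lines into an elevated prompt on the media server when you actually need them.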

Just looking to see how you made out on this.

Thanks!

Matt_Alexander
Level 5
A_ride_4_ever did this issue ever get resolved? Please let us know.

CraigV
Moderator
Partner    VIP    Accredited
Any news on this one...?

A_ride_4_ever
Level 3
CraigV, your solution was the closest. What we ended up doing was disabling polling altogether. It took a lot more work than should have been required. It turns out that HP's fiber agent and some of the other HP agents caused excessive TURs, as did QLogic's NAS app. Removing Insight Manager and all of its agents reduced some but not all of the TURs. We then went through the registry and either edited or added the reg key autorun=0 on the Microsoft driver for the library and the Symantec drivers for the tape drives.
This solution worked "for a short time". I ended up building a fresh machine with just the bare minimum, no Insight Manager etc. I installed a fresh download of 12.5 and applied a couple of the patches I knew we would need. The same issue happened; I was shocked, to say the least. We were getting a TUR after every command sent. I purchased a new fiber HBA and that didn't help.
I was finally able to get the system running stable w/o incident after disabling the PNP autorun option on the storport service, the HPN service, and a handful of others. The process has to be repeated after every firmware revision or driver update, as the keys change. I have posted a full writeup of what had to be done on the Spiceworks forums.
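
Since the autorun change has to be redone after every firmware or driver update, a small generator for the `.reg` text may save some typing. This is only a sketch in Python: the value name `AutoRun`, its DWORD type, and the driver service names passed in are assumptions based on the description above, not a confirmed list, so check each key in regedit first. It also prints a `reg export` command per key, so the backup gets taken before anything is changed.

```python
from typing import Iterable

SERVICES_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"


def backup_command(service: str) -> str:
    """Build a `reg export` command that backs up one driver service key."""
    return (
        f'reg export "HKLM\\SYSTEM\\CurrentControlSet\\Services\\{service}" '
        f'"{service}-backup.reg" /y'
    )


def autorun_reg_fragment(driver_services: Iterable[str],
                         value_name: str = "AutoRun") -> str:
    """Build .reg text that sets <value_name> to DWORD 0 on each service key.

    The value name is an assumption taken from the thread; confirm it first.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for service in driver_services:
        lines.append(f"[{SERVICES_KEY}\\{service}]")
        lines.append(f'"{value_name}"=dword:00000000')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical placeholder names; substitute the real changer and tape
    # driver services found on your own system.
    drivers = ["example_changer_driver", "example_tape_driver"]
    for name in drivers:
        print(backup_command(name))
    print()
    print(autorun_reg_fragment(drivers))
```

Nothing here touches the registry; it only prints text, so the output can be reviewed, and the backups run, before anything is imported.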

Thanks for everyone's help with this.