
upgraded to NBU 6.5.3 -- can't eject tapes

todis
Level 4
I recently upgraded both NBU and our hardware. We went from Solaris 10/SPARC with an old Overland PowerLoader to a new Sun x4200 AMD Opteron box running Solaris 10. We were on 6.0 MP4 on the SPARC server. I upgraded to 6.5.3 on the existing server, then did a catalog recovery on the new server to bring everything over. At the same time we dumped the old Overland library and added a NEO 4000, fibre-attached via a QLogic switch. Backups run fine and test restores work fine. However, when Vault runs it will not eject the tapes. I've been working with Symantec tech support, and robtest works fine: we can move tapes back and forth. But when we use NetBackup commands to eject a tape, it fails with "eject aborted". We ran the following command:

"./vmchange -multi_eject -w -res -ml 1517L4 -rt tld -verbose -rn 0 -rh dfw-sun-netback01"

At this point Symantec is suggesting downgrading the firmware on the NEO 4000 to get it closer to the version they used when the NEO series was added to their Hardware compatibility list. I thought I would post and see if anybody has any ideas??

1 ACCEPTED SOLUTION

Accepted Solutions

mph999
Level 6
Employee Accredited
Try these commands ...

I can't test these from home, which is why I've given all three possibilities. If none of them work, and your original command doesn't either, then I think the detailed investigation suggested earlier is required. Vault should be able to eject on its own, so this probably won't solve the issue, but it may at least be a step forward.

Vault only uses "standard" NBU commands, so as far as I know the fault isn't in Vault; it's a NetBackup issue.


vmchange -res -multi_eject -w -verbose -rn 0 -rt tld -rh dfw-sun-netback01 -ml 1517L4  -single_cycle

vmchange -res -multi_eject -w -verbose -rn 0 -rt tld -rh dfw-sun-netback01 -ml 1517L4  -single_cycle -unattended

vmchange -res -multi_eject -w -verbose -rn 0 -rt tld -rh dfw-sun-netback01  -ml 1517L4  -unattended
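For anyone who wants to run the three variants above in sequence, here is a minimal shell sketch. It assumes that vmchange is on the PATH and that it exits non-zero when the eject is aborted, which is not confirmed anywhere in this thread. The VMCHANGE variable and the try_eject_variants function name are illustrative only; the options themselves are exactly those in the three commands above.

```shell
#!/bin/sh
# Sketch: try each eject variant from the post and stop at the first one
# that succeeds. Assumption: vmchange returns non-zero on "eject aborted".

# Hypothetical override so the binary can be swapped for a stub or full path.
VMCHANGE="${VMCHANGE:-vmchange}"
ROBOT_HOST="dfw-sun-netback01"
MEDIA_ID="1517L4"

try_eject_variants() {
    # Each entry is the trailing option set of one variant from the post.
    for extra in "-single_cycle" "-single_cycle -unattended" "-unattended"; do
        echo "Trying: $extra"
        # $extra is left unquoted on purpose so it splits into separate options.
        if "$VMCHANGE" -res -multi_eject -w -verbose -rn 0 -rt tld \
            -rh "$ROBOT_HOST" -ml "$MEDIA_ID" $extra; then
            echo "Succeeded with: $extra"
            return 0
        fi
        echo "Failed with: $extra"
    done
    return 1
}
```

Call try_eject_variants after sourcing the file. If all three fail, the function returns 1, which matches the point above that further investigation would then be needed.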

Thinking about this, NBU is only sending a SCSI command to the robot to eject, nothing particularly clever, so the firmware suggestion may not be far off: there could be some incompatibility between the two. Given what you have said, I think it is a reasonable suggestion until proved otherwise. I also think an escalation is reasonable, as previously suggested.

Martin


7 REPLIES 7

mph999
Level 6
Employee Accredited
The command looks good from memory ...  

The firmware could be worth a go, if nothing else it may get things up and running quicker.

I'd request an escalation to backline; they can look in the code and see what has to happen to produce the "Eject Aborted" message. A truss on the tldcd process might be useful.

The trick to solving this one is to find out what changed in NBU between the versions, something probably only backline can determine.
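A minimal sketch of attaching truss to tldcd on Solaris, for anyone following along. The output path, the TRUSS and PGREP override variables, and the trace_tldcd function name are hypothetical; it assumes tldcd is already running on the robot control host and that you run it as root.

```shell
#!/bin/sh
# Sketch: attach truss to the running tldcd daemon and log its system calls.

# Hypothetical overrides, useful for pointing at full paths or stubs.
TRUSS="${TRUSS:-truss}"
PGREP="${PGREP:-pgrep}"
TRUSS_OUT="${TRUSS_OUT:-/tmp/tldcd.truss}"   # assumed output location

trace_tldcd() {
    # -x matches the process name exactly; take the first PID if several exist.
    pid=$("$PGREP" -x tldcd | head -n 1)
    if [ -z "$pid" ]; then
        echo "tldcd is not running" >&2
        return 1
    fi
    # -f follows child processes, -o writes the trace to a file,
    # -p attaches to an existing process by PID.
    "$TRUSS" -f -o "$TRUSS_OUT" -p "$pid"
}
```

Run trace_tldcd in one shell, reproduce the failing eject from another, then interrupt truss and send the output file to support.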

Sorry I can't help with an actual solution in this case.

Martin


todis
Level 4
Martin,

Thanks for your feedback. I checked with Overland and I can only revert one firmware level back because of the LTO-4 drives. Not sure if this will make a difference. We'll see what happens. I will post the results.

Stumpr2
Level 6
Carefully check the volume group for any extra 0 (zeros) in the name.

todis
Level 4
Update -- I was able to downgrade the NEO 4000 from firmware version 6.04 to 6.02 and everything ran fine. At this point Symantec wants to figure out whether the robot simply needed a reset or whether the firmware fixed the problem. We didn't try power cycling the robot before the downgrade, and I also shut the server down during the downgrade. I think we're going to put the latest firmware back on the library and see if the problem comes back.


mph999 -- thanks for confirming that the firmware could be the problem. I wouldn't have thought of that and was beginning to wonder if Symantec was just throwing stuff against the wall hoping it would work...

Stumpr -- I don't see any extra zeros.


mph999
Level 6
Employee Accredited
Sometimes you do just have to try things. It's like a broken light bulb: is it the bulb, the switch, the cable or the fuse? You just have to go through things one by one.

If you agree, I would really appreciate it if you marked my post as the solution.

Many thanks,

Martin

todis
Level 4
Update -- Everything worked fine on the 6.02 firmware. For testing Symantec wanted to update the firmware to the latest version (6.0.4) again and retest. I did and everything still works fine. At this point all we can tell is rebooting the server and library fixed the problem. Thanks to all who replied.