Tape drives in AVR. Could not open drive 2 to unload it, Input/output error, it may not be ready

ziro
Level 4

Hello,

 

Two tape drives are in AVR status and I don't know what else to try to find the cause. I tried power-cycling the drives, resetting them and bringing them up, but after a while they go back to AVR status. The library's web GUI shows the drives as online. All other tape drives in the same library are OK, and another two tape drives (in a different library) assigned to the same media server are OK. There is no tape in the affected drives. The drives have been fine since they were first configured a long time ago, and no one has changed anything.

 

# /usr/openv/volmgr/bin/vmoprcmd -d

                                PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart    TLD                -                     No       -         0
  1 hcart    TLD                -                     No       -         0
  2 hcart    AVR                -                     No       -         0
  3 hcart    AVR                -                     No       -         0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
  0 IBMES1_F1_R12         No       -
  1 IBMES1_F2_R01         Yes      -               .
  2 IBMEQ1_F1_R08         No       -
  3 IBMEQ1_F1_R09         No       -
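
For anyone who wants to script this check, here is a minimal sketch that picks the AVR drives out of `vmoprcmd -d` output. It assumes the column layout shown above (drive index, type, then control mode); the function name `drives_in_avr` is made up for illustration and is not part of NetBackup.

```python
from typing import List


def drives_in_avr(vmoprcmd_output: str) -> List[int]:
    """Return the drive indexes whose Control column reads AVR."""
    avr = []
    in_status = False
    for line in vmoprcmd_output.splitlines():
        stripped = line.strip()
        if stripped == "DRIVE STATUS":
            in_status = True
            continue
        if stripped == "ADDITIONAL DRIVE STATUS":
            break
        if not in_status:
            continue
        fields = stripped.split()
        # Data rows start with the drive index, then type, then control mode.
        if len(fields) >= 3 and fields[0].isdigit() and fields[2] == "AVR":
            avr.append(int(fields[0]))
    return avr


# Sample copied from the output above.
SAMPLE = """\
                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart    TLD                -                     No       -         0
  1 hcart    TLD                -                     No       -         0
  2 hcart    AVR                -                     No       -         0
  3 hcart    AVR                -                     No       -         0

                             ADDITIONAL DRIVE STATUS
"""

print(drives_in_avr(SAMPLE))  # [2, 3]
```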

 

output of /var/log/messages

May 10 18:12:34 dkeqdata02 ltid[12995]: could not open drive 2 to unload it, Input/output error, it may not be ready
May 10 18:12:43 dkeqdata02 ltid[13034]: could not open drive 3 to unload it, Input/output error, it may not be ready
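
A rough sketch for pulling the affected drive numbers out of the messages file, assuming the ltid message keeps the exact wording of the two lines above. The helper name `drives_with_open_errors` is hypothetical.

```python
import re

# Pattern built from the two ltid lines above; other NetBackup versions
# may word this message differently.
LTID_OPEN_ERROR = re.compile(
    r"ltid\[\d+\]: could not open drive (\d+) to unload it, (.+?), it may not be ready"
)


def drives_with_open_errors(log_text):
    """Map drive index -> OS error text for each matching ltid line."""
    found = {}
    for line in log_text.splitlines():
        match = LTID_OPEN_ERROR.search(line)
        if match:
            found[int(match.group(1))] = match.group(2)
    return found


SAMPLE_LOG = """\
May 10 18:12:34 dkeqdata02 ltid[12995]: could not open drive 2 to unload it, Input/output error, it may not be ready
May 10 18:12:43 dkeqdata02 ltid[13034]: could not open drive 3 to unload it, Input/output error, it may not be ready
"""

print(drives_with_open_errors(SAMPLE_LOG))
```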

 

 

1 ACCEPTED SOLUTION


mph999
Level 6
Employee Accredited

The only thing I can think of is that the drives would have stayed as TLD until NBU tried to use them, and then they would have gone to AVR.  Had tldcd gone, this would have affected all the drives at the same time.


11 REPLIES

ontherocks
Level 6
Partner Accredited Certified

You have checked from the library console that the drives are empty.

You have tried a reset and brought the drives up, but the issue is still there.

 

Steps to follow:

1. Check whether the drives need cleaning.

2. Clean the tape drives.

3. Is the server OS able to see these drives? Check with the scan command.

4. Delete the two affected drives from NetBackup and reconfigure them with the same drive names.

 

Share the results :)

ManuKM1
Level 3
Certified

It looks like a connectivity issue between the media server and the tape library.

ziro
Level 4

Yes, the drives asked for cleaning, but I don't have a usable cleaning cartridge because all of them are already used up. After the power cycle the tape drives no longer ask for cleaning. All tape drives in the library need cleaning, but only these two are in AVR; all the others are working properly.

I cannot reconfigure the tape drives now because of internal rules and permissions; perhaps during the week.

Here is the scan -tape output:

 

# /usr/openv/volmgr/bin/scan -tape
************************************************************
*********************** SDT_TAPE    ************************
************************************************************
------------------------------------------------------------
Device Name  : "/dev/nst2"
Passthru Name: "/dev/sg51"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry    : "IBM     ULT3580-TD4     97F2"
Vendor ID  : "IBM     "
Product ID : "ULT3580-TD4     "
Product Rev: "97F2"
Serial Number: "0007891211"
WWN          : ""
WWN Id Type  : 0
Device Identifier: "IBM     ULT3580-TD4     0007891211"
Device Type    : SDT_TAPE
NetBackup Drive Type: 3
Removable      : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
------------------------------------------------------------
Device Name  : "/dev/nst3"
Passthru Name: "/dev/sg52"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry    : "IBM     ULT3580-TD4     97F2"
Vendor ID  : "IBM     "
Product ID : "ULT3580-TD4     "
Product Rev: "97F2"
Serial Number: "0007890973"
WWN          : ""
WWN Id Type  : 0
Device Identifier: "IBM     ULT3580-TD4     0007890973"
Device Type    : SDT_TAPE
NetBackup Drive Type: 3
Removable      : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
------------------------------------------------------------
Device Name  : "/dev/nst0"
Passthru Name: "/dev/sg48"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry    : "IBM     ULT3580-TD4     97F2"
Vendor ID  : "IBM     "
Product ID : "ULT3580-TD4     "
Product Rev: "97F2"
Serial Number: "0007892007"
WWN          : ""
WWN Id Type  : 0
Device Identifier: "IBM     ULT3580-TD4     0007892007"
Device Type    : SDT_TAPE
NetBackup Drive Type: 3
Removable      : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
------------------------------------------------------------
Device Name  : "/dev/nst1"
Passthru Name: "/dev/sg49"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry    : "IBM     ULT3580-TD4     97F2"
Vendor ID  : "IBM     "
Product ID : "ULT3580-TD4     "
Product Rev: "97F2"
Serial Number: "0007892037"
WWN          : ""
WWN Id Type  : 0
Device Identifier: "IBM     ULT3580-TD4     0007892037"
Device Type    : SDT_TAPE
NetBackup Drive Type: 3
Removable      : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
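
To make it easier to compare these serial numbers with what the library or the NetBackup drive configuration reports, here is a small sketch that turns `scan -tape` output into a device-to-serial mapping. It assumes the `Device Name` and `Serial Number` line format shown above; `serials_by_device` is an illustrative name only.

```python
import re

# Matches only the two fields we care about, in the format shown above.
FIELD = re.compile(r'^(Device Name|Serial Number)\s*:\s*"(.*)"\s*$')


def serials_by_device(scan_output):
    """Map each device path to the serial number that follows it."""
    mapping = {}
    device = None
    for line in scan_output.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        if key == "Device Name":
            device = value
        elif device is not None:
            mapping[device] = value
    return mapping


# Two records abbreviated from the output above.
SAMPLE_SCAN = '''\
------------------------------------------------------------
Device Name  : "/dev/nst2"
Passthru Name: "/dev/sg51"
Serial Number: "0007891211"
------------------------------------------------------------
Device Name  : "/dev/nst3"
Passthru Name: "/dev/sg52"
Serial Number: "0007890973"
'''

print(serials_by_device(SAMPLE_SCAN))
```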

 

mph999
Level 6
Employee Accredited

Can you post the output of the scan command without the -tape option ...  Just scan on its own.

I suspect the library has lost a couple of drives ...

The section of the scan output that shows the library/changer should list the drives that are 'in' the library by serial number; I suspect you'll only see two serial numbers listed.

AVR usually affects all the drives at once, typically because NBU has lost connectivity to the robot; it's a bit more unusual for only some drives to be affected.  It can be caused by incorrect config, but you haven't mentioned any config changes, hence my theory that you have a library issue.

It's nothing to do with drives needing cleaning; this won't make them go AVR.

Martin

Marianne
Moderator
Partner    VIP    Accredited Certified

I agree with Martin - looks like a robot issue.
I would like to see output of 'scan' as well as 'tpconfig -l'.

Please ensure that you have a VERBOSE entry in vm.conf.
If not, please add it and restart NBU.
This will ensure that all hardware/Media Manager errors are logged in the messages file.

ziro
Level 4

Hi,

Thanks for all of your recommendations. They led me to check the daemons on the media server, and I found that tldd was not running. After I started the daemon, the tape drives came up again.

I hadn't realized that; I assumed all daemons must be OK if two tape drives were fine and two were in AVR, so I investigated elsewhere. It is still a little strange to me that two drives were OK and two were in AVR if the daemon was down. Shouldn't that have put all 4 drives in AVR?
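
In case it helps someone else, a minimal sketch of the daemon check: it takes the process names from something like `ps -e -o comm=` and reports which expected Media Manager daemons are absent. The `EXPECTED` list is an assumption for a media server with a TLD robot (tldcd normally runs only on the robot control host, so it is left out), and `missing_daemons` is a made-up helper name. NetBackup's own `vmps` command lists these processes as well.

```python
from typing import Iterable, List

# Assumed set of daemons for a media server with a TLD robot; adjust for
# your environment (tldcd belongs on the robot control host only).
EXPECTED = ("ltid", "tldd", "avrd", "vmd")


def missing_daemons(ps_comm_output: str, expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected daemon names not present in the ps output."""
    running = {
        line.strip().rsplit("/", 1)[-1]  # drop any leading path
        for line in ps_comm_output.splitlines()
        if line.strip()
    }
    return [name for name in expected if name not in running]


# Hypothetical ps output resembling the situation in this thread.
SAMPLE_PS = "init\nltid\navrd\nvmd\nsshd\n"

print(missing_daemons(SAMPLE_PS))  # ['tldd']
```

On a live server the input could come from `subprocess.run(["ps", "-e", "-o", "comm="], capture_output=True, text=True).stdout`.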

mph999
Level 6
Employee Accredited

The only thing I can think of is that the drives would have stayed as TLD until NBU tried to use them, and then they would have gone to AVR.  Had tldcd gone, this would have affected all the drives at the same time.

Marianne
Moderator
Partner    VIP    Accredited Certified

We would need messages file from verbose Media Manager processes plus other volmgr/debug logs to understand what happened here.

No way to say what happened and why without logs.

It is certainly very unusual what you describe here. Any chance that 'someone' has done some 'fiddling'? Like killing processes?

 

ziro
Level 4

I talked to my colleague; he said that in this backup environment tldd goes down from time to time without any apparent reason or action by anyone. The fix is to start the tldd daemon again, and then it works for a few months. There must be a reason why tldd goes down. It doesn't happen often, about once every half year, but it's not good.

Marianne
Moderator
Partner    VIP    Accredited Certified

It is certainly not 'normal' for tldd to terminate all by itself.

Set up Media Manager logging so that you can troubleshoot next time it happens.

On all media servers, add the following entries to vm.conf: 
VERBOSE
DAYS_TO_KEEP_LOGS = 3

Create debug folder under volmgr, then create the following folders in debug:
daemon
reqlib
ltid

(There are more debug logs, but the above should be enough. Hopefully Martin will be along soon...)

Restart ltid.

A combination of messages file and above logs will hopefully shed some light on why this is happening.
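
The logging setup above could be scripted roughly as follows. This is only a sketch: it assumes vm.conf lives directly under the volmgr directory (`/usr/openv/volmgr`, taken from the command paths earlier in the thread), the function name is invented, and it does not restart ltid for you.

```python
import os
import tempfile

VM_CONF_ENTRIES = ("VERBOSE", "DAYS_TO_KEEP_LOGS = 3")
DEBUG_DIRS = ("daemon", "reqlib", "ltid")


def enable_media_manager_logging(volmgr_dir="/usr/openv/volmgr"):
    """Append missing vm.conf entries and create the debug folders.

    Returns the list of entries that were added.
    """
    conf_path = os.path.join(volmgr_dir, "vm.conf")
    content = ""
    if os.path.exists(conf_path):
        with open(conf_path) as fh:
            content = fh.read()
    present = {line.strip() for line in content.splitlines()}
    missing = [entry for entry in VM_CONF_ENTRIES if entry not in present]
    if missing:
        with open(conf_path, "a") as fh:
            # Start on a fresh line if the file lacks a trailing newline.
            if content and not content.endswith("\n"):
                fh.write("\n")
            fh.write("\n".join(missing) + "\n")
    for name in DEBUG_DIRS:
        os.makedirs(os.path.join(volmgr_dir, "debug", name), exist_ok=True)
    return missing


# Dry run against a scratch directory instead of the real volmgr path.
with tempfile.TemporaryDirectory() as scratch:
    print(enable_media_manager_logging(scratch))
```

Running it a second time adds nothing, so it is safe to repeat on every media server.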

mph999
Level 6
Employee Accredited

If the process terminates it should core (in fact it will ...)

We can investigate; open a call when it happens, but in advance configure the OS to save cores with unlimited size and preferably a unique filename.

I'm feeling particularly generous this evening, so here are all my core related TN links.
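
As a rough way to verify the core settings on Linux, the sketch below checks the core size limit and whether a core pattern yields unique file names. Both helper names are invented. Note the assumptions: the limit reported is the one for the process running the script, while the one that matters is the limit inherited by the NetBackup daemons, and the `%p` (PID) and `%t` (timestamp) specifiers are the Linux `kernel.core_pattern` ones.

```python
import resource


def core_limit_is_unlimited():
    """True if the soft core-size limit of this process is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft == resource.RLIM_INFINITY


def pattern_gives_unique_names(core_pattern):
    """True if the pattern embeds the PID (%p) or dump time (%t)."""
    return "%p" in core_pattern or "%t" in core_pattern


print(core_limit_is_unlimited())
print(pattern_gives_unique_names("core.%e.%p.%t"))  # True
print(pattern_gives_unique_names("core"))           # False
```

On Linux the current pattern can be read from `/proc/sys/kernel/core_pattern`; a pattern starting with `|` hands the dump to a helper program instead of writing a file directly.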

Take your pick ...

 

Basic information to gather when troubleshooting core dump files from NetBackup processes on UNIX or Linux
http://www.symantec.com/docs/TECH52285

How to collect crash dump files on Windows 2008 and Windows 2008 R2
http://www.symantec.com/docs/TECH74145

HP-UX specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52286

To Configure Solaris OS to Generate Core Files
http://docs.oracle.com/cd/E19850-01/820-0437/cores-on-solaris/index.html

Solaris specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52287

AIX specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52288

Linux specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52289

How to gather crash dumps for NetBackup processes on Windows Vista / Server 2008 and Windows 7 / Server 2008 R2
http://www.symantec.com/docs/TECH128146

 

Now, the debugger for the OS type has to be available; it doesn't work if you try to run the core through a debugger on another system, as there will be differences and you get the wrong results (I've tried multiple times, 100% fail ...)

 

This is a list of what we need for any OS:

http://www.symantec.com/docs/TECH52285

... the other links cover the core specifics for the different OSes.

 

M