05-10-2014 09:58 AM
Hello,
Two tape drives are in AVR status and I don't know what else to try to find the cause. I power-cycled and reset the drives and brought them up, but after a while they go back to AVR. The library's web GUI shows the drives as online. All other tape drives in the same library are OK, and another two tape drives (in a different library) assigned to the same media server are also OK. There is no tape in the affected drives. The drives have been OK since they were first configured a long time ago, and no one has changed anything.
# /usr/openv/volmgr/bin/vmoprcmd -d
PENDING REQUESTS
<NONE>
DRIVE STATUS
Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 hcart TLD - No - 0
1 hcart TLD - No - 0
2 hcart AVR - No - 0
3 hcart AVR - No - 0
ADDITIONAL DRIVE STATUS
Drv DriveName Shared Assigned Comment
0 IBMES1_F1_R12 No -
1 IBMES1_F2_R01 Yes - .
2 IBMEQ1_F1_R08 No -
3 IBMEQ1_F1_R09 No -
Output of /var/log/messages:
May 10 18:12:34 dkeqdata02 ltid[12995]: could not open drive 2 to unload it, Input/output error, it may not be ready
May 10 18:12:43 dkeqdata02 ltid[13034]: could not open drive 3 to unload it, Input/output error, it may not be ready
05-10-2014 12:16 PM
From the library console you have checked that the drives are empty.
You have reset the drives and brought them up, but the issue is still there.
Steps to follow:
1. Check whether the drives need cleaning.
2. Clean the tape drives.
3. Can the server OS see these drives? Check with the scan command.
4. Delete the two affected drives from NetBackup and reconfigure them with the same drive names.
Share the results :)
05-10-2014 09:58 PM
It looks like a connectivity issue between the media server and the tape library.
05-11-2014 02:35 AM
Yes, the drives asked for cleaning, but I don't have a usable cleaning cartridge because all of them are already used up. After the power cycle the drives no longer ask for cleaning. All tape drives in the library need cleaning, but only these two are in AVR; all the others are working properly.
I cannot reconfigure the tape drives right now because of internal rules and permissions; perhaps during the week.
Here is the scan -tape output:
# /usr/openv/volmgr/bin/scan -tape
************************************************************
*********************** SDT_TAPE ************************
************************************************************
------------------------------------------------------------
Device Name : "/dev/nst2"
Passthru Name: "/dev/sg51"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry : "IBM ULT3580-TD4 97F2"
Vendor ID : "IBM "
Product ID : "ULT3580-TD4 "
Product Rev: "97F2"
Serial Number: "0007891211"
WWN : ""
WWN Id Type : 0
Device Identifier: "IBM ULT3580-TD4 0007891211"
Device Type : SDT_TAPE
NetBackup Drive Type: 3
Removable : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
------------------------------------------------------------
Device Name : "/dev/nst3"
Passthru Name: "/dev/sg52"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry : "IBM ULT3580-TD4 97F2"
Vendor ID : "IBM "
Product ID : "ULT3580-TD4 "
Product Rev: "97F2"
Serial Number: "0007890973"
WWN : ""
WWN Id Type : 0
Device Identifier: "IBM ULT3580-TD4 0007890973"
Device Type : SDT_TAPE
NetBackup Drive Type: 3
Removable : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
------------------------------------------------------------
Device Name : "/dev/nst0"
Passthru Name: "/dev/sg48"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry : "IBM ULT3580-TD4 97F2"
Vendor ID : "IBM "
Product ID : "ULT3580-TD4 "
Product Rev: "97F2"
Serial Number: "0007892007"
WWN : ""
WWN Id Type : 0
Device Identifier: "IBM ULT3580-TD4 0007892007"
Device Type : SDT_TAPE
NetBackup Drive Type: 3
Removable : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
------------------------------------------------------------
Device Name : "/dev/nst1"
Passthru Name: "/dev/sg49"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry : "IBM ULT3580-TD4 97F2"
Vendor ID : "IBM "
Product ID : "ULT3580-TD4 "
Product Rev: "97F2"
Serial Number: "0007892037"
WWN : ""
WWN Id Type : 0
Device Identifier: "IBM ULT3580-TD4 0007892037"
Device Type : SDT_TAPE
NetBackup Drive Type: 3
Removable : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0
05-11-2014 11:53 AM
Can you post the output of the scan command without the -tape option ... just scan on its own.
I suspect the library has lost a couple of drives ...
The section of the scan output that shows the library/changer should list, by serial number, the drives that are 'in' the library. I suspect you'll only see two serial numbers listed.
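For that comparison it helps to have the drive serial numbers from the earlier scan -tape output in one short list. A minimal sketch, assuming the "Device Name" / "Serial Number" line format shown in the output above; the function name drive_serials is made up for illustration and is not a NetBackup tool:

```shell
#!/bin/sh
# Print "device serial" pairs from scan-style output read on stdin.
# Relies only on the quoted fields seen in the scan -tape listing above.
drive_serials() {
    awk -F'"' '
        /^[ \t]*Device Name/   { dev = $2 }        # remember the current device path
        /^[ \t]*Serial Number/ { print dev, $2 }   # emit it with its serial number
    '
}

# Example on the media server:
#   /usr/openv/volmgr/bin/scan -tape | drive_serials
```

The resulting list can then be checked by eye against the serial numbers the changer section reports.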
AVR usually affects all the drives at once, typically because NBU has lost connectivity to the robot. It's a bit more unusual for only some drives to be affected. It can be caused by incorrect config, but you haven't mentioned that any config changes have been made, hence my theory that you have a library issue.
It's nothing to do with drives needing cleaning; this won't make them go AVR.
Martin
05-11-2014 06:06 PM
I agree with Martin - looks like a robot issue.
I would like to see the output of 'scan' as well as 'tpconfig -l'.
Please ensure that you have a VERBOSE entry in vm.conf.
If not, please add it and restart NBU.
This will ensure that all hardware/Media Manager errors are logged in the messages file.
05-11-2014 11:19 PM
Hi,
Thanks for all of your recommendations. They led me to check the daemons on the media server, where I found that tldd was not running. After I started the daemon, the drives are up again.
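For anyone landing here with the same symptom, a quick daemon check could look like the sketch below. It is only an illustration: the function name missing_daemons is made up, and it assumes the daemons appear in the process list under their plain names (ltid, tldd), as the ltid entries in the messages file above suggest. Where it is installed, NetBackup's own vmps script gives a similar listing.

```shell
#!/bin/sh
# Print every daemon name given as an argument that is absent from the
# process listing read on stdin (one command name per line).
missing_daemons() {
    listing=$(cat)
    for d in "$@"; do
        # -x: the whole line must equal the daemon name
        printf '%s\n' "$listing" | grep -qx "$d" || printf '%s\n' "$d"
    done
}

# Example on the media server:
#   ps -e -o comm= | missing_daemons ltid tldd
```

No output means all named daemons were found; in the situation described here it would have printed tldd.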
I hadn't realized that. I assumed all daemons must be OK if two tape drives were OK and two were in AVR, so I investigated elsewhere. It still seems a little strange to me that two drives were OK and two were in AVR if the daemon was down. Shouldn't that have put all 4 drives in AVR?
05-12-2014 12:37 AM
The only thing I can think of is that the drives would have stayed as TLD until NBU tried to use them, and then they would have gone to AVR. Had tldcd gone, this would have affected all the drives at the same time.
05-12-2014 01:07 AM
We would need messages file from verbose Media Manager processes plus other volmgr/debug logs to understand what happened here.
No way to say what happened and why without logs.
It is certainly very unusual what you describe here. Any chance that 'someone' has done some 'fiddling'? Like killing processes?
05-14-2014 01:22 AM
I talked to my colleague. He said that from time to time in this backup environment tldd goes down without any apparent reason or action by anyone. The fix is to start tldd again, and then it works for a few months. There must be a reason why tldd goes down. It doesn't happen often, about once every six months, but it's not good.
05-14-2014 01:58 AM
It is certainly not 'normal' for tldd to terminate all by itself.
Set up Media Manager logging so that you can troubleshoot next time it happens.
On all media servers, add the following entries to vm.conf:
VERBOSE
DAYS_TO_KEEP_LOGS = 3
Create a debug folder under volmgr, then create the following folders in debug:
daemon
reqlib
ltid
(There are more debug logs, but the above should be enough. Hopefully Martin will be along soon...)
Restart ltid.
A combination of messages file and above logs will hopefully shed some light on why this is happening.
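The steps above could be scripted roughly as follows. This is a sketch, not an official procedure: the function name setup_mm_logging is made up, and /usr/openv/volmgr is assumed as the install path (it matches the command paths used earlier in this thread). Passing a different directory allows a dry run somewhere harmless.

```shell
#!/bin/sh
# Add the vm.conf entries and create the debug folders listed above.
# $1 = volmgr directory (assumed default: /usr/openv/volmgr).
setup_mm_logging() {
    volmgr=${1:-/usr/openv/volmgr}
    conf="$volmgr/vm.conf"
    touch "$conf"
    # Append each entry only if it is not already there, so re-running is safe.
    grep -qx 'VERBOSE' "$conf" || echo 'VERBOSE' >> "$conf"
    grep -q '^DAYS_TO_KEEP_LOGS' "$conf" || echo 'DAYS_TO_KEEP_LOGS = 3' >> "$conf"
    for d in daemon reqlib ltid; do
        mkdir -p "$volmgr/debug/$d"
    done
}

# On a media server, as root, followed by the ltid restart:
#   setup_mm_logging
#   /usr/openv/volmgr/bin/stopltid
#   /usr/openv/volmgr/bin/ltid -v
```

Restarting ltid interrupts tape operations on that media server, so pick a quiet window.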
06-10-2014 12:10 PM
If the process terminates it should core (in fact it will ...)
We can investigate: open a support call when it happens, but configure the OS in advance to save cores with unlimited size and preferably a unique filename.
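On Linux (the media server here looks like Linux, going by the /dev/nst and /dev/sg device names), that preparation might look like the sketch below. The helper name core_pattern_for and the /var/crash directory are assumptions for illustration only; if a crash handler such as abrt or systemd-coredump already owns kernel.core_pattern on the host, leave that in place and collect the cores from there instead.

```shell
#!/bin/sh
# Build a core_pattern that gives every core file a unique name.
# %e = executable name, %p = PID, %t = time of the dump (seconds since epoch).
core_pattern_for() {
    dir=${1:-/var/crash}   # assumed target; choose a filesystem with enough free space
    printf '%s/core.%%e.%%p.%%t\n' "$dir"
}

# On the media server, as root (not executed by this sketch):
#   sysctl -w kernel.core_pattern="$(core_pattern_for /var/crash)"
#   ulimit -c unlimited    # in the shell that then starts the NetBackup daemons
```

The ulimit setting only applies to processes started from that shell, so it has to be in effect before the daemons are (re)started.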
I'm feeling particularly generous this evening, so here are all my core related TN links.
Take your pick ...
Basic information to gather when troubleshooting core dump files from NetBackup processes on UNIX or Linux
http://www.symantec.com/docs/TECH52285
How to collect crash dump files on Windows 2008 and Windows 2008 R2
http://www.symantec.com/docs/TECH74145
HP-UX specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52286
To Configure Solaris OS to Generate Core Files
http://docs.oracle.com/cd/E19850-01/820-0437/cores-on-solaris/index.html
Solaris specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52287
AIX specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52288
Linux specific commands to run when troubleshooting core files for NetBackup support cases
http://www.symantec.com/docs/TECH52289
How to gather crash dumps for NetBackup processes on Windows Vista / Server 2008 and Windows 7 / Server 2008 R2
http://www.symantec.com/docs/TECH128146
Now, the debugger for the OS type has to be available on the system that produced the core. It doesn't work if you try to run the core through a debugger on another system, as there will be differences and you get the wrong results (I've tried multiple times, 100% fail ...)
This is a list of what we need for any OS:
http://www.symantec.com/docs/TECH52285
... the other links are core-specific for different OSes.
M