05-12-2012 08:42 PM
Hi All,
I need help with SSO drives that are managed by ACSLS. We find the drives are going down frequently (mostly over the weekend).
I have tried cleaning the drives...
05-12-2012 11:51 PM
1- Try this first and post the output:
netbackup\volmgr\bin\tpconfig -l
netbackup\volmgr\bin\tpautoconfig -report_disc
2- Please provide the OS and NBU version.
05-13-2012 12:05 AM
You need to find the exact reason why drives are going down - there can be MANY reasons. You will see if you Google it or search the forum.
To find out why, add a VERBOSE entry to vm.conf on EACH media server and restart NBU. vm.conf is located in:
Windows: <install-path>\veritas\volmgr
Unix/Linux: /usr/openv/volmgr
Also ensure that the bptm log directory exists on each media server.
After this, the reason for DOWN drives will be logged in Event Viewer on Windows and syslog on Unix/Linux (e.g. /var/adm/messages).
Once you have a reason in the logs, please post here. We should be able to assist when we have an error message.
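A minimal Python sketch of the vm.conf change described above, assuming vm.conf is a plain-text file with one entry per line. The function name ensure_verbose and the sample entry are hypothetical, used only for illustration. The sketch builds the new file content only; writing the file back and restarting NBU are still manual steps.

```python
def ensure_verbose(vm_conf_text):
    """Return vm.conf content that contains a VERBOSE line, adding one if missing."""
    lines = vm_conf_text.splitlines()
    # VERBOSE is a bare keyword on a line of its own.
    if any(line.strip() == "VERBOSE" for line in lines):
        return vm_conf_text
    lines.append("VERBOSE")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical existing content; a real vm.conf will hold site-specific entries.
    current = "SOME_EXISTING_ENTRY = value\n"
    print(ensure_verbose(current), end="")
```

On a Unix/Linux media server the text would be read from and written back to /usr/openv/volmgr/vm.conf.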
05-13-2012 12:29 AM
There is no discrepancy in the configuration, since tpautoconfig -report_disc shows no output.
OS: Solaris 9 on all master and media servers
NBU: 6.5.6
ACSLS version: 7, I think.
05-13-2012 12:32 AM
What is the output of tpconfig -l? Can you please post it here?
05-13-2012 12:38 AM
Hi Marianne,
I have read many of your comments and advice on NBU-related issues, and I would say they have helped me a lot to resolve issues in my environment.
The other reason would be that NBU Tech support takes long to provide us a resolution, since they keep on reading logs and many more things.. :)
Anyway, coming to my issue on ACSLS:
a) Not sure how to read /var/adm/messages.
b) I checked the cleaning tapes and found they have a frequency of 50 and have been used only 13 - 20 times as of the end of the month.
c) Also, I believe I read something on the 'love triangle' of NBU, ACSLS and cleaning media in the link below:
https://www-secure.symantec.com/connect/forums/netbackup-acsls-and-cleaning-tapes-love-triangle
Not sure how to proceed with ACSLS drive troubleshooting...
Cheers,
Rajesh
05-13-2012 03:56 AM
Device  Robot  Drive          Robot                    Drive        Device          Second
Type    Num    Index  Type    DrNum  Status  Comment   Name         Path            Device Path
robot   0      -      ACS     -      -       -         -            <ACSLS Server>
drive   -      0      hcart3  -      UP      -         acs_drive09  /dev/rmt/10cbn  ACS=0, LSM=3, PANEL=1, DRIVE=2
drive   -      1      hcart3  -      UP      -         acs_drive04  /dev/rmt/9cbn   ACS=0, LSM=3, PANEL=1, DRIVE=3
drive   -      2      hcart3  -      DOWN    -         acs_drive33  /dev/rmt/15cbn  ACS=0, LSM=1, PANEL=1, DRIVE=15
drive   -      3      hcart3  -      UP      -         acs_drive21  /dev/rmt/14cbn  ACS=0, LSM=2, PANEL=1, DRIVE=14
drive   -      4      hcart3  -      UP      -         acs_drive17  /dev/rmt/11cbn  ACS=0, LSM=2, PANEL=1, DRIVE=15
drive   -      5      hcart3  -      UP      -         acs_drive11  /dev/rmt/29cbn  ACS=0, LSM=3, PANEL=1, DRIVE=4
drive   -      6      hcart3  -      UP      -         acs_drive49  /dev/rmt/22cbn  ACS=0, LSM=0, PANEL=1, DRIVE=15
drive   -      7      hcart3  -      UP      -         acs_drive53  /dev/rmt/23cbn  ACS=0, LSM=0, PANEL=1, DRIVE=14
drive   -      8      hcart3  -      UP      -         acs_drive94  /dev/rmt/24cbn  ACS=0, LSM=0, PANEL=1, DRIVE=4
drive   -      9      hcart3  -      UP      -         acs_drive12  /dev/rmt/28cbn  ACS=0, LSM=1, PANEL=1, DRIVE=10
drive   -      10     hcart3  -      DOWN    -         acs_drive20  /dev/rmt/12cbn  ACS=0, LSM=2, PANEL=1, DRIVE=3
drive   -      11     hcart3  -      DOWN    -         acs_drive95  /dev/rmt/27cbn  ACS=0, LSM=0, PANEL=1, DRIVE=8
drive   -      12     hcart3  -      UP      -         acs_drive24  /dev/rmt/13cbn  ACS=0, LSM=2, PANEL=1, DRIVE=2
drive   -      13     hcart3  -      UP      -         acs_drive60  /dev/rmt/26cbn  ACS=0, LSM=0, PANEL=1, DRIVE=1
drive   -      14     hcart3  -      UP      -         acs_drive56  /dev/rmt/25cbn  ACS=0, LSM=0, PANEL=1, DRIVE=2
drive   -      15     hcart3  -      UP      -         acs_drive52  /dev/rmt/21cbn  ACS=0, LSM=0, PANEL=1, DRIVE=3
drive   -      16     hcart3  -      UP      -         acs_drive40  /dev/rmt/20cbn  ACS=0, LSM=1, PANEL=1, DRIVE=2
drive   -      17     hcart3  -      UP      -         acs_drive05  /dev/rmt/8cbn   ACS=0, LSM=3, PANEL=1, DRIVE=14
drive   -      18     hcart3  -      UP      -         acs_drive28  /dev/rmt/16cbn  ACS=0, LSM=2, PANEL=1, DRIVE=1
drive   -      19     hcart3  -      UP      -         acs_drive36  /dev/rmt/17cbn  ACS=0, LSM=1, PANEL=1, DRIVE=3
drive   -      20     hcart3  -      DOWN    -         acs_drive41  /dev/rmt/19cbn  ACS=0, LSM=1, PANEL=1, DRIVE=13
drive   -      21     hcart3  -      UP      -         acs_drive37  /dev/rmt/18cbn  ACS=0, LSM=1, PANEL=1, DRIVE=14
drive   -      22     hcart3  -      UP      -         acs_drive03  /dev/rmt/30cbn  ACS=0, LSM=3, PANEL=1, DRIVE=13
drive   -      23     hcart3  -      UP      -         acs_drive13  /dev/rmt/31cbn  ACS=0, LSM=2, PANEL=1, DRIVE=7
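With this many drives it can help to pull the DOWN ones out of the tpconfig -l listing automatically. The following is only a sketch in Python. It assumes the column order shown in the listing above and that the Comment column is a single token such as "-". The function name down_drives is hypothetical.

```python
def down_drives(tpconfig_output):
    """Return (drive name, device path) for every drive row whose status is DOWN."""
    result = []
    for line in tpconfig_output.splitlines():
        fields = line.split()
        # Assumed drive row layout:
        # drive - <index> <type> - <status> <comment> <name> <path> <ACS coordinates...>
        if len(fields) >= 9 and fields[0] == "drive" and fields[5] == "DOWN":
            result.append((fields[7], fields[8]))
    return result


# Three rows copied from the listing in this thread.
SAMPLE = """\
robot 0 - ACS - - - - <ACSLS Server>
drive - 0 hcart3 - UP - acs_drive09 /dev/rmt/10cbn ACS=0, LSM=3, PANEL=1, DRIVE=2
drive - 2 hcart3 - DOWN - acs_drive33 /dev/rmt/15cbn ACS=0, LSM=1, PANEL=1, DRIVE=15
"""

if __name__ == "__main__":
    for name, path in down_drives(SAMPLE):
        print(name, path)
```

Run against the full listing, this would report acs_drive33, acs_drive20, acs_drive95 and acs_drive41, which sit in three different LSMs (0, 1 and 2).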
05-13-2012 04:19 AM
Create a logs/bptm directory and restart the NetBackup services on the master.
Configure storage devices for the master server and all media servers from the GUI.
Run a test backup and check whether any drive goes down again.
05-13-2012 04:35 AM
"The other reason would be NBU Tech support takes long to provide us resolution .. since they keep on reading logs and manymore things"
Perhaps this is because you need to look in the logs to resolve the issue. I noticed that Marianne has asked for some logs:
To find out why, add a VERBOSE entry to vm.conf on EACH media server and restart NBU. vm.conf is located in:
Windows: <install-path>\veritas\volmgr
Unix/Linux: /usr/openv/volmgr
Also ensure that the bptm log directory exists on each media server.
After this, the reason for DOWN drives will be logged in Event Viewer on Windows and syslog on Unix/Linux (e.g. /var/adm/messages).
I would also create this directory on each media server:
/usr/openv/volmgr/debug/tpcommand
and an empty file: /usr/openv/volmgr/DRIVE_DEBUG
You will need to wait for the next time the drive(s) go down and then post up the logs from that media server.
Is this problem always happening on the same drives?
Is it only happening on certain media servers?
When did it start?
If you post this file (from all the media servers):
/usr/openv/netbackup/db/media/errors
I'll run it through a script and see if we see any patterns pointing at particular drives or media.
Alternatively, if you have Solaris media servers, you can do it yourself:
https://www-secure.symantec.com/connect/downloads/tperrsh-script-solaris-only
Martin
05-13-2012 04:40 AM
No need to restart ANY services when creating bptm.
05-13-2012 05:47 AM
With the VERBOSE entry in vm.conf, you will notice that most Media Manager processes/daemons run with 'v' (verbose).
If a drive goes down again, go to the media server where the drive was downed.
Type:
grep DOWN /var/adm/messages
When connected to ACSLS you MUST remove frequency-based cleaning and ensure that there are no cleaning tapes in 'normal' tape slots. In these robots, cleaning tapes have a special place and are added to ACSLS as cleaning tapes.
Cleaning can ONLY be controlled by ACSLS, not by NBU and certainly not by both.
This TN is quite old, but is still the best troubleshooting document for NBU with ACSLS that I've ever seen: http://www.symantec.com/docs/TECH31526
Extract:
05-14-2012 12:12 AM
I had this issue long ago...
Please post bptm logs from the robot control media server... this is probably a tape library robot (hardware) issue, and you will have to address it on the hardware side as well.
05-15-2012 11:14 AM
Hi Marianne,
Your comments are helping me to resolve the issue.
But right now I am stuck with ACSLS, since we don't have the SL Console configured.
I was checking whether there is a way to set up auto cleaning via the CLI on ACSLS (but so far I have had no luck).
I typed the following command on ACSLS to query the cleaning tapes:
q clean all
Identifier  Home Location   Max Usage  Current Usage  Status  Type
CLN096      0, 0, 5,10, 0   50         19             home    LTO-CLNU
CLN097      0, 2, 2, 8, 1   50         13             home    LTO-CLNU
CLN098      0, 1, 5, 6, 1   50         7              home    LTO-CLNU
CLN099      0, 0,14, 6, 0   50         3              in use  LTO-CLNU
Looking at the numbers, I observed that the cleaning media are not being used much.
I need a command to set up auto clean via the CLI in ACSLS.
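To make the usage numbers above easier to compare, here is a rough Python sketch that reads the q clean all listing and works out how many uses each cleaning cartridge has left (Max Usage minus Current Usage). It assumes the column layout shown in this post. The pattern ROW and the function remaining_cleans are hypothetical names.

```python
import re

# One data row: identifier, five comma-separated home coordinates,
# max usage, current usage, status (may contain a space, e.g. "in use"), type.
ROW = re.compile(
    r"^(?P<vol>\S+)\s+"
    r"(?P<home>(?:\d+\s*,\s*){4}\d+)\s+"
    r"(?P<max>\d+)\s+(?P<used>\d+)\s+"
    r"(?P<status>.+?)\s+(?P<type>\S+)$"
)


def remaining_cleans(query_output):
    """Map each cleaning cartridge to (max usage - current usage)."""
    remaining = {}
    for line in query_output.splitlines():
        match = ROW.match(line.strip())
        if match:  # the header line does not match and is skipped
            remaining[match.group("vol")] = int(match.group("max")) - int(match.group("used"))
    return remaining


SAMPLE = """\
Identifier Home Location Max Usage Current Usage Status Type
CLN096 0, 0, 5,10, 0 50 19 home LTO-CLNU
CLN097 0, 2, 2, 8, 1 50 13 home LTO-CLNU
CLN098 0, 1, 5, 6, 1 50 7 home LTO-CLNU
CLN099 0, 0,14, 6, 0 50 3 in use LTO-CLNU
"""

if __name__ == "__main__":
    for vol, left in remaining_cleans(SAMPLE).items():
        print(vol, left)
```

A non-zero Current Usage only shows that a cartridge has been mounted for cleaning at some point; it does not say which component (ACSLS or NBU) requested it.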
05-15-2012 12:25 PM
I am a bit 'rusted' as far as ACSLS is concerned, but the output looks to me as if Cleaning tapes are correctly configured. (I have asked a friend to confirm.)
One of them even shows 'In Use'.
Evidence of cleaning requests/actions should also be logged in the ACS event log on the ACSLS server:
/export/home/ACSSS/log/acsss_event.log
We still don't know if Cleaning or some other hardware error is causing the DOWN drives in your environment.
Please enable logs as suggested above. When a drive goes DOWN, please collect the logs on the media server where the drive was DOWN'ed and post them as attachments.
*** EDIT ****
I have managed to find an ACSLS manual online:
http://docs.oracle.com/cd/E19775-01/AEMupdate/AEMupdate.pdf
Please confirm the library model number - see the following on p. 115:
ACSLS controls automatic cleaning for HLI-attached libraries (SL8500, L5500,
9300, 9740, and 4400 serial or TCP/IP attached libraries), but not for SCSI attached
libraries.
05-16-2012 12:05 AM
Unless the drives are flashing the cleaning light - they don't need cleaning.
One extra clean to 'make sure' will do no harm, but do not just assume a down drive = a drive that needs cleaning.
If you clean drives over and over, they will wear out.
Could be time to get logs as previously suggested.
Martin
05-17-2012 11:25 AM
I get this message:
<Server_name> avrd[14300]: [ID 517153 daemon.error] Fatal open error on <Drive_name> (device 2, /devices/ssm@0,0/pci@19,600000/fibre-channel@1/sg@2,0:raw), errno = 19 (No such device), DOWN'ing it
Also, I see the path in /dev/rmt.
So do I need to delete the drive from the NBU end and reconfigure it?
05-17-2012 12:28 PM
If OS loses connectivity to the device, it does not remove it from /dev/rmt.
Please run the following for this tape drive to verify actual status:
mt -f /dev/rmt/<#> status
Delete and reconfigure is a short-term solution. You need to find out what is causing this. There will be more errors in /var/adm/messages.
One possible reason for 'missing devices' could be lack of persistent binding. This can cause device paths to change when system is rebooted.
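When /var/adm/messages is large, the avrd "DOWN'ing it" lines can be picked out and summarised with a small script. The Python sketch below is based only on the single message quoted in this thread; other avrd messages may be worded differently and would not match. The names DOWN_RE and parse_down_events are hypothetical, as are the host and drive names in the sample line.

```python
import re

# Pattern built from the one avrd message quoted in this thread.
DOWN_RE = re.compile(
    r"avrd\[\d+\]: (?:\[ID [^\]]*\] )?"
    r"(?P<error>.+?) on (?P<drive>\S+) "
    r"\(device (?P<index>\d+), (?P<path>[^)]+)\), "
    r"errno = (?P<errno>\d+) \((?P<reason>[^)]+)\), DOWN'ing it"
)


def parse_down_events(syslog_text):
    """Return one dict per avrd line that reports a drive being DOWN'ed."""
    events = []
    for line in syslog_text.splitlines():
        match = DOWN_RE.search(line)
        if match:
            event = match.groupdict()
            event["index"] = int(event["index"])
            event["errno"] = int(event["errno"])
            events.append(event)
    return events


# Hypothetical host and drive names substituted for the placeholders in the post.
SAMPLE = (
    "mediasrv1 avrd[14300]: [ID 517153 daemon.error] Fatal open error on "
    "acs_drive33 (device 2, /devices/ssm@0,0/pci@19,600000/fibre-channel@1/sg@2,0:raw), "
    "errno = 19 (No such device), DOWN'ing it"
)

if __name__ == "__main__":
    for event in parse_down_events(SAMPLE):
        print(event["drive"], event["errno"], event["reason"])
```

Grouping the results by drive name or by errno across all media servers may show whether the same devices keep disappearing.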
05-18-2012 07:52 AM
I deleted the drive and reconfigured it. It shows on the master but not on the media server.
Also, the status is down.
05-18-2012 11:02 AM
"It shows on the master but not on the media server."
Have you checked/verified at OS level that all devices can be seen and are usable?
No use doing a reconfig in NBU unless you are sure that the devices are OK at OS level. Device entries in /dev/rmt are not a good indication, as Solaris does not automatically clean up. 'boot -r' will add new entries but will not clean out entries that no longer exist.
Please do the following on Media Server:
Run 'devfsadm -Cv' to scan, cleanup, add device entries in /dev/rmt.
Check contents of /dev/rmt with 'ls -l *cbn'.
Do you see the correct number of tape drives?
If not, you need to investigate. Nothing will work at NBU level.
If you do, run /usr/openv/volmgr/bin/sgscan. Verify that all drives can be seen by NBU as 'usable'.
If not, please post output of /dev/rmt listing. We will assist with rebuild of sg driver.
Only if sgscan produces correct output, run device config wizard on media server.
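The "correct number of tape drives" check above can be partly scripted. This Python sketch compares the device paths NBU has configured (the Device Path column of tpconfig -l) with the names listed in /dev/rmt after 'devfsadm -Cv' has cleaned up. The function name missing_device_paths is hypothetical. A path being present does not prove the drive is usable, so the mt and sgscan checks are still needed.

```python
import posixpath


def missing_device_paths(configured_paths, rmt_entries):
    """Return configured device paths that have no matching entry in /dev/rmt."""
    present = set(rmt_entries)
    return sorted(
        path for path in configured_paths
        if posixpath.basename(path) not in present
    )


if __name__ == "__main__":
    # Paths taken from the tpconfig -l listing in this thread.
    configured = ["/dev/rmt/10cbn", "/dev/rmt/15cbn", "/dev/rmt/9cbn"]
    # Hypothetical /dev/rmt contents; on the media server os.listdir("/dev/rmt") would supply this.
    entries = ["10cbn", "9cbn"]
    print(missing_device_paths(configured, entries))
```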