
SSO Drives on ACSLS are going down frequently.

rajeshthink
Level 4

Hi All,

 

I need a solution for SSO drives managed through ACSLS. We find that the drives are going down frequently (mostly over the weekend).

I have tried cleaning the drives...

1 ACCEPTED SOLUTION


Marianne
Moderator
Partner    VIP    Accredited Certified

It shows on the master but not on the media server.

Have you checked/verified at OS level that all devices can be seen and are usable?

No use doing a reconfig in NBU unless you are sure that the devices are OK at OS level. Device entries in /dev/rmt are not a good indication, as Solaris does not automatically clean them up. 'boot -r' will add new entries but will not clean out entries that no longer exist.

Please do the following on Media Server:

Run 'devfsadm -Cv' to scan, clean up, and add device entries in /dev/rmt.

Check contents of /dev/rmt with 'ls -l *cbn'.

Do you see the correct number of tape drives?

If not, you need to investigate. Nothing will work at NBU level.

If you do, run /usr/openv/volmgr/bin/sgscan. Verify that all drives can be seen by NBU as 'usable'.
If not, please post output of /dev/rmt listing.  We will assist with rebuild of sg driver.

Only if sgscan produces correct output, run device config wizard on media server.


18 REPLIES

muhanad_daher
Level 6
Partner Accredited Certified

1- Try this first and post the output:

<install-path>\volmgr\bin\tpconfig -l

<install-path>\volmgr\bin\tpautoconf -report_disc

 

2- Please provide the OS and NBU versions.

Marianne
Moderator
Partner    VIP    Accredited Certified

You need to find the exact reason why drives are going down - there can be MANY reasons, as you will see if you Google it or search the forum.

To find out why, add VERBOSE entry to vm.conf on EACH media server and restart NBU.
Windows: <install-path>\veritas\volmgr
Unix/Linux: /usr/openv/volmgr

Also ensure that bptm log exists on each media server.

After this, the reason for DOWN drives will be logged in the Event Viewer on Windows and in syslog on Unix/Linux (e.g. /var/adm/messages).

Once you have a reason in the logs, please post here. We should be able to assist when we have an error message.
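The VERBOSE step above is just a one-line addition to vm.conf. A minimal sketch, using a scratch file so it can be tried safely (on a real media server the file is /usr/openv/volmgr/vm.conf on Unix, or <install-path>\veritas\volmgr\vm.conf on Windows):

```shell
# Sketch: add the VERBOSE entry idempotently. VMCONF points at a scratch
# file here for illustration; substitute the real vm.conf on a media server.
VMCONF=/tmp/demo-vm.conf
touch "$VMCONF"
grep -qx VERBOSE "$VMCONF" || echo VERBOSE >> "$VMCONF"   # add only if missing
cat "$VMCONF"
```

The `grep -qx` guard means re-running the snippet never duplicates the entry.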

rajeshthink
Level 4

There is no discrepancy in the configuration, since tpautoconf -report_disc shows no output.

 

OS: Solaris 9 on the master and all media servers

NBU: 6.5.6

ACSLS version: 7, I think.

 

muhanad_daher
Level 6
Partner Accredited Certified

tpconfig -l - what is the output? Can you please post it here?

rajeshthink
Level 4

Hi Marianne   ,

 

I have read many of your comments and advice on NBU-related issues, and I would say they have helped me a lot to resolve issues in my environment.

 

The other reason would be that NBU Tech Support takes long to provide us a resolution, since they keep on reading logs and many more things. :)

 

 

Anyway, coming to my issue on ACSLS:

 

a) Not sure how to read /var/adm/messages.

b) I saw the cleaning tapes and found they have a frequency of 50 and have been used only 13-20 times up to the end of the month.

c) Also, I believe I read something on the love triangle of NBU, ACSLS and cleaning media in the link below:

 

https://www-secure.symantec.com/connect/forums/netbackup-acsls-and-cleaning-tapes-love-triangle

Not sure how to proceed with ACSLS drive troubleshooting...

 

Cheers,

Rajesh

 

 

rajeshthink
Level 4

Device Robot Drive       Robot                    Drive        Device          Second
Type     Num Index  Type DrNum Status  Comment    Name         Path            Device Path
robot      0    -    ACS    -       -  -          -            <ACSLS Server>
  drive    -    0 hcart3    -      UP  -          acs_drive09  /dev/rmt/10cbn  ACS=0, LSM=3, PANEL=1, DRIVE=2
  drive    -    1 hcart3    -      UP  -          acs_drive04  /dev/rmt/9cbn   ACS=0, LSM=3, PANEL=1, DRIVE=3
  drive    -    2 hcart3    -    DOWN  -          acs_drive33  /dev/rmt/15cbn  ACS=0, LSM=1, PANEL=1, DRIVE=15
  drive    -    3 hcart3    -      UP  -          acs_drive21  /dev/rmt/14cbn  ACS=0, LSM=2, PANEL=1, DRIVE=14
  drive    -    4 hcart3    -      UP  -          acs_drive17  /dev/rmt/11cbn  ACS=0, LSM=2, PANEL=1, DRIVE=15
  drive    -    5 hcart3    -      UP  -          acs_drive11  /dev/rmt/29cbn  ACS=0, LSM=3, PANEL=1, DRIVE=4
  drive    -    6 hcart3    -      UP  -          acs_drive49  /dev/rmt/22cbn  ACS=0, LSM=0, PANEL=1, DRIVE=15
  drive    -    7 hcart3    -      UP  -          acs_drive53  /dev/rmt/23cbn  ACS=0, LSM=0, PANEL=1, DRIVE=14
  drive    -    8 hcart3    -      UP  -          acs_drive94  /dev/rmt/24cbn  ACS=0, LSM=0, PANEL=1, DRIVE=4
  drive    -    9 hcart3    -      UP  -          acs_drive12  /dev/rmt/28cbn  ACS=0, LSM=1, PANEL=1, DRIVE=10
  drive    -   10 hcart3    -    DOWN  -          acs_drive20  /dev/rmt/12cbn  ACS=0, LSM=2, PANEL=1, DRIVE=3
  drive    -   11 hcart3    -    DOWN  -          acs_drive95  /dev/rmt/27cbn  ACS=0, LSM=0, PANEL=1, DRIVE=8
  drive    -   12 hcart3    -      UP  -          acs_drive24  /dev/rmt/13cbn  ACS=0, LSM=2, PANEL=1, DRIVE=2
  drive    -   13 hcart3    -      UP  -          acs_drive60  /dev/rmt/26cbn  ACS=0, LSM=0, PANEL=1, DRIVE=1
  drive    -   14 hcart3    -      UP  -          acs_drive56  /dev/rmt/25cbn  ACS=0, LSM=0, PANEL=1, DRIVE=2
  drive    -   15 hcart3    -      UP  -          acs_drive52  /dev/rmt/21cbn  ACS=0, LSM=0, PANEL=1, DRIVE=3
  drive    -   16 hcart3    -      UP  -          acs_drive40  /dev/rmt/20cbn  ACS=0, LSM=1, PANEL=1, DRIVE=2
  drive    -   17 hcart3    -      UP  -          acs_drive05  /dev/rmt/8cbn   ACS=0, LSM=3, PANEL=1, DRIVE=14
  drive    -   18 hcart3    -      UP  -          acs_drive28  /dev/rmt/16cbn  ACS=0, LSM=2, PANEL=1, DRIVE=1
  drive    -   19 hcart3    -      UP  -          acs_drive36  /dev/rmt/17cbn  ACS=0, LSM=1, PANEL=1, DRIVE=3
  drive    -   20 hcart3    -    DOWN  -          acs_drive41  /dev/rmt/19cbn  ACS=0, LSM=1, PANEL=1, DRIVE=13
  drive    -   21 hcart3    -      UP  -          acs_drive37  /dev/rmt/18cbn  ACS=0, LSM=1, PANEL=1, DRIVE=14
  drive    -   22 hcart3    -      UP  -          acs_drive03  /dev/rmt/30cbn  ACS=0, LSM=3, PANEL=1, DRIVE=13
  drive    -   23 hcart3    -      UP  -          acs_drive13  /dev/rmt/31cbn  ACS=0, LSM=2, PANEL=1, DRIVE=7
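With a listing this long, it helps to filter out just the DOWN'ed drives. A minimal sketch against a saved copy of the `tpconfig -l` output (the sample file below holds a few lines copied from this listing; the field positions are an assumption based on this layout):

```shell
# Sketch: extract DOWN'ed drive names and device paths from `tpconfig -l` output.
cat > /tmp/tpconfig.out <<'EOF'
  drive    -    2 hcart3    -    DOWN  -          acs_drive33  /dev/rmt/15cbn
  drive    -    3 hcart3    -      UP  -          acs_drive21  /dev/rmt/14cbn
  drive    -   10 hcart3    -    DOWN  -          acs_drive20  /dev/rmt/12cbn
EOF
# field 6 is the drive status, field 8 the drive name, field 9 the device path
awk '$1 == "drive" && $6 == "DOWN" { print $8, $9 }' /tmp/tpconfig.out
```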

muhanad_daher
Level 6
Partner Accredited Certified

Create a logs/bptm directory and restart the NetBackup services on the master.

Configure storage devices for the master server and all media servers from the GUI.

Take a backup as a test, and check if any drive goes down again.
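The bptm logging step is just a directory creation. A sketch using a scratch root for illustration (on a real media server the root is /usr/openv/netbackup):

```shell
# Sketch: create the bptm legacy log directory.
NBROOT=/tmp/demo-netbackup        # real root on a media server: /usr/openv/netbackup
mkdir -p "$NBROOT/logs/bptm"      # bptm starts writing here once the dir exists
ls "$NBROOT/logs"
```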

mph999
Level 6
Employee Accredited

 

"The other reason would be that NBU Tech Support takes long to provide us a resolution .. since they keep on reading logs and many more things"

Perhaps this is because you need to look in the logs to resolve the issue. I notice that Marianne has asked for some logs:

 

To find out why, add VERBOSE entry to vm.conf on EACH media server and restart NBU.

Windows: <install-path>\veritas\volmgr
Unix/Linux: /usr/openv/volmgr

Also ensure that bptm log exists on each media server.

After this, the reason for DOWN drives will be logged in the Event Viewer on Windows and in syslog on Unix/Linux (e.g. /var/adm/messages).

 

I would also create this dir on each media server

/usr/openv/volmgr/debug/tpcommand

 

and an empty file - /usr/openv/volmgr/DRIVE_DEBUG

 

You will need to wait for the next time the drive(s) go down and then post up the logs from that media server.

 

Is this problem always happening on the same drives?

Is it only happening on certain media servers ?

When did it start ?

 

If you post up this file (from all the media servers)

/usr/openv/netbackup/db/media/errors - I'll run it through a script and see if there are any patterns pointing at particular drives or media.

 

Alternatively, if you have Solaris media servers, you can do it yourself:

https://www-secure.symantec.com/connect/downloads/tperrsh-script-solaris-only

 

Martin
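The debug artefacts Martin describes amount to one directory and one empty touch file. A sketch with ROOT pointing at a scratch directory for illustration (on a real media server it is /usr/openv):

```shell
# Sketch: create the Media Manager drive-debug artefacts described above.
ROOT=/tmp/demo-openv                      # real root on a media server: /usr/openv
mkdir -p "$ROOT/volmgr/debug/tpcommand"   # tpcommand debug log directory
touch "$ROOT/volmgr/DRIVE_DEBUG"          # empty touch file that enables drive debug
ls "$ROOT/volmgr"
```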

mph999
Level 6
Employee Accredited

No need to restart ANY services when creating the bptm log directory.

Marianne
Moderator
Partner    VIP    Accredited Certified

With the VERBOSE entry in vm.conf, you will notice that most Media Manager processes/daemons run with 'v' (verbose).

If a drive goes down again, go to the media server where the drive was DOWN'ed.

Type:
grep DOWN /var/adm/messages

When connected to ACSLS you MUST remove frequency-based cleaning and ensure that there are no cleaning tapes in 'normal' tape slots. These robots have a special place for cleaning tapes, which are added to ACSLS as cleaning tapes.
Cleaning can ONLY be controlled by ACSLS, not by NBU, and certainly not by both.

This TN is quite old, but is still the best troubleshooting document for NBU with ACSLS that I've ever seen: http://www.symantec.com/docs/TECH31526

Extract:

2.3 ACSLS TAPE CLEANING
ACS robot types are self cleaning. Tape cleaning should not be initiated by NetBackup. If a TapeAlert-based cleaning flag is set by LTID or avrd for an ACS, TLH, or an LMF drive, the vmd/DA will not release the drives.
 
To disable TapeAlert checking and eliminate "TapeAlert is not supported" messages in the syslog, add the NO_TAPEALERT touch file.
For UNIX:
/usr/openv/volmgr/database/NO_TAPEALERT
For Windows:
<install path>\volmgr\database\NO_TAPEALERT
 
The StorageTek library transport control unit tracks how much tape passes through each transport and sends a message to ACSLS when a transport requires cleaning. If auto-cleaning is enabled, ACSLS automatically mounts a cleaning cartridge on the transport. If all the cleaning cartridges have expired (MAX_USAGE), ACSLS will post an error message 376N into the acsss_event log. If auto-cleaning is disabled, ACSLS logs a message in the event log and displays a message at the cmd_proc when cleaning is required.
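The `grep DOWN` check above can be narrowed to the avrd messages that explain why a drive was taken down. A sketch against a sample file (the sample line is copied from later in this thread; on the media server you would grep /var/adm/messages itself):

```shell
# Sketch: find the reason avrd DOWN'ed a drive in syslog.
cat > /tmp/messages.sample <<'EOF'
host1 avrd[14300]: [ID 517153 daemon.error] Fatal open error on acs_drive33 (device 2, /devices/ssm@0,0/pci@19,600000/fibre-channel@1/sg@2,0:raw), errno = 19 (No such device), DOWN'ing it
EOF
# the errno value in the match is the actual reason for the DOWN
grep "DOWN'ing" /tmp/messages.sample
```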

Usman_Ali1
Level 3
Partner Accredited

I had this issue long ago...

Please post the bptm logs from the robot-control media server... this is probably a tape library robot (hardware) issue, and you have to address it on the hardware side as well.

rajeshthink
Level 4

Hi Marianne,

Your comments are helping me resolve the issue.

But right now I am stuck with ACSLS, since we don't have SL Console configured.

I was checking if there is a way to set up auto-cleaning via the CLI on ACSLS (but so far I have had no luck).

I have typed the following command on ACSLS to query the cleaning tapes:

q clean all

Identifier  Home Location    Max Usage  Current Usage  Status     Type
 CLN096        0, 0, 5,10, 0  50         19             home       LTO-CLNU
 CLN097        0, 2, 2, 8, 1  50         13             home       LTO-CLNU
 CLN098        0, 1, 5, 6, 1  50         7              home       LTO-CLNU
 CLN099        0, 0,14, 6, 0  50         3              in use     LTO-CLNU

 

And I observed, looking at the numbers, that the cleaning media are not being used much.

I need a command to set up auto-cleaning via the CLI in ACSLS.
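The remaining uses per cleaning tape can be read straight off the `q clean all` output. A sketch; it assumes, based on the layout above, that the last two integer-only columns on each CLN line are Max Usage and Current Usage (the Home Location tokens all carry trailing commas except the last single digit, so they are skipped):

```shell
# Sketch: compute remaining uses per cleaning tape from saved `q clean all` output.
cat > /tmp/qclean.out <<'EOF'
 CLN096        0, 0, 5,10, 0  50         19             home       LTO-CLNU
 CLN097        0, 2, 2, 8, 1  50         13             home       LTO-CLNU
 CLN098        0, 1, 5, 6, 1  50         7              home       LTO-CLNU
 CLN099        0, 0,14, 6, 0  50         3              in use     LTO-CLNU
EOF
awk '/^ *CLN/ {
    m = ""; c = ""
    for (i = 2; i <= NF; i++)                  # keep the last two pure-integer
        if ($i ~ /^[0-9]+$/) { m = c; c = $i } # fields: Max and Current Usage
    printf "%s: %d uses left\n", $1, m - c
}' /tmp/qclean.out
```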

Marianne
Moderator
Partner    VIP    Accredited Certified

I am a bit 'rusty' as far as ACSLS is concerned, but the output looks to me as if the cleaning tapes are correctly configured. (I have asked a friend to confirm.)
One of them even shows 'In Use'.
Evidence of cleaning requests/action should also be logged in ACS Event log on ACSLS server:
/export/home/ACSSS/log/acsss_event.log

We still don't know if Cleaning or some other hardware error is causing the DOWN drives in your environment.

Please enable the logs as suggested above. When a drive goes DOWN, please collect the logs on the media server where the drive is DOWN'ed and post them as attachments.

 

*** EDIT ****

I have managed to find an ACSLS manual online:
http://docs.oracle.com/cd/E19775-01/AEMupdate/AEMupdate.pdf

Please confirm the library model number - see the following on p. 115:

ACSLS controls automatic cleaning for HLI-attached libraries (SL8500, L5500,
9300, 9740, and 4400 serial or TCP/IP attached libraries), but not for SCSI attached
libraries.

mph999
Level 6
Employee Accredited

Unless the drives are flashing the cleaning light, they don't need cleaning.

One extra clean to 'make sure' will do no harm, but do not just assume that a down drive is a drive that needs cleaning.

If you clean drives over and over, they will wear out.

Could be time to get the logs, as previously suggested.

 

Martin

rajeshthink
Level 4

I get this message:

 

<Server_name>  avrd[14300]: [ID 517153 daemon.error] Fatal open error on <Drive_name> (device 2, /devices/ssm@0,0/pci@19,600000/fibre-channel@1/sg@2,0:raw), errno = 19 (No such device), DOWN'ing it

Also, I see the path in /dev/rmt.

So do I need to delete the drive from the NBU end and reconfigure it?

 

Marianne
Moderator
Partner    VIP    Accredited Certified

If OS loses connectivity to the device, it does not remove it from /dev/rmt.

Please run the following for this tape drive to verify actual status:

mt -f /dev/rmt/<#> status

Delete and reconfigure is a short-term solution. You need to find out what is causing this. There will be more errors in /var/adm/messages.

One possible reason for 'missing devices' could be lack of persistent binding. This can cause device paths to change when system is rebooted.
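The mt check above can be wrapped so it fails gracefully when the path really is gone. A sketch (mt exists on the Solaris media server; the existence guard lets it run harmlessly elsewhere, and the rmt number is the one from the error):

```shell
# Sketch: OS-level status check for one suspect drive path.
DEV=/dev/rmt/15cbn                 # substitute the rmt number for your drive
if [ -e "$DEV" ]; then
    mt -f "$DEV" status            # a "no such device" here matches errno 19
else
    echo "$DEV not present on this host"
fi
```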

rajeshthink
Level 4

I deleted the drive and reconfigured it; it shows on the master but not on the media server.

Also, the status is DOWN.

Marianne
Moderator
Partner    VIP    Accredited Certified

It shows on the master but not on the media server.

Have you checked/verified at OS level that all devices can be seen and are usable?

No use doing a reconfig in NBU unless you are sure that the devices are OK at OS level. Device entries in /dev/rmt are not a good indication, as Solaris does not automatically clean them up. 'boot -r' will add new entries but will not clean out entries that no longer exist.

Please do the following on Media Server:

Run 'devfsadm -Cv' to scan, clean up, and add device entries in /dev/rmt.

Check contents of /dev/rmt with 'ls -l *cbn'.

Do you see the correct number of tape drives?

If not, you need to investigate. Nothing will work at NBU level.

If you do, run /usr/openv/volmgr/bin/sgscan. Verify that all drives can be seen by NBU as 'usable'.
If not, please post output of /dev/rmt listing.  We will assist with rebuild of sg driver.

Only if sgscan produces correct output, run device config wizard on media server.
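The accepted sequence above can be sketched as one guarded script. devfsadm and sgscan exist only on the Solaris media server, so each step is guarded to run harmlessly elsewhere; the sgscan path is the one given in the post:

```shell
# Sketch of the accepted OS-level check sequence for the media server.
command -v devfsadm >/dev/null && devfsadm -Cv || true   # clean + rebuild /dev/rmt
ls /dev/rmt/*cbn 2>/dev/null | wc -l                     # expect one line per drive
SGSCAN=/usr/openv/volmgr/bin/sgscan
[ -x "$SGSCAN" ] && "$SGSCAN" || echo "sgscan not present on this host"
```

Only if the count matches the real number of drives and sgscan shows every drive as usable should the device configuration wizard be re-run.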