Netbackup 7.0.1 robot displaying down-tld

Tony213
Level 3

NetBackup 7.0.1. Robot displaying down-tld does not respond to a reset of the drive or to a cleaning of the drive. Restarted the daemons: no difference. Powered the tape library off and on: no difference. An initial check of the logs did not show any clear issues. If anyone has some ideas I would appreciate it. Thank you in advance.

9 REPLIES

Will_Restore
Level 6

Have you reviewed Martin's 'NetBackup Basics and how to make YOUR life easier', especially section (C) 'Is NetBackup the cause'?

Genericus
Moderator
Moderator
   VIP   

I would place a service call to have the drive checked.

If you have power cycled the drive and it stays down, it might be due to an issue at the drive or OS level - nothing to do with NetBackup.

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Device troubleshooting starts at OS level.

If you tell us which OS, it is easier to provide advice.

To get NBU to log Media Manager actions/errors to the OS, please add the following entry to vm.conf on the master and/or media server where the DOWN drive is seen
(Windows: <install-path>\veritas\volmgr\vm.conf   Unix/Linux: /usr/openv/volmgr/vm.conf):

VERBOSE

Restart NBU.

Try to UP drive in Device Monitor.

If it goes DOWN again, the reason will be logged in syslog on a Unix/Linux server (e.g. /var/adm/messages on Solaris, /var/log/messages on Linux) or in the Event Viewer Application log on a Windows server. Device errors will be logged in the System log.
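
Before restarting NBU, it can help to confirm that the VERBOSE entry really made it into vm.conf. The following is only a minimal Python sketch: the helper names are hypothetical, the default path is the Unix/Linux location given above, and it assumes the entry sits on a line of its own.

```python
def vm_conf_has_verbose(text):
    """Return True if some line of the vm.conf text is exactly VERBOSE."""
    # Surrounding whitespace is ignored; entries that merely contain the
    # word (for example NOT_VERBOSE) are not counted.
    return any(line.strip() == "VERBOSE" for line in text.splitlines())


def check_vm_conf(path="/usr/openv/volmgr/vm.conf"):
    """Read vm.conf from the given path and report whether VERBOSE is set."""
    with open(path) as fh:
        return vm_conf_has_verbose(fh.read())
```

Calling check_vm_conf() on the media server where the drive goes DOWN should return True once the entry has been added; on Windows, pass the vm.conf path under your install directory instead.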

 

Botvitnik
Level 3

Hi Tony.

 

I think that Martin is right.

 

With NetBackup 7.0.1 I have from time to time had some drives in an SL500 go down.

In some cases it was combined with a drive firmware problem. In one case I also had a faulty drive with the same behavior.

I would try reconfiguring the drive, and if NetBackup still does not see it, then it is a job for the hardware engineer.

 

Best regards

Botvitnik

Tony213
Level 3

Ahhh...forgot about the OS...ugh.  It is Solaris 10.

Thank you for your input so far...have not had a chance to act on it yet....but thank you so far.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

There are LOTS of possible reasons for drives being down'ed.

No use speculating here without seeing /var/adm/messages after VERBOSE logging has been enabled.

mph999
Level 6
Employee Accredited

All the advice above is excellent.

Generally in NBU, if the drive(s) have been working correctly and nothing has been changed, then it is very unlikely NBU is the cause. The main reason is that NBU does not write to drives; it is all done by the OS.

Just occasionally, I find that completely removing and reconfiguring the drive brings it back to life - and by that I mean remove it from the OS and NBU completely, then put it back.

If that does not promote it back into life, then no amount of prodding, poking or tickling it under the chin is likely to make a difference, and it is likely that it needs to go off to the 'tape drive hospital' for some treatment.

NBU has minimal contact with tape drives. The only thing it does is send a few SCSI commands and, apart from some versions of Unix/Linux, even these go via the OS. Even then, these SCSI commands are only used so NBU knows when the tape is in the drive. After that point, it's all OS (NBU just passes the data to the OS, which then writes it onto tape).

It is for this reason, as pointed out by Marianne, that tape drive issue investigation should start at the OS.

We can look in the file /usr/openv/netbackup/db/media/errors (or on Windows <install>\veritas\netbackup\db\media\errors) and get some idea whether there is any pattern to the errors on the drive or media.

If you have access to Solaris, you can download tperr.sh and run it against the media/errors file; full instructions and download here:

https://www-secure.symantec.com/connect/downloads/tperrsh-script-solaris-only

(the errors file is on each media server).

The system log should show some detail (follow Marianne's instructions) - if you see anything that mentions io_ioctl, ASC/ASCQ or TapeAlert, it is almost 100% certain you have a faulty drive.
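
To pick those keywords out of a large system log, something like the Python sketch below could be used. It is only an illustration: the function names are hypothetical, the exact wording of the log messages varies by OS and driver, and the keyword list is just the one mentioned in this post.

```python
import re

# Keywords from the post: io_ioctl, ASC / ASCQ (matched as whole words,
# case-sensitive) and TapeAlert (matched in any letter case).
PATTERN = re.compile(r"io_ioctl|\bASCQ?\b|(?i:tapealert)")


def suspicious_lines(lines):
    """Return the lines that mention any of the keywords."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path):
    """Scan a log file, e.g. /var/adm/messages on Solaris."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as fh:
        return suspicious_lines(fh)
```

For example, scan_log("/var/adm/messages") on the Solaris media server returns the matching lines, which can then be checked by hand or sent to the hardware vendor.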

Martin

Tony213
Level 3

It appears that the drive had a tape in it whose barcode it could not read. The system wasn't even displaying the presence of the tape media...it showed one less than was actually in the library. I removed the tape and the drive is now reading other tape media normally. Thank you all for your help...I will keep your suggestions for future reference as well. Issue is resolved.

mph999
Level 6
Employee Accredited

Super - so 'almost' a hardware issue then.

Please be generous and mark the most helpful post as the solution ...

Many thanks,

Martin