Forum Discussion

Tony213
Level 3
13 years ago

Netbackup 7.0.1 robot displaying down-tld

Netbackup 7.0.1.  Robot displaying down-tld; it does not respond to a reset of the drive or to cleaning of the drive.  Restarted daemons...no difference.  Powered off/on tape library...no difference.  Initial check of logs did not show any clear issues.  If anyone has some ideas I would appreciate it.  Thank you in advance.

  • All the advice above is excellent.

    Generally in NBU, if the drive(s) have been working correctly and nothing has been changed, it is very unlikely that NBU is the cause. The main reason is that NBU does not write to drives; that is all done by the OS.

    Just occasionally, I find that completely removing and reconfiguring the drive brings it back to life - and by that I mean remove it from the OS and NBU completely, then put it back.

    If that does not bring it back to life, then no amount of prodding, poking or tickling it under the chin is likely to make a difference, and it probably needs to go off to the 'tape drive hospital' for some treatment.

    NBU has minimal contact with tape drives. The only thing it does is send a few SCSI commands and, apart from some versions of Unix/Linux, even these go via the OS. Even then, these SCSI commands are only used so NBU knows when the tape is in the drive.  After that point, it's all OS (NBU just passes the data to the OS, which then writes it onto tape).

    It is for this reason, as pointed out by Marianne, that tape drive issue investigation should start at the OS.

    We can look in the file /usr/openv/netbackup/db/media/errors (or on Windows, <install>\veritas\netbackup\db\media\errors) and get some idea of whether there is any pattern to the errors on the drive or media.

    If you have access to Solaris, you can download tperr.sh and run it against the media/errors file; full instructions and download here:

    https://www-secure.symantec.com/connect/downloads/tperrsh-script-solaris-only

    (the errors file is on each media server).

    The system log should show some detail (follow Marianne's instructions). If you see anything that mentions io_ioctl, ASC/ASCQ or TapeAlert, it is almost 100% certain you have a faulty drive.

    Martin
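
For anyone wanting to act on the media/errors suggestion above without tperr.sh, here is a minimal Python sketch that counts errors per tape and per drive, so a pattern (one bad tape versus one bad drive) stands out. The line layout is an assumption, not something confirmed in this thread: whitespace-separated fields in the order date, time, media ID, drive index, error type, drive name. The function name and the sample lines are made up for illustration; check a few real lines from your own errors file before relying on it.

```python
from collections import Counter


def summarize_media_errors(lines):
    """Count errors per media ID and per drive.

    ASSUMED layout per line (verify against your own file):
    date time media_id drive_index error_type [drive_name]
    """
    by_media = Counter()
    by_drive = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank lines or lines that do not fit the assumed layout
        media_id, drive_index = fields[2], fields[3]
        # Fall back to the drive index when no drive name is present
        drive = fields[5] if len(fields) > 5 else "index " + drive_index
        by_media[media_id] += 1
        by_drive[drive] += 1
    return by_media, by_drive


if __name__ == "__main__":
    # Made-up sample lines, for illustration only
    sample = [
        "07/12/10 10:15:52 A00012 3 WRITE_ERROR DRIVE_A",
        "07/13/10 11:00:01 A00044 3 WRITE_ERROR DRIVE_A",
        "07/14/10 09:30:10 A00012 1 POSITION_ERROR DRIVE_B",
    ]
    by_media, by_drive = summarize_media_errors(sample)
    for name, count in by_drive.most_common():
        print("drive", name, count)
    for name, count in by_media.most_common():
        print("media", name, count)
```

To run it on a real server, pass an open file instead of the sample list, e.g. open("/usr/openv/netbackup/db/media/errors"). Many errors on one drive across different tapes points at the drive; many errors on one tape across different drives points at the media.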

  • I would place a service call to have the drive checked.

    If you have power cycled the drive and it stays down it might be due to an issue at the drive or OS level - nothing to do with netbackup.

  • Device troubleshooting starts at OS level.

    If you tell us which OS, it is easier to provide advice.

    To get NBU to log Media Manager actions/errors to the OS log, please add the following entry to vm.conf on the master and/or media server where the DOWN drive is seen
    (Windows: <install-path>\veritas\volmgr\vm.conf   Unix/Linux: /usr/openv/volmgr/vm.conf):

    VERBOSE

    Restart NBU.

    Try to UP drive in Device Monitor.

    If it goes DOWN again, the reason will be logged in syslog on a Unix/Linux server (e.g. /var/adm/messages on Solaris, /var/log/messages on Linux) or in the Event Viewer Application log on a Windows server. Device errors will be logged in the System log.
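
If you would rather script the vm.conf change than edit the file by hand, a minimal Python sketch follows. It only adds the VERBOSE line when it is not already there, so it is safe to run twice. The function name ensure_verbose and the sample hostname are made up for illustration; the sketch works on text rather than on the file itself so you can try it before touching a real server.

```python
def ensure_verbose(conf_text):
    """Return vm.conf text with a VERBOSE line, adding one only if it is missing."""
    lines = conf_text.splitlines()
    if any(line.strip() == "VERBOSE" for line in lines):
        return conf_text  # already enabled, leave the text untouched
    lines.append("VERBOSE")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up vm.conf content, for illustration only
    before = "MM_SERVER_NAME = mediasrv1\n"
    print(ensure_verbose(before), end="")
```

On a real Unix/Linux media server you would read /usr/openv/volmgr/vm.conf, pass its contents through the function, write the result back, and then restart NBU as described above.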

     

  • Hi Tony.

    I think Martin is right.

    With NetBackup 7.0.1, I have from time to time had drives in an SL500 go down. There was also a problem with the drive firmware combination. In one case I also had a faulty drive with the same behavior.

    I would try reconfiguring the drive, and if NetBackup still does not see it, then it is a job for the hardware engineer.

    Best regards

    Botvitnik

  • Ahhh...forgot about the OS...ugh.  It is Solaris 10.

    Thank you for your input so far...have not had a chance to act on it yet....but thank you so far.

  • There are LOTS of possible reasons for drives being downed.

    No use speculating here without seeing /var/adm/messages after VERBOSE logging has been enabled.
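
Once VERBOSE logging is on, the messages file can get long. A minimal Python sketch for pulling out only the lines that mention the keywords named earlier in this thread (io_ioctl, ASC, ASCQ, TapeAlert) is below. The function name and the sample log lines are made up for illustration and are not real NetBackup output.

```python
import re

# Keywords taken from the thread. ASC/ASCQ are matched as whole words only,
# so unrelated words such as "ascii" are not picked up.
PATTERN = re.compile(r"io_ioctl|\bASCQ?\b|tapealert", re.IGNORECASE)


def find_device_errors(lines):
    """Return the log lines that mention any of the drive-error keywords."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    # Made-up log lines, for illustration only
    sample = [
        "Jan  1 10:00:00 host daemon: TapeAlert flag raised on drive 3\n",
        "Jan  1 10:00:05 host sshd: session opened\n",
        "Jan  1 10:00:09 host daemon: sense data ASC = 0x4, ASCQ = 0x3\n",
    ]
    for hit in find_device_errors(sample):
        print(hit)
```

On Solaris 10 you would pass open("/var/adm/messages", errors="replace") instead of the sample list. Any hits are worth posting here along with the lines just before and after them.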

  • It appears that the drive had a tape in it whose barcode it could not read.  The system wasn't even displaying the presence of the tape media...it showed one less than was actually in the library.  I removed the tape and the drive is now reading other tape media normally.  Thank you all for your help...I will keep your suggestions for future reference as well.  Issue is resolved.

  • Super - so 'almost' a hardware issue then.

    Please be generous and mark the most helpful post as the solution ...

    Many thanks,

    Martin