Forum Discussion

rajeshthink
13 years ago

Troubleshoot Drive Down issue

What basic troubleshooting should we do to fix a drive-down issue?

There are many scenarios:

* Some drives are down on a media server, but others are up.

* SSO drives on ACSLS are down.

* Drives mostly give error 84, even with new, good tapes.

* VTL drives go down.

Which logs do we need to look at for a drive-down issue, and how and why should we reconfigure drives?


5 Replies

  • Please mention the OS on your media servers. NBU relies on the OS for I/O, so troubleshooting should start there. To increase NBU Media Manager logging, add a VERBOSE entry to vm.conf on ALL media servers in your environment and restart NBU. vm.conf is located in:
    Unix: /usr/openv/volmgr
    Windows: install-path\veritas\volmgr
    Also create a bptm log folder on all media servers under netbackup/logs.
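
A minimal sketch of the vm.conf step on a Unix media server, assuming the default path given above. The function name `enable_vm_verbose` is made up for illustration; it only appends the entry if it is not already present, and NBU still has to be restarted afterwards.

```shell
#!/bin/sh
# Hypothetical helper: add a VERBOSE line to vm.conf exactly once.
# The default path is the Unix location mentioned in this thread;
# pass another path as the first argument if your layout differs.
enable_vm_verbose() {
    vm_conf="${1:-/usr/openv/volmgr/vm.conf}"

    # -x matches the whole line, -q suppresses output.
    if grep -qx 'VERBOSE' "$vm_conf" 2>/dev/null; then
        echo "VERBOSE already present in $vm_conf"
        return 0
    fi

    # If the file exists and does not end with a newline, add one first
    # so VERBOSE does not get glued onto the last existing entry.
    if [ -s "$vm_conf" ] && [ -n "$(tail -c 1 "$vm_conf")" ]; then
        echo >> "$vm_conf"
    fi

    echo 'VERBOSE' >> "$vm_conf" || return 1
    echo "VERBOSE added to $vm_conf (restart NBU for it to take effect)"
}
```

Usage would be `enable_vm_verbose` for the default location, or `enable_vm_verbose /path/to/vm.conf` to try it against a copy first.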

    In the meantime, please run the Tape Logs report for the last 24 hours. Filter to exclude the INFO severity.
    Save the report as a text file and post it here as an attachment.

    **** EDIT ****

    Typing too slow when using Connect Mobile

    Martin's list is the most comprehensive and will catch errors at all possible levels.

    In all honesty, I have always been able to find the reason for DOWN drives with just the VERBOSE entry in vm.conf.
    The exact reason for DOWNing a drive will be logged in the Windows Event Viewer Application log, and device errors in the System log. On Unix media servers these errors will be logged to syslog (e.g. /var/adm/messages on Solaris, /var/log/messages on Linux).

    It is normal for only the media server experiencing the problem to DOWN a drive while it remains UP on others. That is why troubleshooting must start at the OS level on the media server.
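
As a rough illustration of checking syslog for these entries, the sketch below pulls out the timestamp and drive name from ltid "DOWN'ed drive" lines. The function name `list_downed_drives` is hypothetical, and the pattern is based only on the single message quoted in this thread, so other NBU versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper: print "<timestamp> <drive name>" for every ltid
# message that reports a DOWN'ed drive.
# Assumption: lines begin with a 15-character syslog timestamp
# ("May 24 19:26:22") and contain "DOWN'ed drive <name>".
list_downed_drives() {
    log_file="${1:-/var/adm/messages}"   # Solaris default; use /var/log/messages on Linux

    # Only look at ltid lines, then keep the timestamp and the drive name.
    sed -n "/ltid\[[0-9]*\]/s/^\(.\{15\}\).*DOWN'ed drive \([^ ]*\).*/\1 \2/p" "$log_file"
}
```

Running `list_downed_drives /var/adm/messages | sort -k4 | uniq -c -f3` afterwards would give a quick count per drive, which helps show whether one device keeps going down or all of them do.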

  • Add VERBOSE to vm.conf
     
    mkdir /usr/openv/netbackup/logs/bptm
    mkdir /usr/openv/netbackup/logs/bpbrm
    mkdir /usr/openv/volmgr/debug/robots
    mkdir /usr/openv/volmgr/debug/daemon
    mkdir /usr/openv/volmgr/debug/ltid
    mkdir /usr/openv/volmgr/debug/oprd
    mkdir /usr/openv/volmgr/debug/reqlib
    mkdir /usr/openv/volmgr/debug/tpcommand
    mkdir /usr/openv/volmgr/debug/acssi   # ACS only
    mkdir /usr/openv/volmgr/debug/acsd    # ACS only
     
     
    Create the following empty files
     
    touch /usr/openv/volmgr/DRIVE_DEBUG
    touch /usr/openv/volmgr/ROBOT_DEBUG 
    touch /usr/openv/volmgr/AVRD_DEBUG 
    touch /usr/openv/volmgr/SSO_DEBUG 
     
    Plus the system messages log.
     
    http://www.symantec.com/docs/TECH169477
     
     
    Martin
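
The list above can be wrapped into one repeatable step for several media servers. This is only a sketch: the function name `setup_nbu_debug` and its arguments are made up, and it assumes the standard /usr/openv layout. Note that `mkdir -p` will happily create the tree even where NBU is not installed, so check the base path first.

```shell
#!/bin/sh
# Hypothetical helper: create the debug log directories and touch files
# listed in this thread. Safe to run more than once.
#   $1 = base directory (default /usr/openv)
#   $2 = "acs" to also create the ACS-only directories
setup_nbu_debug() {
    openv="${1:-/usr/openv}"
    want_acs="${2:-}"

    for d in netbackup/logs/bptm netbackup/logs/bpbrm \
             volmgr/debug/robots volmgr/debug/daemon volmgr/debug/ltid \
             volmgr/debug/oprd volmgr/debug/reqlib volmgr/debug/tpcommand; do
        mkdir -p "$openv/$d" || return 1
    done

    # Empty files that switch on the extra debug output.
    for f in DRIVE_DEBUG ROBOT_DEBUG AVRD_DEBUG SSO_DEBUG; do
        touch "$openv/volmgr/$f" || return 1
    done

    # Only needed where ACS robotics are in use.
    if [ "$want_acs" = "acs" ]; then
        mkdir -p "$openv/volmgr/debug/acssi" "$openv/volmgr/debug/acsd" || return 1
    fi
}
```

For example, `setup_nbu_debug /usr/openv acs` on an ACSLS media server, or plain `setup_nbu_debug` elsewhere. Remember to remove the touch files and trim the logs once the problem is found, as this level of logging grows quickly.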
     
  • I missed a bit ...

    To reconfigure devices:
     
    http://www.symantec.com/docs/TECH125956
     
    It is difficult to know when to reconfigure devices.  Sometimes it is obvious; for example, you may see a 'missing path' because things at the OS level have changed.
     
    Sometimes we can look through every log and find no answers, only to discover that, for no reason you will ever be able to explain, a re-config of the devices fixes the issue ...  This, however, is fairly rare compared to the other possible causes.
     
    Generally (depending on the complexity of the system) I will reconfigure the drives at the beginning of a call, just to eliminate this as a cause - we then know for sure the config is fine.  If I do this, I'll delete the 'old' devices first.
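
Before deleting the 'old' devices it can be handy to keep a record of the current configuration. The sketch below is one possible way to do that, not part of the linked technote: `snapshot_device_config` and `save_output` are hypothetical names, and it assumes `tpconfig -d`, `vmoprcmd -d` and `tpautoconf -report_disc` are available under /usr/openv/volmgr/bin on your release. Commands that are not found are skipped rather than treated as errors.

```shell
#!/bin/sh
# Hypothetical helper: run a command and save stdout+stderr to a file,
# or note that the command was not found.
#   save_output <out_file> <command> [args...]
save_output() {
    out_file="$1"
    shift
    if [ -x "$1" ]; then
        "$@" > "$out_file" 2>&1
    else
        echo "skipped: $1 not found" > "$out_file"
    fi
}

# Hypothetical helper: snapshot the device configuration before a re-config.
#   $1 = directory to write the snapshot files to
#   $2 = directory holding the volmgr commands (assumed default below)
snapshot_device_config() {
    out_dir="$1"
    bin_dir="${2:-/usr/openv/volmgr/bin}"
    stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$out_dir" || return 1
    save_output "$out_dir/tpconfig-d.$stamp.txt"  "$bin_dir/tpconfig" -d
    save_output "$out_dir/vmoprcmd-d.$stamp.txt"  "$bin_dir/vmoprcmd" -d
    save_output "$out_dir/report_disc.$stamp.txt" "$bin_dir/tpautoconf" -report_disc
}
```

Something like `snapshot_device_config /var/tmp/nbu-devcfg` run on each media server gives a before/after comparison if the re-config changes drive names or paths.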
     
    The thing to remember with drive (and robot) issues is that 95% of these calls are not caused by NBU - tape/drive issues should not automatically be blamed on NBU, which is what usually happens.
     
    Why ...
     
    The OS performs most of the operations - e.g. read/write/positioning - therefore, although not impossible, it is very, very unlikely that these sorts of issues have anything to do with NBU.
     
    Off the top of my head, I can only think of three 'NBU' issues relating to drives/libraries that are actually 'our' fault, and all of these are fixed in the latest NBU version.
     
    Martin
     
     
     
     
     
  • Currently I have a VTL drive issue. I have tried everything, including reconfiguring the drives.

    But the drives are still going down. I checked /var/adm/messages and got the following:

    May 24 19:26:22 <Master_server> ltid[1413]: [ID 942091 daemon.error] Operator/EMM server has DOWN'ed drive HP.ULTRIUM4-SCSI.001 (device 26)

    I don't see any issue with the drive in tpautoconf -report_disc.

  • You need to create the logs I explained before.  

    Martin