Forum Discussion

nbuno
Level 6
12 years ago

Where can I capture logs for tape drives going DOWN and PEND-TLD?

Please don't just tell me to check bptm and /var/adm/messages. I couldn't find them there. Also, I would appreciate it if someone could paste the output showing what these entries actually look like. I have to create a script so that this data can be captured and sent to the monitoring team, and that output is exactly what I am looking for.

 

thanks.........

  • Those files are THE ONLY WAY!

    First of all, ensure that a VERBOSE entry exists in vm.conf on all media servers and that NBU has been restarted on the media servers to enable logging to the OS syslog.

    If you post YOUR media server(s) messages file(s) and bptm log(s), we will help you to identify issues in your environment. 

    It does not help you one bit if we share our error messages.....

  • I got that, Marianne. Actually, I just wanted to see what those error logs look like so that I can ask my OS guys to create a script that will grep 'something' from that content and send us an alert whenever it happens. Also, I am not planning to keep my verbosity high, so does that mean I will never be able to get those log messages?

     

  • I don't know what they look like because there are many reasons that drives can go down.

    If you look at the details for the job in Activity Monitor you may get an idea of the time the problem happens, so you know roughly where to look in the log.

    It shouldn't be difficult: if you look for the word mount or mounting and check that the tape listed matches the tape shown in Activity Monitor, you could pull out the lines for that PID, which will shorten things.

    Alternatively you could start by looking for lines containing <16> or <8> (doesn't always work though ... but should for a drive going down).
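As a rough illustration of the `<16>` / `<8>` filter suggested above, here is a minimal Python sketch. The function name `find_severity_lines` is hypothetical, and, as the reply notes, a given drive failure is not guaranteed to log either marker.

```python
from typing import Iterable, Iterator, Tuple

# Markers taken from the reply above; other markers may matter in your environment.
SEVERITY_MARKERS = ("<16>", "<8>")


def find_severity_lines(
    lines: Iterable[str],
    markers: Tuple[str, ...] = SEVERITY_MARKERS,
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line that contains any marker."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            yield number, line.rstrip("\n")
```

To try it, open a bptm log file and pass the file object as `lines`; the log path is left to the caller because it varies by installation.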

    The exact time the drive goes down will be found in the ltid log, as this is the process that actually downs the drive, but the reason will be in bptm and hopefully messages. You could also enable the tpcommand and robots debug logs (/usr/openv/volmgr/debug/robots and tpcommand). If you put VERBOSE in vm.conf this will show more info, and to get the maximum amount create the empty file DRIVE_DEBUG in the volmgr directory.

    It is a waste of time trying to script this: it would only work for that exact cause, and we don't know that until we see it anyway. Trust me, if this could be scripted I would have done it ages ago ...
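The logging setup described above could be sketched in Python as follows. This is only an illustration: the function name `enable_volmgr_debug` is hypothetical, and it assumes that vm.conf sits directly in the volmgr directory you pass in (for example /usr/openv/volmgr). Check both against your own installation before running anything on a media server.

```python
from pathlib import Path


def enable_volmgr_debug(volmgr_dir: str) -> None:
    """Create the debug directories, add VERBOSE to vm.conf, and create DRIVE_DEBUG."""
    base = Path(volmgr_dir)

    # Debug log directories named in the thread: debug/robots and debug/tpcommand.
    for name in ("robots", "tpcommand"):
        (base / "debug" / name).mkdir(parents=True, exist_ok=True)

    # Assumption: vm.conf is located directly in the volmgr directory.
    vm_conf = base / "vm.conf"
    text = vm_conf.read_text() if vm_conf.exists() else ""
    if "VERBOSE" not in (line.strip() for line in text.splitlines()):
        # Keep existing entries intact and make sure VERBOSE lands on its own line.
        separator = "" if text == "" or text.endswith("\n") else "\n"
        vm_conf.write_text(text + separator + "VERBOSE\n")

    # Empty touch file that requests the maximum amount of drive debug output.
    (base / "DRIVE_DEBUG").touch()
```

Calling it twice does not duplicate the VERBOSE entry. As noted earlier in the thread, the change only takes effect once NBU has been restarted on the media server.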

    Martin

  • Verbosity of bptm higher than 0 is only required if Symantec Support needs level 5 logs, but I firmly believe that a VERBOSE entry in vm.conf is ALWAYS needed. There is no way to troubleshoot DOWN drives without it.
    There are so many reasons for drives being DOWN'ed.
    I always look for the word DOWN in messages, then look at the lines in messages file leading up to drive being DOWN'ed. 

    If you have seen some of these entries, you will understand that this cannot really be automated.
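The manual routine described above (find the word DOWN, then read the lines leading up to it) can be approximated with a small Python sketch. The function name `down_events_with_context` and the default of 20 preceding lines are assumptions for illustration. It does not classify the cause; it only gathers the context a person would then read.

```python
from collections import deque
from typing import Iterable, Iterator, List


def down_events_with_context(
    lines: Iterable[str],
    keyword: str = "DOWN",
    before: int = 20,
) -> Iterator[List[str]]:
    """For each line containing keyword, yield the preceding lines plus that line."""
    recent = deque(maxlen=before)  # rolling window of the most recent lines
    for line in lines:
        text = line.rstrip("\n")
        if keyword in text:
            yield list(recent) + [text]
        recent.append(text)
```

Each yielded list could be mailed as-is to whoever reviews drive problems, which keeps the human judgement that the replies above say is needed.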

  • Agree with Marianne's excellent points ... I always run volmgr logs with VERBOSE in vm.conf and with the debug touch files in place. Yes, these days I only 'run' test servers (Symantec support don't run Symantec's own 'real' backup servers), but even test servers go wrong and need attention sometimes. If that happens I want to be sure I have the right info in the logs; I don't want to have to set logs and then wait for the issue to happen again to capture the info I need to fix it.

    I strongly believe that log size should be part of the system design. Many people don't consider it, and then have no ability to turn the logs up to any decent level to troubleshoot an issue, which can mean it cannot be fixed until the log issue is resolved by some means.

    Anyway, back to topic. If there are drive issues, they should also be logged in:

    /usr/openv/netbackup/db/media/errors

    If you search Google for "tperr.sh download" you will be able to make some sense of the file. It should narrow down any potential issues with particular drives or media. Please note, it will only work on Solaris. It works on statistics, so the more lines in there the better.

    Martin

  • Something else that might help a person if they are in a tizzy: remember that the down'ed drives will be noted in the /var/log/messages files (I usually grep -i down /var/log/mess*) of the media server that controls the library--and if that is different than the master server, and you're looking at the master server, you may not see anything in its log files.
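
For a monitoring job, the `grep -i down /var/log/mess*` check above could be expressed in Python roughly as below. The function name `count_down_mentions` is hypothetical. Like the grep, it matches any line containing "down" in any case, so unrelated messages will be counted too, and it reads files as plain text, so compressed rotated logs are not handled.

```python
import glob
from typing import Dict


def count_down_mentions(pattern: str = "/var/log/mess*") -> Dict[str, int]:
    """Return, per matching file, how many lines contain 'down' (case-insensitive)."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        # errors="replace" avoids a crash on stray non-text bytes in a log file.
        with open(path, errors="replace") as handle:
            counts[path] = sum(1 for line in handle if "down" in line.lower())
    return counts
```

Run it on the media server that controls the library, since the master server's log files may show nothing.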