04-15-2015 07:44 AM
I was checking the logs on the media server (Win 2008R2 SP2 x64, NBU 188.8.131.52) and the NBSL logs have errors in 50% of the lines. Has anyone seen this?
The log repeats the following lines about 41,000 times:
0,51216,132,132,1252824,1429106095953,6216,4652,0:,88:ERROR: Callback is down. Will not deliver events, Caller ID:0(EventSupplierTask.cpp:188),31:EventSupplierTask::shallEnqueue,1
0,51216,132,132,1252825,1429106095953,6216,4652,0:,63:Could not able to enqueue NBSL event(EventSupplierTask.cpp:268),26:EventSupplierTask::enqueue,1
Backups are running fine. No failed backups except the ones which are due to non-NBU issues.
Any information about this will be helpful.
Thanks in advance.
04-15-2015 09:31 AM
I can't tell you what they're about... but your count of 41,000 messages does seem to indicate a problem somewhere at some point.
It's not obvious from the raw logs what date they're from. They could be old, and no longer relevant.
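One way to date a raw line by hand, as a rough sketch only: it assumes the sixth comma-separated field of a raw unified-log line (1429106095953 in the lines quoted in the first post) is a timestamp in milliseconds since the Unix epoch. The function name `raw_line_time` is made up for this example and is not part of NetBackup.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def raw_line_time(line: str) -> datetime:
    """Return the UTC time of one raw unified-log line.

    Assumption: the sixth comma-separated field is epoch milliseconds.
    """
    # Only the leading numeric fields matter, so stop splitting after six commas.
    fields = line.split(",", 6)
    return EPOCH + timedelta(milliseconds=int(fields[5]))


# First raw line from the original post, shortened after the message text.
sample = "0,51216,132,132,1252824,1429106095953,6216,4652,0:,88:ERROR: Callback is down."
print(raw_line_time(sample).isoformat())
```

Under that assumption the quoted line works out to 2015-04-15 13:54:55 UTC, so it would be recent rather than stale.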
What are the message counts for the previous hour?
> vxlogview -p 51216 -o 132 -t 01:00:00 | find /c "Callback is down"
FYI - I've just checked two new v184.108.40.206 master/media appliances which are not doing very much at all at the moment, except backing up a few clients and replicating, and I found:
27 messages re "Callback is down"
27 messages re "Could not able"
...on one appliance. And 22 occurrences of each message on the other appliance...
...all of which are from 10 days ago, which I assume is probably from when I rebooted the appliances, or bounced the application.
04-15-2015 09:46 AM
The snippet I provided earlier was from yesterday's log file.
Here is the output of vxlogview as requested:
C:\Program Files\Veritas\NetBackup\bin>vxlogview.cmd -p 51216 -o 132 -t 01:00:00 | find /c "Callback is down"
3574
I will check on my second media server where the replication happens for the same issue.
04-15-2015 10:00 AM
There is only one file on the second media server under the nbsl log directory:
C:\Program Files\Veritas\NetBackup\bin>vxlogview.cmd -p 51216 -o 132 -t 01:00:00 | find /c "Callback is down"
V-1-45 No log files found.
0
04-15-2015 10:50 AM
Ok - so you know you're getting c. 3574 alerts per hour (which looks like one alert per second) on one media server.
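To see whether that rate is steady or comes in bursts, the raw lines could be tallied per hour. This is only a sketch: it again assumes the sixth comma-separated field of a raw line is epoch milliseconds, and `hourly_counts` and `per_second` are names invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hourly_counts(lines, needle="Callback is down"):
    """Count lines containing `needle`, grouped by UTC hour."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        fields = line.split(",", 6)
        try:
            millis = int(fields[5])  # assumed: epoch milliseconds
        except (IndexError, ValueError):
            continue  # skip lines that do not match the assumed layout
        when = EPOCH + timedelta(milliseconds=millis)
        counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts


def per_second(count, seconds=3600):
    """Average number of messages per second over the window."""
    return count / seconds


# 3574 messages in one hour, as reported by vxlogview in the post above.
print(round(per_second(3574), 3))
```

Feeding the lines of a raw nbsl log file to `hourly_counts` gives one count per hour; a flat profile of roughly 3,600 per hour would point to something retrying once a second rather than a one-off event.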
I can't offer any deep diagnosis. All I can think of is to perhaps check each master and media server... to see whether "nbsl" is running?
Windows: > bpps | find "nbsl"
Unix: # bpps | grep "nbsl"
Anyone else know what this alert means?
04-16-2015 12:45 AM
Can't find much on this that gives a clear answer.
There looks to be a previous eTrack on this, but it dates back to 2012:
"looks like its rare scenario when due to the network issue the catalog update has failed due
to a TRANSIANT exception on OpsC callback object"
However, it seems the OpsCenter code was then changed to be more accommodating of this. Clearly we can't fix a network issue in the code, but we can try to work around it a bit.
That said, there was a memory leak reported on nbsl / 220.127.116.11, but this was on AIX. That is not to say it couldn't happen on Windows, and the symptoms were the same messages in the log as you report.
So in summary, not enough is known without further investigation. Are you on a very old version? Are you seeing high memory usage? If it's a similar issue to the eTrack, perhaps you have no obvious 'physical' network issues, but how is the TCP tuning at the OS level? If you reboot the machine, do the errors disappear and then return after a while?
04-17-2015 12:56 PM
My environment has NetBackup 18.104.22.168 (with the media server in question being Win2k8 SP1 x64). Memory usage seems to be normal, hanging around 34 GB out of 48 GB with spoold taking around 28 GB.
I'm going to reboot the machine on Monday, will keep an eye out for the errors after reboot.
04-27-2015 01:45 PM
My apologies for not updating earlier. The server was rebooted a week ago after applying Windows updates. NBSL logs have not shown any difference though.
05-13-2015 04:12 AM
Probably time to log a Support call with Symantec if this is still bugging you....
05-13-2015 07:39 AM
That is what I was planning on but there are other issues which came up and I have 3 cases already open. Will wait a couple of weeks to get the 3 sorted out and then concentrate on this one.
I wonder if this thread can remain open for a few more weeks so that I can update it after opening the case.
05-15-2015 04:09 AM
Yes, np leaving the thread, just come back to it when you are ready.
05-15-2015 05:16 AM
Some other points to think about:
1) Is the current 132 (nbsl) log level at default value? If not, are these messages still appearing even if you set the "DebugLevel=1" & "DiagnosticLevel=6"?
2) What is your master server's NBU version? Same as this media server at 22.214.171.124? If it's not, try to make it the same. (note that I believe 7.6.1 master server can still work with 126.96.36.199 media server, is that correct?)
3) If your environment has an OpsCenter server monitoring the master server, try disabling its data collection for a short period and check whether you still get these NBSL messages. OpsCenter data collection is known to gather its information via the nbsl processes.
06-05-2015 06:01 AM
I am having a similar issue. I'm seeing this across all my 188.8.131.52 and 184.108.40.206 media servers (Windows and AIX), on both new installs and upgrades. Logging levels are all set to minimum. The master server is 220.127.116.11, recently upgraded from 18.104.22.168, and it was upgraded before any other client/media server. No OpsCenter in this environment.
I have opened a case with Symantec. Will try to report back what they find.
06-09-2015 11:09 AM
Mine is specific to the fact that I do not have OpsCenter in my environment. It is a known issue. Currently there is no fix. The workaround is to set DebugLevel on NBSL to 0 (default = 1):
vxlogcfg -a -p 51216 -o 132 -s DebugLevel=0
According to my SE, NBSL's sole purpose is to report to OpsCenter. If you are not using OpsCenter in your environment, there is no need to pay attention to the nbsl logs.
If there is an OpsCenter in the environment, then further investigation may be needed.