Forum Discussion

Sadfad
Level 3
8 years ago

nbevtmgr not running

Hi,

We recently migrated our Master Server from Solaris 10 to Solaris 11 and to new hardware.
Since the migration we have had several issues. For example, enabling or disabling a policy takes up to 3 minutes.

We also found out that nbevtmgr is not running. The service starts and stops in under 1 second.
In the Java GUI under Reports/Problems we can see many errors:
The Error Description is: connection to nbevtmgr:VRTS_NBU_PEM_Channel on host *master* failed after XXXX attempts.
We are getting this error message every minute.

I hope you can help me.

  • We found the error.

    It was a missing privilege for 'proc_procntl'.
    After we added this privilege, nbevtmgr started and everything else worked fine.
    Anyway thank you for your time and answers.

  • Is the process stopping or crashing?

    If it is stopping, then the nbevtmgr log may be useful.  This is unified log number 231:

    (vxlogview -p 51216 -o 231 -d all -t 00:10:00 )

    Might be worth increasing the log level first, recreating the issue, and then looking at the log.

    vxlogcfg -a -p 51216 -o 231 -s DebugLevel=6 -s DiagnosticLevel=6

    To decrease the log level again, just run the same command but use DebugLevel=1.

    If the process is crashing, the log is still needed, but it usually doesn't give a clear answer (as the process crashed, it couldn't log, so you only see what happened just before it crashed, which may not be enough).  In this case you need to collect the crash dump and, if on Unix/Linux, process it through the OS debugger and then log a call with Veritas.  You would also need to gather the system messages log, the process log (as mentioned), a copy of the core file (or crash dump if Windows), a copy of the binary itself and the nbsu -c -t output.

  • You have a comms issue …  I will hazard a guess it’s something to do with endpoints.

     

    The CORBA error Marianne points out is an excellent start, but by no means the complete story.  Others may know better than I do, but that error alone is pretty much useless, apart from telling us you have a comms issue.

    The problem happens at some point before that.  These issues can be very hard to find.

     

    From your log

     

    02/23/17 14:34:08.822 [Orb::getConfig] found service entry: name = veritas_pbx port = 1556 proto = tcp(Orb.cpp:542)

    02/23/17 14:34:08.822 [Orb::getConfig] required_interface: xxx defined and assigned(Orb.cpp:582)

     

    <snip>

     

    02/23/17 14:34:08.835 [TAO] Using 5 threads for all Consumers.

    02/23/17 14:34:08.835 [TAO] Using 5 threads for all Suppliers.

    02/23/17 14:34:08.836 [ServiceContainer::doInit] successfully processed directive: dynamic CosNotify_Service Service_Object * "/usr/openv/lib/libvxTAO_CosNotification.so.6":_make_TAO_NS_Notify_Service() "-AllowReconnect -DispatchingThreads 5 -SourceThreads 5 -NoUpdates"(ServiceContainer.cpp:192)

    02/23/17 14:34:08.836 [ServiceContainer::doInit] processing directive: dynamic AsyncEventMgr Service_Object * "/usr/openv/lib/libAsyncEventService.so":_make_AsyncEventMgr()(ServiceContainer.cpp:166)

    02/23/17 14:34:08.836 [NBLogFactory::locateLogger(fid, oid)] Found a logger (#2). FID - 231, OID - 231(NBLogFactory.cpp:199)

    02/23/17 14:34:08.836 [Orb::setOrbTimeoutPolicy] setting ORB request timeout policy: tv = 300(Orb.cpp:1388)

    02/23/17 14:34:08.836 [Orb::setOrbRequestTimeout] timeout seconds: 300(Orb.cpp:1374)

    02/23/17 14:34:08.836 [Info] V-231-134 The TAO Notification Service (CosNotify_Service) has been started.

    02/23/17 14:34:08.837 [vnet_sortaddrs] [vnet_addrinfo.c:3992] sorted addrs: 1 0x1

    02/23/17 14:34:08.837 [NBIORInterceptor::establish_components] Encoding IP Address [xxx] in IOR(NBIORInterceptor.cpp:120)

    02/23/17 14:34:08.837 [vnet_sortaddrs] [vnet_addrinfo.c:3992] sorted addrs: 1 0x1

    02/23/17 14:34:08.837 [NBIORInterceptor::establish_components] Encoding IP Address [xxx] in IOR(NBIORInterceptor.cpp:120)

    02/23/17 14:34:08.839 [Info] V-231-135 TAO Notification Service (CosNotify_Service) Initialization complete.

    02/23/17 14:34:08.839 [Info] V-231-140 Creating default Event Manager objects.

     

     

     

    Now let's look at the log from my server

     

    02/24/17 20:23:21.032 [Orb::getConfig] found service entry: name = veritas_pbx port = 1556 proto = tcp(Orb.cpp:542)

    02/24/17 20:23:21.032 [Orb::getConfig] cluster_name and required_interface not defined, using ANY(Orb.cpp:594)

     

    <snip>

     

    02/24/17 20:23:21.314 [TAO] Using 5 threads for all Consumers.

    02/24/17 20:23:21.314 [TAO] Using 5 threads for all Suppliers.

    02/24/17 20:23:21.315 [ServiceContainer::doInit] successfully processed directive: dynamic CosNotify_Service Service_Object * "/usr/openv/lib/libvxTAO_CosNotification.so.6":_make_TAO_NS_Notify_Service() "-AllowReconnect -DispatchingThreads 5 -SourceThreads 5 -NoUpdates"(ServiceContainer.cpp:192)

    02/24/17 20:23:21.315 [ServiceContainer::doInit] processing directive: dynamic AsyncEventMgr Service_Object * "/usr/openv/lib/libAsyncEventService.so":_make_AsyncEventMgr()(ServiceContainer.cpp:166)

    02/24/17 20:23:21.316 [NBLogFactory::locateLogger(fid, oid)] Found a logger (#2). FID - 231, OID - 231(NBLogFactory.cpp:199)

    02/24/17 20:23:21.317 [AsyncEventMgr::init()] +++ ENTERING +++ : obj = 1002ba900 (../AsyncEventMgr.cpp:99)

    02/24/17 20:23:21.317 [LifecycleTools::startCosNotifSvc()] +++ ENTERING +++ : obj = ffffffff7fffce10 (../LifecycleTools.cpp:108)

    02/24/17 20:23:21.317 [Orb::setOrbTimeoutPolicy] setting ORB request timeout policy: tv = 300(Orb.cpp:1388)

    02/24/17 20:23:21.317 [Orb::setOrbRequestTimeout] timeout seconds: 300(Orb.cpp:1374)

    02/24/17 20:23:21.318 [Info] V-231-134 The TAO Notification Service (CosNotify_Service) has been started.

    […] [NBIORInterceptor::establish_components] Encoding IP Address [10.12.235.41] in IOR(NBIORInterceptor.cpp:120)

    […] [NBIORInterceptor::establish_components] Encoding IP Address [10.12.235.41] in IOR(NBIORInterceptor.cpp:120)

    02/24/17 20:23:21.321 [NBIORInterceptor::establish_components] Encoding IP Address [10.12.235.41] in IOR(NBIORInterceptor.cpp:120)

    02/24/17 20:23:21.321 [NBIORInterceptor::establish_components] Encoding IP Address [10.12.235.41] in IOR(NBIORInterceptor.cpp:120)

    02/24/17 20:23:21.347 [Info] V-231-135 TAO Notification Service (CosNotify_Service) Initialization complete.

    02/24/17 20:23:21.347 [Info] V-231-140 Creating default Event Manager objects.

     

     

     

    You can see the difference: my log shows my IP address, your log shows xxx.

     

    As a matter of interest, in pbx log at the time of startup, do you see lines like this:

     

    02/24/17 20:23:21.251 [Info] V-103-10 Adding server: nbevtmgr

    02/24/17 20:23:21.399 [Info] PBX_Client_Proxy::parse_line, line = extension=nbevtmgr  From 127.0.0.1

     

    To get the PBX log (PBX log levels work a bit differently; it should be at the max debug level of 10 by default):

     

    vxlogview -p 50936 -o 103 -t 00:40:00  (last 40 mins of log relative to when you run the command; adjust the time as required, or just restart and get new logs …)

     

     

    I will hazard a guess, that you won’t see the lines in PBX log, as nbevtmgr can’t connect to it.
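
As a rough sketch of that check (not an official tool): once the PBX log has been exported to a text file with the vxlogview command above, a small shell function can report whether both registration lines are present. The function name `pbx_saw_nbevtmgr` and the file name `pbx.txt` are made up for illustration; the two search strings are taken from the log lines quoted above.

```shell
#!/bin/sh
# pbx_saw_nbevtmgr FILE
# Succeeds (exit 0) only if FILE, a text export of the PBX log, contains
# both nbevtmgr registration lines quoted in this thread.
pbx_saw_nbevtmgr() {
  grep -q 'Adding server: nbevtmgr' "$1" &&
    grep -q 'extension=nbevtmgr' "$1"
}
```

For example: `pbx_saw_nbevtmgr pbx.txt && echo "nbevtmgr registered" || echo "lines missing"`.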

     

    I’d suggest checking name resolution. If you have multiple interfaces on the master, even though you may only use one address, ALL addresses must be fully resolvable (both forwards and backwards). The way NBU works, it advertises all interfaces on a machine, so even if an interface is not used, it must be resolvable.

     

    If you use a hosts file, make sure the localhost entry is correct.  If on Unix/Linux, check cat -v /etc/hosts for bad characters; I had a corrupted hosts file the other week that caused this sort of issue.  Might be worth re-creating the hosts file to be safe (obviously, don’t copy it …).
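
Eyeballing cat -v output is easy to get wrong on a long file, so here is a minimal sketch that prints the line numbers containing anything other than printable ASCII, spaces and tabs (so stray carriage returns and non-breaking spaces get flagged). The function name `find_bad_hosts_lines` is hypothetical, and it assumes a grep that accepts `-a` (GNU or BSD grep; on Solaris that may mean the GNU one rather than the default).

```shell
#!/bin/sh
# find_bad_hosts_lines FILE
# Prints the number of every line in FILE that contains a byte which is
# neither printable ASCII nor a space/tab. Prints nothing if the file is clean.
find_bad_hosts_lines() {
  (
    # Force the C locale so that only plain ASCII counts as printable.
    LC_ALL=C
    export LC_ALL
    # -a: treat the file as text even if it contains odd bytes.
    grep -a -n '[^[:print:][:blank:]]' "$1" | cut -d: -f1
  )
}
```

For example, `find_bad_hosts_lines /etc/hosts` prints nothing for a clean file; any number it does print is a line worth retyping by hand.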

     

    As a matter of interest, is this a cluster ???

     

    I’ve also just had a look for previous cases with that same CORBA error.  There were various solutions: some were fixed by an EEB (not for the version you have, so N/A), and quite a few had name resolution as the cause, so that is further evidence to check this.  I apologise for pressing the point, but if I had £1 for every time I was told ‘there are no lookup issues’ only to find several hours later -  oh look, ‘name resolution issue’ …

    We might need to increase the vx log levels for 156 and 137; these put loads of ‘network’ stuff into the nbevtmgr log. You then need to recollect ONLY the nbevtmgr log, but run vxlogview -p 51216 -i 231, not -o 231.  To be honest, if it’s not an easily found lookup issue then you’ll probably need a support call, and in that case send the raw unprocessed logs.

  • I've just had a horrible thought ...

    Did you edit the log and remove the IP addresses, and put in xxx?

    02/23/17 14:34:08.822 [Orb::getConfig] required_interface: xxx defined and assigned(Orb.cpp:582)

    If this is done, you need to substitute IP addresses, not remove them, and be consistent

    E.g.  10.20.30.40 becomes 1.1.1.2

    20.30.40.50 becomes 1.1.1.3

    ... especially when you have some sort of communications problem.
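
A minimal sketch of that kind of consistent substitution, in case it saves someone doing it by hand: it reads a log on stdin and replaces each distinct IPv4-looking string with a stable placeholder, so the first address seen always becomes 1.1.1.2, the second 1.1.1.3, and so on. The function name `anonymize_ips` is made up, the pattern is deliberately loose (any four dot-separated numbers, so a four-part version number would be rewritten too), and it does not handle more than a couple of hundred distinct addresses sensibly.

```shell
#!/bin/sh
# anonymize_ips: filter stdin to stdout, mapping every distinct
# IPv4-looking string to its own placeholder (1.1.1.2, 1.1.1.3, ...).
# The same input address always gets the same placeholder.
anonymize_ips() {
  awk '
  BEGIN { next_id = 2 }
  {
    out = ""
    rest = $0
    # Walk along the line, replacing one match at a time.
    while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
      ip = substr(rest, RSTART, RLENGTH)
      if (!(ip in map)) {
        map[ip] = "1.1.1." next_id
        next_id++
      }
      out = out substr(rest, 1, RSTART - 1) map[ip]
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

For example, `anonymize_ips < 231.txt > 231_anon.txt` (file names hypothetical). Timestamps such as 14:34:08.822 are left alone because they do not contain four dot-separated numbers.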

    • Sadfad
      Level 3

      Hi,

      I did remove the IP addresses, because I have to. The IP address I removed was the IP of the Master Server.
      It was always the same ip.
      I'm sorry for that, I should have mentioned it in my reply with the log.

      • Marianne
        Level 6

        Sadfad -
        You have not replied to any of Martin's requests in his lengthy reply posted on Friday?

  • OK ...

    Can't really add much more at the moment; I suggest checking the PBX log as before for the lines mentioned, as well as the other steps.

    If no go, then we need to see if the full logs help, I think. No need to stop anything as nbevtmgr isn't running. I'd clear the logs though:

    cd /usr/openv/logs/nbevtmgr and delete the log files

    Then

    vxlogcfg -a -p 51216 -o 231 -s DebugLevel=6 -s DiagnosticLevel=6

    vxlogcfg -a -p 51216 -o 156 -s DebugLevel=6 -s DiagnosticLevel=6

    vxlogcfg -a -p 51216 -o 137 -s DebugLevel=6 -s DiagnosticLevel=6

    As soon as it fails, cp the 51216-231... file to nbevtmgr.raw

    Run

    vxlogview -p 51216 -i 231 >231.txt

    vxlogview -p 51216 -i 231 -d all >231_all.txt  (this gives all the PIDs and TIDs)

    Could also grab pbx log

    vxlogview -p 50936 -i 103 -d all -t 00:15:00 >pbx.txt  (adjust time as necessary)

    Turn logs down (same commands as before but use DebugLevel=1)
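
To avoid retyping the three vxlogcfg lines when turning the logs up and back down, something like the following wrapper could be used. It is only a sketch: the function name `set_debug_level` and the `RUN` preview variable are made up, the product ID and originator IDs (231, plus 156 and 137 as named earlier in the thread) are copied from the posts above, and DiagnosticLevel is left at 6 exactly as in the commands quoted.

```shell
#!/bin/sh
# Product ID and originator IDs as quoted in this thread.
PRODUCT_ID=51216
OIDS="231 156 137"

# set_debug_level LEVEL
# Runs one vxlogcfg call per originator with DebugLevel=LEVEL.
# Set RUN=echo first to print the commands instead of running them.
set_debug_level() {
  for oid in $OIDS; do
    ${RUN:-} vxlogcfg -a -p "$PRODUCT_ID" -o "$oid" -s "DebugLevel=$1" -s DiagnosticLevel=6
  done
}
```

So `set_debug_level 6` before reproducing the failure and `set_debug_level 1` afterwards; with `RUN=echo` set it only prints what it would run.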

    • Sadfad
      Level 3

      We found the error.

      It was a missing privilege for 'proc_procntl'.
      After we added this privilege, nbevtmgr started and everything else worked fine.
      Anyway thank you for your time and answers.

      • mph999
        Level 6

        Excellent that you found that, may I ask what led you to the answer ?

    • Sadfad
      Level 3

      We checked this technote before and sadly that's not the same fault we have.

      • Marianne
        Level 6

        Can you please share the process that you have followed to migrate to new OS and tell us which NBU version(s)? 

        I have seen this some time ago when there was a corruption in some binaries that were copied manually from one server to the next. When downloaded software was ftp'ed and extracted on new server and re-installed, all was fine. (File sizes and checksum must be verified as well.)
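
A quick way to do that verification for a single file, sketched under the assumption that both copies are reachable from one shell (for example the original over NFS, or after copying it again into a scratch directory). The function name `same_checksum` is made up. cksum is used because it is standard on both Solaris and Linux; its output includes both the CRC and the byte count, so size is compared as well, but it is only a CRC and a stronger digest tool could be swapped in.

```shell
#!/bin/sh
# same_checksum FILE_A FILE_B
# Exit 0 if both files have the same CRC and size according to cksum,
# 1 if they differ, 2 if either file cannot be read.
same_checksum() {
  sum_a=$(cksum < "$1") || return 2
  sum_b=$(cksum < "$2") || return 2
  [ "$sum_a" = "$sum_b" ]
}
```

For example: `same_checksum /old/copy/of/binary /new/copy/of/binary || echo "copies differ"` (paths hypothetical).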