Forum Discussion

virankumar
Level 4
12 years ago

Unexpected NetBackup restart in cluster environment

There was a planned activity to restart NetBackup, and it completed successfully.

However, two unexpected restarts happened on the same day. Could you help me with this case?

If you need any logs, I'll provide them.

  • Unfortunately, the messages file is too new; it only starts on 21 Jan.

    The engine_A log also does not contain all the detail that I expected, for example:

     

    2012/12/03 21:09:21 VCS ERROR V-16-2-13027 (bnsdc-mast-01) Resource(nbu_server) - monitor procedure did not complete within the expected time.

    But we cannot see what action VCS took.

    NBU was taken offline using VCS at the time we see the 'unexpected' disconnect in the log posted above:

     

    2013/01/29 13:58:15 VCS INFO V-16-1-50135 User admin fired command: hares -offline nbu_server  bnsdc-mast-01  from 10.134.64.12
    2013/01/29 14:02:25 VCS INFO V-16-2-13001 (bnsdc-mast-01) Resource(nbu_server): Output of the completed operation (offline) 
    
    And it was brought online again a few minutes later:
    2013/01/29 14:08:49 VCS INFO V-16-1-50135 User admin fired command: hares -online nbu_server  bnsdc-mast-01  from 10.134.64.12

     

    Here VCS reports the 'unexpected offline' of the nbu_server resource:

     

    2013/01/29 17:17:07 VCS ERROR V-16-2-13067 (bnsdc-mast-01) Agent is calling clean for resource(nbu_server) because the resource became OFFLINE unexpectedly, on its own.

    But the 'clean' procedure is struggling to complete:

     

    2013/01/29 17:18:09 VCS ERROR V-16-2-13006 (bnsdc-mast-01) Resource(nbu_server): clean procedure did not complete within the expected time.
    2013/01/29 17:53:56 VCS INFO V-16-1-50133 User patrol has logged in from 127.0.0.1
    2013/01/29 20:53:22 VCS ERROR V-16-2-13079 (bnsdc-mast-01) Resource(nbu_server): The last 10 invocations of the clean procedure have failed.
    2013/01/29 21:26:08 VCS ERROR V-16-2-13079 (bnsdc-mast-01) Resource(nbu_server): The last 20 invocations of the clean procedure have failed.

     

    2013/01/30 07:21:06 VCS ERROR V-16-2-13079 (bnsdc-mast-01) Resource(nbu_server): The last 200 invocations of the clean procedure have failed.
    2013/01/30 07:33:51 VCS INFO V-16-2-13001 (bnsdc-mast-01) Resource(nbu_server): Output of the completed operation (clean) 
    
    Looking for NetBackup processes that need to be terminated.
    Killing jnbSA/jbpSA processes...
    
    Looking for Media Manager processes that need to be terminated.
    
    
    Looking for VxDBMS processes that need to be terminated.
    /usr/openv/netbackup/bin/bp.kill_all FORCEKILL SKIPBPCD 2>&1 < /dev/null succeeded.
    2013/01/30 07:33:51 VCS INFO V-16-2-13078 (bnsdc-mast-01) Resource(nbu_server) - clean completed successfully after 203 failed attempts.

    So - maybe you can tell us what happened between the afternoon of 29 Jan and the morning of 30 Jan?

    VCS managed to restart NBU after this:

     

    2013/01/30 07:33:51 VCS ERROR V-16-2-13073 (bnsdc-mast-01) Resource(nbu_server) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 2) the resource.
    2013/01/30 07:34:11 VCS INFO V-16-2-13001 (bnsdc-mast-01) Resource(nbu_server): Output of the completed operation (online) 
    NetBackup Database Server started.

     

    PLEASE convince your management that NBU as well as SF/HA need to be upgraded as a matter of urgency.

    NBU 6.x ran out of support in October last year.
    It seems SF/HA 5.0 was installed in Feb 2007 and never patched? Not good.
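
For anyone retracing the timeline above from their own copy of engine_A.log, the following is a minimal Python sketch, not a Veritas tool. It assumes every event line follows the layout visible in the excerpts (timestamp, `VCS`, severity, message ID, free text); the names `Event`, `parse_line` and `events_between` are hypothetical helpers introduced only for illustration. It filters events for one resource within a time window and measures how long the resource was stuck between the unexpected offline (V-16-2-13067) and the successful clean (V-16-2-13078).

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional

# Layout assumed from the excerpts in this thread:
#   YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <V-n-n-n> <free text>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>[A-Z]+) (?P<msg_id>V-\d+-\d+-\d+) (?P<text>.*)$"
)


class Event(NamedTuple):
    when: datetime
    severity: str
    msg_id: str
    text: str


def parse_line(line: str) -> Optional[Event]:
    """Return an Event for a timestamped VCS line, or None for anything else
    (for example the script output that follows an 'Output of ...' line)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
    return Event(when, m.group("severity"), m.group("msg_id"), m.group("text"))


def events_between(lines: Iterable[str], start: datetime, end: datetime,
                   resource: Optional[str] = None) -> Iterator[Event]:
    """Yield events with start <= time <= end, optionally only those whose
    text mentions the given resource name."""
    for line in lines:
        ev = parse_line(line)
        if ev is None or not (start <= ev.when <= end):
            continue
        if resource is not None and resource not in ev.text:
            continue
        yield ev


# Lines copied from the excerpts above.
SAMPLE = [
    "2013/01/29 14:08:49 VCS INFO V-16-1-50135 User admin fired command: hares -online nbu_server  bnsdc-mast-01  from 10.134.64.12",
    "2013/01/29 17:17:07 VCS ERROR V-16-2-13067 (bnsdc-mast-01) Agent is calling clean for resource(nbu_server) because the resource became OFFLINE unexpectedly, on its own.",
    "2013/01/29 17:18:09 VCS ERROR V-16-2-13006 (bnsdc-mast-01) Resource(nbu_server): clean procedure did not complete within the expected time.",
    "2013/01/29 17:53:56 VCS INFO V-16-1-50133 User patrol has logged in from 127.0.0.1",
    "Looking for NetBackup processes that need to be terminated.",
    "2013/01/30 07:33:51 VCS INFO V-16-2-13078 (bnsdc-mast-01) Resource(nbu_server) - clean completed successfully after 203 failed attempts.",
]

if __name__ == "__main__":
    window = list(events_between(
        SAMPLE,
        datetime(2013, 1, 29, 17, 0, 0),
        datetime(2013, 1, 30, 8, 0, 0),
        resource="nbu_server",
    ))
    for ev in window:
        print(ev.when, ev.severity, ev.msg_id)

    # Time from the unexpected offline to the first successful clean.
    went_offline = next(e for e in window if e.msg_id == "V-16-2-13067")
    clean_ok = next(e for e in window if e.msg_id == "V-16-2-13078")
    print("stuck in clean for", clean_ok.when - went_offline.when)
```

To run it against a real log, replace `SAMPLE` with the lines of the file, for example `open(path).read().splitlines()`, where `path` is wherever the engine log lives on your node.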
