Forum Discussion

rakeman
7 years ago

Solaris zone backup terminates with status 41 after reaching 100% complete

Hi,

I am backing up a Solaris 10 non-global zone. After the backup reaches 100% complete, it fails with status 41, as shown below:

11/05/2018 15:20:48 - Error bpbrm (pid=15515) db_FLISTsend failed: network connection timed out (41)
11/05/2018 15:20:50 - Error bptm (pid=15528) media manager terminated by parent process
11/05/2018 15:21:32 - Info bpbkar (pid=27365) done. status: 41: network connection timed out
11/05/2018 15:21:32 - end writing; write time: 11:27:41
network connection timed out  (41)

According to the client log:

15:21:41.748 [27365] <16> bpbkar: ERR - bpbkar killed by SIGPIPE
15:21:41.839 [27365] <16> bpbkar: ERR - bpbkar FATAL exit status = 40: network connection broken
15:21:41.839 [27365] <4> bpbkar: INF - EXIT STATUS 40: network connection broken
15:21:41.873 [27365] <4> bpbkar: INF - setenv FINISHED=0

Media server log:

15:10:48.192 [15515] <2> ConnectionCache::connectAndCache: Acquiring new connection for host nbmaster, query type 78
15:10:48.206 [15515] <2> vnet_pbxConnect: pbxConnectEx Succeeded
15:10:48.206 [15515] <2> logconnections: BPDBM CONNECT FROM 207.207.100.84.63205 TO 1.15.1.99.1556 fd = 8
15:10:48.328 [15515] <2> db_end: Need to collect reply
15:20:48.358 [15515] <2> get_long: (1) cannot read (byte 1) from network: Interrupted system call (4)
15:20:48.358 [15515] <2> db_getdata: get_string() failed: Interrupted system call (4) connection timed out (-2)
15:20:48.358 [15515] <2> db_end: no DONE from db_getreply(): network connection timed out
15:20:48.359 [15515] <16> bpbrm main: db_FLISTsend failed: network connection timed out (41)
15:20:48.360 [15515] <2> ConnectionCache::connectAndCache: Acquiring new connection for host nbmaster, query type 1
15:20:48.382 [15515] <2> vnet_pbxConnect: pbxConnectEx Succeeded
15:20:48.383 [15515] <2> logconnections: BPDBM CONNECT FROM 207.207.100.84.63240 TO 1.15.1.99.1556 fd = 8
15:20:48.446 [15515] <2> db_end: Need to collect reply
15:20:48.453 [15515] <2> bpbrm kill_child_process: start
15:20:48.453 [15515] <2> bpbrm wait_for_child: start
15:21:32.206 [15515] <2> bpbrm wait_for_child: child exit_status = 82 signal_status = 0
15:21:32.206 [15515] <2> bpbrm Exit: client backup EXIT STATUS 41: network connection timed out

Another non-global zone on the same global zone backs up successfully. I have checked all network connections and see no errors or waits, so I am at a loss as to why this is occurring.

Any advice would be appreciated.

Thanks.

  • NetBackup 7.1.x and earlier? 
    OUCH.... 

    This looks like a timeout between bpbrm on the media server and bpdbm on the master server:

    15:10:48.206 [15515] <2> logconnections: BPDBM CONNECT FROM 207.207.100.84.63205 TO 1.15.1.99.1556 fd = 8
    15:10:48.328 [15515] <2> db_end: Need to collect reply
    15:20:48.358 [15515] <2> get_long: (1) cannot read (byte 1) from network: Interrupted system call (4)
    15:20:48.358 [15515] <2> db_getdata: get_string() failed: Interrupted system call (4) connection timed out (-2)
    15:20:48.358 [15515] <2> db_end: no DONE from db_getreply(): network connection timed out
    15:20:48.359 [15515] <16> bpbrm main: db_FLISTsend failed: network connection timed out (41)

    Level 3 (verbose) logs for bpbrm (media server) and bpdbm (master server) may show what was happening during the 10 minutes between the two timestamps.

    Check whether the verbose bpdbm and bpbrm logs contain errors similar to those described in this article:
    https://www.veritas.com/support/en_US/article.000082098

    The DB_TIMEOUT touch file might also be an option.

    • rakeman

      Thanks, Marianne. I'll go through the article and hopefully get to the bottom of this annoying error.
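The 10-minute gap between bpbrm asking bpdbm for a reply and giving up can be measured directly from the media server log. Below is a minimal Python sketch, assuming each bpbrm log line starts with an HH:MM:SS.fff timestamp (as in the excerpts in this thread) and that both lines fall on the same day. `parse_time` and `gap_seconds` are hypothetical helper names for illustration, not NetBackup tools.

```python
from datetime import datetime

# Lines copied verbatim from the bpbrm (media server) log quoted in this thread.
LOG = """\
15:10:48.328 [15515] <2> db_end: Need to collect reply
15:20:48.358 [15515] <2> get_long: (1) cannot read (byte 1) from network: Interrupted system call (4)
15:20:48.358 [15515] <2> db_getdata: get_string() failed: Interrupted system call (4) connection timed out (-2)
15:20:48.358 [15515] <2> db_end: no DONE from db_getreply(): network connection timed out
"""


def parse_time(line: str) -> datetime:
    # Assumption: the timestamp is the first space-separated field (HH:MM:SS.fff).
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def gap_seconds(log: str, start_marker: str, end_marker: str) -> float:
    # Seconds between the first line containing start_marker and the first
    # line containing end_marker. Does not handle a wrap past midnight.
    lines = log.splitlines()
    start = next(l for l in lines if start_marker in l)
    end = next(l for l in lines if end_marker in l)
    return (parse_time(end) - parse_time(start)).total_seconds()


gap = gap_seconds(LOG, "Need to collect reply", "no DONE from db_getreply()")
print(f"bpbrm waited {gap:.3f} s for bpdbm")  # bpbrm waited 600.030 s for bpdbm
```

On these lines the gap comes out at almost exactly 600 seconds. A wait that round looks more like a fixed timeout expiring while bpbrm waited on bpdbm than an intermittent network fault, which is why the verbose bpbrm and bpdbm logs are the place to look for what bpdbm was doing in that window.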