Forum Discussion

symsonu's avatar
symsonu
Level 6
6 years ago

backup failing with error code 24 and 50

Hi ,

My backup policy is taking a backup of 10 different clients and it fails randomly for  3 or 4 with error code 24 and 50as shown below

 

08/10/2019 03:01:53 - Error bpbrm (pid=28472) from client prl-ercenmsvc2bkp: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out

08/10/2019 03:01:55 - Error bptm (pid=28513) media manager terminated by parent process

08/10/2019 03:02:08 - Info bpbkar (pid=25895) done. status: 24: socket write failed

08/10/2019 03:02:08 - end writing; write time: 1:44:00

socket write failed  (24)

 

 

bpbkar  logs :

 

05:27:08.827 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/spool/postfix/public/cleanup] is a socket special file. Skipping
05:27:14.968 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/VRTSvcs/uds/.hadsocket] is a socket special file. Skipping
05:27:22.650 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/multipathd.sock] is a socket special file. Skipping
05:27:22.777 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/rpcbind.sock] is a socket special file. Skipping
05:27:22.880 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/libvirt/libvirt-sock-ro] is a socket special file. Skipping
05:27:22.882 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/libvirt/libvirt-sock] is a socket special file. Skipping
05:27:24.587 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/dbus/system_bus_socket] is a socket special file. Skipping
05:27:24.626 [11187] <4> bpbkar expand_wildcards: end backup for filelist /.ENM_BACKUP/root_lvm/var
05:27:24.629 [11187] <4> bpbkar main: INF - Client completed sending data for backup
05:27:24.629 [11187] <2> bpbkar main: INF - Total Size:0
05:27:24.629 [11187] <2> accl_post_nbfeds_backup_list: Posting Accurate Licensing data
05:27:24.629 [11187] <4> accl_post_nbfeds_backup_list: Master server name: prl-ercenmombsbkp
05:27:24.629 [11187] <4> post_nbfeds_backup_list: Json String value: [{"virtualName":"","jobid":"","backupId":"prl-ercenmsvc3bkp_1566872326","backupSelection":"/.ENM_BA
CKUP/root_lvm/var","frontEndDataSizeMb":4651}]
05:27:24.629 [11187] <2> post_nbfeds_backup_list: Post data URL: netbackup/catalog/frontenddata
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Curl initialized successfully.
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Performing curl_easy_init()
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Disabling any proxy option set in the environment.
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Using paths as caCertFilePath [/usr/openv/var/webtruststore/cacert.pem] privateKeyPath [/usr/openv/var/vxss/credent
ials/keystore/PrivKeyFile.pem]
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Setting default CURL timeout = [300] seconds
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Disable signal during name resolution.
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: ENABLE_NBCURL_VERBOSE is not set
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_CONNECTIONS is not set
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_PREFERRED_NETWORKS is not set
05:27:24.655 [11187] <2> NBClientCURL::setupCurlOperationTimeout: Setting CURL timeout = [180] seconds
05:27:24.655 [11187] <2> NBClientCURL::Post: POSTing to URL https://prl-ercenmombsbkp:1556/netbackup/catalog/frontenddata
05:27:24.655 [11187] <2> NBClientCURL::Post: Sending payload ...
05:27:24.655 [11187] <2> NBClientCURL::setPostOptions: setting curl post options ...
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Configuration suggests use of tunnel.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Looking up tunnels for master server [prl-ercenmombsbkp].
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's current state = [0].
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's state reset to [0].
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Querying client router for a list of tunnels.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Read NBU configuration settings.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel server from in-memory cache.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from user-specified (override) settings in NBU Configuration.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566856007
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566942407
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:27:24.657 [11187] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566942407, timeNow = 1566872844, (tokenExpTimeSec-timeNow) = 69563
05:27:24.657 [11187] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:27:24.657 [11187] <2> NBClientCURL::performCurlOperation: Obtained login token
05:27:24.685 [11187] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: 280E34AC43382593 X-Request-ID: 280E34AC43382593 Content-
Length: 0 Date: Tue, 27 Aug 2019 02:27:24 GMT
05:27:24.685 [11187] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:27:24.685 [11187] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:27:24.685 [11187] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:27:24.685 [11187] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:27:24.685 [11187] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:27:24.686 [11187] <4> bpbkar Exit: INF - bpbkar exit normal
05:27:24.686 [11187] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:27:24.686 [11187] <2> bpbkar Exit: INF - Close of stdout complete
05:27:24.686 [11187] <4> bpbkar Exit: INF - setenv FINISHED=1
05:28:25.083 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/proxy.d/proxy_helper-outbound_proxy-0-10861.uds] is a socket special file.
Skipping
05:28:25.440 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/bpcd.uds] is a socket special file. Skipping
05:28:25.447 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/terminate_vnetd.uds] is a socket special file. Skipping
05:28:25.498 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/terminate_bpcd.uds] is a socket special file. Skipping
05:37:12.549 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db> is probably sparse. Size = 16384 st_blocks = 24 block_size
= 512, fstype = 37
05:37:12.549 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db is sparse: stat.st_size = 16384, stat.st_blocks * 512 = 12288
05:37:12.549 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db is now size 8448
05:37:12.552 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db> is probably sparse. Size = 65536 st_blocks = 72 block_size
= 512, fstype = 37
05:37:12.552 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db is sparse: stat.st_size = 65536, stat.st_blocks * 512 = 36864
05:37:12.553 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db is now size 33024
05:37:12.557 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db> is probably sparse. Size = 16384 st_blocks = 24 block_siz
e = 512, fstype = 37
05:37:12.557 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db is sparse: stat.st_size = 16384, stat.st_blocks * 512 = 1228
8
05:37:12.558 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db is now size 5120
05:37:17.417 [10751] <4> bpbkar expand_wildcards: end backup for filelist /.ENM_BACKUP/root_lvm/root
05:37:17.422 [10751] <4> bpbkar main: INF - Client completed sending data for backup
05:37:17.422 [10751] <2> bpbkar main: INF - Total Size:0
05:37:17.422 [10751] <2> accl_post_nbfeds_backup_list: Posting Accurate Licensing data
05:37:17.422 [10751] <4> accl_post_nbfeds_backup_list: Master server name: prl-ercenmombsbkp
05:37:17.422 [10751] <4> post_nbfeds_backup_list: Json String value: [{"virtualName":"","jobid":"","backupId":"prl-ercenmsvc3bkp_1566872317","backupSelection":"/.ENM_BA
CKUP/root_lvm/root","frontEndDataSizeMb":3903}]
05:37:17.422 [10751] <2> post_nbfeds_backup_list: Post data URL: netbackup/catalog/frontenddata
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Curl initialized successfully.
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Performing curl_easy_init()
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Disabling any proxy option set in the environment.
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Using paths as caCertFilePath [/usr/openv/var/webtruststore/cacert.pem] privateKeyPath [/usr/openv/var/vxss/credent
ials/keystore/PrivKeyFile.pem]
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Setting default CURL timeout = [300] seconds
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Disable signal during name resolution.
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: ENABLE_NBCURL_VERBOSE is not set
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_CONNECTIONS is not set
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_PREFERRED_NETWORKS is not set
05:37:17.441 [10751] <2> NBClientCURL::setupCurlOperationTimeout: Setting CURL timeout = [180] seconds
05:37:17.441 [10751] <2> NBClientCURL::Post: POSTing to URL https://prl-ercenmombsbkp:1556/netbackup/catalog/frontenddata
05:37:17.441 [10751] <2> NBClientCURL::Post: Sending payload ...
05:37:17.441 [10751] <2> NBClientCURL::setPostOptions: setting curl post options ...
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Configuration suggests use of tunnel.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Looking up tunnels for master server [prl-ercenmombsbkp].
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's current state = [0].
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's state reset to [0].
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Querying client router for a list of tunnels.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Read NBU configuration settings.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel server from in-memory cache.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from user-specified (override) settings in NBU Configuration.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566856007
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566942407
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:37:17.443 [10751] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566942407, timeNow = 1566873437, (tokenExpTimeSec-timeNow) = 68970
05:37:17.443 [10751] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:37:17.443 [10751] <2> NBClientCURL::performCurlOperation: Obtained login token
05:37:17.473 [10751] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: A38307B46654BD6E X-Request-ID: A38307B46654BD6E Content-
Length: 0 Date: Tue, 27 Aug 2019 02:37:17 GMT
05:37:17.473 [10751] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:37:17.473 [10751] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:37:17.473 [10751] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:37:17.473 [10751] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:37:17.473 [10751] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:37:17.473 [10751] <4> bpbkar Exit: INF - bpbkar exit normal
05:37:17.473 [10751] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:37:17.473 [10751] <2> bpbkar Exit: INF - Close of stdout complete
05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
05:54:17.497 [11904] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
05:54:17.499 [11904] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
05:54:17.499 [11904] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
05:54:17.499 [11904] <2> bpbkar Exit: INF - Close of stdout complete
05:54:17.499 [11904] <4> bpbkar Exit: INF - setenv FINISHED=0

Above bpbkar logs shows that  at 5:37 when  bpbkar  finished sending data ..it didnot cose the socket and after 16 minutes at 5:54  its timed out

 

 

on days when it is succesful : nothing after finish

 

05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.

05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.

05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Received a list of potential tunnels.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.

05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566872278

05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566958678

05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0

05:35:56.186 [40463] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566958678, timeNow = 1566873356, (tokenExpTimeSec-timeNow) = 85322

05:35:56.186 [40463] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required

05:35:56.186 [40463] <2> NBClientCURL::performCurlOperation: Obtained login token

05:35:56.252 [40463] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201  X-Request-ID: 1086FD2A20531977 X-Request-ID: 1086FD2A20531977 Content-Length: 0 Date: Tue, 27 Aug 2019 02:35:56 GMT

05:35:56.252 [40463] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201

05:35:56.252 [40463] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.

05:35:56.252 [40463] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.

05:35:56.252 [40463] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()

05:35:56.252 [40463] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()

05:35:56.252 [40463] <4> bpbkar Exit: INF - bpbkar exit normal

05:35:56.252 [40463] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed

05:35:56.252 [40463] <2> bpbkar Exit: INF - Close of stdout complete

05:35:56.252 [40463] <4> bpbkar Exit: INF - setenv FINISHED=1

 

 

During successful backup : 5 Finished=1 entries

[root@prl-ercenmsvc3 bpbkar]# grep -i FINISHED root.082919_00001.log

05:17:47.137 [12954] <4> bpbkar Exit: INF - setenv FINISHED=1

05:18:19.415 [13541] <4> bpbkar Exit: INF - setenv FINISHED=1

05:18:54.266 [12920] <4> bpbkar Exit: INF - setenv FINISHED=1

05:25:22.667 [12983] <4> bpbkar Exit: INF - setenv FINISHED=1

05:29:12.226 [12670] <4> bpbkar Exit: INF - setenv FINISHED=1

and during failure : after 5 finished=1 entries socket dinot close and one more entry after 16 minutes Finished=0

 

root@prl-ercenmsvc3 bpbkar]# grep -i finished root.082719_00001.log

05:18:51.366 [11239] <4> bpbkar Exit: INF - setenv FINISHED=1

05:19:20.380 [11842] <4> bpbkar Exit: INF - setenv FINISHED=1

05:19:47.308 [11044] <4> bpbkar Exit: INF - setenv FINISHED=1

05:27:24.686 [11187] <4> bpbkar Exit: INF - setenv FINISHED=1

05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1

05:54:17.499 [11904] <4> bpbkar Exit: INF - setenv FINISHED=0

 

Is my understanding correct that sometimes bpbkar is not closing socket after sending all stream data to media server.

and how can I correct it ?

any suggestions please ?

 

Regards

S

    • symsonu's avatar
      symsonu
      Level 6

      Hi,

       

      There is no firewall in between master and clients.

      Also, do you think we should tune keep alive or

      client should disconnect the socket after it finds that it has sent all stream data to netbackup

       

       

      Regards

      S

      • Marianne's avatar
        Marianne
        Level 6

        I would also look at bpbrm log (at least level 3) on the media server to see what happened w.r.t. client updates received in the period between the 2 timestamps:
        05:37:17.473 [10751] <2> bpbkar Exit: INF - Close of stdout complete
        05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
        05:54:17.497 [11904] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out

        (PS : The PID's are different, so, they do not seem to be for the same stream - best to post full logs as attachment)