cancel
Showing results for 
Search instead for 
Did you mean: 

backup failing with error code 24 and 50

symsonu
Level 6

Hi ,

My backup policy is taking a backup of 10 different clients and it fails randomly for  3 or 4 with error code 24 and 50as shown below

 

08/10/2019 03:01:53 - Error bpbrm (pid=28472) from client prl-ercenmsvc2bkp: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out

08/10/2019 03:01:55 - Error bptm (pid=28513) media manager terminated by parent process

08/10/2019 03:02:08 - Info bpbkar (pid=25895) done. status: 24: socket write failed

08/10/2019 03:02:08 - end writing; write time: 1:44:00

socket write failed  (24)

 

 

bpbkar  logs :

 

05:27:08.827 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/spool/postfix/public/cleanup] is a socket special file. Skipping
05:27:14.968 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/VRTSvcs/uds/.hadsocket] is a socket special file. Skipping
05:27:22.650 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/multipathd.sock] is a socket special file. Skipping
05:27:22.777 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/rpcbind.sock] is a socket special file. Skipping
05:27:22.880 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/libvirt/libvirt-sock-ro] is a socket special file. Skipping
05:27:22.882 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/libvirt/libvirt-sock] is a socket special file. Skipping
05:27:24.587 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/dbus/system_bus_socket] is a socket special file. Skipping
05:27:24.626 [11187] <4> bpbkar expand_wildcards: end backup for filelist /.ENM_BACKUP/root_lvm/var
05:27:24.629 [11187] <4> bpbkar main: INF - Client completed sending data for backup
05:27:24.629 [11187] <2> bpbkar main: INF - Total Size:0
05:27:24.629 [11187] <2> accl_post_nbfeds_backup_list: Posting Accurate Licensing data
05:27:24.629 [11187] <4> accl_post_nbfeds_backup_list: Master server name: prl-ercenmombsbkp
05:27:24.629 [11187] <4> post_nbfeds_backup_list: Json String value: [{"virtualName":"","jobid":"","backupId":"prl-ercenmsvc3bkp_1566872326","backupSelection":"/.ENM_BA
CKUP/root_lvm/var","frontEndDataSizeMb":4651}]
05:27:24.629 [11187] <2> post_nbfeds_backup_list: Post data URL: netbackup/catalog/frontenddata
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Curl initialized successfully.
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Performing curl_easy_init()
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Disabling any proxy option set in the environment.
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Using paths as caCertFilePath [/usr/openv/var/webtruststore/cacert.pem] privateKeyPath [/usr/openv/var/vxss/credent
ials/keystore/PrivKeyFile.pem]
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Setting default CURL timeout = [300] seconds
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Disable signal during name resolution.
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: ENABLE_NBCURL_VERBOSE is not set
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_CONNECTIONS is not set
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_PREFERRED_NETWORKS is not set
05:27:24.655 [11187] <2> NBClientCURL::setupCurlOperationTimeout: Setting CURL timeout = [180] seconds
05:27:24.655 [11187] <2> NBClientCURL::Post: POSTing to URL https://prl-ercenmombsbkp:1556/netbackup/catalog/frontenddata
05:27:24.655 [11187] <2> NBClientCURL::Post: Sending payload ...
05:27:24.655 [11187] <2> NBClientCURL::setPostOptions: setting curl post options ...
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Configuration suggests use of tunnel.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Looking up tunnels for master server [prl-ercenmombsbkp].
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's current state = [0].
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's state reset to [0].
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Querying client router for a list of tunnels.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Read NBU configuration settings.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel server from in-memory cache.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from user-specified (override) settings in NBU Configuration.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566856007
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566942407
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:27:24.657 [11187] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566942407, timeNow = 1566872844, (tokenExpTimeSec-timeNow) = 69563
05:27:24.657 [11187] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:27:24.657 [11187] <2> NBClientCURL::performCurlOperation: Obtained login token
05:27:24.685 [11187] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: 280E34AC43382593 X-Request-ID: 280E34AC43382593 Content-
Length: 0 Date: Tue, 27 Aug 2019 02:27:24 GMT
05:27:24.685 [11187] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:27:24.685 [11187] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:27:24.685 [11187] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:27:24.685 [11187] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:27:24.685 [11187] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:27:24.686 [11187] <4> bpbkar Exit: INF - bpbkar exit normal
05:27:24.686 [11187] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:27:24.686 [11187] <2> bpbkar Exit: INF - Close of stdout complete
05:27:24.686 [11187] <4> bpbkar Exit: INF - setenv FINISHED=1
05:28:25.083 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/proxy.d/proxy_helper-outbound_proxy-0-10861.uds] is a socket special file.
Skipping
05:28:25.440 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/bpcd.uds] is a socket special file. Skipping
05:28:25.447 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/terminate_vnetd.uds] is a socket special file. Skipping
05:28:25.498 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/terminate_bpcd.uds] is a socket special file. Skipping
05:37:12.549 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db> is probably sparse. Size = 16384 st_blocks = 24 block_size
= 512, fstype = 37
05:37:12.549 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db is sparse: stat.st_size = 16384, stat.st_blocks * 512 = 12288
05:37:12.549 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db is now size 8448
05:37:12.552 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db> is probably sparse. Size = 65536 st_blocks = 72 block_size
= 512, fstype = 37
05:37:12.552 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db is sparse: stat.st_size = 65536, stat.st_blocks * 512 = 36864
05:37:12.553 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db is now size 33024
05:37:12.557 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db> is probably sparse. Size = 16384 st_blocks = 24 block_siz
e = 512, fstype = 37
05:37:12.557 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db is sparse: stat.st_size = 16384, stat.st_blocks * 512 = 1228
8
05:37:12.558 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db is now size 5120
05:37:17.417 [10751] <4> bpbkar expand_wildcards: end backup for filelist /.ENM_BACKUP/root_lvm/root
05:37:17.422 [10751] <4> bpbkar main: INF - Client completed sending data for backup
05:37:17.422 [10751] <2> bpbkar main: INF - Total Size:0
05:37:17.422 [10751] <2> accl_post_nbfeds_backup_list: Posting Accurate Licensing data
05:37:17.422 [10751] <4> accl_post_nbfeds_backup_list: Master server name: prl-ercenmombsbkp
05:37:17.422 [10751] <4> post_nbfeds_backup_list: Json String value: [{"virtualName":"","jobid":"","backupId":"prl-ercenmsvc3bkp_1566872317","backupSelection":"/.ENM_BA
CKUP/root_lvm/root","frontEndDataSizeMb":3903}]
05:37:17.422 [10751] <2> post_nbfeds_backup_list: Post data URL: netbackup/catalog/frontenddata
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Curl initialized successfully.
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Performing curl_easy_init()
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Disabling any proxy option set in the environment.
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Using paths as caCertFilePath [/usr/openv/var/webtruststore/cacert.pem] privateKeyPath [/usr/openv/var/vxss/credent
ials/keystore/PrivKeyFile.pem]
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Setting default CURL timeout = [300] seconds
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Disable signal during name resolution.
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: ENABLE_NBCURL_VERBOSE is not set
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_CONNECTIONS is not set
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_PREFERRED_NETWORKS is not set
05:37:17.441 [10751] <2> NBClientCURL::setupCurlOperationTimeout: Setting CURL timeout = [180] seconds
05:37:17.441 [10751] <2> NBClientCURL::Post: POSTing to URL https://prl-ercenmombsbkp:1556/netbackup/catalog/frontenddata
05:37:17.441 [10751] <2> NBClientCURL::Post: Sending payload ...
05:37:17.441 [10751] <2> NBClientCURL::setPostOptions: setting curl post options ...
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Configuration suggests use of tunnel.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Looking up tunnels for master server [prl-ercenmombsbkp].
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's current state = [0].
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's state reset to [0].
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Querying client router for a list of tunnels.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Read NBU configuration settings.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel server from in-memory cache.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from user-specified (override) settings in NBU Configuration.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566856007
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566942407
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:37:17.443 [10751] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566942407, timeNow = 1566873437, (tokenExpTimeSec-timeNow) = 68970
05:37:17.443 [10751] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:37:17.443 [10751] <2> NBClientCURL::performCurlOperation: Obtained login token
05:37:17.473 [10751] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: A38307B46654BD6E X-Request-ID: A38307B46654BD6E Content-
Length: 0 Date: Tue, 27 Aug 2019 02:37:17 GMT
05:37:17.473 [10751] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:37:17.473 [10751] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:37:17.473 [10751] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:37:17.473 [10751] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:37:17.473 [10751] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:37:17.473 [10751] <4> bpbkar Exit: INF - bpbkar exit normal
05:37:17.473 [10751] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:37:17.473 [10751] <2> bpbkar Exit: INF - Close of stdout complete
05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
05:54:17.497 [11904] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
05:54:17.499 [11904] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
05:54:17.499 [11904] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
05:54:17.499 [11904] <2> bpbkar Exit: INF - Close of stdout complete
05:54:17.499 [11904] <4> bpbkar Exit: INF - setenv FINISHED=0

Above bpbkar logs shows that  at 5:37 when  bpbkar  finished sending data ..it didnot cose the socket and after 16 minutes at 5:54  its timed out

 

 

on days when it is succesful : nothing after finish

 

05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.

05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.

05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Received a list of potential tunnels.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.

05:35:56.186 [40463] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.

05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566872278

05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566958678

05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0

05:35:56.186 [40463] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566958678, timeNow = 1566873356, (tokenExpTimeSec-timeNow) = 85322

05:35:56.186 [40463] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required

05:35:56.186 [40463] <2> NBClientCURL::performCurlOperation: Obtained login token

05:35:56.252 [40463] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201  X-Request-ID: 1086FD2A20531977 X-Request-ID: 1086FD2A20531977 Content-Length: 0 Date: Tue, 27 Aug 2019 02:35:56 GMT

05:35:56.252 [40463] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201

05:35:56.252 [40463] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.

05:35:56.252 [40463] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.

05:35:56.252 [40463] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()

05:35:56.252 [40463] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()

05:35:56.252 [40463] <4> bpbkar Exit: INF - bpbkar exit normal

05:35:56.252 [40463] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed

05:35:56.252 [40463] <2> bpbkar Exit: INF - Close of stdout complete

05:35:56.252 [40463] <4> bpbkar Exit: INF - setenv FINISHED=1

 

 

During successful backup : 5 Finished=1 entries

[root@prl-ercenmsvc3 bpbkar]# grep -i FINISHED root.082919_00001.log

05:17:47.137 [12954] <4> bpbkar Exit: INF - setenv FINISHED=1

05:18:19.415 [13541] <4> bpbkar Exit: INF - setenv FINISHED=1

05:18:54.266 [12920] <4> bpbkar Exit: INF - setenv FINISHED=1

05:25:22.667 [12983] <4> bpbkar Exit: INF - setenv FINISHED=1

05:29:12.226 [12670] <4> bpbkar Exit: INF - setenv FINISHED=1

and during failure : after 5 finished=1 entries socket dinot close and one more entry after 16 minutes Finished=0

 

root@prl-ercenmsvc3 bpbkar]# grep -i finished root.082719_00001.log

05:18:51.366 [11239] <4> bpbkar Exit: INF - setenv FINISHED=1

05:19:20.380 [11842] <4> bpbkar Exit: INF - setenv FINISHED=1

05:19:47.308 [11044] <4> bpbkar Exit: INF - setenv FINISHED=1

05:27:24.686 [11187] <4> bpbkar Exit: INF - setenv FINISHED=1

05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1

05:54:17.499 [11904] <4> bpbkar Exit: INF - setenv FINISHED=0

 

Is my understanding correct that sometimes bpbkar is not closing socket after sending all stream data to media server.

and how can I correct it ?

any suggestions please ?

 

Regards

S

3 REPLIES 3

Nicolai
Moderator
Moderator
Partner    VIP   

Is there a firewall betweem Netbackup and the clients ?

If yes, you need to set TCP_KEEP_ALIVE on client(s), master and media server(s). The setting is feature in the OS

TCP Keepalive Best Practices - detecting network drops and preventing idle socket timeout

https://www.veritas.com/content/support/en_US/article.100028680.html

Hi,

 

There is no firewall in between master and clients.

Also, do you think we should tune keep alive or

client should disconnect the socket after it finds that it has sent all stream data to netbackup

 

 

Regards

S

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I would also look at bpbrm log (at least level 3) on the media server to see what happened w.r.t. client updates received in the period between the 2 timestamps:
05:37:17.473 [10751] <2> bpbkar Exit: INF - Close of stdout complete
05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
05:54:17.497 [11904] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out

(PS : The PID's are different, so, they do not seem to be for the same stream - best to post full logs as attachment)