08-29-2019 11:42 PM
Hi ,
My backup policy is taking a backup of 10 different clients and it fails randomly for 3 or 4 with error code 24 and 50as shown below
08/10/2019 03:01:53 - Error bpbrm (pid=28472) from client prl-ercenmsvc2bkp: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
08/10/2019 03:01:55 - Error bptm (pid=28513) media manager terminated by parent process
08/10/2019 03:02:08 - Info bpbkar (pid=25895) done. status: 24: socket write failed
08/10/2019 03:02:08 - end writing; write time: 1:44:00
socket write failed (24)
bpbkar logs :
05:27:08.827 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/spool/postfix/public/cleanup] is a socket special file. Skipping
05:27:14.968 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/VRTSvcs/uds/.hadsocket] is a socket special file. Skipping
05:27:22.650 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/multipathd.sock] is a socket special file. Skipping
05:27:22.777 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/rpcbind.sock] is a socket special file. Skipping
05:27:22.880 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/libvirt/libvirt-sock-ro] is a socket special file. Skipping
05:27:22.882 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/libvirt/libvirt-sock] is a socket special file. Skipping
05:27:24.587 [11187] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/var/run/dbus/system_bus_socket] is a socket special file. Skipping
05:27:24.626 [11187] <4> bpbkar expand_wildcards: end backup for filelist /.ENM_BACKUP/root_lvm/var
05:27:24.629 [11187] <4> bpbkar main: INF - Client completed sending data for backup
05:27:24.629 [11187] <2> bpbkar main: INF - Total Size:0
05:27:24.629 [11187] <2> accl_post_nbfeds_backup_list: Posting Accurate Licensing data
05:27:24.629 [11187] <4> accl_post_nbfeds_backup_list: Master server name: prl-ercenmombsbkp
05:27:24.629 [11187] <4> post_nbfeds_backup_list: Json String value: [{"virtualName":"","jobid":"","backupId":"prl-ercenmsvc3bkp_1566872326","backupSelection":"/.ENM_BA
CKUP/root_lvm/var","frontEndDataSizeMb":4651}]
05:27:24.629 [11187] <2> post_nbfeds_backup_list: Post data URL: netbackup/catalog/frontenddata
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Curl initialized successfully.
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Performing curl_easy_init()
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Disabling any proxy option set in the environment.
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Using paths as caCertFilePath [/usr/openv/var/webtruststore/cacert.pem] privateKeyPath [/usr/openv/var/vxss/credent
ials/keystore/PrivKeyFile.pem]
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Setting default CURL timeout = [300] seconds
05:27:24.637 [11187] <2> NBClientCURL::NBClientCURL: Disable signal during name resolution.
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: ENABLE_NBCURL_VERBOSE is not set
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_CONNECTIONS is not set
05:27:24.637 [11187] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_PREFERRED_NETWORKS is not set
05:27:24.655 [11187] <2> NBClientCURL::setupCurlOperationTimeout: Setting CURL timeout = [180] seconds
05:27:24.655 [11187] <2> NBClientCURL::Post: POSTing to URL https://prl-ercenmombsbkp:1556/netbackup/catalog/frontenddata
05:27:24.655 [11187] <2> NBClientCURL::Post: Sending payload ...
05:27:24.655 [11187] <2> NBClientCURL::setPostOptions: setting curl post options ...
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Configuration suggests use of tunnel.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Looking up tunnels for master server [prl-ercenmombsbkp].
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's current state = [0].
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's state reset to [0].
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Querying client router for a list of tunnels.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Read NBU configuration settings.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel server from in-memory cache.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from user-specified (override) settings in NBU Configuration.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:27:24.656 [11187] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:27:24.656 [11187] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566856007
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566942407
05:27:24.657 [11187] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:27:24.657 [11187] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566942407, timeNow = 1566872844, (tokenExpTimeSec-timeNow) = 69563
05:27:24.657 [11187] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:27:24.657 [11187] <2> NBClientCURL::performCurlOperation: Obtained login token
05:27:24.685 [11187] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: 280E34AC43382593 X-Request-ID: 280E34AC43382593 Content-
Length: 0 Date: Tue, 27 Aug 2019 02:27:24 GMT
05:27:24.685 [11187] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:27:24.685 [11187] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:27:24.685 [11187] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:27:24.685 [11187] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:27:24.685 [11187] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:27:24.686 [11187] <4> bpbkar Exit: INF - bpbkar exit normal
05:27:24.686 [11187] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:27:24.686 [11187] <2> bpbkar Exit: INF - Close of stdout complete
05:27:24.686 [11187] <4> bpbkar Exit: INF - setenv FINISHED=1
05:28:25.083 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/proxy.d/proxy_helper-outbound_proxy-0-10861.uds] is a socket special file.
Skipping
05:28:25.440 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/bpcd.uds] is a socket special file. Skipping
05:28:25.447 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/terminate_vnetd.uds] is a socket special file. Skipping
05:28:25.498 [10751] <8> bpbkar process_file: WRN - [/.ENM_BACKUP/root_lvm/root/usr/openv/var/vnetd/terminate_bpcd.uds] is a socket special file. Skipping
05:37:12.549 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db> is probably sparse. Size = 16384 st_blocks = 24 block_size
= 512, fstype = 37
05:37:12.549 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db is sparse: stat.st_size = 16384, stat.st_blocks * 512 = 12288
05:37:12.549 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/key3.db is now size 8448
05:37:12.552 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db> is probably sparse. Size = 65536 st_blocks = 72 block_size
= 512, fstype = 37
05:37:12.552 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db is sparse: stat.st_size = 65536, stat.st_blocks * 512 = 36864
05:37:12.553 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/cert8.db is now size 33024
05:37:12.557 [10751] <4> check_file_sparseness: File </.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db> is probably sparse. Size = 16384 st_blocks = 24 block_siz
e = 512, fstype = 37
05:37:12.557 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db is sparse: stat.st_size = 16384, stat.st_blocks * 512 = 1228
8
05:37:12.558 [10751] <2> bpbkar process_file: INF - /.ENM_BACKUP/root_lvm/root/etc/openldap/certs/secmod.db is now size 5120
05:37:17.417 [10751] <4> bpbkar expand_wildcards: end backup for filelist /.ENM_BACKUP/root_lvm/root
05:37:17.422 [10751] <4> bpbkar main: INF - Client completed sending data for backup
05:37:17.422 [10751] <2> bpbkar main: INF - Total Size:0
05:37:17.422 [10751] <2> accl_post_nbfeds_backup_list: Posting Accurate Licensing data
05:37:17.422 [10751] <4> accl_post_nbfeds_backup_list: Master server name: prl-ercenmombsbkp
05:37:17.422 [10751] <4> post_nbfeds_backup_list: Json String value: [{"virtualName":"","jobid":"","backupId":"prl-ercenmsvc3bkp_1566872317","backupSelection":"/.ENM_BA
CKUP/root_lvm/root","frontEndDataSizeMb":3903}]
05:37:17.422 [10751] <2> post_nbfeds_backup_list: Post data URL: netbackup/catalog/frontenddata
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Curl initialized successfully.
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Performing curl_easy_init()
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Disabling any proxy option set in the environment.
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Using paths as caCertFilePath [/usr/openv/var/webtruststore/cacert.pem] privateKeyPath [/usr/openv/var/vxss/credent
ials/keystore/PrivKeyFile.pem]
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Setting default CURL timeout = [300] seconds
05:37:17.430 [10751] <2> NBClientCURL::NBClientCURL: Disable signal during name resolution.
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: ENABLE_NBCURL_VERBOSE is not set
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_CONNECTIONS is not set
05:37:17.430 [10751] <2> NBClientCURL::nbcurl_bpconf_keyvalue_state: DISABLE_NBCURL_PREFERRED_NETWORKS is not set
05:37:17.441 [10751] <2> NBClientCURL::setupCurlOperationTimeout: Setting CURL timeout = [180] seconds
05:37:17.441 [10751] <2> NBClientCURL::Post: POSTing to URL https://prl-ercenmombsbkp:1556/netbackup/catalog/frontenddata
05:37:17.441 [10751] <2> NBClientCURL::Post: Sending payload ...
05:37:17.441 [10751] <2> NBClientCURL::setPostOptions: setting curl post options ...
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Configuration suggests use of tunnel.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Looking up tunnels for master server [prl-ercenmombsbkp].
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's current state = [0].
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::resetState: Tunnel Router's state reset to [0].
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Querying client router for a list of tunnels.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Read NBU configuration settings.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel server from in-memory cache.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from user-specified (override) settings in NBU Configuration.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:37:17.443 [10751] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:37:17.443 [10751] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566856007
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566942407
05:37:17.443 [10751] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:37:17.443 [10751] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566942407, timeNow = 1566873437, (tokenExpTimeSec-timeNow) = 68970
05:37:17.443 [10751] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:37:17.443 [10751] <2> NBClientCURL::performCurlOperation: Obtained login token
05:37:17.473 [10751] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: A38307B46654BD6E X-Request-ID: A38307B46654BD6E Content-
Length: 0 Date: Tue, 27 Aug 2019 02:37:17 GMT
05:37:17.473 [10751] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:37:17.473 [10751] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:37:17.473 [10751] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:37:17.473 [10751] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:37:17.473 [10751] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:37:17.473 [10751] <4> bpbkar Exit: INF - bpbkar exit normal
05:37:17.473 [10751] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:37:17.473 [10751] <2> bpbkar Exit: INF - Close of stdout complete
05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
05:54:17.497 [11904] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
05:54:17.499 [11904] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
05:54:17.499 [11904] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
05:54:17.499 [11904] <2> bpbkar Exit: INF - Close of stdout complete
05:54:17.499 [11904] <4> bpbkar Exit: INF - setenv FINISHED=0
Above bpbkar logs shows that at 5:37 when bpbkar finished sending data ..it didnot cose the socket and after 16 minutes at 5:54 its timed out
on days when it is succesful : nothing after finish
05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getTunnelsFromCache: Cache File /usr/openv/netbackup/websvctunnels.cache does not exist.
05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers from Client router's tunnel servers' cache.
05:35:56.186 [40463] <2> NBWebsvcTunnelRouter::getWebsvcTunnelServers: Returning [0] tunnel servers. Will try to directly connect to the Web service.
05:35:56.186 [40463] <2> NBClientCURL::performOperation: Received a list of potential tunnels.
05:35:56.186 [40463] <2> NBClientCURL::performOperation: Empty tunnel entry encountered. Will try a direct connection to the Master server.
05:35:56.186 [40463] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:35:56.186 [40463] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenIssueTimeSec = 1566872278
05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenExpTimeSec = 1566958678
05:35:56.186 [40463] <2> LoginWithCertManager::getLocalToken: tokenLastFailTimeSec = 0
05:35:56.186 [40463] <2> LoginWithCertManager::isJWTRefreshRequired: tokenExpTimeSec = 1566958678, timeNow = 1566873356, (tokenExpTimeSec-timeNow) = 85322
05:35:56.186 [40463] <2> LoginWithCertManager::isJWTRefreshRequired: Token-refresh IS NOT required
05:35:56.186 [40463] <2> NBClientCURL::performCurlOperation: Obtained login token
05:35:56.252 [40463] <4> NBClientCURL::performCurlOperation: HTTP response header : HTTP/1.1 201 X-Request-ID: 1086FD2A20531977 X-Request-ID: 1086FD2A20531977 Content-Length: 0 Date: Tue, 27 Aug 2019 02:35:56 GMT
05:35:56.252 [40463] <2> NBClientCURL::performCurlOperation: Fetched data = [xxxxx], httpcode = 201
05:35:56.252 [40463] <2> NBClientCURL::performOperation: Disabling any --proxy option previously set.
05:35:56.252 [40463] <2> NBClientCURL::performOperation: Disabling any --proxytunnel option previously set.
05:35:56.252 [40463] <2> NBClientCURL:~NBClientCURL: Performing curl_easy_cleanup()
05:35:56.252 [40463] <2> NBClientCURL:~NBClientCURL: Performing curl_global_cleanup()
05:35:56.252 [40463] <4> bpbkar Exit: INF - bpbkar exit normal
05:35:56.252 [40463] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
05:35:56.252 [40463] <2> bpbkar Exit: INF - Close of stdout complete
05:35:56.252 [40463] <4> bpbkar Exit: INF - setenv FINISHED=1
During successful backup : 5 Finished=1 entries
[root@prl-ercenmsvc3 bpbkar]# grep -i FINISHED root.082919_00001.log
05:17:47.137 [12954] <4> bpbkar Exit: INF - setenv FINISHED=1
05:18:19.415 [13541] <4> bpbkar Exit: INF - setenv FINISHED=1
05:18:54.266 [12920] <4> bpbkar Exit: INF - setenv FINISHED=1
05:25:22.667 [12983] <4> bpbkar Exit: INF - setenv FINISHED=1
05:29:12.226 [12670] <4> bpbkar Exit: INF - setenv FINISHED=1
and during failure : after 5 finished=1 entries socket dinot close and one more entry after 16 minutes Finished=0
root@prl-ercenmsvc3 bpbkar]# grep -i finished root.082719_00001.log
05:18:51.366 [11239] <4> bpbkar Exit: INF - setenv FINISHED=1
05:19:20.380 [11842] <4> bpbkar Exit: INF - setenv FINISHED=1
05:19:47.308 [11044] <4> bpbkar Exit: INF - setenv FINISHED=1
05:27:24.686 [11187] <4> bpbkar Exit: INF - setenv FINISHED=1
05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
05:54:17.499 [11904] <4> bpbkar Exit: INF - setenv FINISHED=0
Is my understanding correct that sometimes bpbkar is not closing socket after sending all stream data to media server.
and how can I correct it ?
any suggestions please ?
Regards
S
08-30-2019 01:55 AM - edited 08-30-2019 01:56 AM
Is there a firewall betweem Netbackup and the clients ?
If yes, you need to set TCP_KEEP_ALIVE on client(s), master and media server(s). The setting is feature in the OS
TCP Keepalive Best Practices - detecting network drops and preventing idle socket timeout
https://www.veritas.com/content/support/en_US/article.100028680.html
09-02-2019 12:16 AM
Hi,
There is no firewall in between master and clients.
Also, do you think we should tune keep alive or
client should disconnect the socket after it finds that it has sent all stream data to netbackup
Regards
S
09-02-2019 06:34 AM
I would also look at bpbrm log (at least level 3) on the media server to see what happened w.r.t. client updates received in the period between the 2 timestamps:
05:37:17.473 [10751] <2> bpbkar Exit: INF - Close of stdout complete
05:37:17.473 [10751] <4> bpbkar Exit: INF - setenv FINISHED=1
05:54:17.497 [11904] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
(PS : The PID's are different, so, they do not seem to be for the same stream - best to post full logs as attachment)