Backup Failure on Multiple Clients with Exit Status 40
I received this case over a month ago and have been working with my senior engineer to find a solution. We appear to be at a standstill, and a fresh set of eyes may help.

You may notice that part of the conversation is missing from the attached logs. We have attempted to get the right logs three times, and each time we had to wait for a backup to complete, which took a long time. Unfortunately, I cannot go back and ask the customer for more logs. He has been patient thus far for a resolution, not for more logs...understandably so.

During a meeting we saw that the customer takes a checkpoint every 7 minutes. One of his clients, bkoweb26, began to back up successfully after an exclusion list was created. We also noticed a timeout of 0 (infinite) in the bpbrm log; see the snippet below. We have made suggestions and recommendations, listed below. Some were implemented, but not all.

Snippet of the backup log:

Mar 28, 2020 5:53:14 AM - Error bpbrm (pid=10289380) db_FLISTsend failed: network connection broken (40)
Mar 28, 2020 5:53:15 AM - Info bpbrm (pid=7471230) sending message to media manager: STOP BACKUP bkoweb26_1585391368
Mar 28, 2020 5:53:17 AM - Info bpbrm (pid=7471230) media manager for backup id bkoweb26_1585391368 exited with status 150: termination requested by administrator
Mar 28, 2020 5:53:17 AM - end writing; write time: 0:23:42
network connection broken (40)

Snippet of the media server bpbrm log:

00:07:52.377 [18415728.1] <2> db_getdata: timeout is 0 (infinite)
00:07:52.390 [18415728.1] <2> db_end: Need to collect reply
00:07:52.390 [18415728.1] <2> db_getdata: timeout is 0 (infinite)

Environment info:

Master: nmbackup01, configured on third-party, NBU version 8.1.1, Platform: AIX ver. 7.1
Media: nmbpmed05, configured on third-party server, NBU version 8.1.1, Platform: AIX ver. 7.1
Clients: nmocmi02, bkoweb26 and bkoweb25

Logs attached:

Clients (nmocmi02, nmsplkstore01): bpbkar
Master (nmbackup01): bpbrm, bptm
Media (nmbmed03, nmbpmed05): bpbrm, bptm

Recommendations:

- Increase the checkpoint interval to 60 minutes. (not done)
- Decrease the timeouts to 7200 instead of infinite. (not done) I'm thinking these are CLIENT_READ_TIMEOUT or CLIENT_CONNECT_TIMEOUT = 7200 in bp.conf.
- Since exclusions helped one client succeed, I wondered if the test in this technote could apply to this situation: https://www.veritas.com/support/en_US/article.100003560
- Check for communication-related patches. (tried, but did not find any online)
- Run bppllist and bpplinfo on the media server.
- Run bpgetconfig from a failing client and a successful client, and compare the output. (not done)

Any other recommendations you may have will be very much appreciated!
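For the bpgetconfig comparison, here is a minimal Python sketch of how the two outputs could be compared once they are collected. It assumes each client's output has been saved as text in `KEY = value` form, where a key such as SERVER may appear more than once. The function names `parse_config` and `diff_configs` are my own, and the sample values are hypothetical, not taken from the customer's systems.

```python
from collections import defaultdict


def parse_config(text):
    """Parse 'KEY = value' lines into {key: [values]}.

    Keys may repeat, so every key maps to a list of values in file order.
    Blank lines, '#' comments, and lines without '=' are skipped.
    """
    settings = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()].append(value.strip())
    return dict(settings)


def diff_configs(failing, working):
    """Return {key: (failing_values, working_values)} for keys that differ."""
    differences = {}
    for key in sorted(set(failing) | set(working)):
        failing_values = failing.get(key, [])
        working_values = working.get(key, [])
        if failing_values != working_values:
            differences[key] = (failing_values, working_values)
    return differences


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    failing_text = """
    SERVER = nmbackup01
    CLIENT_READ_TIMEOUT = 300
    """
    working_text = """
    SERVER = nmbackup01
    CLIENT_READ_TIMEOUT = 7200
    CLIENT_CONNECT_TIMEOUT = 7200
    """
    result = diff_configs(parse_config(failing_text), parse_config(working_text))
    for key, (failing_values, working_values) in result.items():
        print(f"{key}: failing={failing_values} working={working_values}")
```

Only the keys that differ are reported, so any timeout or server-list mismatch between the failing and the successful client should stand out quickly.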