I recently upgraded my NetBackup domain to 9.0 (Master/Media on RHEL 7.9). This backup domain primarily performs DB backups. While upgrading the clients, I noticed that a couple of DBs using an MSSQL Intelligent Policy are taking longer to back up. On checking, I found that the longer duration started immediately after the client upgrade. This behaviour was not observed for traditional MSSQL backups.
The relevant text from the detailed status is copied below. You can see that it takes about 300s (roughly 5 minutes) to accept the connection.
21-Jun-2021 2:46:42 PM - started process bpbrm (pid=32769)
21-Jun-2021 2:46:43 PM - Info bpbrm (pid=32769) client.fqdn is the host to backup data from
21-Jun-2021 2:46:43 PM - Info bpbrm (pid=32769) reading file list for client
21-Jun-2021 2:46:44 PM - connecting
21-Jun-2021 2:46:44 PM - Info bpbrm (pid=32769) listening for client connection
21-Jun-2021 2:51:45 PM - Info bpbrm (pid=32769) INF - Client read timeout = 7200
21-Jun-2021 2:51:46 PM - Info bpbrm (pid=32769) accepted connection from client
Before the upgrade, the detailed status shows that it took much less time (about 30s) to accept the connection.
21-Jun-2021 1:46:40 PM - Info bpbrm (pid=14292) client.fqdn is the host to backup data from
21-Jun-2021 1:46:40 PM - Info bpbrm (pid=14292) reading file list for client
21-Jun-2021 1:46:40 PM - connecting
21-Jun-2021 1:46:40 PM - Info bpbrm (pid=14292) listening for client connection
21-Jun-2021 1:47:11 PM - Info bpbrm (pid=14292) INF - Client read timeout = 7200
21-Jun-2021 1:47:12 PM - Info bpbrm (pid=14292) accepted connection from client
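For anyone comparing runs, the wait can be measured straight from the detailed status text instead of being read off by eye. Below is a minimal Python sketch, assuming the timestamp format shown in the excerpts above (English month abbreviations, 12-hour clock). The function names `parse_events` and `connection_wait_seconds` are hypothetical helpers for illustration, not part of NetBackup.

```python
import re
from datetime import datetime

# Timestamp as it appears in the Activity Monitor detailed status excerpts,
# e.g. "21-Jun-2021 2:46:44 PM - ". Assumes English month abbreviations.
TS = re.compile(r"(\d{1,2}-[A-Za-z]{3}-\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - ")


def parse_events(text):
    """Split detailed status text into (datetime, message) pairs.

    Works whether the entries are one per line or run together on one line.
    """
    parts = TS.split(text)
    # parts[0] is any text before the first timestamp; after that the list
    # alternates timestamp, message, timestamp, message, ...
    events = []
    for stamp, message in zip(parts[1::2], parts[2::2]):
        when = datetime.strptime(stamp, "%d-%b-%Y %I:%M:%S %p")
        events.append((when, message.strip()))
    return events


def connection_wait_seconds(text):
    """Seconds between 'listening for client connection' and the first
    following 'accepted connection from client', or None if either is missing."""
    listening = None
    for when, message in parse_events(text):
        if "listening for client connection" in message:
            listening = when
        elif "accepted connection from client" in message and listening is not None:
            return (when - listening).total_seconds()
    return None


if __name__ == "__main__":
    sample = (
        "21-Jun-2021 2:46:44 PM - Info bpbrm (pid=32769) listening for client connection\n"
        "21-Jun-2021 2:51:45 PM - Info bpbrm (pid=32769) INF - Client read timeout = 7200\n"
        "21-Jun-2021 2:51:46 PM - Info bpbrm (pid=32769) accepted connection from client\n"
    )
    print(connection_wait_seconds(sample))
```

Run against the two excerpts above, this gives 302 seconds after the upgrade and 32 seconds before it.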
Has anyone seen this? Any suggestions?
It might be worth checking out the client-side logs to see if maybe it's doing additional work now vs previously due to the SIP stuff. Extra security checks, time waiting for the OS to do some new task, etc.
Anything interesting in bpcd or dbclient?
It does seem odd that it's adding almost exactly 5 minutes though. That's the sort of thing that normally screams timeout. If you were to run a vanilla bch script backup as a test (of, say, the MASTER database), would it still have that 5 minute delay? Might be interesting to know.
Apologies for the late update. Other things came up and I did not get many cycles to troubleshoot this issue (as the backup works and is only slower).
I checked the dbclient, bpcd, bphdb, and bpbrm logs and found nothing logged during the window when nothing appeared to be happening. Finally, I opened a case with support and we spent 3 hours investigating. One discovery was that the issue is not limited to the Intelligent Policy (MSSQL), as the behaviour was the same when we ran a sample bch script.
Waiting for support to come back after consulting with backline.
Just an update that support is still working on the case. It has been established that the issue happens only with the MSSQL Intelligent policy when it uses Encryption.
PS: Updating late as we have been managing a capacity issue with the backup system for the last 10 days.