SQL Restore stay in "begin reading" for more than ...

robertoaxity · ‎12-04-2022

Hello,

I try to restore a BD SQL (250GB) from in the same server, but the restore stuck in "begin reading" for more than 3 hours and then fail with the status 2850 but in the detail job is Error 62 timer expired, i change the "client read time out" in the media and the client many time ( 3000, 3600, 9000, 10800 and 30000) and in all this case the job fail.

So i dont know what happens because the backup job SQL in the server work fine, but the restore not.

Detail job

04-12-2022 12:16:22 - mounting DE0027
04-12-2022 12:16:23 - Info bptm (pid=70162) INF - Waiting for mount of media id DE0027 on server lprpaplnetbk.resp.derco.cl for reading.
04-12-2022 12:17:18 - mounted DE0027; mount time: 0:00:56
04-12-2022 12:17:18 - Info bptm (pid=70162) DE0027
04-12-2022 12:17:19 - Info bptm (pid=70162) INF - Waiting for positioning of media id DE0027 on server lprpaplnetbk.resp.derco.cl for reading.
04-12-2022 12:17:19 - positioning DE0027 to file 565
04-12-2022 12:18:24 - positioned DE0027; position time: 0:01:05
04-12-2022 12:18:24 - begin reading
04-12-2022 15:18:26 - restored from image dcqlc02sql10.resp.derco.cl_1667026834; restore time: 3:02:11
04-12-2022 15:18:27 - Error bpbrm (pid=70192) socket read failed: errno = 62 - Timer expired
04-12-2022 15:18:27 - Info bpbrm (pid=70130) sending message to media manager: STOP RESTORE dcqlc02sql10.resp.derco.cl_1667026834
04-12-2022 15:18:27 - Info bpbrm (pid=70130) media manager for backup id dcqlc02sql10.resp.derco.cl_1667026834 exited with status 150: termination requested by administrator
04-12-2022 15:18:27 - end Restore; elapsed time 3:02:15
Restore error (2850)

Netbackup version is 8.3.0.2.

Can you help me please?

Regards

Nicolai · ‎12-05-2022

hi @robertoaxity

Long reading times can because the image is stored on a a multiplex image. If by change the other "multiplexing partners" are high volume, high speed, then a lot of tape has to be read to extract the restore data.

Timer Expired: Do you have a firewall between the SQL server and the Netbackup environment ?

Have you set CLIENT_READTIME on both master, media and client ?

Michal_Mikulik1 · ‎12-05-2022

Hello,

yes this is a well-known problem with SQL restores. Last month we encountered the same issue and here the CLIENT_READ_TIMEOUT = 5400 sec was enough for 1.5TB database (housed in a virtual machine).

250GB is a size of the backup itself, or it is a preallocated size of the db datafiles? We must tailor client_read_timeout to the preallocated size. Try to create an empty database (not related to the restored one) with for example 100GB datafile and how long the creation will take on the same system, this could give you a hint about required timeout.

Regards

Michal

robertoaxity · ‎12-05-2022

Hello Nicolai,

The firewall in the SQL client is off, and the Master o Media server the selinux, firewall and iptables are off.

Regards

robertoaxity · ‎12-05-2022

Hello Michal,

The 250GB is the real size with BD and Logs, the last change in client_read_timeout was 30000 and fail.

04-12-2022 15:28:42 - mounting DE0027
04-12-2022 15:28:43 - Info bptm (pid=146091) INF - Waiting for mount of media id DE0027 on server lprpaplnetbk.resp.derco.cl for reading.
04-12-2022 15:29:21 - mounted DE0027; mount time: 0:00:39
04-12-2022 15:29:21 - Info bptm (pid=146091) DE0027
04-12-2022 15:29:22 - Info bptm (pid=146091) INF - Waiting for positioning of media id DE0027 on server lprpaplnetbk.resp.derco.cl for reading.
04-12-2022 15:29:22 - positioning DE0027 to file 566
04-12-2022 15:30:44 - positioned DE0027; position time: 0:01:22
04-12-2022 15:30:44 - begin reading
04-12-2022 23:50:44 - restored from image dcqlc02sql10.resp.derco.cl_1667026840; restore time: 8:22:10
04-12-2022 23:50:46 - Error bpbrm (pid=146098) socket read failed: errno = 62 - Timer expired
04-12-2022 23:50:46 - end Restore; elapsed time 8:22:14
04-12-2022 23:50:47 - Info bpbrm (pid=146088) sending message to media manager: STOP RESTORE dcqlc02sql10.resp.derco.cl_1667026840
04-12-2022 23:50:47 - Info bpbrm (pid=146088) media manager for backup id dcqlc02sql10.resp.derco.cl_1667026840 exited with status 150: termination requested by administrator
Restore error (2850)

I dont know why fail but i need restore the BD.

Regards.

robertoaxity · ‎12-05-2022

Hello,

Any update? the restore continue failing.

Thanks very much

Regards

Nicolai · ‎12-07-2022

Hi @robertoaxity

When I mentioned firewall, I ment network firewalls.

If you do have any in between the path from client to master/media you must set TCP_KEEPALIVE

TCP Keepalive Best Practices - detecting network drops and preventing idle socket timeout

https://www.veritas.com/support/en_US/article.100028680

If we take a look at the text below, it looks like backup image is restored at 23:50:44, but two seconds after you get the timer expired. 2 seconds could match Netbackup trying connect on port 1556, failback to 13724 before last try at 13782.

04-12-2022 23:50:44 - restored from image dcqlc02sql10.resp.derco.cl_1667026840; restore time: 8:22:10
04-12-2022 23:50:46 - Error bpbrm (pid=146098) socket read failed: errno = 62 - Timer expired

try

1: Crank CLIENT_READ_TIMEOUT up to 12 hours on master/media and client (just to rule it out)

2: set TCP_KEEPALIVE. It has no impact on performance.

Azhar4 · ‎12-07-2022

we faced such issues before but below steps as part of our SQL installation helped in mitigiating these delays

Local Security Policy must have Administrators added to “Perform volume maintenance tasks”

Administrators group on the machine must have <machinename>_ADMIN account added 9 basically DB admin account )

The user that the SQL Services are running as must be added to the above group, so this will need to be reviewed for each machine for the correct account.

VOX

SQL Restore stay in "begin reading" for more than 6 hours