
SQL database backups failing when run through the SQL agent

Born2rise
Level 4

SQL database backups run through the SQL database agent are failing. When the backup triggers on its schedule, we are unable to back up all databases, but we can back up the failed databases if we trigger them separately (sometimes in a single attempt, sometimes only after multiple attempts). However, there are still 2-3 databases that we have not been able to back up for a month. The SQL host has 36 databases with a total size of approximately 20 TB.

We are using the batch file below to back up the databases:

OPERATION BACKUP
BATCHSIZE 4
STRIPES 2
DATABASE $ALL
SQLHOST "SQLHostName"
NBSERVER "MasterServer"
MAXTRANSFERSIZE 0
BLOCKSIZE 7
POLICY Policy_Name
NUMBUFS 2
ENDOPER TRUE
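
The tuning keywords in this script are encoded values, not byte counts. The sketch below decodes them, assuming the encoding described in the NetBackup for SQL Server documentation (BLOCKSIZE n = 512 bytes × 2^n, MAXTRANSFERSIZE n = 64 KB × 2^n, and a SQL Server buffercount of NUMBUFS × STRIPES). The helper names are hypothetical. For this script it yields blocksize = 65536, maxtransfersize = 65536 and buffercount = 4, which matches the BACKUP command recorded in the dbclient log later in the thread.

```python
"""Hypothetical helper: decode .bch tuning keywords into SQL Server BACKUP options.

Assumed encoding (NetBackup for SQL Server):
  BLOCKSIZE n        -> 512 bytes * 2**n
  MAXTRANSFERSIZE n  -> 64 KB * 2**n
  buffercount        -> NUMBUFS * STRIPES
The defaults used when a keyword is absent are assumptions of this sketch.
"""


def parse_bch(text: str) -> dict:
    """Map KEYWORD -> value for the keyword/value lines of one OPERATION block."""
    settings = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            settings[parts[0].upper()] = parts[1].strip()
    return settings


def sql_backup_options(settings: dict) -> dict:
    """Translate the encoded .bch values into byte counts and a buffer count."""
    block_exp = int(settings.get("BLOCKSIZE", "0"))
    transfer_exp = int(settings.get("MAXTRANSFERSIZE", "0"))
    numbufs = int(settings.get("NUMBUFS", "1"))
    stripes = int(settings.get("STRIPES", "1"))
    return {
        "blocksize": 512 * 2 ** block_exp,
        "maxtransfersize": 64 * 1024 * 2 ** transfer_exp,
        "buffercount": numbufs * stripes,
    }


BCH = """\
OPERATION BACKUP
BATCHSIZE 4
STRIPES 2
DATABASE $ALL
SQLHOST "SQLHostName"
NBSERVER "MasterServer"
MAXTRANSFERSIZE 0
BLOCKSIZE 7
POLICY Policy_Name
NUMBUFS 2
ENDOPER TRUE
"""

print(sql_backup_options(parse_bch(BCH)))
# {'blocksize': 65536, 'maxtransfersize': 65536, 'buffercount': 4}
```

Under the same assumed encoding, the values from the resolution at the end of the thread (BLOCKSIZE 0, NUMBUFS 1, two stripes) would give a 512-byte block size and a buffer count of 2.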

Platform:

Windows 2008 R2

Microsoft SQL Server 2008 R2

Master server and client: NetBackup 7.6.0.1


Errors in the job details of the failed database:

Error bptm (pid=13503) system call failed - Connection reset by peer (at child.c.1306)
05/11/2015 15:43:15 - Error bptm (pid=13503) unable to perform read from client socket, connection may have been broken
05/11/2015 15:43:15 - Error bptm (pid=13494) media manager terminated by parent process

the backup failed to back up the requested files (6)


And in the SQL Server error log:

BackupIoRequest::ReportIoError: write failure on backup device 'VNBU0-xxx-xxxxxx'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).


Roger_C
Level 4
Employee

When your automatic schedules run, is it during peak backup times, such as 12 AM?

If so, consider running the manual backup around that time, as you will most likely get the same errors.

If that is the case, schedule the backup windows for that SQL host so that they do not clash with other backups on the master server.

Otherwise there is not a lot to go on. I would look at the dbclient log on the SQL node to see whether any significant errors come back from the SQL API; they might indicate the issue.

For testing, minimise the number of databases, i.e. do not use $ALL.

Create new scripts that list the databases explicitly (say, half of them) and see whether you make any progress that way.
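
Splitting the database list into smaller scripts can be automated. The sketch below builds a batch file with one OPERATION block per database and splits a list in half. It assumes a batch file may contain several OPERATION ... ENDOPER TRUE blocks, copies the tuning keywords from the original script, and leaves out BATCHSIZE; the database names and helper names are hypothetical placeholders.

```python
"""Hypothetical helper: build smaller .bch test scripts instead of DATABASE $ALL.

The database names below are placeholders. The tuning keywords copy the script
from the original post; BATCHSIZE is left out of this sketch.
"""

OPERATION_TEMPLATE = """\
OPERATION BACKUP
DATABASE "{db}"
SQLHOST "{host}"
NBSERVER "{master}"
POLICY {policy}
STRIPES 2
MAXTRANSFERSIZE 0
BLOCKSIZE 7
NUMBUFS 2
ENDOPER TRUE
"""


def build_bch(databases, host, master, policy):
    """Return one batch script with one OPERATION block per database."""
    return "".join(
        OPERATION_TEMPLATE.format(db=db, host=host, master=master, policy=policy)
        for db in databases
    )


def split_in_half(databases):
    """Split the list; the first half gets the extra item when the count is odd."""
    mid = (len(databases) + 1) // 2
    return databases[:mid], databases[mid:]


# Placeholder names; replace with the real database list from the SQL host.
all_dbs = ["DB01", "DB02", "DB03", "DB04", "DB05"]
first_half, second_half = split_in_half(all_dbs)
print(build_bch(first_half, "SQLHostName", "MasterServer", "Policy_Name"))
```

Each half can then be saved as its own .bch file and run separately, which narrows down which databases are failing.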

Marianne
Moderator
Partner    VIP    Accredited Certified
We often see large databases fail because of the default 5-minute timeout. Increase the Client Connect and Client Read timeouts on the media server to 1800 seconds. Create the dbclient log folder on the client and increase the logging level to 3 for further troubleshooting. Create the bptm and bpbrm logs on the media server.

Born2rise
Level 4

Hi Roger,

The scheduled backup runs during off hours, when there is minimal load on production.

I triggered the backup of the failed databases manually, but it does not complete. I cannot schedule this SQL host's backup for a time when no other backups are running on the master server, because a very large number of backup jobs run on it.


Marianne: I have increased the client connect and read timeouts on the media server to 32000, but with no result.

Marianne
Moderator
Partner    VIP    Accredited Certified
You will need the dbclient log to troubleshoot.

Roger_C
Level 4
Employee

Agreed; populating the Troubleshooting tab and reviewing the dbclient log has to be the way forward:

Invoke the "Backup, Archive and Restore" GUI. --> Troubleshooting
Set the following Debug Levels
General = 2
Verbose = 5
Database = 9

Review the dbclient log; if you can narrow down the main error, post it here.
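
A verbose dbclient log is long, so filtering it down to error-level lines helps when narrowing down the main error. The sketch below assumes the legacy line layout seen in this thread (`HH:MM:SS.mmm [pid.tid] <severity> function: message`) and treats severity 16 and above as errors; both the threshold and the helper names are assumptions of this sketch, not NetBackup tooling.

```python
"""Hypothetical helper: pull error-level entries out of a dbclient debug log.

Assumes the legacy log line layout seen in this thread:
  HH:MM:SS.mmm [pid.tid] <severity> function: message
and treats severity 16 and above as errors (an assumption of this sketch).
"""
import re
from typing import Iterable, Iterator, NamedTuple

LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<sev>\d+)> "
    r"(?P<func>\S+?): (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    time: str
    pid: int
    severity: int
    function: str
    message: str


def error_entries(lines: Iterable[str], min_severity: int = 16) -> Iterator[LogEntry]:
    """Yield parsed entries whose severity is at least min_severity."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and int(m["sev"]) >= min_severity:
            yield LogEntry(m["time"], int(m["pid"]), int(m["sev"]), m["func"], m["msg"])


sample = [
    "18:08:22.499 [7228.8760] <16> CODBCaccess::LogODBCerr: DBMS MSG - SQL Message <3013>"
    "<[Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.>",
    "18:08:22.511 [7228.8760] <2> vnet_pbxConnect: pbxConnectEx Succeeded",
]
for entry in error_entries(sample):
    print(entry.time, entry.function, entry.message[:60])
```

For a real log, pass an open file object instead of the sample list, e.g. `open(path, errors="replace")`, where `path` is whichever dbclient log file was collected.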

Born2rise
Level 4

Here is the DB client log


Marianne
Moderator
Partner    VIP    Accredited Certified
Level 0 log? You will need a higher log level, as indicated above. Please copy the log file to dbclient.txt and upload it as a file attachment.

Born2rise
Level 4

The dbclient log file with a higher logging level is uploaded as an attachment.

Roger_C
Level 4
Employee

I'm afraid this log still does not look verbose.

Did you carry out the following on the SQL client?
Invoke the "Backup, Archive and Restore" GUI. --> Troubleshooting
Set the following Debug Levels
General = 2
Verbose = 5
Database = 9


Before and around 3 PM you have multiple lines of the following:
15:03:43.127 [7492.6660] <16> serverResponse: ERR - server exited with status 230: the specified policy does not exist in the configuration database
15:03:43.127 [7492.6660] <16> CreateNewImage: ERR - serverResponse() failed

Can you make sure the batch script points to a valid policy?
It is really impossible to make an accurate count because the log level is so low.

Increase the logging, run just one backup attempt, and send in the dbclient log.
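
Once the log is verbose enough, a count like the one mentioned above can be scripted. The sketch below only matches the `server exited with status N` wording shown in the excerpt (status 230 there means the specified policy does not exist); other messages that carry a status code are not counted, and the helper name is hypothetical.

```python
"""Hypothetical helper: count NetBackup status codes reported in a dbclient log."""
import re
from collections import Counter
from typing import Iterable

STATUS_RE = re.compile(r"server exited with status (\d+)")


def status_counts(lines: Iterable[str]) -> Counter:
    """Count how often each 'server exited with status N' message appears."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


sample = [
    "15:03:43.127 [7492.6660] <16> serverResponse: ERR - server exited with status 230: "
    "the specified policy does not exist in the configuration database",
    "15:03:43.127 [7492.6660] <16> CreateNewImage: ERR - serverResponse() failed",
]
print(status_counts(sample))  # Counter({230: 1})
```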

Born2rise
Level 4

Hi Roger,

The log level was increased around 5:00 PM and the scheduled backup triggered at 6:00 PM.

Please check the logs after 5 PM.

Roger_C
Level 4
Employee

The only failure you had after 6 PM was one database that failed for <Policy_SQL_Daily>, and the log still looks low in verbosity.

dbclient.log
18:08:22.499 [7228.8760] <16> CODBCaccess::LogODBCerr: DBMS MSG - SQL Message <3013><[Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.>
18:08:22.511 [7228.8760] <2> vnet_pbxConnect: pbxConnectEx Succeeded
18:08:22.511 [7228.8760] <2> logconnections: BPRD CONNECT FROM X.X.X.X.64258 TO X.X.X.X.1556 fd = 1028
18:08:22.575 [7228.8760] <16> Dbbackrec::PerformNBOperation: ERR - Error found executing <backup database "Database1" to VIRTUAL_DEVICE='VNBU0-7228-8760-1431986485', VIRTUAL_DEVICE='VNBU1-7228-8760-1431986485' with  stats = 10, blocksize = 65536, maxtransfersize = 65536, buffercount = 4, differential>.
18:08:22.586 [7228.8760] <2> vnet_pbxConnect: pbxConnectEx Succeeded

Do you have "Database1" set up in the bch script?
That logical database doesn't exist in your environment, so look to remove it.
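
Checking a batch file for names the instance no longer has can be done by comparing its DATABASE lines with the instance's own list, for example the output of `SELECT name FROM sys.databases`. The sketch below assumes a case-insensitive server collation when comparing names; the helper names and the sample list are hypothetical.

```python
"""Hypothetical helper: flag DATABASE entries in a .bch script that are not on the SQL host.

'existing' would come from the instance itself, for example the output of
SELECT name FROM sys.databases. Names are compared case-insensitively, which
assumes a case-insensitive server collation.
"""
import re

DB_RE = re.compile(r'^\s*DATABASE\s+"?([^"\r\n]+?)"?\s*$', re.IGNORECASE)


def bch_databases(bch_text: str) -> list:
    """Return the database names listed on DATABASE lines."""
    names = []
    for line in bch_text.splitlines():
        m = DB_RE.match(line)
        if m:
            names.append(m.group(1))
    return names


def missing_databases(bch_text: str, existing) -> list:
    """Names in the script (other than $ALL) that the instance does not report."""
    known = {name.lower() for name in existing}
    return [n for n in bch_databases(bch_text) if n != "$ALL" and n.lower() not in known]


script = 'OPERATION BACKUP\nDATABASE "Database1"\nENDOPER TRUE\n'
print(missing_databases(script, ["master", "Database10", "Database20"]))  # ['Database1']
```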

Born2rise
Level 4

Hey, I was out for a while.

I have uploaded the dbclient log again with higher verbosity (3).


Both full and incremental (INC) backups are failing for 2 databases. I observed that the INC backup always fails after writing the same byte count (approximately 100 GB) and after about 1 hour. I am not sure what causes it, since the full backup fails as well. We have not had a good backup for a long time.
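
To put the reported figures in perspective, the sketch below converts them into an average rate and estimates how long one terabyte would take at that rate, which is worth keeping in mind for the backup window of multi-terabyte databases. It treats "approx 100 GB" as 100 GiB; that and the helper names are assumptions, and the results are rough estimates only.

```python
"""Back-of-the-envelope throughput from the figures reported above.

Treats "approx 100 GB" as 100 GiB (an assumption); the numbers are estimates only.
"""


def throughput_mib_per_s(bytes_written: float, seconds: float) -> float:
    """Average rate in MiB/s."""
    return bytes_written / seconds / 2**20


def hours_to_write(size_bytes: float, mib_per_s: float) -> float:
    """Hours needed to write size_bytes at a constant rate of mib_per_s."""
    return size_bytes / (mib_per_s * 2**20) / 3600


rate = throughput_mib_per_s(100 * 2**30, 3600)  # ~100 GB written in ~1 hour
print(f"{rate:.1f} MiB/s")                              # 28.4 MiB/s
print(f"{hours_to_write(2**40, rate):.2f} h per TiB")   # 10.24 h per TiB
```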

Marianne
Moderator
Partner    VIP    Accredited Certified

Any reason why the log file starts at 09:29? 

It starts with a network error.
10053 is a TCP error.
You need to get the network team involved, or look at TCP tuning at OS level on the SQL client and the media server.
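
The two Windows error codes quoted in this thread are standard Win32/Winsock codes: 10053 is WSAECONNABORTED and 995 (from the SQL Server error log) is ERROR_OPERATION_ABORTED. The small lookup below is a hypothetical helper, not part of NetBackup, and its table covers only these two codes.

```python
"""Hypothetical lookup for the Windows error codes quoted in this thread.

Descriptions are the standard Win32/Winsock texts; the table is not exhaustive.
"""
import re

KNOWN_CODES = {
    995: ("ERROR_OPERATION_ABORTED",
          "The I/O operation has been aborted because of either a thread exit "
          "or an application request."),
    10053: ("WSAECONNABORTED",
            "Software caused connection abort (an established connection was "
            "aborted by the software in the host machine)."),
}


def explain_codes(text: str) -> list:
    """Return (code, name, description) for each known code mentioned in text."""
    found = []
    for token in re.findall(r"\b\d{3,5}\b", text):
        code = int(token)
        # Report each known code once, in order of first appearance.
        if code in KNOWN_CODES and all(code != f[0] for f in found):
            found.append((code, *KNOWN_CODES[code]))
    return found


for code, name, desc in explain_codes("Operating system error 995 ... TCP error 10053"):
    print(code, name)
```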


Michael_G_Ander
Level 6
Certified

If nothing can be found on the network/OS side, increasing VDITIMEOUT in the bch script, and CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT on the client and media server, can sometimes help with this kind of problem.



The standard questions: Have you checked 1) what has changed, 2) the manual, and 3) whether there are any tech notes or VOX posts regarding the issue?

Born2rise
Level 4
Michael, I have not tried VDITIMEOUT yet. Let me try the backup with it and see the result.

Born2rise
Level 4
Hi Marianne, I did not notice that. Backups of other databases, with sizes of approx. 600, are working fine. Let's wait for the result of the backup triggered with VDITIMEOUT in the bch.

Born2rise
Level 4

VDITIMEOUT did not work.

I found the warnings below only on the two problematic databases. They appear in the parent SQL backup job and are thrown before the backup starts writing any database, i.e. before the stripes are generated.

Info bpbrm (pid=26784) reading file list for client
05/29/2015 18:00:49 - Info bpbrm (pid=26784) starting bphdb on client
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup Archive (group id 2)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT1 (group id 3)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT14 (group id 4)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT213 (group id 5)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKTMNLY (group id 6)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT1 (group id 4)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT14 (group id 5)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT213 (group id 6)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKTMNLY (group id 7)
05/29/2015 18:00:50 - Info bphdb (pid=4792) Backup started
05/29/2015 18:00:52 - Info dbclient (pid=8060) INF - BACKUP STARTED USING
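
Since these warnings show up only for the two failing databases, it may help to list them per database and compare them with what SQL Server itself reports for each database's files and filegroups (for example the sys.filegroups and sys.database_files catalog views). The sketch below only parses the job-detail lines in the form shown here; the helper name is hypothetical.

```python
"""Hypothetical helper: group the dbclient 'No files were found' warnings by database."""
import re
from collections import defaultdict

WARN_RE = re.compile(
    r"No files were found in (?P<db>[^.\s]+)\.\.sysfiles "
    r"for filegroup (?P<fg>\S+) \(group id (?P<gid>\d+)\)"
)


def empty_filegroups(lines):
    """Return {database: [(filegroup, group_id), ...]} for each matching warning."""
    result = defaultdict(list)
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            result[m["db"]].append((m["fg"], int(m["gid"])))
    return dict(result)


job_log = [
    "05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in "
    "Database10..sysfiles for filegroup Archive (group id 2)",
    "05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in "
    "Database20..sysfiles for filegroup BKT1 (group id 4)",
    "05/29/2015 18:00:50 - Info bphdb (pid=4792) Backup started",
]
print(empty_filegroups(job_log))
# {'Database10': [('Archive', 2)], 'Database20': [('BKT1', 4)]}
```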

Born2rise
Level 4

This issue has been resolved by TCP tuning on the client and by changing parameters: MAXTRANSFERSIZE and BLOCKSIZE set to 0, and NUMBUFS set to 1.