05-11-2015 03:17 PM
The SQL database backup, which runs through the SQL database agent, is failing. When the backup triggers on its schedule we are unable to back up all of the databases, but we can back up the failed databases if we trigger them separately (sometimes on the first attempt, sometimes only after multiple attempts). However, 2-3 databases still cannot be backed up at all, and have not been for the past month. The SQL host has 36 databases with a total size of approximately 20 TB.
We are using the batch file below to back up the databases.
OPERATION BACKUP
BATCHSIZE 4
STRIPES 2
DATABASE $ALL
SQLHOST "SQLHostName"
NBSERVER "MasterServer"
MAXTRANSFERSIZE 0
BLOCKSIZE 7
POLICY Policy_Name
NUMBUFS 2
ENDOPER TRUE
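To check which settings a batch file like this actually applies, it can help to read it programmatically. The following is a minimal Python sketch, assuming each line holds one keyword followed by an optional (possibly double-quoted) value and that an ENDOPER line closes each operation. `parse_bch` is a hypothetical helper for illustration, not a NetBackup tool.

```python
# Hypothetical helper (not a NetBackup tool): parse a NetBackup for SQL Server
# batch (.bch) file into a list of operations. Assumes one "KEYWORD value"
# pair per line and that an ENDOPER line closes each operation.
def parse_bch(text):
    operations = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        keyword = parts[0].upper()
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        current[keyword] = value
        if keyword == "ENDOPER":
            operations.append(current)
            current = {}
    if current:  # tolerate a missing ENDOPER line
        operations.append(current)
    return operations


BATCH = """OPERATION BACKUP
BATCHSIZE 4
STRIPES 2
DATABASE $ALL
SQLHOST "SQLHostName"
NBSERVER "MasterServer"
MAXTRANSFERSIZE 0
BLOCKSIZE 7
POLICY Policy_Name
NUMBUFS 2
ENDOPER TRUE
"""

ops = parse_bch(BATCH)
print(len(ops), ops[0]["DATABASE"], ops[0]["SQLHOST"], ops[0]["BLOCKSIZE"])  # 1 $ALL SQLHostName 7
```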
Platform:
Windows 2008 R2
Microsoft SQL Server 2008 R2
Master server and client: NetBackup 7.6.0.1
Errors in the job details of a failed database:
Error bptm (pid=13503) system call failed - Connection reset by peer (at child.c.1306)
05/11/2015 15:43:15 - Error bptm (pid=13503) unable to perform read from client socket, connection may have been broken
05/11/2015 15:43:15 - Error bptm (pid=13494) media manager terminated by parent process
the backup failed to back up the requested files (6)
And in the SQL Server error log:
BackupIoRequest::ReportIoError: write failure on backup device 'VNBU0-xxx-xxxxxx'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
05-16-2015 03:19 AM
When you run the automatic schedules, is this during peak backup times, e.g. 12 AM?
If so, consider running the manual backup around that time, as you will most likely get the same errors.
If that is the case, schedule the backup window for that SQL host so that it does not clash with other backups on the master server.
Otherwise there is not a lot to go on. I would look at the dbclient log on the SQL node to see whether any significant errors come back from the SQL API; they might point to the issue.
For testing, minimise the number of databases, i.e. do not use $ALL.
Create new scripts that list specific databases (say, half of them) and see whether you make any progress that way.
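Splitting the $ALL run into smaller scripts can be done by hand, but for 36 databases a generator keeps the settings consistent. This is a minimal Python sketch under stated assumptions: one OPERATION/ENDOPER block per named database, BATCHSIZE given once in the first operation, and database names quoted as `DATABASE "name"`. The names `DB01` to `DB36` are placeholders for the real database names, and `make_batch` and `split_in_groups` are hypothetical helpers.

```python
# Hypothetical generator: builds a batch file that backs up an explicit list
# of databases instead of DATABASE $ALL, one OPERATION/ENDOPER block per
# database. Assumption: BATCHSIZE is given once, in the first operation.
COMMON = [
    ("STRIPES", "2"),
    ("SQLHOST", '"SQLHostName"'),
    ("NBSERVER", '"MasterServer"'),
    ("MAXTRANSFERSIZE", "0"),
    ("BLOCKSIZE", "7"),
    ("POLICY", "Policy_Name"),
    ("NUMBUFS", "2"),
]


def make_batch(databases, batchsize=4):
    lines = []
    for i, db in enumerate(databases):
        lines.append("OPERATION BACKUP")
        if i == 0:
            lines.append(f"BATCHSIZE {batchsize}")
        lines.append(f'DATABASE "{db}"')
        lines.extend(f"{key} {value}" for key, value in COMMON)
        lines.append("ENDOPER TRUE")
    return "\n".join(lines) + "\n"


def split_in_groups(items, groups):
    size = max(1, -(-len(items) // groups))  # ceiling division so no item is dropped
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder names standing in for the 36 real databases on the host.
names = [f"DB{n:02d}" for n in range(1, 37)]
first_half, second_half = split_in_groups(names, 2)
script = make_batch(first_half)
print(script.count("OPERATION BACKUP"))  # 18
```

Each half can then be saved as its own .bch file and run separately, which narrows down which databases fail.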
05-16-2015 05:54 AM
05-16-2015 07:32 AM
Hi Roger,
The scheduled backup runs during off hours, when the load on production is minimal.
I have triggered the backup of the failed databases manually, but it does not complete. I cannot schedule this SQL host's backup for a time when no other backups are running, because a very large number of backup jobs run on the master server.
Marianne: I have increased the client connect and read timeouts on the media server to 32000, but with no result.
05-16-2015 09:24 AM
05-16-2015 11:47 PM
Agreed, populating the Troubleshooting tab and reviewing the dbclient log has to be the way forward:
Invoke the "Backup, Archive and Restore" GUI. --> Troubleshooting
Set the following debug levels:
General = 2
Verbose = 5
Database = 9
Review the dbclient log; if you can narrow down the main error, post it here.
05-17-2015 06:06 AM
Here is the DB client log
05-17-2015 09:57 PM
05-19-2015 12:07 PM
The dbclient log file with a high logging level is uploaded as an attachment.
05-21-2015 03:58 AM
I'm afraid this log still does not look verbose.
Did you carry out the following on the SQL client?
Invoke the "Backup, Archive and Restore" GUI. --> Troubleshooting
Set the following debug levels:
General = 2
Verbose = 5
Database = 9
Before and around 3 PM there are multiple lines like the following:
15:03:43.127 [7492.6660] <16> serverResponse: ERR - server exited with status 230: the specified policy does not exist in the configuration database
15:03:43.127 [7492.6660] <16> CreateNewImage: ERR - serverResponse() failed
Can you make sure that the batch script points to a valid policy?
It is really impossible to draw an accurate conclusion because the logging level is so low.
Increase the logging, run just one backup attempt, and send in the dbclient log.
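Once a higher-verbosity log is available, counting the error-level entries helps separate recurring problems (such as the status 230 policy error) from the backup failure itself. Below is a minimal Python sketch, assuming the line layout seen in the dbclient excerpts quoted in this thread (`HH:MM:SS.mmm [pid.tid] <level> Function: message`) and that `<16>` marks errors; `count_errors` is a hypothetical helper, and the sample lines are copied from this thread.

```python
import re
from collections import Counter

# Assumed line layout, taken from the dbclient excerpts in this thread:
# "HH:MM:SS.mmm [pid.tid] <level> Function: message"
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\.\d{3} \[(\d+)\.(\d+)\] <(\d+)> ([\w:]+): (.*)$"
)
STATUS_RE = re.compile(r"status (\d+)")


def count_errors(lines, since="00:00:00"):
    """Count level <16> (error) entries at or after `since`, keyed by
    NetBackup status code when the message has one, else by function name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue
        time, _pid, _tid, level, func, msg = match.groups()
        if level != "16" or time < since:  # zero-padded times compare as strings
            continue
        status = STATUS_RE.search(msg)
        counts[f"status {status.group(1)}" if status else func] += 1
    return counts


SAMPLE = [
    "15:03:43.127 [7492.6660] <16> serverResponse: ERR - server exited with status 230: the specified policy does not exist in the configuration database",
    "15:03:43.127 [7492.6660] <16> CreateNewImage: ERR - serverResponse() failed",
    "18:08:22.499 [7228.8760] <16> CODBCaccess::LogODBCerr: DBMS MSG - SQL Message <3013><[Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.>",
    "18:08:22.511 [7228.8760] <2> vnet_pbxConnect: pbxConnectEx Succeeded",
]

print(count_errors(SAMPLE))
print(count_errors(SAMPLE, since="18:00:00"))
```

Against a real log, an open file can be passed instead of the sample list, e.g. `count_errors(open(path, errors="replace"), since="18:00:00")`, where `path` is the location of the dbclient log for that day.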
05-21-2015 03:14 PM
Hi Roger,
The log level was increased around 5:00 PM and the scheduled backup triggered at 6:00 PM.
Please check the logs after 5 PM.
05-22-2015 02:08 AM
The only failure after 6 PM was one database failing for <Policy_SQL_Daily>,
and the log still looks low in verbosity.
dbclient.log
18:08:22.499 [7228.8760] <16> CODBCaccess::LogODBCerr: DBMS MSG - SQL Message <3013><[Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.>
18:08:22.511 [7228.8760] <2> vnet_pbxConnect: pbxConnectEx Succeeded
18:08:22.511 [7228.8760] <2> logconnections: BPRD CONNECT FROM X.X.X.X.64258 TO X.X.X.X.1556 fd = 1028
18:08:22.575 [7228.8760] <16> Dbbackrec::PerformNBOperation: ERR - Error found executing <backup database "Database1" to VIRTUAL_DEVICE='VNBU0-7228-8760-1431986485', VIRTUAL_DEVICE='VNBU1-7228-8760-1431986485' with stats = 10, blocksize = 65536, maxtransfersize = 65536, buffercount = 4, differential>.
18:08:22.586 [7228.8760] <2> vnet_pbxConnect: pbxConnectEx Succeeded
Do you have "Database1" set up in the bch script?
That logical database doesn't exist in your environment, so look at removing it.
05-28-2015 03:16 PM
Hey, I was out for a while.
I have uploaded the dbclient log again, with high verbosity (3).
Both the full and the incremental (INC) backups are failing for 2 databases. I observed that the INC backup always fails after writing the same byte count, approximately 100 GB, and after about 1 hour. The full backup is failing as well, and I am not sure whether it follows the same pattern. We have not had a good backup for a long time.
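As a rough sanity check on that pattern, assuming "approx 100 Gb" means about 100 gigabytes (decimal) written over about one hour, the effective rate before the failure works out to:

```latex
\frac{100 \times 10^{9}\ \text{bytes}}{3600\ \text{s}} \approx 2.8 \times 10^{7}\ \text{bytes/s} \approx 28\ \text{MB/s}
```

A failure at the same byte count and elapsed time on every run suggests looking for something that trips at a fixed size or duration, although the arithmetic alone does not identify the cause.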
05-29-2015 03:35 AM
Is there any reason why the log file starts at 09:29?
It starts with a network error.
10053 (WSAECONNABORTED) is a Windows Sockets TCP error.
You need to get the network team involved, or look at TCP tuning at the OS level on the SQL client and the media server.
05-29-2015 04:27 AM
Sometimes increasing VDITIMEOUT in the bch script, and CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT on the client and media server, can help with this kind of problem if nothing can be found on the network/OS side.
05-29-2015 04:27 PM
05-29-2015 04:30 PM
05-29-2015 06:42 PM
Increasing VDITIMEOUT did not work.
I found the warnings below only on the two problematic databases. They appear in the parent SQL backup job and are thrown before the backup starts writing any database, i.e. before the stripes are generated.
Info bpbrm (pid=26784) reading file list for client
05/29/2015 18:00:49 - Info bpbrm (pid=26784) starting bphdb on client
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup Archive (group id 2)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT1 (group id 3)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT14 (group id 4)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT213 (group id 5)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKTMNLY (group id 6)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT1 (group id 4)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT14 (group id 5)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT213 (group id 6)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKTMNLY (group id 7)
05/29/2015 18:00:50 - Info bphdb (pid=4792) Backup started
05/29/2015 18:00:52 - Info dbclient (pid=8060) INF - BACKUP STARTED USING
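To see at a glance which filegroups each problem database reports, the warnings can be grouped by database. This minimal Python sketch assumes the exact WARN wording shown above; `empty_filegroups` is a hypothetical helper, and grouping the warnings does not by itself show that these filegroups cause the failure. The resulting list could then be checked on the SQL Server side.

```python
import re
from collections import defaultdict

# Matches the dbclient WARN text quoted above, e.g.
# "No files were found in Database10..sysfiles for filegroup BKT1 (group id 3)"
WARN_RE = re.compile(
    r"No files were found in (\w+)\.\.sysfiles for filegroup (\w+) \(group id (\d+)\)"
)


def empty_filegroups(lines):
    """Group the reported filegroups (name, group id) by database."""
    found = defaultdict(list)
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            db, filegroup, group_id = match.groups()
            found[db].append((filegroup, int(group_id)))
    return dict(found)


JOB_DETAILS = """\
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup Archive (group id 2)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT1 (group id 3)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT14 (group id 4)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKT213 (group id 5)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database10..sysfiles for filegroup BKTMNLY (group id 6)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT1 (group id 4)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT14 (group id 5)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKT213 (group id 6)
05/29/2015 18:00:49 - Info dbclient (pid=8060) WARN - No files were found in Database20..sysfiles for filegroup BKTMNLY (group id 7)
05/29/2015 18:00:50 - Info bphdb (pid=4792) Backup started
"""

for db, groups in empty_filegroups(JOB_DETAILS.splitlines()).items():
    print(db, [name for name, _ in groups])
```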
06-30-2015 03:11 PM
This issue was resolved by TCP tuning on the client and by changing the batch file parameters: MAXTRANSFERSIZE and BLOCKSIZE set to 0, and NUMBUFS set to 1.
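The dbclient log line at 18:08:22 shows how the original batch file settings reached SQL Server: blocksize = 65536, maxtransfersize = 65536, buffercount = 4. The minimal Python sketch below compares those values with the resolved settings. It assumes the mapping BLOCKSIZE n to 512 × 2^n bytes, MAXTRANSFERSIZE n to 64 KB × 2^n bytes, and buffercount = NUMBUFS × STRIPES. Under these assumptions the original settings reproduce the logged values exactly. It also assumes STRIPES stayed at 2 after the change. `backup_parameters` is a hypothetical helper, and the TCP tuning that was also part of the fix is not modelled.

```python
# Assumed mapping from NetBackup batch-file keywords to the SQL Server
# BACKUP DATABASE options seen in the dbclient log.
def backup_parameters(blocksize_n, maxtransfersize_n, numbufs, stripes):
    return {
        "blocksize": 512 * 2 ** blocksize_n,                    # BLOCKSIZE n
        "maxtransfersize": 64 * 1024 * 2 ** maxtransfersize_n,  # MAXTRANSFERSIZE n
        "buffercount": numbufs * stripes,                       # buffers per stripe x stripes
    }


# Original batch file: BLOCKSIZE 7, MAXTRANSFERSIZE 0, NUMBUFS 2, STRIPES 2.
before = backup_parameters(7, 0, 2, 2)
# Resolved settings: BLOCKSIZE 0, MAXTRANSFERSIZE 0, NUMBUFS 1 (STRIPES assumed still 2).
after = backup_parameters(0, 0, 1, 2)

print(before)  # {'blocksize': 65536, 'maxtransfersize': 65536, 'buffercount': 4}
print(after)   # {'blocksize': 512, 'maxtransfersize': 65536, 'buffercount': 2}
```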