Highlighted

SQL Server backups fail with status 6, secondary status 150

I have already looked at

http://www.symantec.com/connect/forums/sql-agent-backup-jobs-end-status-6-or-150 (which reasonably summarises the behavior I see)

https://support.symantec.com/en_US/article.HOWTO105736.html (how to set the client read timeout, among other things)

The specific database transaction log has generally been failing every 2 hours during its window, for days.  I had some successes last night and this morning.

The most recent successful job reported that it wrote 1,362,181 kb over 5 files, which sounds like it was a 1.3 GB transaction log (my DBA confirmed the size for me).

The log from the most recent failed job follows:

06/23/2015 08:14:52 - Info nbjm (pid=13262) starting backup job (jobid=942582) for client dbserver.example.com, policy MSSQL-DBSERVER-TRANS, schedule Default-Application-Backup
06/23/2015 08:14:52 - Info nbjm (pid=13262) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=942582, request id:{73C64FC0-19A1-11E5-A7BD-35CD8B58A6C1})
06/23/2015 08:14:52 - requesting resource MS-SQL-TAPE
06/23/2015 08:14:52 - requesting resource master-server.example.com.NBU_CLIENT.MAXJOBS.dbserver.example.com
06/23/2015 08:14:52 - requesting resource master-server.example.com.NBU_POLICY.MAXJOBS.MSSQL-DBSERVER-TRANS
06/23/2015 08:14:52 - granted resource  master-server.example.com.NBU_CLIENT.MAXJOBS.dbserver.example.com
06/23/2015 08:14:52 - granted resource  master-server.example.com.NBU_POLICY.MAXJOBS.MSSQL-DBSERVER-TRANS
06/23/2015 08:14:52 - granted resource  0544L5
06/23/2015 08:14:52 - granted resource  HP.ULTRIUM5-SCSI.000
06/23/2015 08:14:52 - granted resource  media-server-hcart2-robot-tld-4
06/23/2015 08:14:52 - estimated 0 kbytes needed
06/23/2015 08:14:52 - Info nbjm (pid=13262) started backup (backupid=dbserver.example.com_1435061692) job for client dbserver.example.com, policy MSSQL-DBSERVER-TRANS, schedule Default-Application-Backup on storage unit media-server-hcart2-robot-tld-4
06/23/2015 08:15:41 - Info bpbrm (pid=8288) dbserver.example.com is the host to backup data from
06/23/2015 08:15:41 - Info bptm (pid=8290) using 262144 data buffer size
06/23/2015 08:15:41 - Info bpbrm (pid=8288) telling media manager to start backup on client
06/23/2015 08:15:41 - mounted 0544L5
06/23/2015 08:15:41 - positioning 0544L5 to file 2157
06/23/2015 08:15:46 - Info bptm (pid=8290) using 256 data buffers
06/23/2015 08:15:46 - Info bpbrm (pid=8288) spawning a brm child process
06/23/2015 08:15:46 - Info bpbrm (pid=8288) child pid: 8401
06/23/2015 08:15:51 - Info bpbrm (pid=8288) sending bpsched msg: CONNECTING TO CLIENT FOR dbserver.example.com_1435061692
06/23/2015 08:15:51 - Info bpbrm (pid=8288) listening for client connection
06/23/2015 08:15:51 - connecting
06/23/2015 08:15:56 - Info bpbrm (pid=8288) INF - Client read timeout = 3600
06/23/2015 08:15:57 - Info bpbrm (pid=8288) accepted connection from client
06/23/2015 08:15:57 - Info dbclient (pid=4264) Backup started
06/23/2015 08:15:57 - Info bpbrm (pid=8288) Sending the file list to the client
06/23/2015 08:15:57 - Info dbclient (pid=4264) dbclient(pid=4264) wrote first buffer(size=65536)
06/23/2015 08:15:57 - Info bptm (pid=8400) setting receive network buffer to 262144 bytes
06/23/2015 08:15:57 - connected; connect time: 0:00:00
06/23/2015 08:15:57 - positioned 0544L5; position time: 0:00:16
06/23/2015 08:15:57 - begin writing
06/23/2015 08:17:28 - Info dbclient (pid=4264) done. status: 6
06/23/2015 08:17:28 - Info bpbrm (pid=8288) sending message to media manager: STOP BACKUP dbserver.example.com_1435061692
06/23/2015 08:17:29 - Info bpbrm (pid=8288) media manager for backup id dbserver.example.com_1435061692 exited with status 150: termination requested by administrator
06/23/2015 08:17:29 - end writing; write time: 0:01:32
the backup failed to back up the requested files  (6)

The relevant snippet from the SQL server's dbclient log is:

08:15:57.234 [4264.6944] <4> serverResponse: Not a candidate for alternate buffer method: multiplexing is in use.
08:16:17.627 [4264.4464] <4> CODBCaccess::LogODBCerr: INF - ODBC return code <2>, SQL State <01000>, SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]10 percent processed.>.
08:16:17.631 [4264.4464] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:16:17.632 [4264.4464] <2> logconnections: BPRD CONNECT FROM 192.168.1.26.51564 TO 192.168.2.249.1556 fd = 1056
08:17:04.938 [4264.4464] <4> CODBCaccess::LogODBCerr: INF - ODBC return code <2>, SQL State <01000>, SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]20 percent processed.>.
08:17:04.951 [4264.4464] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:17:04.952 [4264.4464] <2> logconnections: BPRD CONNECT FROM 192.168.1.26.51566 TO 192.168.2.249.1556 fd = 1068
08:17:28.757 [4264.6944] <16> writeToServer: ERR - send() to server on socket failed:
08:17:28.757 [4264.6944] <16> dbc_put: ERR - failed sending data to server
08:17:28.868 [4264.6944] <16> VxBSASendData: ERR - Could not do a bsa_put().
08:17:28.870 [4264.6944] <4> getServerName: Read server name from nb_master_config: master-server.example.com
08:17:28.882 [4264.6944] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:17:28.882 [4264.6944] <2> logconnections: BPRD CONNECT FROM 192.168.1.26.51567 TO 192.168.2.249.1556 fd = 1072
08:17:46.518 [4264.4464] <4> CODBCaccess::LogODBCerr: INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]30 percent processed.>
08:17:46.530 [4264.4464] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:17:46.530 [4264.4464] <2> logconnections: BPRD CONNECT FROM 192.168.1.26.51577 TO 192.168.2.249.1556 fd = 880
08:18:15.125 [4264.6944] <4> getServerName: Read server name from nb_master_config: master-server.example.com
08:18:15.127 [4264.6944] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:18:15.128 [4264.6944] <2> logconnections: BPRD CONNECT FROM 192.168.1.26.51579 TO 192.168.2.249.1556 fd = 1056
08:18:35.203 [4264.4464] <4> CODBCaccess::LogODBCerr: INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]40 percent processed.>
08:18:35.216 [4264.4464] <2> vnet_pbxConnect: pbxConnectEx Succeeded

Based on the log entries, my research suggested that the client timeout setting might be too low.  So I checked my timeout settings, which currently are as follows for both my master and my media servers:

  1. Client connect timeout:  3600 seconds
  2. Client read timeout:  3600 seconds
  3. Backup start notify timeout:  1800 seconds
  4. Backup end notify timeout:  1800 seconds
  5. File browse timeout:  7200 seconds
  6. Media server connect timeout:  1800 seconds

which seem reasonably high--particularly since the elapsed time between when the job started writing and when it returned a status of 6 was only a minute and a half (90 seconds).

Any suggestions what I should check next?

1 Reply

Is it possible to extract all

Is it possible to extract all dbclient entries for PID 4264 ? 

You may need to increase logging level (level 3 or higher), as the only error we see in above extract is a network error:

08:17:28.757 [4264.6944] <16> writeToServer: ERR - send() to server on socket failed:
08:17:28.757 [4264.6944] <16> dbc_put: ERR - failed sending data to server 
08:17:28.868 [4264.6944] <16> VxBSASendData: ERR - Could not do a bsa_put().

 

Please also ensure that bptm and bpbrm log folders exist on the media server.