01-31-2013 08:58 AM
We have MSSQL jobs that run application and log backups. From time to time I get a failure with Status 23, but when I look at the log it shows that all operations succeeded and none failed. I've seen others' posts about this problem, along with Symantec's responses via EEB and future releases. We are running: Master Server: 7.1.0.4 on RHEL; Media Servers: 7.1.0.4 on Win 2008 R2; clients: MSSQL 2008 (patched) running 7.1 and 7.1.0.4. What does this mean? Are the backups actually cataloged or is the catalog info discarded due to the Status > 1? See the AM details excerpts below.
01/27/2013 04:48:55 - Info dbclient (pid=4416) INF - OPERATION #12 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\FULL.bch SUCCEEDED with STATUS 0 (0 is normal). Elapsed time = 31(0) seconds
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - ODBC return code <2>, SQL State <01000>, SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]10 percent processed.>.
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]20 percent processed.>
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]30 percent processed.>
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]40 percent processed.>
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]50 percent processed.>
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]60 percent processed.>
01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - SQL Message <3211><[Microsoft][ODBC SQL Server Driver][SQL Server]70 percent processed.>
01/27/2013 04:49:02 - Info dbclient (pid=4416) INF - Thread has been closed for stripe #0
01/27/2013 04:49:04 - Info dbclient (pid=4416) INF - OPERATION #11 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\FULL.bch SUCCEEDED with STATUS 0 (0 is normal). Elapsed time = 42(0) seconds
01/27/2013 04:49:44 - Info dbclient (pid=4416) INF - Results of executing <C:\Program Files\Veritas\NetBackup\DbExt\MsSql\FULL.bch>:
01/27/2013 04:49:44 - Info dbclient (pid=4416) <14> operations succeeded. <0> operations failed.
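For anyone who wants to cross-check a long detailed status without reading it line by line, here is a rough Python sketch that pulls the per-operation results and the batch summary out of lines shaped like the ones quoted above. The regular expressions are modeled only on this excerpt; the `FAILED` wording and the function name `summarize` are my assumptions, not documented NetBackup output.

```python
import re

# Patterns modeled on the dbclient lines quoted in this post.
# "FAILED" is an assumed counterpart to "SUCCEEDED"; adjust to your real logs.
OPERATION_RE = re.compile(
    r"OPERATION #(\d+) of batch .* (SUCCEEDED|FAILED) with STATUS (-?\d+)"
)
SUMMARY_RE = re.compile(r"<(\d+)> operations succeeded\. <(\d+)> operations failed")


def summarize(lines):
    """Return ({operation_number: (result, status)}, (succeeded, failed) or None)."""
    operations = {}
    summary = None
    for line in lines:
        match = OPERATION_RE.search(line)
        if match:
            operations[int(match.group(1))] = (match.group(2), int(match.group(3)))
            continue
        match = SUMMARY_RE.search(line)
        if match:
            summary = (int(match.group(1)), int(match.group(2)))
    return operations, summary


# Raw strings keep the Windows path backslashes intact.
sample = [
    r"01/27/2013 04:48:55 - Info dbclient (pid=4416) INF - OPERATION #12 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\FULL.bch SUCCEEDED with STATUS 0 (0 is normal). Elapsed time = 31(0) seconds",
    r"01/27/2013 04:49:04 - Info dbclient (pid=4416) INF - OPERATION #11 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\FULL.bch SUCCEEDED with STATUS 0 (0 is normal). Elapsed time = 42(0) seconds",
    "01/27/2013 04:49:44 - Info dbclient (pid=4416) <14> operations succeeded. <0> operations failed.",
]

operations, summary = summarize(sample)
print(operations)
print(summary)
```

The excerpt only contains two of the 14 operations, so the counts will not line up here; against the full detailed status, the number of parsed operations should match the first number in the summary.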
01-31-2013 10:52 AM
Are the backups actually cataloged or is the catalog info discarded due to the Status > 1? See the AM details excerpts below.
Any backup that ends with status code 0 or 1 has its catalog information stored in the database; backups that end with other status codes do not.
You would probably need to confirm this by checking the image database and by doing a test restore.
03-08-2013 11:45 AM
Sorry for not being back here in over a month. I was expecting consultants to help with this error, but that hasn't happened yet.
The backups are all confirmed as successful, and verified as such by the DB admin, and are thus in the catalog:
01/27/2013 04:49:44 - Info dbclient (pid=4416) <14> operations succeeded. <0> operations failed
I suspect each DB job is recorded with its individual status, which is status 0, even though the parent job finishes reporting status 23. It's as if it finishes all the work and then breaks the socket before reporting overall job success.
No solution yet :)
03-08-2013 12:03 PM
My long-shot guess is that you are not backing up all databases.
Maybe one or more databases fail with error 23 before a job appears in the Activity Monitor, and the parent job then fails with the same error.
Count your databases and the backup jobs and check whether I'm right. Do not count the parent job.
A better way to check whether a database backup is valid is to go to the catalog, find the backups, and run a verification. If you still have doubts, you can do an alternate restore.
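The counting check suggested here can be sketched in a few lines of Python. This is only an illustration: the database names, the `(database, status)` pairs, and the function name `missing_databases` are all hypothetical, and you would have to fill the two lists by hand from the DB admin's list and from the child jobs in the Activity Monitor.

```python
def missing_databases(expected_databases, child_jobs):
    """Return the expected databases that have no status-0 child job.

    child_jobs is an iterable of (database_name, status) pairs.
    The parent job must be left out, as noted above.
    """
    backed_up = {name for name, status in child_jobs if status == 0}
    return sorted(db for db in expected_databases if db not in backed_up)


# Hypothetical example data, not taken from the thread.
expected = ["master", "model", "msdb", "Sales"]
jobs = [("master", 0), ("model", 0), ("msdb", 0)]

print(missing_databases(expected, jobs))
```

An empty result would mean every expected database has a matching successful child job, which argues against the "missing database" theory.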
03-08-2013 12:15 PM
Done all that, plus a point-in-time restore with the DB admin. I went line by line through the detailed status and verified that every backup (DB) was accounted for and showed status 0. I also filtered on the parent job ID and highlighted all jobs, and the count at the top was the number of DBs plus the parent. I brought my colleagues in to look at it as well in case I had missed anything, and nothing was found. The DB admin confirmed the number of databases on the servers. Generally this is happening to around 8 servers out of 29. All servers are running from the same policy. I have another person tracking whether it is always the same servers that get the status 23s, and I will follow up on that angle.
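The "is it always the same servers" tracking could be kept as a simple tally. Below is a minimal sketch, assuming each backup run is written down as a mapping of client name to parent job status; the client names and the function names are made up for illustration.

```python
from collections import Counter


def status_23_counts(runs):
    """Count, per client, how many runs ended with parent job status 23."""
    counts = Counter()
    for run in runs:
        for client, status in run.items():
            if status == 23:
                counts[client] += 1
    return counts


def always_failing(runs):
    """Return clients that hit status 23 in every recorded run."""
    runs = list(runs)
    counts = status_23_counts(runs)
    return sorted(client for client, n in counts.items() if n == len(runs))


# Hypothetical nightly results: client name -> parent job status.
runs = [
    {"sql01": 23, "sql02": 0, "sql03": 23},
    {"sql01": 23, "sql02": 0, "sql03": 0},
    {"sql01": 23, "sql02": 23, "sql03": 0},
]

print(status_23_counts(runs))
print(always_failing(runs))
```

If the same clients fail every night, that points toward something specific to those hosts (network path, teaming, firewall rules); if the failures move around, a timing or load issue looks more likely.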
03-08-2013 12:30 PM
Some other long shots.
Are the 8 servers physical servers, and have you implemented network bonding? If yes, try deactivating it on one server.
Do the servers have multiple networks? Check that NetBackup communicates over only one network, or force it to for a test.
Check the bpcd logs of all the servers involved.
And finally, I imagine that you have already raised the timeouts sky-high on the master server, media servers and clients.
04-02-2013 01:34 PM
The consultants I thought could help have not come through yet.
What log files might provide me the best clues? Verbosity and Debug? And which ones on which servers:
Client logs:
Media Server logs:
Master Server logs:
04-02-2013 02:11 PM
Logs that I would have a look at:
client: dbclient
media server: bpbrm and bptm
bpbrm will contain update info back to the master server.
Question: is there a firewall anywhere in the picture?
We have seen this where a firewall between the media server and the master times out, causing status 23 or 24.
See this (very) old TN: http://www.symantec.com/docs/TECH91271
This one is for status 24, but similar issue: http://www.symantec.com/docs/TECH145234
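One way to test the firewall idea is to look for long silent stretches in the job details, since an idle-timeout drop needs a quiet period on the connection. The sketch below finds the longest gap between consecutive timestamped lines, assuming the `MM/DD/YYYY HH:MM:SS` prefix seen in the Activity Monitor excerpts earlier in the thread. Note that these timestamps only show when detail lines were written, not what was happening on the media-to-master socket, so treat the result as a hint only; the function name `longest_gap` is my own.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"  # e.g. 01/27/2013 04:49:44
TIMESTAMP_LENGTH = 19


def longest_gap(lines):
    """Return (gap_seconds, line_before, line_after), or None if fewer than two stamps."""
    stamped = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a leading timestamp
        stamped.append((stamp, line))

    best = None
    for (t1, before), (t2, after) in zip(stamped, stamped[1:]):
        gap = (t2 - t1).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Abbreviated lines with timestamps taken from the excerpt in the first post.
sample = [
    "01/27/2013 04:48:56 - Info dbclient (pid=4416) INF - 70 percent processed",
    "01/27/2013 04:49:02 - Info dbclient (pid=4416) INF - Thread has been closed for stripe #0",
    "01/27/2013 04:49:04 - Info dbclient (pid=4416) INF - OPERATION #11 SUCCEEDED",
    "01/27/2013 04:49:44 - Info dbclient (pid=4416) INF - Results of executing",
]

gap = longest_gap(sample)
print(gap[0])
```

Run against the full detailed status of a job that ended with 23, the longest gap can then be compared with whatever idle timeout the firewall team reports for the path between media server and master.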
04-02-2013 08:31 PM
Have you checked the backed-up databases in the restore window?
Also, let us know whether you have tried restoring them. Please provide the parent node's detailed status along with the required logs (updated by Marianne).
04-03-2013 04:01 AM