NetBackup SQL job failures - network connection timed out 41
Hi All,

This is not a request for help but a note on the solution to a rather pesky problem: NetBackup SQL jobs failing with the status message "network connection timed out (41)". It took me some time to find the root cause, and in the end it had nothing to do with the usual suspects such as DNS/hosts resolution, network bottlenecks or similar issues. I checked literally every aspect of every component involved (NBU Master, Media, Client, SQL DB, networking, etc.), but nothing useful appeared in any of the logs.

The most misleading part was tracing error messages such as:

ERR - Error in VxBSACreateObject: 3.
CONTINUATION: - System detected error, operation aborted.
ERR - Error in GetCommand: 0x80770004.
DBMS MSG - ODBC return code <-1>, SQL State <37000>, SQL Message <3202><[Microsoft][SQL Server Native Client 11.0][SQL Server]Write on "VNBU0-9280-5168-1609373275" failed: 995(The I/O operation has been aborted because of either a thread exit or an application request.)>.
CONTINUATION: - An abort request is preventing anything except termination actions.
INFO Server Status: Communication with the server has not been initiated or the server status has not been retrieved from the server.
INFO Error in VxBSACreateObject: 3.
INFO System detected error, operation aborted.
INFO Error in GetCommand: 0x80770004.
INFO An abort request is preventing anything except termination actions.
INFO ODBC return code <-1>, SQL State <37000>, SQL Message <3202><[Microsoft][SQL Server Native Client 11.0][SQL Server]Write on "VNBU0-8380-13048-1609410056" failed: 995(The I/O operation has been aborted because of either a thread exit or an application request.)>.

...and also SQL errors like:

2020-12-30 23:52:19.11 Backup BackupIoRequest::ReportIoError: write failure on backup device 'VNBU0-9808-5248-1609367774'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
2020-12-30 23:52:19.11 Backup Error: 3041, Severity: 16, State: 1.
2020-12-30 23:52:19.11 Backup BACKUP failed to complete the command BACKUP DATABASE DB01. Check the backup application log for detailed messages.
2020-12-30 23:52:19.11 spid56 Error: 18210, Severity: 16, State: 1.
2020-12-30 23:52:19.11 spid56 BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'VNBU0-9808-5248-1609367774'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
2020-12-31 00:00:19.10 spid31s This instance of SQL Server has been using a process ID of 8844 since 12/30/2020 10:23:10 PM (local) 12/30/2020 9:23:10 PM (UTC). This is an informational message only; no user action is required.
2020-12-31 00:10:03.71 Backup Error: 3041, Severity: 16, State: 1.

Unfortunately neither the "dbclient" nor the "bprd" log gave any useful indication of what was going on, while at the same time taking SQL backups natively via SQL Management Studio worked just fine.

So, long story short... If you encounter such messages and you are running SQL instances in a VMware virtualized environment, make sure to also check your "VSS application quiescing" settings. If it is set to "disabled", that could very well be the source of the issue, as it was in my case.

Under normal circumstances quiescing shouldn't be disabled at all. However, over the years I've seen many discussions where users suggest doing so to work around a long-standing problem with failing generic VM backups, for which the vendors point fingers at each other while no effective long-term solution has been made available yet. The process of disabling VSS application-quiesced snapshots is outlined at the link below; in my case it was a legacy VM on which the "tools.conf" option had been applied.
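For anyone who wants to check a guest quickly, here is a minimal Python sketch that scans the text of a "tools.conf" file for the "vss.disableAppQuiescing" line this post is about. It makes a few assumptions: that the file is a plain key = value text file with "#" comment lines, that the sample "[vmbackup]" section header matches your file, and the helper name app_quiescing_disabled is purely illustrative. Treat it as a sketch, not a vendor tool.

```python
def app_quiescing_disabled(conf_text: str) -> bool:
    """Return True if the text contains an active 'vss.disableAppQuiescing = true' line.

    ASSUMPTION: tools.conf is a simple 'key = value' text file in which
    lines starting with '#' are comments.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and section headers
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "vss.disableappquiescing":
            return value.lower() == "true"
    return False


# Illustrative file content; the section header is an assumption.
SAMPLE = """
[vmbackup]
vss.disableAppQuiescing = true
"""

print(app_quiescing_disabled(SAMPLE))
```

To run it against a real guest, read the file yourself and pass its text in, for example app_quiescing_disabled(open(path).read()), where path is wherever "tools.conf" lives on your VM.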
https://kb.vmware.com/s/article/2146204

After removing the "vss.disableAppQuiescing = true" line from the "tools.conf" file, the backups started working immediately. I didn't test whether disabling the disk UUID would produce the same result, but if I get a spare moment I may check it and report back here.

Perhaps this whole scenario should be tested by NetBackup engineers, and if they can confirm the behavior, a side note on the page below would be appropriate:

https://www.veritas.com/content/support/en_US/doc/44037985-142651971-0/v15097395-142651971

Cheers

Veritas takes virtual to the next level with NetBackup 8.3
NetBackup has taken virtual data protection to the next level with the recent release of NetBackup 8.3, designed to optimize security, recover at speed, unify cloud capabilities, simplify even the most dynamic environments, and supercharge performance even in large environments.

NetBackup 8.3 - Exchange Policy Bugs
Hi All,

I just upgraded my NetBackup environment from 8.2 to 8.3 and noticed the following bugs with Exchange policies.

1. When I try to create a new policy, I can't finish the procedure because of the following error:

Please select a valid snapshot method using 'Snapshot Client Options'.

When I hit the corresponding "Options" button on the policy screen, nothing happens, and I never get the usual screen where the VSS method is selected by default along with its parameters. The only way to save the policy is to untick the "Perform snapshot backups" checkbox, which at least lets you hit the "OK" button at the bottom. When I reopen the same policy, the "Perform snapshot backups" checkbox is ticked once again, but there is still no way to modify anything within the "Options" section.

2. When I run the newly created policy, I receive the following error message:

This type of backup is not supported on this version of Exchange
the client type is incorrect in the configuration database (72)

To the best of my knowledge both Windows Server 2019 and Exchange 2019 CU6 are fully supported, and my policy is set up according to the rules in the NBU documentation. So, all in all, right now it is impossible to back up Exchange with the new NBU 8.3 version.

Any suggestions you may have are highly welcome. For the record, I don't have an active support contract with Veritas, so I can't open a support ticket with the vendor.

NetBackup 8.3 - Active Directory GRT is Broken
Hi All,

Just found another bug in NetBackup 8.3, this time related to AD GRT jobs. Apparently the image duplication phase, where the actual mapping between the backup image and AD should happen, is failing miserably. I noticed a couple of things while observing the entire process. For the moment I'm not sure whether the issue lies only with the NBU client, but I will run a few more tests, including a client downgrade, to narrow down the possibilities.

1. When the initial "System State" backup is taken from a Domain Controller, the jobs sometimes fail with the following message:

error encountered while attempting to get additional files for system state:\

There is also an indication that the initial GRT information assessment was unsuccessful. Unfortunately this behavior isn't consistent across jobs, so the failure appears to be random. It has happened on 4 different DCs, some of which are RODCs. Once I managed to get a healthy image from the DCs, I moved on to the duplication part.

2. During the duplication phase I noticed that the "nblbc.exe" binary constantly consumes an entire CPU core, but it is not yet clear what it is doing in the background. The binary runs like that for about 20-30 minutes non-stop and then terminates with the following messages in the "ncflbc" log folder:

0,51216,158,351,52,1596142489535,6192,1152,0:,53:SpsRestoreCurrentVer:False (../ResourceChild.cpp:719),28:ResourceChildBEDS::_attach(),1
0,51216,159,351,1,1596144159553,6192,1152,0:,89:write() failed, wrote 0 of 276, error 232 (Unknown error) (../TransporterConsole.cpp:219),27:TransporterConsole::write(),1
0,51216,159,351,2,1596144159568,6192,1152,0:,256: An Exception of type [FileWriteException] has occured at: Module: ../TransporterConsole.cpp, Function: TransporterConsole::write(), Line: 225 File:*Console* OS Error: 22 (The device does not recognize the command. ) (../TransporterConsole.cpp:225),27:TransporterConsole::write(),1
0,51216,311,351,1,1596144159568,6192,1152,0:,69:exception in m_transport->write() (../BRMObserverDepreciated.cpp:407),29:BRMObserverDepreciated::write,1
0,51216,159,351,3,1596144159568,6192,1152,0:,50:Terminate Signal called . (../TfiExitEvent.cpp:43),22:TfiExitEvent::signal(),1
2,51216,309,351,128,1596144159568,6192,1152,0:,0:,0:,2,(37|)
2,51216,309,351,129,1596144159615,6192,1152,0:,0:,0:,2,(37|)
1,51216,309,351,130,1596144159615,6192,1152,0:,0:,38:ConfigDataIterator::metadataIdCallback,3,(35|A21:3:Error_Messages_Max;|A2:10|)
1,51216,309,351,131,1596144159615,6192,1152,0:,0:,38:ConfigDataIterator::metadataIdCallback,3,(35|A25:3:CLIENT_CONNECT_TIMEOUT;|A3:300|)

After about another 15 minutes the duplication job times out in the NBU console with status code 191 and the following errors:

Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) db_IMAGE() failed: database system error (220)
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) Status = no images were successfully processed.
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) Duplicate of backupid sns01dcvm04.skynet.local_1596138967 failed, database system error (220).
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) Status = no images were successfully processed.
Jul 30, 2020 11:37:44 PM - end Duplicate; elapsed time 0:43:17

3. When I try to view the content of the images through the "BAR" client, I have partial success. Although I'm able to open the "System State" and "Active Directory" folders, any attempt to go further down through the OUs leaves the client stuck on "Communicating with server" messages. The folders I managed to open showed up as corresponding restore jobs in the NBU console, but no further jobs were generated. Once the timeout was reached, approximately 5 minutes later, another "ERROR: file read failed" message appeared on the screen.
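A side note on reading the raw "ncflbc" lines quoted in item 2: the sixth comma-separated field looks like a Unix timestamp in milliseconds. That is an assumption based on the values themselves, not something confirmed in this thread. Under that assumption, a minimal Python sketch can convert the entries to UTC and measure the gap between the first entry and the write() failure. The function name entry_time is hypothetical, and the two sample lines are truncated copies of the ones above.

```python
from datetime import datetime, timedelta, timezone

# Two of the raw log lines quoted above, truncated after the start of the message.
FIRST = "0,51216,158,351,52,1596142489535,6192,1152,0:,53:SpsRestoreCurrentVer:False"
LAST = "0,51216,159,351,1,1596144159553,6192,1152,0:,89:write() failed, wrote 0 of 276"


def entry_time(raw_line: str) -> datetime:
    """Return the entry time in UTC, ASSUMING field 6 holds epoch milliseconds."""
    millis = int(raw_line.split(",", 6)[5])
    # Split seconds and milliseconds to avoid float rounding.
    seconds, remainder = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=remainder)


start, end = entry_time(FIRST), entry_time(LAST)
print(start.isoformat(timespec="milliseconds"))
print(end.isoformat(timespec="milliseconds"))
print("gap:", end - start)
```

If the assumption holds, the two entries fall at 20:54:49 and 21:22:39 UTC on 2020-07-30, a gap of just under 28 minutes, which lines up with the 20-30 minutes of "nblbc.exe" CPU load described above.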
I haven't yet turned on debugging for the "bpbkar" or "ncflbc" logs, but I will collect those if NBU client 8.2 doesn't solve these issues. I also noticed that it doesn't matter which types of storage pools the duplication runs between: I tried Adv. Disk to Adv. Disk and also Adv. Disk to MSDP, and the result is the same. It seems the process is stuck somewhere on the client side. My GRT AD implementation worked perfectly with NBU 8.2, and apart from the recent upgrade to NBU 8.3 nothing else has changed in the environment.

Any suggestions on the debug logs or other aspects are highly welcome.

Cheers.