NetBackup 8.3 - Active Directory GRT is Broken
Hi All,
Just found another bug in NetBackup 8.3 related to AD GRT jobs. Apparently the image duplication phase, where the real mapping between the backup image and AD should happen, is failing miserably. I noticed a couple of things while observing the entire process. For the moment I'm not sure whether the issue is with the NBU client only, but I will run a few more tests, including a client downgrade, to rule out possibilities.
1. When the initial "System State" backup is being taken from a Domain Controller, the jobs sometimes fail with the following message:
error encountered while attempting to get additional files for system state:\
Obviously there is also an indication that the initial GRT information assessment was unsuccessful. Unfortunately this behavior isn't consistent across jobs, so the failure seems to appear randomly. This has happened on 4 different DCs, some of which are RODCs. Once I managed to get a healthy image from the DCs I moved on to the duplication part.
2. During the duplication phase I noticed that the "nblbc.exe" binary constantly consumes an entire CPU core, but it is not yet clear what it is doing in the background. The binary runs like that for about 20-30 minutes non-stop, then terminates with the following messages in the "ncflbc" log folder:
0,51216,158,351,52,1596142489535,6192,1152,0:,53:SpsRestoreCurrentVer:False (../ResourceChild.cpp:719),28:ResourceChildBEDS::_attach(),1
0,51216,159,351,1,1596144159553,6192,1152,0:,89:write() failed, wrote 0 of 276, error 232 (Unknown error) (../TransporterConsole.cpp:219),27:TransporterConsole::write(),1
0,51216,159,351,2,1596144159568,6192,1152,0:,256:
An Exception of type [FileWriteException] has occured at:
Module: ../TransporterConsole.cpp, Function: TransporterConsole::write(), Line: 225
File:*Console*
OS Error: 22 (The device does not recognize the command.
)
(../TransporterConsole.cpp:225),27:TransporterConsole::write(),1
0,51216,311,351,1,1596144159568,6192,1152,0:,69:exception in m_transport->write() (../BRMObserverDepreciated.cpp:407),29:BRMObserverDepreciated::write,1
0,51216,159,351,3,1596144159568,6192,1152,0:,50:Terminate Signal called . (../TfiExitEvent.cpp:43),22:TfiExitEvent::signal(),1
2,51216,309,351,128,1596144159568,6192,1152,0:,0:,0:,2,(37|)
2,51216,309,351,129,1596144159615,6192,1152,0:,0:,0:,2,(37|)
1,51216,309,351,130,1596144159615,6192,1152,0:,0:,38:ConfigDataIterator::metadataIdCallback,3,(35|A21:3:Error_Messages_Max;|A2:10|)
1,51216,309,351,131,1596144159615,6192,1152,0:,0:,38:ConfigDataIterator::metadataIdCallback,3,(35|A25:3:CLIENT_CONNECT_TIMEOUT;|A3:300|)
After about another 15 minutes the duplication job times out in NBU console with status code 191 and the following error:
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) db_IMAGE() failed: database system error (220)
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) Status = no images were successfully processed.
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) Duplicate of backupid sns01dcvm04.skynet.local_1596138967 failed, database system error (220).
Jul 30, 2020 11:37:44 PM - Error bpduplicate (pid=11388) Status = no images were successfully processed.
Jul 30, 2020 11:37:44 PM - end Duplicate; elapsed time 0:43:17
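As a side note on the timing in item 2, the gap between the first two nblbc log lines can be worked out from the raw entries. This is only a rough sketch: it assumes that the sixth comma-separated field of each line is a Unix timestamp in milliseconds, which the log excerpt itself doesn't state, and the helper name timestamp_ms is made up for illustration. Under that assumption the process logged nothing for a bit under 28 minutes before the write() failure, which lines up with the 20-30 minutes of CPU activity.

```python
from datetime import datetime, timezone

# Two lines copied verbatim from the ncflbc excerpt in item 2.
LINE_ATTACH = (
    "0,51216,158,351,52,1596142489535,6192,1152,0:,"
    "53:SpsRestoreCurrentVer:False (../ResourceChild.cpp:719),"
    "28:ResourceChildBEDS::_attach(),1"
)
LINE_WRITE_FAILED = (
    "0,51216,159,351,1,1596144159553,6192,1152,0:,"
    "89:write() failed, wrote 0 of 276, error 232 (Unknown error) "
    "(../TransporterConsole.cpp:219),27:TransporterConsole::write(),1"
)


def timestamp_ms(line: str) -> int:
    """Return the sixth comma-separated field as an integer.

    ASSUMPTION: that field is a Unix timestamp in milliseconds.
    Only the first six commas are split on, so commas inside the
    message text do not matter.
    """
    fields = line.split(",", 6)
    return int(fields[5])


def as_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


gap_ms = timestamp_ms(LINE_WRITE_FAILED) - timestamp_ms(LINE_ATTACH)
gap_minutes = gap_ms / 60_000

print("attach logged at      :", as_utc(timestamp_ms(LINE_ATTACH)).isoformat())
print("write() failure at    :", as_utc(timestamp_ms(LINE_WRITE_FAILED)).isoformat())
print(f"silent gap in between : {gap_minutes:.1f} minutes")
```

The timestamps are printed in UTC, so they will be offset from the local times shown in the job details.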
3. When I try to view the content of the images through the "BAR" client, I have partial success. Although I'm able to open the "System State" and "Active Directory" folders, any attempt to browse further into the OUs makes the client hang with "Communicating with server" messages. The folders that I managed to open showed up as respective restore jobs in the NBU console, but no further jobs were generated. Once the timeout was reached approximately 5 minutes later, another "ERROR: file read failed" message appeared on the screen.
I haven't yet turned on debugging for the "bpbkar" or "ncflbc" logs, but I will collect those in case NBU client 8.2 doesn't solve these issues.
Also I noticed that it doesn't matter which types of storage pools I run the duplication between. I tried Adv. Disk to Adv. Disk and also Adv. Disk to MSDP, but the result is the same. It seems the process is stuck somewhere on the client side.
My GRT AD implementation used to work perfectly with NBU 8.2 and except for the recent upgrade to NBU 8.3 nothing else has changed in the environment.
Any suggestions on the debug logs or other aspects are very welcome.
Cheers.