Solved: Duplications randomly appear to hang

Maj_Hazard1 · ‎03-11-2022

I don't know whether this is a NetBackup issue or more of a 5150 Flex issue. I'll post here first and redirect if necessary.

Three weeks ago I upgraded a regional office from its own Master server to a 5150 Flex appliance configured as a media server and connected to our Master. All my full backups in that regional office duplicate back to another media server's MSDP for disaster recovery protection via SLPs.

The problem I am running into is that random duplication jobs appear to hang with no progress. I've left some of these jobs running for as long as 5 days with neither a completion nor an error message. The problem is not specific to any one client; it's random. The target MSDP does have full completed images from previous duplications, so the deduplication percentage should be above 95%.

5150 Flex 2.1 and NetBackup 8.2 on all Media servers as well as the Master.

The target MSDP is hosted on a Windows 2012 R2 server. The Master server runs the same OS.

Patches and EEBs are current.

I did open a ticket with Veritas 2 weeks ago which has been escalated but I'm not getting much back from them yet.

I have two other 5150s in different offices duplicating to the same MSDP target without issue.

Possible incorrect networking and firewall configurations have been ruled out.

I'll post the details from the Activity Monitor for a job that has been stuck for almost 24 hours. It's just over 1.7 TB but the same client has previously duplicated in under 60 mins.

Host file entries and DNS have been confirmed as correct.

I can't wait to hear what I've done wrong or missed as this must be a configuration issue.. right ??

Any suggestions or logs that I should be looking at would be appreciated.

10-Mar-2022 12:31:27 PM - begin Duplicate
10-Mar-2022 12:31:27 PM - requesting resource  LCM_364
10-Mar-2022 12:31:27 PM - granted resource  LCM_364
10-Mar-2022 12:31:27 PM - started process RUNCMD (pid=13452)
10-Mar-2022 12:31:27 PM - ended process 0 (pid=13452)
10-Mar-2022 12:31:27 PM - requesting resource  364
10-Mar-2022 12:31:27 PM - reserving resource @aaaaW
10-Mar-2022 12:31:27 PM - resource @aaaaW reserved
10-Mar-2022 12:31:27 PM - granted resource  MediaID=@aaaaM;DiskVolume=PureDiskVolume;DiskPool=364;Path=PureDiskVolume;StorageServer=XXXXX364.intra.pri;MediaServer=XXXXX0364.intra.pri
10-Mar-2022 12:31:27 PM - granted resource  364
10-Mar-2022 12:31:29 PM - requesting resource  @aaaaW
10-Mar-2022 12:31:29 PM - granted resource  MediaID=@aaaaW;DiskVolume=PureDiskVolume;DiskPool=5150_Edmonton;Path=PureDiskVolume;StorageServer=XXXXXV002a.intra.pri;MediaServer=XXXXX0364.intra.pri
10-Mar-2022 12:31:30 PM - Info Duplicate (pid=13452) Initiating optimized duplication from @aaaaW to @aaaaM
10-Mar-2022 12:31:31 PM - Info bpduplicate (pid=13452) Suspend window close behavior is not supported for optimized duplications
10-Mar-2022 12:31:31 PM - Info bpduplicate (pid=13452) window close behavior: Continue processing the current image
10-Mar-2022 12:31:31 PM - Info bpdm (pid=2368) started
10-Mar-2022 12:31:31 PM - started process bpdm (pid=2368)
10-Mar-2022 12:31:32 PM - Info bpdm (pid=2368) requesting nbjm for media
10-Mar-2022 12:31:38 PM - begin writing
10-Mar-2022 12:31:38 PM - end writing; write time: 0:00:00
10-Mar-2022 12:31:39 PM - begin writing
10-Mar-2022 12:31:39 PM - end writing; write time: 0:00:00
10-Mar-2022 12:31:40 PM - begin writing
10-Mar-2022 12:31:40 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:03 PM - begin writing
10-Mar-2022 12:32:03 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:04 PM - begin writing
10-Mar-2022 12:32:04 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:05 PM - begin writing
10-Mar-2022 12:32:05 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:06 PM - begin writing
10-Mar-2022 12:32:06 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:07 PM - begin writing
10-Mar-2022 12:32:07 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:08 PM - begin writing
10-Mar-2022 12:32:08 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:09 PM - begin writing
10-Mar-2022 12:32:09 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:10 PM - begin writing
10-Mar-2022 12:32:10 PM - end writing; write time: 0:00:00
10-Mar-2022 12:32:10 PM - begin writing
10-Mar-2022 12:32:16 PM - Info bpdm (pid=2368) Waiting for storage server XXXXXV002a.intra.pri, disk volume PureDiskVolume
10-Mar-2022 12:33:20 PM - Info bpdm (pid=2368) Transferred 52665758232 bytes, to storage server XXXXXV002a.intra.pri, disk volume PureDiskVolume
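For anyone comparing several stuck jobs, here is a minimal Python sketch of how job details like the ones above could be checked for a write that began but never ended. It is an illustration only, not a NetBackup tool: the function names and the `now` value are made up for this example, and it assumes the timestamp format shown in this post and an English locale for the month and AM/PM parts.

```python
from datetime import datetime

# Timestamp layout used in the job details above, e.g. 10-Mar-2022 12:31:27 PM
STAMP_FORMAT = "%d-%b-%Y %I:%M:%S %p"


def parse_line(line):
    """Split one job detail line into (timestamp, message)."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp.strip(), STAMP_FORMAT), message.strip()


def summarize(lines, now):
    """Count writes that began but never ended, and measure the silence since the last entry."""
    open_writes = 0
    last_seen = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in a pasted log
        last_seen, message = parse_line(line)
        if message == "begin writing":
            open_writes += 1
        elif message.startswith("end writing"):
            open_writes -= 1
    return {
        "open_writes": open_writes,
        "last_entry": last_seen,
        "silent_for": now - last_seen,
    }


# Last few lines of the job details above, copied as posted.
sample = [
    "10-Mar-2022 12:32:10 PM - begin writing",
    "10-Mar-2022 12:32:10 PM - end writing; write time: 0:00:00",
    "10-Mar-2022 12:32:10 PM - begin writing",
    "10-Mar-2022 12:33:20 PM - Info bpdm (pid=2368) Transferred 52665758232 bytes, "
    "to storage server XXXXXV002a.intra.pri, disk volume PureDiskVolume",
]

# "now" is a hypothetical check time roughly a day after the job started.
report = summarize(sample, now=datetime(2022, 3, 11, 12, 0, 0))
print(report["open_writes"], report["silent_for"])
```

In this excerpt the last "begin writing" has no matching "end writing", and nothing is logged after the 52665758232-byte transfer message, which is what "hung with no error" looks like in the job details.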

Maj_Hazard1 · ‎03-11-2022

One thing I missed: I also see the same issue where a duplication hits 100% but never finishes.

Thanks again for your time and attention.

Nicolai · ‎03-14-2022

Hi @Maj_Hazard1

If you have a firewall in your setup, set TCPKEEP_ALIVE on the master and media servers on both sides of the firewall. A firewall closes idle connections after X amount of time. This caused issues for me with database backups until I set TCPKEEP_ALIVE. I would even say it should be part of setting up a base NetBackup install.

I am not sure your problem is related, though it has the smell of it. Either way, the change is easy to test and implement and requires no reboot.

https://www.veritas.com/support/en_US/article.100028680

Just be sure to make the settings permanent; some of the tech note instructions only apply until the next boot.
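As a rough illustration of the reasoning behind this suggestion, here is a small Python sketch that compares a Linux host's TCP keepalive timer with a firewall idle timeout. The `/proc/sys/net/ipv4` files are the standard Linux kernel locations for these values; the function names and the 3600-second firewall timeout are assumptions made up for the example, and no values are taken from the tech note linked above. It does not cover the Windows side.

```python
from pathlib import Path

# Standard Linux location of the kernel TCP keepalive sysctls.
PROC_DIR = Path("/proc/sys/net/ipv4")


def read_keepalive(proc_dir=PROC_DIR):
    """Read the kernel keepalive values (time and interval in seconds, probes as a count)."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {name: int((proc_dir / name).read_text()) for name in names}


def keepalive_beats_firewall(settings, firewall_idle_timeout):
    """True if the first keepalive probe is sent before the firewall's idle timer expires."""
    return settings["tcp_keepalive_time"] < firewall_idle_timeout


# Stock Linux kernel defaults: first probe after 2 hours of idle time.
linux_defaults = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}

# 3600 seconds is a hypothetical firewall idle timeout; ask the network team for the real one.
print(keepalive_beats_firewall(linux_defaults, firewall_idle_timeout=3600))
```

On a Linux media server, `read_keepalive()` returns the live values to pass in instead of `linux_defaults`. If the result is False, an idle connection can be dropped by the firewall before the host ever sends a keepalive probe.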

Maj_Hazard1 · ‎03-14-2022

Thanks so much for responding.

It's a head scratcher for sure..

I'll try those settings and see what happens. I'll just need to find out how to get that setting onto the Flex 5150. Linux would be much easier if it were Windows... I'm an old dog dealing with the complexity of new tricks.

I'll let you know if anything exciting happens.

Cheers

Maj_Hazard1 · ‎05-18-2022

Finally resolved the issue.

After checking with the networking team for the 15th time, we found that the backup traffic was being routed through a Riverbed appliance (between the source and target MSDP pools) even though there was a "valid" exclusion rule in place. We reapplied the exclusion to bypass the Riverbed and the issue disappeared.

Cheers

Mike
