Delays in VMware Backup jobs finishing after Upgrading to NB 7.7.1

Heller42
Level 3

We are seeing delays in VMware backup jobs finishing after upgrading to NetBackup 7.7.1. We were on version 7.6.0.2 prior to the upgrade. Our master server is Solaris 10. We have three MSDP servers running Linux 6.7 and two VMware backup host servers: one running Windows Server 2012 and the other Linux 6.7. The VMware backups write to the MSDP servers, and any one of the three MSDP servers can write to the others' disks. The delays are causing other backups to fail, since they aren't able to run within their backup windows. Below are the job details from one of the slow jobs.

12/03/2015 08:39:37 - Info nbjm (pid=12665) starting backup job (jobid=4065919) for client sql53092vm, policy Omaha_VMware_SQL, schedule Incremental
12/03/2015 08:39:38 - estimated 38973558 kbytes needed
12/03/2015 08:39:38 - Info nbjm (pid=12665) started backup (backupid=sql53092vm_1449153577) job for client sql53092vm, policy Omaha_VMware_SQL, schedule Incremental on storage unit XP53TAPE008 using backup host 53tape001.botw.ad.bankofthewest.com
12/03/2015 08:39:38 - started process bpbrm (pid=40696)
12/03/2015 08:39:40 - Info bpbrm (pid=40696) sql53092vm is the host to backup data from
12/03/2015 08:39:40 - Info bpbrm (pid=40696) reading file list for client
12/03/2015 08:39:40 - Info bpbrm (pid=40696) accelerator enabled
12/03/2015 08:39:49 - connecting
12/03/2015 08:39:49 - connected; connect time: 0:00:00
12/03/2015 08:39:49 - Info bpbrm (pid=40696) starting bpbkar on client
12/03/2015 08:39:50 - Info bptm (pid=40986) start
12/03/2015 08:39:50 - Info bpbkar (pid=2844) Backup started
12/03/2015 08:39:50 - Info bpbrm (pid=40696) bptm pid: 40986
12/03/2015 08:39:50 - Info bpbkar (pid=2844) accelerator enabled backup, archive bit processing:<disabled>
12/03/2015 08:39:50 - Info bpbkar (pid=2844) INF - Backing up vCenter server 53apps178vm, ESX host v53host030.bankofthewest.com, BIOS UUID 42045880-2f95-1fee-fa56-4f3c65d4164a, Instance UUID 50049e93-61b0-0ab2-3f5b-b141cb28f6a1, Display Name sql53092vm, Hostname SQL53092VM.botw.ad.bankofthewest.com
12/03/2015 08:39:51 - Info bptm (pid=40986) using 262144 data buffer size
12/03/2015 08:39:51 - Info bptm (pid=40986) using 30 data buffers
12/03/2015 08:39:55 - Info bptm (pid=40986) start backup
12/03/2015 08:40:25 - Info bptm (pid=40986) backup child process is pid 41677
12/03/2015 08:40:25 - begin writing
12/03/2015 08:40:34 - Info bpbkar (pid=2844) INF - Transport Type =  san
12/03/2015 08:47:19 - Info bptm (pid=40986) waited for full buffer 2 times, delayed 148 times
12/03/2015 08:47:21 - Info bpbkar (pid=2844) accelerator sent 717710848 bytes out of 33181077504 bytes to server, optimization 97.8%
12/03/2015 10:10:23 - Info bptm (pid=40986) EXITING with status 0 <----------

12/03/2015 10:10:23 - Info xp53tape009 (pid=40986) StorageServer=PureDisk:xp53tape008; Report=PDDO Stats for (xp53tape008): scanned: 32406122 KB, CR sent: 142286 KB, CR sent over FC: 0 KB, dedup: 99.6%, cache disabled
12/03/2015 10:10:29 - Info bpbrm (pid=40696) validating image for client sql53092vm
12/03/2015 10:10:30 - Info bpbkar (pid=2844) done. status: 0: the requested operation was successfully completed
12/03/2015 10:10:30 - end writing; write time: 1:30:05
the requested operation was successfully completed  (0)
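
For context, the "data buffer size" and "data buffers" values bptm reports above come from the standard tuning touch files on the media server. A minimal sketch, assuming the values were set there rather than left at defaults (the _DISK variants, if present, take precedence for disk storage units):

    # each touch file contains a single number
    /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS      262144
    /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS    30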

We are seeing the errors below in the bptm logs.

09:09:55.303 [26144] <16> 4065881:bptm:26144:xp53tape010: [ERROR] PDSTS: fbu_align_extents: (6130) stream_offset(48750986240) != extent.stream_offset=(48750957568)
09:09:56.701 [26144] <16> 4065881:bptm:26144:xp53tape010: [ERROR] PDSTS: fbu_align_extents: (6134) stream_offset(48755704832) != extent.stream_offset=(48755033088)
09:09:56.701 [26144] <16> 4065881:bptm:26144:xp53tape010: [ERROR] PDSTS: fbu_align_extents: (6134) stream_offset(48755712512) != extent.stream_offset=(48755033088)
09:09:56.717 [26144] <16> 4065881:bptm:26144:xp53tape010: [ERROR] PDSTS: fbu_align_extents: (6134) stream_offset(48756687872) != extent.stream_offset=(48755033088)

Has anyone experienced the above issue?

 

8 REPLIES

sdo
Moderator
Partner    VIP    Certified
What version of ESXi are you running?

Marianne
Level 6
Partner    VIP    Accredited Certified
"Any one of the three MSDP server can write to each others disk. " Never heard or seen this. MSDP pool/disk is dedicated to a single media server.

cradlemtn
Level 3

We have been on NBU 7.7.1 for 10 days now and it is running stable. Checking a bunch of VMware jobs shows that EXITING and the job end are almost in the same second, so I cannot confirm this in our environment.

Is the time gap something related to your client read timeout settings or other timeout values?

If so, it could be a hint of communication problems between the servers.

Heller42
Level 3

I may not have explained the configuration correctly. We have a storage unit set up for each disk pool. Within each storage unit's configuration, the option "Use only the following media servers" is set, and all three MSDP servers are checked, so any one of the MSDP servers can function as a media server that writes to the storage unit.
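
For reference, the same settings can be checked from the master server with bpstulist; the storage unit label below is illustrative, not our real one:

    /usr/openv/netbackup/bin/admincmd/bpstulist -label stu_disk_msdp01 -U

The -U listing should show the disk pool and which media servers the storage unit is allowed to use.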

Heller42
Level 3

We noticed changes to contentrouter.cfg after the upgrade. MaxFileSize is now 64MiB; it was set to 256MiB prior to the upgrade. The notes in the config file state that the value shouldn't be decreased.
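
For reference, the relevant stanza in our contentrouter.cfg (standard MSDP location; the section name is from memory, so verify it against your own file) looks roughly like this after the upgrade:

    # <MSDP_storage_path>/etc/puredisk/contentrouter.cfg
    [CRDataStore]
    MaxFileSize=64MiB    # was 256MiB before the upgrade; the file's comments say not to decrease it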

Marianne
Level 6
Partner    VIP    Accredited Certified
It is impossible for multiple media servers to write to each other's disks. They are possibly sharing fingerprint load, nothing else. About the errors seen in the bptm log: please log a Support call with Symantec. They will need level 5 logs. In the meantime, please disable shared fingerprinting, let each media server have access only to its own STU, and see if that makes a difference.

nbutech
Level 6
Accredited Certified

So after changing the value, are your backups completing on time?

Heller42
Level 3

I opened a case with Veritas. We started by reducing the storage unit's maximum concurrent jobs: it was set to unlimited and is now set to 80, which reduced the delay. They also had me change LockPoolSize to 2048 in contentrouter.cfg; we didn't modify any other settings in that file. The jobs are now finishing within their backup windows. The case is still open, and they are working with engineering. We may be at the maximum number of VMware accelerator backups that can run to a single MSDP server; it appears to be a limitation on the NetBackup side.
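
For anyone who hits the same problem, a rough summary of the two changes as we applied them (paths and dialog names from memory; double-check in your own environment):

    Storage unit settings (Change Storage Unit dialog):
        Maximum concurrent jobs: 80        # was Unlimited

    <MSDP_storage_path>/etc/puredisk/contentrouter.cfg on each MSDP server:
        LockPoolSize=2048                  # no other values changed

    # restart NetBackup on the media server so contentrouter.cfg is re-read
    /usr/openv/netbackup/bin/bp.kill_all
    /usr/openv/netbackup/bin/bp.start_all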