Re: When Netbackup starts on file server, DFS Repl...

bbot · ‎02-19-2016

When NetBackup starts on file server, DFS Replication soars from 0 to 600,000 items in backlog

- NetBackup Server version of Master/Media Server(s): 7.6.0.1 and 7.6.0.1
- NetBackup Client version on File Server: 7.6.0.1
- OS and patch level (SP/CU) of File Server: Windows Server 2012 R2 Datacenter

Backup set is about 12 TB.

Has anyone experienced this issue?

sdo · ‎02-19-2016

How many items in total in the DFSR store?

bbot · ‎02-19-2016

1.1million items and roughly 5 TB of files in one of the policies

There are a total of 4 policies that run on this server for a total of 12 TB.

nbutech · ‎02-19-2016

Article URL: http://www.veritas.com/docs/000021355

Wiriadi_Wangsa · ‎02-19-2016

Hi Bbot,

You might want to patch the NBU client to 7.6.0.4 as it is the most stable version for DFSR backup (for 7.6.0.x family).

It would be nice if you could patch the master and media servers too, but you don't have to: they can stay at 7.6.0.1. The NetBackup client on the DFSR server, however, really needs to be at version 7.6.0.4.

Here is another reason why you need to patch: https://www.veritas.com/support/en_US/article.TECH210794.

sdo · ‎02-20-2016

Hi bbot, you have highlighted one symptom, but so far we only know this as a cosmetic symptom.

1) When a backup runs, the DFSR backlog soars to 600k items.

.

Some questions:

2) When it soars, does it climb in a linear fashion, or does it simply jump from 0 to 600k?

3) When the backup finishes, does it drop immediately from 600k back to zero (or some other small figure)? Or does it remain at 600k?

4) Are you suggesting that the action of a backup having occurred causes DFSR to have to re-evaluate the entire DFSR file set and re-process/re-check all DFSR items?
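One way to answer question 2 is to record the backlog count at regular intervals while the backup runs (for example from periodic `dfsrdiag backlog` runs) and check whether the rise is spread out or concentrated in one interval. The following is only a rough sketch: the function name `classify_growth`, the `jump_share` threshold of 0.8, and the sample format are assumptions made for illustration, not anything NetBackup or DFSR provides.

```python
def classify_growth(samples, jump_share=0.8):
    """Classify backlog growth from (minutes_elapsed, backlog_count) samples.

    Returns "jump" when a single sampling interval accounts for at least
    jump_share of the total rise, "gradual" when the rise is spread over
    several intervals, and "flat" when the backlog did not grow.
    The 0.8 threshold is an arbitrary, hypothetical default.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    counts = [count for _, count in samples]
    total_rise = counts[-1] - counts[0]
    if total_rise <= 0:
        return "flat"
    # Largest increase between two consecutive samples.
    largest_step = max(b - a for a, b in zip(counts, counts[1:]))
    if largest_step >= jump_share * total_rise:
        return "jump"
    return "gradual"
```

A "jump" result would point towards something re-marking the whole file set in one go, while "gradual" would look more like the backup touching files as it walks through them.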

.

Whenever I am faced with what appears to be strange behaviour in a backup job's interaction with other products, I tend to look in the EEB guide as one of my first steps. Can I suggest that you also peruse the NetBackup v7.6.x.x EEB guide to search for any known issues with NetBackup's interaction with DFSR which have been resolved since your version, v7.6.0.1? The EEB guide is here:

Symantec NetBackup 7.6 Emergency Engineering Binary Guide

http://www.veritas.com/docs/000003552

Marianne · ‎02-20-2016

If we look at the best practice TN for DFSR backups (http://www.veritas.com/docs/000095710), we see the following:

"NetBackup utilizes the Microsoft VSS DFSR Writer to consistently snap the volumes hosting DFSR data. Replication is momentarily paused while the snapshot is being created. The DFS Replication service remains started and is aware of all USN Journal updates. Once the snapshot is complete, replication resumes and the backup of data can commence."

Do you have a bpfis log to see what is happening in the snapshot phase? Is there anything in the Windows Event Viewer logs that looks like a VSS problem?


bbot · ‎02-21-2016

@nbutech Thanks for the links. I am definitely going to look into upgrading the file server to 7.6.0.4.

@sdo

2) When it soars, does it climb in a linear fashion, or does it simply jump from 0 to 600k?

I've only seen this once, when I turned off the backup for about 4 days to allow the DFS replication backlog to go down to 0, then turned it back on. The next morning, when the backup completed, it was at 600,000.

3) When the backup finishes, does it drop immediately from 600k back to zero (or some other small figure)? Or does it remain at 600k?

It does not drop to 0, but it does go to a lower figure. It went down to around 400,000 by the end of the day.

4) Are you suggesting that the action of a backup having occurred causes DFSR to have to re-evaluate the entire DFSR file set and re-process/re-check all DFSR items?

I think it is possible that the backup is causing DFSR to re-process/re-check. 600,000 item changes seems high for one day. We have about 1800 employees that use this file server, so it may also be normal to have that many changes. I may stop the backup and let the DFS replication finish over the weekend, then rerun the backup over the weekend to see what the backlog is. (Our offices are closed on weekends.)

@marianne

Our differential backups take about 8 hours to complete (Monday-Thursday), and the full (run on Friday) takes about 35 hours to complete. If replication is momentarily paused, it could be that this server doesn't have enough time to process both the backups and DFS.

Nothing indicative in event viewer for VSS related problems.

sdo · ‎02-22-2016

To have 400,000 items change, or indeed to see 600,000 items change... makes me wonder if NetBackup and/or some other process/tool/script/task/job is 'touching' or updating the 'dates' of files in some way so as to cause DFSR to believe that the files have been modified.

1) Could you show us the output of:

Windows:  > bpgetconfig -M mydfsrclient.company.com | findstr /i "file_access ctime"

Unix:     # bpgetconfig -M mydfsrclient.company.com | egrep   -i "file_access|ctime"

...and:

2) Are you aware whether there is a site specific process which forcibly resets permissions/inheritance, on a regular basis?

3) Do you have Enterprise Vault scheduled to archive files?

4) Is there another large set of files that is being synchronized into the DFSR shares with robocopy?
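If you would rather script the check in question 1 than run it by hand, the same filter can be sketched in Python. This is a minimal sketch, not a supported tool: it assumes `bpgetconfig` is on the PATH of the machine running it, and the function names `filter_config` and `client_timestamp_settings` are made up for illustration. The pattern mirrors the `findstr /i` and `egrep -i` filters above.

```python
import re
import subprocess

# Same match as the findstr /i and egrep -i filters above:
# case-insensitive "file_access" or "ctime" anywhere in the line.
PATTERN = re.compile(r"file_access|ctime", re.IGNORECASE)


def filter_config(text):
    """Return the lines of bpgetconfig output that mention file_access or ctime."""
    return [line.strip() for line in text.splitlines() if PATTERN.search(line)]


def client_timestamp_settings(client):
    """Run 'bpgetconfig -M <client>' and keep only the matching lines.

    Assumes bpgetconfig is on PATH; raises CalledProcessError if the
    command exits with a non-zero status.
    """
    result = subprocess.run(
        ["bpgetconfig", "-M", client],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_config(result.stdout)
```

Calling `client_timestamp_settings("mydfsrclient.company.com")` should then return the same lines the one-liners print, ready to paste into a reply.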

Will_Restore · ‎02-22-2016

Upgrade to at least 7.6.0.2 (might as well go to 7.6.0.4)

https://www.veritas.com/support/en_US/article.TECH212807

= = = = = = = =

Also verify this setting for each DFSR Client:

Host Properties > Clients > DFSR_CLIENT > Windows Client > Incrementals > Based on timestamp

sdo · ‎02-22-2016

Now I wish that I had also personally checked the EEB guide, and taken my own advice. ;)

Mark_Solutions · ‎02-23-2016

Unless I have missed it, I haven't seen confirmation that you are backing up via the Shadow Copy Components and not via drive letters / paths.

Could you confirm that please? I just wondered, as you said you had more than one policy to back it all up, but maybe you are just using different SCC paths.

If using drive paths, then I assume you would have to use bpstart_notify to shut down DFSR before the backup starts (which would account for the growing backlog), then bpend_notify when the backup finishes to start it up again, leaving it to work through that huge queue.

sdo · ‎02-23-2016

Good point Mark. And, if the bpstart_notify and bpend_notify scripts have not been written very carefully then the config/setup could be taking DFSR offline and online by different streams at different times and so causing all manner of up/down havoc whilst backups are running.

.

Personally I think the OP just needs to change to not update file atime, and use file ctime.

.

@bbot - can you show us each of the four policies (as text file attachments please! please don't paste into the thread ;)

bppllist -L <policy-name>

Marianne · ‎02-23-2016

Maybe have another look at the best practice TN for DFSR backups : http://www.veritas.com/docs/000095710
and let us know which option is being used in your environment?


sdo · ‎02-25-2016

@bbot - what did you do in the end? resolved now?

bbot · ‎02-25-2016

@sdo - I just updated to 7.6.0.4 last night. I'm waiting for our dfs backlog to go down to 0, then will re-run the job to see what happens. I'll update this in 1-2 days. Thanks!

Will_Restore · ‎07-08-2016

(four months later) So how is it going?
