When NetBackup starts on file server, DFS Replication soars from 0 to 600,000 items in backlog

bbot
Level 4

- NetBackup Server version of Master/Media Server(s): 7.6.0.1 and 7.6.0.1
- NetBackup Client version on File Server: 7.6.0.1
- OS and patch level (SP/CU) of File Server: Windows Server 2012 R2 Datacenter

Backup set is about 12 TB.

Has anyone experienced this issue?

16 REPLIES

sdo
Moderator
Partner    VIP    Certified

How many items in total in the DFSR store?

bbot
Level 4

1.1 million items and roughly 5 TB of files in one of the policies.

There are a total of 4 policies that run on this server for a total of 12 TB.

nbutech
Level 6
Accredited Certified

Check the technotes below and decide whether you want to upgrade to 7.6.0.3 or 7.6.0.4:

When using NetBackup to protect Windows 2012 NTFS Data Deduplication volumes, restored DFSR (Distributed File System Replication) data may be corrupt.
Article URL: http://www.veritas.com/docs/000021355

DFSR backups leave a file in temp at the end of every job

 

Wiriadi_Wangsa
Level 4
Employee Accredited

Hi Bbot,

You might want to patch the NBU client to 7.6.0.4, as it is the most stable version for DFSR backups in the 7.6.0.x family.

It would be nice if you could patch the master and media servers too, but you don't have to; they can stay at 7.6.0.1. The NetBackup client on the DFSR server, however, really needs to be at version 7.6.0.4.

Here is another reason why you need to patch: https://www.veritas.com/support/en_US/article.TECH210794

sdo
Moderator
Partner    VIP    Certified

Hi bbot, you have highlighted one symptom, but so far we only know this as a cosmetic symptom.

1) When a backup runs, the DFSR backlog soars to 600k items.

.

Some questions:

2) When it soars, does it climb in a linear fashion, or does it simply jump from 0 to 600k?

3) When the backup finishes, does it drop immediately from 600k back to zero (or some other small figure)?   Or does it remain at 600k?

4) Are you suggesting that the action of a backup having occurred causes DFSR to have to re-evaluate the entire DFSR file set and re-process/re-check all DFSR items?

.

Whenever I am faced with what appears to be strange behaviour in a backup job's interaction with other products, I tend to look in the EEB guide as one of my first steps.   Can I suggest that you also peruse the NetBackup v7.6.x.x EEB guide to search for any known issues with NetBackup's interaction with DFSR which have been resolved since your version of v7.6.0.1?   The EEB guide is here:

Symantec NetBackup 7.6 Emergency Engineering Binary Guide

http://www.veritas.com/docs/000003552

Marianne
Level 6
Partner    VIP    Accredited Certified
If we look at the best practice TN for DFSR backups: http://www.veritas.com/docs/000095710 we see the following:

"NetBackup utilizes the Microsoft VSS DFSR Writer to consistently snap the volumes hosting DFSR data. Replication is momentarily paused while the snapshot is being created. The DFS Replication service remains started and is aware of all USN Journal updates. Once the snapshot is complete, replication resumes and the backup of data can commence."

Do you have a bpfis log to see what is happening in the snapshot phase? Is there anything in the Windows Event Viewer logs that looks like a VSS problem?

bbot
Level 4

@nbutech Thanks for the links. I am definitely going to look into upgrading the file server to 7.6.0.4.

@sdo

2) When it soars, does it climb in a linear fashion, or does it simply jump from 0 to 600k?

I've only seen this once, when I turned off the backup for about 4 days to allow the DFS replication backlog to go down to 0, then turned it back on. The next morning, when the backup completed, the backlog was at 600,000.

3) When the backup finishes, does it drop immediately from 600k back to zero (or some other small figure)?   Or does it remain at 600k?

It does not drop to 0, but it does go to a lower figure. It went down to around 400,000 by the end of the day.

4) Are you suggesting that the action of a backup having occurred causes DFSR to have to re-evaluate the entire DFSR file set and re-process/re-check all DFSR items?

I think it is possible that the backup is causing DFSR to re-process/re-check. 600,000 item changes seems high for one day. We have about 1800 employees that use this file server, so it may also be normal to have that many changes. I may stop the backup, let DFS replication finish, and then rerun the backup over the weekend to see what the backlog is (our offices are closed on weekends).
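One way to answer question 2 is to record the backlog count at intervals while the backup runs and check whether most of the rise happens in a single step. The sketch below is only an illustration: the function names, the sample format, and the 50% threshold are assumptions, and the counts could come from whatever you already use to read the backlog.

```python
from typing import List, Tuple


def classify_backlog_growth(
    samples: List[Tuple[float, int]], jump_fraction: float = 0.5
) -> str:
    """Classify backlog samples as 'flat', 'jump', or 'gradual'.

    samples: (hours_since_first_sample, backlog_count) pairs in time order.
    jump_fraction: assumed threshold; if one step between two consecutive
    samples accounts for at least this share of the total rise, call it a jump.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    counts = [count for _, count in samples]
    total_rise = max(counts) - counts[0]
    if total_rise <= 0:
        return "flat"
    # Largest increase between two consecutive samples.
    largest_step = max(b - a for a, b in zip(counts, counts[1:]))
    return "jump" if largest_step >= jump_fraction * total_rise else "gradual"


def hours_to_drain(start_count: int, end_count: int, hours_elapsed: float) -> float:
    """Estimate hours needed to clear end_count at the observed drain rate."""
    drained = start_count - end_count
    if drained <= 0 or hours_elapsed <= 0:
        raise ValueError("backlog did not shrink over the interval")
    rate_per_hour = drained / hours_elapsed
    return end_count / rate_per_hour
```

As a rough usage example, if the drop from 600,000 to 400,000 took about 8 hours (a guess, since the thread only says "by the end of the day"), hours_to_drain(600_000, 400_000, 8.0) gives 16.0, so roughly 16 more hours to clear the rest if the rate held steady and nothing new was queued.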

 

@marianne

Our differential backups take about 8 hours to complete (Monday-Thursday), and the full (run on Friday) takes about 35 hours to complete. If replication is momentarily paused, it is possible that this server doesn't have enough time to process both the backups and DFS.

Nothing indicative in event viewer for VSS related problems.
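To put those run times in perspective, here is a quick back-of-the-envelope calculation of how much of the week a backup job is active on this server. It only uses the figures quoted above (four 8-hour differentials and one 35-hour full); the function name is made up for illustration. It says nothing about how long replication is actually paused, since per the technote that is only the snapshot phase.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_backup_hours(
    diff_runs: int = 4, diff_hours: float = 8.0, full_hours: float = 35.0
) -> float:
    """Total hours per week during which a backup job is running."""
    return diff_runs * diff_hours + full_hours


busy_hours = weekly_backup_hours()       # 4 * 8 + 35
share = busy_hours / HOURS_PER_WEEK      # fraction of the week
print(f"{busy_hours:.0f} of {HOURS_PER_WEEK} hours ({share:.0%})")
```

So backups are running for about 67 hours out of 168, or roughly 40% of the week.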

sdo
Moderator
Partner    VIP    Certified

To have 400,000 items change, or indeed to see 600,000 items change... makes me wonder if NetBackup and/or some other process/tool/script/task/job is 'touching' or updating the 'dates' of files in some way so as to cause DFSR to believe that the files have been modified.

1) Could you show us the output of:

Windows:  > bpgetconfig -M mydfsrclient.company.com | findstr /i "file_access ctime"

Unix:     # bpgetconfig -M mydfsrclient.company.com | egrep   -i "file_access|ctime"

...and:

2) Are you aware whether there is a site specific process which forcibly resets permissions/inheritance, on a regular basis?

3) Do you have Enterprise Vault scheduled to archive files?

4) Is there another, different large set of files being synchronized into the DFSR shares with robocopy?
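If you would rather script the check in item 1, the following is a minimal sketch that runs the same bpgetconfig command and keeps only the lines mentioning file_access or ctime, which is what the findstr and egrep filters do. It assumes bpgetconfig is on the PATH of the machine you run it from; the function names are invented for this example, and the client name is the placeholder used above.

```python
import re
import subprocess
from typing import List

# Case-insensitive match, mirroring `findstr /i` and `egrep -i`.
PATTERN = re.compile(r"file_access|ctime", re.IGNORECASE)


def filter_config_lines(text: str) -> List[str]:
    """Return only the lines that mention file_access or ctime."""
    return [line for line in text.splitlines() if PATTERN.search(line)]


def client_time_settings(client: str) -> List[str]:
    """Run `bpgetconfig -M <client>` and filter its output.

    Assumes bpgetconfig is on PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["bpgetconfig", "-M", client],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_config_lines(result.stdout)


if __name__ == "__main__":
    for line in client_time_settings("mydfsrclient.company.com"):
        print(line)
```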

Will_Restore
Level 6

Upgrade to at least 7.6.0.2 (might as well go to 7.6.0.4)

https://www.veritas.com/support/en_US/article.TECH212807

= = = = = = = =

Also verify this setting for each DFSR Client:

Host Properties > Clients > DFSR_CLIENT > Windows Client > Incrementals > Based on timestamp

sdo
Moderator
Partner    VIP    Certified

Now I wish that I had also personally checked the EEB guide, and taken my own advice.  ;)

Mark_Solutions
Level 6
Partner Accredited Certified

Unless I have missed it, I haven't seen confirmation that you are backing up via the Shadow Copy Components and not via drive letters / paths.

Could you confirm that please? I just wondered, as you said you had more than one policy to back it all up, but maybe they are just using different SCC paths.

If using drive paths, then I assume you would have to use bpstart_notify to shut down DFSR before the backup starts, which would account for the growing backlog, and then bpend_notify when the backup finishes to start it up again, leaving it to work through that huge queue.

sdo
Moderator
Partner    VIP    Certified

Good point Mark.   And, if the bpstart_notify and bpend_notify scripts have not been written very carefully then the config/setup could be taking DFSR offline and online by different streams at different times and so causing all manner of up/down havoc whilst backups are running.
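To illustrate what "written very carefully" could mean with multiple streams, here is a minimal sketch of the usual approach: count the active streams, stop the service only when the first one starts, and restart it only after the last one finishes. Everything here is hypothetical (the class name, the state file, and the pause/resume callbacks), it is not how NetBackup itself invokes the notify scripts, and it omits the file locking that a real multi-process setup would need.

```python
from pathlib import Path
from typing import Callable


class StreamGate:
    """Pause once when the first stream starts, resume once after the last ends."""

    def __init__(
        self,
        state_file: Path,
        pause: Callable[[], None],
        resume: Callable[[], None],
    ) -> None:
        self._state_file = state_file  # holds the number of active streams
        self._pause = pause            # hypothetical: e.g. stop the service
        self._resume = resume          # hypothetical: e.g. start the service

    def _read(self) -> int:
        if not self._state_file.exists():
            return 0
        text = self._state_file.read_text().strip()
        return int(text) if text else 0

    def _write(self, active: int) -> None:
        self._state_file.write_text(str(active))

    def stream_started(self) -> None:
        """Call from the start-of-backup hook of each stream."""
        active = self._read()
        if active == 0:
            self._pause()  # only the first stream pauses
        self._write(active + 1)

    def stream_finished(self) -> None:
        """Call from the end-of-backup hook of each stream."""
        active = self._read()
        if active == 0:
            return  # nothing was running; do not resume twice
        active -= 1
        self._write(active)
        if active == 0:
            self._resume()  # only the last stream resumes
```

With this kind of counter, overlapping streams that start and finish at different times produce exactly one pause and one resume, rather than the up/down churn described above.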

.

Personally I think the OP just needs to change to not update file atime, and use file ctime.

.

@bbot - can you show us each of the four policies (as text file attachments please!   please don't paste into the thread ;)

bppllist -L <policy-name>

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Maybe have another look at the best practice TN for DFSR backups : http://www.veritas.com/docs/000095710 
and let us know which option is being used in your environment?

sdo
Moderator
Partner    VIP    Certified

@bbot - what did you do in the end?  resolved now?

bbot
Level 4

@sdo - I just updated to 7.6.0.4 last night. I'm waiting for our dfs backlog to go down to 0, then will re-run the job to see what happens. I'll update this in 1-2 days. Thanks!

Will_Restore
Level 6

(four months later) So how is it going?