
Exchange DAG backups long delays

Tomrae
Level 3

Environment:

NetBackup 7.5.0.5

Windows 2008 R2

Three Exchange 2010 servers running in a virtual environment.

Backups start but get stuck creating snapshots.

From the Job Detail log:

7/25/2013 4:30:54 AM - Info nbjm(pid=6724) starting backup job (jobid=1361360) for client , policy Exch2010_MBX23_3, schedule Differential-Inc 
7/25/2013 4:30:54 AM - Info nbjm(pid=6724) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1361360, request id:{7FC16BCA-64E7-43DD-8716-2955E689AEF1}) 
7/25/2013 4:30:54 AM - requesting resource App240_OST
7/25/2013 4:30:54 AM - requesting resource appusnj40.NBU_CLIENT.MAXJOBS.EXCHANGE_CLIENT
7/25/2013 4:30:54 AM - requesting resource appusnj40.NBU_POLICY.MAXJOBS.Exch2010_MBX23_3
7/25/2013 4:30:54 AM - granted resource appusnj40.NBU_CLIENT.MAXJOBS.EXCHANGE_CLIENT
7/25/2013 4:30:54 AM - granted resource appusnj40.NBU_POLICY.MAXJOBS.Exch2010_MBX23_3
7/25/2013 4:30:54 AM - granted resource DISK RESOURCE
7/25/2013 4:30:54 AM - granted resource App240_OST
7/25/2013 4:30:55 AM - estimated 99776702 Kbytes needed
7/25/2013 4:30:55 AM - begin Parent Job
7/25/2013 4:30:55 AM - begin Exchange 14 Snapshot, Step By Condition
Status 0
7/25/2013 4:30:55 AM - end Exchange 14 Snapshot, Step By Condition; elapsed time: 00:00:00
7/25/2013 4:30:55 AM - begin Exchange 14 Snapshot, Read File List
Status 0
7/25/2013 4:30:55 AM - end Exchange 14 Snapshot, Read File List; elapsed time: 00:00:00
7/25/2013 4:30:55 AM - begin Exchange 14 Snapshot, Create Snapshot
7/25/2013 4:30:55 AM - started
7/25/2013 4:30:57 AM - started process bpbrm (7468)
7/25/2013 4:31:04 AM - Info bpbrm(pid=7468) EXCHANGE_CLIENT is the host to backup data from    
7/25/2013 4:31:04 AM - Info bpbrm(pid=7468) reading file list from client       
7/25/2013 4:31:04 AM - Info bpbrm(pid=7468) start bpfis on client        
7/25/2013 4:31:04 AM - Info bpbrm(pid=7468) Starting create snapshot processing        
7/25/2013 4:31:06 AM - Info bpfis(pid=12904) Backup started          
7/25/2013 5:58:02 AM - Info bpfis(pid=12904) done. status: 0         
7/25/2013 5:58:02 AM - end Exchange 14 Snapshot, Create Snapshot; elapsed time: 01:27:07
7/25/2013 5:58:03 AM - end writing
Status 0
7/25/2013 5:58:03 AM - end Parent Job; elapsed time: 01:27:08
7/25/2013 5:58:03 AM - begin Exchange 14 Snapshot, Policy Execution Manager Preprocessed
Status 0
7/25/2013 6:10:02 AM - end Exchange 14 Snapshot, Policy Execution Manager Preprocessed; elapsed time: 00:11:59
7/25/2013 6:10:02 AM - begin Exchange 14 Snapshot, Delete Snapshot
7/25/2013 6:10:03 AM - started process bpbrm (8052)
7/25/2013 6:10:11 AM - Info bpbrm(pid=8052) Starting delete snapshot processing        
7/25/2013 6:10:11 AM - Info bpfis(pid=0) Snapshot will not be deleted       
7/25/2013 6:10:15 AM - Info bpfis(pid=8080) Backup started          
7/25/2013 6:12:00 AM - end writing
Status 0
7/25/2013 6:12:00 AM - end Exchange 14 Snapshot, Delete Snapshot; elapsed time: 00:01:58
Status 0
7/25/2013 6:12:00 AM - end operation
the requested operation was successfully completed(0)

 

I can see files building up in the C:\Veritas\NetBackup\online_util\fi_cntl folder that match the number of log files for that database. However, the process keeps getting slower. Rebooting the Exchange clients helps for a day or two. The backup job does not start until the snapshot completes, but this is taking hours when it should take minutes.

There is a similar post about this, but I have tried all the suggestions in it and nothing seems to help:

https://www-secure.symantec.com/connect/forums/exchange-snapshot-takes-2-hours#comment-8771041

I am pointing to the passive copy and I have disabled consistency checks, but this does not seem to help.

 


Mark_Solutions
Level 6
Partner Accredited Certified

Whenever things are slow, please check your anti-virus software and make sure all NetBackup directories and processes are excluded from scanning and access protection.

Also, while the backup is running, take a look at Task Manager to see how the server is handling CPU and memory, in case there are any clues there as to why it is slowing down.

There was a bug fix for bpresolver being slow that is included in 7.5.0.6, but it does look like you are getting past that phase?

 

Tomrae
Level 3

Anti-virus is disabled.  The server seems to be fine when the backups are running.

Actually, I loaded 7.5.0.6 on the clients.

Mark_Solutions
Level 6
Partner Accredited Certified

NetBackup uses VSS to create the snapshots for Exchange. If that process is very slow, it is most likely a Microsoft issue, so it is worth checking for any VSS roll-up packages for the servers and looking at the VSS settings, free disk space, etc.
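One quick way to check the VSS side is the built-in Windows command `vssadmin list writers`, run from an elevated prompt on each Exchange server; any writer not reporting `Stable` (for example `Waiting for completion` or `Failed`) is worth a closer look. The Python helper below is a hypothetical sketch, not part of NetBackup or Windows. It assumes the usual English-language output layout (`Writer name: '...'` followed by `State: [n] ...`) and flags every writer that is not Stable.

```python
import re
import subprocess
import sys

# Assumes the English-language layout of `vssadmin list writers` output.
WRITER_RE = re.compile(r"Writer name:\s*'([^']+)'")
STATE_RE = re.compile(r"State:\s*\[(\d+)\]\s*(.+)")


def parse_writers(text: str) -> dict[str, str]:
    """Map each VSS writer name to the state text reported after it."""
    states: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        m = WRITER_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = STATE_RE.search(line)
        if m and current is not None:
            states[current] = m.group(2).strip()
            current = None
    return states


def main() -> None:
    # Must be run from an elevated prompt on the Exchange server itself.
    out = subprocess.run(["vssadmin", "list", "writers"],
                         capture_output=True, text=True, check=True).stdout
    for name, state in parse_writers(out).items():
        flag = "" if state == "Stable" else "  <-- check this writer"
        print(f"{name}: {state}{flag}")


if __name__ == "__main__" and sys.platform == "win32":
    main()
```

Running it before and after a slow backup makes it easy to spot a writer that is still sitting in "Waiting for completion" hours after the job ended.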

Dyneshia
Level 6
Employee

Quick question: is this an Exchange backup, or a VMware backup with Exchange application protection?

Tomrae
Level 3

This is an Exchange backup.

Will_Restore
Level 6

Did you follow Jakob's final post in the link? (emphasis added)

 

Hi,

Oops, I forgot to mention probably the most important change we made in our setup.
We limited the number of simultaneous streams for each of the DAG members to three and found that it ran much faster.

In your case it should be enough to set the maximum number of streams per policy to a low number (maybe 3 or 4). For a DAG with an even distribution of mailbox databases (MBs) across the DAG members, we found that it was necessary to do this at the NBU client level (Client Attributes in the master server properties). Otherwise the first member could easily take up all the streams for the policy and you were back to square one.

You need to know a little more about the underlying storage array used for the Exchange databases. If it is using the same total number of spindles for the LUNs as before, then you are definitely creating contention by running five times the number of read streams against the same number of hard disk read heads.

--jakob;
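To illustrate why a policy-wide cap alone can let one DAG member take every stream, here is a toy model in Python. It is not how the NetBackup Resource Broker actually schedules jobs; it simply starts queued jobs in order, subject to a policy-wide limit and an optional per-client limit. The client and database names are made up.

```python
from collections import Counter


def start_jobs(queue, policy_max, client_max=None):
    """Toy scheduler: start queued (client, database) jobs in queue order.

    policy_max caps total concurrent streams for the policy; client_max,
    if given, caps concurrent streams per DAG member (client).
    Returns how many streams each client ends up running.
    """
    per_client = Counter()
    started = 0
    for client, _db in queue:
        if started >= policy_max:
            break
        if client_max is not None and per_client[client] >= client_max:
            continue  # this member is at its cap; look further down the queue
        per_client[client] += 1
        started += 1
    return per_client


# Hypothetical DAG: three members with four databases each, queued member by member.
queue = [(f"MBX{m}", f"DB{m}{d}") for m in range(1, 4) for d in range(1, 5)]

print(start_jobs(queue, policy_max=4))                # policy cap only
print(start_jobs(queue, policy_max=6, client_max=2))  # per-client cap as well
```

In this model, the policy cap alone hands all four streams to MBX1, while adding a per-client cap of 2 spreads six streams evenly across the three members, which matches the behavior Jakob describes.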

Tomrae
Level 3

I did see his post. I had a long call with Microsoft about this, since Symantec believed there was an issue with VSS. The conclusion was that VSS is working fine. We ran a backup using the built-in Windows Server Backup and it took 8 minutes. The same database backup in NBU took 52 minutes.

I think I have identified the source of my problem. The issue seems to be that, just after VSS makes the snapshot, NetBackup waits for temp files to build up in the C:\Program Files\Veritas\NetBackup\online_util\fi_cntl folder. There is one temp file per Exchange log file. The problem is that we get about 10,000 log files in a given database, and the temp files are created at a rate of about 3-5 per second, so it takes about 40 minutes to build all of them. Once this is done, the backup starts. When the backup finishes, the temp files are cleared and the snapshot is removed.

Does this sound right?  Does everyone see these temp files?  Is there a way to stop this behavior?  On the Windows Backup, the backup starts immediately after the snapshot is created.  It almost looks like it is building up an index or something.
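The 40-minute figure follows directly from the file count and the creation rate. A minimal sketch of that arithmetic, assuming one metafile per log file created at a steady rate (the function name is hypothetical):

```python
def metafile_build_minutes(log_files: int, files_per_second: float) -> float:
    """Minutes spent creating one metafile per Exchange log file at a steady rate."""
    if files_per_second <= 0:
        raise ValueError("files_per_second must be positive")
    return log_files / files_per_second / 60


# Figures from the post above: ~10,000 log files, 3-5 metafiles per second.
for rate in (3, 4, 5):
    print(f"{rate} files/s -> {metafile_build_minutes(10_000, rate):.1f} min")
```

That works out to roughly 33 to 56 minutes, in line with the ~40 minutes observed. At the 500 files per second reported later in the thread without Bit9, the same 10,000 files would take about 20 seconds.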

 

 

watsson
Not applicable
Does this issue appear even when you run a full backup of the database, or only when you run an incremental backup?

Tomrae
Level 3

This happens on both Full and Differential backups.

Tomrae
Level 3

I have an update. According to Symantec, these temp files are required. It looks like we may have a piece of security software that scans files as they are created. We removed this software from one of our Exchange servers and its backups are running fine. The others that still have this software (Bit9 Parity agent) are running slow:

Bit9 - creates 10-20 files per second

No Bit9 - creates 500 files per second

I will be testing my hypothesis over the week-end...

CRZ
Level 6
Employee Accredited Certified

This sounds like something you would want to open a case for and either confirm that this is the way it works, or find out if you're hitting a defect and require 7.5.0.6 or an EEB (or both).

Dip
Level 4

I am also having the same issue. Creating the snapshot during full backups takes up to 3 hours for some databases.

Tomrae
Level 3

As I thought, we had a local security program that was causing long delays when creating these meta-files.  From my conversations with Symantec, I found that my thoughts regarding the NBU Exchange 2010 backup process are correct.  There is a one-to-one relationship between an Exchange log file and a metafile that is created in the C:\Veritas\NetBackup\online_util\fi_cntl folder.

We ran tests on disk I/O latency and found the disk itself was fine; all delays were coming from the OS. This is what led us to look at local programs. In this investigation, we removed anti-virus and a security program called Bit9, and the backups started working fine.

If you experience this in your environment, look for anything that would slow disk writes, or look for high disk latency.
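A simple way to reproduce the "files per second" comparison above is to time the creation of many small files on the same volume, once with the security agent active and once without. The Python sketch below is a hypothetical probe, not a NetBackup tool; the 1 KB file size and the `PROBE_DIR` environment variable are assumptions, and it should be pointed at a scratch folder on the same volume as fi_cntl rather than at fi_cntl itself.

```python
import os
import tempfile
import time
from pathlib import Path


def file_create_rate(directory: Path, count: int = 2000, size: int = 1024) -> float:
    """Create `count` small files in `directory`, delete them, and return files created per second."""
    payload = b"\0" * size
    paths = []
    start = time.perf_counter()
    for i in range(count):
        path = directory / f"probe_{i:06d}.tmp"
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    elapsed = time.perf_counter() - start
    # Clean up so the probe leaves nothing behind.
    for path in paths:
        path.unlink()
    return count / elapsed


if __name__ == "__main__":
    # PROBE_DIR is a hypothetical variable: set it to a scratch folder on the
    # volume you want to test. Defaults to the system temp directory.
    target = Path(os.environ.get("PROBE_DIR", tempfile.gettempdir()))
    print(f"{file_create_rate(target):.0f} files/second in {target}")
```

A large gap between the two runs (for example tens of files per second versus hundreds) points at a filter driver or scanner rather than the disk itself.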

Toddman214
Level 6

I may be experiencing this same issue (sorry for rehashing an older thread). Our Exchange guy and I have been beating our heads against this issue. As I type this, I have an Exchange backup that started at 4:31 PM CST, and as of 9:31 PM CST (5 hours later), the snapshot is still running! A full backup averages around 22 hours from start to status 0. Then, once it finishes, the VSS writer sits in a "waiting for completion" state for many more hours. By then an entire day has passed, and the Incremental backup gets murdered with status 130, stating that all of the databases are "not frozen." Very frustrating!

The average full backup for each of the Exchange database servers is about 3.1 TB (we have two identical servers, and the other shows very similar behavior). The C:\Veritas\NetBackup\online_util\fi_cntl folder contains 700,000+ metafiles right now. We use Sophos Antivirus, and I checked the current exclusion list, but I'm not quite sure what to add to it.

Tomrae,

So, are you saying that in your situation the antivirus was scanning each of the metafiles as it was created? Did you add the C:\Veritas\NetBackup\online_util\fi_cntl folder to the antivirus exclusion list? I'll check with the Exchange admin, but I believe disk I/O is one thing we've already checked.

 

Thanks all,

Todd