Very slow Exchange GRT backups

lu
Level 6
Hi! We have NBU 6.5.4 and are trying to run GRT backups of a Windows 2003 R2 SP2 server, with all the required NFS patches installed. We can see two phases in the backup:
1- a very fast backup of the store via SAN Client
2- a very slow indexing phase once the NFS client/server is involved

It takes 1 to 4 hours to index a 60 GB store that contains only 5 mailboxes. The Exchange server and the media server are connected to the same switch. If we snoop the dialog on the media server, we can see that NFS packets are sent very slowly (about 4-8 KB/s). The NetBackup NFS server waits 1 second before sending the data (see the snoop below). We also found this 1-second delay in the bpdm log (see below). It happens with both BasicDisk and AdvancedDisk. Any idea how to solve this problem? TIA, Ludo.

------- nfs snoop ---------
2.24509 exchserv -> nbumedia026 NFS C READ3 FH=2283 at 1175568384 for 4096   ==> read request from the Exchange server
2.29524 nbumedia026 -> exchserv TCP D=18624 S=7394 Ack=4153401168 Seq=45132715 Len=0 Win=64240   ==> small ack sent by the media server (len=0)
3.28856 nbumedia026 -> exchserv NFS R READ3 OK (4096 bytes)   ==> but the data is sent 1 second after the request!
3.28862 nbumedia026 -> exchserv TCP D=18624 S=7394 Ack=4153401168 Seq=45134175 Len=1460 Win=64240
3.28866 nbumedia026 -> exchserv TCP D=18624 S=7394 Push Ack=4153401168 Seq=45135635 Len=1308 Win=64240

------- bpdm --------
14:18:52.572 [13851] <2> read_data: waited for empty buffer 3 times, delayed 71 times
14:18:52.572 [13851] <2> set_restore_cntl: dmcommon.c.6963: firstblk = 0, blocks_to_skip = 0, bytes_to_skip = 0, fragnum = 0 (input parameters)
14:18:52.572 [13851] <2> read_backup: seeking to image relative block number 130967312 frag relative block number 130967312 to start read-blockmap   ===> here is another 1 second delay
14:18:53.557 [13852] <2> send_bptm_req: [13851] bptm parent answered 0, 0, 0
14:18:53.557 [13852] <2> write_blocks: [13851] writing 2048 data blocks of 512
14:18:53.648 [13852] <2> filter_image_ifr: [13851] sending bp*m position request, curr_frag = 1, new_frag = 1, curr_blknum = 130969360, new_blknum = 130967216, firstblk = 130967216
14:18:53.651 [13851] <2> check_positioning: CINDEX 0 wants to skip to frag 1, firstblk 130967216, ACTIVE_GC = 1
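
In case anyone wants to reproduce the trace, a capture along these lines on the media server should show the same dialog (the interface name here is just an example):

# Capture the NFS traffic between the media server and the Exchange client
snoop -d bge0 -o /tmp/nfs_grt.snoop host exchserv
# Replay it with per-packet delta timestamps to make the 1-second gap obvious
snoop -i /tmp/nfs_grt.snoop -t d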
41 REPLIES

lu
Level 6
More info: this bug also slows down restores... :-( I started a granular restore, and it took 10 minutes for 25 MB. I 'trussed' bpdm during the restore and also saw the 1-second sleeps. With the LD_PRELOAD32 hack, the restore took 2 minutes. Please, Symantec, when will the final release of GRT be available? :)
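
For context, an interposer like the one mentioned can be as small as a sleep() override. This is only a sketch of the idea (the actual hack isn't shown here, and the delay may well come from a different call); as noted below, it is not a supported modification:

# Build a tiny 32-bit interposer that turns sleep() into a no-op (illustrative only)
echo 'unsigned int sleep(unsigned int s) { return 0; }' > nosleep.c
gcc -m32 -fPIC -shared -o nosleep32.so nosleep.c       # assumes gcc is installed
# Preload it for the 32-bit process under test, e.g.:
LD_PRELOAD_32=/path/to/nosleep32.so <command under test>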

CRZ
Level 6
Employee Accredited Certified
This definitely looks like the sort of thing that should get escalated to the developers for a real answer (or fix)!

I have a hunch there's a "really good reason" that delay was coded in, but obviously I have no clue what it is.  I'd hate for your modifications to come back and bite you, though, so I *strongly* recommend opening a case to get an officially supported EEB replacement... if that's what it ends up being.

lu
Level 6
Yes... still waiting :(

lu
Level 6
Yes, setting SoftMountPingtimeout to 60 may avoid a status 1 error during the indexing phase, but it does not speed up the indexing.

MattS
Level 6
True, and for the record it did not help our issue.

The good news is that on the last test we performed, Symantec had me add a touch file (it only works with the provided EEB), and the GRT backup only took a few minutes longer than the non-GRT backups.  It still ended in a status 1, but it seems to be a step in the right direction.  At least my test jobs won't last 24 hours...

lu
Level 6
Is it /usr/openv/netbackup/db/config/nbfsd_enableDirect ?

MattS
Level 6
That's the one.  Did Support get you that EEB? If so, how's it working for you?

lu
Level 6
It seems to work :) The indexing is very fast because the NFS server directly opens the image on disk instead of going through bpdm with its 1-second delays everywhere... No status 1 errors so far.
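
For reference, the touch file in question is just an empty marker file; presumably it is created on the media server that serves the NFS mounts:

# Enable the direct-open behaviour (path as given earlier in this thread)
touch /usr/openv/netbackup/db/config/nbfsd_enableDirect
# Removing the file should revert to the previous behaviour
rm /usr/openv/netbackup/db/config/nbfsd_enableDirect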

lu
Level 6
Warning!!! With this EEB, duplications of Exchange backups are broken! Avoid duplications or your NBU server may be stuck running "image cleanups" for hours!

lu
Level 6
Still waiting... The latest EEB created problems with Duplications...

RecklessTrippy
Not applicable
Lu - could you post the EEB file name or case number to reference that at least fixed part one of the problem?  I'll worry about the duplications later - I'd still rather have my GRT, and I can't seem to get hold of the EEB for this.

lu
Level 6
EEB 1712608 + touch /usr/openv/netbackup/db/config/nbfsd_enableDirect => faster indexing but BIG problems with duplications (catalog corruption).

MattS
Level 6
Looks like we are using the same EEB, Lu. 
Reckless, if you want the EEB for 6.5.5, try asking for 1928803.1 (that should be a recompiled version of 1712608.9).

Matt

MattS
Level 6
Reckless,

I forgot to mention that this EEB does not solve my particular issue with GRT backups ending in a status 1 when the mail store is larger than 60-90 GB (I could never narrow down at what size the problem shows up).
It has, though, cut our testing time down to just a few minutes longer than a normal non-GRT backup.  Previously the job would run for days!

Matt

Roger_C
Level 4
Employee

This fix has been in effect and it does resolve the issue regarding the backup performance.

Engineering/Tech Support are fully aware of the side effect with SLP. However, the details from the original site that reported it are so generic that it would be foolish to suggest anything such as corruption at this stage. I will update this thread in a few days' time to explain the connection with SLP, as there will be some development on SLP vs. ET1712608.8.

RogC

lu
Level 6
We have tried EEB 1712608.10; the backup is fast, but duplication to tape is still awful (catalog corruption).

MattS
Level 6
My tests with the latest EEB have been successful for single large mail stores.  But when I try to back up more than one at a time (Microsoft Information Store:\*), it usually ends with a status 1 again.  I even changed the policy to only allow 2 jobs to run at the same time, but I still ended up with random status 1s and backup jobs running for 20+ hours.

So, semi-solved here. 

lu
Level 6
> But when I try to back up more than one at a time (Microsoft Information Store:\*), it usually ends with a status 1 again.
Thanks! Good to know! Maybe the media server is only able to serve one image at a time via NFS for indexing?

Roger_C
Level 4
Employee
Apologies to all for the delay in updating, but it has been confirmed
that ET1712608.10 can be safely applied to environments without causing any
side effects.

The only caveat/precaution is that when running GRT image duplications you should
set "GRANULAR_DUP_RECURSION = 0" in bp.conf. This disables the cataloging of
GRT images.

Full coverage of this flag is documented in this tech note.
http://support.veritas.com/docs/317302

In summary: ET1712608.10 + GRANULAR_DUP_RECURSION (if duplicating GRT images) = effectively quicker backups.
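
For reference, on a UNIX master or media server this is typically just a line in bp.conf (standard path assumed here; see the tech note above for exactly where it needs to go):

# Disable cataloging of GRT image contents during duplication
echo "GRANULAR_DUP_RECURSION = 0" >> /usr/openv/netbackup/bp.conf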

It has also been proven that the same ET1712608.10 can resolve SharePoint GRT
performance-related issues.

We're currently documenting all of this and will provide an official resolution to this problem.

Rog C

lu
Level 6
Yes, I can confirm that with ET1712608.10 + GRANULAR_DUP_RECURSION=0, we now have GRT backups working, and duplication to tape is OK. Of course, the image duplicated to tape needs to be copied back to disk for GRT browsing. Is there a fix for the status 1 errors when the Exchange stores are backed up in parallel (multiplexing enabled)? Is it related to the fact that only one NFS mount is allowed on the Exchange client at a time?