cancel
Showing results for 
Search instead for 
Did you mean: 

Very slow Exchange GRT backups

lu
Level 6
Hi ! We have NBU 6.5.4 and try to run GRT backups on a W2003 SP2 R2 server, with all needed nfs patches. We can see two phases in the backup 1- very fast backup of the store via SAN client 2- very slow indexation when the NFS client/server is involved. It takes 1 to 4 hours to index the 60GB store which contains 5 mailboxes. The exchange server and the media server are connected to the same switch. If we snoop the dialog on the media server, we can see that NFS packets are sent very slowly (about 4-8 KB/s). We see that the NBU nfs server waits 1 second before sending the data (see snoop below). We also found this 1 second delay in the bpdm log (see below). Happens with basic disk and advanced disk. Any idea how to solve this problem ? TIA, Ludo. ------- nfs snoop --------- 2.24509 exchserv -> nbumedia026 NFS C READ3 FH=2283 at 1175568384 for 4096 ==> read request from the Exchange server 2.29524 nbumedia026 -> exchserv TCP D=18624 S=7394 Ack=4153401168 Seq=45132715 Len=0 Win=64240 ==> small ack sent by the media server (len=0) 3.28856 nbumedia026 -> exchserv NFS R READ3 OK (4096 bytes) ==> but data is sent 1 second after the request ! 3.28862 nbumedia026 -> exchserv TCP D=18624 S=7394 Ack=4153401168 Seq=45134175 Len=1460 Win=64240 3.28866 nbumedia026 -> exchserv TCP D=18624 S=7394 Push Ack=4153401168 Seq=45135635 Len=1308 Win=64240 ------- bpdm -------- 14:18:52.572 [13851] <2> read_data: waited for empty buffer 3 times, delayed 71 times 14:18:52.572 [13851] <2> set_restore_cntl: dmcommon.c.6963: firstblk = 0, blocks_to_skip = 0, bytes _to_skip = 0, fragnum = 0 (input parameters) 14:18:52.572 [13851] <2> read_backup: seeking to image relative block number 130967312 frag relativ e block number 130967312 to start read-blockmap ===> here is another 1 second delay 14:18:53.557 [13852] <2> send_bptm_req: [13851] bptm parent answered 0, 0, 0 14:18:53.557 [13852] <2> write_blocks: [13851] writing 2048 data blocks of 512 14:18:53.648 [13852] <2> filter_image_ifr: [13851] sending bp*m position request, curr_frag = 1, ne w_frag = 1, curr_blknum = 130969360, new_blknum = 130967216, firstblk = 130967216 14:18:53.651 [13851] <2> check_positioning: CINDEX 0 wants to skip to frag 1, firstblk 130967216, A CTIVE_GC = 1
41 REPLIES 41

bernes_stainz
Not applicable
Hi Ludo,

we`ve the same poblem

ciao bernes

lu
Level 6
Same delay seen in the bpdm logs ?

Mouse
Moderator
Moderator
Partner    VIP    Accredited Certified

I've opened couple of cases with Symantec support several months ago and didn't receive any meaningful explanation so far.
It seems that GRT technology either was not tested well or simply doesn't work as advertised.

lu
Level 6
Awesome ! With a truss you can see that they added a sleep(1) in the bpdm code ! Please Symantec, fix this code ! ... 3.5539 read(0, " X F E R B L O C K 4 2".., 21) = 21 3.5541 alarm(0) = 600 3.5542 sigaction(SIGALRM, 0xFFBFD0C0, 0x00000000) = 0 new: hand = 0x00000000 mask = 0 0 0 0 flags = 0x0000 3.5545 getpid() = 22360 [22355] 3.5546 llseek(4, 0, SEEK_END) = 1964349 3.5548 write(4, " 1 7 : 0 0 : 5 2 . 2 8 5".., 172) = 172 3.5550 nanosleep(0xFFBFE240, 0xFFBFE238) = 0 tmout: 0.000000000 sec resid: 0.000000000 sec 4.5553 nanosleep(0xFFBFE240, 0xFFBFE238) = 0 tmout: 1.000000000 sec resid: 0.000000000 sec 4.5559 kill(22355, SIG#0) = 0 4.5560 getpid() = 22360 [22355] 4.5562 llseek(4, 0, SEEK_END) = 1965001 4.5563 write(4, " 1 7 : 0 0 : 5 3 . 2 8 6".., 77) = 77 4.5566 getpid() = 22360 [22355] 4.5567 llseek(4, 0, SEEK_END) = 1965078 ...

MattS
Level 6
Hi Lu,
Thats a good find.

Right now Symantec is still working on another EEB.  Today i'm going to test with extending the SoftMountPingtimeout to 60 (in regards to NFS). I'll let you know how that goes.

Matt

lu
Level 6
Does GRT works for Exchange 2007 ? Only "alpha quality" for Exchange 2003 ?

MattS
Level 6
We are using exchange 2007 and are experiencing this issue.  See the post in the other thread for our setup details.
https://www-secure.symantec.com/connect/forums/totally-annoyed-granular-recovery-nbu-653

lu
Level 6
So sum up, the problem is found on NBU 6.5.3 or 6.5.4, Win 2k3 or Solaris 10 media server.

lu
Level 6
I wanted to remove the 1 second delay in bpdm, so I tried to do this with a LD_PRELOAD32 (Solaris 10). 1- Compile the code below with "cc -G libnanosleep.c -o libnanosleep.so" 2- Create a wrapper for bpdm using this script: #!/bin/ksh LD_PRELOAD_32=/tmp/libnanosleep.so export LD_PRELOAD_32 /usr/openv/netbackup/bin/bpdm.orig "$@" It will replace the 1 sec sleep with a 20ms one. Here are our results: - indexing time=3-4 minutes instead of 1-4 hours - but much higher I/O load on the media server, caused by the fact that 'bpdm' read data in 256k block when NFS needs only 4/8k. So there are 2 bugs in the 6.5.4 bpdm: 1- the 1 second sleep before each seek 2- it reads 256kB when NBU NFS only asks for 4kB This "patch" only addresses the 1st one.
===libnanosleep.c===
#include <stdio.h>
#include <errno.h>
#include <dlfcn.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <strings.h>

static unsigned int (*func)();

unsigned int sleep(unsigned int seconds)
{
  unsigned int retval;
  int fd;
  char buf[256];
  time_t cur_time;

  if(!func) {
    func = (unsigned int (*)()) dlsym(RTLD_NEXT, "sleep");
  }
  
  sprintf(buf, "/tmp/sleep.%d.txt", (int)getpid()); 
  fd = open(buf, O_WRONLY|O_CREAT|O_APPEND);
  time(&cur_time);
  sprintf(buf, "%.19s: sleep=%d\n", ctime(&cur_time), (int)seconds);
  write(fd, (const void *)buf, (size_t)strlen(buf));
  close(fd);

  if (seconds == 1) { usleep(20000); return 0; }
  
  retval = func(seconds);
  return retval;
}

Vapulaflo
Level 4
Hi ludo,

I've got exactly the same problem!

Vapulaflo
Level 4
Hi Lu,

Thanks for the tip to reduce the nanosleep!.... It s crazy!

lu
Level 6
Can you give more details ? (NBU version/platform, Exchange version, stores size, etc)

Vapulaflo
Level 4
We try to backup a mail store of 60 Go with 5 mailbox

The windows 2003 server is on R2 SP2 with the NFS patch.


I think  I will test your code to reduce the nanosleep time, but I should need a confirmation of a veritas or symantec guy that's working fine....

MattS
Level 6
Did anyone experiencing this issue have older versions of netbackup or backup exec previously installed on their exchange servers?  I was looking through the registry on our exchange server and I see lots of entries for the previous install of backup exec.

The reason i mention this is because netbackup uses backup exec code to backup exchange.

I guess a better question would be, are any of you working with a fresh OS, exchange install and netbackup install?

lu
Level 6
We have a fresh install of the media servers on Solaris 10, so there's no registry. We see that we have the same problem with two completely different OSes, so the problem is really in bpdm.

lu
Level 6
Just tried with the 6.5.5 bpdm binary. Still this 1 second delay... (I just installed the bpdm binary, not tried a full 6.5.5 upgrade on the media server: anybody can try with a 6.5.5 upgrade ?)

MattS
Level 6
lu i was refering to the client, the exchange server.

By the way, i have experienced this with a RHEL4 media server on a seperate exchange cluster.

lu
Level 6
> lu i was refering to the client, the exchange server. Ok ! Our exchange server was recently installed from scratch, for this test. So I think the registry is clean...

MattS
Level 6
Good to know, thanks.