Forum Discussion

pnobels
Level 3
4 years ago

NetBackup VMware backups often stall for SharePoint servers

Hi,

We often see that VMware backups of SharePoint servers in particular seem to stall. NetBackup reports no error; the job just seems to hang. This only happens with VMs hosting SharePoint. It doesn't happen every time, but it happens often.

The only fix is to cancel the backup, kill the bpbkar32 process, and restart the backup. Sometimes it hangs again, sometimes the backup completes. There's no real indication of why or when this happens.

We back up a few hundred VMs. The issue only shows up with VMs hosting SharePoint, for some reason...

Does this ring a bell?

Today I took a dump of the bpbkar32 process, and it seems to hang in libfs_ntfs:

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.

EXCEPTION_CODE_STR: 80000003

STACK_TEXT:
00000000`00f47eb0 00007ffe`ff62f9ab : 00000000`00000007 ffffffff`fffffffe 00000000`021bead0 00007fff`008fe376 : libfs_ntfs+0x42b8
00000000`00f480d0 00007fff`008f920c : 00000000`00000000 00000000`021be7d8 00000000`00000000 00007ffe`ff630423 : libfs_ntfs!rvp_map_get_extent+0x5b
00000000`00f48120 00007fff`008fa3f3 : ffffffff`ffffffff 00000000`00f483e0 00000000`020fa2b0 00000000`020fa2b0 : libos_win!rvp_map_fini+0x102dc
00000000`00f48200 00007fff`008fa715 : 00000000`00000000 00000000`020fa2b0 00000000`00f483e0 00000000`00000000 : libos_win!rvp_map_fini+0x114c3

  • Hamza_H
    4 years ago

    I think there is an EEB for this, since you said it gets stuck at getExtent.

    The EEB is 3973811.

    You can check it here:

    https://www.veritas.com/content/support/en_US/downloads/update.UPD770735

     

    Extract:

    Abstract

    NetBackup 8.1.2 HotFix - VMware backups in a hung state Etrack 3973811

    Description

    Veritas Bug ID: ET 3973811

     

    Version: NetBackup 8.1.2

     

    Fix Included resolves: 

     

    VMware backups hang during getExtent call occasionally. This EEB changes assignment of a negative number to an unsigned variable, logs once every 10k iteration of a file, and will fail the extent calculation if the same offset is seen 10k times in a row.

     

    Install on: Media Server that is the VMware backup host
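
To make the fix description above more concrete, here is a minimal Python sketch of that kind of guard. It is not NetBackup's code: `walk_extents`, `get_extent`, `as_uint64` and `ExtentStallError` are hypothetical names, and the 64-bit width is an assumption. Only the 10k threshold comes from the EEB text. It shows how a negative value turns into a huge number when stored as unsigned, and how a loop can fail instead of spinning when the offset stops advancing.

```python
# Hypothetical illustration only; NetBackup's real implementation is not public.
STALL_LIMIT = 10_000  # "10k" threshold quoted in the EEB description


class ExtentStallError(RuntimeError):
    """Raised when the extent walk stops making progress."""


def as_uint64(value):
    """Show what a value becomes when stored in an unsigned 64-bit variable (assumed width)."""
    return value & 0xFFFFFFFFFFFFFFFF


def walk_extents(get_extent, file_size, log=print):
    """Collect (offset, length) extents for a file of file_size bytes.

    get_extent(offset) is a hypothetical callback that returns the length of
    the extent starting at offset; a result <= 0 means no progress was made.
    """
    extents = []
    offset = 0
    last_offset = None
    seen = 0        # how many times in a row the same offset was seen
    iterations = 0
    while offset < file_size:
        iterations += 1
        if iterations % STALL_LIMIT == 0:
            # Log only once every 10k iterations to avoid flooding the log.
            log(f"extent walk: {iterations} iterations, offset={offset}")
        if offset == last_offset:
            seen += 1
        else:
            seen = 1
            last_offset = offset
        if seen >= STALL_LIMIT:
            # Fail the calculation rather than loop forever.
            raise ExtentStallError(f"offset {offset} seen {seen} times in a row")
        length = get_extent(offset)
        if length > 0:
            extents.append((offset, length))
            offset += length
    return extents
```

With a callback that always returns 0, `walk_extents` raises `ExtentStallError` after 10,000 iterations instead of hanging, which mirrors the behaviour the hotfix notes describe.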

     

    It may be worth installing this EEB and monitoring the next backups.

     

    Good luck :)

8 Replies

  • At what point in the backup does it hang? These VMs that happen to be running SharePoint, are they a lot larger than the other VMs that always work? Are they all located on the same ESX host? The same datastore? Do they have a metric ton of load on them, so that they may be exceeding a timeout value somewhere while waiting to be snapshotted? Are your VM admins having to add or move these VMs around semi-regularly, interrupting your backup processes? Are any errors showing up in vCenter during the problem windows?

    Please post the job details to start with. 

    • pnobels
      Level 3

      The snapshot is taken.  

      These VMs that happen to be running Sharepoint, are they a lot larger than the other VMs that always work ? No.

      Do they all happen to be located on the same ESX ? No

      The same DataStore ? Not necessarily.  Some are , others are not.

      Do they have a metric ton of load on them and so may be exceeding a timeout value somewhere while waiting to be snapshotted ?  No.  The snapshot seems to be okay.  It looks like things go wrong at the start of the backup...

      Are your VM admins having to add/move these VMs around semi-regularly for reasons & interrupting your backup processes ? No

      Any errors showing up from vCenter during the problem windows ? No

      20-jan-2021 15:40:14 - Info nbjm (pid=4884) starting backup job (jobid=5750954) for client SRV-BE-092, policy VMware_Sharepoint, schedule FULL
      20-jan-2021 15:40:15 - estimated 58433214 kbytes needed
      20-jan-2021 15:40:15 - Info nbjm (pid=4884) started backup (backupid=SRV-BE-092_1611153615) job for client SRV-BE-092, policy VMware_Sharepoint, schedule FULL on storage unit SU_SRV-080_VMware-AllDisks using backup host srv-be-080.blabla.com
      20-jan-2021 15:40:16 - started process bpbrm (pid=63944)
      20-jan-2021 15:40:17 - Info bpbrm (pid=63944) SRV-BE-092 is the host to backup data from
      20-jan-2021 15:40:17 - Info bpbrm (pid=63944) reading file list for client
      20-jan-2021 15:40:17 - Info bpbrm (pid=63944) starting bpbkar32 on client
      20-jan-2021 15:40:17 - connecting
      20-jan-2021 15:40:17 - connected; connect time: 0:00:00
      20-jan-2021 15:40:18 - Info bpbkar32 (pid=92424) Backup started
      20-jan-2021 15:40:18 - Info bpbkar32 (pid=92424) archive bit processing:<enabled>
      20-jan-2021 15:40:18 - Info bptm (pid=64188) start
      20-jan-2021 15:40:18 - Info bptm (pid=64188) using 1048576 data buffer size
      20-jan-2021 15:40:18 - Info bptm (pid=64188) setting receive network buffer to 4195328 bytes
      20-jan-2021 15:40:18 - Info bptm (pid=64188) using 512 data buffers
      20-jan-2021 15:40:19 - Info bptm (pid=64188) start backup
      20-jan-2021 15:40:19 - begin writing
      20-jan-2021 15:40:21 - Info bpbkar32 (pid=92424) INF - Backing up vCenter server srv-be-065.blabla.com, ESX host esx-be-004.blabla.com, BIOS UUID 422760ef-5f73-aba7-04ce-b7383337602e, Instance UUID 5027bad3-080b-3656-b6c5-79c2752e2afd, Display Name SRV-BE-092, Hostname SRV-BE-092.BLABLA.com

      -> stops here...
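
      Since the job reports no error, one way to spot a hang like this is to look at how old the last timestamped line of the job details is. The following is a rough Python sketch under a few assumptions: the timestamp format matches the log above, the month abbreviations are English, and the 30-minute idle threshold is arbitrary. `last_activity` and `is_stalled` are hypothetical helper names, not NetBackup tools.

```python
import re
from datetime import datetime, timedelta

# Assumption: English three-letter month abbreviations, as in "20-jan-2021".
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Matches lines such as "20-jan-2021 15:40:19 - begin writing".
LINE_RE = re.compile(
    r"^\s*(\d{1,2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2}) - (.*)$"
)


def last_activity(job_details):
    """Return (timestamp, message) of the last timestamped line, or None."""
    last = None
    for line in job_details.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and free-text notes
        day, mon, year, hh, mm, ss, message = match.groups()
        month = MONTHS.get(mon.lower())
        if month is None:
            continue  # unknown month abbreviation
        stamp = datetime(int(year), month, int(day), int(hh), int(mm), int(ss))
        last = (stamp, message)
    return last


def is_stalled(job_details, now, max_idle=timedelta(minutes=30)):
    """True if the last logged activity is older than max_idle (arbitrary default)."""
    last = last_activity(job_details)
    return last is not None and now - last[0] > max_idle
```

      For the job above, `last_activity` would return the 15:40:21 "Backing up vCenter server" line, and `is_stalled` would flag the job once `now` is more than 30 minutes past that time.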

      • pats_729
        Level 6

        Not a definite answer, but... did you verify whether VMware Tools is up to date on these VMs? If not, it's worth checking.

  • Do you see any exceptional CPU or memory spikes on the SharePoint servers during the backup? Or even outside the backup window?

    Is this a VMware policy protecting SharePoint, or a SharePoint policy?
    • pnobels
      Level 3

      Hi,

      There's no archived performance logging available on these boxes. I have configured some now. I will need to check whether I can reproduce it, or wait until next week...

      Not sure what you mean by the last line. This is a simple VMware backup of a Windows VM which happens to run a SharePoint environment. It's not a specific SharePoint backup policy. The NetBackup policy is not aware there's a SharePoint environment on there...