cancel
Showing results for 
Search instead for 
Did you mean: 

VMWare snapshots being locked by NetBackup since 7.6.0.2 upgrade.

Chi_Chris_2012
Level 2

Since upgrading from 7.5.0.6 to 7.6.0.2 and our appliances from 2.5.3 to 2.6.2 we have been experiencing sporadic status 156 errors on some VM based backups.  The issue seems to be that NetBackup  locks the snapshot down and does not come back and deletes it.  Since the snapshot isn't being delete the netbackup doesn't release the lock and therefore the disk cannot be consolidated.  This causes the next day's backups to fail with status 156.

Our VMWare backup policies go to a single 5230 appliance and run early evening and finish early morning.  However, hours after the jobs finish and zero jobs are running to the appliance there are still  bpbkar processes running that correlate to the previous night's backup jobs .

If I don't kill these process then the disk wont consolidate and causes some of the next night's VMware backups to fail.  The only way to VM admin's to consolidate the disks is for me to kill the pids associated with the hung bpbkar process.

 

Below are the job details of one of the failed jobs:

 

7/21/2014 10:41:56 AM - Info nbjm(pid=30146) starting backup job (jobid=1798854) for client mp1cucm-sub03, policy VMQUERY_CISCO, schedule DIFF-2WK 
7/21/2014 10:41:56 AM - Info nbjm(pid=30146) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1798854, request id:{8B81D818-10ED-11E4-B49E-B151A9591D42}) 
7/21/2014 10:41:56 AM - requesting resource stu_disk_mpnbuapp02
7/21/2014 10:41:56 AM - requesting resource mpnbu01.NBU_CLIENT.MAXJOBS.mp1cucm-sub03
7/21/2014 10:41:56 AM - requesting resource mpnbu01.VMware.Datastore.chivmvc03/Cisco/Application Datastore
7/21/2014 10:41:56 AM - granted resource mpnbu01.NBU_CLIENT.MAXJOBS.mp1cucm-sub03
7/21/2014 10:41:56 AM - granted resource mpnbu01.VMware.Datastore.chivmvc03/Cisco/Application
7/21/2014 10:41:56 AM - granted resource MediaID=@aaaah;DiskVolume=PureDiskVolume;DiskPool=dp_disk_mpnbuapp02;Path=PureDiskVolume;StorageServ...
7/21/2014 10:41:56 AM - granted resource stu_disk_mpnbuapp02
7/21/2014 10:41:56 AM - estimated 13991947 Kbytes needed
7/21/2014 10:41:56 AM - begin Parent Job
7/21/2014 10:41:56 AM - begin Application Snapshot, Step By Condition
Status 0
7/21/2014 10:41:56 AM - end Application Snapshot, Step By Condition; elapsed time: 0:00:00
7/21/2014 10:41:56 AM - begin Application Snapshot, Read File List
Status 0
7/21/2014 10:41:56 AM - end Application Snapshot, Read File List; elapsed time: 0:00:00
7/21/2014 10:41:56 AM - begin Application Snapshot, Create Snapshot
7/21/2014 10:41:56 AM - started process bpbrm (21073)
7/21/2014 10:41:56 AM - started
7/21/2014 10:41:57 AM - Info bpbrm(pid=21073) mp1cucm-sub03 is the host to backup data from    
7/21/2014 10:41:57 AM - Info bpbrm(pid=21073) reading file list for client       
7/21/2014 10:41:57 AM - Info bpbrm(pid=21073) start bpfis on client        
7/21/2014 10:41:57 AM - Info bpbrm(pid=21073) Starting create snapshot processing        
7/21/2014 10:41:57 AM - Info bpfis(pid=21082) Backup started          
7/21/2014 10:41:57 AM - snapshot backup of client mp1cucm-sub03 using method VMware_v2
7/21/2014 10:42:14 AM - Info bpbrm(pid=21073) INF - vmwareLogger: WaitForTaskCompleteEx: Unable to access file <unspecified filename> since it is locked <232>
7/21/2014 10:42:14 AM - Info bpbrm(pid=21073) INF - vmwareLogger: WaitForTaskCompleteEx: SYM_VMC_ERROR:  TASK_REACHED_ERROR_STATE     
7/21/2014 10:42:14 AM - Info bpbrm(pid=21073) INF - vmwareLogger: ConsolidateVMDisks: SYM_VMC_ERROR:  TASK_REACHED_ERROR_STATE     
7/21/2014 10:42:14 AM - Info bpbrm(pid=21073) INF - vmwareLogger: ConsolidateVMDisksAPI: SYM_VMC_ERROR:  TASK_REACHED_ERROR_STATE     
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL - VMware_freeze: VIXAPI freeze (VMware snapshot) failed with 36: SYM_VMC_TASK_REACHED_ERROR_STATE
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL - VMware error received: Unable to access file <unspecified filename> since it is locked
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL - vfm_freeze: method: VMware_v2, type: FIM, function: VMware_v2_freeze
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL -       
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL - vfm_freeze: method: VMware_v2, type: FIM, function: VMware_v2_freeze
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL -       
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL - snapshot processing failed, status 156  
7/21/2014 10:42:15 AM - Critical bpbrm(pid=21073) from client mp1cucm-sub03: FTL - snapshot creation failed, status 156  
7/21/2014 10:42:15 AM - Warning bpbrm(pid=21073) from client mp1cucm-sub03: WRN - ALL_LOCAL_DRIVES is not frozen   
7/21/2014 10:42:15 AM - Info bpfis(pid=21082) done. status: 156         
7/21/2014 10:42:15 AM - end Application Snapshot, Create Snapshot; elapsed time: 0:00:19
7/21/2014 10:42:15 AM - Info bpfis(pid=21082) done. status: 156: snapshot error encountered      
7/21/2014 10:42:15 AM - end writing
Status 156
7/21/2014 10:42:15 AM - end Parent Job; elapsed time: 0:00:19
7/21/2014 10:42:15 AM - begin Application Snapshot, Stop On Error
Status 0
7/21/2014 10:42:15 AM - end Application Snapshot, Stop On Error; elapsed time: 0:00:00
7/21/2014 10:42:15 AM - begin Application Snapshot, Resources For Cleanup
Status -9999
7/21/2014 10:42:15 AM - end Application Snapshot, Resources For Cleanup; elapsed time: 0:00:00
7/21/2014 10:42:15 AM - begin Application Snapshot, Delete Snapshot
7/21/2014 10:42:15 AM - started process bpbrm (21152)
7/21/2014 10:42:16 AM - Info bpbrm(pid=21152) Starting delete snapshot processing        
7/21/2014 10:42:16 AM - Info bpfis(pid=21161) Backup started          
7/21/2014 10:42:16 AM - Critical bpbrm(pid=21152) from client mp1cucm-sub03: FTL - cannot open /usr/openv/netbackup/online_util/fi_cntl/bpfis.fim.mp1cucm-sub03_1405957316.1.0    
7/21/2014 10:42:16 AM - Info bpfis(pid=21161) done. status: 4207         
7/21/2014 10:42:16 AM - end Application Snapshot, Delete Snapshot; elapsed time: 0:00:01
7/21/2014 10:42:16 AM - Info bpfis(pid=21161) done. status: 4207: Could not fetch snapshot metadata or state files 
7/21/2014 10:42:16 AM - end writing
Status 4207
7/21/2014 10:42:16 AM - end operation
Status 156
7/21/2014 10:42:16 AM - end operation
snapshot error encountered(156)

 

 

Has anyone come across this before? Any help would be greatly appreciated.

 

Thanks,

 

Chris

1 ACCEPTED SOLUTION

Accepted Solutions

CEvensen
Level 2
Employee Accredited

This bug has been patched for 7.6.0.2 and 7.6.0.3. The fix will be included in the 7.6.0.4 release of NBU, and only effects Linux backup hosts. (does not apply to windows)

Please see:

http://www.symantec.com/docs/TECH223790

*PLEASE NOTE*

If you are running an NBU appliance, there is a second patch that needs to be applied along with the regular bpbkar EEB. This smaller EEB (see TN above for list of patches - 1 set for 2.6.0.2, and 1 set for 2.6.0.3 appliances) needs to be applied before the larger EEB.

 

Best regards,

 

Chris Evensen

View solution in original post

11 REPLIES 11

mnolan
Level 6
Employee Accredited Certified

I do believe there might be a reoccurance of http://www.symantec.com/docs/TECH181118
I believe you can make recommended changes to prevent it. Killing the hung bpbkar processes (which are holding onto the lock files) should allow consolidation.
 

INT_RND
Level 6
Employee Accredited

Try updating the VMWare tools inside the clients. The new version of Netbackup might be expecting new features or behavior from the API that is not gracefully degrading with the older version.

V4
Level 6
Partner Accredited

What does bpfis says. pls post bpfis log

Also can you try changing policy to hostname instead of display (using display name) and disable quiece and test ). Also give a try to below TN .. they refer to similar issue which you are facing..(although not particularly for 7.6.x but still applicable for NBU 7.x

http://www.symantec.com/docs/TECH137001
http://www.symantec.com/docs/TECH166561

 

Try restarting vmware tools inside guest vm

Verify VSS are stable inside Guest

Increase snapshot timeouts on vmware backuphost using following registry

HKEY_LOCAL_MACHINE >SOFTWARE> VERITAS > NETBACKUP> CURRENTVERSION > CONFIG>BACKUP

We hope your VM are not using shared bus as vmware doesn't support snapshots for shared vm bus. 

Unable to use Snapshots or perform a backup on virtual machines configured with bus-sharing
http://kb.vmware.com/kb/1006392

VMware does not support snapshots of virtual machines configured for bus-sharing.  This means the virtual machine cannot be protecting using the vStorage API.

For such VM you would need to install NBU agent inside guest vm and back it up as if it was physical host.

Let us know how it goes

Chi_Chris_2012
Level 2

Thank you all for you suggestions.  I had opened a ticket with support and they say this is a known bug in 7. 6 and the fix will be included in the 7.6.0.3 release.  In the meantime they suggested the following work around which we were pretty much doing to circumvent the issue:

This workaround requires a time window when  it is safe (when no VMware backups should be running) to end all bpbkar processes so that the hung bpbkar processes can be stopped.

In order to apply the workaround the process requires editing files and the possible creation of files. This should always be done with care.

 

1. We need to create or edit the "virtualization.conf" file. This file is located in the following directory:

 

"/usr/openv/netbackup/"

 

Remember, if this file is not there, it needs to be created.

 

2. Next we need to add the following to the contents of the "virtualization.conf" file:

 

[BACKUP]

"InfiniteWaitForBpbkar"=dword:00000000

"WaitDiskLeaseRelease"=dword:12c

 

3. Next a cron job will need to be created to end the bpbkar processes. WARNING: THIS CRON JOB NEEDS TO RUN DURING A TIME PERIOD WHEN NO VMWARE BACKUPS SHOULD BE RUNNING.

 

The following needs to be added on it's own line and must run after any other cron jobs.

 

0 7 * * * /usr/bin/killall -9 bpbkar

 

IMPORTANT NOTE: The above line will perform this task every day at 7:00AM. This needs to be changed to reflect the agreed upon time when no VMWare backups should be running.

mnolan
Level 6
Employee Accredited Certified

Hi,

 

This is exactly what I stated in my response above, minus the targeted version for a fix + cron to kill bpbkar.

Gautier_Leblanc
Level 5
Partner Accredited

Hi all,

 

I have the same issue with 7.6.0.3 :\

CEvensen
Level 2
Employee Accredited

This bug has been patched for 7.6.0.2 and 7.6.0.3. The fix will be included in the 7.6.0.4 release of NBU, and only effects Linux backup hosts. (does not apply to windows)

Please see:

http://www.symantec.com/docs/TECH223790

*PLEASE NOTE*

If you are running an NBU appliance, there is a second patch that needs to be applied along with the regular bpbkar EEB. This smaller EEB (see TN above for list of patches - 1 set for 2.6.0.2, and 1 set for 2.6.0.3 appliances) needs to be applied before the larger EEB.

 

Best regards,

 

Chris Evensen

Gautier_Leblanc
Level 5
Partner Accredited

I applied this patch on my appliance (5230 on 2.6.0.3), but one of my virtual machine still have this issue... 

CEvensen
Level 2
Employee Accredited

Gautier,

 

If one of the Vm's is still exhibiting an issue, it is being caused by something other than what is fixed in this EEB. What symptoms does the single VM exhibit? Is bpbkar hung? What is the failure?

Gautier_Leblanc
Level 5
Partner Accredited

Hello,

 

I'll put logfiles here tomorrow, but I have not them now.

Marianne
Level 6
Partner    VIP    Accredited Certified

Please start a new discussion for your issue.

The issue in this discussion has been resolved by the workaround supplied by Support to this user.