Re: VMWare snapshots being locked by NetBackup since 7.6.0.2 upgrade.

This issue still exists in 7.6.0.4. In fact we never started having the problem until our Backup Team applied Maintenance Release 7.6.0.4. Now the problem is cropping up.

https://www-secure.symantec.com/connect/forums/vmware-snapshots-being-locked-netbackup-7602-upgrade

 

 

4 Replies

To see what's going on, let's look at the logs. Please reply with the detailed status log.

Also, what do the vCenter logs say? Look at the Tasks & Events tab of the VM having the issue.

Tasks & Events indicates that disk consolidation fails after the backup job reports that it removed the snapshot.

We sent the logs to VMware. So far they have told us this:

On the 6th, we could see a command to create a snapshot, which completed successfully (this created a snapshot).

2015-03-06T11:04:26.161Z| vmx| SnapshotVMX_TakeSnapshot start: 'NBU_SNAPSHOT VM_NAME 1425639864', deviceState=0, logging=0, quiesced=0, native=0, sibling=0 cb=18F40770, cbData=19CCFD10
2015-03-06T11:04:48.259Z| vcpu-0| SnapshotVMXTakeSnapshotComplete done with snapshot 'NBU_SNAPSHOT radfa0231p11170 1425639864': 1030

-- We then saw the command to consolidate. We could see that the consolidation was done, but with errors.

2015-03-06T11:21:00.260Z| vmx| SnapshotVMX_Consolidate: starting


2015-03-06T11:23:35.516Z| vcpu-0| Vix: [2222072 vigorCommands.c:577]: VigorSnapshotManagerConsolidateCallback: snapshotErr = Could not open/create change tracking file (5:83C)

2015-03-06T11:23:35.516Z| vcpu-0| Turning off snapshot info cache.

2015-03-06T11:23:35.526Z| vcpu-0| Turning off snapshot disk cache.

2015-03-06T11:23:35.526Z| vcpu-0| SnapshotVMXConsolidateOnlineCB: Done with consolidate

-- Following down the logs, we could see multiple attempts to consolidate, but all of the attempts completed with errors.
-- This is because the CTK files were getting corrupted.
-- At the moment, we don't know the reason why the CTK files are getting corrupted.
-- We would need more time to analyze this and discuss it internally.
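
For anyone who wants to check their own vmware.log for the same pattern, here is a rough Python sketch. It assumes the log lines follow the "timestamp| thread| message" layout of the excerpt above and it keys off only the three message strings shown there. The function name find_consolidate_errors is made up for illustration; it is not part of any VMware or NetBackup tool.

```python
import re

# Assumed layout, taken from the excerpt above: "<timestamp>| <thread>| <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\|\s*(?P<thread>[^|]+)\|\s*(?P<msg>.*)$")


def find_consolidate_errors(lines):
    """Return (consolidate_start_timestamp, error_text) pairs.

    Hypothetical helper: it pairs each 'SnapshotVMX_Consolidate: starting'
    line with any 'snapshotErr =' line logged before the matching
    'Done with consolidate' line.
    """
    results = []
    current_start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything not in the assumed layout
        msg = m.group("msg")
        if "SnapshotVMX_Consolidate: starting" in msg:
            current_start = m.group("ts")
        elif "snapshotErr =" in msg and current_start is not None:
            error_text = msg.split("snapshotErr =", 1)[1].strip()
            results.append((current_start, error_text))
        elif "Done with consolidate" in msg:
            current_start = None
    return results
```

To use it, pass the lines of the log, for example find_consolidate_errors(open("vmware.log", encoding="utf-8", errors="replace")). An empty result only means these particular strings were not found, not that consolidation was clean.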

 

We are trying to see if the Backup Team has any status logs. They keep saying there are no logs. 
 

Right now what we have observed and believe to be happening is as follows:

 

At some point a NetBackup job will fail while attempting to consolidate a VMware snapshot after completing a backup. I think most everyone is aware that NetBackup takes a snapshot of a VM prior to starting a backup job on the VM. When the backup job completes, NetBackup should delete and consolidate that snapshot. NetBackup also creates a change tracking file for every virtual disk on a VM. This change tracking file is leveraged for NetBackup's incremental backup process. Change tracking files should have a one-to-one relationship with virtual disks, meaning that for every virtual disk there should be one, and only one, associated change tracking file.
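
As a rough way to check that one-to-one relationship, here is a small Python sketch that works on a plain list of file names from a VM's datastore folder. It assumes the naming used in this post (name.vmdk for the descriptor, name-flat.vmdk or name-delta.vmdk for the data, name-ctk.vmdk for change tracking). The function name check_ctk_pairing is hypothetical, and other disk formats are not handled.

```python
def check_ctk_pairing(filenames):
    """Compare descriptor .vmdk files against their -ctk.vmdk files.

    Returns (missing, orphaned):
      missing  - descriptors with no matching change tracking file
      orphaned - change tracking files with no matching descriptor
    Assumes only the -flat, -delta and -ctk suffixes mentioned in this post.
    """
    vmdks = {f for f in filenames if f.endswith(".vmdk")}
    data_suffixes = ("-flat.vmdk", "-delta.vmdk", "-ctk.vmdk")
    descriptors = {f for f in vmdks if not f.endswith(data_suffixes)}
    ctks = {f for f in vmdks if f.endswith("-ctk.vmdk")}

    missing = sorted(
        d for d in descriptors
        if d[: -len(".vmdk")] + "-ctk.vmdk" not in ctks
    )
    orphaned = sorted(
        c for c in ctks
        if c[: -len("-ctk.vmdk")] + ".vmdk" not in descriptors
    )
    return missing, orphaned
```

Two empty lists mean every descriptor has exactly one change tracking file by name. It says nothing about whether the contents of those files are valid, which is the part VMware says is getting corrupted.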

 

Once the NetBackup job fails to consolidate the snapshot, that VM is in a state that puts it, the datastore it resides on, and the other VMs residing on that datastore at risk of several problems. In the worst case, the VM reaches a point where the only way to make it functional again is to restore the entire VM from backup.

 

Among other things, we are currently engaged with the vendors to try to determine what exactly causes the consolidation to fail, what puts the VM in the state it ends up in, and what we can do, if anything, to remediate it and prevent it from happening again.

 

The scenario we are finding seems to be that once the snapshot consolidation fails, NetBackup can no longer back up the VM. Although vCenter shows that the VM has no active snapshots, its virtual disks are actually running on snapshot virtual disks. The next time NetBackup attempts to back up the VM, the backup job fails, indicating it was unable to take a snapshot. However, the backup job actually does take a snapshot, and the VM's virtual disk is now running on that new snapshot disk. The previous snapshot disk still exists as well, as does the VM's original virtual disk. In addition, each one of these snapshot virtual disks has an associated change tracking file and delta file. The VM, however, still believes it has no snapshots, or at least vCenter believes that.

When the VM is in this state and we manually take a snapshot, the snapshot succeeds, vCenter thinks the VM now has a snapshot, and the VM's virtual disk is now running on the new snapshot disk. The other snapshot disks still remain, as does the original virtual disk. When we manually delete the snapshot that was manually created, the snapshot deletes, or so vCenter indicates. However, deleting the snapshot actually creates another snapshot disk, and the VM's virtual disk is now running on that new snapshot disk. All other snapshot disks still remain, as does the original virtual disk.

After the snapshot has been manually deleted, we attempt to manually perform a consolidation. The consolidation fails. However, another snapshot disk gets created, and now the VM's virtual disk is running on the new snapshot disk. All other snapshot disks still remain, as does the original virtual disk.

 

So, for example:

 

VM is running on the original virtual disk, vm-flat.vmdk. (vm.vmdk and vm-ctk.vmdk should exist as well.)

NetBackup takes a snapshot. VM is running (writing) on vm-00001-delta.vmdk. (vm-00001.vmdk and vm-00001-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

NetBackup completes and deletes the snapshot; consolidation fails. VM is running (writing) on vm-00002-delta.vmdk. (vm-00002.vmdk and vm-00002-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

NetBackup attempts a backup, which fails indicating it cannot take a snapshot. VM is running (writing) on vm-00003-delta.vmdk. (vm-00003.vmdk and vm-00003-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

NetBackup attempts a backup, which fails indicating it cannot take a snapshot. VM is running (writing) on vm-00004-delta.vmdk. (vm-00004.vmdk and vm-00004-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

NetBackup attempts a backup, which fails indicating it cannot take a snapshot. VM is running (writing) on vm-00005-delta.vmdk. (vm-00005.vmdk and vm-00005-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

A manual snapshot is successfully taken. VM is running (writing) on vm-00006-delta.vmdk. (vm-00006.vmdk and vm-00006-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

The manually taken snapshot is (or appears to be) manually deleted successfully. VM is running (writing) on vm-00007-delta.vmdk. (vm-00007.vmdk and vm-00007-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.

A disk consolidation is manually attempted and fails. VM is running (writing) on vm-00008-delta.vmdk. (vm-00008.vmdk and vm-00008-ctk.vmdk should exist as well.) All above mentioned disks/files still exist.
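
To spot a VM in this state from a folder listing, something like the following Python sketch could help. It assumes the delta files are named the way they are in the example above (base name, a numeric index, then -delta.vmdk); the function name leftover_deltas is hypothetical. If vCenter reports no snapshots for the VM, any delta disk this returns is one that should have been consolidated away.

```python
import re
from collections import defaultdict

# Assumed naming from the example above: <base>-<digits>-delta.vmdk
DELTA_RE = re.compile(r"^(?P<base>.+)-(?P<index>\d+)-delta\.vmdk$")


def leftover_deltas(filenames):
    """Group snapshot delta disks by base disk name.

    Returns a dict mapping each base disk name to the sorted list of
    snapshot indexes that have a -delta.vmdk file in the listing.
    """
    chains = defaultdict(list)
    for name in filenames:
        m = DELTA_RE.match(name)
        if m:
            chains[m.group("base")].append(int(m.group("index")))
    return {base: sorted(indexes) for base, indexes in chains.items()}
```

For the listing at the end of the example above this would return {"vm": [1, 2, 3, 4, 5, 6, 7, 8]}, while a healthy VM with no snapshots would give an empty dict.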

 

Keep in mind that if a VM has 2, 3, 4, etc. original virtual disks, the above process happens for every one of those virtual disks. We have seen a VM show no active snapshots in vCenter yet have 70 virtual disk files and associated files when it really has only 4 actual virtual disks.

 

If the steps described above had all completed successfully and as expected, what we should see afterwards is:

VM is running on the original virtual disk, vm-flat.vmdk. (vm.vmdk and vm-ctk.vmdk should exist as well.)

Re: Tasks & Events indicate disk

Hello, did you manage to find a solution to the disk consolidation and NetBackup issue?

 

Re: Tasks & Events indicate disk

@YMar if you are experiencing a similar issue, rather start a new discussion and tell us as much as possible about your environment - NBU version, VMware/vCenter versions, troubleshooting done up to now, etc.

This post is more than 18 months old and NBU and VMware have both seen many version upgrades.