03-17-2015 11:23 AM
This issue still exists in 7.6.0.4. In fact, we did not have the problem at all until our Backup Team applied Maintenance Release 7.6.0.4. Now the problem is cropping up.
https://www-secure.symantec.com/connect/forums/vmware-snapshots-being-locked-netbackup-7602-upgrade
03-17-2015 01:30 PM
To see what's going on, let's look at the logs. Please reply with the detailed status log.
Also, what do the vCenter logs say? Look at the Tasks & Events tab of the VM having the issue.
03-18-2015 07:30 AM
Tasks & Events shows that disk consolidation fails after the backup job reports that it removed the snapshot.
We sent logs to VMware. So far they have told us this:
On the 6th, we could see a command to create a snapshot, which completed successfully (this created a snapshot).
2015-03-06T11:04:26.161Z| vmx| SnapshotVMX_TakeSnapshot start: 'NBU_SNAPSHOT VM_NAME 1425639864', deviceState=0, logging=0, quiesced=0, native=0, sibling=0 cb=18F40770, cbData=19CCFD10
2015-03-06T11:04:48.259Z| vcpu-0| SnapshotVMXTakeSnapshotComplete done with snapshot 'NBU_SNAPSHOT radfa0231p11170 1425639864': 1030
-- We then saw the command to consolidate. We could see that the consolidation was done, but with errors.
2015-03-06T11:21:00.260Z| vmx| SnapshotVMX_Consolidate: starting
2015-03-06T11:23:35.516Z| vcpu-0| Vix: [2222072 vigorCommands.c:577]: VigorSnapshotManagerConsolidateCallback: snapshotErr = Could not open/create change tracking file (5:83C)
2015-03-06T11:23:35.516Z| vcpu-0| Turning off snapshot info cache.
2015-03-06T11:23:35.526Z| vcpu-0| Turning off snapshot disk cache.
2015-03-06T11:23:35.526Z| vcpu-0| SnapshotVMXConsolidateOnlineCB: Done with consolidate
-- Further down the logs, we could see multiple attempts to consolidate, but every attempt completed with errors.
-- This is because the CTK files were getting corrupted.
-- At the moment, we don't know the reason why the CTK files are getting corrupted.
-- We would need more time to analyze this and discuss it internally.
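For anyone who wants to pull the same evidence out of their own vmware.log, here is a minimal Python sketch. It assumes the log uses the same "timestamp| thread| message" layout as the excerpts above, and the strings it matches are copied from those excerpts. The function name summarize_consolidations is made up for illustration and is not part of any VMware or NetBackup tool.

```python
import re

# Assumed layout, taken from the excerpts above: "<timestamp>| <thread>| <message>"
LINE_RE = re.compile(r"^(?P<ts>[^|\s]+)\|\s*(?P<thread>[^|]+)\|\s*(?P<msg>.*)$")


def summarize_consolidations(lines):
    """Return one dict per consolidation attempt: start time, end time, errors seen."""
    attempts = []
    current = None  # the attempt that has started but not yet finished
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, msg = m.group("ts"), m.group("msg")
        if "SnapshotVMX_Consolidate: starting" in msg:
            current = {"start": ts, "end": None, "errors": []}
            attempts.append(current)
        elif current is not None and "snapshotErr =" in msg:
            # Keep only the text after "snapshotErr =" as the error message.
            current["errors"].append(msg.split("snapshotErr =", 1)[1].strip())
        elif current is not None and "Done with consolidate" in msg:
            current["end"] = ts
            current = None
    return attempts
```

Reading the file with open("vmware.log").readlines() and passing the result in gives one entry per attempt. An attempt with a non-empty "errors" list is one that "completed with errors" in the sense described above.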
We are trying to see if the Backup Team has any status logs. They keep saying there are no logs.
Right now what we have observed and believe to be happening is as follows:
At some point a NetBackup job will fail attempting to consolidate a VMware
snapshot after completing a backup. I think most everyone is aware that
NetBackup takes a snapshot of a VM prior to starting a backup job on the VM.
When the backup job completes NetBackup should delete and consolidate that
snapshot. NetBackup also creates a change tracking file for every virtual
disk on a VM. This change tracking file is leveraged for NetBackup's
incremental backup process. Change tracking files should have a 1-to-1
relationship with virtual disks, meaning for every virtual disk there
should be one, and only one, associated change tracking file.
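The 1-to-1 expectation can be checked against a listing of the VM's folder on the datastore. The following is only a rough Python sketch. It assumes the file naming used in the example further down this post (vm.vmdk, vm-flat.vmdk, vm-ctk.vmdk, vm-00001-delta.vmdk and so on), and check_ctk_pairing is a hypothetical helper name, not something NetBackup or VMware ships.

```python
def check_ctk_pairing(filenames):
    """Return (descriptors_missing_ctk, ctk_without_descriptor) for a folder listing."""
    names = set(filenames)
    # Data and tracking files are not descriptors; everything else ending in .vmdk is.
    non_descriptor = ("-flat.vmdk", "-delta.vmdk", "-ctk.vmdk")
    descriptors = {
        n for n in names if n.endswith(".vmdk") and not n.endswith(non_descriptor)
    }
    ctks = {n for n in names if n.endswith("-ctk.vmdk")}

    # A descriptor "x.vmdk" is expected to have a companion "x-ctk.vmdk".
    missing = sorted(
        d for d in descriptors if d[: -len(".vmdk")] + "-ctk.vmdk" not in ctks
    )
    # A tracking file "x-ctk.vmdk" with no "x.vmdk" is an orphan.
    orphans = sorted(
        c for c in ctks if c[: -len("-ctk.vmdk")] + ".vmdk" not in descriptors
    )
    return missing, orphans
```

Both lists being empty means every disk descriptor in the folder has its change tracking file and no tracking file is left without a disk. It says nothing about whether the contents of a CTK file are corrupted.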
Once the NetBackup job fails to consolidate the snapshot, the VM is in a
state that puts it, the datastore it resides on, and the other VMs
residing on that datastore at risk for several problems. In the worst
case, the VM reaches a point where the only way to make it functional
again is to restore the entire VM from backup.
Among other things, we are currently engaged with the vendors to try to
determine what exactly causes the consolidation to fail, what puts the VM
in the state it ends up in, and what we can do, if anything, to remediate
it and prevent it from happening again.
The scenario we are finding seems to be that once the snapshot consolidation
fails, NetBackup can no longer back up the VM. Although vCenter shows that
the VM has no active snapshots, its virtual disks are actually running on
snapshot virtual disks. The next time NetBackup attempts to back up the VM,
the backup job fails, indicating it was unable to take a snapshot. However,
the backup job actually does take a snapshot, and the VM virtual disk is now
running on that new snapshot disk. The previous snapshot disk still exists
as well, as does the VM's original virtual disk. In addition, each one of
these snapshot virtual disks has an associated change tracking file and
delta file. The VM, however, still believes it has no snapshots, or at least
vCenter believes that.
When the VM is in this state and we manually take a snapshot, the snapshot
succeeds, vCenter thinks the VM now has a snapshot, and the VM virtual disk
is now running on the new snapshot disk. The other snapshot disks still
remain, as does the original virtual disk. When we manually delete the
snapshot that was manually created, the snapshot deletes, or so vCenter
indicates. However, deleting the snapshot actually creates another snapshot
disk, and the VM's virtual disk is now running on that new snapshot disk.
All other snapshot disks still remain, as does the original virtual disk.
After the snapshot has been manually deleted, we then attempt to manually
perform a consolidation. The consolidation fails. However, another snapshot
disk gets created, and now the VM's virtual disk is running on the new
snapshot disk. All other snapshot disks still remain, as does the original
virtual disk.
So, for example:
VM is running on original virtual disk - vm-flat.vmdk. (vm.vmdk; vm-ctk.vmdk
should exist as well)
NetBackup takes snapshot. VM is running (writing) on vm-00001-delta.vmdk.
(vm-00001.vmdk & vm-00001-ctk.vmdk should exist as well). All above
mentioned disks/files still exist.
NetBackup completes, deletes snapshot, consolidation fails. VM is running
(writing) on vm-00002-delta.vmdk. (vm-00002.vmdk & vm-00002-ctk.vmdk should
exist as well). All above mentioned disks/files still exist.
NetBackup attempts backup, fails indicating cannot take snapshot. VM is
running (writing) on vm-00003-delta.vmdk. (vm-00003.vmdk & vm-00003-ctk.vmdk
should exist as well). All above mentioned disks/files still exist.
NetBackup attempts backup, fails indicating cannot take snapshot. VM is
running (writing) on vm-00004-delta.vmdk. (vm-00004.vmdk & vm-00004-ctk.vmdk
should exist as well). All above mentioned disks/files still exist.
NetBackup attempts backup, fails indicating cannot take snapshot. VM is
running (writing) on vm-00005-delta.vmdk. (vm-00005.vmdk & vm-00005-ctk.vmdk
should exist as well). All above mentioned disks/files still exist.
A manual snapshot is successfully taken. VM is running (writing) on
vm-00006-delta.vmdk. (vm-00006.vmdk & vm-00006-ctk.vmdk should exist as
well). All above mentioned disks/files still exist.
The manually taken snapshot is (or appears to be) manually deleted
successfully. VM is running (writing) on vm-00007-delta.vmdk. (vm-00007.vmdk
& vm-00007-ctk.vmdk should exist as well). All above mentioned disks/files
still exist.
A disk consolidation is manually attempted and fails. VM is running
(writing) on vm-00008-delta.vmdk. (vm-00008.vmdk & vm-00008-ctk.vmdk should
exist as well). All above mentioned disks/files still exist.
Keep in mind that if a VM has 2, 3, 4, etc. original virtual disks, the above
process happens for every one of those virtual disks. We have seen a VM show
no active snapshots in vCenter and yet have 70 virtual disk files and
associated files when it really has only 4 actual virtual disks.
If the above described steps/process actually all completed successfully and
as expected, what we should see after is:
VM is running on original virtual disk - vm-flat.vmdk. (vm.vmdk; vm-ctk.vmdk
should exist as well)
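One way to spot a VM in this state is to compare the delta files in its datastore folder with the number of snapshots vCenter reports. The Python sketch below is a rough illustration only. It assumes the "<disk>-<number>-delta.vmdk" naming from the example above, and snapshot_levels and leftover_deltas are hypothetical helper names. Getting the folder listing and the vCenter snapshot count is left to whatever tooling you already use.

```python
import re
from collections import defaultdict

# Assumed naming from the example above: "<base>-<number>-delta.vmdk"
DELTA_RE = re.compile(r"^(?P<base>.+)-(?P<num>\d+)-delta\.vmdk$")


def snapshot_levels(filenames):
    """Map each base disk name to the sorted snapshot numbers that have a delta file."""
    levels = defaultdict(list)
    for name in filenames:
        m = DELTA_RE.match(name)
        if m:
            levels[m.group("base")].append(int(m.group("num")))
    return {base: sorted(nums) for base, nums in levels.items()}


def leftover_deltas(filenames, snapshots_in_vcenter=0):
    """Return the disks that have more delta files than vCenter reports snapshots."""
    return {
        base: nums
        for base, nums in snapshot_levels(filenames).items()
        if len(nums) > snapshots_in_vcenter
    }
```

For the eight-step example above, with vCenter reporting no snapshots, leftover_deltas would list the disk "vm" with levels 1 through 8. For a healthy VM with no snapshots it returns an empty dict.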
09-23-2016 06:53 AM
Hello, did you manage to find a solution to the disk consolidation and NetBackup issue?
09-23-2016 08:13 AM