06-08-2012 08:23 AM
I have a couple of Windows clients (one hosting virtual machines, the other a SQL server) that are not backing up successfully. The backup jobs kick off and run fine for some time, usually a couple of hours or more, and then fail with status 156 (the job details for the most recent failure on one of the clients follow). I am not familiar with snapshots and how they work in NetBackup; how should I approach this problem?
06/07/2012 20:26:01 - Info nbjm (pid=3861) starting backup job (jobid=1805018) for client bknhicodchv05, policy NHIC-HOST, schedule Full
06/07/2012 20:26:01 - Info nbjm (pid=3861) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1805018, request id:{86630874-B100-11E1-BEEB-1CC1DE1EF662})
06/07/2012 20:26:01 - requesting resource bkbugatti-hcart-robot-tld-0
06/07/2012 20:26:01 - requesting resource bkferrari.NBU_CLIENT.MAXJOBS.bknhicodchv05
06/07/2012 20:26:01 - requesting resource bkferrari.NBU_POLICY.MAXJOBS.NHIC-HOST
06/07/2012 20:26:01 - awaiting resource bkbugatti-hcart-robot-tld-0. No drives are available.
06/07/2012 20:35:39 - awaiting resource bkbugatti-hcart-robot-tld-0. Maximum job count has been reached for the storage unit.
06/07/2012 20:35:54 - awaiting resource bkbugatti-hcart-robot-tld-0. No drives are available.
06/07/2012 20:36:31 - awaiting resource bkbugatti-hcart-robot-tld-0. Maximum job count has been reached for the storage unit.
06/07/2012 20:36:35 - awaiting resource bkbugatti-hcart-robot-tld-0. No drives are available.
06/07/2012 20:43:31 - awaiting resource bkbugatti-hcart-robot-tld-0. Maximum job count has been reached for the storage unit.
06/07/2012 20:44:41 - awaiting resource bkbugatti-hcart-robot-tld-0. Waiting for resources.
Reason: Drives are in use, Media server: bkbugatti,
Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,
Volume Pool: NHIC, Storage Unit: bkbugatti-hcart-robot-tld-0, Drive Scan Host: N/A,
Disk Pool: N/A, Disk Volume: N/A
...
06/07/2012 22:32:24 - awaiting resource bkbugatti-hcart-robot-tld-0. Maximum job count has been reached for the storage unit.
06/07/2012 22:50:10 - awaiting resource bkbugatti-hcart-robot-tld-0. Waiting for resources.
Reason: Drives are in use, Media server: bkbugatti,
Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,
Volume Pool: NHIC, Storage Unit: bkbugatti-hcart-robot-tld-0, Drive Scan Host: N/A,
Disk Pool: N/A, Disk Volume: N/A
06/07/2012 22:51:58 - granted resource bkferrari.NBU_CLIENT.MAXJOBS.bknhicodchv05
06/07/2012 22:51:58 - granted resource bkferrari.NBU_POLICY.MAXJOBS.NHIC-HOST
06/07/2012 22:51:58 - granted resource NH4347
06/07/2012 22:51:58 - granted resource HP.ULTRIUM4-SCSI.020
06/07/2012 22:51:58 - granted resource bkbugatti-hcart-robot-tld-0
06/07/2012 22:51:58 - estimated 2018993995 kbytes needed
06/07/2012 22:51:58 - Info nbjm (pid=3861) started backup job for client bknhicodchv05, policy NHIC-HOST, schedule Full on storage unit bkbugatti-hcart-robot-tld-0
06/07/2012 22:51:59 - started process bpbrm (pid=25184)
06/07/2012 22:52:22 - connecting
06/07/2012 22:52:29 - connected; connect time: 0:00:00
06/07/2012 22:52:32 - mounting NH4347
06/07/2012 22:53:38 - mounted NH4347; mount time: 0:01:06
06/07/2012 22:53:38 - positioning NH4347 to file 2
06/07/2012 22:55:11 - positioned NH4347; position time: 0:01:33
06/07/2012 22:55:11 - begin writing
06/07/2012 22:57:50 - Info bpbrm (pid=25184) bknhicodchv05 is the host to backup data from
06/07/2012 22:57:50 - Info bpbrm (pid=25184) reading file list from client
06/07/2012 22:57:57 - Info bpbrm (pid=25184) starting bpbkar on client
06/07/2012 22:57:59 - Info bpbkar (pid=0) Backup started
06/07/2012 22:57:59 - Info bpbrm (pid=25184) bptm pid: 25240
06/07/2012 22:58:00 - Info bptm (pid=25240) start
06/07/2012 22:58:00 - Info bptm (pid=25240) using 65536 data buffer size
06/07/2012 22:58:00 - Info bptm (pid=25240) using 30 data buffers
06/07/2012 22:58:00 - Info bptm (pid=25240) start backup
06/07/2012 22:58:00 - Info bptm (pid=25240) backup child process is pid 25244
06/07/2012 22:58:00 - Info bptm (pid=25240) Waiting for mount of media id NH4347 (copy 1) on server bkbugatti.
06/07/2012 22:59:06 - Info bptm (pid=25240) media id NH4347 mounted on drive index 13, drivepath /dev/rmt/15cbn, drivename HP.ULTRIUM4-SCSI.020, copy 1
06/08/2012 00:25:29 - end writing; write time: 1:30:18
06/08/2012 00:29:32 - Error bpbrm (pid=25184) from client bknhicodchv05: ERR - failure reading file: H:\NHICODMGMT06\NHICODMGMT06\Snapshots\7EF3FA58-B9A2-40CE-8B0A-FAD7341AE60D\NHICODMGMT06-D_D2E986C7-6A9C-4B8D-98A2-8EE89C9BEB2F.avhd (WIN32 2: The system cannot find the file specified. )
06/08/2012 00:29:33 - Error bpbrm (pid=25184) from client bknhicodchv05: ERR - Snapshot Error while reading file: GLOBALROOT\Device\HarddiskVolumeShadowCopy151\NHICODMGMT06\NHICODMGMT06\Snapshots\7EF3FA58-B9A2-40CE-8B0A-FAD7341AE60D\NHICODMGMT06-D_D2E986C7-6A9C-4B8D-98A2-8EE89C9BEB2F.avhd
06/08/2012 00:29:33 - Critical bpbrm (pid=25184) from client bknhicodchv05: FTL - Backup operation aborted!
06/08/2012 00:29:35 - Error bptm (pid=25240) media manager terminated by parent process
06/08/2012 00:30:54 - Error bpbrm (pid=25184) could not send server status message
06/08/2012 00:30:57 - Info bpbkar (pid=0) done. status: 156: snapshot error encountered
snapshot error encountered (156)
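To see the failure at a glance, the Error and Critical entries can be filtered out of a copy of the job details. A minimal Python sketch, assuming the lines keep the `MM/DD/YYYY HH:MM:SS - Severity process ...` layout shown above; `failure_lines` and `SAMPLE` are illustrative names, and the sample is an excerpt of this log:

```python
import re
from typing import List, Tuple

# Matches job-detail lines such as
# "06/08/2012 00:29:32 - Error bpbrm (pid=25184) from client ...".
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (Error|Critical) (\S+) (.*)$"
)


def failure_lines(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (timestamp, severity, process, message) for each Error/Critical line."""
    hits = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hits.append(m.groups())
    return hits


# Excerpt of the job details above (illustrative sample only).
SAMPLE = """\
06/08/2012 00:25:29 - end writing; write time: 1:30:18
06/08/2012 00:29:33 - Critical bpbrm (pid=25184) from client bknhicodchv05: FTL - Backup operation aborted!
06/08/2012 00:29:35 - Error bptm (pid=25240) media manager terminated by parent process
"""

for ts, severity, process, message in failure_lines(SAMPLE):
    print(ts, severity, process, message)
```

In this log the first Error line (the bpbrm "failure reading file" message) is the one that points at the root cause; the later bptm and bpbrm errors follow from the abort.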
06-08-2012 08:41 AM
For open file backups, make sure the client is configured to use VSS.
You can find the setting under Client Attributes in the master server's Host Properties.
06-08-2012 08:57 AM
Status 156 means that NetBackup had a problem creating the snapshot.
Check the bpfis logs on the client to troubleshoot it. For a VMware backup, the bpfis process runs on the VMware backup host instead, so check the logs there. You will find the cause in those logs.
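NetBackup legacy debug logs are only written for a process when its folder exists under the client's `logs` directory, so the bpfis (and, for the read side, bpbkar) folders usually have to be created first. A minimal Python sketch, assuming the default Windows install path `C:\Program Files\Veritas\NetBackup` (adjust it to the actual install); `ensure_log_dirs` is a hypothetical helper name:

```python
from pathlib import Path
from typing import Iterable, List

# ASSUMPTION: default install path on a Windows client; adjust to your install.
DEFAULT_LOG_ROOT = Path(r"C:\Program Files\Veritas\NetBackup\logs")


def ensure_log_dirs(log_root: Path, processes: Iterable[str]) -> List[Path]:
    """Create one legacy debug-log folder per NetBackup process name.

    log_root must already exist, so a wrong install path fails loudly
    (FileNotFoundError) instead of silently creating a stray directory tree.
    """
    created = []
    for name in processes:
        folder = log_root / name
        folder.mkdir(exist_ok=True)  # no error if the folder is already there
        created.append(folder)
    return created
```

On the client this would be called as `ensure_log_dirs(DEFAULT_LOG_ROOT, ["bpfis", "bpbkar"])` before rerunning the backup; creating the same folders by hand in Explorer works just as well.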
06-08-2012 12:18 PM
I added these servers to the client list and set them to use VSS as you described. Will the NBU clients on those servers need to be restarted for this setting to take effect?
06-08-2012 01:41 PM
There are a number of reasons a VM snapshot can fail. Please check the official list and let us know if you find anything that might help.
http://www.symantec.com/business/support/index?page=content&id=HOWTO44492
Regards.
06-09-2012 02:35 PM
Neither the client nor the master server services need to be restarted for this to take effect.
06-10-2012 05:21 AM
Can you check the bpbkar logs on the client? They might help.
Also check whether antivirus software is accessing the files at the time of the backup.
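Once bpbkar logging is on, the quickest check is to search the day's log for the .avhd file named in the error. A minimal Python sketch, assuming the bpbkar legacy log files sit in one folder and end in `.log`; `find_mentions` is a hypothetical helper, not a NetBackup tool:

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_mentions(log_dir: Path, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (log file name, line number, line) for each line containing needle.

    Matching is case-insensitive because Windows paths are. Undecodable bytes
    are replaced rather than raising, since log encodings vary.
    """
    target = needle.lower()
    for log_file in sorted(log_dir.glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if target in line.lower():
                    yield log_file.name, lineno, line.rstrip("\r\n")
```

For example, passing the bpbkar log folder and `NHICODMGMT06-D_D2E986C7-6A9C-4B8D-98A2-8EE89C9BEB2F.avhd` lists every line that mentions the file, which shows what bpbkar was doing just before the read failed.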
06-10-2012 07:32 AM
Please help us understand better:
How is the backup job configured: NBU client software installed on the VM, or VM backup via a backup host?
What is the exact NBU version on the master, media server, backup host and/or client?
What is the OS version on the master, media server, backup host and/or client?
Please show us the policy configuration. On the master, run the following command from cmd:
(....\veritas\netbackup\bin\admincmd) > bppllist NHIC-HOST -L
06-13-2012 06:07 AM
I added these two clients under the Client Attributes section as you mentioned and made sure they were set to VSS. One of the two (the SQL server) seems to be backing up fine now; the other (the Windows VM host server) is still getting the status 156 error.
06-13-2012 06:24 AM
If this is a NetBackup client installed on a server hosted on a VM, then VMware Tools interacts with the snapshot creation / VSS.
If there are applications running on the client, edit
C:\ProgramData\VMware\VMware Tools\tools.conf
and add the following section:
[vmbackup]
vss.disableAppQuiescing = true
This excludes application quiescing during a file system backup.
Hope this helps
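If the change needs to be scripted across several guests, the same section can be written with Python's standard `configparser`. A minimal sketch, assuming the existing tools.conf is plain INI (every key under a section header) and that losing its comments is acceptable, since `ConfigParser.write()` does not preserve them; back the file up first. `disable_app_quiescing` is a hypothetical helper name:

```python
import configparser
from pathlib import Path

# Path quoted in the post above; it applies to a Windows guest with VMware Tools.
TOOLS_CONF = Path(r"C:\ProgramData\VMware\VMware Tools\tools.conf")


def disable_app_quiescing(conf_path: Path) -> None:
    """Ensure tools.conf contains [vmbackup] vss.disableAppQuiescing = true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; the default lower-cases keys
    if conf_path.exists():
        parser.read(conf_path, encoding="utf-8")
    if not parser.has_section("vmbackup"):
        parser.add_section("vmbackup")
    parser.set("vmbackup", "vss.disableAppQuiescing", "true")
    # Caveat: ConfigParser.write() does not preserve comments from the old file.
    with conf_path.open("w", encoding="utf-8") as fh:
        parser.write(fh)
```

Setting `optionxform = str` matters here: without it the key would be written as `vss.disableappquiescing`.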
06-13-2012 06:46 AM
The server still getting the 156 errors is a Windows server acting as a VM host (not a guest VM), with the NBU client installed on the host. It appears to back up fine until it reaches the following two files:
1 - H:\NHICODMGMT06\NHICODMGMT06\Snapshots\7EF3FA58-B9A2-40CE-8B0A-FAD7341AE60D\NHICODMGMT06-D_D2E986C7-6A9C-4B8D-98A2-8EE89C9BEB2F.avhd
2 - GLOBALROOT\Device\HarddiskVolumeShadowCopyxx\NHICODMGMT06\NHICODMGMT06\Snapshots\7EF3FA58-B9A2-40CE-8B0A-FAD7341AE60D\NHICODMGMT06-D_D2E986C7-6A9C-4B8D-98A2-8EE89C9BEB2F.avhd (the number represented by xx in "HarddiskVolumeShadowCopyxx" varies between backup jobs, but the .avhd file and remainder of the path are the same).
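The two paths above are most likely the same file seen two ways: once by its drive letter and once through the VSS shadow copy device, whose `HarddiskVolumeShadowCopy` number changes every job. A minimal Python sketch (the helper name `volume_relative` is hypothetical) that strips either prefix so the two can be compared; it assumes the shadow copy is of the H: volume, which the log does not state outright:

```python
import re

# Device prefix VSS gives a shadow copy; the trailing number differs each job.
SHADOW_PREFIX = re.compile(
    r"^GLOBALROOT\\Device\\HarddiskVolumeShadowCopy\d+\\", re.IGNORECASE
)
DRIVE_PREFIX = re.compile(r"^[A-Za-z]:\\")


def volume_relative(path: str) -> str:
    """Drop a shadow-copy device prefix or a drive letter from a Windows path."""
    without_shadow = SHADOW_PREFIX.sub("", path, count=1)
    if without_shadow != path:
        return without_shadow
    return DRIVE_PREFIX.sub("", path, count=1)
```

Both error lines reduce to the same `NHICODMGMT06\NHICODMGMT06\Snapshots\...\*.avhd` path, which suggests that a single Hyper-V snapshot file is the problem rather than two separate failures.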
Master server info: SunOS 5.10, NBU 7.1.0.3
Media server info: SunOS 5.10, NBU 7.1.0.2
Client info: Windows Server 2008, NBU client 7.0
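These versions can be checked against NetBackup's general back-level rule: the master at the same or a later version than the media servers, and the media servers at the same or a later version than the clients (the compatibility guide for the exact release is authoritative). A minimal Python sketch; `parse_version` and `check_order` are hypothetical helpers, and versions are padded to four parts so that "7.0" compares like "7.0.0.0":

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted release such as '7.1.0.3' into (7, 1, 0, 3), padded to 4 parts."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def check_order(master: str, media: str, client: str) -> Dict[str, bool]:
    """Report whether master >= media server >= client, compared numerically."""
    m, s, c = (parse_version(v) for v in (master, media, client))
    return {"master >= media": m >= s, "media >= client": s >= c}


print(check_order("7.1.0.3", "7.1.0.2", "7.0"))
```

For the versions listed here both checks come out True, so the mix is at least ordered correctly; this does not show whether the 7.0 client is the right level for this workload.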
In which directory on a UNIX server would \veritas\netbackup\bin\admincmd reside?
06-13-2012 07:44 AM
/usr/openv/netbackup/bin/admincmd/
On the VM host, try running:
bpfis query
This shows all of the snapshots that have been created and their IDs (xxxxx).
You can then run:
bpfis delete -id xxxxx
to clear them out. See if that helps.
06-13-2012 11:10 AM
Policy Name: NHIC-HOST
Options: 0x0
template: FALSE
audit_reason: ?
Names: (none)
Policy Type: MS-Windows (13)
Active: yes
Effective date: 02/03/1991 14:25:54
Client Compress: no
Follow NFS Mnts: no
Backup netwrk drvs:no
Collect TIR info: no
Mult. Data Stream: no
Perform Snapshot Backup: no
Snapshot Method: (none)
Snapshot Method Arguments: (none)
Perform Offhost Backup: no
Backup Copy: 0
Use Data Mover: no
Data Mover Type: -1
Use Alternate Client: no
Alternate Client Name: (none)
Use Virtual Machine: 0
Hyper-V Server Name: (none)
Enable Instant Recovery: no
Policy Priority: 1000
Max Jobs/Policy: Unlimited
Disaster Recovery: 0
Collect BMR Info: no
Keyword: <mseo>KeyGroup=NHICGRP;KeyType=aes256;compress=none;</mseo>
Data Classification: -
Residence is Storage Lifecycle Policy: no
Client Encrypt: no
Checkpoint: no
Residence: bkbugatti-hcart-robot-tld-0
Volume Pool: NHIC
Server Group: *ANY*
Granular Restore Info: no
Exchange Source attributes: no
Exchange 2010 Preferred Server: (none defined)
Application Discovery: no
Discovery Lifetime: 0 seconds
Generation: 40
Ignore Client Direct: no
Client/HW/OS/Pri: bknhicodchv05 Windows-x64 Windows2008 0 0 0 0 ?
Client/HW/OS/Pri: bknhicodchv06 Windows-x64 Windows2008 0 0 0 0 ?
Client/HW/OS/Pri: bknhicodchv07 Windows-x64 Windows2008 0 0 0 0 ?
Client/HW/OS/Pri: bknhicodchv08 Windows-x64 Windows2008 0 0 0 0 ?
Client/HW/OS/Pri: bknhicodchv09 Windows-x64 Windows2008 0 0 0 0 ?
Include: ALL_LOCAL_DRIVES
Schedule: Full
Type: FULL (0)
Frequency: 1 day(s) (86400 seconds)
Maximum MPX: 1
Synthetic: 0
PFI Recovery: 0
Retention Level: 10 (4 months)
u-wind/o/d: 0 0
Incr Type: DELTA (0)
Alt Read Host: (none defined)
Max Frag Size: 0 MB
Number Copies: 1
Fail on Error: 0
Residence: (specific storage unit not required)
Volume Pool: (same as policy volume pool)
Server Group: (same as specified for policy)
Residence is Storage Lifecycle Policy: 0
Daily Windows:
Day Open Close W-Open W-Close
Sunday 018:00:00 030:00:00 018:00:00 030:00:00
Monday 018:00:00 030:00:00 042:00:00 054:00:00
Tuesday 018:00:00 030:00:00 066:00:00 078:00:00
Wednesday 018:00:00 030:00:00 090:00:00 102:00:00
Thursday 018:00:00 030:00:00 114:00:00 126:00:00
Friday 018:00:00 030:00:00 138:00:00 150:00:00
Saturday 018:00:00 030:00:00 162:00:00 174:00:00 006:00:00
Schedule: Full-monthly
Type: FULL (0)
Calendar sched: Enabled
Day 1 of month
Maximum MPX: 1
Synthetic: 0
PFI Recovery: 0
Retention Level: 8 (1 year) 9 (infinity)
u-wind/o/d: 0 0
Incr Type: DELTA (0)
Alt Read Host: (none defined)
Max Frag Size: 0 MB
Number Copies: 2
Fail on Error: 0 0
Residence: bkbugatti-hcart-robot-tld-0 bkbugatti-hcart-robot-tld-0
Volume Pool: NHIC NHIC
Server Group: *ANY* *ANY*
Residence is Storage Lifecycle Policy: 0
Daily Windows:
Day Open Close W-Open W-Close
Sunday 000:00:00 168:00:00 000:00:00 168:00:00
Monday 000:00:00 000:00:00
Tuesday 000:00:00 000:00:00
Wednesday 000:00:00 000:00:00
Thursday 000:00:00 000:00:00
Friday 000:00:00 000:00:00
Saturday 000:00:00 000:00:00
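The fields that matter for this problem (policy type, whether snapshot backup is enabled, which clients are listed) can be read straight off this listing. A minimal Python sketch, assuming the `bppllist NHIC-HOST -L` output was saved as text; `parse_policy` is a hypothetical helper and the sample is an excerpt of the listing above:

```python
from typing import Dict, List


def parse_policy(text: str) -> Dict[str, List[str]]:
    """Collect 'Key: value' lines from `bppllist <policy> -L` output.

    Keys such as 'Client/HW/OS/Pri' repeat, so each key maps to a list.
    Schedule window rows also contain colons and produce odd keys; callers
    simply look up the fields they care about.
    """
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Excerpt of the listing above (illustrative sample only).
SAMPLE = """\
Policy Name:       NHIC-HOST
Policy Type:         MS-Windows (13)
Perform Snapshot Backup: no
Include:  ALL_LOCAL_DRIVES
Client/HW/OS/Pri: bknhicodchv05 Windows-x64 Windows2008 0 0 0 0 ?
Client/HW/OS/Pri: bknhicodchv06 Windows-x64 Windows2008 0 0 0 0 ?
"""

policy = parse_policy(SAMPLE)
for key in ("Policy Type", "Perform Snapshot Backup", "Include"):
    print(f"{key}: {policy.get(key, ['(missing)'])[0]}")
print("Clients:", [row.split()[0] for row in policy.get("Client/HW/OS/Pri", [])])
```

Here the sketch confirms a plain MS-Windows policy with snapshot backup off and ALL_LOCAL_DRIVES selected, so VSS use comes from the client's open file backup setting rather than from a policy snapshot method.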
06-13-2012 01:14 PM
Looking at your policy output, it seems you are NOT using a Windows backup host to perform VMware backups?
Your policy type is MS-Windows, which points to a 'normal' client backup.
Is the NBU agent (client software) installed in each guest VM?
Why 7.0 on the W2008 clients when the rest of the environment is 7.1?
Which version of W2008 is on each of the clients listed in the policy?
Are all of them failing with 156? (The job details in the opening post show bknhicodchv05.)