Re: 4204: Incompatible client found

ianhoskins · ‎03-01-2016

I have some VMs failing with the error 4204: Incompatible client found.

The final status error is 6.
If the job is re-run, it runs fine with no issues.

vCenter is clean: no errors, no orphaned snapshots.

Thoughts?


03/01/2016 00:02:02 - Info nbjm (pid=2896) starting backup job (jobid=1162384) for client NTINFASGP01, policy g_cml_prod_vm_zone4_w02, schedule Daily_Inc_Mon-Fri_0000
03/01/2016 00:03:09 - Info bpbrm (pid=1431) NTINFASGP01 is the host to backup data from
03/01/2016 00:03:09 - Info bpbrm (pid=1431) reading file list for client
03/01/2016 00:03:09 - Info bpbrm (pid=1431) starting bpbkar on client
03/01/2016 00:03:09 - Info bpbkar (pid=1434) Backup started
03/01/2016 00:03:09 - Info bpbrm (pid=1431) bptm pid: 1435
03/01/2016 00:03:09 - estimated 7376359 kbytes needed
03/01/2016 00:03:09 - Info nbjm (pid=2896) started backup (backupid=NTINFASGP01_1456808589) job for client NTINFASGP01, policy g_cml_prod_vm_zone4_w02, schedule Daily_Inc_Mon-Fri_0000 on storage unit cml_copy1 using backup host lxnbumacmlp02.conseco.bak
03/01/2016 00:03:09 - started process bpbrm (pid=1431)
03/01/2016 00:03:09 - connecting
03/01/2016 00:03:09 - connected; connect time: 0:00:00
03/01/2016 00:03:10 - Info bptm (pid=1435) start
03/01/2016 00:03:10 - Info bptm (pid=1435) using 262144 data buffer size
03/01/2016 00:03:10 - Info bptm (pid=1435) using 30 data buffers
03/01/2016 00:03:10 - Info bptm (pid=1435) start backup
03/01/2016 00:03:10 - begin writing
03/01/2016 00:03:32 - Error bpbrm (pid=1431) from client NTINFASGP01: ERR - Error opening the snapshot disks using given transport mode: san Status 23

03/01/2016 00:03:33 - Critical bpbrm (pid=1431) from client NTINFASGP01: FTL - cleanup() failed, status 6

03/01/2016 00:03:35 - Error bptm (pid=1435) media manager terminated by parent process
03/01/2016 00:03:39 - Info bpbkar (pid=0) done. status: 6: the backup failed to back up the requested files
03/01/2016 00:03:39 - end writing; write time: 0:00:29
the backup failed to back up the requested files  (6)


03/01/2016 00:01:11 - Info nbjm (pid=2896) starting backup job (jobid=1162187) for client NTINFASGP01, policy g_cml_prod_vm_zone4_w02, schedule Daily_Inc_Mon-Fri_0000
03/01/2016 00:01:11 - Info nbjm (pid=2896) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1162187, request id:{9E327C90-DF6A-11E5-B05E-EC80692BC3F8})
03/01/2016 00:01:11 - requesting resource cml_copy1
03/01/2016 00:01:11 - requesting resource lxnbumsp01.conseco.bak.NBU_CLIENT.MAXJOBS.NTINFASGP01
03/01/2016 00:01:11 - requesting resource lxnbumsp01.conseco.bak.VMware.Datastore.vavscmlp01.conseco.ad/CNO
03/01/2016 00:01:11 - requesting resource lxnbumsp01.conseco.bak.VMware.ESXserver.esxntcml03.cnoinc.com
03/01/2016 00:01:11 - requesting resource lxnbumsp01.conseco.bak.VMware.vCenter.vavscmlp01.conseco.ad
03/01/2016 00:01:11 - requesting resource lxnbumsp01.conseco.bak.VMware.snapshot.vCenter.vavscmlp01.conseco.ad
03/01/2016 00:01:17 - granted resource  lxnbumsp01.conseco.bak.NBU_CLIENT.MAXJOBS.NTINFASGP01
03/01/2016 00:01:17 - granted resource  lxnbumsp01.conseco.bak.VMware.Datastore.vavscmlp01.conseco.ad/CNO vSphereCML/vmp-phi-319
03/01/2016 00:01:17 - granted resource  lxnbumsp01.conseco.bak.VMware.ESXserver.esxntcml03.cnoinc.com
03/01/2016 00:01:17 - granted resource  lxnbumsp01.conseco.bak.VMware.vCenter.vavscmlp01.conseco.ad
03/01/2016 00:01:17 - granted resource  lxnbumsp01.conseco.bak.VMware.snapshot.vCenter.vavscmlp01.conseco.ad
03/01/2016 00:01:17 - granted resource  MediaID=@aaaad;DiskVolume=cml_copy1;DiskPool=cml_copy1;Path=cml_copy1;StorageServer=ptgwcml01;MediaServer=lxnbumacmlp02.conseco.bak
03/01/2016 00:01:17 - granted resource  cml_copy1
03/01/2016 00:01:51 - Info bpbrm (pid=32729) NTINFASGP01 is the host to backup data from
03/01/2016 00:01:51 - Info bpbrm (pid=32729) reading file list for client
03/01/2016 00:01:51 - Info bpbrm (pid=32729) start bpfis on client
03/01/2016 00:01:51 - Info bpbrm (pid=32729) Starting create snapshot processing
03/01/2016 00:01:51 - Info bpfis (pid=32750) Backup started
03/01/2016 00:01:51 - snapshot backup of client NTINFASGP01 using method VMware_v2
03/01/2016 00:01:51 - estimated 7376359 kbytes needed
03/01/2016 00:01:51 - begin Parent Job
03/01/2016 00:01:51 - begin Application Snapshot: Step By Condition
Operation Status: 0
03/01/2016 00:01:51 - end Application Snapshot: Step By Condition; elapsed time 0:00:00
03/01/2016 00:01:51 - begin Application Snapshot: Read File List
Operation Status: 0
03/01/2016 00:01:51 - end Application Snapshot: Read File List; elapsed time 0:00:00
03/01/2016 00:01:51 - begin Application Snapshot: Create Snapshot
03/01/2016 00:01:51 - started process bpbrm (pid=32729)
03/01/2016 00:02:02 - end Application Snapshot: Create Snapshot; elapsed time 0:00:11
03/01/2016 00:02:02 - Info bpfis (pid=32750) done. status: 0
03/01/2016 00:02:02 - Info bpfis (pid=32750) done. status: 0: the requested operation was successfully completed
03/01/2016 00:02:02 - end writing
Operation Status: 0
03/01/2016 00:02:02 - end Parent Job; elapsed time 0:00:11
03/01/2016 00:02:02 - Info nbjm (pid=2896) snapshotid=NTINFASGP01_1456808511
03/01/2016 00:02:02 - begin Application Snapshot: Policy Execution Manager Preprocessed
03/01/2016 00:03:39 - Info bpbrm (pid=1834) Starting delete snapshot processing
03/01/2016 00:03:39 - end Application Snapshot: Policy Execution Manager Preprocessed; elapsed time 0:01:37
03/01/2016 00:03:39 - begin Application Snapshot: Stop On Error
Operation Status: 0
03/01/2016 00:03:39 - end Application Snapshot: Stop On Error; elapsed time 0:00:00
03/01/2016 00:03:39 - begin Application Snapshot: Cleanup Resources
03/01/2016 00:03:39 - requesting resource lxnbumsp01.conseco.bak.VMware.snapshot.vCenter.vavscmlp01.conseco.ad
03/01/2016 00:03:39 - granted resource  lxnbumsp01.conseco.bak.VMware.snapshot.vCenter.vavscmlp01.conseco.ad
Operation Status: 0
03/01/2016 00:03:39 - end Application Snapshot: Cleanup Resources; elapsed time 0:00:00
03/01/2016 00:03:39 - begin Application Snapshot: Delete Snapshot
03/01/2016 00:03:39 - started process bpbrm (pid=1834)
03/01/2016 00:03:40 - Info bpfis (pid=1838) Backup started
Operation Status: 6

03/01/2016 00:03:45 - Info bpbrm (pid=1834) INF - vmwareLogger: LoginAPI: SYM_VMC_ERROR:  SOAP_ERROR
03/01/2016 00:03:45 - Info bpbrm (pid=1834) INF - vmwareLogger: SOAP 1.1 fault: "":ServerFaultCode [no subcode]
03/01/2016 00:03:45 - Critical bpbrm (pid=1834) from client NTINFASGP01: vfm_thaw: method: VMware_v2, type: FIM, function: VMware_v2_thaw
03/01/2016 00:03:45 - Critical bpbrm (pid=1834) from client NTINFASGP01: snapshot delete returned status 4204

03/01/2016 00:03:45 - Info bpfis (pid=1838) done. status: 4204

03/01/2016 00:03:45 - end Application Snapshot: Delete Snapshot; elapsed time 0:00:06
03/01/2016 00:03:45 - Info bpfis (pid=1838) done. status: 4204: Incompatible client found
03/01/2016 00:03:45 - end writing
Operation Status: 4204

Operation Status: 6

03/01/2016 00:03:45 - Info bpbrm (pid=1933) Starting delete snapshot processing
03/01/2016 00:03:45 - Info bpfis (pid=1936) Backup started
03/01/2016 00:03:45 - Warning bpbrm (pid=1933) from client NTINFASGP01: Stream g_cml_prod_vm_zone4_w02+d32750+1 pid 1838 is not active.
03/01/2016 00:05:04 - Info bpfis (pid=1936) done. status: 0
03/01/2016 00:05:04 - Info bpfis (pid=1936) done. status: 0: the requested operation was successfully completed
the backup failed to back up the requested files  (6)

Mark_Solutions · ‎03-01-2016

This looks like it relates to the application state part of the policy, perhaps SQL, Exchange or SharePoint.

The error is telling you that the NetBackup client installed on your VM is not of a version that allows the application to be backed up using a VMware policy.

Check the version of the client and upgrade / patch as needed, and also make sure that the Symantec VSS provider is installed on it.

Let us know what you find, as I have also seen this error with other issues, including (if I remember correctly) trying to do an incremental backup when doing SQL backups of a VM, which is not supported but gives a fairly useless error message.

ianhoskins · ‎03-01-2016

Unfortunately none of these servers have any kind of "application" that would need the NBU client.
Plus I can re-run the job minutes later and it is successful.

We don't install the NBU client on VMs anymore and don't use the Symantec VSS provider either.
From what I have read in the past, the NBU client on VMs is for application backups and for restoring files, which we don't do directly back to VMs.

Mark_Solutions · ‎03-01-2016

Are all of the VMs on the same datastore / area within vcenter?

Just wondering if this could be a permissions issue?

If they work when you run them manually, they could be taking your credentials into account somewhere, whereas scheduled backups use the registered account.

So do they ever work when running from a scheduled backup?

ianhoskins · ‎03-01-2016

Yep, credentials are fine. I have other VMs on the same datastores, ESX servers and vCenter that all work great.
When they start via schedule I see the account in VMware running the commands against the VM. It's the same account if I run the backup manually.

Here is the kicker: it is intermittent. Tonight's backup run could run just fine. It's never the same servers.

I have about 65 GB of VxMS logs that Veritas is trying to sift through... no luck yet!

ianhoskins · ‎03-01-2016

Some more background, and maybe a VMware expert can chime in here.

When the admins decided on vCenter 6.0 they separated some functions off of vCenter using the External Services Platform. I am told this takes care of licensing, authentication, and some other items that I am not sure of, to free up resources on vCenter. Everything backed up fine on vCenter 6.0. Months later the admins went to vCenter 6.0u1b. After that upgrade we started seeing these random "6" errors. As I said in previous posts, they are random, not the same machine every time, and we could go days at a time without seeing any errors.

VMware was involved and had the admins add more memory and CPU to the vCenter because of some other issue they were seeing due to performance collection. At that time the admins also decided to pin the vCenter and External Services Platform to the same ESX host to see if that would take care of any issues. Being pinned on the same ESX host would mean a 40 Gb interconnect instead of a 10 Gb interconnect between the ESX hosts. When the 2 VMware servers were pinned we didn't see any backup failures.

I just found out that over the weekend the rule keeping those 2 servers pinned to the same host was lost due to a migration, meaning the 2 VMware servers were on separate ESX hosts, hence 2 days of random NetBackup VMware failures.

The admins have once again pinned the machines so we will see what happens tonight.

ianhoskins · ‎03-02-2016

Update: Pinning the vCenter and External Services Platform did not help last night.

Michal_Mikulik1 · ‎03-02-2016

The key message here is this error in the backup-type job:

ERR - Error opening the snapshot disks using given transport mode: san Status 23

Is this error always present in the failing backup jobs?

There could still be other causes. Enable VxMS logging, as it could tell you more (especially when the error is random).

Rgds

Michal
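
To check whether the Status 23 message really shows up in every failing job, the saved job details can be scanned for it. This is a minimal Python sketch, assuming the job details are available as plain text in the same format as the logs posted earlier in this thread; the function names are hypothetical and not part of NetBackup.

```python
import re

# Matches job detail lines of the form seen in this thread, e.g.
# "03/01/2016 00:03:32 - Error bpbrm (pid=1431) from client ...".
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(?P<sev>Error|Critical) (?P<proc>\w+) \(pid=(?P<pid>\d+)\) (?P<msg>.*)$"
)

def failure_lines(job_details: str):
    """Return (timestamp, severity, process, message) for Error/Critical lines."""
    hits = []
    for line in job_details.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hits.append((m["ts"], m["sev"], m["proc"], m["msg"]))
    return hits

def has_san_status_23(job_details: str) -> bool:
    """True if the log contains the SAN transport 'Status 23' error."""
    return any(
        "transport mode: san Status 23" in msg
        for _, _, _, msg in failure_lines(job_details)
    )

# Excerpt copied from the failing backup job posted above.
sample = """\
03/01/2016 00:03:10 - begin writing
03/01/2016 00:03:32 - Error bpbrm (pid=1431) from client NTINFASGP01: ERR - Error opening the snapshot disks using given transport mode: san Status 23
03/01/2016 00:03:33 - Critical bpbrm (pid=1431) from client NTINFASGP01: FTL - cleanup() failed, status 6
03/01/2016 00:03:35 - Error bptm (pid=1435) media manager terminated by parent process
"""

for ts, sev, proc, msg in failure_lines(sample):
    print(ts, sev, proc, msg)
print("SAN status 23 present:", has_san_status_23(sample))
```

Running this over the job details of each failed job would show quickly whether the SAN transport error is common to all of them or only to some.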

Mark_Solutions · ‎03-03-2016

Other thoughts ....

1. Could you be running too many jobs at the same time? This can cause vMotion to kick in due to the load, and then VMs aren't where they should be (where they were when NBU did its scan, every 8 hours by default), so the disks cannot be found.

2. Too many jobs could just overload it anyway. Keep the numbers down; you actually tend to get better performance.

3. If there is a lot of vMotion then you could try reducing the scan interval to a couple of hours (set on the Clients tab of each policy). This adds a bit of extra workload to the VMware backup hosts but may help keep it up to date and everything in the right place.

The 23 error is annoying as it isn't always what it seems. I have a client with it now: it had 2 new disks added, one of which was left raw but online, so when it does the disk mapping it throws an error, which then causes it to fail with a 23. Even after removing that disk it still failed, as it was holding the configuration and trying to scan a disk that didn't exist. It can also happen due to permissions, vMotion moving things around, clients needing consolidation ....
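
The vMotion theory above can be tested by comparing where each VM was at the last scan with where it is now. The following Python sketch assumes you can export both VM-to-host mappings yourself (for example from vCenter reports); the dictionaries, host names and function names are hypothetical, and only the 8 hour default comes from this thread.

```python
from datetime import datetime, timedelta

def moved_vms(cached: dict, current: dict) -> dict:
    """Map VM name -> (cached host, current host) for VMs whose host changed."""
    return {
        vm: (host, current[vm])
        for vm, host in cached.items()
        if vm in current and current[vm] != host
    }

def cache_is_stale(cached_at: datetime, now: datetime,
                   max_age_hours: float = 8.0) -> bool:
    """True if the cached scan is older than the scan interval."""
    return now - cached_at > timedelta(hours=max_age_hours)

# Hypothetical data: VM placement at the last scan vs. placement right now.
cached = {"vm-a": "esx-01", "vm-b": "esx-02", "vm-c": "esx-03"}
current = {"vm-a": "esx-01", "vm-b": "esx-04", "vm-c": "esx-03"}

print(moved_vms(cached, current))
print(cache_is_stale(datetime(2016, 3, 1, 0, 0), datetime(2016, 3, 1, 9, 30)))
```

If the VMs that failed overnight are consistently the ones that moved between scans, that would support shortening the scan interval.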

ianhoskins · ‎03-03-2016

Interesting thought on vMotion moving the VMs so the database is not up to date.
That might explain why a manual backup runs fine.

On your point about too many jobs overloading vCenter... what's too many jobs? I see the best practices just say to run 1 or 2 VM backups per datastore at a time.

mkotan · ‎03-03-2016

What do you have your resource limits set at? You can go to Host Properties -> Master Server and it's at the bottom.

How many snapshots are you allowing at once and how many per datastore?

ianhoskins · ‎03-04-2016

We ran fine with a resource limit of 3 per datastore until the 6.0u1b upgrade. This would allow about 100-150 VMs to back up at a time across 3 media agents.
Last night we set vCenter and snapshot to 20 and datastore to 2 and still see errors.
Veritas is now coming back saying that u1b is not a supported environment after they had OK'd it.
In fact, according to their latest emails, they are saying that no "letter patches" (i.e. 5.1u1b) have ever been supported by Veritas, as they are not on their compatibility matrix (5.1u1 is supported).
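
To put the resource limits mentioned above in perspective, here is a rough Python sketch of the upper bound on simultaneous VM backups. It assumes each limit acts as an independent cap, so the tightest one wins; the datastore count of 40 is a hypothetical figure chosen to land in the 100-150 range quoted above, and the function is illustrative, not a NetBackup API.

```python
from typing import Optional

def max_concurrent_backups(datastore_limit: int,
                           datastores_in_use: int,
                           vcenter_limit: Optional[int] = None,
                           snapshot_limit: Optional[int] = None) -> int:
    """Upper bound on simultaneous VM backups if every limit is an independent cap."""
    caps = [datastore_limit * datastores_in_use]
    for cap in (vcenter_limit, snapshot_limit):
        if cap is not None:  # None means the limit is not set
            caps.append(cap)
    return min(caps)

DATASTORES = 40  # hypothetical number of datastores with VMs in the backup window

# Original setting: 3 per datastore, no vCenter or snapshot limit.
before = max_concurrent_backups(3, DATASTORES)

# New setting: vCenter 20, snapshot 20, datastore 2.
after = max_concurrent_backups(2, DATASTORES, vcenter_limit=20, snapshot_limit=20)

print(before, after)  # prints "120 20"
```

Under these assumptions the new settings cut concurrency from roughly 120 jobs to 20, so errors that persist at that level are less likely to be a simple overload problem.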

ianhoskins · ‎03-04-2016

Per the Veritas PM for VMware:

Just wanted to get us all on the same page. I checked with our Engineering team on this issue and here is their response:

“The issue with 5.5 update 3b was the disabling of SSLv3. This was done for 6.0 GA and required us to pick up VDDK 5.5.4 in 7.6.1.1. So 6.0 and later updates is already covered. There were a couple of things we had to verify in 6.0 u1a and there were no issues in u1b.”

In short, 6.0 U1b is supported.

A_K1 · ‎03-29-2016

I observed the same phenomenon in one of my customers' environments. I'm still investigating it, but after the Easter holidays it was gone. Did you get rid of it?
