
VMware backup of SQL fails to run ASC

CadenL
Moderator
Partner    VIP    Accredited Certified

Hi

I've configured a VMware backup of a single SQL virtual machine running SQL2008 on a Windows 2008 platform. I've also installed the Veritas VSS provider.

The Application State Capture (ASC) job is failing with the following error:

Error nbpem (pid=3288) Cannot perform application state capture because a client name cannot be determined for <client FQDN>

It looks identical to the following post, as far as I can tell:

https://vox.veritas.com/t5/NetBackup/MSSQL-VMware-type-backup-error-client-name-cannot-be-determined...

The name resolution seems to be good from both Master server (RH Linux)  and the media server (NBU 5240 Appliance) - both are running NBU 8.1 and are ok with the bpclntcmd commands. I have tried creating host entries on all the machines but get the same error.
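
A minimal sketch, assuming Python is available on the master or media server, of cross-checking forward and reverse lookups for the client alongside the bpclntcmd checks. The function name and the example hostname are hypothetical, and this only exercises the operating system resolver, so it is a complement to bpclntcmd rather than a replacement for it.

```python
import socket


def check_name_resolution(hostname,
                          resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, reverse-resolve that IP, and compare.

    Returns (ip, reverse_name, consistent). The resolver functions are
    parameters so the check can be exercised without a live DNS server.
    """
    ip = resolve(hostname)
    reverse_name = reverse(ip)[0]  # primary name returned for the IP
    # Compare case-insensitively and ignore a trailing dot on either side.
    consistent = (reverse_name.rstrip(".").lower()
                  == hostname.rstrip(".").lower())
    return ip, reverse_name, consistent


# Example call (hypothetical FQDN, performs real lookups):
# print(check_name_resolution("mysqlclient.example.com"))
```

If the last element comes back False (for example the reverse lookup returns only the short name), that mismatch is worth ruling out before digging into the logs.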

I have also noticed the following. The vmtools was showing 'not running' on the SQL VM, so I rebooted and it started running once booted, but it seemed to change back to 'not running' when I tried a test backup. The vmtools version is 10.1.15

Within the netbackup policy, if I browse to the vm instead of doing a query I can see 'DNS name' is blank but the IP details are there. I reinstalled the vmtools (which I assume overwrites the Veritas VSS provider?) and this info came back. 

If I set the query to find the VM by displayname, it finds the VM ok and the backup of the actual VM completes ok, but I get the ASC error. If I set the query to find the VM by vm hostname, it finds the VM ok but the job fails straight away with an error 200 in Activity Monitor.

I guess I need the ASC bit working so I can do the SQL GRT but perhaps I don't need the Veritas VSS providers as it sounds like this will only allow me to truncate the logs (and I can do that by other methods).

Amazingly this did work once only - the very first time I tried it having just installed the Veritas VSS provider - but failed on the next time I ran the job and hasn't worked since. Nothing was changed. 

Can anyone tell me the best log files to look into to help troubleshoot this, and whether the log files are on the master, the media server or the client?

thanks in advance and kind regards

 

1 ACCEPTED SOLUTION

Systems_Team
Moderator
   VIP   

Hi CadenL,

I've been down this same path before, unfortunately :(

Exact same symptoms - VMware Tools appears to be running fine, then boom... the IP address and DNS name are gone when you look at your vCenter console, and of course backups that rely on them fail.  If you look at the services remotely, you will find VMTools has stopped.  Try logging in to that server and you'll probably find that VMTools is magically running again.

You'll probably also see ghost disk drives under Computer Management->Disk Management (likely with a capacity of 0 (zero) MB).  Also under Device Manager, if you turn on "Show Hidden Devices", I bet you will find ghost disks under Disk Drives, possibly ghost "Generic volume shadow copy" entries under Storage volume shadow copies, and ghost Generic volumes under Storage volumes.  The ghost items in Device Manager will be in a very light grey compared to the real, active devices.  If you look under their properties on the General tab, you'll probably see "Currently, this hardware device is not connected to the computer. (Code 45)".  I see all these items as leftovers from this issue.

You can delete all these items, and when the issue happens again they will reappear.  If you do a VSSADMIN List Shadows, interestingly there are no orphaned shadow copies, just these ghost devices.

The cause appears to be a number of buggy versions of VMware Tools.  We saw this first on a very early 10.0.x version, and on several versions following that.  We tried 10.1.0, and initially that seemed good, but then we started having the same problem with it.  We currently run 10.1.10 and so far so good.  It's disappointing to see you are on 10.1.15 and still seeing this, which suggests the same or a similar bug keeps getting reintroduced.

As a temporary workaround, we had to revert to pure agent-based backups as an interim method.  Doing that, we would see that VMTools stayed stable and didn't bomb out.

Not a fix, but hope this info helps.

Steve

6 REPLIES 6

D_Flood
Level 6

I haven't upgraded to 8.1 yet, but I know in 7.7.3 there are two places in the Policy definition where the method of picking up the name is specified, and both have to match for things to work right.  The two locations are the Clients tab and the VMware tab.  If they don't match then you get things like VM's being listed by IP address in OpsCenter and in the "Client Backups" list.

Also, unless you have a VMware admin who likes renaming systems, using the VM Name rather than other attributes seems (at least in my experience) to be the best selection criterion.

Also, make sure that the client name specified in the client attributes is the correct one.  In some cases you may have to use the FQDN for "extras" like SQL Backup to kick in correctly.

 

 


CadenL
Moderator
Partner    VIP    Accredited Certified

Hi Steve

Thanks for the input, I think it's proved to be very useful. So, firstly, I'm still working through this for a fix but I have made (I think!) some progress.

It turns out that I was running 10.1.5 of the vmtools and not 10.1.15 - I don't think I was wearing my glasses when I made a note of the version number ;o)

So I updated this to version 10.1.15 and that made the vmtools much more stable. Now it keeps 'running' through backup attempts and after reboots, logins, etc. It also shows all the information correctly when I browse for the VM within the Clients tab of the policy.
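
The 10.1.5 versus 10.1.15 mix-up is easy to make because the two look alike and sort differently as text than as numbers. A minimal sketch (the helper name is hypothetical) that compares the versions mentioned in this thread numerically:

```python
def parse_version(text):
    """Turn a dotted version such as '10.1.15' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


versions = ["10.1.15", "10.1.5", "10.1.10"]

# Plain string sorting compares character by character, so '10.1.5'
# lands after '10.1.15'.
print(sorted(versions))

# Comparing the parsed tuples gives the real release order.
print(sorted(versions, key=parse_version))
```

Numerically, 10.1.5 is older than the 10.1.10 that Steve reports as stable, and 10.1.15 is newer than both.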

What has also changed is the error when the ASC fails. Previously it would give me an error saying the client name couldn't be determined (which I assume was due to missing information as a result of the vmtools not running); now it fails with an error 47 - host is unreachable.

So I'm still getting an error but feel a little happier that I've something more stable to troubleshoot now. So I'm going to re-install the client software and the Veritas VSS provider and see what that brings.

I have a couple of questions though that may help me ensure I have a properly configured system:

Within the policy, on the VMware tab, what should the primary identifier be set to? I've tried both displayname and vm hostname and both allow me to correctly query the vCenter server, but I wasn't sure which is the best setting.

Within the query itself (bearing in mind I'm only working with a single host) I can set the query to vm hostname contains "mysqlclient" and this seems to be ok. But I can also use displayname contains "mysqlclient" and this too returns the correct result - again, which is best?

Finally - on the client itself, should I see the Veritas VSS provider when I run a VSSadmin List Providers command? I don't see anything other than the MS provider - I didn't even see the VMware provider before I replaced it with the Veritas one. Should I see the Veritas and/or the VMware providers, or only the MS provider?

many thanks in advance

Systems_Team
Moderator
   VIP   

Hi CadenL,

Great feedback, and I'm really happy you made a mistake with the VM Tools version. I was really worried the same painful bug was reintroduced in a version later than what I had found was stable.

I use "VM DNS Name" as the Primary VM Identifier on the VMware tab. The reason I chose this is that it gives me the FQDN of the client, whereas the other two will give the short name. Depending on your config, MS-SQL will probably know about this client by its FQDN, where DisplayName could be totally different. It's possible but not guaranteed that this may help with your status 47 as well. Having gone with VM DNS Name also allowed me to pick up on this issue with VM Tools stopping, as VM DNS Name no longer exists when that happens.

For the query, I don't think it matters too much what fields you use, so long as it works for your environment, but I do recommend the VM DNS Name for the reasons listed above. For my VM SQL backups, I use a query like this:

Datacenter Equal "MY-Datacenter" AND [NB_BACKUP_MSSQL] Equal "YES"

You could also use something like - VMDNSName equal "MyServerName.MyDomain.Name"

As I have 3 datacenters, I use that part to limit the query, and then the [NB_BACKUP_MSSQL] is a Custom Attribute configured in vCenter. The type is "Virtual Machine", so this appears as a custom attribute on all VM's (even though they may not be targets for SQL backup). I also have other attributes for Standard and Exchange backups. I set the value of this to YES, or it is left blank if not needed/wanted. The custom attributes may take a little while to show up in NetBackup, but they are magic once you have them. When they have populated and you've written your query, test it and hey presto, you've got just your MS-SQL VM's. I know VMware are deprecating Custom Attributes and moving to Tags, so I'll have a little rework when I get to that point.

For the Veritas (still called Symantec in 7.7.3) VSS Provider, no, you normally won't see it listed as a Provider or as a Service... but if you have a look while the backup is running (during the snapshot phase), you will - it's added on the fly. It will disappear afterwards.  Not only does this VSS provider help with truncating logs, but without it you may not get a consistent snapshot. Running with the provider is also the configuration that is fully supported by Veritas.
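
To catch the provider while it is registered, one option is to poll VSSadmin List Providers during the snapshot phase and pick out the provider names. A minimal sketch, assuming the usual "Provider name: '...'" line layout of that command's output; the function name is hypothetical, and the name the Veritas provider registers under is not stated in this thread, so the sketch simply lists whatever is present:

```python
import re


def provider_names(vssadmin_output):
    """Extract provider names from `vssadmin list providers` text output."""
    return re.findall(r"^\s*Provider name:\s*'([^']*)'",
                      vssadmin_output,
                      flags=re.MULTILINE)


# On the client, from an elevated prompt, the text could be captured with:
#   import subprocess
#   text = subprocess.run(["vssadmin", "list", "providers"],
#                         capture_output=True, text=True).stdout
#   print(provider_names(text))
```

Any name beyond the Microsoft one that appears only while the snapshot is being taken would be the on-the-fly provider described above.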

A few extra bits if you haven't been through this before (most of this is in the VMware and SQL NetBackup manuals):

  • Make sure the account you run the client under is the same one MS-SQL is running under, and has SA rights to DB's - similar to a traditional SQL agent based backup.
  • When using VMware SQL policies, all your backup schedules should be FULL backups, even your daily one which would normally be an incremental.
  • Sometimes you'll need to run your first backup with the Truncate Logs option unticked, followed by turning it back on and leaving it that way. If memory serves me correctly I think you would get a status 1 and the logs not truncating without doing that - think it is mentioned in the manual.
  • If you've not worked with SQL backups before, when you do a restore make sure you're using the NetBackup MS SQL Client, not the normal BAR GUI - see lots of people get stung by that.

I find the VM SQL (and Exchange) policies awesome.  From the one backup I can restore SQL DB's, the entire VM, VMDK's and individual files.  I don't have to configure separate SQL backups.  The only downside is that if you had a very busy SQL box and needed to back up DB's and truncate logs fairly regularly, you'd probably need to use traditional script-based or SQL Intelligent Policies instead.

Hope this helps,

Steve

CadenL
Moderator
Partner    VIP    Accredited Certified

All sorted! :o)

The main problem was with the version of vmtools that kept failing, so once that was upgraded things became much better.

The follow-up issue with the error 47 was down to the 8.1 client version now needing a certificate from the master server to allow the connection - somehow this wasn't working and so I needed to generate a new one.

Thanks very much Steve, you've been a great help.
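
The exact commands used to regenerate the certificate aren't recorded here. As a rough sketch only, assuming the nbcertcmd utility that ships with 8.1 clients, the usual sequence is to fetch the CA certificate, request a host certificate, then list what is installed. The helper names, the master server name and the token are placeholders, and the binary path is an assumption to adjust for your install.

```python
import subprocess

# Assumed default location on a Windows client; on UNIX/Linux hosts the
# binary is normally /usr/openv/netbackup/bin/nbcertcmd.
NBCERTCMD = r"C:\Program Files\Veritas\NetBackup\bin\nbcertcmd.exe"


def certificate_commands(master, token=None, nbcertcmd=NBCERTCMD):
    """Build the nbcertcmd calls for re-fetching a host certificate."""
    fetch = [nbcertcmd, "-getCertificate", "-server", master, "-force"]
    if token:
        # A (reissue) token generated on the master may be required.
        fetch += ["-token", token]
    return [
        [nbcertcmd, "-getCACertificate", "-server", master],
        fetch,
        [nbcertcmd, "-listCertDetails"],
    ]


def run_all(commands):
    """Run each command in turn; stop at the first failure."""
    for cmd in commands:
        # -getCACertificate may prompt to confirm the CA fingerprint,
        # so stdin/stdout are left attached to the console.
        subprocess.run(cmd, check=True)


# Example (placeholder names):
# run_all(certificate_commands("master.example.com", token="REISSUE_TOKEN"))
```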

 

Systems_Team
Moderator
   VIP   

Excellent :D

Glad to have helped,

Steve