Be2012 Vsphere 5.0 Browse Failure Cannot connect to Vcenter

King_Julien · ‎02-05-2013

Hi all,

I'm seeing this error message at random when I open a policy.

"Browse failure
Failure to browse 'lalalalalalala.domain.com'
Cannot connect to vCenter or ESX host"

Here's the SGMON output captured when I reproduced the issue by opening a policy moments ago:

BESERVER: [05/02/13 13:29:16] [2956] -1 SetServiceStatus: state: SERVICE_RUNNING check point: 0 wait hint: 0

BEREMOTE: [02/05/13 13:29:19] [5508] 2013-02-05T13:29:18.687 [fsys\vmvcb] - VM_VCBPROXY_FS::ConnectToVCServer: vmcConnect: failed

BEREMOTE: [02/05/13 13:29:19] [5508] 2013-02-05T13:29:18.687 [fsys\vmvcb] - VM_VCBPROXY_FS::ConnectToVCServer: failed, result = 0XE0009583

BEREMOTE: [02/05/13 13:29:19] [5508] 2013-02-05T13:29:18.687 [fsys\vmvcb] - VM_VCBPROXY_FS::GenerateVCItemDles() Could not connect to the VC server 'lalalalalalalala.localdomain' User:'administrator'

BEREMOTE: [02/05/13 13:29:19] [5508] 2013-02-05T13:29:18.687 [fsys\vmvcb] - VM_VCBPROXY_FS::GenerateVCItemDles() Could not delete the file C:\Program Files\Symantec\Backup Exec\Data\{9478DFF7-DDA6-411E-B637-BD559348CF76}.XML(0x2)

BEREMOTE: [02/05/13 13:29:19] [0860] 2013-02-05T13:29:18.687 [dsss] + rpcdssession.cpp (920):

BEREMOTE: [02/05/13 13:29:19] [0860] 2013-02-05T13:29:18.687 [dsss] | Cached attach failed: 0xe0009583

MANAGEME: [05/02/13 13:29:18] [2248] BECAT : [4756]:[4]: error in new DSEnumerator() : -536832637

MANAGEME: [05/02/13 13:29:18] [2248] BECAT : [4756]:[1]: error(-20).

MANAGEME: [05/02/13 13:29:18] [2248] BECAT : [4756]:[5]: Total objects found: 0

MANAGEME: [05/02/13 13:29:18] [2248] BECAT : [4756]:[5]: m_sqlqueryTTL( 0 )

MANAGEME: [05/02/13 13:29:18] [2248] BECAT : [4756]:[5]: return -1

MANAGEME: [05/02/13 13:29:18] [2248] BECAT : [4756]:[3]: SQLQuery (MachineInfo_View) Time Spend = 0 ms - ReturnedCount (0)

MANAGEME: [02/05/13 13:29:19] [0000] WARNING: Resource string '' was not found in BackupExec.Management.Resources.Localized

MANAGEME: [02/05/13 13:29:19] [0000] WARNING: LocalizedDeleteException: Unknown Message Resource ID

MANAGEME: [02/05/13 13:29:19] [0000] WARNING: Resource string '' was not found in BackupExec.Management.Resources.Localized

MANAGEME: [02/05/13 13:29:19] [0004] 02/05 13:29:18.688[MetaData ] QueryMetaData ends: MDQ_MachineInfo_View rc=0 numrows=0

MANAGEME: [02/05/13 13:29:19] [0000] WARNING: Resource string '' was not found in BackupExec.Management.Resources.Localized

MANAGEME: [02/05/13 13:29:19] [0000] WARNING: LocalizedDeleteException: Unknown Message Resource ID

MANAGEME: [02/05/13 13:29:19] [0000] ERROR: LocalizedException:

MANAGEME: [02/05/13 13:29:19] [0000] ERROR: Message: Cannot connect to vCenter or ESX host.

MANAGEME: [02/05/13 13:29:19] [0000]

MANAGEME: [02/05/13 13:29:19] [0000] ERROR: Reason:

MANAGEME: [02/05/13 13:29:19] [0000] ERROR: Stack: at BackupExec.Management.Components.MetaData.MetaDataManager.HandleErrorList(List`1 errorInfoList)

MANAGEME: [02/05/13 13:29:19] [0000] at BackupExec.Management.Components.MetaData.MetaDataManager.QueryMetaDataAsyncWorker(MetaDataQueryInfo metaDataQueryInfo)

MANAGEME: [02/05/13 13:29:19] [0000] Asynchronous task c4f36bc9-142e-437c-96b5-68963917082e completing 'QueryMetaDataAsyncWorker'

MANAGEME: [02/05/13 13:29:19] [0000] Task AsynchronousMethodTask completed
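A note for anyone matching up the codes in this trace: the decimal value -536832637 in the BECAT line and the 0xE0009583 in the BEREMOTE lines are the same 32-bit status, once as a signed integer and once as unsigned hex. The two 16-bit halves also line up with the V-79-57344-38275 link code quoted in the job log further down the thread. A minimal Python sketch of the conversion (the function names are only illustrative, not anything from Backup Exec):

```python
def to_unsigned_hex(code):
    """Format a signed 32-bit status code as 8-digit unsigned hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def split_halves(code):
    """Return the (high, low) 16-bit halves of the code as decimal integers."""
    unsigned = code & 0xFFFFFFFF
    return unsigned >> 16, unsigned & 0xFFFF


print(to_unsigned_hex(-536832637))  # 0xE0009583
print(split_halves(-536832637))     # (57344, 38275)
```

So the catalog error and the "Cannot connect to vCenter or ESX host" failure are one and the same event, not two separate problems.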

As mentioned, this is random. Most of my VM backups fail with messages like "Final error: 0xe0009583 - Cannot connect to vCenter or ESX host"

I also checked and replied to some questions here https://www-secure.symantec.com/connect/forums/v-79-57344-38308-my-full-file-server-backup-vmware-agent but nothing yet.

What else can I check? As far as I know (since I'm new here), vCenter and the media server are in different subnets. But as mentioned before, this behavior occurs randomly. For now I have my backups configured to go via RAWS and they are working fine, but I also need a backup of the VMDK+GRT!

Any help or suggestion is more than welcome!

CraigV · ‎02-05-2013

Have you ruled out a DNS issue by resolving the vCenter and media server by name/IP from each other?
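If it helps, the name-resolution half of that check can be scripted and run from both machines. This is only a rough sketch: the host names in the usage note are placeholders, and the `resolver` parameter is a hypothetical hook added here so the function can be exercised without touching real DNS.

```python
import socket


def resolve(name, resolver=socket.gethostbyname):
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def check_names(names, resolver=socket.gethostbyname):
    """Map each host name to its resolved address (None means lookup failed)."""
    return {name: resolve(name, resolver) for name in names}
```

For example, `check_names(["vcenter.example.com", "mediaserver.example.com"])` run on the media server and again on the vCenter server shows which names fail to resolve from which side.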

lmosla · ‎02-05-2013

Hi, a few thoughts:

Make sure you have VMware Tools installed on the guest virtual machine. If the VMware VSS Provider is installed on the guest virtual machine, uninstall it; see this Technote: http://www.symantec.com/business/support/index?page=content&id=TECH129864

Then, since you want to use GRT, install the Agent for Windows on the guest virtual machine. Note: VMware Tools needs to be installed before the Agent for Windows. Take a look at the Best Practices for VMware: http://www.symantec.com/business/support/index?page=content&id=HOWTO74443

In your job, enable "Advanced Open File" and choose the Microsoft VSS provider, "System - Microsoft VSS provider".

King_Julien · ‎02-05-2013

Ping from Media server to Vcenter using hostname, "Ping request could not find host vcenter. Check the name and try again"
Ping from Media server to Vcenter using IP, OK.

Ping from Vcenter to Media Server by hostname, "Request timed out"
Ping from Vcenter to Media server by IP, "Request timed out"

Next time I will start from the basics... Thank you CraigV!

King_Julien · ‎02-05-2013

And now, the networking team is telling me "Oh yep.. sorry... there's a firewall between the media server and the vCenter"

...I love those guys...

King_Julien · ‎02-15-2013

Well, after a week, the firewall rules were finally changed, and now at least I get ping replies from both ends and a wide range of ports is open.
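Worth noting that a ping reply only proves ICMP gets through; it says nothing about the TCP ports the backup actually uses. A minimal sketch of a TCP reachability check follows. The default ports are an assumption (443 and 902 are the usual vSphere defaults), so substitute whatever your environment uses, and the host name in the usage note is a placeholder.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def probe(host, ports=(443, 902), timeout=3.0):
    """Check several TCP ports on one host; returns {port: reachable}."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `probe("vcenter.example.com")` from the media server every few minutes, and logging the result with a timestamp, would show whether the random failures coincide with a port becoming unreachable.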

A few minutes ago I was running some "1 time backup" jobs and they were working fine (even with no RAWS installed for file-level GRT), and then, suddenly...

Job ended: Friday, February 15, 2013 at 11:50:31 a.m.

Completed status: Failed

Final error: 0xe0009583 - Cannot connect to vCenter or ESX host.

Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-38275

No other backups were running at the same time.

I don't see any relevant event ID on either the media server or vCenter.

I retried the job and it's running now... Any idea where I can look for further troubleshooting?

Just in case it matters, my current media server is also running as a virtual machine.

King_Julien · ‎02-15-2013

For the moment I have removed the "mark as solution". Sorry guys, the issue persists!

Backup_Exec1 · ‎02-15-2013

Hi

Please check the link below for this issue

http://www.symantec.com/docs/TECH167858

Hope that helps

Thanks

CraigV · ‎02-15-2013

...your media server is virtual? Are you perhaps getting this error with the backup server included in the job...?

If it is included, exclude/remove it and run the backup job again.

King_Julien · ‎02-15-2013

Hi,

My media server is not included in the jobs.

Regards,

King_Julien · ‎02-15-2013

Hi,

1. A port conflict to the VCenter Server

- Symptoms:

Unable to access the ESX from the selection as it freezes.

I can access my selection list, but from time to time an error message appears telling me "unable to connect". In the backup job properties, the vSphere port number is 902.

2. An old server may be still in the selection list

This doesn't apply, since it also happens with "1 time backup" jobs, which use a brand-new selection list. As mentioned before, if I rerun the job after that first failed backup, it starts running.

3. Heavy load on the ESX server

I also thought this could be a possibility. Still, today when the backup failed I checked the "Performance" tab in vCenter (network, CPU and memory) and it was normal. I read the VMware KB and, as suggested there, my VMs are distributed across several datastores. The VM that failed today during my "1 time backup" test resides on a datastore with 120 GB free of 1 TB.

I have never had these errors when trying to connect to vCenter:

"VMware Infrastructure Client could not establish the initial connection with server "xyz". Details: The server took too long to respond"

"VMware Infrastructure Client could not establish a connection with server "xyz". Details: Timeout retrieving server inventory"

Regards,

CraigV · ‎02-16-2013

...have you tried to run a repair of BE through Add/Remove Programs and a repair of the BEDB through BEutility.exe? Tried removing the Virtual Center server and then re-adding it?

King_Julien · ‎02-18-2013

Hi Craig, good afternoon.

I think I got it... Not sure.

When doing VM backups, the disk of the VM being backed up appears in Disk Management on the media server; BE brings that disk to the media server to perform the backup.

Now, what happens when two or more jobs are running at the same time, each with a disk of 1.6 or 1.3 TB? I believe that in some way Backup Exec collapses and then sends the message "cannot connect to vcenter".

I realized this today when I was checking the free space on the datastores and found that one of them was almost filled by the media server... That was weird, because the media server VM is configured with only two disks, 50 GB in total.

So, as I said before, I guess this happens when bringing in several VMs at once.

I guess the best thing to do is to enable NBD transport mode and see what happens; right now the transport mode is HotAdd.

To use SAN transport mode the media server must be a physical server, right?

Colin_Weaver · ‎02-19-2013

If your datastores are filling up unexpectedly, then check for orphaned/extra snapshots left behind. Too many snapshots can affect the general performance of the VMs (not just backup performance) and, depending on the state of the snapshot chain, can potentially cause backup issues.

If Symantec snapshots are left behind then you have either experienced a process crash mid backup job, or Backup Exec sent the command to remove the snapshot and for whatever reason the command was never fully actioned on the VMware side.

If you have lots of non-Symantec snapshots, note that we recommend that no snapshots be present at the start of any backup operation.

King_Julien · ‎02-19-2013

Hi Colin,

Yes, already did that. Just sharing the information for others who might have an issue like this:

Determining if there are leftover delta files or snapshots that VMware vSphere or Infrastructure Client cannot detect
http://kb.vmware.com/kb/1005049

In fact, yes, BE sometimes leaves snapshots behind, but we then delete them. BE2012 is fully patched; one of the latest hotfixes installed was http://www.symantec.com/docs/TECH199866 and we also followed http://www.symantec.com/docs/TECH200709.

The only snapshots we might have are the ones from BE.

Running the backups using NBD transport seems to be working fine, but it's taking forever: right now I have one backup that has been running for 23 hours to back up a server with 2 TB. Yesterday, when I realized the possible cause of the issue, I was running a backup using HotAdd transport for 1.4 TB; that job completed successfully after 9 hours, even while I was running small backups (10 to 40 GB) of another

I'm going to configure the backups for the smaller servers to use HotAdd, and for the biggest servers I'm going to use NBD.
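To put rough numbers on the two runs mentioned above (1.4 TB in 9 hours over HotAdd, and a 2 TB job still running after 23 hours over the network transport), here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 1,000,000 MB) and that the sizes quoted are the amounts actually transferred, which may not hold if compression or skipped blocks are involved.

```python
def throughput_mb_per_s(size_tb, hours):
    """Average rate in MB/s, using decimal units (1 TB = 1,000,000 MB)."""
    return size_tb * 1_000_000 / (hours * 3600)


hotadd = throughput_mb_per_s(1.4, 9)
# The 2 TB job had not finished after 23 hours, so this is only an upper bound.
network_ceiling = throughput_mb_per_s(2.0, 23)

print(f"HotAdd: about {hotadd:.0f} MB/s")
print(f"Network transport: at most about {network_ceiling:.0f} MB/s so far")
```

That works out to roughly 43 MB/s for the HotAdd job against no more than about 24 MB/s for the network job, so the split of small servers on one transport and large ones on the other is a trade of speed for stability.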

I don't know if it's related, but from time to time an error message appears in vSphere saying "A general error occurred: Solution is not responding" (regarding BE). I'm using ESX 5.0.0.

Anyway, as mentioned in another post, we have the OK to start testing other backup tools in our environment. In the meantime, until we start the migration (it could be in 2 days or 8 months, who knows), we have to deal with this and relaunch the failed jobs manually.

Regards.
