Backups sometimes fail due to comms error

Frosty
Level 3
I've been gradually working on isolating an error that occurs sometimes (not all the time) on my backups.  We're running BE 12.5 and I have updated BE to SP2 and all the latest hotfixes.  I have also rolled out updated BE Agent software to all servers involved in the backup.  This is an example of the errors I am getting:

----------------------------------------- start included error reports
Job Completion Status

Job ended: Wednesday, 24 June 2009 at 1:10:15 AM
Completed status: Failed
Final error: 0xe000fe30 - A communications failure has occurred.
Final error category: Server Errors

For additional information regarding this error refer to link V-79-57344-65072

Errors
Click an error below to locate it in the job log
Backup- \\grvweb\WebWiz V-79-57344-65072 - The connection to target system has been lost. Backup set canceled.

Exceptions
Click an exception below to locate it in the job log
Backup- \\grvweb\WebWiz V-79-57344-34108 - An unexpected error occurred when cleaning up Symantec Volume Snapshot Provider (VSP) snapshots. Make sure that no other application has a lock on the cache files created by the snapshot operation.
----------------------------------------- end included error reports
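
A side note on the codes in the report above: the link ID V-79-57344-65072 and the final error 0xe000fe30 appear to be the same number written two ways (57344 is 0xE000 and 65072 is 0xFE30). The sketch below shows that arithmetic in Python. The function name `link_id_to_hex` is made up for illustration, and the mapping is only inferred from this one job log, not taken from any Symantec documentation.

```python
def link_id_to_hex(link_id):
    """Convert an ID such as 'V-79-57344-65072' into a hex error code.

    Assumption (inferred from the job log, not documented): the last two
    decimal fields are the high and low 16-bit halves of the error code.
    """
    parts = link_id.split("-")
    high = int(parts[2])  # e.g. 57344 == 0xE000
    low = int(parts[3])   # e.g. 65072 == 0xFE30
    return f"0x{(high << 16) | low:08x}"


print(link_id_to_hex("V-79-57344-65072"))  # 0xe000fe30
```

This can help when searching for an error, since some pages quote the hex code and others the V-79 link ID.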

The server that is failing is GRVWEB.  The details of this server are interesting ... it is Windows Server 2000 ... and it is running as a VMWARE virtual server which is running on another server (GRVWEB1) which runs Windows Server 2003.  So to summarise:

GRVWEB1 runs Windows Server 2003 and the VMWARE software, and hosts the GRVWEB virtual server.
GRVWEB runs Windows Server 2000 and is a VMWARE virtual.

In researching the error messages, I learned that the most likely cause was some problem with anti-virus software.  GRVWEB (the virtual server) does not run any anti-virus software.  GRVWEB1 (the host server) does run Symantec Endpoint Protection.  I have configured it so that the folder where the virtual disk images are stored is not being scanned by SEP.

I also experimented with AOFO.  We are using AOFO for our backups, configured as "Automatic" selection of provider.  It uses VSS on the host server (Windows 2003) and that works fine.  It automatically selects the Symantec VSP for GRVWEB, which is right, given that it is a Windows Server 2000 machine ... but obviously something is still going horribly wrong somewhere in that process.  I've tried turning off AOFO altogether, but then I get all sorts of other issues with open files on my other servers.

Last thing I have tried to change is to set it so that the backup uses Windows Shares on the GRVWEB virtual server.  This seems to have reduced the frequency of the errors, but has not eliminated them altogether.

I was wondering if anyone had any further suggestions, as I am at my wits' end.  I would like to upgrade GRVWEB to Windows Server 2003, but that's not an option for me just at the moment.  I was thinking of using Windows Shares to back it up, but uninstalling the BE Agent from GRVWEB.  Would it still back up the data, even without an agent installed?
6 REPLIES

CraigV
Moderator
Partner    VIP    Accredited
I've got this problem intermittently, strangely enough with...Windows 2000 Server machines.
My solution (which seems to work...until it happens again!) was to reinstall the RAWS, and make sure it was publishing to the correct server. I've also toyed with the idea of setting the service's recovery options so that it restarts automatically after every failure (stop).
Most likely fix is to reinstall the RAWS...uninstalling it first is best.
If you're planning to back up via Windows shares without the agent installed, you cannot do this. Your shares etc. are greyed out, and you won't be able to access them. The only way around this is to have a Remote Agent installed.

Frosty
Level 3
Thanks ... I will try uninstalling and reinstalling RAWS (am I correct that this means Remote Agent for Windows Servers or something like that?).  Will let you know whether this works or not for me.

Something you mention above regarding ensuring that the agent is reporting correctly to the right management server is ringing a faint bell with me.  I'm new in the job here and I have been slowly working through the network, investigating errors in the Event Logs and that sort of thing.  I noticed a few weeks ago that there were WINS errors being reported.  Those errors indicated that there was a possible problem with the naming of the server, in that it didn't seem to know precisely what domain it belonged to.  Something about the DNS suffix not being set (or similar).

So if GRVWEB (full name: GRVWEB.GRV.LOCAL) is trying to report to GRVMANAGE (which is the server running BE), maybe it cannot (or cannot consistently) find GRVMANAGE.  Maybe I should tell it to report to GRVMANAGE.GRV.LOCAL, or maybe even the IP address?  Dunno.  But I will look into that and see if I can find anything wrong in its setup.

CraigV
Moderator
Partner    VIP    Accredited
...that's another way of doing that. Put in the FQDN (Fully Qualified Domain Name) like you said. You might want to make sure that you can ping this FQDN from your remote agent server too, to make sure it resolves IP to name, and vice versa.
If that doesn't work, use the IP address. And if THAT doesn't work, you've got a possible DNS issue with that particular server.
RAWS is Remote Agent for Windows Servers =)
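
The check suggested above (does the FQDN resolve to an IP, and does that IP resolve back to the same name?) can be sketched in Python with the standard `socket` module. This is a minimal sketch, not a Backup Exec tool: the function name `check_name_resolution` and the injectable `forward`/`reverse` parameters are invented for illustration, and it would need to run on the remote agent server with Python available.

```python
import socket


def check_name_resolution(fqdn,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP, then resolve that IP back to a name.

    Returns a dict describing each step. "consistent" is True only if the
    reverse lookup returns the original name (case-insensitive), either
    as the primary name or as one of the aliases.
    """
    result = {"fqdn": fqdn, "ip": None, "reverse_name": None,
              "consistent": False}
    try:
        result["ip"] = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = f"forward lookup failed: {exc}"
        return result
    try:
        name, aliases, _addresses = reverse(result["ip"])
    except OSError as exc:  # socket.herror is a subclass of OSError
        result["error"] = f"reverse lookup failed: {exc}"
        return result
    result["reverse_name"] = name
    candidates = {name.lower()} | {alias.lower() for alias in aliases}
    result["consistent"] = fqdn.lower() in candidates
    return result


# Example call (performs real DNS lookups when run on the network):
# print(check_name_resolution("GRVMANAGE.GRV.LOCAL"))
```

If the forward lookup fails for the FQDN but the backup works by IP address, that points at DNS on that particular server rather than at the agent.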

Frosty
Level 3
Yesterday I uninstalled the Remote Agents from both the host server (Windows 2003) and the VMWARE guest server (Windows 2000) and then reinstalled them.  I also paid particular attention to the name of the server they should report to.  It had been just the simple server name (GRVMANAGE) but I changed it after installing to an FQDN (GRVMANAGE.GRV.LOCAL).  Unfortunately, neither the reinstallation nor the FQDN change has fixed my problems, as the backups failed again last night.

My strong feeling is that it's something fundamental to do with the fact that I am backing up a "server within a server" because of the VMWARE.

So I am now thinking I might take a completely different approach, by configuring a scheduled task on the guest Windows 2000 Server that will run a regular Windows Backup job and will drop the resulting .BAK file on the host server.  Then I can just back up the host server later in the evening using BackupExec.

Frosty
Level 3
Spent part of last Friday working on this issue.  Discovered that I can occasionally generate network connection errors using other tools (e.g. Windows Backup, Windows Explorer file copying, Robocopy) when copying data from the guest VMWARE server to the host server, and also when just moving data on/off the host server.

So BackupExec would seem to be off the hook in terms of being the cause of the errors.  I discussed this with another guy who is a bit of a VMWARE guru, and he suggested that upgrading from our current v1.x version of VMWARE to the newer v2.x edition might fix it.  He confirmed that this upgrade had fixed some similar issues on servers at another site he supports.

So we are going to schedule an upgrade soon.  Once that is done, I will monitor the situation and report back here.  I'm hopeful that the issue will then be fixed and the culprit (old version of VMWARE software) identified.
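
For anyone wanting to repeat the copy test described above before and after an upgrade, here is a rough Python sketch that copies one file repeatedly and records which attempts fail. The function name `repeated_copy_test`, its parameters, and the paths in the example are all hypothetical; it only measures how often a plain file copy fails and says nothing about the cause.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path


def repeated_copy_test(src, dst_dir, attempts=10, pause_seconds=0.0,
                       copy=shutil.copyfile):
    """Copy src into dst_dir several times and collect the failures.

    Returns a list of (attempt_number, timestamp, error_text) tuples,
    one per attempt that raised an OSError. The copy function can be
    swapped out, which is mainly useful for testing the loop itself.
    """
    failures = []
    dst_dir = Path(dst_dir)
    for attempt in range(1, attempts + 1):
        target = dst_dir / f"copytest_{attempt}.tmp"
        try:
            copy(src, target)
        except OSError as exc:
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append((attempt, stamp, str(exc)))
        finally:
            # Remove the test copy (or any partial file) if one exists.
            target.unlink(missing_ok=True)
        if pause_seconds:
            time.sleep(pause_seconds)
    return failures


# Example call with made-up paths (a file on the guest, a share on the host):
# for failure in repeated_copy_test(r"C:\temp\big.bin", r"\\GRVWEB1\scratch", attempts=50):
#     print(failure)
```

Running the same number of attempts before and after the upgrade would give a simple failure count to compare, rather than relying on whether last night's backup happened to succeed.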

Frosty
Level 3

I decided to go ahead and install a Symantec Endpoint Protection client on GRVWEB.  The CPU went to 100%, caused by RTVSCAN.EXE, and after getting some help from Symantec Support we discovered that this was likely caused by GRVWEB not being joined to the domain.  So we joined it to the domain and this did fix the 100% CPU issue.  However, the network disconnection problems in BackupExec continue.  We will be upgrading to VMware v2 on GRVWEB1 in a few weeks.