
SSR 2011 fails to cleanup .v2i recovery points after a backup

PCJunky
Level 4

We have Symantec System Recovery 2010/2011 across several sites, and we have noticed a common problem: previous recovery points are not cleaned up after the backup, despite the retention rules set in the backup job.

Is there anything else we can do to make this more robust without resorting to post-backup scripts that delete files? That would mess up SSR's recovery point history and risk deleting the last good backup if the current backup fails.

Currently we are aware of only two places where we can automate this process: in the backup job you can set the number of recovery points to keep, and in the Manage Backup Destination settings you can set a threshold based on capacity.

For example, suppose I am creating a backup that is 350 GB (final compressed .v2i files). I have set "Limit the number of recovery points for this backup" to 1, set the "manage destination settings" to "monitor disk space usage for the backup storage" with a threshold of 360 GB, and also enabled "Automatically optimize storage".
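As a rough sanity check of the settings above, the sketch below estimates how much space the destination holds at its peak. It assumes (this is not confirmed anywhere in the thread) that the previous recovery point set is only deleted after the new one is complete, so with a limit of 1 the destination briefly holds two sets. The 350 GB and 360 GB figures are the ones quoted above; the function name is hypothetical.

```python
def peak_usage_gb(set_size_gb: float, keep_limit: int) -> float:
    """Estimate peak destination usage for one backup job.

    Assumption: old sets are purged only after the new set is written, so
    the destination briefly holds keep_limit + 1 sets of similar size.
    """
    return set_size_gb * (keep_limit + 1)


THRESHOLD_GB = 360  # "monitor disk space usage" threshold from the example
peak = peak_usage_gb(350, keep_limit=1)
print(f"peak usage {peak} GB, threshold exceeded: {peak > THRESHOLD_GB}")
```

Under that assumption the 360 GB threshold would be crossed during every run, which is worth bearing in mind when working out which of the two mechanisms is actually doing the purging.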

Have I missed something here or is there a better way we should be doing this?

From what I can see in the logs, when it fails, SSR seems to think the backup device is unavailable at the point where it should be purging the previous day's .v2i files. However, the backup device in this specific example is an Iomega NAS drive on a 1 Gb LAN...

Thanks

42 replies

criley
Moderator
Employee Accredited

@PCJunky,

Those errors really do suggest some kind of network problem which needs investigating.

PCJunky
Level 4

Versions updated to 9.0.3.40369 on both servers

criley
Moderator
Employee Accredited

@RS,

OK, it looks like there is a problem with the 'Automatically optimize storage' option. SSR knows the threshold has been exceeded because I see a prompt when I go into the 'Manage Backup Destination' screen but clearly the auto optimize feature is not working. The issue may be specific to backing up to external drives (in my case, a USB drive).

Do you have a support contract with us? If yes, can you please open a case and let me know what the case id is?

PCJunky
Level 4

@Chris, I would agree, except we are seeing this on several servers, which is what led me here. In this particular instance we have two servers backing up to the same share. Both are identical Dell T610s purchased at the same time and plugged into the same HP ProCurve gigabit LAN, and as we can now see from the last two days, each has had backup issues one would attribute to network connectivity, yet at the same time the other has completed successfully, i.e.:

Wednesday:

Server 1 backup to Wednesday share on NAS - successful

Server 2 backup to Wednesday share on NAS - successful

Thursday:

Server 1 backup to Thursday share on NAS - successful

Server 2 backup to Thursday share on NAS - successful

Friday:

Server 1 backup to Friday share on NAS - successful

Server 2 backup to Friday share on NAS - failed - unable to connect to share

Monday:

Server 1 backup to Monday share on NAS - failed - unable to connect to share

Server 2 backup to Monday share on NAS - successful

PCJunky
Level 4

It may also be worth mentioning that my perception is that the problem escalates: as in the example above, it starts off OK, then gets worse until we intervene, which appears to resolve the issues, and then off we go again.

But that's just a feeling.

Hopefully by closely monitoring and reporting my findings here we can build up a more accurate picture...

criley
Moderator
Employee Accredited

@RS,

It seems the issue is not specific to backing up to an external device (i.e. USB drive) as I see the same issue when backing up to a network share. This looks like a defect which I will raise internally. However, if you have a support contract, please open a case and let me know what the case ID is.

PCJunky
Level 4

Tuesday:

Server 1 backup to Tuesday share on NAS - successful

Server 2 backup to Tuesday share on NAS - successful

That's now a full week of backups, and tonight will be the first night the job is required to purge the previous backup. The drive has 1.8 TB of capacity and is showing 1.4 TB used, so there is plenty of space to create the next backup and remove the old backup/recovery point afterwards.

Each individual share is capped at 700 GB, and the combined backup of both servers currently requires 338 GB.
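The figures above can be checked quickly. This sketch uses decimal units (1 TB = 1000 GB, an assumption, since the NAS may report binary units) and assumes the share still holds last week's set, of a similar size, until the purge runs.

```python
CAPACITY_GB = 1800    # 1.8 TB drive
USED_GB = 1400        # 1.4 TB currently used
NEW_BACKUP_GB = 338   # combined recovery points for both servers
SHARE_CAP_GB = 700    # quota on each individual share

free_gb = CAPACITY_GB - USED_GB
headroom_gb = free_gb - NEW_BACKUP_GB
# Assumption: the old set on this share is about the same size as the new
# one and is only removed after the new backup finishes.
peak_share_gb = 2 * NEW_BACKUP_GB

print(f"free {free_gb} GB, headroom after new backup {headroom_gb} GB")
print(f"peak share usage {peak_share_gb} of {SHARE_CAP_GB} GB")
```

On these numbers the old and new sets can coexist briefly, with roughly 62 GB to spare on the drive and 24 GB on the share: enough, though not generous.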

criley
Moderator
Employee Accredited

For those that need to know, I've escalated this internally. Details here:

http://www.symantec.com/docs/TECH173961

https://www-secure.symantec.com/connect/issues/automatically-optimize-storage-option-not-working-symantec-system-recovery-ssr-2011

PCJunky
Level 4

I have also installed some ping monitoring and logging software on both servers that will confirm for us if the network does drop out when BESR logs show that it has.

EMCO Ping Monitor 
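If the ping monitor's results can be exported, lining up outages with the BESR error timestamps can be done mechanically rather than by eye. A minimal sketch, assuming the log has been exported as (timestamp, reply received) pairs; the export format, the sample data and the one-minute tolerance are all hypothetical, not taken from EMCO Ping Monitor.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Group consecutive failed pings into (first_failure, last_failure) windows.

    samples: list of (datetime, bool) pairs sorted by time; True = reply received.
    """
    windows, start, last = [], None, None
    for stamp, ok in samples:
        if not ok:
            if start is None:
                start = stamp
            last = stamp
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:  # log ended mid-outage
        windows.append((start, last))
    return windows


def near_outage(windows, event, slack=timedelta(minutes=1)):
    """True if the event time falls within `slack` of any outage window."""
    return any(s - slack <= event <= e + slack for s, e in windows)


# Hypothetical sample: one missed reply at 06:56, checked against a 06:57 error.
start = datetime(2011, 11, 10, 6, 55)
samples = [(start + timedelta(seconds=30 * i), i != 2) for i in range(6)]
windows = outage_windows(samples)
print(windows, near_outage(windows, datetime(2011, 11, 10, 6, 57)))
```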

PCJunky
Level 4

Last night was the first night the backup rolled over and had to create a backup and then clean up the old recovery points. Here are the results:

Wednesday:

Server 1 backup to Wednesday share on NAS - Failed

Server 2 backup to Wednesday share on NAS - successful

So server 2 backed up fine with no errors and completed the backup as expected, with file sizes as expected, AND then at the end of the backup cleaned up the old recovery points. The ping monitor shows no loss of connection, the Symantec logs show everything worked as expected, and finally the NAS is up and running as expected.

Here is the ping summary from server 2:

However, server 1 has failed. On closer inspection, the backup appears to have completed but, as described at the start of this thread, it failed when trying to clear up the old recovery points. Breaking it down:

Independent recovery point set to retain only 1 recovery point

During the entire backup time the ping test on server 1 recorded no loss of connection:

But the BESR log shows the following error:

Job started correctly at 6:30 PM then:

06:57 AM (the following morning)

Error EC8F17B7: Cannot create recovery points for job: 3_Wednesday.

Error EC8F03F0: Cannot add the new recovery point to the history of this drive.

Error E7D1001E: Unable to read from file.

Error EBAB03F1: The specified network name is no longer available. (UMI:V-281-3215-6071)

Details:

Source: Backup Exec System Recovery

Here is what was in the backup share the following day. As you can see, we now have a mishmash of files from the previous recovery point set on the 3rd and the new recovery point set on the 10th.

Please let me know your thoughts; I think I have captured everything of relevance here.

Andrew

PCJunky
Level 4

Sorry it's a bit messy, but I was in a rush to post it before the page timed out on me ;)

criley
Moderator
Employee Accredited

@PCJunky,

As per my comment on the 7th, have you checked this yet?

http://www.symantec.com/docs/TECH141655

PCJunky
Level 4

Thursday:

Server 1 backup to Thursday share on NAS - failed

Server 2 backup to Thursday share on NAS - successful

------10/11/2011 6:30 PM

Info 6C8F1F65: The drive-based backup job, 4_Thursday, has been started automatically.
Details:
Source: Backup Exec System Recovery

------11/11/2011 00:24 AM

Error EC8F17B7: Cannot create recovery points for job: 4_Thursday.
 Error E7D1001F: Unable to write to file.
  Error EBAB03F1: The specified network name is no longer available.
 Error E7D10046: Unable to set file size.
  Error EBAB03F1: An unexpected network error occurred. (UMI:V-281-3215-6071)

Details:
Source: Backup Exec System Recovery

------11/11/2011 17:08 PM

Info 6C8F0428: Backup Exec System Recovery service stopped.
Details:
Source: Backup Exec System Recovery

------11/11/2011 17:16 PM

Info 6C8F0427: Backup Exec System Recovery service started successfully.
Details:
Source: Backup Exec System Recovery

------
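Pulling error codes and timings out of logs like the one above by hand gets tedious across many sites. A minimal sketch for text pasted in the format shown here; the separator-line format is how the events appear in this thread, not a documented BESR export, and the dd/mm/yyyy date order is an assumption (it matches the November dates in this thread).

```python
import re
from datetime import datetime

# Separator lines such as "------10/11/2011 6:30 PM" (format as pasted here).
HEADER = re.compile(r"^-+(\d{1,2})/(\d{1,2})/(\d{4}) (\d{1,2}):(\d{2})\s*(AM|PM)?")
EVENT = re.compile(r"^\s*(Info|Error) ([0-9A-F]{8}): (.*)$")


def parse_header(line):
    """Return a datetime for a separator line, or None if it is not one."""
    m = HEADER.match(line)
    if not m:
        return None
    day, month, year, hour, minute = (int(g) for g in m.groups()[:5])
    suffix = m.group(6)
    # The pasted log mixes clocks ("6:30 PM", "17:08 PM"), so the suffix is
    # only applied when the hour is plausibly on a 12-hour clock.
    if suffix == "PM" and hour < 12:
        hour += 12
    elif suffix == "AM" and hour == 12:
        hour = 0
    return datetime(year, month, day, hour, minute)


def parse_events(text):
    """Yield (timestamp, level, code, message) for each Info/Error line."""
    current = None
    for line in text.splitlines():
        stamp = parse_header(line)
        if stamp is not None:
            current = stamp
            continue
        m = EVENT.match(line)
        if m and current is not None:
            yield current, m.group(1), m.group(2), m.group(3)


SAMPLE = """------10/11/2011 6:30 PM
Info 6C8F1F65: The drive-based backup job, 4_Thursday, has been started automatically.
------11/11/2011 00:24 AM
Error EC8F17B7: Cannot create recovery points for job: 4_Thursday.
 Error E7D1001F: Unable to write to file.
  Error EBAB03F1: The specified network name is no longer available.
"""

events = list(parse_events(SAMPLE))
start = events[0][0]
first_error = next(e for e in events if e[1] == "Error")
print(first_error[0] - start)  # 5:54:00
print([code for _, level, code, _ in events if level == "Error"])
```

On the Thursday excerpt, the first error arrives 5 hours 54 minutes after the job started.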

PCJunky
Level 4

I have been able to carry out the instructions in http://www.symantec.com/docs/TECH141655 today for server 1 and server 2.

criley
Moderator
Employee Accredited

This really is going to need further investigation. There is only so much we can do via the forums.

If you can, open a case with support.

PCJunky
Level 4

Friday night both servers backed up, so I was optimistic that the workaround here:

http://www.symantec.com/docs/TECH141655

had made a difference, but today server 1 has failed again and server 2 failed to clean up its recovery points:

Error EC8F17B7: Cannot create recovery points for job: 1_Monday.
 Error E7D1001F: Unable to write to file.
  Error EBAB03F1: The specified network name is no longer available. 

But the connection report from the monitoring tool I set up shows no loss of connection, again.

So why is BESR so unreliable? As I stated at the start, we are experiencing this on many sites, which is what prompted me to start this thread: to capture as much information as possible, to check whether I had missed anything, and to see if the community could offer any pointers.

Looking at all potential problem areas, here is a breakdown of what this example customer's setup has:

Server 1 - SBS 2008 (patched and up to date) - BESR 2010 Standard (patched and up to date)

Backs up "Dell utility" (78 MB), "OS" (59 GB), "Data" (329 GB) and "Recovery" (1.3 GB); the resulting recovery points total approx. 280 GB.

The longer this backup runs, the more unreliable it becomes. In the initial week, when no old recovery points need to be removed, it usually works OK; then, as the weeks progress, it develops more problems, which seem to centre around two issues: 1) it fails saying the connection to the NAS device was lost (but monitoring the connection does not support this, and no connection loss is seen), and 2) it fails to remove old recovery points at the end of the backup (again citing a loss of connection).

Server 2 - Server 2008 Std (patched and up to date) - BESR 2010 Standard (patched and up to date)

Backs up "Dell utility" (78 MB), "OS" (67.5 GB), "Data" (45 GB) and "Recovery" (550 MB); the resulting recovery points total approx. 58 GB.
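As a cross-check, the two recovery point totals add up to the 338 GB quoted earlier for the combined backup, and the volume sizes give an implied compression ratio for each server. The sizes are the approximate ones listed above; treating 78 MB as 0.078 GB assumes decimal units.

```python
# Volume sizes as listed above, in GB (78 MB = 0.078 GB, decimal units assumed).
sources = {
    "server 1": [0.078, 59, 329, 1.3],
    "server 2": [0.078, 67.5, 45, 0.55],
}
recovery_points_gb = {"server 1": 280, "server 2": 58}  # approximate, as reported

for name, volumes in sources.items():
    total = sum(volumes)
    ratio = recovery_points_gb[name] / total
    print(f"{name}: {total:.1f} GB of volumes -> {recovery_points_gb[name]} GB ({ratio:.0%})")

print("combined recovery points:", sum(recovery_points_gb.values()), "GB")
```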

Now, there is a possibility that the monitoring software, which basically uses ping, is not picking up a very brief loss of connection, but then you would have thought the software would retry for a reasonable window before giving up.
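The ping theory can be made concrete. A sketch, assuming instantaneous probes sent at a fixed interval and that a probe fails only if it is sent during the outage (real monitors also apply a reply timeout); the 10-second interval is hypothetical, not EMCO's setting. The point it illustrates: an outage shorter than the probe interval can fall entirely between two probes and never appear in the monitor's log.

```python
import math


def probes_during_outage(outage_start_s: float, outage_len_s: float, interval_s: float) -> int:
    """Count probes sent at t = 0, interval, 2*interval, ... that fall inside
    the half-open outage window [start, start + length)."""
    first = math.ceil(outage_start_s / interval_s)
    end = math.ceil((outage_start_s + outage_len_s) / interval_s)
    return max(end - first, 0)


def guaranteed_detection(outage_len_s: float, interval_s: float) -> bool:
    """An outage is seen by at least one probe, wherever it starts, only if it
    lasts at least one full probe interval."""
    return math.floor(outage_len_s / interval_s) >= 1


INTERVAL_S = 10  # hypothetical ping interval
print(probes_during_outage(2, 5, INTERVAL_S))  # 0: falls between probes at 0 s and 10 s
print(probes_during_outage(8, 5, INTERVAL_S))  # 1: covers the probe at 10 s
print(guaranteed_detection(5, INTERVAL_S), guaranteed_detection(10, INTERVAL_S))
```

So a clean ping report rules out outages longer than the probe interval, but not shorter ones; whether BESR's own retry window outlasts such a blip is not something this thread establishes.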

We have this software on over 50% of the servers we look after, and the last time I checked we had over 80 Dell servers alone, as well as the usual HP and a smattering of home-brew servers.

I'm looking at this and thinking I have a real problem going forward that I need to resolve or find an alternative solution.

Could someone advise me who I need to speak to at Symantec to try and progress this issue in a productive way?

Thanks for all your help here Chris.

Regards

Andrew

criley
Moderator
Employee Accredited

Andrew,

You started this thread for the issue where BESR/SSR was not cleaning up the recovery points. I have since confirmed that this appears to be a defect and have already escalated it internally (see my comment on the 9th Nov).

You now seem to be talking about a different issue. This really should be covered in a new thread with the appropriate title. However, as I said yesterday, this really needs a support case so that it can be looked at in more detail. Based on the number of clients you have, I assume you have a support contract with us?

PCJunky
Level 4

Chris,

Once again, thank you for your prompt responses and patience so far. I have used this thread to track a particular setup I am monitoring and to capture events as they unfold, to see if anything stands out that I have missed when dealing with these issues on a firefighting, day-to-day basis. Please bear with me a little longer.

The original thread was titled:

SSR 2011 fails to cleanup .v2i recovery points after a backup

But to be fair, this applies to both BESR 2010 and SSR 2011.

So far we have established that we do a full recovery point, not incremental, and that there is a known issue with SSR monitoring the backup device and purging recovery points, covered here:

http://www.symantec.com/business/support/index?page=content&id=TECH167758

Which goes some way to answering that original question but does not resolve why the process built into the actual backup job does not always remove the previous recovery points.

Looking at the evidence in this instance as it has amassed, I see two related issues.

1, BESR/SSR does not always remove old recovery points when the job completes. This is separate from the issue above and is the core of this thread. So far the evidence I have gathered leans towards BESR losing connection to the backup device, but other evidence indicates this is not a network connectivity issue, and so possibly an issue with BESR.

We have applied the suggested workaround:

http://www.symantec.com/docs/TECH141655

It has not helped so far, but it has only been in place a couple of days, so it is too early to judge.

I'm going to switch the server 1 backup to go directly to a USB drive to remove any network connectivity issues and see what difference that makes, and will continue to update here....

2, BESR/SSR does not always start the backup job, citing loss of connectivity. I think this is closely related to the above and so would like to keep it on the table here, as the errors are almost identical.

Let's see what backing up directly to a USB drive does and take it from there....

Again, thanks for your patience.

Andrew

criley
Moderator
Employee Accredited

"So far we have established that we do a full recovery point, not incremental, and there is a known issue with SSR monitoring the backup device and purging recovery points - covered here:

http://www.symantec.com/business/support/index?page=content&id=TECH167758

Which goes some way to answering that original question but does not resolve why the process built into the actual backup job does not always remove the previous recovery points."

Am I missing something here? This is a defect which is why it's not working. Not sure I understand your comment 'which goes some way to answering..'.

It seems you are seeing other issues as well which need further investigation. As previously mentioned, I believe your best bet is to get a case open so that support can look at this in more detail. The issue may be another defect or it may be something specific to your environment.

Hope that helps.

PCJunky
Level 4

...possibly it's my understanding of how this works, so I'll lay out here how I perceive it, and then you can correct me or agree.

There are two places BESR/SSR can be set to groom/remove old recovery points.

1, In the backup job you create there is a sub screen "Define backup wizard\Options" - "Limit the number of recovery points saved for this backup"

I am assuming that this setting runs at the end of the successful creation of a recovery point and removes all previous recovery points, limiting the total number of recovery points to the number specified, in this case 1.

2, In Tools - Manage Backup Destination, you can set the threshold at which BESR/SSR will automatically groom the defined backup device based on the free space. This has a known issue, covered here:

http://www.symantec.com/business/support/index?page=content&id=TECH167758 

I'm treating these as two entirely separate processes, triggered by different criteria?

One runs at the end of a successful backup, the other constantly monitors the backup device and grooms it as and when it sees fit?

So if I understand this correctly, issue 1 is not related to issue 2?
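The two mechanisms as described above can be modelled as independent steps. This is a sketch of that understanding only, not of SSR's actual implementation: the class and function names are hypothetical, the 280 GB sets and the dates come from server 1 above, and the assumption that a failed job never reaches its purge step is inferred from the Wednesday log rather than documented.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical model of a backup destination; sets are (label, size_gb), oldest first."""
    sets: list = field(default_factory=list)

    def used_gb(self) -> float:
        return sum(size for _, size in self.sets)


def retention_purge(dest: Destination, limit: int) -> list:
    """Mechanism 1 as described: keep only the newest `limit` sets."""
    cut = max(len(dest.sets) - limit, 0)
    removed, dest.sets = dest.sets[:cut], dest.sets[cut:]
    return removed


def threshold_optimize(dest: Destination, threshold_gb: float) -> list:
    """Mechanism 2 as described: remove the oldest sets while usage is over
    the threshold, always leaving the newest set in place."""
    removed = []
    while dest.used_gb() > threshold_gb and len(dest.sets) > 1:
        removed.append(dest.sets.pop(0))
    return removed


def run_backup(dest: Destination, label: str, size_gb: float, limit: int, succeeded: bool) -> list:
    """Write the new set, then purge only if the job completes."""
    dest.sets.append((label, size_gb))
    # Assumption: a job that errors after writing its files (as in the
    # Wednesday log) never reaches the purge step, leaving both sets behind.
    return retention_purge(dest, limit) if succeeded else []


failed = Destination([("3rd", 280)])
run_backup(failed, "10th", 280, limit=1, succeeded=False)
print([label for label, _ in failed.sets])  # ['3rd', '10th']

ok = Destination([("3rd", 280)])
run_backup(ok, "10th", 280, limit=1, succeeded=True)
print([label for label, _ in ok.sets])  # ['10th']
```

In this model the two mechanisms share nothing but the destination itself, which matches the reading above that issue 1 and issue 2 are separate.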

Thanks

Andrew