
Windows 2008 DIFF Suddenly Runs Big Backup

PaulDean
Level 3

Hi All,

I would like to share a problem we have with one of our Windows 2008 backups, to see whether anyone has encountered something similar and, if possible, find a resolution.

Servers Involved:-

NBU Master - HP-UX 11.31 - NBU7504

Client - Windows 2008 x64 - NBU7507

Media server - Symantec 5220 Appliance - NBU 7506

The policy is using dedupe with accelerator and change journal - ALL_LOCAL_DRIVES (the D: drive is the problem drive)

A FULL runs once every 4 weeks (the D: drive holds 1.1 TB of data), then DIFFs run daily (the DIFFs are usually between 10 and 30 GB and usually take 2 to 4 hours to complete).

Problem:-

Every so often the DIFF backup suddenly backs up 500 to 800 GB of data and takes 10 to 20 hours to complete.
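Putting those figures side by side as average throughput makes the contrast concrete. A quick Python calculation, assuming 1 GB = 1024 MB and that the quoted durations are whole-job wall-clock times; the four cases simply pair the extremes of the quoted ranges, so real jobs fall somewhere in between:

```python
# Average throughput implied by the job sizes and durations quoted above.
# Assumptions: 1 GB = 1024 MB; durations are whole-job wall-clock times.

def mb_per_sec(size_gb: float, hours: float) -> float:
    """Average MB/s for size_gb transferred over the given number of hours."""
    return size_gb * 1024 / (hours * 3600)

# Each case pairs the extremes of the quoted ranges (hypothetical bounds).
cases = {
    "normal DIFF, low bound (10 GB in 4 h)": (10, 4),
    "normal DIFF, high bound (30 GB in 2 h)": (30, 2),
    "big DIFF, low bound (500 GB in 20 h)": (500, 20),
    "big DIFF, high bound (800 GB in 10 h)": (800, 10),
}

for label, (gb, hours) in cases.items():
    print(f"{label}: {mb_per_sec(gb, hours):.2f} MB/s")
```

On these assumptions the normal DIFFs average roughly 0.7 to 4.3 MB/s and the big ones roughly 7.1 to 22.8 MB/s, so per hour the big DIFFs actually move considerably more data than the normal ones.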

Anti-virus is off during the backups, and there does not appear to have been any invasive maintenance (e.g. defrags or data moves) before the DIFFs grew large.

We have had many calls logged with support and none have really identified the problem.

The latest was to upgrade to NBU7507 (which we have done on the client), and this improved matters: the BIG DIFFs now occur far less often.

Our long term aim is to upgrade to NBU7601 but we are several weeks away from this.

Has anyone encountered similar behaviour, or is there anything we could check or look into?

Regards

Paul


DG-2005
Level 5

I would say turn up the logging and see what it's doing during the backups. Also look at what is restorable from the jobs that grew to 500-800 GB, compared with the other DIFF backups. It almost sounds like a DB dump or something along those lines is being generated.

Mark_Solutions
Level 6
Partner Accredited Certified

Usually this happens when there is an issue with the change journal, perhaps a corrupt file, or when it detects a linked file (such as the shortcuts Enterprise Vault leaves behind).

When this happens it stops using the change journal for a set of files / folders and considers them all new so they get backed up again.

It does usually write something in the detail of the job log about it but it is not very clear as to what it is saying and the consequences.

The only way around it is to stop using the change journal. It does slow things down a little, but it avoids the issue because it relies only on the track log, which does not experience the same issues with those file types.

Once the file has been accepted back into the change journal, subsequent backups go back to normal. All very odd and annoying, but a Microsoft thing rather than a NetBackup one!
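The mechanism described above can be illustrated with a toy model in Python. It is not NetBackup's implementation: `FileMeta`, the two functions and the idea of an "untrusted" path prefix are hypothetical, chosen only to show how a block of files the journal can no longer vouch for inflates a DIFF, while a plain track-log comparison does not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata remembered from the last backup."""
    mtime: float
    size: int

def changed_by_journal(current, journal_changes, untrusted_prefixes):
    """Toy journal mode: take what the journal reports as changed, but treat
    every file under a prefix the journal can no longer vouch for as new."""
    return {
        path
        for path in current
        if path in journal_changes
        or any(path.startswith(prefix) for prefix in untrusted_prefixes)
    }

def changed_by_track_log(current, track_log):
    """Toy no-journal mode: compare each file's metadata with the track log.
    Files missing from the track log, or whose metadata differs, count as changed."""
    return {path for path, meta in current.items() if track_log.get(path) != meta}

# Hypothetical example: only a.doc really changed since the last backup.
A, B, C = "D:\\data\\a.doc", "D:\\data\\b.doc", "D:\\archive\\c.doc"
current = {A: FileMeta(200.0, 10), B: FileMeta(100.0, 20), C: FileMeta(100.0, 30)}
track_log = {A: FileMeta(100.0, 10), B: FileMeta(100.0, 20), C: FileMeta(100.0, 30)}

print(sorted(changed_by_journal(current, {A}, set())))           # healthy journal
print(sorted(changed_by_journal(current, {A}, {"D:\\data\\"})))  # D:\data\ untrusted
print(sorted(changed_by_track_log(current, track_log)))          # journal disabled
```

With a healthy journal, and with the journal disabled, only `a.doc` is selected; once `D:\data\` is marked untrusted, `b.doc` is swept in as well even though nothing about it changed.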

Some details and messages listed here: http://www.symantec.com/docs/HOWTO87030

Hope this helps

PaulDean
Level 3

Hi, and thank you for your replies.

Re DG-2005 - We have no evidence of any large data moves or creations when these large diffs occur.

The backup appears to work but decides to back up many more files than usual, although it is not backing up the whole of the drive fresh, just the majority of it.

Re Mark - I think this is the more likely cause. We have no indication in the activity monitor when it goes large and no apparent errors in any logs.

If we disable the change journal, will this trigger a backup of the whole drive again? If so, I guess it would be better to time this for its scheduled FULL.

We are planning to upgrade to NBU76. Does this problem still persist there?

Regards

Paul

Mark_Solutions
Level 6
Partner Accredited Certified

The issue is a journal issue rather than an NBU one, so yes, it will persist at 7.6.

If you plan to turn the journal off, it is probably best to do it at a full backup time (is that tonight, being Friday?) and then run a forced rescan schedule backup to refresh the track log ready for future backups.

Let us know how it goes

PaulDean
Level 3

Hi Mark,

I could get it to run a FULL on Saturday night. As for the forced rescan, that will then be 48 hours from our last one.

It usually runs OK for a week or two before we get a large DIFF, so it might be a while before I post the results.

Regards

Paul

Marianne
Moderator
Partner    VIP    Accredited Certified

I found this TN, which says archive bits may not always be cleared after a successful backup.

Use timestamps as a workaround.

See http://www.symantec.com/docs/TECH215838 
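For comparison, timestamp-based selection does not read or clear the archive bit at all: a differential just takes files modified since the last full backup. A minimal Python sketch, assuming the start time of the last FULL is known as a Unix timestamp; `files_changed_since` is a hypothetical helper that compares modification time only, whereas the NetBackup job log later in this thread also reports a change time comparison.

```python
import os

def files_changed_since(root: str, since: float) -> list:
    """Return files under root whose modification time is newer than since.

    Timestamp-based differential selection (hypothetical helper): no archive
    bit is read or cleared, so a bit that failed to clear cannot pull extra
    files into the backup.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > since:
                    changed.append(path)
            except OSError:
                # The file vanished or became unreadable after listing; skip it.
                continue
    return sorted(changed)
```

Note that walking a 1.1 TB volume this way is itself a full enumeration of the file system, which is the cost the change journal is meant to avoid.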

PaulDean
Level 3

Hi all,

Well, I unticked "use change journal" in the client settings, selected "forced re-scan" in the policy, restarted the services on the client, and ran a FULL.

This has now completed after 44 hours, but I got these messages in the activity log:-

26/04/2014 17:08:38 - Info bpbkar(pid=5308) change time comparison:<enabled>          
26/04/2014 17:08:38 - Info bpbkar(pid=5308) accelerator enabled backup, archive bit processing:<disabled>       
26/04/2014 17:08:38 - Info bpbkar(pid=5308) will attempt to use change journal data for <T:\>  

Is there a way to check if it actually used the journal or not ?
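One way to answer that from the job details is to look at the bpbkar messages themselves: "will attempt to use change journal data" only says the journal was tried, while a later "not using change journal data ... for <X>" line says it was dropped for that volume (as in the log from the 03/05/2014 FULL further down). A small Python sketch that scans pasted job-detail text for those phrases; the phrases are copied from the logs in this thread, other NetBackup versions may word them differently, and the helper names are hypothetical. The absence of a "not using" line is suggestive rather than conclusive.

```python
import re

# Phrases copied from the bpbkar job-detail lines in this thread.
ATTEMPT = re.compile(r"will attempt to use change journal data for <([^>]+)>")
NOT_USING = re.compile(r"not using change journal data.*?<([^>]+)>.*")

def journal_report(job_log: str) -> dict:
    """Map each volume to the journal-related bpbkar messages seen for it."""
    report = {}
    for line in job_log.splitlines():
        if m := ATTEMPT.search(line):
            report.setdefault(m.group(1), []).append("attempted")
        elif m := NOT_USING.search(line):
            report.setdefault(m.group(1), []).append(m.group(0).strip())
    return report

def journal_abandoned(job_log: str, volume: str) -> bool:
    """True if bpbkar logged that it stopped using the journal for volume."""
    messages = journal_report(job_log).get(volume, [])
    return any(msg.startswith("not using") for msg in messages)

SAMPLE = r"""
03/05/2014 17:07:32 - Info bpbkar(pid=5380) will attempt to use change journal data for <T:\>
03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for <T:\>: unable to validate change journal usage <reason=filter checksum validation failed>
03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for enumeration for <T:\> but will use it for change detection
"""

for volume, messages in journal_report(SAMPLE).items():
    verdict = "journal abandoned" if journal_abandoned(SAMPLE, volume) else "journal used"
    print(volume, verdict)
    for msg in messages:
        print("   ", msg)
```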

Also, thank you Marianne for the TN.

Re "Use Timestamp as workaround": we already do (see above).

Regards

Paul

PaulDean
Level 3

Hi all,

The DIFF backups have been running at normal size, but they take a very long time and get very low KB/s.

We attempted a FULL this weekend, but it failed with the following:-

03/05/2014 17:07:32 - Info bpbkar(pid=5380) accelerator enabled backup, archive bit processing:<disabled>       
03/05/2014 17:07:32 - Info bpbkar(pid=5380) will attempt to use change journal data for <T:\>    
03/05/2014 17:07:32 - Info bptm(pid=28359) start            
03/05/2014 17:07:33 - Info bptm(pid=28359) using 1048576 data buffer size        
03/05/2014 17:07:33 - Info bptm(pid=28359) using 512 data buffers         
03/05/2014 17:07:33 - Info nbulapp01(pid=28359) Using OpenStorage client direct to backup from client clientname to appliance1 
03/05/2014 17:07:38 - begin writing
03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for <T:\>: unable to validate change journal usage <reason=filter checksum validation failed>
03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for enumeration for <T:\> but will use it for change detection
04/05/2014 07:55:52 - Error bptm(pid=28359) Could not get bytes copied on client 2060057, by sts_ioctl,will use default value.
04/05/2014 07:55:52 - Error bptm(pid=28359) image copy failed: error 2060057: OpenStorage Proxy Plugin Error    
04/05/2014 07:55:52 - Error bptm(pid=28359) cannot set media state 2: retval = 2060057     
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_image_prop failed: error 2060057: OpenStorage Proxy Plugin Error     
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_close_handle failed: 2060057 OpenStorage Proxy Plugin Error      
04/05/2014 07:55:52 - Error bptm(pid=28359) Could not set last backup info to client 2060057    
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_image_prop failed: error 2060057: OpenStorage Proxy Plugin Error     
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_close_handle failed: 2060057 OpenStorage Proxy Plugin Error      
04/05/2014 07:55:52 - Critical bptm(pid=28359) cannot write image to disk, media close failed with status 2060057 OpenStorage Proxy Plugin Error
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_server_prop failed: error 2060057 OpenStorage Proxy Plugin Error     
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_server_prop failed: error 2060057 OpenStorage Proxy Plugin Error     
04/05/2014 07:55:59 - Info bptm(pid=28359) EXITING with status 14 <----------        
04/05/2014 07:57:01 - Error bpbrm(pid=27983) [ERROR][proxy_open_server_v7]CORBA::SystemException is caught in proxy_open_server_v7, minor = 1330446338, status = 1, info = system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'.OMG minor code (2), described as '*unknown description*', completed = NO
04/05/2014 07:57:01 - Error bpbrm(pid=27983) libsts opensvh() 14/05/04 07:57:01: v11_open_server failed in plugin /usr/openv/lib/libstspinbostpxy.so err 2060057  
04/05/2014 07:57:01 - Error bpbrm(pid=27983) sts_open_server failed: error 2060057         
04/05/2014 07:58:00 - Error bpbrm(pid=27983) [ERROR][proxy_open_server_v7]CORBA::SystemException is caught in proxy_open_server_v7, minor = 1330446338, status = 1, info = system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'.OMG minor code (2), described as '*unknown description*', completed = NO
04/05/2014 07:58:00 - Error bpbrm(pid=27983) libsts opensvh() 14/05/04 07:58:00: v11_open_server failed in plugin /usr/openv/lib/libstspinbostpxy.so err 2060057  
04/05/2014 07:58:00 - Error bpbrm(pid=27983) sts_open_server failed: error 2060057         
04/05/2014 07:58:01 - Info bpbkar(pid=5380) done. status: 14: file write failed       
04/05/2014 07:58:01 - end writing; write time: 14:50:23
file write failed(14)
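When passing a log like this to support, it can help to boil it down to severity counts, the first Error or Critical line, and each process's exit status. A small Python sketch; the line-format regex is inferred from the log above and `triage` is a hypothetical helper, so other NetBackup versions or locales (this log uses DD/MM/YYYY dates) may need a different pattern.

```python
import re
from collections import Counter

# Entries look like "DD/MM/YYYY HH:MM:SS - <Severity> <process>(pid=N) <text>".
ENTRY = re.compile(
    r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} - (Info|Warning|Error|Critical) (\w+)\(pid=\d+\)"
)
# The two exit-status forms that appear in the log above.
EXIT = re.compile(r"(?:EXITING with status|done\. status:) (\d+)")

def triage(job_log: str):
    """Count entries per severity, keep the first Error/Critical line,
    and record the exit status reported by each process."""
    severities = Counter()
    first_error = None
    exit_status = {}
    for raw in job_log.splitlines():
        line = raw.strip()
        entry = ENTRY.match(line)
        if not entry:
            continue  # e.g. "begin writing" / "end writing" lines have no severity
        severity, process = entry.groups()
        severities[severity] += 1
        if first_error is None and severity in ("Error", "Critical"):
            first_error = line
        if status := EXIT.search(line):
            exit_status[process] = int(status.group(1))
    return severities, first_error, exit_status

SAMPLE = r"""
03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for <T:\>: unable to validate change journal usage <reason=filter checksum validation failed>
04/05/2014 07:55:52 - Error bptm(pid=28359) Could not get bytes copied on client 2060057, by sts_ioctl,will use default value.
04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_image_prop failed: error 2060057: OpenStorage Proxy Plugin Error
04/05/2014 07:55:59 - Info bptm(pid=28359) EXITING with status 14 <----------
04/05/2014 07:58:01 - Info bpbkar(pid=5380) done. status: 14: file write failed
04/05/2014 07:58:01 - end writing; write time: 14:50:23
"""

counts, first, statuses = triage(SAMPLE)
print(dict(counts))
print("first error:", first)
print("exit status:", statuses)
```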

We are getting nothing logged to the nbostpxy log file. Is there something I need to enable to collect data there?

Could this be a corrupt track log file on the client ?

We have run a DIFF since, and this worked, though again it was very slow with low KB/s.

Any advice would be greatly appreciated.

Paul

Mark_Solutions
Level 6
Partner Accredited Certified

This drops the connection after 45 minutes (2700 seconds), which tends to look like a timeout somewhere during the file system scan. Without the journal, it seems to be taking a long time to scan the system before the backup starts.

The pd.conf on the client has the setting to enable logs but they can get VERY large!
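If you do turn pd.conf logging on, it helps to do it in a way that is easy to reverse. A minimal Python sketch that sets one `KEY = value` entry in pd.conf-style text, uncommenting a `#KEY = ...` line if one exists and appending the entry otherwise. It assumes the file uses that simple `KEY = value` layout and, based on the pd.conf parameter list in the NetBackup Deduplication Guide, that `LOGLEVEL` (0 to 10) controls verbosity; confirm both against the guide for your release. `set_conf_key` and the sample text are hypothetical.

```python
import re

def set_conf_key(text: str, key: str, value: str) -> str:
    """Return text with `key = value` set (hypothetical helper).

    Uncomments and replaces the first `#key = ...` or `key = ...` line if
    present; otherwise appends the entry at the end of the text.
    """
    # [ \t]* rather than \s* so the match never swallows preceding blank lines.
    pattern = re.compile(rf"^[ \t]*#?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key} = {value}"
    if pattern.search(text):
        # A function replacement keeps backslashes in Windows paths literal.
        return pattern.sub(lambda _m: new_line, text, count=1)
    separator = "" if not text or text.endswith("\n") else "\n"
    return f"{text}{separator}{new_line}\n"

# Hypothetical pd.conf excerpt with logging commented out.
sample = "# excerpt\n#LOGLEVEL = 0\n"
raised = set_conf_key(sample, "LOGLEVEL", "10")
print(raised)
print(set_conf_key(raised, "LOGLEVEL", "0"))
```

Setting the level back to 0 once the test backup has run keeps the logs from growing without bound. Reading and writing the actual file is left out because its location depends on the platform and install path.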