Forum Discussion

PaulDean
Level 3
11 years ago

Windows 2008 DIFF Suddenly Runs Big Backup

Hi All,

I would like to share a problem we have with one of our Windows 2008 backups, to see whether anyone has encountered something similar and, if possible, find a resolution.

Servers Involved:-

NBU Master - HPUX 11.31 - NBU7504

Client - Windows 2008 x64 - NBU7507

Media server - Symantec 5220 Appliance - NBU 7506

The policy is using dedupe with accelerator and change journal - ALL_LOCAL_DRIVES (the D: drive is the problem drive)

A FULL runs once every 4 weeks (the D: drive holds 1.1 TB of data), then DIFFs run daily (the DIFFs are usually between 10 and 30 GB and usually take 2 to 4 hours to complete).

Problem:-

Every so often the DIFF backup decides to back up 500 to 800 GB of data, taking 10 to 20 hours to complete.

Antivirus is off during the backups and there appears not to have been any invasive maintenance (i.e. defrags, data moves, etc.) prior to the DIFFs going large.

We have had many calls logged with support and none have really identified the problem.

The latest suggestion was to upgrade to NBU7507 (which we have done on the client), and this improved matters, with the occurrence of the BIG DIFFs being reduced drastically in frequency.

Our long term aim is to upgrade to NBU7601 but we are several weeks away from this.

Has anyone encountered similar behaviour, or is there anything that we could check or look into?

Regards

Paul

  • I would say turn up the logging and see what it's doing during the backups. Also look at what is restorable from those jobs that have grown to 500-800 GB, as opposed to the other DIFF backups. It almost sounds like a DB dump or something along those lines is being generated.

  • Usually this happens when there is an issue with the change journal - perhaps a corrupt file or it detects a linked file (such as Enterprise Vault produces when it leaves shortcuts)

    When this happens it stops using the change journal for a set of files / folders and considers them all new so they get backed up again.

    It does usually write something in the detail of the job log about it but it is not very clear as to what it is saying and the consequences.

    The only way around it is to stop using change journal - it does slow things down a little but avoids the issue as it relies on the track log only which does not experience the same issues with those file types.

    Once the file has been accepted back into the change journal subsequent backups go back to normal - all very odd and annoying but a Microsoft thing rather than a NetBackup one!

    Some details and messages listed here: http://www.symantec.com/docs/HOWTO87030

    Hope this helps

  • Hi and thank you for your replies.

    Re DG-2005 - We have no evidence of any large data moves or creations when these large diffs occur.

    The backup appears to work but decides to back up many more files than usual, although it is not backing up the whole of the drive fresh, just the majority of it.

    Re Mark - I think this is the more likely cause. We have no indication in the activity monitor when it goes large and no apparent errors in any logs.

    If we decide to disable the change journal, will this trigger a backup of the whole drive again? In that case I guess I'd be better off timing this for its scheduled FULL.

    We are planning to upgrade to NBU76. Does this problem still persist there?

    Regards

    Paul

  • The issue is a Journal issue rather than a NBU one - so yes it will persist at 7.6

    If you plan to turn Journal off then it is probably best to plan it for a full backup time (is that tonight - being Friday?) and then run a forced re-scan schedule backup to refresh the track log ready for future backups

    Let us know how it goes

  • Hi Mark,

    I could get it to run a FULL with forced re-scan on Saturday night. That will be 48 hours then (from our last one).

    It usually runs OK for a week or two before we get a large DIFF, so it might be a while before I post the results.

    Regards

    Paul

  • Hi all,

    Well I unticked the "use change journal" from the client settings, selected "forced re-scan" from the policy and stopped and started the services on the client and ran a FULL.

    This has now completed after 44 hours, but got this message from the activity log :-

    26/04/2014 17:08:38 - Info bpbkar(pid=5308) change time comparison:<enabled>          
    26/04/2014 17:08:38 - Info bpbkar(pid=5308) accelerator enabled backup, archive bit processing:<disabled>       
    26/04/2014 17:08:38 - Info bpbkar(pid=5308) will attempt to use change journal data for <T:\>  

    Is there a way to check whether it actually used the journal or not?

    Also thank you Marianne for the TN

    Re Use Timestamp as workaround. - We already do (see above)

    Regards

    Paul
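
On the question of checking whether the journal was actually used: the "will attempt to use change journal data" line only states intent. A rough way to check is to scan the job details for a follow-up "not using change journal data" line for the same volume. The sketch below assumes the job details have been saved as plain text lines, and the message patterns are taken only from job detail lines quoted in this thread; other NetBackup versions may word them differently. The function name `journal_status` is made up for illustration.

```python
import re

# Patterns based on job detail lines quoted in this thread (assumption:
# other NetBackup versions may phrase these messages differently).
ATTEMPT = re.compile(r"will attempt to use change journal data for <([^>]+)>")
REJECT = re.compile(
    r"not using change journal data for (?:enumeration for )?<([^>]+)>(.*)"
)


def journal_status(lines):
    """Summarise change journal messages per volume.

    Returns {volume: {"attempted": bool, "rejections": [str, ...]}}.
    An empty "rejections" list means no rejection message was found,
    which suggests (but does not prove) that the journal was used.
    """
    status = {}
    for line in lines:
        attempt = ATTEMPT.search(line)
        if attempt:
            entry = status.setdefault(
                attempt.group(1), {"attempted": False, "rejections": []}
            )
            entry["attempted"] = True
            continue
        reject = REJECT.search(line)
        if reject:
            entry = status.setdefault(
                reject.group(1), {"attempted": False, "rejections": []}
            )
            # Keep whatever explanation follows the volume name.
            entry["rejections"].append(reject.group(2).strip(" :"))
    return status


SAMPLE = [
    r"03/05/2014 17:07:32 - Info bpbkar(pid=5380) will attempt to use change journal data for <T:\>",
    r"03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for <T:\>: unable to validate change journal usage <reason=filter checksum validation failed>",
    r"03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for enumeration for <T:\> but will use it for change detection",
]

if __name__ == "__main__":
    for volume, info in journal_status(SAMPLE).items():
        print(volume, info)
```

If a volume shows `attempted` but has no rejections, the job details at least contain no sign that the journal was refused for that volume.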

  • Hi all,

    The DIFF backups have been running at normal size, but they are taking a very long time and getting very low KB/s.

    We attempted a FULL this weekend but it has been failing with the following:-

    03/05/2014 17:07:32 - Info bpbkar(pid=5380) accelerator enabled backup, archive bit processing:<disabled>       
    03/05/2014 17:07:32 - Info bpbkar(pid=5380) will attempt to use change journal data for <T:\>    
    03/05/2014 17:07:32 - Info bptm(pid=28359) start            
    03/05/2014 17:07:33 - Info bptm(pid=28359) using 1048576 data buffer size        
    03/05/2014 17:07:33 - Info bptm(pid=28359) using 512 data buffers         
    03/05/2014 17:07:33 - Info nbulapp01(pid=28359) Using OpenStorage client direct to backup from client clientname to appliance1 
    03/05/2014 17:07:38 - begin writing
    03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for <T:\>: unable to validate change journal usage <reason=filter checksum validation failed>
    03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for enumeration for <T:\> but will use it for change detection
    04/05/2014 07:55:52 - Error bptm(pid=28359) Could not get bytes copied on client 2060057, by sts_ioctl,will use default value.
    04/05/2014 07:55:52 - Error bptm(pid=28359) image copy failed: error 2060057: OpenStorage Proxy Plugin Error    
    04/05/2014 07:55:52 - Error bptm(pid=28359) cannot set media state 2: retval = 2060057     
    04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_image_prop failed: error 2060057: OpenStorage Proxy Plugin Error     
    04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_close_handle failed: 2060057 OpenStorage Proxy Plugin Error      
    04/05/2014 07:55:52 - Error bptm(pid=28359) Could not set last backup info to client 2060057    
    04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_image_prop failed: error 2060057: OpenStorage Proxy Plugin Error     
    04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_close_handle failed: 2060057 OpenStorage Proxy Plugin Error      
    04/05/2014 07:55:52 - Critical bptm(pid=28359) cannot write image to disk, media close failed with status 2060057 OpenStorage Proxy Plugin Error
    04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_server_prop failed: error 2060057 OpenStorage Proxy Plugin Error     
    04/05/2014 07:55:52 - Critical bptm(pid=28359) sts_get_server_prop failed: error 2060057 OpenStorage Proxy Plugin Error     
    04/05/2014 07:55:59 - Info bptm(pid=28359) EXITING with status 14 <----------        
    04/05/2014 07:57:01 - Error bpbrm(pid=27983) [ERROR][proxy_open_server_v7]CORBA::SystemException is caught in proxy_open_server_v7, minor = 1330446338, status = 1, info = system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'.OMG minor code (2), described as '*unknown description*', completed = NO
    04/05/2014 07:57:01 - Error bpbrm(pid=27983) libsts opensvh() 14/05/04 07:57:01: v11_open_server failed in plugin /usr/openv/lib/libstspinbostpxy.so err 2060057  
    04/05/2014 07:57:01 - Error bpbrm(pid=27983) sts_open_server failed: error 2060057         
    04/05/2014 07:58:00 - Error bpbrm(pid=27983) [ERROR][proxy_open_server_v7]CORBA::SystemException is caught in proxy_open_server_v7, minor = 1330446338, status = 1, info = system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'.OMG minor code (2), described as '*unknown description*', completed = NO
    04/05/2014 07:58:00 - Error bpbrm(pid=27983) libsts opensvh() 14/05/04 07:58:00: v11_open_server failed in plugin /usr/openv/lib/libstspinbostpxy.so err 2060057  
    04/05/2014 07:58:00 - Error bpbrm(pid=27983) sts_open_server failed: error 2060057         
    04/05/2014 07:58:01 - Info bpbkar(pid=5380) done. status: 14: file write failed       
    04/05/2014 07:58:01 - end writing; write time: 14:50:23
    file write failed(14)

    We are getting nothing logged to the nbostpxy log files. Is there something I need to enable to collect data here?

    Could this be a corrupt track log file on the client ?

    We have run a DIFF since and this has worked - again very slow / low kb/s

    Any advice would be greatly appreciated.

    Paul
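
To see how long the job ran before the first failure, the timestamps in the job details can be subtracted directly. This is a minimal sketch, assuming the lines are saved as plain text and use the day-first `dd/mm/yyyy hh:mm:ss` format shown above; the function names are made up for illustration.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%d/%m/%Y %H:%M:%S"  # day-first, as in the job details above


def parse_ts(line):
    """Return the leading timestamp of a job detail line, or None."""
    try:
        return datetime.strptime(line.strip()[:19], TS_FORMAT)
    except ValueError:
        # Lines such as "file write failed(14)" carry no timestamp.
        return None


def time_to_first_error(lines):
    """Elapsed time from 'begin writing' to the first Error/Critical line."""
    start = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if start is None and "begin writing" in line:
            start = ts
        elif start is not None and (" - Error " in line or " - Critical " in line):
            return ts - start
    return None


SAMPLE = [
    "03/05/2014 17:07:38 - begin writing",
    r"03/05/2014 17:10:06 - Info bpbkar(pid=5380) not using change journal data for <T:\>: unable to validate change journal usage <reason=filter checksum validation failed>",
    "04/05/2014 07:55:52 - Error bptm(pid=28359) image copy failed: error 2060057: OpenStorage Proxy Plugin Error",
    "04/05/2014 07:58:01 - end writing; write time: 14:50:23",
    "file write failed(14)",
]

if __name__ == "__main__":
    print(time_to_first_error(SAMPLE))  # 14:48:14
```

For the failed FULL above, this gives 14:48:14 between "begin writing" and the first bptm error, which is consistent with the 14:50:23 write time reported at the end of the job.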

  • This drops the connection after 45 minutes (2700 seconds), which tends to look like a timeout somewhere during the file system scan. Without the Journal it seems to be just taking a long time to scan the system ready for the backup.

    The pd.conf on the client has the setting to enable logs but they can get VERY large!