Forum Discussion

Kadir_Ovac_kl_
11 years ago

When too many processes run on the appliance, some database backups randomly fail in the middle of the job

Hello,

Firstly, this is a big company; roughly 20-25 TB of data is backed up as a weekly full backup. I think the problem occurs when duplication and backup processes run at the same time. Some DB backups randomly fail with error 6 in the middle of the job. Sometimes I turn rebasing off, and I then see fewer of these errors than with rebasing on. After a failure, I retry the job and it mostly finishes successfully. I don't know exactly which logs are needed, so I will try to explain my environment.

Master -> 7.5.0.7, windows 2008r2

Media -> 5220 appliance, 2.5.4 

Clients -> windows 2008r2, sql2008

 

I found some links about the issue, but I don't know whether these changes will work or not.

http://www.symantec.com/connect/forums/interrupt-when-backup-large-sql-databases

http://www.symantec.com/connect/forums/5220-limits?page=0#comment-10092541

 

Thanks a lot 

 

  • The tuning in the first link will help you - some of it needs access to the O/S on the appliance, some can be done through the CLISH and web GUI

    The advice I gave on the other thread is very valid too - keep an eye on your queue and process it manually on a regular basis to keep it down

    Upgrading to 7.6.0.2 will also help you greatly, as the standard tuning is all done in that version and the de-dupe engine is greatly improved

    Of course, restricting the number of jobs running at any one time may help

    7.6 also provides the SLP window, so you can schedule your SLPs to run the duplication during the day and prevent the overload while your backups are running

    If possible get to 7.6.0.2, but if not, process your queues regularly (crcontrol --processqueue) and use some of the appliance tuning shown in the first thread

    Hope this helps
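
For anyone wanting to script the regular queue processing mentioned above, here is a minimal sketch. The only command taken from this thread is crcontrol --processqueue; the crcontrol path and the --queueinfo option are assumptions based on a typical MSDP install, and process_queue is a made-up helper name, so check them against your own appliance first.

```shell
#!/bin/sh
# Assumed default location of crcontrol on an MSDP media server; override if yours differs.
CRCONTROL="${CRCONTROL:-/usr/openv/pdde/pdcr/bin/crcontrol}"

# Hypothetical helper: show the queue state, then ask the engine to process the queue.
process_queue() {
    if [ ! -x "$CRCONTROL" ]; then
        echo "crcontrol not found at $CRCONTROL" >&2
        return 1
    fi
    "$CRCONTROL" --queueinfo      # assumed option: report queue size before processing
    "$CRCONTROL" --processqueue   # the command quoted in this thread
}
```

Calling process_queue from a cron job outside the backup window would be one way to keep the queue down, but the right schedule depends on when your backups and duplications run.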

  • Hi Mark,

    Thank you for your very fast reply.

    NBU will not be upgraded for at least 2 months, so I need a different solution.

    A full backup is taken once a week, on Saturday, and a 3rd copy of this backup is duplicated to tape. The tape library has 2 drives, so the 3rd-copy duplication does not finish until Wednesday. I face this problem only on Sunday, Monday, and Tuesday. If I kill all duplication jobs, I don't get these errors.

     

    I try to delay the duplication jobs by using LIFECYCLE_PARAMETERS and crontab. The rebase state is also turned off automatically by crontab on Saturday and Sunday.

     

    I actually made the changes from the 2nd link to LIFECYCLE_PARAMETERS on the master server. From what I have observed, daily duplications normally finish by 8am; after these changes the jobs don't finish until 10am. I also got errors 83 and 84 on duplication jobs.

    The first link has different changes on the appliance. Honestly I am a little afraid: if I make these changes, could I run into bigger problems?

     

    Thanks

  • You can safely make all of the changes from the first link as well, but if you want to go carefully, do the size and number DATA_BUFFERS, then just:

    echo 800 > /usr/openv/netbackup/db/config/DPS_PROXYDEFAULTRECVTMO

    Set /usr/openv/pdde/pdcr/etc/contentrouter.cfg

    WorkerThreads=128

    and see if those help - you do need a full service re-start to get them to take effect
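
If you are nervous about hand-editing contentrouter.cfg, a cautious way to change a single value is sketched below. It keeps a .bak copy and only touches a key that already exists in the file, because appending a new key in the wrong place may not do what you expect. set_cfg_value is a hypothetical helper, not a NetBackup tool, and the file path and key come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: change KEY=VALUE in an existing config file, keeping a backup.
# Usage: set_cfg_value FILE KEY VALUE
set_cfg_value() {
    file=$1 key=$2 value=$3
    # Refuse to guess where a missing key belongs; leave the file untouched.
    if ! grep -q "^${key}=" "$file"; then
        echo "${key} not found in ${file}; leaving file unchanged" >&2
        return 1
    fi
    # Keep a copy of the original, then rewrite only the matching line.
    cp -p "$file" "${file}.bak" || return 1
    sed "s|^${key}=.*|${key}=${value}|" "${file}.bak" > "$file"
}
```

For example, set_cfg_value /usr/openv/pdde/pdcr/etc/contentrouter.cfg WorkerThreads 128 would apply the value suggested above; the service restart is still needed afterwards.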

     

  • This contentrouter file is nearly empty, is that normal? Do I need to add these settings?

     

    /disk/etc/puredisk # vi contentrouter.cfg
    ; @validate [0-9]+
    SessionTimeout=3600

    ;
    ; The path where to create the "dellog" files in which the pathid and fileid
    ; of every segment that is deleted from storage are logged.
    ; No log files are written if the value is empty or the setting is not present.
    ;
    ; @reload
    DelLogPath=/disk/history

    ;
    ; The dellog files are rotated. The current file has extension 0 ("dellog.0"),
    ; the older files are renamed to .1, .2 etc. The older the file, the higher
    ; the number. Minimum allowed value is 1, maximum is 100, default is 10.
    ;
    ; @reload
    ; @validate [0-9]+
    DelLogMaxNum=10

    ;
    ; The maximum size of each dellog file. Default value is 10 Mib.
    ;
    ; Possible suffixes are:
    ; B, KiB (1024), MiB, GiB, TiB, PiB, EiB, KB (1000), MB, GB, TB, PB, EB.
    ;
    ; @reload
    DelLogMaxSize=10Mib

    ; Enable performance statistics logging by the Storage Manager. The statistics
    ; will be logged in the log file of the Storage Manager, usually
    ; "/Storage/log/spoold/storaged.log". The log lines have loglevel INFO and
    ; start with the word 'Performance'. The default value is false.
    PerformanceStats=false

    ; When there are too many references on a DO record in the content router
    ; database, then we assume that this DO record will never be removed from the
    ; content router anymore.  We exploit this by converting all the existing
    ; references to an "eternal reference" that is not removed ever from the record.
    ; We convert references to an eternal reference when the total size of the
    ; existing references on the record exceeds the EternalSizeThreshold.  The
    ; default threshold value is 128 KiB.  Warning: it is very DANGEROUS to lower this
    ; threshold without thorough analysis.  If it is too low, all data in the content
    ; router will irrevocably be stored permanently and data removal will be blocked.
    ; The minimum value is 16 KiB.
    ;
    ; @reload
    EternalSizeThreshold=128Kib

    ; When there are too many references on a DO record in the content router
    ; database, then we assume that this DO record will never be removed from the
    ; content router anymore.  We exploit this by converting all the existing
    ; references to an "eternal reference" that is not removed ever from the record.
    ; We convert references to an eternal reference when the total number of the
    ; existing references on the record exceeds the EternalCountThreshold.  The
    ; default threshold value is 8192.  Warning: it is very DANGEROUS to lower this
    ; threshold without thorough analysis.  If it is too low, all data in the content
    ; router will irrevocably be stored permanently and data removal will be blocked.
    ; The minimum value is 1000 references.
    "contentrouter.cfg" 1258L, 43766C  

  • That looks very small - maybe just how vi displays it?

    Does it look the same when using more?
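
The vi status line in the paste above reports 1258L, 43766C, which suggests the file is much longer than the part shown. A quick way to confirm that without paging through it is sketched below; cfg_summary is a hypothetical helper name, and the path in the example is the one from the vi session above.

```shell
#!/bin/sh
# Hypothetical helper: print the line count of a config file and where a key is set.
# Usage: cfg_summary FILE KEY
cfg_summary() {
    file=$1 key=$2
    lines=$(wc -l < "$file" | tr -d ' ')   # strip padding some wc versions add
    echo "lines: $lines"
    # Show line number and current value, or say the key is absent.
    grep -n "^${key}=" "$file" || echo "${key}: not set"
}
```

For example, cfg_summary /disk/etc/puredisk/contentrouter.cfg WorkerThreads would show whether the setting is already present and where.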