Forum Discussion

FlyMountain
11 years ago

Reduce restore windows

We have a bunch of databases which reside on a single server. Currently we have a dedicated policy backing up around 2 TB of data; it needs 8 hours to back up and around 20 hours to restore. We need to tune the backup and restore, especially to reduce the restore window.
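Rough numbers first (a quick Python sketch; the 2 TB / 8 h / 20 h figures are the ones above, while the 140 MB/s LTO-5 native rate is an assumption):

```python
# Effective throughput implied by the current windows (2 TB taken as 2,000,000 MB).
data_mb = 2_000_000
lto5_native = 140  # MB/s, assumed LTO-5 native (uncompressed) rate

backup_rate = data_mb / (8 * 3600)    # ~69 MB/s over the 8-hour backup
restore_rate = data_mb / (20 * 3600)  # ~28 MB/s over the 20-hour restore

print(f"backup:  {backup_rate:.0f} MB/s ({backup_rate / lto5_native:.0%} of native)")
print(f"restore: {restore_rate:.0f} MB/s ({restore_rate / lto5_native:.0%} of native)")
```

So the backup runs at roughly half of LTO-5 native speed and the restore at about a fifth, which is why the ideas below focus on the restore path.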

There are some ideas we are currently considering. If you have any other ideas, please let me know.

1. Split the backup policy in two so that two LTO-5 drives can be used for backup and restore.

2. Use a dedicated volume pool to avoid mixing data with other backup policies that have the same retention (the library is shared by multiple policies).

3. Disable multiplexing to reduce restore time.

4. Exclude all file system and log files from the database policies.

.......

thanks in advance


  • Before giving any suggestions, we would like to understand your environment:

    1) What NetBackup versions are on the master, media server, and client?

    2) How many tape drives can be assigned to take the backup of this data?

    3) Are the client and media server different machines?

    4) If not, can the client be converted to a media server (maybe a SAN media server)?

    etc.

  • If you are using Oracle RMAN, get the DBAs to tune the backup - they can multiplex a backup internally [filesperset].

    4) above is a reasonable idea - convert to a SAN media server as a last resort. It makes NBU administration more painful.

    I assume you have done the tuning from one of your previous posts here. There is also some restore tuning required.

    Tests you can do (a rough read-throughput sketch follows the list):

    1. Run a backup to /dev/null and compare its speed to a tape backup.
    2. Run a backup to tape and restore to /dev/null. Oracle DBAs can do this using the "validate" utility; other databases should have something similar.
    3. Restore to disk that has been formatted prior to the restore. Sometimes it is quicker to restore to a clean disk where the restore is streamed straight to disk; if the filesystem contains a lot of other data, the restore target wastes time locating block space for the restored data.
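    A minimal Python sketch of the idea behind test 1 (it only times a raw sequential read, which is roughly what a backup to /dev/null measures; the file path is a placeholder):

    ```python
    import time

    SRC = "/db/mount/point/bigfile.dat"  # hypothetical path - point this at a real database file
    CHUNK = 8 * 1024 * 1024              # 8 MB reads

    total = 0
    start = time.monotonic()
    with open(SRC, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            total += len(buf)
    elapsed = time.monotonic() - start

    # Beware the OS page cache: a second run over the same file may read from RAM.
    print(f"read {total / 1e6:.0f} MB in {elapsed:.1f} s -> {total / 1e6 / elapsed:.0f} MB/s")
    ```

    If that rate is already below tape speed, the disk side (not the drive) is your bottleneck.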

    What is the restore target:

    1. Enterprise disk array, NetApp, etc.?
    2. Direct attached disks? What is the interface - SATA? RAID type?
    3. Is the disk ever defragmented?
    4. Is the disk tuned for anything specific? I have direct attached disk tuned for a database, which had a serious impact on backup and recovery.
  • Nagalla, thanks for your reply.

    1. We are all on 7.5.0.6.

    2. We are using one tape drive from a shared library; we plan to add another one, or to use one dedicated library with two drives in total.

    3. The client and media server are on the same physical machine, and all database files are mounted on this media server. EMC SRDF technology is used: after syncing the database, we split the sync and back up the data from the mount point.

    4. It is a hospital Cache database, with multiple databases residing on it.

  • Thanks for your reply.

    It is an EPIC Cache database; an EMC VMAX array (no FAST) is used on IBM-UX machines. All databases are synced and then split to a media server (IBM-UX), and we back them up from the media server mount point. The same hardware resources will be used for restore.

    Basically, some big flat files to be backed up and restored back to the same place.

  • What circumstances are you planning to deal with here? What I mean is... expectation management. From your description it sounds like this is business continuity restricted to database corruption only: you state the hardware will be the same, and I take it you mean identically the same.

    2 TB of DB: that's not a lot of data really, and if your DBs are like mine (Oracle), i.e. a small number of large files, then the LTO can hit peak speed so long as you can feed it/them. That's 140 MB/s raw, typically twice that with average compression.

    2 TB = 2,000,000 MB, so 2,000,000 MB / 140 MB/s ≈ 14,300 s ≈ 238 minutes; call it 4 hours, possibly as little as 2 hours with compression and high bandwidth.
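    The same arithmetic as a trivial Python check (same assumed 140 MB/s native rate and 2:1 compression):

    ```python
    data_mb = 2_000_000  # 2 TB expressed in MB
    native = 140         # MB/s, LTO-5 native rate

    seconds = data_mb / native
    print(f"native:      {seconds / 60:.0f} min (~{seconds / 3600:.1f} h)")
    print(f"2:1 compress: {seconds / 2 / 60:.0f} min (~{seconds / 2 / 3600:.1f} h)")
    ```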

    So if you can feed just one drive at that speed, then it is very achievable by the drive. It rather depends on whether the server/storage can meet the same demand, but tape is not likely to be the bottleneck, and as the client is its own media server, the tape is effectively directly attached to the data. Tune your blocks to tape and it'll fly.

    You can test this with a synthetic test, as NBU can rapidly generate data for a backup, i.e. exercise the backup without going near your intended data, showing what is possible via the CPU/tape.

    Restore is a different proposition, as you want to avoid contention, and you can deal with that to a degree with 2 drives. I suspect that with 2 drives and careful segregation of policies to use different pools, i.e. different media (balanced data volumes per pool), your backups will be well within the window and your restores won't be affected by MPX at all.

    If you struggle to achieve that, you'll have to go the MPX route or analyse the data layout, e.g. are all DBs across the same spindles? If not, you will be able to select data from different spindle sets to be backed up at the same time, spreading reads across the maximum number of spindles.

    Get back to the forum with your progress please.

    Jim

  • After some testing, we found a way to satisfy our requirements.

    1. Turn off MPLX and multi-streams (for better restore performance).

    2. Increase the fragment size (for better backup/restore performance).

    3. Direct the backup to two storage units which point to the same library, to force the backup to use two tapes.

    4. Separate all unnecessary data into a different policy and use a different storage unit.

    5. Split the backup into two policies, each backing up half the data.

    The changes made a big difference (backup time reduced from 9 hours to 3, and restore time from 15 hours to 3).
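    For later readers, the per-drive throughput implied by those times (a small Python sketch; it assumes the ~2 TB from the original post split evenly between the two policies, and that restores also run from both drives in parallel):

    ```python
    # Implied per-drive throughput before and after the changes.
    data_mb = 2_000_000  # ~2 TB from the original post

    before_backup = data_mb / (9 * 3600)        # one drive, MPX on:   ~62 MB/s
    after_backup = (data_mb / 2) / (3 * 3600)   # two drives, no MPX:  ~93 MB/s each

    before_restore = data_mb / (15 * 3600)      # demultiplexing drag: ~37 MB/s
    after_restore = (data_mb / 2) / (3 * 3600)  # ~93 MB/s per drive

    print(f"backup:  {before_backup:.0f} -> {after_backup:.0f} MB/s per drive")
    print(f"restore: {before_restore:.0f} -> {after_restore:.0f} MB/s per drive")
    ```

    In other words, most of the gain came from running two clean, unmultiplexed streams rather than from any single drive getting faster.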