Forum Discussion

Alex-B
Level 2
8 years ago

Backups are getting so tight....

I need a bit of help playing "Tetris" with my backups... With more and more machines to back up in my environment, I'm getting more and more 196 errors (backup window expired). With multiplexing settings, I know there are only so many concurrent backup streams I can run at once. Question: what command can I run to display, in real time, how many streams are active? I'd also love a command that tells me how many tape drives are handling jobs (how many drives are spinning at any given moment). I would use this to find the lulls in my schedule, so I can move a policy into each gap and fill it. That way I keep all my drives spinning and use my resources properly. Thanks!

 

10 Replies

  • I run this once an hour to count the drives in use (we have two silos):

    day=$(date "+%Y_%m_%d_%H")

    # l700 usage
    countl700=$(/usr/openv/volmgr/bin/vmoprcmd | grep ULT | grep -cE 'T[0-9]{5}')
    # ts3310 usage
    countts3310=$(/usr/openv/volmgr/bin/vmoprcmd | grep ULT | grep -cE 'R[0-9]{5}')

    echo "$day, $countl700, $countts3310" >> /tmp/tape_drive_usage

    (T and R are the tape prefixes on each silo).

    So I get output like this:

    2017_07_11_12, 5, 1
    2017_07_11_13, 5, 1
    2017_07_11_14, 4, 3
    2017_07_11_15, 6, 1
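    Once a few days of samples accumulate, a short awk pass over that log shows average drive usage per hour of day, which is exactly where the schedule lulls show up. A sketch (the printf lines stand in for /tmp/tape_drive_usage):

    ```shell
    # Average drives-in-use per hour-of-day; the inline sample data
    # stands in for the /tmp/tape_drive_usage log built above.
    log=$(printf '%s\n' \
        '2017_07_11_12, 5, 1' \
        '2017_07_11_13, 5, 1' \
        '2017_07_11_14, 4, 3' \
        '2017_07_11_15, 6, 1')

    out=$(printf '%s\n' "$log" | awk -F', ' '{
        split($1, d, "_")          # 2017_07_11_14 -> d[4] is the hour
        total[d[4]] += $2 + $3     # busy drives across both silos
        count[d[4]]++
    } END {
        for (h in total)
            printf "%s: %.1f drives busy on average\n", h, total[h] / count[h]
    }' | sort)
    echo "$out"
    ```

    Hours where the average sits well below the total drive count are the gaps worth moving a policy into.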

    I then also run this:

    day=$(date "+%Y_%m_%d_%H")

    /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -verbose -most_columns -ignore_parent_jobs | \
    awk -F, '$3 == "1" {print $3, $5, $45}' | sort -k 3 > /tmp/active_job_tapes

    /usr/openv/volmgr/bin/vmoprcmd |grep Yes|awk '{print $1,$4,$5}' | sort -k 3 > /tmp/active_tape_drives

    join -1 3 -2 3 /tmp/active_job_tapes /tmp/active_tape_drives | awk '{printf "%-8s %-20s %-20s\n", $1,$3,$4}' | while read x ; do
      echo $day, $x >> /tmp/active_jobs_drives_and_tapes
    done

    which gives me output like:

    2017_07_11_14, R20266 ORA_psoltp IBM.ULT3580-TD4.005
    2017_07_11_14, R20354 ORA_psoltp IBM.ULT3580-TD4.007
    2017_07_11_14, R20417 STD_DAILY_ALXPDB01 IBM.ULT3580-TD4.000
    2017_07_11_14, R20417 STD_DAILY_ALXPDB02 IBM.ULT3580-TD4.000
    2017_07_11_14, R20417 STD_DAILY_AUXPDB07 IBM.ULT3580-TD4.000
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21235 SAP_DNP HP.ULTRIUM4-SCSI.004
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_15, R20417 STD_DAILY_ALXPDB01 IBM.ULT3580-TD4.000
    2017_07_11_15, R20417 STD_DAILY_ALXPDB02 IBM.ULT3580-TD4.000
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21235 SAP_DFG HP.ULTRIUM4-SCSI.008
    2017_07_11_15, T21235 SAP_QNP HP.ULTRIUM4-SCSI.008
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
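    The same joined log also answers the "how many drives are spinning at any given moment" question directly: count the distinct drive names per timestamp. A sketch over a few of the lines above:

    ```shell
    # Distinct drives in use per hour, from the hour/media/policy/drive log.
    # The printf lines stand in for /tmp/active_jobs_drives_and_tapes.
    out=$(printf '%s\n' \
        '2017_07_11_14, R20266 ORA_psoltp IBM.ULT3580-TD4.005' \
        '2017_07_11_14, R20417 STD_DAILY_ALXPDB01 IBM.ULT3580-TD4.000' \
        '2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013' \
        '2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013' \
        '2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001' |
    awk -F', ' '{
        n = split($2, f, " ")          # media-id, policy, drive name
        if (!(($1, f[n]) in seen)) {   # count each (hour, drive) pair once
            seen[$1, f[n]] = 1
            drives[$1]++
        }
    } END {
        for (h in drives) print h, drives[h]
    }' | sort)
    echo "$out"
    ```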

     

     

    • Genericus
      Moderator

      I found that it only took a few slow systems to totally gum up my drives; multiplexing only gets you so far.

      I kept running into issues with drives getting assigned to slow backups, so I ended up moving to an intermediary disk appliance, and I am quite happy now.

      Although Veritas has some nice features, we selected Data Domain for the throughput. I have three Data Domains with over 300 "drives" on each one, and I beat the daylights out of them. I am sending them over 3600 MB/sec of combined ingest at peak, while writing out to 18 LTO5 drives at the same time.

      I had to go with VTL since we use mostly Fibre Channel. I am starting to use 10GbE and Boost/Accelerator; when it works well, it screams.

       

       

      • X2
        Moderator

        While Genericus has a good solution working for him, Alex-B, you should give us a better picture of your environment first. Not everyone would go and buy a DD or a similar solution just because the tape drives are getting busy. Even though a disk storage unit like a DD over FC would give very good performance, it shouldn't be the first solution to try.

        You need to find out where the bottleneck is. Are the systems writing directly to the tape drives? What speeds are you getting? Is it line speed, or less than expected?

        If your systems cannot supply enough data even with multiplexing, why not set up a disk staging unit from which you are sure you will get the required speed?
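        Back-of-the-envelope arithmetic makes the staging argument concrete: an LTO4 drive streams at roughly 120 MB/s native, so the clients multiplexed onto it must sustain about that in aggregate, or the drive keeps stopping and restarting ("shoe-shining"). A sketch with illustrative numbers (the per-client speed is an assumption; measure your own):

        ```shell
        # Can the multiplexed clients keep one tape drive streaming?
        drive_native_mbs=120   # LTO4 native rate, roughly
        mpx=4                  # multiplexing factor (illustrative)
        client_mbs=20          # per-client backup speed (illustrative)

        aggregate=$((client_mbs * mpx))
        echo "aggregate feed: ${aggregate} MB/s vs ~${drive_native_mbs} MB/s drive rate"
        if [ "$aggregate" -lt "$drive_native_mbs" ]; then
            echo "under-feeding: the drive will shoe-shine; a disk staging unit can absorb this"
        fi
        ```

        A staging unit decouples the two rates: slow clients trickle into disk, and the later duplication to tape runs at full drive speed.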

        The above are just general suggestions. Other experienced users can give better advice once we know more about your environment.

        PS: I have a DD9800 as the disk storage solution. SLPs are used to write some systems out to tape. The tape drives are LTO7 and they just fly.

  • Seems Alex-B has not been back since posting a week ago...

    Since you are on a Windows master server, the easiest way is to use the GUI to count active jobs: use the sort or filter functions. (I like to sort by job type, then by job ID, highest first. This shows Queued jobs at the top, then Active jobs.)
    The command-line equivalent is bpdbjobs. Pipe it to findstr to extract the Active jobs.
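    The counting step can be sketched like this (canned lines stand in for live bpdbjobs output so it runs anywhere; on UNIX, grep -c plays the role of findstr):

    ```shell
    # Count active backup streams. On a real master server the sample
    # data would come from something like (default install path assumed):
    #   /usr/openv/netbackup/bin/admincmd/bpdbjobs -report
    sample_jobs=$(printf '%s\n' \
        '21073  Backup  Active  SAP_PBW' \
        '21235  Backup  Active  SAP_DNP' \
        '21348  Backup  Queued  SAP_DCS' \
        '20417  Backup  Active  STD_DAILY_ALXPDB01')

    active=$(printf '%s\n' "$sample_jobs" | grep -c Active)
    echo "active streams: $active"
    ```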

    For drive usage, use the Device Monitor or vmoprcmd -d from cmd.

    These approaches will need someone to be present and monitoring all the time, though...