Forum Discussion

Alex-B
Level 2
8 years ago

Backups are getting so tight....

I need a bit of help playing "Tetris" with my backups... With more and more machines to back up in my environment, I'm getting more and more 196 errors (backup window expired). With multiplexing settings, I know there are only so many concurrent backup streams I can run at once. Question: what command can I run to display, in real time, how many streams are active? I'd also love a command that tells me how many tape drives are handling jobs (how many drives are spinning at any given moment). I would use this to find the lulls in my schedule, so I can move a policy into each gap and fill it. That way I keep all my drives spinning and use my resources properly. Thanks!

 

10 Replies

  • I run this once an hour to count the drives in use (we have two silos):

    day=$(date "+%Y_%m_%d_%H")

    # l700 usage
    countl700=$(/usr/openv/volmgr/bin/vmoprcmd | grep ULT | grep -cE 'T[0-9]{5}')
    # ts3310 usage
    countts3310=$(/usr/openv/volmgr/bin/vmoprcmd | grep ULT | grep -cE 'R[0-9]{5}')

    echo "$day, $countl700, $countts3310" >> /tmp/tape_drive_usage

    (T and R are the tape prefixes on each silo).

    So I get output like this:

    2017_07_11_12, 5, 1
    2017_07_11_13, 5, 1
    2017_07_11_14, 4, 3
    2017_07_11_15, 6, 1
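    Once a few days of samples accumulate, a short awk pass over that log shows average drive usage per hour of day, which is exactly where the schedule lulls show up. A sketch (the printf lines stand in for /tmp/tape_drive_usage):

    ```shell
    # Average drives-in-use per hour-of-day; the inline sample data
    # stands in for the /tmp/tape_drive_usage log built above.
    log=$(printf '%s\n' \
        '2017_07_11_12, 5, 1' \
        '2017_07_11_13, 5, 1' \
        '2017_07_11_14, 4, 3' \
        '2017_07_11_15, 6, 1')

    out=$(printf '%s\n' "$log" | awk -F', ' '{
        split($1, d, "_")          # 2017_07_11_14 -> d[4] is the hour
        total[d[4]] += $2 + $3     # busy drives across both silos
        count[d[4]]++
    } END {
        for (h in total)
            printf "%s: %.1f drives busy on average\n", h, total[h] / count[h]
    }' | sort)
    echo "$out"
    ```

    Hours where the average sits well below the total drive count are the gaps worth moving a policy into.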

    I then also run this:

    day=$(date "+%Y_%m_%d_%H")

    /usr/openv/netbackup/bin/admincmd/bpdbjobs -report -verbose -most_columns -ignore_parent_jobs | \
    awk -F, '$3 == "1" {print $3, $5, $45}' | sort -k 3 > /tmp/active_job_tapes

    /usr/openv/volmgr/bin/vmoprcmd |grep Yes|awk '{print $1,$4,$5}' | sort -k 3 > /tmp/active_tape_drives

    join -1 3 -2 3 /tmp/active_job_tapes /tmp/active_tape_drives | awk '{printf "%-8s %-20s %-20s\n", $1,$3,$4}' | while read x ; do
      echo $day, $x >> /tmp/active_jobs_drives_and_tapes
    done

    which gives me output like:

    2017_07_11_14, R20266 ORA_psoltp IBM.ULT3580-TD4.005
    2017_07_11_14, R20354 ORA_psoltp IBM.ULT3580-TD4.007
    2017_07_11_14, R20417 STD_DAILY_ALXPDB01 IBM.ULT3580-TD4.000
    2017_07_11_14, R20417 STD_DAILY_ALXPDB02 IBM.ULT3580-TD4.000
    2017_07_11_14, R20417 STD_DAILY_AUXPDB07 IBM.ULT3580-TD4.000
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_14, T21235 SAP_DNP HP.ULTRIUM4-SCSI.004
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_14, T21348 SAP_DCS HP.ULTRIUM4-SCSI.000
    2017_07_11_15, R20417 STD_DAILY_ALXPDB01 IBM.ULT3580-TD4.000
    2017_07_11_15, R20417 STD_DAILY_ALXPDB02 IBM.ULT3580-TD4.000
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013
    2017_07_11_15, T21235 SAP_DFG HP.ULTRIUM4-SCSI.008
    2017_07_11_15, T21235 SAP_QNP HP.ULTRIUM4-SCSI.008
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
    2017_07_11_15, T22066 SAP_PBW HP.ULTRIUM4-SCSI.010
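    The same joined log also answers the "how many drives are spinning at any given moment" question directly: count the distinct drive names per timestamp. A sketch over a few of the lines above:

    ```shell
    # Distinct drives in use per hour, from the hour/media/policy/drive log.
    # The printf lines stand in for /tmp/active_jobs_drives_and_tapes.
    out=$(printf '%s\n' \
        '2017_07_11_14, R20266 ORA_psoltp IBM.ULT3580-TD4.005' \
        '2017_07_11_14, R20417 STD_DAILY_ALXPDB01 IBM.ULT3580-TD4.000' \
        '2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013' \
        '2017_07_11_14, T21073 SAP_PBW HP.ULTRIUM4-SCSI.013' \
        '2017_07_11_15, T21053 SAP_PBW HP.ULTRIUM4-SCSI.001' |
    awk -F', ' '{
        n = split($2, f, " ")          # media-id, policy, drive name
        if (!(($1, f[n]) in seen)) {   # count each (hour, drive) pair once
            seen[$1, f[n]] = 1
            drives[$1]++
        }
    } END {
        for (h in drives) print h, drives[h]
    }' | sort)
    echo "$out"
    ```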

     

     

    • Genericus
      Moderator

      I found that it only took a few slow systems to totally gum up my drives; multiplexing only gets you so far.

      I kept running into issues with drives getting assigned to slow backups, so I ended up moving to an intermediary disk appliance, and I am quite happy now.

      Although Veritas has some nice features, we selected Data Domain for the throughput. I have three Data Domains with over 300 "drives" on each one, and I beat the daylights out of them. I am sending them over 3600 MB/sec of combined ingest at peak, while writing out to 18 LTO5 drives at the same time.

      I had to go with VTL since we use mostly Fibre Channel. I am starting to use 10GbE and Boost/Accelerator; when it works well, it screams.

       

       

      • X2
        Moderator

        While Genericus has a good solution working for him, Alex-B, you should give us a better picture of your environment first. Not everyone would go and buy a DD or a similar solution just because the tape drives are getting busy. Even though a disk storage unit like a DD over FC would give very good performance, it shouldn't be the first solution to try.

        You need to find out where the bottleneck is. Are the systems writing directly to the tape drives? What speeds are you getting? Is it line speed, or less than expected?

        If your systems cannot supply enough data even with multiplexing, why not set up a disk staging unit from which you are sure you will get the required speed?
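        Back-of-the-envelope arithmetic makes the staging argument concrete: an LTO4 drive streams at roughly 120 MB/s native, so the clients multiplexed onto it must sustain about that in aggregate, or the drive keeps stopping and restarting ("shoe-shining"). A sketch with illustrative numbers (the per-client speed is an assumption; measure your own):

        ```shell
        # Can the multiplexed clients keep one tape drive streaming?
        drive_native_mbs=120   # LTO4 native rate, roughly
        mpx=4                  # multiplexing factor (illustrative)
        client_mbs=20          # per-client backup speed (illustrative)

        aggregate=$((client_mbs * mpx))
        echo "aggregate feed: ${aggregate} MB/s vs ~${drive_native_mbs} MB/s drive rate"
        if [ "$aggregate" -lt "$drive_native_mbs" ]; then
            echo "under-feeding: the drive will shoe-shine; a disk staging unit can absorb this"
        fi
        ```

        A staging unit decouples the two rates: slow clients trickle into disk, and the later duplication to tape runs at full drive speed.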

        The above are just general suggestions. Other experienced users can give better advice once we know more about your environment.

        PS: I have a DD9800 as the disk storage solution. SLPs are used to write some systems out to tape. The tape drives are LTO7 and they just fly.

  • Seems Alex-B has not been back since posting a week ago...

    Since you are on a Windows master server, the easiest way is to use the GUI to count active jobs: use the sort or filter functions. (I like to sort by job type, then by job ID, highest first. This shows Queued jobs at the top, then Active jobs.)
    The command-line equivalent is bpdbjobs. Pipe it to findstr to extract the Active jobs.
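    The counting step can be sketched like this (canned lines stand in for live bpdbjobs output so it runs anywhere; on UNIX, grep -c plays the role of findstr):

    ```shell
    # Count active backup streams. On a real master server the sample
    # data would come from something like (default install path assumed):
    #   /usr/openv/netbackup/bin/admincmd/bpdbjobs -report
    sample_jobs=$(printf '%s\n' \
        '21073  Backup  Active  SAP_PBW' \
        '21235  Backup  Active  SAP_DNP' \
        '21348  Backup  Queued  SAP_DCS' \
        '20417  Backup  Active  STD_DAILY_ALXPDB01')

    active=$(printf '%s\n' "$sample_jobs" | grep -c Active)
    echo "active streams: $active"
    ```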

    For drive usage, use the Device Monitor or vmoprcmd -d from cmd.

    These approaches will need someone to be present and monitoring all the time, though...