NBU 7.0.1 - Performance Monitor on Windows issues

TROE · 09-04-2010

We are running NetBackup 7.0.1 on Windows Server 2008 x64.

Yesterday I decided that I wanted to set up nightly performance monitor captures on my master server (which also serves as a media server) so that I could keep an eye on network and tape drive utilization. That would let me make tweaks to multiplexing settings, client allocation, etc. and get feedback on whether I was properly streaming my tape drives.

I discovered the "NetBackup Disk/Tape" counters and was happy - "Disk/Tape Write Bytes/Sec" seemed to be the perfect counter. However, the docs (Admin Guide for Windows II, 340106) indicated the following:

The System Monitor displays object instances when NetBackup reads or writes from the disk or tape. The read or write counters are updated depending on the type of NetBackup operation performed. The object instance is removed from the list once the NetBackup operation completes.

OK. So I kept starting and stopping jobs until I managed to get it to simultaneously display all 4 tape drives in our library (they show up as IBM.ULT3580-TD3.000, IBM.ULT3580-TD3.001, etc.). So far, so good. Then I went into Data Collector Sets (this is still in "Reliability and Performance Monitor") and created a new Data Collector Set under User Defined. I adjusted the directory, created a schedule to have it start at 9 PM every night and to run for 4 hours, etc. I created a single Data Collector for it that captures "Disk/Tape Write Bytes/Sec" on all four IBM.ULT3580-TD3.* objects as well as "Bytes Received/Sec" on each of the four GigE network ports. So far, so good.

I specified a sample interval of 1 second because testing showed that the NetBackup counters are unreliable under load, but if you grab every second, you only miss 1 in 20 or so. If you grab every 15 seconds, a single miss during that 15 second window will cause a drop out, so the reliability falls dramatically and you are left with spotty data. I adjusted the file name to include a time stamp. I even went into Data Manager for the Data Collector Set to define an action for purging old data after 30 days so it won't fill up my disk.

Feeling good. Then when I looked at the captured data this morning, there was nothing for the tape drives. Doh! Took me about 5 seconds to figure out what happened. The first backup started at 9 PM, but didn't start writing to the tape drive until a minute or so later. The Data Collector Set triggered at 9 PM sharp, and when it went to start logging there were no "NetBackup Disk/Tape" objects, so it didn't connect to them. Sure, they got created later by NetBackup when it actually started writing, but the Data Collector Set didn't notice that, so it didn't collect any data.

Great. This is a royal pain. The only true hack I can think of for now is to work my fingers to the bone adding a gazillion Start entries to the Schedule (one every 5 minutes, say, from 9 PM until 12:55 AM, or for whatever period) and then setting the Stop Condition to stop the data collection after 5 minutes. If I set the Data Collector to Append to the log, I should be able to avoid having a bunch of separate logs. But this is way ugly! It doesn't work to wait until 9:05 PM to start the task, because there might only be one drive in use at that time, and the other drives won't come on line until later. Could do it five minutes past the hour if all of your drive status changes happen on the hour, but if one of them happens on the half-hour, you have to go to half hour intervals because the stop condition is a single elapsed time. Argh!
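To get a feel for how many Start entries that hack means, here is a minimal Python sketch that just lists the start times. The function name `start_times` and the calendar dates are made up for illustration; it does not touch the Data Collector Set schedule itself, which would still have to be entered by hand.

```python
from datetime import datetime, timedelta

def start_times(first, last, step_minutes=5):
    """Return every schedule start from first to last (inclusive), step_minutes apart."""
    step = timedelta(minutes=step_minutes)
    times = []
    current = first
    while current <= last:
        times.append(current)
        current += step
    return times

# Illustrative dates only: 9 PM one night through 12:55 AM the next morning.
first = datetime(2010, 9, 4, 21, 0)
last = datetime(2010, 9, 5, 0, 55)
starts = start_times(first, last)

print(len(starts), "Start entries needed")
print(starts[0].strftime("%H:%M"), "...", starts[-1].strftime("%H:%M"))
```

With a 5 minute stop condition, each of those entries covers one 5 minute slice of the 4 hour window.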

Has anyone else messed with data collection? Do I have to write my own collector support to get what I want? Another approach would be to develop a service to handle scheduling the Data Collector Set. The service would enumerate the "NetBackup Disk/Tape" objects periodically (every 15 seconds or so) and automatically restart the Data Collector Set whenever it saw a change. The latter might actually be useful, because it could run 24x7 and only leave that Data Collector Set running when it sees "NetBackup Disk/Tape" objects. That way I would have logging every time something was actually happening, but I wouldn't be logging the rest of the time.
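A rough sketch of the decision logic such a service would need, in Python. Everything here is hypothetical: `watch`, `list_instances`, `restart_collector`, and `stop_collector` are names invented for this sketch, and the actual enumeration and restart steps are passed in as callables rather than implemented (on Windows they might wrap something like `typeperf -qx` and `logman stop` / `logman start`, but that is an untested assumption). The demo at the bottom feeds it canned snapshots instead of real counters.

```python
import time

def watch(list_instances, restart_collector, stop_collector,
          polls, interval_seconds=15, sleep=time.sleep):
    """Poll the instance list and react whenever the set of instances changes.

    list_instances: callable returning the current "NetBackup Disk/Tape" instance names
    restart_collector / stop_collector: callables that control the Data Collector Set
    Returns the list of actions taken, which makes the logic easy to check.
    """
    previous = frozenset()
    actions = []
    for _ in range(polls):
        current = frozenset(list_instances())
        if current != previous:
            if current:
                # A drive appeared or disappeared: restart so the new set is logged.
                restart_collector()
                actions.append("restart")
            else:
                # Nothing is reading or writing any more: stop logging.
                stop_collector()
                actions.append("stop")
            previous = current
        sleep(interval_seconds)
    return actions

# Demo with canned snapshots (no real counters are queried here).
snapshots = iter([
    [],
    ["IBM.ULT3580-TD3.000"],
    ["IBM.ULT3580-TD3.000"],
    ["IBM.ULT3580-TD3.000", "IBM.ULT3580-TD3.001"],
    [],
])
actions = watch(lambda: next(snapshots), lambda: None, lambda: None,
                polls=5, sleep=lambda seconds: None)
print(actions)
```

Restarting on every change (rather than only on the first instance) matters because the collector apparently only binds to the instances that exist at the moment it starts.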

Still, this strikes me as a pain.

Thoughts?

TROE · 09-04-2010

I'm going to try the following and see how things work tonight.

On the "Stop Condition" tab for the Data Collector Set properties, I'm setting the following.

Check "Overall duration"
Overall duration: 4 Hours
Check "When a limit is reached, restart the data collector set"
Check "Duration"
Duration: 5 Minutes

Other settings that might be useful . . .

"Directory" tab for the Data Collector Set properties:

Subdirectory name format: <blank> (this prevents it from creating separate directories for each run)

"File" tab for the Data Collector properties:

Log file name: NetTape
File name format: yyyyMMdd
Check "Append"

I'll report on how this works after my next scheduled backup run.

Hopefully I can get some good logging. To tune, one must have data with which to direct that tuning (and to be able to identify potential trouble spots). I'd rather not have to spend the rest of my nights monitoring and watching things.

TROE · 09-05-2010

Looks like the columns in the binary log format are not allowed to change within the file. So the first batch of data restricts what can be logged by subsequent data collector runs. So I ended up unchecking "Append" on the File tab in Data Collector properties and setting the File name format to yyyyMMddHHmm. I also discovered that when you go to set the Source in Performance Monitor to log files, you can only select one log file at a time. So to add in an hour of logging if you break the logs into 5 minute chunks requires 36 single-clicks and 24 double-clicks (more if you lose track of which file you are on). So I decided to go with 15 minute chunks aligned 5 minutes after the hour, which should capture fairly well if I align all backups to the 15 minute boundaries (I'll miss the first few minutes, but they should be safely underway by the time I start capturing).
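The trade-off can be put into numbers with a small Python sketch. The helper names (`chunk_starts`, `minutes_uncaptured`, `clicks_per_hour`) and the calendar date are made up for illustration, and the 3 single-clicks plus 2 double-clicks per file are simply inferred from the 36 and 24 figures for twelve 5 minute files. The `%Y%m%d%H%M` pattern is the Python equivalent of the yyyyMMddHHmm file name format.

```python
from datetime import datetime, timedelta

def chunk_starts(window_start, hours, chunk_minutes=15, offset_minutes=5):
    """Start time of each log chunk, beginning offset_minutes into the window."""
    count = hours * 60 // chunk_minutes
    first = window_start + timedelta(minutes=offset_minutes)
    return [first + timedelta(minutes=chunk_minutes * i) for i in range(count)]

def minutes_uncaptured(job_start, starts):
    """Minutes between a job starting and the next chunk that begins at or after it."""
    later = [s for s in starts if s >= job_start]
    return int((later[0] - job_start).total_seconds() // 60)

def clicks_per_hour(chunk_minutes, singles_per_file=3, doubles_per_file=2):
    """Clicks needed to load one hour of logs (per-file counts are inferred)."""
    files = 60 // chunk_minutes
    return files * singles_per_file, files * doubles_per_file

window = datetime(2010, 9, 5, 21, 0)   # illustrative date, 9 PM start
starts = chunk_starts(window, hours=4)

print([s.strftime("%Y%m%d%H%M") for s in starts[:4]])            # file name time stamps
print(minutes_uncaptured(datetime(2010, 9, 5, 21, 15), starts))  # job aligned to :15
print(clicks_per_hour(5), clicks_per_hour(15))
```

A job that starts on a 15 minute boundary waits 5 minutes for the next chunk, which is the "first few minutes" that go unlogged.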

Ugly, ugly, ugly. It would be nice if there were a way to pre-enumerate performance counters in NetBackup.
