Automatically Kill Job if Time Exceeded?

Jason_Brooks
Level 4
I've run into an issue where certain clients, backing up to tape, took an excessively long time and caused everything else to blow up. My config is a Win2K3 Master, 1 Win2K Media backing up to a fibre attached ADIC Scalar i2K. We only have 2 LTO3 drives, so making sure that a client doesn't hog a drive is essential.

What I'm thinking of is killing a job if it exceeds x hours, where I can define x. I've done some digging through the forums, but didn't turn up anything.

Thanks,
Jason
11 REPLIES

zippy
Level 6
Jason,

There are ways to tell you (the admin) if a job is running a long time, but I think you should try to determine why the "job" is taking so long.

The ADIC Scalar i2K is very very fast if configured correctly.

If you are backing up over the LAN, then check the client and master NIC cards; check the switch configuration: master 1000 FD, client 1000 FD, switch 1000 FD. Now check all your HBAs. Turn logging on on your master. Check your logs.


JD

Stumpr2
Level 6
Jason,

I think I see your problem.
You want to kill or suspend a job that has a problem with throughput/slowness/etc in order to allow the tape resources to be used for cooperating clients during a time when you would rather be on the beach soaking up the sun instead of babysitting backups.
Is that correct?

And then you want to troubleshoot and fix the problem backup at a more convenient time like normal business hours?

Bob

Jason_Brooks
Level 4
That's the idea. Our weekend backup kicks off at 11PM on Friday, and has most of the weekend to run. I don't typically log in to look at things, and don't want to be in the habit. It appears to have been one or two boxes that killed everything else, so in such an event, I'd like to either cancel them or pause them programmatically, if possible.


Thanks,
Jason

zippy
Level 6
5 points for bob and jim?

Stumpr2
Level 6
Are you concerned with specific boxes, or do you want to generically kill ANY job that has been running more than the usual time? How big is your environment? How many clients/policies/schedules? Is it small enough to maybe use the bp*notify scripts?

Jason_Brooks
Level 4
I'm backing up about 50-60 servers, both Windows and Linux. I have about 24 policies and close to the same amount of schedules. The notify scripts may work, I'm getting ready to look them over.

My biggest concern would be to kill any excessively long job, especially on weekends or when we are closed. But weekends are the first foray into this subject.

Thanks

Stumpr2
Level 6
I don't do Windows, but on Linux you could have bpstart_notify touch a file and bpend_notify remove it. Then you would need a script to check whether the touch file has existed longer than a set threshold.
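
A rough sketch of the checking half of the touch-file idea above, in Python. It assumes the start script drops one marker file per running backup into a directory and the end script deletes it; the function name `stale_markers`, the directory layout, and the marker naming are all made up for illustration and are not part of NetBackup. It only reports markers older than the threshold; what to do about them (mail the admin, cancel the job) is left out on purpose.

```python
import os
import sys
import time


def stale_markers(marker_dir, max_hours, now=None):
    """Return a sorted list of (file_name, age_in_hours) for every regular
    file in marker_dir whose modification time is more than max_hours old.

    marker_dir is a hypothetical directory where the start script touches
    one file per backup and the end script removes it again.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(marker_dir):
        if not entry.is_file():
            continue  # ignore subdirectories and other non-files
        age_hours = (now - entry.stat().st_mtime) / 3600.0
        if age_hours > max_hours:
            stale.append((entry.name, age_hours))
    return sorted(stale)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are examples only): python check_markers.py <dir> <hours>
    for name, age in stale_markers(sys.argv[1], float(sys.argv[2])):
        print(f"{name} has been running for {age:.1f} hours")
```

Run from cron (or an AT job on Windows) every half hour or so, this would list any backup whose marker has outlived the threshold. If the marker name encodes the client and policy, the same check works per policy or per host.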

G_S
Level 4
If you just want to kill all jobs before production hours start you can write a script with an AT job that runs at say 7:45AM Monday morning. In the script you can just do a "bpdbjobs -kill" and it will cancel all active jobs.

Stumpr2
Level 6
Jason,
Have you gotten anywhere on this problem? Have you investigated using the touch files I talked about?
BS

Jason_Brooks
Level 4
Bob,
I've thought about it, but haven't been able to work on it yet. One issue I have is that Windows Services for Unix (by MS) always leaves a core dump. Scripts run, but there's always cleanup. That, so far, sounds to be the neatest way. But also, one question that's been bouncing around in the cruft of my brain: would this be by policy? That's the way I envision it, but is it also doable via a single host?

Jason

Stumpr2
Level 6
There are different scripts for various functions. You can read about the notify scripts in the admin guide. There is a whole chapter on them. Here is also a short description of them in a technote.

Notification scripts and their usage
http://support.veritas.com/docs/274059