08-20-2008 03:54 AM
Simple, stupid question just to keep you on your toes:
Any reason why a job suspended through the Activity Monitor should then literally fail after an undetermined period of time (with Status 157) & restart?
The job initially just sits there, as expected, in a suspended state. It then fails (big red x), restarts under a different job id & commences backing up!!!!
Obviously, if I suspend a job I've done it for a reason & therefore would not expect, nor want, NB to suddenly think "OOhh, I know, why don't I start this job again, he must've forgotten all about it."
I can't find a timeout anywhere for how long a job can stay suspended - maybe it's under the 'let's annoy the operator' tab.
NB 6.5.1
Solaris 9 Master/Media
08-20-2008 11:43 AM
08-21-2008 12:02 AM
Thanks for the response Omar.
It was more a case of why a suspended job should suddenly 'decide' that it didn't want to be suspended anymore. Resources were available I just didn't want it to use them at that time!
As far as 6.5.2a is concerned - I think I might wait for 6.5.3 or 6.5.4 (I may have lost the responsibility for NB by then).
09-25-2008 06:51 AM
Just as an update - we had this happen again where I had to suspend a backup (a colleague wanted to copy the contents of the directory that was being backed up & to prevent disk I/O contention I suspended the backup) & several hours later the job went from 'Suspended' to 'Failed' (157) & then restarted as a different job id.
I had to suspend this new job for a few more hours before it could be resumed & also investigated what setting could be determining this action.
All I can put it down to is the "Clean Up" option "Move backup job from incomplete state to done state". This was set to the default 3 hours and the suspended job 'failed' after 3 hours, 1 minute & 0 seconds. Why a suspended job should be deemed to be incomplete is beyond me (I know it IS literally, but you know what I mean ...) - it should be a separate entity. It's like cancelling a job & then NetBackup thinking - "Hang on, I haven't done that yet, let's kick it off again" (oh yeah, it does that as well!!)
Anyway, have changed this setting to 6 hours so we'll see next time (or sooner if that setting causes me more problems in the meantime).
09-25-2008 09:11 AM
09-26-2008 05:48 AM
Well that's confirmed it! (But not sure if it's a FEATURE or a BUG)
@Andy Welburn wrote: All I can put it down to is the "Clean Up" option "Move backup job from incomplete state to done state". This was set to the default 3 hours and the suspended job 'failed' after 3 hours, 1 minute & 0 seconds.
Anyway, have changed this setting to 6 hours so we'll see next time (or sooner if that setting causes me more problems in the meantime).
A job was suspended this morning & 6 hours later it 'failed' & restarted automatically !!!!
(Actually I suspended a child process in a multi-streamed job - for info, yesterday's wasn't a multi-streamed job. The other children & the parent process continued running. When all the other child processes finished, the parent process then went into a suspended state & then nothing. I presume at this stage I would have had to wait for a further 6 hours to elapse before anything happened. Being impatient, I cancelled the suspended parent process, at which point new parent & child processes started.)
So BEWARE - if you suspend a job, check the time in the master server's "Clean Up" option "Move backup job from incomplete state to done state", as your job will fail & restart after this period has elapsed.
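For anyone who wants to work out how long they've got before a suspended job gets kicked off again, here is a rough Python sketch of the arithmetic. It only assumes what was observed above: the job 'fails' roughly when the "Move backup job from incomplete state to done state" period has elapsed since the suspend. The function names and the example times are made up for illustration, and nothing here talks to NetBackup itself.

```python
from datetime import datetime, timedelta


def expected_auto_fail_time(suspended_at: datetime, cleanup_hours: float) -> datetime:
    """Estimate when a suspended job would be moved to the done state.

    Assumption (from the behaviour seen in this thread): the clock starts
    when the job is suspended and runs for the configured Clean Up period.
    """
    return suspended_at + timedelta(hours=cleanup_hours)


def safe_to_leave_suspended(suspended_at: datetime,
                            planned_resume_at: datetime,
                            cleanup_hours: float) -> bool:
    """True if the planned resume falls before the estimated auto-fail time."""
    return planned_resume_at < expected_auto_fail_time(suspended_at, cleanup_hours)


if __name__ == "__main__":
    # Hypothetical example: job suspended at 08:00 with the setting at 6 hours.
    suspended_at = datetime(2008, 9, 26, 8, 0)
    print(expected_auto_fail_time(suspended_at, 6))  # 2008-09-26 14:00:00
    # Planning to resume at 15:00 would be too late under this assumption.
    print(safe_to_leave_suspended(suspended_at, datetime(2008, 9, 26, 15, 0), 6))  # False
```

Treat the result as an estimate only: with the 3 hour default, the job here actually failed after 3 hours, 1 minute, so allow a little slack either way.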