07-25-2011 12:53 PM
I have one large fileserver with really slow disk that can only manage approximately 15 MB/s because of cheap SAN gear. The backup starts Friday at 4 PM and usually finishes sometime on Sunday.
The NetBackup 7.0.1 server has a script that pauses all backups at 5:00 AM on Sunday and then resumes all paused backups at 10:00 AM on Sunday.
pause: bpdbjobs -suspend type=all
resume: bpdbjobs -resume type=all
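A quick sanity check of that schedule against the disk speed, as a hedged sketch: it assumes a sustained 15 MB/s, uses the dates of the 22-Jul-11 run, and takes the "estimated 1956191409 Kbytes needed" figure from the job log below, counting 1 MB = 1024 KB (using 1000 instead shifts the result by a few percent). The constant names and `hours_needed` are illustrative only.

```python
from datetime import datetime

# Assumed inputs, taken from the post and the 22-Jul-11 job log.
RATE_MB_PER_S = 15                    # approximate disk throughput
START = datetime(2011, 7, 22, 16, 0)  # Friday 4:00 PM, backup start
PAUSE = datetime(2011, 7, 24, 5, 0)   # Sunday 5:00 AM, pause script runs
ESTIMATED_KBYTES = 1956191409         # "estimated ... Kbytes needed"


def hours_needed(kbytes: int, rate_mb_per_s: float) -> float:
    """Hours to move `kbytes` at `rate_mb_per_s`, assuming 1 MB = 1024 KB."""
    megabytes = kbytes / 1024
    return megabytes / rate_mb_per_s / 3600


window_hours = (PAUSE - START).total_seconds() / 3600
needed = hours_needed(ESTIMATED_KBYTES, RATE_MB_PER_S)
print(f"window before pause: {window_hours:.1f} h")
print(f"time needed at {RATE_MB_PER_S} MB/s: {needed:.1f} h")
```

With these inputs it prints a 37.0 h window against roughly 35.4 h of pure transfer time, so even at a steady 15 MB/s there is well under two hours of slack for tape mounts, positioning and slower stretches before the 5:00 AM pause.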
The policy has:
Every time the fileserver runs into the pause window, one of the large drives is not complete and the job gets paused:
22-Jul-11 4:03:32 PM - requesting resource bcas46-hcart2-robot-tld-0
22-Jul-11 4:03:32 PM - requesting resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXXXXX
22-Jul-11 4:03:32 PM - requesting resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXXX
22-Jul-11 4:03:32 PM - granted resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXXXXXX
22-Jul-11 4:03:32 PM - granted resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXXXX
22-Jul-11 4:03:32 PM - granted resource 000044
22-Jul-11 4:03:32 PM - granted resource HP.ULTRIUM5-SCSI.000
22-Jul-11 4:03:32 PM - granted resource bcas46-hcart2-robot-tld-0
22-Jul-11 4:03:39 PM - estimated 1956191409 Kbytes needed
22-Jul-11 4:04:49 PM - mounting 000044
22-Jul-11 4:05:35 PM - mounted; mount time: 00:00:46
22-Jul-11 4:05:35 PM - positioning 000044 to file 559
22-Jul-11 4:08:43 PM - positioned 000044; position time: 00:03:08
22-Jul-11 4:18:59 PM - connecting
22-Jul-11 4:19:21 PM - connected; connect time: 00:00:22
22-Jul-11 4:19:21 PM - begin writing
22-Jul-11 10:10:55 PM - current media 000044 complete, requesting next resource Any
22-Jul-11 10:10:55 PM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: bcas46, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: NetBackup, Storage Unit: bcas46-hcart2-robot-tld-0, Drive Scan Host: N/A
22-Jul-11 10:13:21 PM - granted resource 000031
22-Jul-11 10:13:21 PM - granted resource HP.ULTRIUM5-SCSI.000
22-Jul-11 10:13:21 PM - granted resource bcas46-hcart2-robot-tld-0
22-Jul-11 10:13:22 PM - mounting 000031
22-Jul-11 10:14:24 PM - mounted; mount time: 00:01:02
22-Jul-11 10:14:31 PM - positioning 000031 to file 1
22-Jul-11 10:14:48 PM - positioned 000031; position time: 00:00:17
22-Jul-11 10:14:48 PM - begin writing
23-Jul-11 8:16:36 AM - current media 000031 complete, requesting next resource Any
23-Jul-11 8:16:36 AM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: bcas46, Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A, Volume Pool: NetBackup, Storage Unit: bcas46-hcart2-robot-tld-0, Drive Scan Host: N/A
23-Jul-11 8:18:57 AM - granted resource 000012
23-Jul-11 8:18:57 AM - granted resource HP.ULTRIUM5-SCSI.000
23-Jul-11 8:18:57 AM - granted resource bcas46-hcart2-robot-tld-0
23-Jul-11 8:18:57 AM - mounting 000012
23-Jul-11 8:20:00 AM - mounted; mount time: 00:01:03
23-Jul-11 8:20:09 AM - positioning 000012 to file 1
23-Jul-11 8:20:26 AM - positioned 000012; position time: 00:00:17
23-Jul-11 8:20:26 AM - begin writing
23-Jul-11 6:34:53 PM - current media 000012 complete, requesting next resource Any
23-Jul-11 6:34:53 PM - granted resource 000022
23-Jul-11 6:34:53 PM - granted resource HP.ULTRIUM5-SCSI.001
23-Jul-11 6:34:53 PM - granted resource bcas46-hcart2-robot-tld-0
23-Jul-11 6:34:54 PM - mounting 000022
23-Jul-11 6:35:56 PM - mounted; mount time: 00:01:02
23-Jul-11 6:36:05 PM - positioning 000022 to file 1
23-Jul-11 6:36:23 PM - positioned 000022; position time: 00:00:18
23-Jul-11 6:36:23 PM - begin writing
24-Jul-11 5:00:20 AM - end writing; write time: 10:23:57
suspend requested by administrator(157)
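To see where the weekend went, here is a sketch that re-splits a flattened job-details string into entries and measures each "begin writing" to media-complete or "end writing" segment, i.e. the time spent writing each tape. It assumes the `DD-Mon-YY H:MM:SS AM/PM - message` timestamp layout seen in the log and an English locale for the month abbreviation; `parse_entries` and `write_segments` are hypothetical helpers, not NetBackup tools, and `SAMPLE` holds only the write-related lines.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout used in the job details, e.g. "22-Jul-11 4:19:21 PM - ".
TS = re.compile(r"(\d{2}-[A-Z][a-z]{2}-\d{2} \d{1,2}:\d{2}:\d{2} [AP]M) - ")


def parse_entries(log: str) -> list[tuple[datetime, str]]:
    """Split a flattened job-details string into (timestamp, message) pairs."""
    matches = list(TS.finditer(log))
    entries = []
    for i, m in enumerate(matches):
        # Each message runs until the next timestamp (or the end of the string).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        when = datetime.strptime(m.group(1), "%d-%b-%y %I:%M:%S %p")
        entries.append((when, log[m.end():end].strip()))
    return entries


def write_segments(entries):
    """Pair each 'begin writing' with the next media-complete or 'end writing' entry."""
    segments, start = [], None
    for when, msg in entries:
        if msg.startswith("begin writing"):
            start = when
        elif start is not None and (
            "complete, requesting next resource" in msg or msg.startswith("end writing")
        ):
            segments.append((start, when))
            start = None
    return segments


# Only the write-related lines of the 22-Jul-11 log.
SAMPLE = (
    "22-Jul-11 4:19:21 PM - begin writing "
    "22-Jul-11 10:10:55 PM - current media 000044 complete, requesting next resource Any "
    "22-Jul-11 10:14:48 PM - begin writing "
    "23-Jul-11 8:16:36 AM - current media 000031 complete, requesting next resource Any "
    "23-Jul-11 8:20:26 AM - begin writing "
    "23-Jul-11 6:34:53 PM - current media 000012 complete, requesting next resource Any "
    "23-Jul-11 6:36:23 PM - begin writing "
    "24-Jul-11 5:00:20 AM - end writing; write time: 10:23:57"
)

segs = write_segments(parse_entries(SAMPLE))
for start, end in segs:
    print(f"{start:%a %H:%M:%S} -> {end:%a %H:%M:%S}  {end - start}")
print("total writing:", sum((e - s for s, e in segs), timedelta()))
```

For this run it gives about 5 h 52 min on 000044 (which started mid-tape at file 559) and a little over 10 h on each of 000031, 000012 and 000022, roughly 36.5 h of writing in total; the last segment matches the 10:23:57 write time NetBackup reports.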
But then the job is marked as Failed, with an end time of Sunday 10:00:10 AM. I cannot find the parent job for this backup; it has probably been replaced by the next parent job for this machine.
A parent job marked as attempt #2 starts at 10:00:01 AM and ends at 10:00:10 AM with error 50 (client process aborted):
24-Jul-11 10:00:00 AM - requesting resource bcas46-hcart2-robot-tld-0
24-Jul-11 10:00:00 AM - requesting resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXX
24-Jul-11 10:00:00 AM - requesting resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXX
24-Jul-11 10:00:01 AM - granted resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXXXXX
24-Jul-11 10:00:01 AM - granted resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXXXXXXX
24-Jul-11 10:00:01 AM - granted resource 000022
24-Jul-11 10:00:01 AM - granted resource HP.ULTRIUM5-SCSI.000
24-Jul-11 10:00:01 AM - granted resource bcas46-hcart2-robot-tld-0
24-Jul-11 10:00:01 AM - estimated 40793395 Kbytes needed
24-Jul-11 10:00:01 AM - begin Parent Job
24-Jul-11 10:00:01 AM - begin Snapshot, Start Notify Script
24-Jul-11 10:00:01 AM - started process RUNCMD (8440)
24-Jul-11 10:00:01 AM - ended process 0 (8440)
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Start Notify Script; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Step By Condition
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Step By Condition; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Policy Execution Manager Preprocessed
Status 50
24-Jul-11 10:00:01 AM - end Snapshot, Policy Execution Manager Preprocessed; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Stop On Error
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Stop On Error; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Delete Snapshot On Exit
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Delete Snapshot On Exit; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, End Notify Script
24-Jul-11 10:00:01 AM - started process RUNCMD (8204)
24-Jul-11 10:00:01 AM - ended process 0 (8204)
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, End Notify Script; elapsed time: 00:00:00
Status 50
24-Jul-11 10:00:01 AM - end Parent Job; elapsed time: 00:00:00
client process aborted(50)
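In the attempt #2 details each snapshot step is followed by a Status line, and only one of them is non-zero. A sketch that pairs each "begin Snapshot, <step>" entry with the Status line after it makes that explicit; it assumes one entry per line with each Status on its own line, as in the excerpt it embeds, and `step_statuses` is a hypothetical helper.

```python
import re

# Snapshot steps of the attempt #2 parent job, one entry per line.
PARENT_LOG = """\
24-Jul-11 10:00:01 AM - begin Snapshot, Start Notify Script
24-Jul-11 10:00:01 AM - started process RUNCMD (8440)
24-Jul-11 10:00:01 AM - ended process 0 (8440)
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Start Notify Script; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Step By Condition
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Step By Condition; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Policy Execution Manager Preprocessed
Status 50
24-Jul-11 10:00:01 AM - end Snapshot, Policy Execution Manager Preprocessed; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Stop On Error
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Stop On Error; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Delete Snapshot On Exit
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Delete Snapshot On Exit; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, End Notify Script
24-Jul-11 10:00:01 AM - started process RUNCMD (8204)
24-Jul-11 10:00:01 AM - ended process 0 (8204)
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, End Notify Script; elapsed time: 00:00:00
Status 50
24-Jul-11 10:00:01 AM - end Parent Job; elapsed time: 00:00:00
"""

BEGIN = re.compile(r"- begin Snapshot, (.+)$")
STATUS = re.compile(r"^Status (\d+)$")


def step_statuses(log: str) -> dict[str, int]:
    """Map each 'begin Snapshot, <step>' to the first Status line that follows it."""
    result, current = {}, None
    for line in log.splitlines():
        line = line.strip()
        if m := BEGIN.search(line):
            current = m.group(1)
        elif (m := STATUS.match(line)) and current is not None:
            result[current] = int(m.group(1))
            current = None  # the trailing Status 50 belongs to the parent job
    return result


failed = {step: code for step, code in step_statuses(PARENT_LOG).items() if code != 0}
print(failed)  # {'Policy Execution Manager Preprocessed': 50}
```

Only "Policy Execution Manager Preprocessed" reports a non-zero status, which suggests the status 50 is raised during that preprocessing step rather than while reading data, consistent with the parent job ending almost as soon as it began.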
Then a third parent job starts at 10:01:02 AM, starting again from 0 MB and taking another 30+ hours to complete.
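I found a mention that the cause may be the default value of Master Server Properties > Cleanup > "Move backup job from incomplete state to done state" (3 hours), with the suggestion to change the value to 6 hours. That would fit, since the pause window lasts 5 hours, longer than the 3-hour default.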
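A rough check of that theory against the logged times, as a sketch only: it assumes (as the mention suggests, not verified against NetBackup's documentation) that the clean-up threshold is compared with how long the job sat suspended, from the 5:00:20 AM suspend to the 10:00 AM resume. `outlasts` and the constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Times from the job logs: writing stopped at the suspend, resume ran at 10:00.
SUSPENDED_AT = datetime(2011, 7, 24, 5, 0, 20)  # "end writing ... suspend requested"
RESUMED_AT = datetime(2011, 7, 24, 10, 0, 0)    # resume script, Sunday 10:00 AM


def outlasts(threshold_hours: float) -> bool:
    """True if the suspended gap is longer than the assumed clean-up threshold."""
    return RESUMED_AT - SUSPENDED_AT > timedelta(hours=threshold_hours)


for hours in (3, 6):
    print(f"{hours} h threshold exceeded: {outlasts(hours)}")
```

The gap is 4 h 59 min 40 s, so under this assumption a 3-hour threshold is exceeded and a 6-hour one is not, leaving about an hour of headroom.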
I'd very much like to complete this backup during the weekend, and the pause window is not optional. Any ideas or suggestions would be greatly appreciated.
-Dan