NetBackup 7.0.1 pause then resume job causes failure and restart from beginning

daniel_watson

I have one large file server with really slow disk that can only manage approximately 15 MB/s due to cheap SAN gear. The backup starts Friday at 4 PM and usually finishes sometime on Sunday.
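
As a rough sanity check on that timing, the arithmetic can be sketched as follows. It uses the "estimated 1956191409 Kbytes needed" figure from the job log further down and assumes the 15 MB/s rate is sustained and that 1 MB = 1024 KB; the function name is only illustrative.

```python
# Rough duration estimate for the largest stream.
# Assumptions (not confirmed by the post): the 15 MB/s rate is sustained
# for the whole job, and 1 MB = 1024 KB.
ESTIMATED_KBYTES = 1956191409   # "estimated ... Kbytes needed" from the job log
THROUGHPUT_MB_PER_S = 15        # approximate disk read rate quoted above


def estimated_hours(kbytes, mb_per_s):
    """Return the hours needed to move `kbytes` at `mb_per_s`."""
    megabytes = kbytes / 1024
    seconds = megabytes / mb_per_s
    return seconds / 3600


hours = estimated_hours(ESTIMATED_KBYTES, THROUGHPUT_MB_PER_S)
print(f"{hours:.1f} hours")
```

Under those assumptions the stream needs roughly 35 hours, which is consistent with a Friday afternoon start running into the Sunday morning pause window.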

The NetBackup 7.0.1 server has a script to pause all backups Sunday 5:00AM, then resume all paused backups Sunday 10:00AM.

pause: bpdbjobs -suspend type=all

resume: bpdbjobs -resume type=all
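
For reference, the pause/resume script amounts to something like the sketch below. Only the two bpdbjobs command lines come from my setup; the Python wrapper, the function names, and the assumption that bpdbjobs is on the PATH are illustrative.

```python
import subprocess

# Assumption: bpdbjobs is reachable on PATH. On a real master server it
# lives in the NetBackup admin command directory, which is not shown here.
BPDBJOBS = "bpdbjobs"


def build_command(action):
    """Build the bpdbjobs command line for 'suspend' or 'resume'."""
    if action not in ("suspend", "resume"):
        raise ValueError(f"unsupported action: {action!r}")
    return [BPDBJOBS, f"-{action}", "type=all"]


def run(action):
    """Run the command and return its exit code (hypothetical helper)."""
    cmd = build_command(action)
    print("running:", " ".join(cmd))
    return subprocess.run(cmd).returncode
```

The scheduler would call run("suspend") on Sunday at 5:00 AM and run("resume") on Sunday at 10:00 AM.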

The policy has:

  • type: MS-Windows
  • take checkpoints every 15min
  • compression
  • collect true image restore information
  • with move detection
  • allow multiple data streams

Every time the file server runs into the pause window, one of the large drives is not complete and the job gets paused:

22-Jul-11 4:03:32 PM - requesting resource bcas46-hcart2-robot-tld-0
22-Jul-11 4:03:32 PM - requesting resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXXXXX
22-Jul-11 4:03:32 PM - requesting resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXXX
22-Jul-11 4:03:32 PM - granted resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXXXXXX
22-Jul-11 4:03:32 PM - granted resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXXXX
22-Jul-11 4:03:32 PM - granted resource 000044
22-Jul-11 4:03:32 PM - granted resource HP.ULTRIUM5-SCSI.000
22-Jul-11 4:03:32 PM - granted resource bcas46-hcart2-robot-tld-0
22-Jul-11 4:03:39 PM - estimated 1956191409 Kbytes needed
22-Jul-11 4:04:49 PM - mounting 000044
22-Jul-11 4:05:35 PM - mounted; mount time: 00:00:46
22-Jul-11 4:05:35 PM - positioning 000044 to file 559
22-Jul-11 4:08:43 PM - positioned 000044; position time: 00:03:08
22-Jul-11 4:18:59 PM - connecting
22-Jul-11 4:19:21 PM - connected; connect time: 00:00:22
22-Jul-11 4:19:21 PM - begin writing
22-Jul-11 10:10:55 PM - current media 000044 complete, requesting next resource Any
22-Jul-11 10:10:55 PM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: bcas46,
     Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
     Volume Pool: NetBackup, Storage Unit: bcas46-hcart2-robot-tld-0, Drive Scan Host: N/A
    
22-Jul-11 10:13:21 PM - granted resource 000031
22-Jul-11 10:13:21 PM - granted resource HP.ULTRIUM5-SCSI.000
22-Jul-11 10:13:21 PM - granted resource bcas46-hcart2-robot-tld-0
22-Jul-11 10:13:22 PM - mounting 000031
22-Jul-11 10:14:24 PM - mounted; mount time: 00:01:02
22-Jul-11 10:14:31 PM - positioning 000031 to file 1
22-Jul-11 10:14:48 PM - positioned 000031; position time: 00:00:17
22-Jul-11 10:14:48 PM - begin writing
23-Jul-11 8:16:36 AM - current media 000031 complete, requesting next resource Any
23-Jul-11 8:16:36 AM - current media -- complete, awaiting next media Any Reason: Drives are in use, Media Server: bcas46,
     Robot Number: 0, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
     Volume Pool: NetBackup, Storage Unit: bcas46-hcart2-robot-tld-0, Drive Scan Host: N/A
    
23-Jul-11 8:18:57 AM - granted resource 000012
23-Jul-11 8:18:57 AM - granted resource HP.ULTRIUM5-SCSI.000
23-Jul-11 8:18:57 AM - granted resource bcas46-hcart2-robot-tld-0
23-Jul-11 8:18:57 AM - mounting 000012
23-Jul-11 8:20:00 AM - mounted; mount time: 00:01:03
23-Jul-11 8:20:09 AM - positioning 000012 to file 1
23-Jul-11 8:20:26 AM - positioned 000012; position time: 00:00:17
23-Jul-11 8:20:26 AM - begin writing
23-Jul-11 6:34:53 PM - current media 000012 complete, requesting next resource Any
23-Jul-11 6:34:53 PM - granted resource 000022
23-Jul-11 6:34:53 PM - granted resource HP.ULTRIUM5-SCSI.001
23-Jul-11 6:34:53 PM - granted resource bcas46-hcart2-robot-tld-0
23-Jul-11 6:34:54 PM - mounting 000022
23-Jul-11 6:35:56 PM - mounted; mount time: 00:01:02
23-Jul-11 6:36:05 PM - positioning 000022 to file 1
23-Jul-11 6:36:23 PM - positioned 000022; position time: 00:00:18
23-Jul-11 6:36:23 PM - begin writing
24-Jul-11 5:00:20 AM - end writing; write time: 10:23:57
suspend requested by administrator(157)
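
To put numbers on what is lost at the restart, the timestamps in the log above can be compared with a small sketch like this. The timestamp format string is inferred from the log lines, and the elapsed figure includes tape waits and mounts, so it is an upper bound on actual write time.

```python
from datetime import datetime

# Format inferred from the log lines, e.g. "22-Jul-11 4:19:21 PM".
LOG_FORMAT = "%d-%b-%y %I:%M:%S %p"


def parse(stamp):
    """Parse one Activity Monitor style timestamp from the job log."""
    return datetime.strptime(stamp, LOG_FORMAT)


first_write = parse("22-Jul-11 4:19:21 PM")   # first "begin writing"
last_write = parse("23-Jul-11 6:36:23 PM")    # "begin writing" on media 000022
suspended = parse("24-Jul-11 5:00:20 AM")     # "end writing" at the suspend

# Time spent on the last tape only; matches "write time: 10:23:57".
print("last media:", suspended - last_write)
# Wall-clock time since the job first started writing.
print("whole job: ", suspended - first_write)
```

So the reported write time of 10:23:57 covers only the last tape, while about 36 hours and 41 minutes had elapsed since the job first began writing.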

After the resume, however, the job is marked as Failed with an end time of Sunday 10:00:10 AM. I cannot find the parent job for this backup; it was probably replaced by the next parent job for this machine.

A parent job marked as attempt #2 starts at 10:00:01 AM and ends at 10:00:10 AM with error 50 (client process aborted):

24-Jul-11 10:00:00 AM - requesting resource bcas46-hcart2-robot-tld-0
24-Jul-11 10:00:00 AM - requesting resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXX
24-Jul-11 10:00:00 AM - requesting resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXX
24-Jul-11 10:00:01 AM - granted resource bcas46.NBU_CLIENT.MAXJOBS.XXXXXXXXXXXXX
24-Jul-11 10:00:01 AM - granted resource bcas46.NBU_POLICY.MAXJOBS.XXXXXXXXXXXX
24-Jul-11 10:00:01 AM - granted resource 000022
24-Jul-11 10:00:01 AM - granted resource HP.ULTRIUM5-SCSI.000
24-Jul-11 10:00:01 AM - granted resource bcas46-hcart2-robot-tld-0
24-Jul-11 10:00:01 AM - estimated 40793395 Kbytes needed
24-Jul-11 10:00:01 AM - begin Parent Job
24-Jul-11 10:00:01 AM - begin Snapshot, Start Notify Script
24-Jul-11 10:00:01 AM - started process RUNCMD (8440)
24-Jul-11 10:00:01 AM - ended process 0 (8440)
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Start Notify Script; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Step By Condition
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Step By Condition; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Policy Execution Manager Preprocessed
Status 50
24-Jul-11 10:00:01 AM - end Snapshot, Policy Execution Manager Preprocessed; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Stop On Error
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Stop On Error; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, Delete Snapshot On Exit
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, Delete Snapshot On Exit; elapsed time: 00:00:00
24-Jul-11 10:00:01 AM - begin Snapshot, End Notify Script
24-Jul-11 10:00:01 AM - started process RUNCMD (8204)
24-Jul-11 10:00:01 AM - ended process 0 (8204)
Status 0
24-Jul-11 10:00:01 AM - end Snapshot, End Notify Script; elapsed time: 00:00:00
Status 50
24-Jul-11 10:00:01 AM - end Parent Job; elapsed time: 00:00:00
client process aborted(50)

Then a third parent job starts at 10:01:02 AM from 0 MB, taking another 30+ hours to complete.

I found a mention that the cause may be the default value of Master Server Properties -> Clean-up -> "Move backup job from incomplete state to done state" (3 hours), with the suggestion to change the value to 6 hours.
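
If that setting is the cause, the reasoning would be that my pause window is longer than the default timeout. A minimal sketch of the comparison, assuming (unconfirmed) that the clean-up timer also applies to jobs suspended by the script:

```python
from datetime import datetime, timedelta

# Pause window from the script schedule: Sunday 5:00 AM to 10:00 AM.
pause_start = datetime(2011, 7, 24, 5, 0)
pause_end = datetime(2011, 7, 24, 10, 0)
pause_window = pause_end - pause_start


def outlasts_timeout(timeout_hours):
    """True if the pause window is longer than the given timeout."""
    return pause_window > timedelta(hours=timeout_hours)


# Assumption: the clean-up timeout counts against suspended jobs too.
for timeout_hours in (3, 6):
    print(timeout_hours, "hour timeout exceeded:", outlasts_timeout(timeout_hours))
```

The 5-hour window exceeds the 3-hour default but fits inside a 6-hour setting, which would explain why 6 hours was suggested.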

I'd very much like to complete this backup during the weekend, and the pause window is non-optional. Any ideas or suggestions would be greatly appreciated.

-Dan
