Re: Is my restore really complete?

Bruce_Clegg · ‎08-14-2017

Is my restore really complete?

We had a HDD fail on a vm host over the weekend. We are unhappy with the way our predessessors set up the host - so I've been asked to restore the data to a different location.

Fortunately the Weekly full backup had finished shortly before the failure - so I still have all the tapes I need to restore on-site. I created a directory on the local media server called 'restorepoint,' set up the restore job to restore to this directory and kicked off the restore job.

After about 35 min the job failed. with Code 5 and code 2840. I'm attaching the results.

From the Task Progress:

Progress log filename : /usr/openv/netbackup/logs/user_ops/root/logs/jbp-60187502739486938786000000119-tyyZyf.log Restore Job Id=52836
Restore started 08/14/2017 13:38:07
14:14:42 (52836.001) chmod 555 /opt/restorepoint/koval to reset permissions.
14:14:42 (52836.001) chmod 555 /opt/restorepoint/koval/boot to reset permissions.
14:14:43 (52836.001) INF - TAR EXITING WITH STATUS = 0
14:14:43 (52836.001) INF - TAR RESTORED 170825 OF 301054 FILES SUCCESSFULLY
14:14:43 (52836.001) INF - TAR KEPT 113707 EXISTING FILES
14:14:43 (52836.001) INF - TAR PARTIALLY RESTORED 0 FILES

14:14:43 (52836.001) Status of restore from copy 1 of image created Sun 13 Aug 2017 06:52:04 AM MDT = the restore failed to recover the requested files

14:14:44 (52836.xxx) INF - Status = Restore error.

From the Activity Monitor:

08/14/2017 13:26:52 - begin Restore
08/14/2017 13:26:52 - Info bprd (pid=460) Found (301,054) files in (1) images for Restore Job ID 52835.xxx
08/14/2017 13:26:54 - Info bprd (pid=460) Estimated time to assemble file list: (0) days, (0) hours, (0) min, (6) sec
08/14/2017 13:26:57 - Info bprd (pid=460) Searched (250,000) files of (301,054) files for Restore Job ID 52835.xxx
08/14/2017 13:26:59 - number of images required: 1
08/14/2017 13:26:59 - media needed: 000058
08/14/2017 13:27:29 - restoring from image koval-backup_1502628724
08/14/2017 13:27:46 - Info bpbrm (pid=526) connect failed STATUS (18) CONNECT_FAILED
08/14/2017 13:27:46 - Info bpbrm (pid=526)    status: FAILED, (10) SOCKET_FAILED; system: (113) No route to host; FROM 0.0.0.0 TO koval-backup 10.254.253.15 bpcd VIA pbx
08/14/2017 13:27:46 - Info bpbrm (pid=526)    status: FAILED, (10) SOCKET_FAILED; system: (113) No route to host; FROM 0.0.0.0 TO koval-backup 10.254.253.15 bpcd VIA vnetd
08/14/2017 13:27:46 - Info bpbrm (pid=526)    status: FAILED, (10) SOCKET_FAILED; system: (113) No route to host; FROM 0.0.0.0 TO koval-backup 10.254.253.15 bpcd
08/14/2017 13:27:46 - Error bpbrm (pid=526) cannot connect to koval-backup, Operation now in progress (115)
08/14/2017 13:27:46 - Info bpbrm (pid=526) telling media manager to start restore on client
08/14/2017 13:27:46 - Info bpbrm (pid=526) spawning a brm child process
08/14/2017 13:27:46 - Info bpbrm (pid=526) child pid: 540
08/14/2017 13:27:46 - connecting
08/14/2017 13:27:46 - requesting resource 000058
08/14/2017 13:27:46 - awaiting resource 000058.
          A pending request has been generated for this resource request.
          Operator action may be required. Pending Action: No action,
          Media ID: 000058, Barcode: 000058, Density: hcart, Access Mode: Read,
          Action Drive Name: N/A, Action Media Server: hyden, Robot Type(Number): 0(N/A),
          Volume Group: ---, Action Acs: N/A, Action Lsm: N/A
08/14/2017 13:27:55 - Info bpbrm (pid=526) connect failed STATUS (18) CONNECT_FAILED
08/14/2017 13:27:55 - Info bpbrm (pid=526)    status: FAILED, (10) SOCKET_FAILED; system: (113) No route to host; FROM 0.0.0.0 TO koval-backup 10.254.253.15 bpcd VIA pbx
08/14/2017 13:27:55 - Info bpbrm (pid=526)    status: FAILED, (10) SOCKET_FAILED; system: (113) No route to host; FROM 0.0.0.0 TO koval-backup 10.254.253.15 bpcd VIA vnetd
08/14/2017 13:27:55 - Info bpbrm (pid=526)    status: FAILED, (10) SOCKET_FAILED; system: (113) No route to host; FROM 0.0.0.0 TO koval-backup 10.254.253.15 bpcd
08/14/2017 13:27:55 - Error bpbrm (pid=540) cannot connect to koval-backup, Operation now in progress (115)
08/14/2017 13:27:55 - Error bpbrm (pid=540) cannot put rename file on koval-backup
08/14/2017 13:27:55 - Info bpbrm (pid=526) sending message to media manager: STOP RESTORE koval-backup_1502628724
08/14/2017 13:27:56 - awaiting resource 000058.
          A pending request has been generated for this resource request.
          Operator action may be required. Pending Action: No action,
          Media ID: 000058, Barcode: 000058, Density: hcart, Access Mode: Read,
          Action Drive Name: N/A, Action Media Server: hyden, Robot Type(Number): 0(N/A),
          Volume Group: ---, Action Acs: N/A, Action Lsm: N/A
08/14/2017 13:27:57 - Info bpbrm (pid=526) media manager for backup id koval-backup_1502628724 exited with status 150: termination requested by administrator
08/14/2017 13:27:57 - restored from image koval-backup_1502628724; restore time: 0:00:28
08/14/2017 13:27:58 - Warning bprd (pid=460) Restore must be resumed prior to first image expiration on Fri 01 Sep 2017 06:52:04 AM MDT
08/14/2017 13:27:58 - end Restore; elapsed time 0:01:06
Restore error (2850)

I looked on the forums and saw someone with a similar error had been asked to check the permissions. - I chmod'd the target directory to 755 and restarted the job. (I did not cancel and restart - I restarted)

From the Task Progress:

Progress log filename : /usr/openv/netbackup/logs/user_ops/root/logs/jbp-60187502739486938786000000119-tyyZyf.log Restore Job Id=52836
Restore started 08/14/2017 13:38:07
15:07:48 (52836.001) chmod 550 /opt/restorepoint/koval/opt/ossec-2.7/active-response to reset permissions.
15:07:48 (52836.001) chmod 550 /opt/restorepoint/koval/opt/ossec-2.7 to reset permissions.
15:07:48 (52836.001) INF - TAR EXITING WITH STATUS = 0
15:07:48 (52836.001) INF - TAR RESTORED 245 OF 2396 FILES SUCCESSFULLY
15:07:48 (52836.001) INF - TAR KEPT 2151 EXISTING FILES
15:07:48 (52836.001) INF - TAR PARTIALLY RESTORED 0 FILES

15:07:49 (52836.001) Status of restore from copy 1 of image created Sun 13 Aug 2017 06:52:04 AM MDT = the requested operation was successfully completed

15:07:49 (52836.xxx) INF - Status = the requested operation was successfully completed.

From the Activity Monitor:

08/14/2017 14:47:15 - begin Restore
08/14/2017 14:47:15 - Info bprd (pid=4805) Found (301,054) files in (1) images for Restore Job ID 52836.xxx
08/14/2017 14:47:17 - Info bprd (pid=4805) Estimated time to assemble file list: (0) days, (0) hours, (0) min, (6) sec
08/14/2017 14:47:21 - Info bprd (pid=4805) Searched (250,000) files of (301,054) files for Restore Job ID 52836.xxx
08/14/2017 14:47:22 - number of images required: 1
08/14/2017 14:47:22 - media needed: 000058
08/14/2017 14:47:53 - restoring from image koval-backup_1502628724
08/14/2017 14:47:53 - Info bpbrm (pid=4818) hyden is the host to restore to
08/14/2017 14:47:53 - Info bpbrm (pid=4818) telling media manager to start restore on client
08/14/2017 14:47:53 - Info bpbrm (pid=4818) spawning a brm child process
08/14/2017 14:47:53 - Info bpbrm (pid=4818) child pid: 4826
08/14/2017 14:47:53 - connecting
08/14/2017 14:47:53 - Info bpbrm (pid=4826) start tar on client
08/14/2017 14:47:53 - Info tar (pid=4841) Restore started.
08/14/2017 14:47:53 - connected; connect time: 0:00:00
08/14/2017 14:47:53 - Info bptm (pid=4823) Waiting for mount of media id 000058 (copy 1) on server hyden.
08/14/2017 14:47:53 - started process bptm (pid=4823)
08/14/2017 14:47:53 - mounting 000058
08/14/2017 14:47:53 - requesting resource 000058
08/14/2017 14:47:53 - granted resource 000058
08/14/2017 14:47:53 - granted resource HP.ULTRIUM4-SCSI.000
08/14/2017 14:47:54 - Info bptm (pid=4823) INF - Waiting for mount of media id 000058 on server hyden for reading.
08/14/2017 14:48:54 - mounted 000058; mount time: 0:01:01
08/14/2017 14:48:54 - Info bptm (pid=4823) 000058
08/14/2017 14:48:54 - Info bptm (pid=4823) INF - Waiting for positioning of media id 000058 on server hyden for reading.
08/14/2017 14:48:54 - positioning 000058 to file 35
08/14/2017 14:53:24 - positioned 000058; position time: 0:04:30
08/14/2017 14:53:24 - begin reading
08/14/2017 15:07:48 - Info tar (pid=4841) done. status: 0
08/14/2017 15:07:48 - Info tar (pid=4841) done. status: 0 - the requested operation was successfully completed
08/14/2017 15:07:49 - end reading; read time: 0:14:25
08/14/2017 15:07:49 - Info bpbrm (pid=4818) media manager for backup id koval-backup_1502628724 exited with status 0: the requested operation was successfully completed
08/14/2017 15:07:49 - restored from image koval-backup_1502628724; restore time: 0:19:56
08/14/2017 15:07:49 - end Restore; elapsed time 0:20:34
the requested operation was successfully completed (0)

Status Zero? Outstanding! But I'm not sure it actually worked. How am I to interpret?

15:07:48 (52836.001) INF - TAR EXITING WITH STATUS = 0
15:07:48 (52836.001) INF - TAR RESTORED 245 OF 2396 FILES SUCCESSFULLY
15:07:48 (52836.001) INF - TAR KEPT 2151 EXISTING FILES
15:07:48 (52836.001) INF - TAR PARTIALLY RESTORED 0 FILES

Did it really only restore 245 files?

The first time through it indicated it was going to restore 300,000+ files.

Amol_Nair · ‎08-14-2017

It seems like the job details that you have shared are for 2 different restore jobs..

The activity monitor job details are for jobid 52835 where the media seems to not be available in the tape library and the BAR gui job details are for job 52836 where the restore ran and restored only 170825 files..

check the progress log file which is mentioned in the 1st line of the progress job details from the BAR gui and that file should give you all the details regarding the list of files that were restored successfully and the files which were skipped along with the reason why they were skipped..

Marianne · ‎08-14-2017

If this was a VMware or Filesystem backup, then you can restore individual files to alternate location.
To see filenames that were restore and list files not restored, you need to check the user_ops log file where you performed the restore.
tar log on the destination client will also help. That is if tar log folder existed before you started.

It seems the restore destination already existed with at least 2151 filenames that already existed. You seem to have chosen not to override existing files.

Handy NetBackup Links

Lowell_Palecek · ‎08-15-2017

What version of NetBackup are you using on the master server?

VOX

Is my restore really complete?