Solved: Linux FS restore failure - "Standard Policy restor...

Adnan_F · ‎11-03-2012

Hello All,

I've been trying to restore linux FS but seems to be failing with a Standard policy restore error (no.2800). The backups for the source machine run successfully on a daily basis (on a pure disk storage unit) however the restore just fails. The master and media server for his job are both the same with Netbackup 7.5.0.4 installed on Windows Server 2008 R2.

Any help would be highly appreciated.

Below are the job details:

11/3/2012 10:39:48 AM - begin Restore
11/3/2012 10:40:38 AM - restoring image "Source Machine"_1351767623
11/3/2012 10:40:38 AM - Info bprd(pid=2100) Restoring from copy 1 of image created 11/01/12 14:00:23
11/3/2012 10:40:38 AM - requesting resource @aaaab
11/3/2012 10:40:38 AM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=Deduplication_Pool;Path=PureDiskVolume;StorageServ...
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) "Target Machine" is the host to restore to
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) reading file list from client
11/3/2012 10:40:41 AM - connecting
11/3/2012 10:40:41 AM - Info bpbrm(pid=10724) starting bptm
11/3/2012 10:40:42 AM - Info tar32(pid=5954) Restore started
11/3/2012 10:40:42 AM - connected; connect time: 00:00:01
11/3/2012 10:40:42 AM - Info bptm(pid=10200) start
11/3/2012 10:40:42 AM - started process bptm (10200)
11/3/2012 10:40:42 AM - Info bpdm(pid=10200) reading backup image
11/3/2012 10:40:42 AM - Info bptm(pid=10200) using 30 data buffers
11/3/2012 10:40:42 AM - Info bptm(pid=10200) spawning a child process
11/3/2012 10:40:42 AM - Info bptm(pid=10200) child pid: 7768
11/3/2012 10:40:42 AM - Info bptm(pid=7768) start
11/3/2012 10:40:42 AM - started process bptm (7768)
11/3/2012 10:40:47 AM - begin reading
11/3/2012 10:45:07 AM - Info bptm(pid=10200) waited for empty buffer 271 times, delayed 16479 times
11/3/2012 10:45:07 AM - end reading; read time: 00:04:20
11/3/2012 10:45:08 AM - begin reading
11/3/2012 11:53:00 AM - Error bptm(pid=7768) cannot write data to socket, 10054
11/3/2012 11:53:00 AM - Error bptm(pid=7768) The following files/folders were not restored:
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGA10dn.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGA3Fpy.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGLH2Ds.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGNfZXS.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGSWiPd.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGdaxC6.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGeIYSY.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01...../OFGedFVV.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGitiBq.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGmN1ET.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) more than 10 files were not restored, remaining ones are shown in the progress log.
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status 3
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status: 183
11/3/2012 12:31:02 PM - Info bptm(pid=10200) EXITING with status 24 <----------
11/3/2012 12:31:02 PM - Info "Master Server"(pid=10200) StorageServer=PureDisk:Master Server; Report=PDDO Stats for (Master Server): read: 1871285 KB, CR received: 1895266 KB, CR received over FC: 0 KB, dedup: 0.0%
11/3/2012 12:31:32 PM - Info tar32(pid=5954) done. status: 24: socket write failed
11/3/2012 12:31:32 PM - Error bpbrm(pid=10724) client restore EXIT STATUS 24: socket write failed
11/3/2012 12:31:33 PM - restored image Source Machine_1351767623 - (socket write failed(24)); restore time 01:50:55
11/3/2012 12:31:36 PM - Warning bprd(pid=2100) Restore must be resumed prior to first image expiration on 11/15/2012 2:00:23 PM
11/3/2012 12:31:36 PM - end Restore; elapsed time: 01:51:48
Standard policy restore error(2800)

Thank you.

Regards,

Adnan

Adnan_F · ‎12-12-2012

Hello All,

Thank you for thre responses. Mark, it is MSDP.

After weeks of troubleshooting with support, the issue was resolved somehow by itself. Below is briefo what happened:

1. Restores were failing continuously with 2800.

2. I did notice that in the job activity there was an image read failure. (Although backups were successful daily). Image verifications through Catalog were failing.

3. Opened ticket with Symantec who worked for couple of weeks.

4. Decided to move the backup to another storage unit and try the restore. IT WORKED.

5. Symantec support did a "crchk" on the entire MDSP unit where backups were successful but restore was failing.

6. During this time other policies on same storage unit started to give MEDIA READ & WRITE ERRORS.

7. A week later Symantec identified corrupted images and asked to expire them and did crchk again.

8. Backups were moved back to orignial MSDP and restores were successful.

I dont believe the crchk helped much it was probably the case of corrupted images expiring. I do take full backups for our linux environment on a daily basis.

Important lesson here is that:

1. You must ideally have an alternate free space MSDP pool (or even a basic pool) of 2 TB (or whatever your most critical server backup size is) to test such backup/restore failures scenario.

2. Verify the integrity of your critical backup images on regular basis through Catalog.

3. This goes without saying but if you're not testing your backups through restore, you're NOT doing your job as a backup administrator.

4. Incase you're restoring to a VM, ensure that you've latest VMware Tools installed on the target VM.

View solution in original post

Mick_Scott · ‎11-05-2012

Have a look at the other error codes in the log ; namley 3 and 183

do you have permissions to write to the destination ?

Is encryption enabled

Marianne · ‎11-05-2012

Info tar32(pid=5954) Restore started

tar32 = Windows destination/target machine?

Are you trying to restore Linux FileSystem to Windows machine?
This is not supported. Some users have managed to restore data across platforms, but this is not supported by NBU.

Handy NetBackup Links

Adnan_F · ‎11-06-2012

Thank you Mick & Marianne for your response.

Mick: yes v have the permissions & encryption is NOT enabled

Marianne: The target machine is a Linux machine as well. It is a virtual machine (VMware), however in order to isolate the issue I did try the restore on physical machine as well which gave similar results.

Initially the restores were giving Standar Policy Error (2800) with Exit status 3,183, 24. But restores done yesterday and earlier today gave the overall error 24 Socket write failed with EXIT status 85 media read error. I'm not sure why the main error code has changed, but changes I made to the linux VM was update the VM tools, changed timeouts to 9000 (from 300)

11/6/2012 9:12:51 AM - Info bpdm(pid=1012) reading backup image
11/6/2012 9:12:51 AM - Info bptm(pid=1012) using 30 data buffers
11/6/2012 9:12:51 AM - Info bptm(pid=1012) spawning a child process
11/6/2012 9:12:51 AM - Info bptm(pid=1012) child pid: 8520
11/6/2012 9:12:51 AM - Info bptm(pid=8520) start
11/6/2012 9:12:51 AM - started process bptm (8520)
11/6/2012 9:12:58 AM - begin reading
11/6/2012 9:13:03 AM - Info bptm(pid=1012) waited for empty buffer 83 times, delayed 104 times
11/6/2012 9:13:03 AM - end reading; read time: 00:00:05
11/6/2012 9:13:03 AM - begin reading
11/6/2012 9:33:20 AM - Critical bptm(pid=1012) sts_read_image failed: error 2060017 system call failed
11/6/2012 9:33:20 AM - Critical bptm(pid=1012) image read failed: error 2060017: system call failed
11/6/2012 9:33:20 AM - Error bptm(pid=1012) cannot read image from disk, Invalid argument
11/6/2012 9:33:20 AM - Info bptm(pid=1012) EXITING with status 85 <----------
11/6/2012 9:33:20 AM - Error bptm(pid=8520) The following files/folders were not restored:
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........../o10684138.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........../o10684140.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/...........o10684141.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/..........o10684145.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/........./o10684146.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........o10684147.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.......o10684152.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........o10684156.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/........./o10684160.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.......o10684161.out
11/6/2012 9:33:20 AM - Error bptm(pid=8520) more than 10 files were not restored, remaining ones are shown in the progress log.
11/6/2012 9:57:20 AM - Info tar32(pid=9844) done. status 3
11/6/2012 9:57:20 AM - Info MasterServer(pid=1012) StorageServer=PureDisk:MasterServer; Report=PDDO Stats for (MasterServer): read: 23196295 KB, CR received: 23228521 KB, CR received over FC: 0 KB, dedup: 0.0%
11/6/2012 9:57:20 AM - Info tar32(pid=9844) done. status: 183
11/6/2012 9:57:21 AM - Info tar32(pid=9844) done. status: 85: media read error
11/6/2012 9:57:21 AM - Error bpbrm(pid=6580) client restore EXIT STATUS 85: media read error
11/6/2012 9:57:23 AM - Warning bprd(pid=6592) Restore must be resumed prior to first image expiration on 11/17/2012 6:30:06 PM
11/6/2012 9:57:23 AM - end Restore; elapsed time: 00:45:23
socket write failed(24)

Marianne · ‎11-06-2012

What has changed between yesterday and today?

Errors are completely different today.

Yesterday you definitely had a Windows machine as destination - tar32 process is proof.

You will need to enable logs to give us a full picture.

On master: bprd (restart NBU after creating log folder)
On media server: bpbrm, bptm and bpdm
On destination client: tar

Please rename logs to reflect process name (e.g. bprd.txt) and upload as File attachments.

Handy NetBackup Links

Adnan_F · ‎11-08-2012

Hello Marianne,

Thank you for your reply again.

I did check/read about the tar32 process, but trust me the target machine is a linux virtual machine (on vmware)

We had success through an alternate way today.

I did a number of restores and ultimately each one was a giving a 2800 (standard policy retore error) or 24 (socket write failed). Other than the main error code I found that one recurring exit status:

11/6/2012 5:52:58 PM - Info tar32(pid=6505) done. status: 85: media read error

This led me to verify the backup image of the linux FS backup copy itself. Did a number of verification jobs (verify media) for several different full images, however we were getting "failed media read" errors for each of them. Now the backup status of the source machine is alwasy showing SUCCESSFUL everyday but the same image when being verified through "Catalog" was giving media read errors.

This led me to believe that maybe the backup images are not being copied properly on the storage unit (corruption, etc. i still havent figured out this yet). Hence I took a new full backup on a different storage unit and performed a successful restore on the same VM.

I'm dealing with Sym Support on the issue now, hopefully they would be able to give more details.

I'll keep this post updated.

Kiran_Loni · ‎11-26-2012

Hi Adnan,

Check wether all the hostnames are in place like

Hostname and IP address of Backup server & client server ( server you have taken backup )in destination server .

Mark_Solutions · ‎11-26-2012

Out of interest what sort of PureDisk is this? Is it MSDP or an appliance?

Does the client in question reside in more than one policy and if so does it appear in a different case in each policy?

I have seen very similar results with 7.5.0.4 when the client appears in more than one policy and is listed differently .. i.e. client Client CLIENT

This is corrected using softlinks in the PD storage area but it depends exactly what that storage is - and it really needs Symantec Support to do this for you to be on the same side

Adnan_F · ‎12-12-2012