cancel
Showing results for 
Search instead for 
Did you mean: 

Linux FS restore failure - "Standard Policy restore error 2800"

Adnan_F
Level 4

Hello All,

 

I've been trying to restore linux FS but seems to be failing with a Standard policy restore error (no.2800).  The backups for the source machine run successfully on a daily basis (on a pure disk storage unit) however the restore just fails. The master and media server for his job are both the same with Netbackup 7.5.0.4 installed on Windows Server 2008 R2.

Any help would be highly appreciated.

Below are the job details:

11/3/2012 10:39:48 AM - begin Restore
11/3/2012 10:40:38 AM - restoring image "Source Machine"_1351767623
11/3/2012 10:40:38 AM - Info bprd(pid=2100) Restoring from copy 1 of image created 11/01/12 14:00:23   
11/3/2012 10:40:38 AM - requesting resource @aaaab
11/3/2012 10:40:38 AM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=Deduplication_Pool;Path=PureDiskVolume;StorageServ...
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) "Target Machine" is the host to restore to     
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) reading file list from client       
11/3/2012 10:40:41 AM - connecting
11/3/2012 10:40:41 AM - Info bpbrm(pid=10724) starting bptm          
11/3/2012 10:40:42 AM - Info tar32(pid=5954) Restore started          
11/3/2012 10:40:42 AM - connected; connect time: 00:00:01
11/3/2012 10:40:42 AM - Info bptm(pid=10200) start           
11/3/2012 10:40:42 AM - started process bptm (10200)
11/3/2012 10:40:42 AM - Info bpdm(pid=10200) reading backup image         
11/3/2012 10:40:42 AM - Info bptm(pid=10200) using 30 data buffers        
11/3/2012 10:40:42 AM - Info bptm(pid=10200) spawning a child process        
11/3/2012 10:40:42 AM - Info bptm(pid=10200) child pid: 7768         
11/3/2012 10:40:42 AM - Info bptm(pid=7768) start           
11/3/2012 10:40:42 AM - started process bptm (7768)
11/3/2012 10:40:47 AM - begin reading
11/3/2012 10:45:07 AM - Info bptm(pid=10200) waited for empty buffer 271 times, delayed 16479 times   
11/3/2012 10:45:07 AM - end reading; read time: 00:04:20
11/3/2012 10:45:08 AM - begin reading
11/3/2012 11:53:00 AM - Error bptm(pid=7768) cannot write data to socket, 10054      
11/3/2012 11:53:00 AM - Error bptm(pid=7768) The following files/folders were not restored:      
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGA10dn.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGA3Fpy.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGLH2Ds.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGNfZXS.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGSWiPd.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGdaxC6.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGeIYSY.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01...../OFGedFVV.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGitiBq.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGmN1ET.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) more than 10 files were not restored, remaining ones are shown in the progress log.
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status 3         
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status: 183         
11/3/2012 12:31:02 PM - Info bptm(pid=10200) EXITING with status 24 <----------       
11/3/2012 12:31:02 PM - Info "Master Server"(pid=10200) StorageServer=PureDisk:Master Server; Report=PDDO Stats for (Master Server): read: 1871285 KB, CR received: 1895266 KB, CR received over FC: 0 KB, dedup: 0.0%
11/3/2012 12:31:32 PM - Info tar32(pid=5954) done. status: 24: socket write failed      
11/3/2012 12:31:32 PM - Error bpbrm(pid=10724) client restore EXIT STATUS 24: socket write failed    
11/3/2012 12:31:33 PM - restored image Source Machine_1351767623 - (socket write failed(24)); restore time 01:50:55
11/3/2012 12:31:36 PM - Warning bprd(pid=2100) Restore must be resumed prior to first image expiration on 11/15/2012 2:00:23 PM
11/3/2012 12:31:36 PM - end Restore; elapsed time: 01:51:48
Standard policy restore error(2800)

Thank you.

Regards,

Adnan

1 ACCEPTED SOLUTION

Accepted Solutions

Adnan_F
Level 4

Hello All,

Thank you for thre responses. Mark, it is MSDP.

After weeks of troubleshooting with support, the issue was resolved somehow by itself. Below is briefo what happened:

1. Restores were failing continuously with 2800.

2. I did notice that in the job activity there was an image read failure. (Although backups were successful daily). Image verifications through Catalog were failing.

3. Opened ticket with Symantec who worked for couple of weeks.

4. Decided to move the backup to another storage unit and try the restore. IT WORKED.

5. Symantec support did a "crchk" on the entire MDSP unit where backups were successful but restore was failing.

6. During this time other policies on same storage unit started to give MEDIA READ & WRITE ERRORS.

7. A week later Symantec identified corrupted images and asked to expire them and did  crchk again.

8. Backups were moved back to orignial MSDP and restores were successful.

I dont believe the crchk helped much it was probably the case of corrupted images expiring. I do take full backups for our linux environment on a daily basis.

Important lesson here is that: 

1. You must ideally have an alternate free space MSDP pool (or even a basic pool) of 2 TB (or whatever your most critical server backup size is) to test such backup/restore failures scenario.

2. Verify the integrity of your critical backup images on regular basis through Catalog.

3. This goes without saying but if you're not testing your backups through restore, you're NOT doing your job as a backup administrator.

4. Incase you're restoring to a VM, ensure that you've latest VMware Tools installed on the target VM.

View solution in original post

8 REPLIES 8

Mick_Scott
Level 4
Partner Accredited Certified

Have a look at the other error codes in the log ; namley 3 and 183

do you have permissions to write to the destination ?

Is encryption enabled

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Info tar32(pid=5954) Restore started   

tar32 = Windows destination/target machine?

Are you trying to restore Linux FileSystem to Windows machine?
This is not supported. Some users have managed to restore data across platforms, but this is not supported by NBU.

 

Adnan_F
Level 4

Thank you Mick & Marianne for your response.

Mick: yes v have the permissions & encryption is NOT enabled

Marianne: The target machine is a Linux machine as well. It is a virtual machine (VMware), however in order to isolate the issue I did try the restore on physical machine as well which gave similar results.

Initially the restores were giving Standar Policy Error (2800) with Exit status 3,183, 24. But restores done yesterday and earlier today gave the overall error 24 Socket write failed with EXIT status 85 media read error. I'm not sure why the main error code has changed, but changes I made to the linux VM was update the VM tools, changed timeouts to 9000 (from 300)

11/6/2012 9:12:51 AM - Info bpdm(pid=1012) reading backup image         
11/6/2012 9:12:51 AM - Info bptm(pid=1012) using 30 data buffers        
11/6/2012 9:12:51 AM - Info bptm(pid=1012) spawning a child process        
11/6/2012 9:12:51 AM - Info bptm(pid=1012) child pid: 8520         
11/6/2012 9:12:51 AM - Info bptm(pid=8520) start           
11/6/2012 9:12:51 AM - started process bptm (8520)
11/6/2012 9:12:58 AM - begin reading
11/6/2012 9:13:03 AM - Info bptm(pid=1012) waited for empty buffer 83 times, delayed 104 times   
11/6/2012 9:13:03 AM - end reading; read time: 00:00:05
11/6/2012 9:13:03 AM - begin reading
11/6/2012 9:33:20 AM - Critical bptm(pid=1012) sts_read_image failed: error 2060017 system call failed     
11/6/2012 9:33:20 AM - Critical bptm(pid=1012) image read failed: error 2060017: system call failed    
11/6/2012 9:33:20 AM - Error bptm(pid=1012) cannot read image from disk, Invalid argument     
11/6/2012 9:33:20 AM - Info bptm(pid=1012) EXITING with status 85 <----------       
11/6/2012 9:33:20 AM - Error bptm(pid=8520) The following files/folders were not restored:      
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........../o10684138.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........../o10684140.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/...........o10684141.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/..........o10684145.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/........./o10684146.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........o10684147.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.......o10684152.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........o10684156.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/........./o10684160.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.......o10684161.out         
11/6/2012 9:33:20 AM - Error bptm(pid=8520) more than 10 files were not restored, remaining ones are shown in the progress log.
11/6/2012 9:57:20 AM - Info tar32(pid=9844) done. status 3         
11/6/2012 9:57:20 AM - Info MasterServer(pid=1012) StorageServer=PureDisk:MasterServer; Report=PDDO Stats for (MasterServer): read: 23196295 KB, CR received: 23228521 KB, CR received over FC: 0 KB, dedup: 0.0%
11/6/2012 9:57:20 AM - Info tar32(pid=9844) done. status: 183         
11/6/2012 9:57:21 AM - Info tar32(pid=9844) done. status: 85: media read error      
11/6/2012 9:57:21 AM - Error bpbrm(pid=6580) client restore EXIT STATUS 85: media read error    
11/6/2012 9:57:23 AM - Warning bprd(pid=6592) Restore must be resumed prior to first image expiration on 11/17/2012 6:30:06 PM
11/6/2012 9:57:23 AM - end Restore; elapsed time: 00:45:23
socket write failed(24) 

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

What has changed between yesterday and today?

Errors are completely different today.

Yesterday you definitely had a Windows machine as destination - tar32 process is proof.

You will need to enable logs to give us a full picture.

On master: bprd (restart NBU after creating log folder)
On media server: bpbrm, bptm and bpdm
On destination client: tar

Please rename logs to reflect process name (e.g. bprd.txt) and upload as File attachments.

Adnan_F
Level 4

Hello Marianne,

Thank you for your reply again.

I did check/read about the tar32 process, but trust me the target machine is a linux virtual machine (on vmware)

We had success through an alternate way today.

I did a number of restores and ultimately each one was a giving a 2800 (standard policy retore error) or 24 (socket write failed). Other than the main error code I found that one recurring exit status:

11/6/2012 5:52:58 PM - Info tar32(pid=6505) done. status: 85: media read error      

This led me to verify the backup image of the linux FS backup copy itself. Did a number of verification jobs (verify media) for several different full images, however we were getting "failed media read" errors for each of them. Now the backup status of the source machine is alwasy showing SUCCESSFUL everyday but the same image when being verified through "Catalog" was giving media read errors.

This led me to believe that maybe the backup images are not being copied properly on the storage unit (corruption, etc. i still havent figured out this yet). Hence I took a new full backup on a different storage unit and performed a successful restore on the same VM.

I'm dealing with Sym Support on the issue now, hopefully they would be able to give more details.

I'll keep this post updated.

 

Kiran_Loni
Not applicable

Hi Adnan,

 

Check wether all the hostnames are in place like

Hostname and IP address of Backup server & client server ( server you have taken backup )in destination server .

 

Mark_Solutions
Level 6
Partner Accredited Certified

Out of interest what sort of PureDisk is this? Is it MSDP or an appliance?

Does the client in question reside in more than one policy and if so does it appear in a different case in each policy?

I have seen very similar results with 7.5.0.4 when the client appears in more than one policy and is listed differently .. i.e. client Client CLIENT

This is corrected using softlinks in the PD storage area but it depends exactly what that storage is - and it really needs Symantec Support to do this for you to be on the same side

Adnan_F
Level 4

Hello All,

Thank you for thre responses. Mark, it is MSDP.

After weeks of troubleshooting with support, the issue was resolved somehow by itself. Below is briefo what happened:

1. Restores were failing continuously with 2800.

2. I did notice that in the job activity there was an image read failure. (Although backups were successful daily). Image verifications through Catalog were failing.

3. Opened ticket with Symantec who worked for couple of weeks.

4. Decided to move the backup to another storage unit and try the restore. IT WORKED.

5. Symantec support did a "crchk" on the entire MDSP unit where backups were successful but restore was failing.

6. During this time other policies on same storage unit started to give MEDIA READ & WRITE ERRORS.

7. A week later Symantec identified corrupted images and asked to expire them and did  crchk again.

8. Backups were moved back to orignial MSDP and restores were successful.

I dont believe the crchk helped much it was probably the case of corrupted images expiring. I do take full backups for our linux environment on a daily basis.

Important lesson here is that: 

1. You must ideally have an alternate free space MSDP pool (or even a basic pool) of 2 TB (or whatever your most critical server backup size is) to test such backup/restore failures scenario.

2. Verify the integrity of your critical backup images on regular basis through Catalog.

3. This goes without saying but if you're not testing your backups through restore, you're NOT doing your job as a backup administrator.

4. Incase you're restoring to a VM, ensure that you've latest VMware Tools installed on the target VM.