Forum Discussion

Adnan_F's avatar
Adnan_F
Level 4
12 years ago

Linux FS restore failure - "Standard Policy restore error 2800"

Hello All,

 

I've been trying to restore linux FS but seems to be failing with a Standard policy restore error (no.2800).  The backups for the source machine run successfully on a daily basis (on a pure disk storage unit) however the restore just fails. The master and media server for his job are both the same with Netbackup 7.5.0.4 installed on Windows Server 2008 R2.

Any help would be highly appreciated.

Below are the job details:

11/3/2012 10:39:48 AM - begin Restore
11/3/2012 10:40:38 AM - restoring image "Source Machine"_1351767623
11/3/2012 10:40:38 AM - Info bprd(pid=2100) Restoring from copy 1 of image created 11/01/12 14:00:23   
11/3/2012 10:40:38 AM - requesting resource @aaaab
11/3/2012 10:40:38 AM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=Deduplication_Pool;Path=PureDiskVolume;StorageServer=;MediaServer=
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) "Target Machine" is the host to restore to     
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) reading file list from client       
11/3/2012 10:40:41 AM - connecting
11/3/2012 10:40:41 AM - Info bpbrm(pid=10724) starting bptm          
11/3/2012 10:40:42 AM - Info tar32(pid=5954) Restore started          
11/3/2012 10:40:42 AM - connected; connect time: 00:00:01
11/3/2012 10:40:42 AM - Info bptm(pid=10200) start           
11/3/2012 10:40:42 AM - started process bptm (10200)
11/3/2012 10:40:42 AM - Info bpdm(pid=10200) reading backup image         
11/3/2012 10:40:42 AM - Info bptm(pid=10200) using 30 data buffers        
11/3/2012 10:40:42 AM - Info bptm(pid=10200) spawning a child process        
11/3/2012 10:40:42 AM - Info bptm(pid=10200) child pid: 7768         
11/3/2012 10:40:42 AM - Info bptm(pid=7768) start           
11/3/2012 10:40:42 AM - started process bptm (7768)
11/3/2012 10:40:47 AM - begin reading
11/3/2012 10:45:07 AM - Info bptm(pid=10200) waited for empty buffer 271 times, delayed 16479 times   
11/3/2012 10:45:07 AM - end reading; read time: 00:04:20
11/3/2012 10:45:08 AM - begin reading
11/3/2012 11:53:00 AM - Error bptm(pid=7768) cannot write data to socket, 10054      
11/3/2012 11:53:00 AM - Error bptm(pid=7768) The following files/folders were not restored:      
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGA10dn.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGA3Fpy.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGLH2Ds.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGNfZXS.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGSWiPd.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGdaxC6.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGeIYSY.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01...../OFGedFVV.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGitiBq.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGmN1ET.t         
11/3/2012 11:53:00 AM - Error bptm(pid=7768) more than 10 files were not restored, remaining ones are shown in the progress log.
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status 3         
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status: 183         
11/3/2012 12:31:02 PM - Info bptm(pid=10200) EXITING with status 24 <----------       
11/3/2012 12:31:02 PM - Info "Master Server"(pid=10200) StorageServer=PureDisk:Master Server; Report=PDDO Stats for (Master Server): read: 1871285 KB, CR received: 1895266 KB, CR received over FC: 0 KB, dedup: 0.0%
11/3/2012 12:31:32 PM - Info tar32(pid=5954) done. status: 24: socket write failed      
11/3/2012 12:31:32 PM - Error bpbrm(pid=10724) client restore EXIT STATUS 24: socket write failed    
11/3/2012 12:31:33 PM - restored image Source Machine_1351767623 - (socket write failed(24)); restore time 01:50:55
11/3/2012 12:31:36 PM - Warning bprd(pid=2100) Restore must be resumed prior to first image expiration on 11/15/2012 2:00:23 PM
11/3/2012 12:31:36 PM - end Restore; elapsed time: 01:51:48
Standard policy restore error(2800)

Thank you.

Regards,

Adnan

  • Hello All,

    Thank you for thre responses. Mark, it is MSDP.

    After weeks of troubleshooting with support, the issue was resolved somehow by itself. Below is briefo what happened:

    1. Restores were failing continuously with 2800.

    2. I did notice that in the job activity there was an image read failure. (Although backups were successful daily). Image verifications through Catalog were failing.

    3. Opened ticket with Symantec who worked for couple of weeks.

    4. Decided to move the backup to another storage unit and try the restore. IT WORKED.

    5. Symantec support did a "crchk" on the entire MDSP unit where backups were successful but restore was failing.

    6. During this time other policies on same storage unit started to give MEDIA READ & WRITE ERRORS.

    7. A week later Symantec identified corrupted images and asked to expire them and did  crchk again.

    8. Backups were moved back to orignial MSDP and restores were successful.

    I dont believe the crchk helped much it was probably the case of corrupted images expiring. I do take full backups for our linux environment on a daily basis.

    Important lesson here is that: 

    1. You must ideally have an alternate free space MSDP pool (or even a basic pool) of 2 TB (or whatever your most critical server backup size is) to test such backup/restore failures scenario.

    2. Verify the integrity of your critical backup images on regular basis through Catalog.

    3. This goes without saying but if you're not testing your backups through restore, you're NOT doing your job as a backup administrator.

    4. Incase you're restoring to a VM, ensure that you've latest VMware Tools installed on the target VM.

  • Have a look at the other error codes in the log ; namley 3 and 183

    do you have permissions to write to the destination ?

    Is encryption enabled

     

  • Info tar32(pid=5954) Restore started   

    tar32 = Windows destination/target machine?

    Are you trying to restore Linux FileSystem to Windows machine?
    This is not supported. Some users have managed to restore data across platforms, but this is not supported by NBU.

     

  • Thank you Mick & Marianne for your response.

    Mick: yes v have the permissions & encryption is NOT enabled

    Marianne: The target machine is a Linux machine as well. It is a virtual machine (VMware), however in order to isolate the issue I did try the restore on physical machine as well which gave similar results.

    Initially the restores were giving Standar Policy Error (2800) with Exit status 3,183, 24. But restores done yesterday and earlier today gave the overall error 24 Socket write failed with EXIT status 85 media read error. I'm not sure why the main error code has changed, but changes I made to the linux VM was update the VM tools, changed timeouts to 9000 (from 300)

    11/6/2012 9:12:51 AM - Info bpdm(pid=1012) reading backup image         
    11/6/2012 9:12:51 AM - Info bptm(pid=1012) using 30 data buffers        
    11/6/2012 9:12:51 AM - Info bptm(pid=1012) spawning a child process        
    11/6/2012 9:12:51 AM - Info bptm(pid=1012) child pid: 8520         
    11/6/2012 9:12:51 AM - Info bptm(pid=8520) start           
    11/6/2012 9:12:51 AM - started process bptm (8520)
    11/6/2012 9:12:58 AM - begin reading
    11/6/2012 9:13:03 AM - Info bptm(pid=1012) waited for empty buffer 83 times, delayed 104 times   
    11/6/2012 9:13:03 AM - end reading; read time: 00:00:05
    11/6/2012 9:13:03 AM - begin reading
    11/6/2012 9:33:20 AM - Critical bptm(pid=1012) sts_read_image failed: error 2060017 system call failed     
    11/6/2012 9:33:20 AM - Critical bptm(pid=1012) image read failed: error 2060017: system call failed    
    11/6/2012 9:33:20 AM - Error bptm(pid=1012) cannot read image from disk, Invalid argument     
    11/6/2012 9:33:20 AM - Info bptm(pid=1012) EXITING with status 85 <----------       
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) The following files/folders were not restored:      
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........../o10684138.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........../o10684140.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/...........o10684141.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/..........o10684145.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/........./o10684146.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........o10684147.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.......o10684152.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.........o10684156.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/........./o10684160.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) UTF - /u01/oracle/PROD/apps/.......o10684161.out         
    11/6/2012 9:33:20 AM - Error bptm(pid=8520) more than 10 files were not restored, remaining ones are shown in the progress log.
    11/6/2012 9:57:20 AM - Info tar32(pid=9844) done. status 3         
    11/6/2012 9:57:20 AM - Info MasterServer(pid=1012) StorageServer=PureDisk:MasterServer; Report=PDDO Stats for (MasterServer): read: 23196295 KB, CR received: 23228521 KB, CR received over FC: 0 KB, dedup: 0.0%
    11/6/2012 9:57:20 AM - Info tar32(pid=9844) done. status: 183         
    11/6/2012 9:57:21 AM - Info tar32(pid=9844) done. status: 85: media read error      
    11/6/2012 9:57:21 AM - Error bpbrm(pid=6580) client restore EXIT STATUS 85: media read error    
    11/6/2012 9:57:23 AM - Warning bprd(pid=6592) Restore must be resumed prior to first image expiration on 11/17/2012 6:30:06 PM
    11/6/2012 9:57:23 AM - end Restore; elapsed time: 00:45:23
    socket write failed(24) 

     

     

  • What has changed between yesterday and today?

    Errors are completely different today.

    Yesterday you definitely had a Windows machine as destination - tar32 process is proof.

    You will need to enable logs to give us a full picture.

    On master: bprd (restart NBU after creating log folder)
    On media server: bpbrm, bptm and bpdm
    On destination client: tar

    Please rename logs to reflect process name (e.g. bprd.txt) and upload as File attachments.

  • Hello Marianne,

    Thank you for your reply again.

    I did check/read about the tar32 process, but trust me the target machine is a linux virtual machine (on vmware)

    We had success through an alternate way today.

    I did a number of restores and ultimately each one was a giving a 2800 (standard policy retore error) or 24 (socket write failed). Other than the main error code I found that one recurring exit status:

    11/6/2012 5:52:58 PM - Info tar32(pid=6505) done. status: 85: media read error      

    This led me to verify the backup image of the linux FS backup copy itself. Did a number of verification jobs (verify media) for several different full images, however we were getting "failed media read" errors for each of them. Now the backup status of the source machine is alwasy showing SUCCESSFUL everyday but the same image when being verified through "Catalog" was giving media read errors.

    This led me to believe that maybe the backup images are not being copied properly on the storage unit (corruption, etc. i still havent figured out this yet). Hence I took a new full backup on a different storage unit and performed a successful restore on the same VM.

    I'm dealing with Sym Support on the issue now, hopefully they would be able to give more details.

    I'll keep this post updated.

     

  • Hi Adnan,

     

    Check wether all the hostnames are in place like

    Hostname and IP address of Backup server & client server ( server you have taken backup )in destination server .

     

  • Out of interest what sort of PureDisk is this? Is it MSDP or an appliance?

    Does the client in question reside in more than one policy and if so does it appear in a different case in each policy?

    I have seen very similar results with 7.5.0.4 when the client appears in more than one policy and is listed differently .. i.e. client Client CLIENT

    This is corrected using softlinks in the PD storage area but it depends exactly what that storage is - and it really needs Symantec Support to do this for you to be on the same side

  • Hello All,

    Thank you for thre responses. Mark, it is MSDP.

    After weeks of troubleshooting with support, the issue was resolved somehow by itself. Below is briefo what happened:

    1. Restores were failing continuously with 2800.

    2. I did notice that in the job activity there was an image read failure. (Although backups were successful daily). Image verifications through Catalog were failing.

    3. Opened ticket with Symantec who worked for couple of weeks.

    4. Decided to move the backup to another storage unit and try the restore. IT WORKED.

    5. Symantec support did a "crchk" on the entire MDSP unit where backups were successful but restore was failing.

    6. During this time other policies on same storage unit started to give MEDIA READ & WRITE ERRORS.

    7. A week later Symantec identified corrupted images and asked to expire them and did  crchk again.

    8. Backups were moved back to orignial MSDP and restores were successful.

    I dont believe the crchk helped much it was probably the case of corrupted images expiring. I do take full backups for our linux environment on a daily basis.

    Important lesson here is that: 

    1. You must ideally have an alternate free space MSDP pool (or even a basic pool) of 2 TB (or whatever your most critical server backup size is) to test such backup/restore failures scenario.

    2. Verify the integrity of your critical backup images on regular basis through Catalog.

    3. This goes without saying but if you're not testing your backups through restore, you're NOT doing your job as a backup administrator.

    4. Incase you're restoring to a VM, ensure that you've latest VMware Tools installed on the target VM.