Linux FS restore failure - "Standard Policy restore error 2800"
Hello All,
I've been trying to restore a Linux file system, but the restore keeps failing with a Standard policy restore error (status 2800). The backups for the source machine run successfully every day (to a PureDisk storage unit), but the restore fails every time. The master and media server for this job are the same machine, with NetBackup 7.5.0.4 installed on Windows Server 2008 R2.
Any help would be highly appreciated.
Below are the job details:
11/3/2012 10:39:48 AM - begin Restore
11/3/2012 10:40:38 AM - restoring image "Source Machine"_1351767623
11/3/2012 10:40:38 AM - Info bprd(pid=2100) Restoring from copy 1 of image created 11/01/12 14:00:23
11/3/2012 10:40:38 AM - requesting resource @aaaab
11/3/2012 10:40:38 AM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=Deduplication_Pool;Path=PureDiskVolume;StorageServer=;MediaServer=
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) "Target Machine" is the host to restore to
11/3/2012 10:40:39 AM - Info bpbrm(pid=10724) reading file list from client
11/3/2012 10:40:41 AM - connecting
11/3/2012 10:40:41 AM - Info bpbrm(pid=10724) starting bptm
11/3/2012 10:40:42 AM - Info tar32(pid=5954) Restore started
11/3/2012 10:40:42 AM - connected; connect time: 00:00:01
11/3/2012 10:40:42 AM - Info bptm(pid=10200) start
11/3/2012 10:40:42 AM - started process bptm (10200)
11/3/2012 10:40:42 AM - Info bpdm(pid=10200) reading backup image
11/3/2012 10:40:42 AM - Info bptm(pid=10200) using 30 data buffers
11/3/2012 10:40:42 AM - Info bptm(pid=10200) spawning a child process
11/3/2012 10:40:42 AM - Info bptm(pid=10200) child pid: 7768
11/3/2012 10:40:42 AM - Info bptm(pid=7768) start
11/3/2012 10:40:42 AM - started process bptm (7768)
11/3/2012 10:40:47 AM - begin reading
11/3/2012 10:45:07 AM - Info bptm(pid=10200) waited for empty buffer 271 times, delayed 16479 times
11/3/2012 10:45:07 AM - end reading; read time: 00:04:20
11/3/2012 10:45:08 AM - begin reading
11/3/2012 11:53:00 AM - Error bptm(pid=7768) cannot write data to socket, 10054
11/3/2012 11:53:00 AM - Error bptm(pid=7768) The following files/folders were not restored:
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGA10dn.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGA3Fpy.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGLH2Ds.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGNfZXS.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGSWiPd.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGdaxC6.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGeIYSY.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01...../OFGedFVV.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/...../OFGitiBq.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) UTF - /u01/..../OFGmN1ET.t
11/3/2012 11:53:00 AM - Error bptm(pid=7768) more than 10 files were not restored, remaining ones are shown in the progress log.
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status 3
11/3/2012 11:53:03 AM - Info tar32(pid=5954) done. status: 183
11/3/2012 12:31:02 PM - Info bptm(pid=10200) EXITING with status 24 <----------
11/3/2012 12:31:02 PM - Info "Master Server"(pid=10200) StorageServer=PureDisk:Master Server; Report=PDDO Stats for (Master Server): read: 1871285 KB, CR received: 1895266 KB, CR received over FC: 0 KB, dedup: 0.0%
11/3/2012 12:31:32 PM - Info tar32(pid=5954) done. status: 24: socket write failed
11/3/2012 12:31:32 PM - Error bpbrm(pid=10724) client restore EXIT STATUS 24: socket write failed
11/3/2012 12:31:33 PM - restored image Source Machine_1351767623 - (socket write failed(24)); restore time 01:50:55
11/3/2012 12:31:36 PM - Warning bprd(pid=2100) Restore must be resumed prior to first image expiration on 11/15/2012 2:00:23 PM
11/3/2012 12:31:36 PM - end Restore; elapsed time: 01:51:48
Standard policy restore error(2800)
Thank you.
Regards,
Adnan
Hello All,
Thank you for the responses. Mark, it is MSDP.
After weeks of troubleshooting with support, the issue somehow resolved itself. Below is a brief summary of what happened:
1. Restores were failing continuously with 2800.
2. I noticed that the job activity showed an image read failure, even though the daily backups were successful. Image verifications through the Catalog were also failing.
3. Opened a ticket with Symantec, who worked on it for a couple of weeks.
4. Decided to move the backup to another storage unit and try the restore. IT WORKED.
5. Symantec support ran a "crchk" on the entire MSDP unit where backups were successful but restores were failing.
6. During this time, other policies on the same storage unit started to give MEDIA READ & WRITE ERRORS.
7. A week later, Symantec identified corrupted images, asked me to expire them, and ran crchk again.
8. Backups were moved back to the original MSDP and restores were successful.
I don't believe the crchk helped much; it was probably a case of the corrupted images expiring. I take full backups of our Linux environment on a daily basis.
The important lessons here are:
1. Ideally, keep an alternate MSDP pool (or even a basic pool) with 2 TB of free space (or whatever your most critical server's backup size is) so you can test backup/restore failure scenarios like this one.
2. Verify the integrity of your critical backup images on a regular basis through the Catalog.
3. This goes without saying, but if you're not testing your backups through restores, you're NOT doing your job as a backup administrator.
4. In case you're restoring to a VM, ensure that you have the latest VMware Tools installed on the target VM.
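For anyone who runs regular test restores and wants to check the results quickly, here is a minimal Python sketch that scans pasted job details for Error lines and non-zero status codes. It assumes the text looks like the job details in the first post (timestamp, " - ", level, process(pid=N), message). The function name summarize_job_details is made up for illustration and is not part of NetBackup.

```python
import re

# Assumed line layout, taken from the job details pasted in this thread, e.g.
# "11/3/2012 11:53:00 AM - Error bptm(pid=7768) cannot write data to socket, 10054"
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - "
    r"(?P<level>Error|Warning|Info) (?P<proc>.+?)\(pid=(?P<pid>\d+)\) (?P<msg>.*)$"
)

# Picks up "status 24", "status: 183", "EXIT STATUS 24" and similar.
STATUS_RE = re.compile(r"\bstatus:? (\d+)", re.IGNORECASE)


def summarize_job_details(text):
    """Return (error_lines, status_codes) found in pasted job details.

    error_lines is a list of (timestamp, process, message) tuples for
    lines logged at level Error; status_codes is a sorted list of the
    distinct non-zero status numbers mentioned anywhere in the messages.
    """
    errors = []
    statuses = set()
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            # Lines without a level/process (e.g. "begin Restore") are skipped.
            continue
        msg = match.group("msg")
        if match.group("level") == "Error":
            errors.append((match.group("ts"), match.group("proc"), msg))
        for code in STATUS_RE.findall(msg):
            if code != "0":
                statuses.add(int(code))
    return errors, sorted(statuses)


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < job_details.txt
    found_errors, found_statuses = summarize_job_details(sys.stdin.read())
    print("status codes seen:", found_statuses)
    for ts, proc, msg in found_errors:
        print(ts, proc, msg)
```

This only reads text you have already copied out of the job details; it does not talk to NetBackup, and it is no substitute for the Catalog verification and real test restores mentioned above.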