01-06-2015 06:39 PM
Hi Guys,
I am currently having trouble backing up a large data set (more than 14 TB) to a Symantec NetBackup Appliance 5220 with the Accelerator feature in a single stream.
Environment:
Master: Enterprise 2008 R2 64 bit
Media: NetBackup 5220 (MSDP)
Client: RedHat 2.6.18
Every time, the backup fails after 9 to 10 hours. Please find the Activity Monitor log below. If you need any logs from the appliance or the client, I will upload them.
1/6/2015 8:51:47 AM - Info nbjm(pid=7896) starting backup job (jobid=337001) for client nebres, policy PSS_NEBRES_PROD, schedule full
1/6/2015 8:51:47 AM - Info nbjm(pid=7896) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=337001, request id:{06B912E6-BFDC-4000-8BB2-A503CB202AA2})
1/6/2015 8:51:47 AM - requesting resource stu_disk_adconbuapp4
1/6/2015 8:51:47 AM - requesting resource adconbu.NBU_CLIENT.MAXJOBS.nebres
1/6/2015 8:51:47 AM - requesting resource adconbu.NBU_POLICY.MAXJOBS.PSS_NEBRES_PROD
1/6/2015 8:51:47 AM - granted resource adconbu.NBU_CLIENT.MAXJOBS.nebres
1/6/2015 8:51:47 AM - granted resource adconbu.NBU_POLICY.MAXJOBS.PSS_NEBRES_PROD
1/6/2015 8:51:47 AM - granted resource MediaID=@aaaak;DiskVolume=PureDiskVolume;DiskPool=dp_disk_app4;Path=PureDiskVolume;StorageServer=app4;MediaServer=app4
1/6/2015 8:51:47 AM - granted resource stu_disk_app4
1/6/2015 8:51:48 AM - estimated 0 Kbytes needed
1/6/2015 8:51:48 AM - Info nbjm(pid=7896) started backup (backupid=nebres_1420519907) job for client nebres, policy PSS_NEBRES_PROD, schedule full on storage unit stu_disk_app4
1/6/2015 8:51:49 AM - started process bpbrm (8490)
1/6/2015 8:51:51 AM - connecting
1/6/2015 8:51:51 AM - connected; connect time: 00:00:00
1/6/2015 8:51:53 AM - Info bpbrm(pid=8490) nebres is the host to backup data from
1/6/2015 8:51:53 AM - Info bpbrm(pid=8490) reading file list from client
1/6/2015 8:51:53 AM - Info bpbrm(pid=8490) accelerator enabled
1/6/2015 8:51:53 AM - Info bpbrm(pid=8490) There is no complete backup image match with track journal, a regular full backup will be performed.
1/6/2015 8:51:54 AM - Info bpbrm(pid=8490) starting bpbkar on client
1/6/2015 8:51:54 AM - Info bpbkar(pid=1809) Backup started
1/6/2015 8:51:54 AM - Info bpbrm(pid=8490) bptm pid: 8491
1/6/2015 8:51:55 AM - Info bptm(pid=8491) start
1/6/2015 8:51:55 AM - Info bptm(pid=8491) using 262144 data buffer size
1/6/2015 8:51:55 AM - Info bptm(pid=8491) using 30 data buffers
1/6/2015 8:51:55 AM - begin writing
1/6/2015 8:51:56 AM - Info bptm(pid=8491) start backup
1/6/2015 8:51:57 AM - Info bptm(pid=8491) backup child process is pid 8502
1/6/2015 1:45:33 PM - Info bpbkar(pid=1809) 4999 entries sent to bpdbm
1/6/2015 1:51:38 PM - Info bpbkar(pid=1809) 9999 entries sent to bpdbm
1/6/2015 1:52:09 PM - Info bpbkar(pid=1809) 14999 entries sent to bpdbm
1/6/2015 1:59:15 PM - Info bpbkar(pid=1809) 19999 entries sent to bpdbm
1/6/2015 2:01:18 PM - Info bpbkar(pid=1809) 24999 entries sent to bpdbm
1/6/2015 2:04:00 PM - Info bpbkar(pid=1809) 29999 entries sent to bpdbm
1/6/2015 2:14:27 PM - Info bpbkar(pid=1809) 34999 entries sent to bpdbm
1/6/2015 2:18:31 PM - Info bpbkar(pid=1809) 39999 entries sent to bpdbm
1/6/2015 2:23:42 PM - Info bpbkar(pid=1809) 44999 entries sent to bpdbm
1/6/2015 2:25:13 PM - Info bpbkar(pid=1809) 49999 entries sent to bpdbm
1/6/2015 2:26:40 PM - Info bpbkar(pid=1809) 54999 entries sent to bpdbm
1/6/2015 2:34:25 PM - Info bpbkar(pid=1809) 59999 entries sent to bpdbm
1/6/2015 2:39:19 PM - Info bpbkar(pid=1809) 64999 entries sent to bpdbm
1/6/2015 2:43:47 PM - Info bpbkar(pid=1809) 69999 entries sent to bpdbm
1/6/2015 6:06:46 PM - Error bpbrm(pid=8490) socket read failed: errno = 62 - Timer expired
1/6/2015 6:13:12 PM - Error bptm(pid=8491) media manager terminated by parent process
1/6/2015 6:16:36 PM - end writing; write time: 09:24:41
1/6/2015 6:16:38 PM - Info adconbuapp4(pid=8491) StorageServer=PureDisk:adconbuapp4; Report=PDDO Stats for (app4): scanned: 1075213482 KB, CR sent: 120589604 KB, CR sent over FC: 0 KB, dedup: 88.8%
1/6/2015 6:16:39 PM - Info bpbkar(pid=1809) done. status: 13: file read failed
file read failed(13)
Waiting for your response.
Thanks
Kamran
01-06-2015 08:31 PM
The "Timer expired" indicates a timeout issue.
If you have already increased the "Client Read Timeout" of the client to a larger value (such as 7200 sec) and the backup still fails, then try lowering the TCP keepalive time on the master server (your Windows 2008 R2 machine). Refer to this technote:
http://www.symantec.com/docs/HOWTO99910
01-06-2015 08:55 PM
I checked the Client Read Timeout value in Host Properties; it was set to the default of 300 sec. I have increased it and re-run the backup. After that, I will try applying the technote. I will update you soon.
Thanks
01-07-2015 05:30 AM
It is not a client read timeout issue. The "timer expired" is simply telling you that "this is taking way too long to send data so I am calling it".
The time between the last buffer dump and the error was about 3 hours and 23 minutes, or roughly 12180 seconds. Up until 02:43:47 PM the backup was clipping right along at 5000 entries every 2 - 10 minutes. Notice there were almost five hours between when the job started and when data started to be transferred. If it were a client read timeout, as you assume, then it would have failed after 8:56 AM or, better yet, after 02:48:47 PM.
What does the bpbkar log on the client show was happening between 02:43:47 PM and 06:06:46 PM?
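For anyone who wants to check the gaps quoted above, here is a minimal Python sketch that computes them from the Activity Monitor timestamps in the original post. The helper name `seconds_between` and the format string are my own choices for illustration, not anything from NetBackup; the four timestamps are copied from the job log.

```python
from datetime import datetime

# Timestamp layout used in the pasted Activity Monitor log (assumed US-style M/D/YYYY).
FMT = "%m/%d/%Y %I:%M:%S %p"


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the job log in the first post.
job_start = "1/6/2015 8:51:47 AM"      # nbjm starts the backup job
first_entries = "1/6/2015 1:45:33 PM"  # first "entries sent to bpdbm" line
last_entries = "1/6/2015 2:43:47 PM"   # last "entries sent to bpdbm" line
socket_error = "1/6/2015 6:06:46 PM"   # bpbrm "socket read failed" error

startup_gap = seconds_between(job_start, first_entries)
stall_gap = seconds_between(last_entries, socket_error)

print(f"job start to first entries: {startup_gap} s")
print(f"last entries to socket error: {stall_gap} s")
```

Both gaps are far longer than the 300 sec default Client Read Timeout, which is why a plain client read timeout does not fit the timing of the failure.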
01-11-2015 06:26 PM
I have already increased the client read timeout on the media server and re-run the backup. Now it fails with "file write failed(14)". I did not copy the log because the job restarted, but I checked the policy: the checkpoint restart option is disabled and the data stream option is also disabled.
The client machine runs 64-bit Red Hat Linux.
Master Server: ENT Windows 2008 R2 64 bit (NBU7.5.0.7).
Media Server: NetBackup Appliance 5220 (2.5.4)
Waiting for your response.
Thanks
01-11-2015 06:52 PM
Sorry, the error was "Exit: client backup EXIT STATUS 84: media write error".
01-12-2015 12:06 AM
Status 84 is a different issue and needs to be looked at separately.
To troubleshoot status 13, you will need these logs:
On client - bpbkar
On media server: bpbrm and bptm
To troubleshoot status 84, you need these logs on the media server:
bptm and bpdm.
If the status 84 errors continue, please start a separate discussion for that error.