03-28-2013 06:47 PM
Hi
While restoring data to a UNIX client, we got the error:
"Cannot write data to socket, Broken pipe."
We have a NetBackup 7.5.0.3 master server (Solaris 10); the client is HP-UX 11.23.
Please find the logs below. This is a very critical restore for us. Please advise.
Thanks
28/03/2013 12:44:30 - begin Restore
28/03/2013 12:44:33 - media needed: 006116
28/03/2013 12:44:33 - media needed: 006108
28/03/2013 12:44:34 - restoring from image xxxxxxx_1363861974
28/03/2013 12:44:34 - Info bprd (pid=19456) Restoring from copy 1 of image created Thu Mar 21 07:32:54 2013
28/03/2013 12:44:44 - started process bptm (pid=28126)
28/03/2013 12:44:46 - requesting resource 006108
28/03/2013 12:44:47 - granted resource 006108
28/03/2013 12:44:47 - granted resource HP.ULTRIUM4-SCSI.005
28/03/2013 12:44:48 - started process bptm (pid=28126)
28/03/2013 12:44:48 - mounting 006108
28/03/2013 12:46:05 - mounted 006108; mount time: 0:01:17
28/03/2013 12:46:07 - positioning 006108 to file 8
28/03/2013 12:47:32 - positioned 006108; position time: 0:01:25
28/03/2013 12:47:34 - begin reading
28/03/2013 12:48:50 - Error bptm (pid=28127) cannot write data to socket, Broken pipe
===============================================================
03-29-2013 01:35 AM
Thanks for the logs - I will go through them a bit later in the day.
We need more logs, please:
bpbrm on the media server
bpcd and tar logs on the client.
If these log folders don't exist, please create them and retry the restore. Collect a full set of logs (including the ones previously posted) and upload.
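On UNIX, NetBackup writes these legacy debug logs only when a folder named after the process exists under /usr/openv/netbackup/logs (the default install path; adjust it if yours differs). A minimal Python sketch that creates any missing folders for a given host role. The role-to-folder mapping follows the full log list requested in this thread, and the names `LOGS_BY_ROLE` and `create_log_dirs` are hypothetical helpers, not NetBackup tools:

```python
import os

# Default legacy debug-log root on NetBackup UNIX hosts (adjust if installed elsewhere).
NBU_LOG_ROOT = "/usr/openv/netbackup/logs"

# Log folders requested in this thread, grouped by the role of the host.
LOGS_BY_ROLE = {
    "master": ["bprd"],
    "media": ["bptm", "bpbrm"],
    "client": ["bpcd", "tar"],
}


def create_log_dirs(roles, root=NBU_LOG_ROOT):
    """Create any missing log folders for the given roles; return the paths created."""
    created = []
    for role in roles:
        for name in LOGS_BY_ROLE[role]:
            path = os.path.join(root, name)
            if not os.path.isdir(path):
                os.makedirs(path, exist_ok=True)
                created.append(path)
    return created
```

On a host that is both the media server and the client of the restore, run it as root with `create_log_dirs(["media", "client"])`, then retry the restore so the new folders capture the failing job.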
If this matter is urgent, please log a call with Symantec Support as today is a public/bank holiday in most parts of the world.
03-29-2013 01:41 AM
If not done already, set CLIENT_READ_TIMEOUT to 36000 on the client and on the master/media server.
If there is a firewall between the master/media server and the client, set TCP_KEEPALIVE_INTERVAL to 15 minutes using ndd -set /dev/tcp tcp_keepalive_interval {time}. The time is in milliseconds.
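The two settings use different units: CLIENT_READ_TIMEOUT is in seconds (on UNIX it goes in /usr/openv/netbackup/bp.conf as `CLIENT_READ_TIMEOUT = <seconds>`), while tcp_keepalive_interval is in milliseconds. A small Python sketch that does the arithmetic and builds the lines to apply; the helper names are illustrative only:

```python
# CLIENT_READ_TIMEOUT is expressed in seconds; tcp_keepalive_interval in milliseconds.
MS_PER_MINUTE = 60 * 1000


def keepalive_ms(minutes):
    """Convert a keepalive interval in minutes to the milliseconds ndd expects."""
    return minutes * MS_PER_MINUTE


def ndd_set_command(minutes):
    """Build the ndd command that sets the TCP keepalive interval."""
    return f"ndd -set /dev/tcp tcp_keepalive_interval {keepalive_ms(minutes)}"


def bp_conf_entry(seconds):
    """Build the bp.conf line for CLIENT_READ_TIMEOUT (value in seconds)."""
    return f"CLIENT_READ_TIMEOUT = {seconds}"


print(ndd_set_command(15))   # ndd -set /dev/tcp tcp_keepalive_interval 900000
print(bp_conf_entry(36000))  # CLIENT_READ_TIMEOUT = 36000 (10 hours)
```

Note that a value set with ndd takes effect immediately but does not survive a reboot unless it is also added to a startup script.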
03-29-2013 08:23 AM
Hi guys,
The restore is from a FlashBackup (granular) backup.
Attached are the logs requested by Marianne.
The configuration parameter TCP_KEEPALIVE_INTERVAL (from Nocalai's comment) is: 7200000
However, there is no firewall between the master and media server:
ndd -get /dev/tcp tcp_keepalive_interval
7200000
The configuration parameter CLIENT_READ_TIMEOUT (from Nocalai's comment) is: 10800
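For comparison with the values suggested earlier, both reported settings converted to hours: the keepalive interval of 7200000 ms is 2 hours, and CLIENT_READ_TIMEOUT of 10800 s is 3 hours, versus the suggested 15 minutes and 10 hours. A Python sketch of that conversion; the `_ms`/`_s` suffixes are a naming convention invented here to carry the unit:

```python
MS_PER_HOUR = 3600 * 1000
S_PER_HOUR = 3600

# Values reported in this post and the values suggested earlier in the thread.
current = {"tcp_keepalive_interval_ms": 7200000, "CLIENT_READ_TIMEOUT_s": 10800}
suggested = {"tcp_keepalive_interval_ms": 15 * 60 * 1000, "CLIENT_READ_TIMEOUT_s": 36000}


def to_hours(name, value):
    """Convert a setting to hours, using the unit implied by its name suffix."""
    return value / MS_PER_HOUR if name.endswith("_ms") else value / S_PER_HOUR


for name, value in current.items():
    print(f"{name}: current {to_hours(name, value):.2f} h, "
          f"suggested {to_hours(name, suggested[name]):.2f} h")
```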
Thanks
03-29-2013 11:45 AM
We are still missing bpcd and tar logs from the client....
We need to see what is happening on the client as well.
*** EDIT ***
The timestamps in the logs also do not correspond with the Job details.
We see above that restore was started at 12:44 and failed at 12:48.
We see in bpbrm a restore that completed successfully at 12:24.
The next timestamp is 14:36:
bptm starts at 14:59.
bprd seems to come from media server instead of master server:
bprd: canopus is not the primary server pabkp.nextel.com.br...exiting
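This check can be done mechanically: a Python sketch that keeps only the debug-log lines falling inside the failed job's window (12:44:30 to 12:48:50 in the Job Details), with an optional clock offset in case the media server's clock differs from the master's. It assumes each line starts with an HH:MM:SS.mmm stamp, as in the bpbrm excerpt in this thread, and it ignores dates and midnight roll-over; the function names are hypothetical:

```python
from datetime import datetime, timedelta

TIME_FMT = "%H:%M:%S"


def log_time(line):
    """Parse the leading HH:MM:SS.mmm stamp of a legacy debug-log line."""
    return datetime.strptime(line.split()[0], TIME_FMT + ".%f")


def in_job_window(line, start, end, clock_offset=timedelta(0)):
    """True if the line falls inside the job window after correcting for clock skew.

    clock_offset is the amount to add to this host's clock to match the master's.
    """
    t = log_time(line) + clock_offset
    return datetime.strptime(start, TIME_FMT) <= t <= datetime.strptime(end, TIME_FMT)


# Failed job from the Job Details: 12:44:30 to 12:48:50.
line = "12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: ..."
print(in_job_window(line, "12:44:30", "12:48:50"))  # False: belongs to an earlier job
```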
We need a full set of logs that contain information about start and end of failed restore from:
master: bprd
media server: bptm and bpbrm
client: bpcd and tar.
03-29-2013 11:58 AM
Marianne,
Some very important information about this problem:
The media server is the one that ran the backup (FlashBackup), and that's where I need to restore.
I believe the tar log is clean (because restore not initialized on netbackup). Please verify.
Thanks
Mauricio
03-29-2013 12:50 PM
The tar log shows a successful restore of a single file that completed at 12:24:
Please read through my previous post again.
You have not given us any logs that contain evidence of the failed restore between 12:44 and 12:48.
We also need media server's bptm log that corresponds with timestamps in bpbrm log.
As per my post above:
We need a full set of logs that contain information about start and end of failed restore from:
master: bprd
media server: bptm and bpbrm
client: bpcd and tar.
If media server is also the client, then we need those logs on the media server.
We still need the master's bprd log.
If there is a time difference between media server and master, please tell us exactly how much.
I have no idea what this means:
.... restore not initialized on netbackup
How else are restores done if not on NetBackup?
It seems there was a problem with the 'feedback' of the successful restore to the master:
12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: (2201645.001) INF - TAR EXITING WITH STATUS = 0
12:24:24.538 [12964] <2> bpbrm handle_restore: from client canopus: INF - TAR RESTORED 1 OF 1 FILES SUCCESSFULLY
12:24:24.678 [12964] <16> bpbrm close_progress_log: could not close progress file /usr/openv/netbackup/logs/user_ops/pmsanche/logs/jbp-24494364483841946230000000094-uqaW1V.log on pabkp
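The excerpt shows the pattern to look for: tar exited with status 0 and restored 1 of 1 files, yet bpbrm then logged a <16> message because it could not close the progress file on pabkp. A Python sketch that pulls those two facts out of a bpbrm log, assuming the line layout shown in the excerpt (time, [pid], <severity>, process name, message); the regex and function names are illustrative and not part of NetBackup:

```python
import re

# Layout seen in the excerpt: time [pid] <severity> process function: message
LOG_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"(?P<proc>\S+) (?P<msg>.*)$"
)


def parse(line):
    """Split one legacy debug-log line into its fields, or return None."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def summarize(lines):
    """Return the TAR exit status (if logged) and the time and message of any <16> lines."""
    status, errors = None, []
    for rec in filter(None, map(parse, lines)):
        found = re.search(r"TAR EXITING WITH STATUS = (\d+)", rec["msg"])
        if found:
            status = int(found.group(1))
        if rec["sev"] == "16":
            errors.append(rec["time"] + " " + rec["msg"])
    return status, errors
```

Running `summarize` over the three lines above would report status 0 together with the single close_progress_log error, which matches a restore that succeeded on the client but whose result never reached the master.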
Where did you check keepalive and timeout settings?
Master or media server?
Check both, please.
03-29-2013 01:47 PM
Hi,
I ran the restore again today (29/03/2013 at 17:26) and have attached new logs for analysis (all logs are from the media server).
The restore fails with the same problem.
The keepalive and timeout parameters are shown below:
========================================
Thanks