Solved: Restore failing with error: Error bptm ... cannot write data to socket, Broken pipe

msanches · ‎03-28-2013

Hi

While restoring data to a UNIX client, we got the following error:

"Cannot write data to socket, Broken pipe."

We have a NetBackup 7.5.0.3 master server (Solaris 10); the client is HP-UX 11.23.
Please find the logs below. This is a very critical restore for us. Please advise.

Thanks

28/03/2013 12:44:30 - begin Restore
28/03/2013 12:44:33 - media needed: 006116
28/03/2013 12:44:33 - media needed: 006108
28/03/2013 12:44:34 - restoring from image xxxxxxx_1363861974
28/03/2013 12:44:34 - Info bprd (pid=19456) Restoring from copy 1 of image created Thu Mar 21 07:32:54 2013
28/03/2013 12:44:44 - started process bptm (pid=28126)
28/03/2013 12:44:46 - requesting resource 006108
28/03/2013 12:44:47 - granted resource 006108
28/03/2013 12:44:47 - granted resource HP.ULTRIUM4-SCSI.005
28/03/2013 12:44:48 - started process bptm (pid=28126)
28/03/2013 12:44:48 - mounting 006108
28/03/2013 12:46:05 - mounted 006108; mount time: 0:01:17
28/03/2013 12:46:07 - positioning 006108 to file 8
28/03/2013 12:47:32 - positioned 006108; position time: 0:01:25
28/03/2013 12:47:34 - begin reading
28/03/2013 12:48:50 - Error bptm (pid=28127) cannot write data to socket, Broken pipe
===============================================================

Marianne · ‎03-29-2013

Thanks for the logs - I will go through them a bit later in the day.

We need more logs, please:

bpbrm on the media server

bpcd and tar logs on the client.

If these log folders don't exist, please create them and retry the restore. Collect a full set of logs (including the ones previously posted) and upload.

If this matter is urgent, please log a call with Symantec Support as today is a public/bank holiday in most parts of the world.

Nicolai · ‎03-29-2013

If not done already, set CLIENT_READ_TIMEOUT to 36000 on the client and on the master/media server.

If there is a firewall between master/media and client, set TCP_KEEPALIVE_INTERVAL to 15 minutes using ndd -set /dev/tcp tcp_keepalive_interval {time}. The time is in milliseconds.
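
As a quick check of the units, here is a minimal sketch that converts minutes to the millisecond value ndd expects and builds the command string quoted above. The helper names are made up for illustration, and the script only prints the command; it does not run ndd.

```python
def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to milliseconds, the unit of tcp_keepalive_interval."""
    return int(minutes * 60 * 1000)


def ndd_set_command(minutes: float) -> str:
    """Build (but do not execute) the ndd command from the post above."""
    return f"ndd -set /dev/tcp tcp_keepalive_interval {minutes_to_ms(minutes)}"


if __name__ == "__main__":
    # 15 minutes is the interval suggested in the post.
    print(ndd_set_command(15))
```

By the same arithmetic, a value of 7200000 corresponds to 120 minutes.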

msanches · ‎03-29-2013

Hi guys,

The backup being restored is a FlashBackup (granular).

Attached are the logs requested by Marianne.

The TCP_KEEPALIVE_INTERVAL parameter (from Nicolai's comment) is 7200000,
but we do not have a firewall between master/media.

ndd -get /dev/tcp tcp_keepalive_interval
7200000

The CLIENT_READ_TIMEOUT parameter (from Nicolai's comment) is 10800.

Thanks

Marianne · ‎03-29-2013

We are still missing bpcd and tar logs from the client....

We need to see what is happening on the client as well.

*** EDIT ***

The timestamps in the logs also do not correspond with the Job details.

We see above that restore was started at 12:44 and failed at 12:48.

We see in bpbrm a restore that completed successfully at 12:24.
The next timestamp is 14:36:

bptm starts at 14:59.

bprd seems to come from media server instead of master server:
bprd: canopus is not the primary server pabkp.nextel.com.br...exiting

We need a full set of logs that contain information about start and end of failed restore from:
master: bprd
media server: bptm and bpbrm
client: bpcd and tar.
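
To check whether a log actually covers the failed restore, something like the following sketch can pull out only the entries inside the 12:44 to 12:48 window. It assumes each entry starts with an HH:MM:SS.mmm timestamp, as in the bpbrm excerpts quoted in this thread; the function name and the second sample line are hypothetical.

```python
import re
from datetime import time

# Entries in the excerpts in this thread begin with HH:MM:SS.mmm
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s")


def lines_in_window(lines, start, end):
    """Yield log lines whose leading timestamp falls within [start, end]."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # lines without a leading timestamp are skipped
        h, mi, s, ms = (int(g) for g in m.groups())
        if start <= time(h, mi, s, ms * 1000) <= end:
            yield line


if __name__ == "__main__":
    sample = [
        # Taken from the bpbrm log quoted in this thread (outside the window).
        "12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: "
        "(2201645.001) INF - TAR EXITING WITH STATUS = 0",
        # Hypothetical entry, only here to show a match.
        "12:45:10.001 [11111] <2> bptm hypothetical entry",
    ]
    for hit in lines_in_window(sample, time(12, 44, 0), time(12, 49, 0)):
        print(hit)
```

If the clocks on the hosts differ, the window has to be shifted by that offset for each host's logs.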

msanches · ‎03-29-2013

Marianne,

Very important information about this problem:

The media server is the one that took the backup (a FlashBackup backup), and that is where I need to restore.

I believe the tar log is clean (because restore not initialized on netbackup); please verify.

Thanks

Mauricio

Marianne · ‎03-29-2013

Tar shows successful restore of a single file that completed at 12:24:

Please read through my previous post again.

You have not given us any logs that contain evidence of the failed restore between 12:44 and 12:48.

We also need media server's bptm log that corresponds with timestamps in bpbrm log.

As per my post above:

We need a full set of logs that contain information about start and end of failed restore from:

master: bprd
media server: bptm and bpbrm
client: bpcd and tar.

If media server is also the client, then we need those logs on the media server.
We still need the master's bprd log.

If there is a time difference between media server and master, please tell us exactly how much.

I have no idea what this means:

.... restore not initialized on netbackup

How else are restores done if not on NetBackup?

It seems there was a problem with 'feedback' of the successful restore to the master:

12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: (2201645.001) INF - TAR EXITING WITH STATUS = 0

12:24:24.538 [12964] <2> bpbrm handle_restore: from client canopus: INF - TAR RESTORED 1 OF 1 FILES SUCCESSFULLY

12:24:24.678 [12964] <16> bpbrm close_progress_log: could not close progress file /usr/openv/netbackup/logs/user_ops/pmsanche/logs/jbp-24494364483841946230000000094-uqaW1V.log on pabkp
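
The number in angle brackets is the severity of each entry. A minimal sketch for picking out the error-level entries from a large log follows; it assumes the usual legacy-log convention in which <2> marks debug output and <16> marks an error, and the function name is made up for illustration.

```python
import re

# Matches "HH:MM:SS.mmm [pid] <severity> process message", as in the lines above.
ENTRY_RE = re.compile(r"^(\S+) \[(\d+)\] <(\d+)> (\S+) (.*)$")


def errors_only(lines, min_severity=16):
    """Return (timestamp, pid, process, message) for entries at or above min_severity."""
    hits = []
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if m and int(m.group(3)) >= min_severity:
            hits.append((m.group(1), int(m.group(2)), m.group(4), m.group(5)))
    return hits


if __name__ == "__main__":
    # Shortened copies of two of the bpbrm entries quoted above.
    sample = [
        "12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: "
        "(2201645.001) INF - TAR EXITING WITH STATUS = 0",
        "12:24:24.678 [12964] <16> bpbrm close_progress_log: "
        "could not close progress file",
    ]
    for stamp, pid, process, message in errors_only(sample):
        print(stamp, pid, process, message)
```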

Where did you check keepalive and timeout settings?
Master or media server?
Check both, please.

msanches · ‎03-29-2013

Hi,

I ran the restore again today, 29/03/2013 at 17:26, and attached new logs for analysis (all logs from the media server).
I still have the same problem with the restore.

The keepalive and timeout parameters are shown below:
========================================

ndd -get /dev/tcp tcp_keepalive_interval (master and media the same)
7200000

The timeout parameters (bp.conf) are shown below:

=====================================

SERVER_CONNECT_TIMEOUT = 1800

CLIENT_READ_TIMEOUT = 36000

LIST_FILES_TIMEOUT = 10800
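
For comparing these settings across master, media server and client, a small sketch like this one reads the KEY = VALUE lines and converts the timeouts to hours. It assumes the values are in seconds and that lines starting with # are comments; the function names are hypothetical.

```python
def parse_bp_conf(text):
    """Parse 'KEY = VALUE' lines into a dict (a repeated key keeps its last value)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def timeouts_in_hours(settings):
    """Return every numeric *_TIMEOUT entry converted from seconds to hours."""
    return {
        key: int(value) / 3600
        for key, value in settings.items()
        if key.endswith("_TIMEOUT") and value.isdigit()
    }


if __name__ == "__main__":
    # The three entries posted above.
    sample = """
    SERVER_CONNECT_TIMEOUT = 1800
    CLIENT_READ_TIMEOUT = 36000
    LIST_FILES_TIMEOUT = 10800
    """
    for name, hours in timeouts_in_hours(parse_bp_conf(sample)).items():
        print(f"{name}: {hours} h")
```

Running the same check against the bp.conf of each host makes it easy to spot a host that still has a different value.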

Yes, the media server is also the client.

Thanks

Restore failing with error : Error bptm ... cannot write data to socket, Broken pipe