Forum Discussion

msanches
Level 3
12 years ago

Restore failing with error: Error bptm ... cannot write data to socket, Broken pipe

  Hi. While restoring data to a UNIX client we got the error: "Cannot write data to socket, Broken pipe." We have a NetBackup 7.5.0.3 master server (Solaris 10), client is HP-...
  • Marianne
    12 years ago

    Thanks for the logs - I will go through them a bit later in the day.

    We need more logs, please:

    bpbrm on the media server

    bpcd and tar logs on the client.

    If these log folders don't exist, please create them and retry the restore. Collect a full set of logs (including the ones previously posted) and upload.
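A minimal sketch of the "create them" step, assuming legacy NetBackup logging where each process writes to a directory named after it under a common log root. The root `/usr/openv/netbackup/logs` matches the progress-log path quoted later in this thread, but treat it as an assumption for your install. `make_nb_log_dirs` is a hypothetical helper name, not a NetBackup command.

```shell
#!/bin/sh
# make_nb_log_dirs ROOT NAME...
# Create one log directory per process name under ROOT.
make_nb_log_dirs() {
    root=$1
    shift
    for name in "$@"; do
        # -p: succeed quietly if the directory already exists
        mkdir -p "$root/$name" || return 1
        echo "ready: $root/$name"
    done
}

# Example calls (run as root; the path is an assumption about a default UNIX install):
#   make_nb_log_dirs /usr/openv/netbackup/logs bpcd tar      # on the client
#   make_nb_log_dirs /usr/openv/netbackup/logs bpbrm bptm    # on the media server
```

Create the directories before retrying the restore, so that the failed attempt is the one that gets logged.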

    If this matter is urgent, please log a call with Symantec Support as today is a public/bank holiday in most parts of the world.

  • Nicolai
    12 years ago

    If not done already, set CLIENT_READ_TIMEOUT to 36000 on the client and on the master/media server.

    If there is a firewall between master/media and client, set TCP_KEEPALIVE_INTERVAL to 15 minutes using ndd -set /dev/tcp tcp_keepalive_interval {time}. The time is in milliseconds.
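Since ndd takes the interval in milliseconds, the 15 minutes have to be converted first. A small sketch of that arithmetic follows. `minutes_to_ms` is a hypothetical helper, and the ndd command is only printed, not executed, because it needs root and changes a live kernel tunable.

```shell
#!/bin/sh
# Convert minutes to milliseconds, the unit ndd uses for tcp_keepalive_interval.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

interval_ms=$(minutes_to_ms 15)

# Print the command for review instead of running it.
echo "ndd -set /dev/tcp tcp_keepalive_interval $interval_ms"
# prints: ndd -set /dev/tcp tcp_keepalive_interval 900000
```

The same conversion gives 7200000 for 120 minutes.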

  • msanches
    12 years ago

    Hi guys,

    The restore is from a FlashBackup backup (granular).

    Attached are the logs Marianne requested.

    The TCP_KEEPALIVE_INTERVAL parameter (from Nicolai's comment) is 7200000,
    but there is no firewall between master/media.

    ndd -get /dev/tcp tcp_keepalive_interval
    7200000

    The CLIENT_READ_TIMEOUT parameter (from Nicolai's comment) is 10800.

    Thanks

  • Marianne
    12 years ago

    We are still missing bpcd and tar logs from the client.... 

    We need to see what is happening on the client as well.

    *** EDIT ***

    The timestamps in the logs also do not correspond with the Job details.

    We see above that restore was started at 12:44 and failed at 12:48.

    We see in bpbrm a restore that completed successfully at 12:24.
    The next timestamp is 14:36.

    bptm starts at 14:59.

    bprd seems to come from media server instead of master server:
    bprd: canopus is not the primary server pabkp.nextel.com.br...exiting
     

    We need a full set of logs that contain information about start and end of failed restore from:
    master: bprd  
    media server: bptm and bpbrm
    client: bpcd and tar.
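One way to check whether a log actually covers the failed restore (12:44 to 12:48 in this post) is to filter it by its leading timestamp. This is a sketch only: it assumes each log line starts with a zero-padded HH:MM:SS.mmm field, as in the bpbrm lines quoted in this thread, and `log_window` is a hypothetical helper name.

```shell
#!/bin/sh
# log_window START END [FILE...]
# Print lines whose first field is a timestamp with START <= timestamp < END.
# Appending "" forces a string comparison, which orders zero-padded times correctly.
log_window() {
    start=$1
    end=$2
    shift 2
    awk -v s="$start" -v e="$end" '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ && ($1 "") >= s && ($1 "") < e
    ' "$@"
}

# Example (the file name is a placeholder for the real log file):
#   log_window 12:44:00 12:49:00 path/to/bptm/logfile
```

If the output is empty, the log does not cover the failed restore. When comparing logs from different hosts, shift the window by any clock difference between them.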

     

  • msanches
    12 years ago

    Marianne,

    Some very important information about this problem:

    The media server is the one that took the backup (a FlashBackup backup), and that is where I need to restore to.

    I believe the tar log is clean (because restore not initialized on netbackup); please verify.

    Thanks

    Mauricio

     



     

  • Marianne
    12 years ago

    Tar shows successful restore of a single file that completed at 12:24.

    Please read through my previous post again.

    You have not given us any logs that contain evidence of the failed restore between 12:44 and 12:48.

    We also need media server's bptm log that corresponds with timestamps in bpbrm log.

    As per my post above: 

    We need a full set of logs that contain information about start and end of failed restore from:

    master: bprd  
    media server: bptm and bpbrm
    client: bpcd and tar.
     

    If media server is also the client, then we need those logs on the media server.
    We still need the master's bprd log.

    If there is a time difference between media server and master, please tell us exactly how much. 

    I have no idea what this means:

    ....  restore not initialized on netbackup

    How else are restores done if not on NetBackup?

    Seems there was a problem with 'feedback' of successful restore to master :

    12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: (2201645.001) INF - TAR EXITING WITH STATUS = 0
    
    12:24:24.538 [12964] <2> bpbrm handle_restore: from client canopus: INF - TAR RESTORED 1 OF 1 FILES SUCCESSFULLY
    
    12:24:24.678 [12964] <16> bpbrm close_progress_log: could not close progress file /usr/openv/netbackup/logs/user_ops/pmsanche/logs/jbp-24494364483841946230000000094-uqaW1V.log on pabkp

    Where did you check keepalive and timeout settings?
    Master or media server?
    Check both, please.

     

  • msanches
    12 years ago

    Hi, 

    I ran the restore again today, 29/03/2013 at 17:26, and attached new logs for analysis (all logs from the media server).
    I have the same problem with the restore.

    The keepalive and timeout parameters are shown below:
    ========================================

     

    ndd -get /dev/tcp tcp_keepalive_interval (master and media the same)
    7200000
     
    The timeout parameters (bp.conf) are shown below:
    =====================================
    SERVER_CONNECT_TIMEOUT = 1800
    CLIENT_READ_TIMEOUT = 36000
    LIST_FILES_TIMEOUT = 10800
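These bp.conf timeouts are in seconds, so large values are easier to compare when shown in hours. The sketch below assumes `NAME = VALUE` lines as listed above. `show_timeouts` is a hypothetical helper, and the bp.conf path in the example comment is an assumption about a default UNIX install.

```shell
#!/bin/sh
# show_timeouts [FILE...]
# For every "NAME_TIMEOUT = VALUE" line, print the value in seconds and in hours.
show_timeouts() {
    awk -F= '
        /_TIMEOUT[ \t]*=/ {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)    # drop padding around the name
            gsub(/[ \t]/, "", value)   # drop padding around the value
            printf "%s: %d s (%.1f h)\n", name, value, value / 3600
        }' "$@"
}

# Example (path assumed, adjust to your install):
#   show_timeouts /usr/openv/netbackup/bp.conf
```

For the values in this post that gives 0.5 h for SERVER_CONNECT_TIMEOUT, 10.0 h for CLIENT_READ_TIMEOUT and 3.0 h for LIST_FILES_TIMEOUT.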
     
    Yes: the media server is also the client.

     

    Thanks