636 read from input socket failed

Rajeshgang
Level 3

Hi,

All backups failed with error 636 because of a network connectivity issue between the master and media server (the master and media servers are in different networks). Is there any way for the backups to go to Incomplete status instead of failing with error 636?

 

Rajesh

10 REPLIES

Michael_G_Ander
Level 6
Certified

Basically you need to solve the network connectivity, and you should aim for successful backups rather than incomplete ones.

But without more information we will not be able to help you. We need information such as the server OS and whether there is a firewall between master and media.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

Marianne
Moderator
Partner    VIP    Accredited Certified

You may have noticed a number of similar posts under 'Related Discussions' on the right of this screen. 

Most (all?) of them have been solved by adjusting KeepAlive settings on the master and media server.

It is important that the master and media server, as well as long-running clients, have the same KeepAlive settings.

Some TNs: 
http://www.veritas.com/docs/000018102 

http://www.veritas.com/docs/000087737

 

OS: Linux

There is a firewall between the media and master servers.


@Michael_G_Ander wrote:

Basically you need to solve the network connectivity, and you should aim for successful backups rather than incomplete ones.

---

Because the jobs end in an errored status, we are forced to restart the backups from the beginning, which wastes time.

If a connectivity issue occurs between master and media, can the backup jobs (especially full backups) go to Incomplete status instead? Then we could resume the jobs once the network issue is resolved.


Rajesh

Set the TCP keepalive on master, media and clients to less than the firewall idle session timeout (Usually around 5 minutes)

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

Marianne
Moderator
Partner    VIP    Accredited Certified

For failed jobs to go to Incomplete status that can be resumed, you need to enable Checkpoints in Policy Attributes.

This option is only available for Standard and MS-Windows policy types - not for Agent policy types.

nbutech
Level 6
Accredited Certified

Does normal communication between the master and media server work?

Are the required ports open between them?

Use the bpclntcmd test to verify how NetBackup sees this connection:

How to verify name resolution for NetBackup (tm) systems, using the "bpclntcmd" command

https://www.veritas.com/support/en_US/article.TECH27430
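For anyone who wants a quick sanity check outside NetBackup, the forward and reverse lookups that name-resolution tests rely on can be sketched in plain Python. This is not bpclntcmd and does not read any NetBackup configuration; the function name `check_name_resolution` and the example hostname are made up for illustration, and only the standard `socket` module is used.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back to names,
    and report whether the original hostname is among them."""
    ip = forward(hostname)
    primary, aliases, _addresses = reverse(ip)
    # Compare case-insensitively; DNS names are not case-sensitive.
    names = {primary.lower()} | {alias.lower() for alias in aliases}
    return {
        "hostname": hostname,
        "ip": ip,
        "reverse_names": sorted(names),
        "consistent": hostname.lower() in names,
    }


if __name__ == "__main__":
    # Hypothetical hostname; replace with your master or media server name.
    print(check_name_resolution("localhost"))
```

Run it on both the master and the media server, each time against the other server's name. A result with `consistent` set to False means the reverse lookup returns a different name (for example a short name versus an FQDN), which is the kind of mismatch worth chasing before looking at the firewall.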

 


@Michael_G_Ander wrote:

Set the TCP keepalive on master, media and clients to less than the firewall idle session timeout (Usually around 5 minutes)


Hi Michael,

Here are the settings we currently have:

TCP keepalive interval on firewall: 36 hours

net.ipv4.tcp_keepalive_intvl = 60 seconds on both master and media server

net.ipv4.tcp_keepalive_time = 549 seconds on both master and media server
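As a rough check of these numbers against the advice to keep the TCP keepalive below the firewall idle timeout, here is a small Python sketch. It assumes the 36 hours is the firewall's idle session timeout, and it assumes `net.ipv4.tcp_keepalive_probes` is at the Linux default of 9, since that value was not posted. The function names are made up for illustration.

```python
# Values posted in this thread.
KEEPALIVE_TIME_S = 549        # net.ipv4.tcp_keepalive_time
KEEPALIVE_INTVL_S = 60        # net.ipv4.tcp_keepalive_intvl
# Assumptions, not from the thread:
KEEPALIVE_PROBES = 9                   # Linux default for tcp_keepalive_probes
FIREWALL_IDLE_TIMEOUT_S = 36 * 3600    # treating "36 hours" as an idle timeout


def first_probe_beats_firewall(keepalive_time_s, firewall_idle_timeout_s):
    """True if the first keepalive probe goes out before the firewall
    would consider the idle session expired."""
    return keepalive_time_s < firewall_idle_timeout_s


def dead_peer_detection_s(keepalive_time_s, keepalive_intvl_s, keepalive_probes):
    """Seconds of silence before the kernel gives up on a peer that
    answers none of the keepalive probes."""
    return keepalive_time_s + keepalive_intvl_s * keepalive_probes


if __name__ == "__main__":
    print(first_probe_beats_firewall(KEEPALIVE_TIME_S, FIREWALL_IDLE_TIMEOUT_S))
    print(dead_peer_detection_s(KEEPALIVE_TIME_S, KEEPALIVE_INTVL_S,
                                KEEPALIVE_PROBES))
```

Under those assumptions the keepalive is well inside the firewall timeout. With a 5-minute idle timeout, which was mentioned as the usual figure, the same check would fail because 549 seconds is longer than 300.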

 

Thank you all for the valuable replies.

 


@nbutech wrote:

Does normal communication between the master and media server work?

Are the required ports open between them?

Hi,
Below is the scenario.
Some full backups had been running for 9 days, and all of them failed (error code 636) because of a firewall connectivity issue (network failure) between the master and media networks. The issue was resolved within 2 hours, but I couldn't resume the jobs since they were all in an errored status. I had to resubmit all the jobs from scratch and lost 9 days. So I want to know whether there is any option for avoiding this kind of error in the future.
Thanks,
Rajesh

 


 

This setup sounds like all kinds of wrong to me.

Why have a firewall if it allows idle connections for more than a day?

Backups shouldn't run for more than a day either; I would claim a backup that has run for more than a week is as good as worthless. Besides that, you are way too vulnerable to glitches within that timeframe, as you have discovered.

I really think this needs to be redesigned with a focus on backup performance. See the Planning and Performance Tuning guide.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue