Forum Discussion

mrmadej
9 years ago

Inactive TCP session - error 636 (firewall)

Hi All,

I am looking for a possible solution to a problem in a customer's environment.
I have read a lot of articles and posts and have not found a solution. None of the suggested fixes (e.g. keepalive timeout) works for us.
We have a lot of servers behind a firewall. These servers host Oracle databases.
The problem is that the parent job terminates after about 2 hours and the backup ends with error 636 (read from input socket failed). All child jobs end with status 0.
The cause is the firewall's idle-session timeout, which is set to 14400 half-seconds (2 hours). The terminated session is the one established between the client and the media server.
I know the recommended solution is to increase the inactive-session timeout on the firewall, but the LAN administrators don't want to do this.
So here is my question: is there any way to keep this TCP session "active" for the parent job during the whole backup session? Maybe some output from the RMAN script could be redirected to the media server?
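To illustrate what "keeping a TCP session active" means at the socket level, here is a minimal Python sketch that enables keepalive on a single socket. It is only an illustration of the mechanism: NetBackup's own processes cannot be modified this way and depend on the OS-wide settings instead. The function name `enable_keepalive` and the default values (900 s idle, 75 s interval, 9 probes) are assumptions chosen for the example.

```python
import socket


def enable_keepalive(sock, idle_s=900, interval_s=75, probes=9):
    """Turn on TCP keepalive for one socket and return the options applied."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    # The per-socket tuning constants are platform-dependent,
    # so only set the ones this Python build exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enable_keepalive(s))
```

With keepalive enabled, the OS sends a probe once the connection has been idle for the configured time, so a stateful firewall in the path sees traffic before its own idle timer expires.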

The media server is running AIX; below are its tcp_* settings:

tcp_keepcnt = 8
tcp_keepidle = 28800
tcp_keepinit = 150
tcp_keepintvl = 150
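The values above are easier to judge once converted to seconds. The sketch below assumes the AIX tcp_keepidle and tcp_keepintvl tunables are expressed in half-seconds and takes the 2-hour firewall timeout from the post; the helper names are made up for this example.

```python
FIREWALL_IDLE_S = 7200  # 2 hours, as stated in the post


def half_seconds_to_seconds(value):
    """Convert an AIX half-second tunable to seconds."""
    return value / 2


def first_probe_beats_firewall(keepidle_half_s, firewall_idle_s=FIREWALL_IDLE_S):
    """True if the first keepalive probe goes out before the firewall times out."""
    return half_seconds_to_seconds(keepidle_half_s) < firewall_idle_s


# Values posted for the media server.
aix_tunables = {"tcp_keepidle": 28800, "tcp_keepintvl": 150}

for name, raw in aix_tunables.items():
    print(f"{name}: {raw} half-seconds = {half_seconds_to_seconds(raw):.0f} s")

print("probe sent before firewall timeout:",
      first_probe_beats_firewall(aix_tunables["tcp_keepidle"]))
```

Under that assumption, tcp_keepidle = 28800 means the first probe is sent after 4 hours of idle time, which is twice the firewall timeout.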

Other timeout settings on the master and clients are also set above 2 hours.
We have tested on servers that are not behind the firewall, and there the parent job runs longer than 2 hours. So we are sure that the problem is the firewall.

Master server: RHEL, NBU
Media server: AIX 7.1, NBU
Clients: various versions; most of them are AIX.

Any suggestions would be appreciated.


  • You need to configure TCP keepalive on the master and media servers with a keepalive time of 15 minutes. The network admin won't change the parameter on the firewall because that is against best practice.

    On a Red Hat host, add the keepalive time to /etc/sysctl.conf (15 minutes is 900 seconds):

    net.ipv4.tcp_keepalive_time = 900

    Apply the setting with sysctl -p

    The OS will then keep sessions alive by sending keepalive probe packets once a connection has been idle for 15 minutes, thereby preventing the firewall from closing the sessions because of idle time. The setting is host-wide, so it affects every application that uses TCP keepalive, not just NetBackup.

    Please see this tech note for configuring AIX hosts:

    DOCUMENTATION: COMM_FAILURE as a consequence of reusing a transport that has been inactive across a firewall

    Hint: be aware of the unit of measure (seconds on Linux, half-seconds on AIX).

7 Replies

  • Set media server and master server keepalive timeout to that of the firewall or lower? Or is that what you already tried?




    I noticed you posted your media server settings:

    tcp_keepidle = 28800

    Set that to 14400, and do the same on your master.

  • Yes, that was tested. The value 14400 was set previously.

    On the master I have:

    net.ipv4.tcp_keepalive_time = 7200

    net.ipv4.tcp_keepalive_probes = 9

    net.ipv4.tcp_keepalive_intvl = 75

    As I wrote, other servers that are not behind the firewall are not affected. So I guess the problem is not with the AIX or Linux TCP parameter values.

    Revaroo, the LAN admin doesn't want to do that and there is no discussion. We tried many times.
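For reference, the Linux values quoted for the master can be turned into a timeline. This is a rough sketch assuming the usual Linux semantics: the first probe is sent after tcp_keepalive_time seconds of idle time, and an unresponsive peer is declared dead after tcp_keepalive_probes further probes spaced tcp_keepalive_intvl seconds apart. The function names and the 7200 s firewall figure (taken from the original post) are the only inputs.

```python
def keepalive_timeline(time_s, intvl_s, probes):
    """Return (first_probe_s, give_up_s), measured from the last traffic seen."""
    first_probe = time_s
    give_up = time_s + intvl_s * probes
    return first_probe, give_up


def margin_before_firewall(time_s, firewall_idle_s=7200):
    """Seconds between the first probe and the firewall idle timeout."""
    return firewall_idle_s - time_s


first_probe, give_up = keepalive_timeline(7200, 75, 9)
print("first probe after", first_probe, "s; connection given up after", give_up, "s")
print("margin with 7200 s:", margin_before_firewall(7200), "s")
print("margin with 900 s:", margin_before_firewall(900), "s")
```

With tcp_keepalive_time = 7200 the first probe is due at the same moment the firewall's 2-hour idle timer expires, so there is no margin at all; a lower value leaves the probe time to reset the firewall's timer.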

  • From your initial post I understand that the master and media servers are in the same network and the affected clients are on the "other side" of the firewall.

    If this is correct, I suggest you try two things.

    • Use client-side deduplication with network resiliency.
    • Add a media server on the other side of the firewall. With that solution only the "signaling" between the master and media server will go through the firewall. The backup will be faster and most probably without errors.




  • Hello,

    Sorry for being absent for so long. I had to wait for the administrator.

    Nicolai, your advice finally solved the problem. Thank you.

    The administrator set net.ipv4.tcp_keepalive_time=900 on the master server.
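A quick way to check that a sysctl.conf carries a keepalive time below the firewall timeout is sketched below. It parses sysctl-style text passed in as a string; the function names are hypothetical, and the 7200 s firewall timeout is the figure from the original post.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def keepalive_below_firewall(text, firewall_idle_s=7200):
    """True if net.ipv4.tcp_keepalive_time is set and lower than the firewall timeout."""
    value = parse_sysctl(text).get("net.ipv4.tcp_keepalive_time")
    return value is not None and int(value) < firewall_idle_s


sample = """
# keepalive tuned for the firewall idle timeout
net.ipv4.tcp_keepalive_time=900
"""
print(keepalive_below_firewall(sample))
```

On a live Linux host the effective value can also be read from /proc/sys/net/ipv4/tcp_keepalive_time, which reflects what the kernel is using after sysctl -p has been run.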


