
Inactive TCP session - error 636 (firewall)

mrmadej
Level 4
Partner Accredited

Hi All,

I am looking for a possible solution to a problem in a customer's environment.
I have read a lot of articles and posts and did not find any solution. None of the suggested fixes (e.g. keepalive timeouts) have worked for us.
We have many Oracle database servers behind the firewall.
The problem is that the parent job terminates after about 2 hours and the backup ends with status 636 (read from input socket failed). All child jobs end with status 0.
The cause is the firewall's inactive-session timeout, which is set to 14400 half-seconds (2 hours). The session that gets terminated is the one between the client and the media server.
I know the recommended solution is to increase the inactive-session timeout on the firewall, but the LAN administrators don't want to do this.
So here is my question: is there any way to keep the parent job's TCP session "active" for the whole backup? Maybe some output from the RMAN script could be redirected to the media server?

The media server runs AIX; its tcp_* settings are below:

tcp_keepcnt = 8
tcp_keepidle = 28800
tcp_keepinit = 150
tcp_keepintvl = 150

Other timeout settings on the master and clients are also set above 2 hours.
We have tested on servers that are not behind the firewall, and there the parent job runs longer than 2 hours. So we are sure that the problem is the firewall.

Master server: RHEL, NBU 7.6.1.1
Media server: AIX 7.1, NBU 7.5.0.3
Clients: various versions from 7.5.0.4 to 7.6.1.1, most of them AIX.

Any suggestion would be appreciated.

Regards
Madej

Accepted Solution

Nicolai
Moderator
Partner    VIP   

You need to configure TCP keepalive on the master and media servers with a keepalive time of 15 minutes. The network admin won't change the parameter on the firewall because it's against best practice.

On a Red Hat host, add the following to /etc/sysctl.conf:

net.ipv4.tcp_keepalive_time=900

Apply the setting with sysctl -p

The OS will then keep sessions alive by sending keepalive probes ("ping" packets) after every 15 minutes of idle time, thereby preventing the firewall from closing the sessions because of inactivity. The setting is system-wide, so it affects every application on the host that uses TCP keepalive, not just NetBackup.
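It may help to see what the sysctl actually controls. The kernel only sends keepalive probes on sockets where the application has enabled SO_KEEPALIVE; net.ipv4.tcp_keepalive_time and its siblings just supply the defaults for those sockets. A minimal Python sketch of the per-socket equivalent on Linux (an illustration of the mechanism, not how NetBackup itself is configured; the interval and count are simply the master's values posted later in this thread):

```python
import socket

# Minimal sketch (Linux only): turn on TCP keepalive for a single socket and
# override the system-wide net.ipv4.tcp_keepalive_* defaults for that socket.
KEEPIDLE_S = 900   # idle seconds before the first probe (cf. tcp_keepalive_time)
KEEPINTVL_S = 75   # seconds between unanswered probes (cf. tcp_keepalive_intvl)
KEEPCNT = 9        # unanswered probes before the connection is dropped (cf. tcp_keepalive_probes)


def enable_keepalive(sock: socket.socket) -> None:
    # Without SO_KEEPALIVE the kernel sends no probes, whatever the sysctl says.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket values take precedence over the system-wide sysctl defaults.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, KEEPIDLE_S)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, KEEPINTVL_S)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, KEEPCNT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))  # 900
```

The host-wide sysctl is the simpler fix when you cannot change the application; the per-socket options only matter for software you control.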

Please see this tech note for configuring AIX hosts:

DOCUMENTATION: COMM_FAILURE as a consequence of reusing a transport that has been inactive across a firewall

http://www.veritas.com/docs/000005752

Hint: be aware of the units of measure.
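Following the hint about units: Linux net.ipv4.tcp_keepalive_time is in seconds, while AIX tcp_keepidle is in half-seconds, so the same 15 minutes is 900 on one host and 1800 on the other. A minimal sketch that derives both lines from one idle time; the exact `no -p -o` invocation is an assumption to check against the tech note above before use:

```python
# Minimal sketch: express one keepalive idle time in the unit each OS expects.
# Linux net.ipv4.tcp_keepalive_time is in seconds; AIX tcp_keepidle is in
# half-seconds. The "no -p -o" form is an assumption to verify on your AIX level.


def keepalive_settings(minutes: int) -> dict:
    seconds = minutes * 60
    return {
        "linux_sysctl": f"net.ipv4.tcp_keepalive_time={seconds}",
        "aix_no": f"no -p -o tcp_keepidle={seconds * 2}",  # half-seconds
    }


for name, line in keepalive_settings(15).items():
    print(f"{name}: {line}")
```

Deriving both values from a single number avoids setting "900" on AIX by mistake, which would only be 7.5 minutes there.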


7 Replies

revarooo
Level 6
Employee
Fix the firewall. It's breaking NetBackup.

mnolan
Level 6
Employee Accredited Certified

Set media server and master server keepalive timeout to that of the firewall or lower? Or is that what you already tried?

-edit-

Noticed you posted your media server settings

tcp_keepidle = 28800

Set that to 14400, and on your master as well.
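The suggestion can be checked with a little arithmetic. A minimal sketch, assuming the AIX keepalive tunables are in half-seconds and the firewall idle timeout is 7200 s (the "2 hours" from the question); the 1800 entry is an illustrative 15-minute value, not something tested in this thread:

```python
# Sketch: will the first TCP keepalive probe leave before the firewall drops the idle session?
# Assumptions: AIX `no` keepalive tunables (tcp_keepidle, tcp_keepintvl) are in
# half-seconds, and the firewall idle timeout is 2 hours (7200 s) as stated in the question.

FIREWALL_IDLE_TIMEOUT_S = 7200


def aix_half_seconds_to_seconds(value: int) -> float:
    """Convert an AIX keepalive tunable (half-seconds) to seconds."""
    return value / 2


def probe_beats_firewall(keepidle_s: float, firewall_s: float = FIREWALL_IDLE_TIMEOUT_S) -> bool:
    """True only if the first probe is sent strictly before the firewall timer expires."""
    return keepidle_s < firewall_s


for label, raw in [("current", 28800), ("suggested", 14400), ("15 minutes", 1800)]:
    idle_s = aix_half_seconds_to_seconds(raw)
    print(f"tcp_keepidle={raw} ({label}): first probe after {idle_s:.0f} s -> "
          f"{'OK' if probe_beats_firewall(idle_s) else 'too late'}")
```

Under these assumptions 28800 means 4 hours and even 14400 only equals the firewall timeout rather than undercutting it, which may explain why that value did not help; the thread does not confirm this.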

mrmadej
Level 4
Partner Accredited

Yes, that was tested. The value 14400 was set previously.

On the master I have:

net.ipv4.tcp_keepalive_time = 7200

net.ipv4.tcp_keepalive_probes = 9

net.ipv4.tcp_keepalive_intvl = 75
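Setting the master's values next to the 2-hour firewall timeout (taken as 7200 s from the question) shows why they could not help: the first probe would only leave at the moment the firewall discards the idle session, while the 900 s value applied later leaves a wide margin. This is a worked comparison that assumes the firewall treats keepalive probes as activity, not a trace of what happened on the wire:

```latex
\begin{aligned}
T_{\mathrm{fw}} &= 7200\ \mathrm{s} && \text{(firewall idle timeout)}\\
t_{\mathrm{probe}}^{\mathrm{before}} &= \texttt{tcp\_keepalive\_time} = 7200\ \mathrm{s} \not< T_{\mathrm{fw}} && \text{(no probe before the session is dropped)}\\
t_{\mathrm{probe}}^{\mathrm{after}} &= 900\ \mathrm{s} < T_{\mathrm{fw}} && \text{(probe resets the firewall idle timer)}
\end{aligned}
```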

As I wrote, other servers that are not behind the firewall are not affected, so I guess the problem is not with the AIX or Linux TCP parameter values.

Revaroo, the LAN admin doesn't want to do that and there is no discussion. We have tried many times.

StefanosM
Level 6
Partner    VIP    Accredited Certified

From your initial post I understand that the master and media servers are in the same network and the affected clients are on the "other side" of the firewall.

If this is correct, I suggest you try two things.

  • Use client-side deduplication with network resiliency.
  • Add a media server on the other side of the firewall. With that solution, only the "signaling" between the master and media server goes through the firewall. The backup will be faster and most probably without errors.

stefanos

mrmadej
Level 4
Partner Accredited

Hello,

Sorry for being away so long. I had to wait for the administrator.

Nicolai, your advice finally solved the problem. Thank you.

The administrator set net.ipv4.tcp_keepalive_time=900 on the master server.

Regards

Madej

Nicolai
Moderator
Partner    VIP   

Glad to hear :)

Thanks for marking a solution.