05-16-2016 07:56 AM
Hello, we are having trouble with one Solaris client. When the backup has been running for exactly 4 hours and 10 minutes, it fails with the error "read from input socket failed (636)". I have read the other posts but I can't find the problem. The firewall team told me they don't have policies or filters that could affect the job in that way.
I really appreciate your comments, and sorry for my English.
05-16-2016 08:32 AM
I have rechecked, and the media server and the DB server are on the same network segment.
05-16-2016 08:49 AM
05-16-2016 11:38 PM
Couple of questions:
Is any data sent?
Is it a database backup?
Are there any long wait periods in the backup?
As you are running through a firewall, have you implemented the recommended keepalive value of 4 minutes? It is based on most firewalls closing an idle session after 5 minutes.
Is the backup system busy when you get this error?
What happens if you run the backup at another time?
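To illustrate the keepalive question above: the idea is that the TCP stack sends a probe on an idle connection before the firewall's idle timer expires. The sketch below shows the mechanism on a single Python socket, using the 4-minute (240 s) idle value mentioned above. It is only an illustration under assumptions: the function name `enable_keepalive` and the interval and probe-count defaults are made up for the example, and NetBackup itself relies on the operating system's keepalive settings, not on code like this.

```python
import socket

def enable_keepalive(sock, idle_s=240, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket; return the tuning options applied.

    idle_s=240 mirrors the 4-minute value from the thread; interval_s and
    probes are illustrative assumptions, not NetBackup recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    # The per-socket tuning constants are platform-specific, so only use
    # the ones this Python build actually exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

The point is simply that the idle value has to be lower than whatever idle timeout sits between the two hosts, otherwise the session is dropped before the first probe is ever sent.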
05-17-2016 07:01 AM
Have you seen this TN?
And this TN for the Solaris client:
05-17-2016 07:02 AM
Thank you for your comments.
I can't split it into smaller pieces because it is an RMAN backup, or at least I don't know how :S
Is any data sent?
I think when it fails the backup is already done, because the parent is the only one failing; all the child jobs end with status 0.
Is it a database backup? Yes, it is.
Are there any long wait periods in the backup? I really don't know.
As you are running through a firewall, have you implemented the recommended keepalive value of 4 minutes, based on most firewalls closing an idle session after 5 minutes? I have set a keepalive value of 15 minutes, but only on the media server.
Is the backup system busy when you get this error? No.
What happens if you run the backup at another time? I already did, but I got the same result: error 636 at 4 hours and 10 minutes.
05-17-2016 07:33 AM
KeepAlive settings should be configured on both the master and the media server.
05-17-2016 12:57 PM
In addition to the master and media servers, I have found it worthwhile to implement a low TCP keepalive on clients behind firewalls, especially database servers, as a lot of the connections are initiated by the database server in the case of database backups.
It sounds like either the control file backup or the final validation by RMAN is failing. Check the stdout and stderr files under the dbclient folder in netbackup/logs on the client; create dbclient with 777 permissions if you do not have it already.
I think you are confusing the NetBackup CLIENT_READ_TIMEOUT with the OS TCP keepalive settings.
Are you using the _%t parameter in the RMAN script for improved speed of the validation?
Also get the DBA to check the Oracle alert log; there can be indications of why the backup/validation does not work that can't be seen in the NetBackup logs.
If it is an incremental backup, talk with the DBA about the possibility of using Block Change Tracking. It makes the incremental faster as it does not have to scan for the changed blocks, but it has some caveats on the Oracle side of things.
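The log check suggested above can be sped up with a small script. This is a rough sketch, not a NetBackup tool: it walks a directory you pass in (for example the dbclient log folder or wherever the DBA keeps the alert log) and prints lines containing Oracle's standard `ORA-` or `RMAN-` error prefixes. The function name `find_oracle_errors` is hypothetical, and the script makes no assumption about the log file names or layout.

```python
from pathlib import Path

# Oracle database and RMAN error codes start with these prefixes.
ERROR_PREFIXES = ("ORA-", "RMAN-")

def find_oracle_errors(log_dir):
    """Return (file name, line number, text) for lines with ORA-/RMAN- codes."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(prefix in line for prefix in ERROR_PREFIXES):
                    hits.append((path.name, number, line.rstrip()))
    return hits

if __name__ == "__main__":
    import sys
    # Usage: python scan_logs.py <directory with log files>
    for name, number, text in find_oracle_errors(sys.argv[1]):
        print(f"{name}:{number}: {text}")
```

Anything it finds is only a starting point; the DBA still needs to read the surrounding lines to see what RMAN was doing at that moment.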
05-17-2016 01:37 PM
The RMAN piece name "_%t" meta-field that Michael is referring to is described here:
05-17-2016 01:50 PM
Yes, we use "_%t". As you told me, I added the KeepAliveTime on the master server and restarted the server. The result this time was different: the parent job ended after 1 hour and 20 minutes with the same 636 error, but the child jobs continued backing up the database.
05-18-2016 07:17 AM
You did not tell us what values you used for KeepAlive?
See this TN: http://www.veritas.com/docs/000020036
The TCP KeepAliveTime value on master server which was already reduced to 900,000 ms was still too high for this environment.
After reducing the TCP KeepAliveTime setting on Master server to 300,000 ms (5 mins), followed by a reboot of the master server, SQL Server parent backup jobs were now able to complete successfully when backing up to the affected media servers.
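Since KeepAliveTime is given in milliseconds, it is easy to misjudge how it compares with an idle timeout stated in minutes. A minimal sketch of the arithmetic follows; the helper names are made up for the example, and the 5-minute idle timeout is only the "most firewalls" figure quoted earlier in this thread, not a measured value for this environment.

```python
def keepalive_minutes(keepalive_ms):
    """Convert a KeepAliveTime value in milliseconds to minutes."""
    return keepalive_ms / 60_000

def fits_idle_timeout(keepalive_ms, idle_timeout_min):
    """True if the first probe goes out strictly before the idle timeout.

    idle_timeout_min is an assumption supplied by the caller; the real
    value depends on the devices between the hosts.
    """
    return keepalive_minutes(keepalive_ms) < idle_timeout_min

if __name__ == "__main__":
    for value in (900_000, 300_000, 240_000):
        print(value, "ms =", keepalive_minutes(value), "min")
    # 900000 ms = 15.0 min
    # 300000 ms = 5.0 min
    # 240000 ms = 4.0 min
```

With an assumed 5-minute idle timeout, `fits_idle_timeout(900_000, 5)` is False and `fits_idle_timeout(240_000, 5)` is True, which is why a value below the timeout is normally recommended.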
05-18-2016 08:26 AM
You're right, I didn't tell you. At first I used 900,000 ms on the master and media servers, but the problem continued. I have just reduced it to 300,000 ms and rebooted the master server; I'll let you know how it goes.
Is it necessary that KeepAliveTime is the same on the master and media servers?
05-18-2016 03:10 PM
05-30-2016 06:16 AM
Thank you all. As you told me, I changed the keepalive time on both the media and master servers to 300000 and it works!!!