Oracle clients failing backups with "cannot connect on socket"

kandi
Level 3

Hi,

I have a backup environment that runs on Windows Server 2003 (master) and Windows Server 2008 (media servers), running RMAN-based backups for Linux and Unix clients.

Recently the backups started failing with "cannot connect on socket" on these clients. The backups run at night and we don't troubleshoot the issues at night. When I look at them in the morning, I can access the client host properties fine from the master server admin console.

I have had 2 similar issues in the past month, and in both of them I see a pattern: only the first set of parallel jobs from just one node (of a 2-node Oracle cluster) fails. Both of the failing policies run against NBU 6.5 clients. I am aware that this is very old software and that I need to upgrade it. The upgrade is already scheduled for next week, but I am not sure whether the old version is the reason for the failures.

We pay for Symantec support, and I have had a case open with Symantec for more than 3 weeks to troubleshoot this. They came back saying that a socket is backed by a file descriptor, that we are running out of file descriptors at the start time of the backups, and that this is what causes the failures. Our Linux and Unix experts (I am a Windows system admin) see no evidence that the file descriptors are running out.
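
The file descriptor claim can be checked directly on the Linux clients. Below is a minimal Python sketch, assuming a Linux /proc filesystem and enough privileges (root or the owning user) to read the target process's fd directory. The function names are illustrative only, and the limit reported is that of the process running the script, which may differ from the limit of the NetBackup or Oracle processes.

```python
import os
import resource


def count_open_fds(pid: int) -> int:
    """Return the number of file descriptors currently open in a Linux process."""
    # Each entry in /proc/<pid>/fd is one open descriptor (files, pipes, sockets).
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limit() -> tuple:
    """Return the (soft, hard) open-file limit of the process running this script."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Sampling `count_open_fds` for the relevant process IDs every few seconds around the backup start time, and comparing the result with the soft limit, would show whether the count ever gets close to the limit.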

At this point I have nowhere left to go. If anyone has seen the same issue or a similar one, can you please help me here?

 

Specifics:

Master server: Windows Server 2003 Enterprise Edition, NBU 7.5.0.4

Media server: Windows Server 2008 Enterprise Edition, NBU 7.5.0.4

Client 1: Red Hat 5.8, NBU 6.5

Client 2: Red Hat 6.5, NBU 6.5

Both clients run an Oracle RAC cluster.

1 ACCEPTED SOLUTION

Accepted Solutions

Marianne
Level 6
Partner    VIP    Accredited Certified
The one major difference between 6.5 and 7.x software is that 6.5 clients listen on vnetd for connections, while 7.x clients listen on PBX and vnetd.


5 REPLIES

Michael_G_Ander
Level 6
Certified

I suggest you run through the general database backup troubleshooting post.

It could be that the sockets on the master are exhausted; without any logs it is difficult to be more specific.

As always with Windows, a reboot might help for a while, if you haven't tried that already.

I have used CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT to handle a lot of concurrent Oracle database backups. Of course, they do not help if a resource on the master is exhausted.

There are also some TCP registry keys, like KEEPALIVETIME, that might help you get the sockets released.
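
To check the socket-exhaustion theory without a reboot, one option is to snapshot `netstat -an` on the master around the backup start time and count TCP connections per state. Below is a minimal Python sketch, not an official NetBackup tool. The function names are made up for illustration, and the port numbers assume the default NetBackup ports (PBX 1556, vnetd 13724, bpcd 13782) have not been changed in this environment.

```python
import subprocess
from collections import Counter

# Assumed default NetBackup listener ports; adjust if your site changed them.
NBU_PORTS = {1556: "pbx", 13724: "vnetd", 13782: "bpcd"}


def run_netstat() -> str:
    """Return the raw text of `netstat -an` (available on Windows and Linux)."""
    result = subprocess.run(["netstat", "-an"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state (ESTABLISHED, TIME_WAIT, ...)."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines end with the connection state on both Windows and Linux.
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


def count_nbu_port_connections(netstat_output: str) -> Counter:
    """Count TCP lines whose local or remote address uses a NetBackup port."""
    hits = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        for addr in fields[1:-1]:
            if ":" not in addr:
                continue  # skip Recv-Q / Send-Q columns on Linux
            port = addr.rsplit(":", 1)[-1]
            if port.isdigit() and int(port) in NBU_PORTS:
                hits[NBU_PORTS[int(port)]] += 1
                break  # count each connection line once
    return hits
```

Calling `count_tcp_states(run_netstat())` every minute from a scheduled task and logging the result gives a rough timeline. A TIME_WAIT or CLOSE_WAIT count that climbs sharply when the first set of jobs starts would support the exhaustion theory; a flat count would argue against it.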

Hope this helps you

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

kandi
Level 3

Thanks for your response, Michael.

Can you please point me to the "general database backup troubleshooting post", if you have it handy?

Is there a command that I can use to find the current number of sockets on the master server? Does NetBackup listen on a specific port that I can watch? Is there a real-time check I can do for that?

 

Logs: I have enabled multiple logs on the servers. I just did not want to clutter my first question with those, but if you think a particular log may help, can you please name it?

Reboot: I cannot afford one. Other production backups are running.

TIMEOUT settings: I have a 1G connection and 4 parallel channels from 2 servers for this particular cluster. Is there a recommended setting for these parameters?

 

Again, I am sorry for all these questions, but I guess you have already got the idea of how desperate I am to solve this.

Marianne
Level 6
Partner    VIP    Accredited Certified

General database backup error troubleshooting 

General connectivity troubleshooting (status 23,24,25,58, often 6 ) 

 

Do you have a bpcd log folder on the problematic clients?

Please copy one of these logs from a failed backup to bpcd.txt and upload as File attachment.

Also curious to know how you are backing up these clusters - using node name or virtual hostname?

PS:
Sometimes the only way to know why things go wrong is to be actually there... maybe a good idea to spend the extra hours...
There may be certain things running at night that could interfere with network connectivity. Involve server owner/dba when troubleshooting.

Understand that Symantec did not cause the network connection failure. The failure is clearly caused by something outside of NBU.
NBU is the casualty here - not the cause. 
Again - involve server owner and dba.

kandi
Level 3

Thanks for your info, Marianne. I was tied up with other things. Sorry for the slow response.

 

The docs are really good.

My problem was solved by upgrading the NBU client version to 7.5.0.4. I am not sure how that fixed it.

 

But thanks all for your efforts.

Marianne
Level 6
Partner    VIP    Accredited Certified
The one major difference between 6.5 and 7.x software is that 6.5 clients listen on vnetd for connections, while 7.x clients listen on PBX and vnetd.
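
A quick way to see which of these listeners a client actually answers on is to attempt a TCP connection to each port from the master or media server. Below is a minimal Python sketch, assuming the default ports (PBX 1556, vnetd 13724) are in use. `port_open` and `probe_client` are illustrative names, not NetBackup commands.

```python
import socket

# Assumed default ports for the two listeners mentioned above.
NBU_CLIENT_PORTS = {"pbx": 1556, "vnetd": 13724}


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe_client(host: str, ports: dict = NBU_CLIENT_PORTS,
                 timeout: float = 5.0) -> dict:
    """Map each listener name to whether its port accepted a connection."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}
```

For example, `probe_client("oraclenode1")` (a hypothetical hostname) run at the backup start time would show whether the client accepts connections on each port at the moment the jobs fail. A successful connect only proves that something is listening on the port, not that the NetBackup daemon behind it is healthy.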