Marianne, thank you. You have always helped me and I'm very thankful. You and all the other members are true engineers, because you help people without expecting anything in return. Today I solved this issue by reinstalling the agent and changing the backup path (I presented another LUN from SAN storage).

Kasra_Hashemi My workload is quite hectic lately - all I can commit to is to have a look if and when time permits.

Network Connection Timeout and Socket WriteFailed

13 Replies

Marianne
Level 6
5 years ago
Curious to know why you have 'Follow NFS' and 'Cross Mountpoints' selected.

Is /backup an NFS mount? With nested mountpoints?

The timeout seems to be a network read timeout of 30 minutes.
This means that the media server received NO data for 30 minutes.

Do you have a bpbkar log folder on the client?
And bpbrm and bptm log folders on the media server?

All of these logging levels need to be higher than 0. I suggest level 3.

bpbkar on the client will show when files/data are sent to the media server.
bptm on the media server will show each time a data block is received.
bpbrm will show metadata received from the client's bpbkar.

bpbkar should also show file sizes - if there are large backup images that take very long to transmit, then it is quite possible that you will see timeouts.
Here you will need to look at TCP KeepAlive settings on the master, media server and client.

Let us look at logs first.
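
To make the folder check above concrete, here is a minimal Python sketch that reports which of the three legacy log folders are missing under a given log root. It assumes the usual UNIX/Linux location `/usr/openv/netbackup/logs`; the function name `missing_log_dirs` is made up for illustration. It only checks that the folders exist, it does not set or verify the logging level.

```python
from pathlib import Path

# Folder names as given in this thread: bpbkar on the client,
# bpbrm and bptm on the media server.
CLIENT_LOG_DIRS = ("bpbkar",)
MEDIA_SERVER_LOG_DIRS = ("bpbrm", "bptm")


def missing_log_dirs(log_root, expected):
    """Return the expected folder names that do not exist under log_root."""
    root = Path(log_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed log root; adjust if your installation differs.
    root = "/usr/openv/netbackup/logs"
    for role, names in (("client", CLIENT_LOG_DIRS),
                        ("media server", MEDIA_SERVER_LOG_DIRS)):
        missing = missing_log_dirs(root, names)
        print(f"{role}: missing {missing if missing else 'nothing'}")
```

Run it on the client and on the media server separately, since each host only needs its own folders. A misspelled folder name shows up here as a missing one.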
- Kasra_Hashemi
  Level 5
  5 years ago
  Marianne
  I have unchecked all the options you mentioned (there are no mount points or NFS; those settings were only for testing).
  I changed the logging level to 3. To keep things clear, I have replaced my server name with (NETBACKUP) and the Oracle client name with (CLIENT).
  I couldn’t find the bpkar log (in //usr/openv/netbackup/logs.bpkar; I created the bpkar directory). I'm still searching for a solution and will attach that log soon.
  But here are the bptm and bprm log files.
  BPRM.log (5 MB)
  BPTM.log (5.9 MB)
  - Marianne
    Level 6
    5 years ago
    The client directory name must be bpbkar.
    You need to create the folder if it does not exist.
    
    With an incorrect folder name, nothing will be logged.
    
    It is important to have all 3 logs - bpbkar on the client, bpbrm and bptm.
    All three logs are necessary in order to follow the process flow.
    We will also need all text in Job Details to obtain timestamps and PID.
sdo
Moderator
5 years ago
There is never a single common problem underlying status 41. Status 41 can be caused by a very wide range of situations. My advice would be to check that no element of the network stack or network path has any strange custom configuration.

For example, I post this here purely as an example of how strange network configuration settings can cause strange networking errors for networking applications:

https://www.veritas.com/support/en_US/article.100020853

...i.e. I am not trying to suggest that your problem is this, merely to demonstrate an example of a strange configuration setting having strange results.

However, having said all that, one of the most common culprits with networking errors is "TCP keepalive". I'll let you do some searching on that topic. Remember, "keepalive" is a feature at all points along the network path: source, carrier, target. That is, you must check the keepalive at every network contact point, which means: client, switch, media, switch, storage, switch, master.
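
As a starting point for the keepalive check on the Linux hosts, here is a minimal Python sketch that reads the three kernel keepalive settings from `/proc/sys/net/ipv4` and compares them against an idle window. The 1800 second window is simply the 30 minute timeout seen in this thread, used for comparison only; whether any device on your path actually drops idle sessions is an assumption you would have to verify. The function names are made up for illustration, and switches or storage in the path need their own vendor tools.

```python
from pathlib import Path

# Standard Linux sysctl names for TCP keepalive.
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(proc_dir="/proc/sys/net/ipv4"):
    """Read the three Linux TCP keepalive sysctls as integers."""
    return {key: int((Path(proc_dir) / key).read_text().strip())
            for key in KEEPALIVE_KEYS}


def seconds_until_drop(settings):
    """Idle time before the first probe, plus the time for all probes to go unanswered."""
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


def first_probe_after_window(settings, idle_window_s):
    """True if the first keepalive probe is only sent after idle_window_s seconds."""
    return settings["tcp_keepalive_time"] > idle_window_s


if __name__ == "__main__":
    try:
        current = read_keepalive()
    except FileNotFoundError:
        print("keepalive sysctls not found (not a Linux host?)")
    else:
        print(current)
        print("seconds until a dead peer is dropped:", seconds_until_drop(current))
        # 1800 s = the 30 minute timeout from this thread (comparison only).
        print("first probe later than 1800 s:", first_probe_after_window(current, 1800))
```

Note that these system-wide values only apply to sockets that have keepalive enabled (SO_KEEPALIVE), so they show what the kernel would do, not proof that a given connection uses it. Comparing the output between a working client and a failing one is probably the most useful way to use this.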
- Kasra_Hashemi
  Level 5
  5 years ago
  There are four Oracle Linux NetBackup clients in the environment and only two of them have this issue, so I think TCP keepalive must be configured correctly; otherwise the other two would have the problem as well.
  - Hamza_H
    Moderator
    5 years ago
    Hello,
    
    Did you try to enable logs as suggested by Marianne?
    
    sdo's post is interesting. The error is clearly a timeout of 30 minutes, which means the media server waited all that time and received no data. So you need to enable the logs and look closely at what causes this delay, because I believe that even if you increase the timeout to 3600 you will just get the error after 1 hour. You need to dig into this.
    
    Also, you said this issue affects 2 of the 4 Oracle clients, so you should look at what kind of data these failing clients back up and how large it is.
    
    Good luck
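
One way to "look closely" at the logs for the delay is to scan for long silences between consecutive entries. Here is a minimal Python sketch of that idea. It assumes each log entry starts with a time of day in the form `HH:MM:SS.mmm`, which is how legacy NetBackup logs are usually laid out; check a few lines of your own bpbkar, bpbrm and bptm logs before relying on it. The function name `find_gaps` and the threshold values are made up for illustration.

```python
import re
from datetime import timedelta

# Assumed line prefix: time of day as HH:MM:SS.mmm at the start of the line.
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3})\b")


def find_gaps(lines, threshold_s=300):
    """Return (previous_line, next_line, gap_seconds) for every silence
    of at least threshold_s seconds between timestamped lines."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        hours, minutes, seconds, millis = map(int, match.groups())
        current = timedelta(hours=hours, minutes=minutes,
                            seconds=seconds, milliseconds=millis)
        if prev_time is not None:
            gap = (current - prev_time).total_seconds()
            if gap < 0:
                gap += 86400  # assume the log rolled over midnight once
            if gap >= threshold_s:
                gaps.append((prev_line.rstrip(), line.rstrip(), gap))
        prev_time, prev_line = current, line
    return gaps
```

To use it, open a log file and pass the file object, for example `find_gaps(open(path), 1800)` to look for silences as long as the 30 minute timeout. Running it with a lower threshold on all three logs and comparing the timestamps should show which process went quiet first.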
