
Status 41: Network Connection Timed out on full jobs only

dcbone
Level 3

Current NetBackup version is 7.7.3. We currently have no support contract and no plans to renew it to bring NetBackup up to date, so I'm posting here for some help.

This environment backs up some old Unix servers and recently began failing with Status 41: Network Connection Timed out, but only on full backup schedules. I increased the client read timeout (up to 1800) on the master server and increased the client read and file browse timeouts on the client to 7200 each. After doing this, the full backup still timed out after about 45 minutes of the job running.

Recently, I checked "allow multiple data streams" and the total backup ran for 13 hours. The job that has /opt in its file list failed, although it still wrote a lot of KB and files. The last 2 lines of the status are:

info bpbkar32 (pid = 12833) done. status: 41 network connection timed out

end writing; write time 5:33:25

 

Looking for any assistance please.

7 REPLIES

Hamza_H
Moderator
   VIP   
Hello,
Please share the following:
Version of NBU and OS of the client
Is client-side deduplication enabled?
Is Accelerator enabled? If yes, try a forced rescan backup.
Share the detailed status
Is the throughput good enough? How much data are you trying to back up?
Did it work before?
And finally, enable bpbkar logs on the client with VERBOSE = 3 and look for any errors in the same timeframe.
Good luck

Version of NBU and OS of the client: 7.7.3 and RHEL 5

Is client-side deduplication enabled? Currently set to "always use the media server".

Accelerator? If yes, try a forced rescan backup: Accelerator is unchecked.

Is the throughput good enough? I would assume yes; these backups were working as recently as a couple of weeks ago.

How much data are you trying to back up? ~300 GB

Did it work before? Yes, this is a recent occurrence. No changes that I'm aware of.

And finally, enable bpbkar logs on the client with VERBOSE = 3 and look for any errors in the same timeframe: I don't have direct access to this client. I will work with someone who does and report back. Could you tell me where I can find instructions to enable this? I'm a new NBU admin.

Detailed status: the job ends 100% complete, with files and folders written.

 

05/01/2020 03:10:03 - Info nbjm (pid=2552) starting backup job (jobid=343720) for client *servername*, policy *policyname*, schedule FULL_D2D
05/01/2020 03:10:03 - Info nbjm (pid=2552) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=343720, request id:{6id}
05/01/2020 03:10:03 - requesting resource *diskname*
05/01/2020 03:10:03 - requesting resource *nbuservername*.NBU_CLIENT.MAXJOBS.*name*
05/01/2020 03:10:03 - requesting resource *nbuservername*.NBU_POLICY.MAXJOBS.*policyname*
05/01/2020 03:10:03 - granted resource *nbuservername*.NBU_CLIENT.MAXJOBS.*servername*
05/01/2020 03:10:03 - granted resource *nbuservername*.NBU_POLICY.MAXJOBS.*policyname*
05/01/2020 03:10:03 - granted resource MediaID=@aaaaZ;Path=*storagepath*;MediaServer=*nbuservername*
05/01/2020 03:10:03 - granted resource *diskname*
05/01/2020 03:10:03 - estimated 0 kbytes needed
05/01/2020 03:10:03 - Info nbjm (pid=2552) started backup (backupid=*servernameid*) job for client *servername*, policy *policyname*, schedule FULL_D2D on storage unit *storagename*
05/01/2020 03:10:04 - started process bpbrm (pid=588)
05/01/2020 03:10:05 - Info bpbrm (pid=588) *servername* is the host to backup data from
05/01/2020 03:10:05 - Info bpbrm (pid=588) reading file list for client
05/01/2020 03:10:05 - connecting
05/01/2020 03:10:07 - Info bpbrm (pid=588) starting bpbkar32 on client
05/01/2020 03:10:07 - Info bpbkar32 (pid=12833) Backup started
05/01/2020 03:10:07 - connected; connect time: 0:00:00
05/01/2020 03:10:07 - Info bptm (pid=2596) start
05/01/2020 03:10:07 - Info bptm (pid=2596) using 262144 data buffer size
05/01/2020 03:10:07 - Info bptm (pid=2596) setting receive network buffer to 1049600 bytes
05/01/2020 03:10:07 - Info bptm (pid=2596) using 128 data buffers
05/01/2020 03:10:09 - Info bptm (pid=2596) start backup
05/01/2020 03:10:12 - Info bptm (pid=2596) backup child process is pid 6360.6920
05/01/2020 03:10:12 - Info bptm (pid=6360) start
05/01/2020 03:10:12 - begin writing
05/01/2020 08:43:37 - Info bpbkar32 (pid=12833) done. status: 41: network connection timed out
05/01/2020 08:43:37 - end writing; write time: 5:33:25
network connection timed out (41)

Hamza_H
Moderator
   VIP   
Hi,

To enable bpbkar logs on the client, make sure the folder 'bpbkar' exists in /usr/openv/netbackup/logs/
Then, to set up debugging, you have two choices:
On the client: edit /usr/openv/netbackup/bp.conf and add or edit the entry VERBOSE = 3 (no need to raise it to 5 for now)
Or, in the NetBackup console on the master server, go to the left panel: Host Properties > Clients > double-click the client > Logging > set the bpbkar level to 3
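For anyone who prefers to script the first option, here is a minimal sketch in Python of the two client-side steps (create the bpbkar log folder, then set VERBOSE in bp.conf). The function names are made up for illustration, and it assumes bp.conf uses plain `KEY = value` lines; treat it as a starting point rather than an official procedure.

```python
from pathlib import Path


def set_bp_conf_entry(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set, replacing any existing entry or appending one."""
    new_line = f"{key} = {value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Assumption: entries look like "KEY = value", one per line.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def enable_bpbkar_logging(netbackup_dir: Path, level: int = 3) -> None:
    """Create logs/bpbkar under netbackup_dir and set VERBOSE in its bp.conf."""
    (netbackup_dir / "logs" / "bpbkar").mkdir(parents=True, exist_ok=True)
    conf = netbackup_dir / "bp.conf"
    text = conf.read_text() if conf.exists() else ""
    conf.write_text(set_bp_conf_entry(text, "VERBOSE", str(level)))
```

On the client this would be called as `enable_bpbkar_logging(Path("/usr/openv/netbackup"))` by a user with write access to that directory. Back up bp.conf first.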

Then launch a backup and, when it fails, look at the bpbkar logs on the client.

In the logs, you can search by timeframe to find the exact lines that describe what happened, and also by PID, which in the detailed status you shared is 12833.
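If the log is large, the PID and timeframe search can be scripted. This is a rough sketch that assumes each log line starts with `HH:MM:SS.mmm [PID]`, which is how legacy NetBackup logs usually look; check a few lines of your own log and adjust the pattern if they differ. The function name `filter_log` is just an illustration.

```python
import re
from datetime import time

# Assumed line shape: "HH:MM:SS.mmm [PID] <level> message"
# (the PID may be followed by ".thread" on some platforms).
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.\d+ \[(\d+)[.\]]")


def filter_log(lines, pid, start, end):
    """Yield lines written by `pid` whose timestamp falls within [start, end]."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        stamp = time(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        if int(m.group(4)) == pid and start <= stamp <= end:
            yield line
```

For the job above, you would open the bpbkar log file for that day and call `filter_log(f, 12833, time(8, 30), time(8, 45))` to see what the process logged just before the 08:43:37 failure.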

Hamza_H
Moderator
   VIP   
You may want to check this : https://www.veritas.com/support/en_US/article.100003560



You said :
*I increased the client read timeouts (up to 1800) on the master server and increased the client read/file browse timeouts on the client to 7200 each. After doing this, the full backup timeout still occurred after about 45 minutes of the job running.*

My question is: is the master server also the media server?

You should change 'client read timeout' on the media server to 3600 and set the same parameter on the client to the same value (no need to change file browse; you can change it back to 300).
After doing this, do another backup.

Is there a firewall (FW) between the master/media server and the client? If yes, you should ask your network team to check the logs for any dropped connections.

Your full backup is failing because the amount of data to send is much larger than for an incremental. So if there is any delay on the client's side, the media server waits up to the timeout that you set, but this change wouldn't mean anything if the FW had already dropped the connection.
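To illustrate the reasoning, here is a simplified model in Python: during a period with no traffic, whichever limit is smallest is reached first, so raising the client read timeout above the firewall's idle timeout changes nothing. The firewall value below is purely hypothetical (ask your network team for the real one), and the model ignores keepalives.

```python
def first_limit_hit(stall_s, limits):
    """Return the name of the smallest limit that a stall of stall_s seconds reaches, or None."""
    hit = [(secs, name) for name, secs in limits.items() if stall_s >= secs]
    return min(hit)[1] if hit else None


limits = {
    "client read timeout": 3600,    # value suggested in this post
    "firewall idle timeout": 1800,  # hypothetical, not taken from this environment
}

# A 2000-second silence reaches the firewall limit before the NetBackup one.
print(first_limit_hit(2000, limits))
```

With these numbers the stall is cut off by the firewall entry, and raising the client read timeout to 7200 gives the same result.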

My suggestions are:
Activate Accelerator and run a full backup.
If it fails, try another test with client-side deduplication (but the client must have enough resources to handle the deduplication).
If this works, then there is a problem with your connection, most likely at the FW.

Also (I know this reply is a bit long, but all network/connection status codes, especially 13, 14, and 41, are tricky, and many factors can be the root cause), you may want to check whether your switches, routers, and client NICs are set to full duplex and not half duplex.

Good luck

Hamza_H
Moderator
   VIP   

Hello @dcbone ,

 

Just wondering if you were able to resolve this? If yes, could you please share the solution or mark the post that helped you out.

Thanks :)

BR.

 

It resolved itself; unfortunately, there was no identifiable solution. It just stopped happening.