
(24) socket write failed

Zippo0526
Level 3

Hello, I would like some assistance in figuring this one out; it has been bugging me for weeks.

My environment. 

Master and two media servers Windows 2012R2 running NBU 7.7.3

Client is Server 2000 running NBU client 6.5.4

What I have tried.

Disabling TCP offload and increasing the client timeout. I also tried creating a new policy that backs up just two folders instead of ALL_LOCAL_DRIVES.

Attached is the bpbkar log from the client using logging 5.

Anything else needed to diagnose this let me know.

Thanks,

Scott

 

11 REPLIES

Tousif
Level 6

Hello

 

Kindly run the below command from master to client.

/usr/openv/netbackup/bin/admincmd/bptestbpcd -client <ClientName> -verbose

From Master

telnet <client name> 1556

telnet <client name> 13724

telnet <client name> 13782
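
If telnet is not installed on the master, the same three port checks can be scripted. This is only a minimal sketch: the helper names `check_port` and `check_ports` are made up for illustration, and like telnet it only proves that a TCP connection can be opened, not that a long backup stream will survive.

```python
import socket

# Ports from the telnet tests above: 1556 (PBX), 13724 (vnetd), 13782 (bpcd).
NBU_PORTS = (1556, 13724, 13782)


def check_port(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports=NBU_PORTS, timeout=5.0):
    """Check several ports and return {port: reachable}."""
    return {port: check_port(host, port, timeout) for port in ports}
```

Calling `check_ports("<client name>")` from the master returns a dictionary with one True/False entry per port.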

There were a lot of changes between 6.5 and 7.7.x. The master server is on the latest version, so you may encounter unpredictable errors.

Suggestion: Upgrade the OS and the NBU version on the client.

Thanks & Regards

mph999
Level 6
Employee Accredited

A 6.5.4 client on NBU 7.7.3 Master / Media is maybe not going to work too well ...

Status 24 is 'connection just breaks', having previously been established.

In my experience, you are pretty much wasting your time with NBU logs.  Useful for seeing when the connection breaks, but not why, because NBU has no way of knowing.

Suggest using Wireshark on both ends to see what is going on - anything else is trial and error.  You need to find someone who is experienced with reading through Wireshark output though.  I have a nagging feeling that such a huge difference in versions may be a part of the issue.

I'll wait for Marianne to come along and tell you it's not supported ...

 

Results

bptestbpcd -client kocsoldev02 -verbose
1 1 1
172.16.4.71:63591 -> 172.16.4.91:13724
172.16.4.71:63594 -> 172.16.4.91:13724
PEER_NAME = kocbkup02.supreme.com
HOST_NAME = KOCSOLDEV02
CLIENT_NAME = KOCSOLDEV02
VERSION = 0x06540000
PLATFORM = nt
PATCH_VERSION = 6.5.4.0
SERVER_PATCH_VERSION = 6.5.4.0
MASTER_SERVER = kocbkup02
EMM_SERVER = kocbkup02
NB_MACHINE_TYPE = CLIENT
172.16.4.71:63601 -> 172.16.4.91:13724

Yes it would be ideal to upgrade both but not practical right now.

Thanks,

Scott

BTW all three telnet commands worked.

It's odd because it used to work after the NBU upgrade from 7.5.0.7 and then something happened.  I will ask my network guys if they can sniff it.

Thanks,

Scott

I know there were some issues with the VSP system used by NetBackup on Windows 2000.

A lot of open files, or one big open file such as a running database, can also give status 24.

Another usual suspect for this problem is the antivirus program.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

Marianne
Level 6
Partner    VIP    Accredited Certified

LOL! I heard my name...

This is your problem:

Client is Server 2000 running NBU client 6.5.4

Not so much with NBU version, but the Windows software. I remember MANY years ago that there were recommendations regarding MS hotfixes for the TCP stack on W2000. 

With W2000 out of MS support for so many years, you have little to no chance of getting any MS updates.

You need to let the server owner know that the issue is with his/her antiquated OS and that there is nothing that can be done from NBU point of view to fix it.

We have had customers with similar situations over the years. 
To try and get a backup through, they have done the following:
Ensure checkpoint is enabled in policy attributes - lower the checkpoint interval based on how long the backup runs before failing.
Break up the Backup Selection into smaller chunks and Allow Multiple data streams.
Limit concurrent jobs to 2.
Keep on resuming Incomplete jobs...

These errors miraculously disappeared when these old servers were replaced with new supported hardware and OS. 

Yeah I know old OS is a huge problem and something we are working very hard at fixing in our data center.  Some are just very difficult to do.  I am trying your suggestions now and will report back.

Thanks,

Scott

mph999
Level 6
Employee Accredited

See ...  I told you Marianne would tell you that ....

Yes you did ;)

Sadly Marianne your suggestions did not work.  :(

Not sure what changed as this server was backing up after the upgrade.  Back to the drawing board I guess.

Marianne
Level 6
Partner    VIP    Accredited Certified

None of my suggestions will fix the status 24's.

They are meant to enable 'resume' from the last checkpoint as opposed to a failure where backups must be restarted.
Resuming after each status 24 may eventually result in a successful backup, provided each resume starts from a later checkpoint than the previous one.

If the backups were working fine all along, try rebooting the W2K server.
This will reload the TCP/IP software stack.

Use network testing tools (as suggested by Martin) to test network connectivity and sustained data transfers.
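
As a rough illustration of a sustained-transfer test, the sketch below pushes a fixed amount of data over one TCP connection and reports how far it got before any socket error. It assumes Python is available on both ends, which may well not be practical on a Windows 2000 client; a dedicated network testing tool does the same job. The function names `start_sink` and `push_data` are made up for illustration.

```python
import socket
import threading


def start_sink(host="127.0.0.1", port=0):
    """Start a one-shot TCP sink; return (port, result dict, thread)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(1)
    bound_port = listener.getsockname()[1]
    result = {"received": 0}

    def serve():
        conn, _ = listener.accept()
        with conn:
            while True:
                data = conn.recv(65536)
                if not data:  # sender closed the connection
                    break
                result["received"] += len(data)
        listener.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, result, thread


def push_data(host, port, total_bytes, chunk_size=65536, timeout=30.0):
    """Send total_bytes of zeros; return (bytes_sent, error or None).

    bytes_sent counts what was handed to the OS, so compare it with the
    sink's "received" counter to see what actually arrived.
    """
    chunk = b"\0" * chunk_size
    sent = 0
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while sent < total_bytes:
                piece = chunk[: min(chunk_size, total_bytes - sent)]
                sock.sendall(piece)
                sent += len(piece)
    except OSError as exc:
        return sent, exc
    return sent, None
```

Run `start_sink` on one machine (bound to an address the other side can reach) and `push_data` on the other, with a size comparable to what the backup moves before it fails. If the transfer breaks at a similar point, the problem sits below NetBackup.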

mph999
Level 6
Employee Accredited

I had a very quick peep at the log - seems it died fairly quickly.  It's always useful to see the details from Activity Monitor for the failed job - this often gives a quick summary of what's going on, or not, as the case may be.

There is reference to TCP error 10053 in the log, which is a windows OS error.
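
For reference, 10053 is the Winsock code WSAECONNABORTED (software caused connection abort). A small sketch for pulling such codes out of a log follows. The bpbkar log layout is not assumed here; the script simply looks for standalone five-digit numbers starting with 100, so it can produce false positives, and the helper name and the short code table are illustrative only.

```python
import re

# A few common Winsock error codes; not an exhaustive list.
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED (software caused connection abort)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection timed out)",
}

# Standalone five-digit numbers of the form 100xx.
_CODE = re.compile(r"\b(100\d\d)\b")


def find_winsock_errors(lines):
    """Return (line number, code, description) for each known code found."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for match in _CODE.finditer(line):
            code = int(match.group(1))
            if code in WINSOCK_ERRORS:
                hits.append((lineno, code, WINSOCK_ERRORS[code]))
    return hits
```

Passing an open log file to `find_winsock_errors` shows on which lines the connection errors were recorded, which helps line them up with a network capture.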

Maybe this could be related to the client version - you do sometimes see odd unexpected behavior because of this. Generally, if you disregard the versions for the moment, I can't personally think of a time I've been involved in a 'status 24' case that turned out to be NBU, and in 9 odd years, I've done a few ...

I appreciate you mentioned that backups had worked after the upgrade - for how long?  A single run is kinda different from working for 4 months.  It may well be, if the latter, that the version mismatch is in this case irrelevant to this issue.

Either way, no matter where the fault is caused, really the only way is to grab a tcp dump on both ends (media and client) and see if that helps.