03-07-2017 05:09 AM
Hello, I would like some assistance in figuring this one out; it has been bugging me for weeks.
My environment.
Master and two media servers Windows 2012R2 running NBU 7.7.3
Client is Server 2000 running NBU client 6.5.4
What I have tried.
Disabling TCP offload and increasing the client timeout. I also tried creating a new policy that backs up just two folders instead of ALL_LOCAL_DRIVES.
Attached is the bpbkar log from the client using logging 5.
If anything else is needed to diagnose this, let me know.
Thanks,
Scott
03-07-2017 05:20 AM - edited 03-07-2017 05:28 AM
Hello
Kindly run the command below from the master against the client.
/usr/openv/netbackup/bin/admincmd/bptestbpcd -client <ClientName> -verbose
From Master
telnet <client name> 1556
telnet <client name> 13724
telnet <client name> 13782
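If telnet is not available on the master, the same three port checks can be sketched in Python. This is only a rough equivalent of the telnet tests above, not a NetBackup tool: the function name check_ports is made up for illustration, and it only shows whether a TCP connection can be opened, not whether a backup will survive.

```python
import socket

# Ports taken from the telnet tests above: 1556 (PBX), 13724 (vnetd), 13782 (bpcd).
NBU_PORTS = (1556, 13724, 13782)

def check_ports(host, ports=NBU_PORTS, timeout=5.0):
    """Return {port: True/False} depending on whether a TCP connect to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection opens a TCP connection; the 'with' block closes it again.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, unresolvable name, etc.
            results[port] = False
    return results
```

Calling check_ports("<client name>") from the master returns one True/False entry per port.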
There were a lot of changes between 6.5 and 7.7.x. The master server is on the latest version, so you may encounter
unpredictable errors.
Suggestion: Upgrade the client's OS and NBU version.
Thanks & Regards
03-07-2017 05:21 AM - edited 03-07-2017 05:26 AM
A 6.5.4 client on NBU 7.7.3 Master / Media is maybe not going to work too well ...
Status 24 is 'connection just breaks' having previously been established.
In my experience, you are pretty much wasting your time with NBU logs. Useful for seeing when the connection breaks, but not why, because NBU has no way of knowing.
Suggest using Wireshark on both ends to see what is going on - anything else is trial and error. You need to find someone who is experienced with reading through Wireshark output though. I have a nagging feeling that such a huge difference in versions may be a part of the issue.
I'll wait for Marianne to come along and tell you it's not supported ...
03-07-2017 05:30 AM - edited 03-07-2017 05:31 AM
Results
bptestbpcd -client kocsoldev02 -verbose
1 1 1
172.16.4.71:63591 -> 172.16.4.91:13724
172.16.4.71:63594 -> 172.16.4.91:13724
PEER_NAME = kocbkup02.supreme.com
HOST_NAME = KOCSOLDEV02
CLIENT_NAME = KOCSOLDEV02
VERSION = 0x06540000
PLATFORM = nt
PATCH_VERSION = 6.5.4.0
SERVER_PATCH_VERSION = 6.5.4.0
MASTER_SERVER = kocbkup02
EMM_SERVER = kocbkup02
NB_MACHINE_TYPE = CLIENT
172.16.4.71:63601 -> 172.16.4.91:13724
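For anyone comparing this output across several old clients, a small sketch that pulls the KEY = VALUE lines into a dictionary may help. The function name parse_bptestbpcd is hypothetical, and it assumes the output looks like the paste above (keys and values separated by " = ").

```python
def parse_bptestbpcd(text):
    """Collect the 'KEY = VALUE' lines of bptestbpcd -verbose output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # lines without ' = ' (connection lines, '1 1 1') are skipped
            info[key.strip()] = value.strip()
    return info

# Sample lines copied from the output above.
sample = """1 1 1
172.16.4.71:63591 -> 172.16.4.91:13724
PEER_NAME = kocbkup02.supreme.com
CLIENT_NAME = KOCSOLDEV02
PLATFORM = nt
PATCH_VERSION = 6.5.4.0
NB_MACHINE_TYPE = CLIENT"""

info = parse_bptestbpcd(sample)
client_major = int(info["PATCH_VERSION"].split(".")[0])  # 6 for this client
```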
Yes it would be ideal to upgrade both but not practical right now.
Thanks,
Scott
BTW all three telnet commands worked.
03-07-2017 05:33 AM
It's odd because it used to work after the NBU upgrade from 7.5.0.7 and then something happened. I will ask my network guys if they can sniff it.
Thanks,
Scott
03-07-2017 05:46 AM
I know there were some issues with the VSP system used by NetBackup for Windows 2000.
A lot of open files, or one big open file such as a running database, can also give status 24.
Another usual suspect for this problem is the antivirus program.
03-07-2017 05:49 AM
LOL! I heard my name...
This is your problem:
Client is Server 2000 running NBU client 6.5.4
Not so much the NBU version as the Windows OS. I remember MANY years ago that there were recommendations regarding MS hotfixes for the TCP stack on W2000.
With W2000 out of MS support for so many years, you have little to no chance of getting any MS updates.
You need to let the server owner know that the issue is with his/her antiquated OS and that there is nothing that can be done from an NBU point of view to fix it.
We have had customers with similar situations over the years.
To try and get a backup through, they have done the following:
Ensure checkpoint is enabled in policy attributes - lower the checkpoint interval based on how long the backup runs before failing.
Break up the Backup Selection into smaller chunks and Allow Multiple data streams.
Limit concurrent jobs to 2.
Keep on resuming Incomplete jobs...
These errors miraculously disappeared when these old servers were replaced with new supported hardware and OS.
03-07-2017 05:58 AM
Yeah I know old OS is a huge problem and something we are working very hard at fixing in our data center. Some are just very difficult to do. I am trying your suggestions now and will report back.
Thanks,
Scott
03-08-2017 01:06 AM
See ... I told you Marianne would tell you that ....
03-08-2017 05:11 AM
Yes you did ;)
Sadly Marianne your suggestions did not work. :(
Not sure what changed as this server was backing up after the upgrade. Back to the drawing board I guess.
03-08-2017 06:51 AM
None of my suggestions will fix the status 24's.
They are meant to enable 'resume' from the last checkpoint as opposed to a failure where backups must be restarted.
Resuming after each status 24 may eventually result in a successful backup, provided each resume starts from a later checkpoint than the previous one.
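To see why the checkpoint interval matters here, consider a toy model of the resume approach. It is not how NetBackup schedules checkpoints internally; it simply assumes the connection breaks a fixed number of minutes into every attempt and that only completed checkpoints survive. All numbers and the function name are made up for illustration.

```python
def attempts_to_finish(total_minutes, fail_after, checkpoint_interval, max_attempts=1000):
    """Toy model: how many attempts (first run plus resumes) until the backup completes.

    Assumes every attempt dies after 'fail_after' minutes and only work up to the
    last completed checkpoint is kept. Returns None if no progress is ever saved.
    """
    done = 0
    for attempt in range(1, max_attempts + 1):
        if total_minutes - done <= fail_after:
            return attempt  # the remaining work fits before the connection breaks
        # Work saved in this attempt: whole checkpoint intervals completed before the failure.
        saved = (fail_after // checkpoint_interval) * checkpoint_interval
        if saved == 0:
            return None  # interval longer than the time to failure: every resume starts over
        done += saved
    return None

print(attempts_to_finish(120, 25, 15))  # 8
print(attempts_to_finish(120, 25, 5))   # 5
print(attempts_to_finish(120, 25, 30))  # None
```

In this model a 120-minute backup that breaks after 25 minutes needs 8 attempts with a 15-minute checkpoint, 5 attempts with a 5-minute checkpoint, and never finishes with a 30-minute checkpoint, which is why the interval should be shorter than the time the job runs before failing.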
If the backups were working fine all along, try to reboot the W2K server.
This will reload the TCP/IP software stack.
Use network testing tools (as suggested by Martin) to test network connectivity and sustained data transfers.
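As a rough illustration of what a sustained-transfer test does, here is a minimal Python sketch: one side accepts a connection and discards what it receives, the other pushes a fixed amount of data and reports how far it got before any socket error. The function names are hypothetical, and a Python build that runs on a Windows 2000 client may not be available, so treat this as a description of the idea rather than a replacement for a proper network test tool.

```python
import socket
import threading
import time

def run_sink(server_sock, counter):
    """Accept one connection and count the bytes received until the peer closes."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            counter[0] += len(chunk)

def send_sustained(host, port, total_bytes, chunk_size=65536):
    """Send total_bytes to host:port; return (bytes_sent, seconds, error_or_None).

    bytes_sent counts only chunks for which sendall() completed.
    """
    payload = b"\0" * chunk_size
    sent = 0
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=30) as sock:
            while sent < total_bytes:
                n = min(chunk_size, total_bytes - sent)
                sock.sendall(payload[:n])
                sent += n
    except OSError as exc:
        # A broken connection surfaces here, along with how far the transfer got.
        return sent, time.monotonic() - start, exc
    return sent, time.monotonic() - start, None
```

Run run_sink on one end (with a listening socket) and send_sustained on the other, with a transfer large enough to last longer than the backup usually survives.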
03-08-2017 12:29 PM
I had a very quick peep at the log - seems it died fairly quickly. It's always useful to see the details from Activity Monitor for the failed job - this often gives a quick summary of what's going on, or not, as the case may be.
There is a reference to TCP error 10053 in the log, which is a Windows OS error.
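For reference, 10053 is the Winsock code WSAECONNABORTED. A small lookup sketch for the codes that tend to show up around dropped connections is below; the table holds only a handful of standard Winsock values, and the helper names and the example log text in the usage line are made up for illustration, not taken from the attached bpbkar log.

```python
import re

# A few well-known Winsock error codes (values from the Windows Sockets API).
WINSOCK_ERRORS = {
    10053: ("WSAECONNABORTED", "software caused connection abort"),
    10054: ("WSAECONNRESET", "connection reset by peer"),
    10060: ("WSAETIMEDOUT", "connection timed out"),
    10061: ("WSAECONNREFUSED", "connection refused"),
}

def describe(code):
    """Return a one-line description of a Winsock code from the table above."""
    name, meaning = WINSOCK_ERRORS.get(code, ("unknown", "not in this table"))
    return f"{code} {name}: {meaning}"

def find_winsock_codes(line):
    """Return the known Winsock codes mentioned in a line of text."""
    return [int(m) for m in re.findall(r"\b(100\d\d)\b", line) if int(m) in WINSOCK_ERRORS]
```

For example, find_winsock_codes("send failed, error 10053") returns [10053], and describe(10053) names it as WSAECONNABORTED.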
Maybe this could be related to the client version - you do sometimes see odd unexpected behavior because of this. Generally, if you disregard the versions for the moment, I can't personally think of a time I've been involved in a 'status 24' case that turned out to be NBU, and in 9 odd years, I've done a few ...
I appreciate you mention that backups had worked after the upgrade - for how long? A single successful run is rather different from working for 4 months. It may well be, if the latter, that the version mismatch is irrelevant to this issue.
Either way, no matter where the fault is caused, really the only way is to grab a tcp dump on both ends (media and client) and see if that helps.