We recently upgraded one of our appliances from 2.5.3 to 184.108.40.206. It was a direct upgrade to version 220.127.116.11. Since the upgrade, backups have been failing with status 24 and 14, mainly socket errors, and we can see Winsock error 10060 in bpbkar.
The interesting thing is that if we move the backups to an appliance that has not yet been upgraded to 18.104.22.168 and is still at the 2.5.3 level, the backups run fine.
We have opened a case with Symantec and have been struggling for more than a week.
Has anyone had similar issues after the upgrade? Kindly share and advise.
I have seen this before. Here are the steps that should help.
Network-related errors/status codes 14, 24, 40, and/or 42 are occurring in large numbers on Symantec NetBackup Appliances running 22.214.171.124, 126.96.36.199, or 188.8.131.52 code.
Symantec attempted to tune the TCP stack on the appliances in the 2.6.0.x code, and one of the settings that changed, the net.ipv4.tcp_timestamps kernel setting, is resulting in these errors. In prior versions of the appliances, 2.5.x, this setting had a value of 1, but in the 2.6.0.x code it was changed to 0 (zero).
1. Using Putty, another SSH utility, or via the IPMI/remote console connection, log into the appliance as the admin user and become root in the maintenance mode.
A. Go to the Main Menu->Support->Maintenance option and enter the maintenance password.
B. run "/opt/Symantec/scspagent/IPS/sisipsoverride.sh" to temporarily override the security policy. This should take the same maintenance password as entered above.
C. type "elevate" to become root
2. Confirm the current setting is equal to zero by running
sysctl -a | grep tcp_timestamps
3. As the root user, make a backup copy of the file to edit:
cp /etc/sysctl.conf /etc/sysctl.conf.tcptimestampsoff
4. Then if comfortable using vi to edit files run "vi /etc/sysctl.conf" and change the line that reads:
net.ipv4.tcp_timestamps = 0
To instead read:
net.ipv4.tcp_timestamps = 1
Save the file using either ESC then :wq, or ESC then ZZ (Shift+Z twice).
Otherwise, run from the command line:
sed -i -e 's/^net\.ipv4\.tcp_timestamps =.*/net.ipv4.tcp_timestamps = 1/' /etc/sysctl.conf
5. Verify the tcp_timestamps now equals 1 in the file by running
grep timestamps /etc/sysctl.conf
6. Run "sysctl -p" at the command line.
Note: This change does not require a reboot.
7. Confirm the setting change by running the below again
sysctl -a | grep tcp_timestamps
Note: If jobs still end with a status 14 or 24 after making the above change, make sure the TCP Chimney setting is disabled on the clients experiencing the errors. This can be done on Windows 2008 and later servers by running "netsh int tcp set global chimney=disabled" from an elevated command prompt; this also does not require a reboot. There can be other reasons a client gets a status 24 error, for example firewalls or virus scanners, but this KB article does not cover those troubleshooting steps.
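For anyone who prefers to script steps 3 through 5, here is a minimal sketch. The function name enable_tcp_timestamps is made up for illustration and is not part of the appliance software. It assumes the line in the file is written exactly as "net.ipv4.tcp_timestamps = 0" as shown above, and it writes through a temporary file instead of using sed -i, which is my own choice. Try it on a copy of the file first.

```shell
#!/bin/sh
# Sketch of steps 3-5: back up a sysctl config file, set tcp_timestamps to 1,
# and verify the result. enable_tcp_timestamps is a hypothetical helper name.
enable_tcp_timestamps() {
    conf="${1:-/etc/sysctl.conf}"

    # Step 3: keep a backup copy next to the original.
    cp "$conf" "$conf.tcptimestampsoff" || return 1

    # Step 4: rewrite the setting. The edited text is written back with cat
    # so the original file keeps its owner and permissions.
    sed -e 's/^net\.ipv4\.tcp_timestamps =.*/net.ipv4.tcp_timestamps = 1/' \
        "$conf.tcptimestampsoff" > "$conf.new" || return 1
    cat "$conf.new" > "$conf" || return 1
    rm -f "$conf.new"

    # Step 5: succeed only if the file now carries the new value.
    grep -q '^net\.ipv4\.tcp_timestamps = 1$' "$conf"
}

# On the appliance, as root, steps 6 and 7 would then be:
#   enable_tcp_timestamps /etc/sysctl.conf && sysctl -p
#   sysctl -a | grep tcp_timestamps
```

If the function returns non-zero, the setting line was not found in the expected form, and the backup copy ending in .tcptimestampsoff still holds the original contents.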
It may sound simplistic, especially if support already have this in hand, but I have seen the 7.6 upgrades reset the timeouts on the media servers, all back to 300.
Over time we all end up increasing them to 1800 or 3600 for the client read timeout on the media servers to keep our connections good during large, slow backups. Suddenly having these disappear can cause issues.
It is worth taking a look at all of the timeouts and comparing your 2.5 media servers with the 2.6 ones.
Hope this helps
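A quick way to do that comparison is to pull the timeout entries out of bp.conf on each media server and diff the results. This is only a sketch: show_timeouts is a made-up helper name, and it assumes the timeouts are stored in /usr/openv/netbackup/bp.conf as "NAME_TIMEOUT = value" lines (for example CLIENT_READ_TIMEOUT). A timeout left at its default may not appear in the file at all.

```shell
#!/bin/sh
# Print every *TIMEOUT entry from a bp.conf file, sorted so that the output
# from two servers can be compared with diff.
# show_timeouts is a hypothetical helper name.
show_timeouts() {
    conf="${1:-/usr/openv/netbackup/bp.conf}"
    grep -E '^[A-Z_]*TIMEOUT[[:space:]]*=' "$conf" | sort
}

# Example: run on each media server, save the output, then compare:
#   show_timeouts > /tmp/timeouts.$(hostname)
#   diff /tmp/timeouts.server25 /tmp/timeouts.server26
```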
Yes. We increased the timeouts but it did not help us.
Moreover, while speaking with backline, we found that new network modules had been applied along with the new OS. Some of them are tcp_fast and flow director.
As directed by backline, we tried to unload the module, after which a very strange thing happened: the customer's network collapsed. The networking team said that the port where the appliance was connected behaved like a root bridge, and this choked the network. Once they removed the cable from that port, the network became stable.
Now we are in a dilemma about what to do.
Keyboard and monitor, I guess. And can it be reloaded?
Maybe worth asking them about the firmware upgrade that is available for the appliances just in case that contains anything that may help?
Mention the following to them:
I meant that if it is all off the network, you would need a keyboard and monitor attached to do anything now, not that this was the case. Sorry for not being clear.
Check the bp.conf. Is there an entry like "PREFERRED_NETWORK = 192.168.1.x PROHIBITED" or some other unneeded "PREFERRED_NETWORK =" entries?
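A small sketch for that check, assuming bp.conf lives at /usr/openv/netbackup/bp.conf. The function name list_preferred_network is made up for illustration.

```shell
#!/bin/sh
# List any PREFERRED_NETWORK entries in a bp.conf file, with line numbers.
# list_preferred_network is a hypothetical helper name.
# Returns non-zero when no such entry exists.
list_preferred_network() {
    conf="${1:-/usr/openv/netbackup/bp.conf}"
    grep -n '^PREFERRED_NETWORK' "$conf"
}
```

No output means there are no such entries, so this can be ruled out as a cause.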
You are not alone with this issue; I have experienced it at several customers. I have tried several ideas to solve it but nothing works, so I expect it to be yet another bug in the software!
I have no solution for it other than to stay with 2.5.x for now until 184.108.40.206 is released.
And do extensive testing of 220.127.116.11 when it is released; hopefully several severe bugs will be fixed by this version.
No resolution yet!
Today we upgraded one of the appliances to 18.104.22.168 and are going to redirect the backups to it from the temporary appliance that we have been using for backups for the last 2 weeks.
I will let you guys know how it goes.
We too are experiencing nightly status 24 errors with 22.214.171.124 on a 5230 Appliance, and I worked for WEEKS with Symantec TS without resolution. The issue began immediately after migration from 126.96.36.199 on a Windows master to a shiny new 5230 master with 188.8.131.52.
Exchange data always causes the issue, whether single or multiple streams, after about 20 minutes, followed by a series of retries that eventually terminate when the backup window ends. Other policy types are affected occasionally. The issue only occurs when the data movement is over the LAN (data moved as FC block is always successful).
The workaround we are using is to backup to an SLP, with 1st stop being AdvancedDisk with short retention (on the internal disks), then duplicating to MSDP with long retention. This gets around the status 24 errors.
I am eagerly awaiting the result of 184.108.40.206 relative to this issue !!!
There have been different issues when WAN optimization is enabled.
Here is one such issue that also happens to show how to turn it off (very easy in CLISH).
Worth a try.