06-12-2014 04:29 AM
Hi All,
We recently upgraded one of our appliances from 2.5.3 to 2.6.0.2. It was a direct upgrade to 2.6.0.2. Since the upgrade, backups have been failing with status 14 and 24, mainly socket errors, and we can see Winsock error 10060 in the bpbkar log.
The interesting thing is that if we move the backups to an appliance which is not yet upgraded to 2.6.0.2 and is still at the 2.5.3 level, the backups run fine.
We have opened a case with Symantec and have been struggling for more than a week.
Has anyone had similar issues after the upgrade? Kindly share and advise.
08-26-2014 11:42 AM
I have seen this before. Here are the steps that should help.
Symptoms:
Network-related errors (status codes 14, 24, 40, and/or 42) are occurring in large numbers on Symantec NetBackup Appliances running 2.6.0.1, 2.6.0.2, or 2.6.0.3 code.
Cause:
Symantec attempted to tune the TCP stack on the appliances in the 2.6.0.x releases, and one of the changed settings, the net.ipv4.tcp_timestamps kernel setting, is causing these errors. In prior appliance versions (2.5.x) this setting had a value of 1, but in the 2.6.0.x code it was changed to 0 (zero).
Resolution:
1. Using PuTTY, another SSH utility, or the IPMI/remote console connection, log into the appliance as the admin user and become root in maintenance mode.
A. Go to the Main Menu->Support->Maintenance option and enter the maintenance password.
B. Run "/opt/Symantec/scspagent/IPS/sisipsoverride.sh" to temporarily override the security policy. This should take the same maintenance password as entered above.
C. Type "elevate" to become root.
2. Confirm the current setting is equal to zero by running
sysctl -a | grep tcp_timestamps
3. As the root user, make a backup copy of the file to edit:
cp /etc/sysctl.conf /etc/sysctl.conf.tcptimestampsoff
4. Then if comfortable using vi to edit files run "vi /etc/sysctl.conf" and change the line that reads:
net.ipv4.tcp_timestamps = 0
To instead read:
net.ipv4.tcp_timestamps = 1
Save the file with either ESC followed by :wq, or ESC followed by Z Z (Shift+Z twice).
Otherwise, run the following from the command line:
sed -i -e 's/^net\.ipv4\.tcp_timestamps[[:space:]]*=.*/net.ipv4.tcp_timestamps = 1/' /etc/sysctl.conf
5. Verify the tcp_timestamps now equals 1 in the file by running
grep timestamps /etc/sysctl.conf
6. Run "sysctl -p" at the command line.
Note: This change does not require a reboot.
7. Confirm the setting change by running the below again
sysctl -a | grep tcp_timestamps
Note: If jobs still end with status 14 or 24 after making the above change, make sure the TCP Chimney setting is disabled on the clients experiencing the errors. On Windows Server 2008 and later, this can be done by running "netsh int tcp set global chimney=disabled" from an elevated command prompt; this also does not require a reboot. A client can also get a status 24 error for other reasons, for example because of firewalls or virus scanners, but this KB article does not cover those troubleshooting steps.
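Steps 3 through 5 above can be collected into a small shell function. This is only a sketch, not a Symantec-supplied tool: the name enable_tcp_timestamps is made up for illustration, it assumes GNU sed (for -i), and it appends the setting if the line is missing, which the steps above do not cover.

```shell
#!/bin/sh
# Sketch only: enable_tcp_timestamps is a hypothetical helper name.
# It edits the sysctl-style file given as $1 (default: /etc/sysctl.conf).
enable_tcp_timestamps() {
    conf="${1:-/etc/sysctl.conf}"
    backup="${conf}.tcptimestampsoff"

    # Step 3: keep a copy of the file, but never overwrite an earlier backup.
    [ -e "$backup" ] || cp -p "$conf" "$backup" || return 1

    # Step 4: rewrite the existing line, or append one if the key is absent.
    if grep -q '^net\.ipv4\.tcp_timestamps[[:space:]]*=' "$conf"; then
        sed -i -e 's/^net\.ipv4\.tcp_timestamps[[:space:]]*=.*/net.ipv4.tcp_timestamps = 1/' "$conf"
    else
        echo 'net.ipv4.tcp_timestamps = 1' >> "$conf"
    fi

    # Step 5: show what the file now contains.
    grep '^net\.ipv4\.tcp_timestamps' "$conf"
}

# Usage as root on the appliance (steps 6 and 7 load and confirm the value):
#   enable_tcp_timestamps && sysctl -p && sysctl net.ipv4.tcp_timestamps
```

Passing a different path as the first argument lets you try the function on a copy of the file before touching /etc/sysctl.conf.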
06-12-2014 06:30 AM
It may sound simplistic, especially if support already have this in hand, but I have seen the 7.6 upgrades reset the timeouts on the Media Servers, all back to 300.
Over time we all end up increasing them to 1800 or 3600 for client read timeout on the media servers to keep our connections good during large, slow backups. Suddenly having these disappear can cause issues.
Worth just taking a look at all of the timeouts and comparing your 2.5 media servers with the 2.6 ones.
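One way to make that comparison is to dump the timeout entries from bp.conf on each media server and diff the results. A rough sketch, assuming the usual UNIX/Linux location /usr/openv/netbackup/bp.conf; list_timeouts is a made-up helper name, and an entry that is absent from the file is simply not listed (it would be running at its default).

```shell
#!/bin/sh
# Sketch only: list_timeouts is a hypothetical helper name.
# $1 is the bp.conf to read (default: /usr/openv/netbackup/bp.conf).
list_timeouts() {
    conf="${1:-/usr/openv/netbackup/bp.conf}"
    # Print every *TIMEOUT entry as KEY=VALUE, sorted so that the output
    # from two hosts can be compared line by line with diff.
    grep -E '^[A-Z_]*TIMEOUT[[:space:]]*=' "$conf" \
        | sed -e 's/[[:space:]]*=[[:space:]]*/=/' \
        | sort
}

# Usage: run on each media server, copy the files to one place, then diff.
#   list_timeouts > "/tmp/timeouts.$(hostname)"
```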
Hope this helps
06-13-2014 01:29 AM
Yes. We increased the timeouts but it did not help us.
Moreover, while speaking with backline, we found that new network modules had been applied along with the new OS. Some of them are tcp_fast and flow director.
As directed by backline, we tried to unload the module, after which a very strange thing happened: the customer's network collapsed. The networking team said that the port where the appliance was connected behaved like a root bridge, and this choked the network. Once they removed the cable from that port, the network became stable.
Now we are in a dilemma about what to do.....
06-13-2014 01:52 AM
Keyboard and monitor, I guess. And can it be reloaded?
Maybe worth asking them about the firmware upgrade that is available for the appliances just in case that contains anything that may help?
Mention the following to them:
S2600GZ_BIOS02010002SYM_ME20107231_BMC1195018_FRUSDR111_EWS-NEISYMC3
06-13-2014 02:25 AM
I did not understand how a keyboard and monitor can be the cause here. I hope you are replying to the correct post.
06-13-2014 02:38 AM
I meant that if it is all off the network, you would need a keyboard and monitor attached to do anything now, not that they were the cause. Sorry for not being clear.
06-15-2014 09:09 AM
It's OK!
I did the same and ended up spending 5 hours in the data center freeze. It was a hell of a lifetime experience.
07-22-2014 03:25 AM
07-22-2014 08:31 AM
Hello,
Check the bp.conf. Is there an entry like "PREFERRED_NETWORK = 192.168.1.x PROHIBITED" or some other unneeded "PREFERRED_NETWORK =" entries?
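A quick way to list them, as a sketch: the function name is made up for illustration, and the default path assumes the usual UNIX/Linux location of bp.conf.

```shell
#!/bin/sh
# Sketch only: list_preferred_networks is a hypothetical helper name.
# $1 is the bp.conf to read (default: /usr/openv/netbackup/bp.conf).
list_preferred_networks() {
    conf="${1:-/usr/openv/netbackup/bp.conf}"
    # -n prints line numbers so the entries are easy to find in an editor.
    # grep exits non-zero when nothing matches; print a message instead.
    grep -n '^[[:space:]]*PREFERRED_NETWORK[[:space:]]*=' "$conf" \
        || echo "no PREFERRED_NETWORK entries in $conf"
}
```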
07-28-2014 04:18 AM
07-29-2014 11:29 PM
No solution yet ... many backline engineers have worked on it.
We are transitioning to an appliance with the old code now.... :( This is very disappointing...
07-29-2014 11:46 PM
What level of firmware upgrade is this? No one mentioned this at Symantec support...
07-30-2014 12:53 AM
07-30-2014 01:50 AM
You are not alone with this issue; I have experienced this at several customers. I have tried several ideas to solve it but nothing works, so I expect it to be yet another bug in the software!
I have no solution for it other than to stay with 2.5.x for now until 2.6.0.3 is released.
And do extensive testing of 2.6.0.3 when it is released; hopefully several severe bugs will be fixed in this version.
07-30-2014 10:57 AM
Has this issue been resolved yet or still outstanding?
Sully
07-30-2014 11:00 PM
08-06-2014 08:37 AM
No resolution yet!
Today we upgraded one of the appliances to 2.6.0.3 and are going to redirect the backups to it from the temporary appliance that we have been using for backups for the last 2 weeks.
I will let you guys know how it goes.
08-08-2014 08:51 AM
We too are experiencing nightly status 24 errors with 2.6.0.2 on a 5230 appliance, and I worked for WEEKS with Symantec TS without resolution. The issue began immediately after migration from a 7.5.0.4 Windows master to a shiny new 5230 master with 2.6.0.2.
Exchange data always triggers the issue, whether in single or multiple streams, after about 20 minutes, followed by a series of retries that eventually terminate when the backup window ends. Other policy types are affected occasionally. The issue only occurs when the data moves over the LAN (data moved as FC block is always successful).
The workaround we are using is to back up to an SLP, with the first stop being AdvancedDisk with short retention (on the internal disks), then duplicating to MSDP with long retention. This gets around the status 24 errors.
I am eagerly awaiting the result of 2.6.0.3 relative to this issue !!!
Ken W
08-08-2014 01:37 PM
There have been different issues when WAN optimization is enabled.
Here is one such issue that also happens to show how to turn it off (Very easy in clish).
Worth a try.
08-09-2014 06:28 AM
We are also seeing these issues after upgrading to 2.6.0.2.