After upgrading appliance to 2.6.0.2, backups fail intermittently with errors 24 and 14

Hi All,

We recently upgraded one of our appliances from 2.5.3 directly to 2.6.0.2. Since the upgrade, backups are failing with errors 24 and 14, mainly socket errors, and we can see Winsock error 10060 in bpbkar.

The interesting thing is that if we move the backups to an appliance that has not yet been upgraded to 2.6.0.2 and is still at the 2.5.3 level, the backups run fine.

We have opened a case with Symantec and have been struggling for more than a week.

Has anyone had similar issues after the upgrade? Kindly share and advise.

Accepted Solution!

I have seen this before. Here are the steps that should help.

Symptoms:

Network-related errors (status codes 14, 24, 40, and/or 42) are occurring in large numbers on Symantec NetBackup Appliances running 2.6.0.1, 2.6.0.2, or 2.6.0.3 code.

Cause:

Symantec attempted to tune the TCP stack on the appliances in the 2.6.0.x releases, and one of the changed settings, the net.ipv4.tcp_timestamps kernel setting, is resulting in these errors. In prior versions of the appliances (2.5.x), this setting had a value of 1, but in the 2.6.0.x code it was changed to 0 (zero).

Resolution:

 

1. Using PuTTY, another SSH utility, or the IPMI/remote console connection, log into the appliance as the admin user and become root in maintenance mode.

    A.  Go to the Main Menu->Support->Maintenance option and enter the maintenance password.

    B.  Run "/opt/Symantec/scspagent/IPS/sisipsoverride.sh" to temporarily override the security policy.  This should take the same maintenance password as entered above.

    C.  Type "elevate" to become root.

2. Confirm the current setting is equal to zero by running

    sysctl -a | grep tcp_timestamps

3. As the root user, make a backup copy of the file to edit:

    cp /etc/sysctl.conf /etc/sysctl.conf.tcptimestampsoff

4. If you are comfortable using vi to edit files, run "vi /etc/sysctl.conf" and change the line that reads:

        net.ipv4.tcp_timestamps = 0

        To instead read:

        net.ipv4.tcp_timestamps = 1

        Save the file using either ESC + :wq or ESC + Z + Z (uppercase Z, and do not type the plus signs).

    Otherwise, run the following from the command line:

        sed -i -e 's/^net\.ipv4\.tcp_timestamps =.*/net.ipv4.tcp_timestamps = 1/' /etc/sysctl.conf

5. Verify the tcp_timestamps now equals 1 in the file by running

    grep timestamps /etc/sysctl.conf

6. Run "sysctl -p" at the command line.

     Note:  This change does not require a reboot.

7. Confirm the setting change by running the below again

    sysctl -a | grep tcp_timestamps
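
Steps 3 to 5 above can be sketched as a small shell function. This is only a sketch, not part of the original KB steps: enable_tcp_timestamps is a hypothetical helper name, the file path is passed as an argument so it can be tried on a copy first, it assumes GNU sed (for -i), and appending the line when it is missing is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch only: enable_tcp_timestamps is a hypothetical helper, not a vendor tool.
# Usage: enable_tcp_timestamps <path-to-sysctl.conf>
enable_tcp_timestamps() {
    conf="$1"
    # Step 3: keep a backup copy next to the file being edited.
    cp "$conf" "$conf.tcptimestampsoff" || return 1
    if grep -q '^net\.ipv4\.tcp_timestamps' "$conf"; then
        # Step 4: rewrite the existing line so the value becomes 1.
        sed -i -e 's/^net\.ipv4\.tcp_timestamps[[:space:]]*=.*/net.ipv4.tcp_timestamps = 1/' "$conf"
    else
        # Assumption of this sketch: append the setting if it is not present.
        echo 'net.ipv4.tcp_timestamps = 1' >> "$conf"
    fi
    # Step 5: print the resulting line for verification.
    grep '^net\.ipv4\.tcp_timestamps' "$conf"
}
```

On the appliance, as root, this would be followed by steps 6 and 7, for example "enable_tcp_timestamps /etc/sysctl.conf && sysctl -p" and then "sysctl -a | grep tcp_timestamps".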

 

Note:  If there are still jobs that end with a status 14 or 24 after making the above change, make sure the TCP Chimney setting is disabled on the clients experiencing the errors.  This can be done on Windows 2008 and later servers by running "netsh int tcp set global chimney=disabled" from an elevated command prompt; this also does not require a reboot.  There can also be other reasons a client gets a status 24 error, for example firewalls or virus scanners, but this KB article does not cover those troubleshooting steps.

36 Replies

It may sound simplistic, especially if support already have this in hand, but I have seen the 7.6 upgrades reset the timeouts on the media servers, all back to 300.

Over time we all end up increasing them to 1800 or 3600 for the client read timeout on the media servers to keep our connections good during large, slow backups. Suddenly having these disappear can cause issues.

It is worth taking a look at all of the timeouts and comparing your 2.5 media servers with the 2.6 ones.
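
One way to do that comparison is sketched below. It is only an illustration under assumptions: it assumes the timeouts appear as "NAME = value" entries (such as CLIENT_READ_TIMEOUT) in a copy of bp.conf taken from each media server, and list_timeouts and compare_timeouts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.
# list_timeouts prints the *_TIMEOUT entries of a bp.conf-style file,
# normalised to NAME=value and sorted by name.
list_timeouts() {
    grep -E '^[A-Z_]+_TIMEOUT[[:space:]]*=' "$1" \
        | sed -e 's/[[:space:]]*=[[:space:]]*/=/' \
        | sort
}

# compare_timeouts prints the timeout entries that differ between two files.
# Usage: compare_timeouts <bp.conf from 2.5 server> <bp.conf from 2.6 server>
compare_timeouts() {
    _old=$(mktemp) || return 2
    _new=$(mktemp) || return 2
    list_timeouts "$1" > "$_old"
    list_timeouts "$2" > "$_new"
    # diff exits 0 when the timeouts match and 1 when they differ.
    diff "$_old" "$_new"
    _rc=$?
    rm -f "$_old" "$_new"
    return $_rc
}
```

Lines starting with "<" come from the first file and lines starting with ">" from the second, so a reset would show up as the higher value on the "<" side and 300 on the ">" side.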

Hope this helps

Yes. We increased the timeouts but it did not help us.

Moreover, while speaking with backline, we found that new network modules had been applied along with the new OS. Some of them are tcp_fast and flow director.

As directed by backline, we tried to unload the module, after which a very strange thing happened: the customer's network collapsed. The networking team said that the port where the appliance was connected behaved like a root bridge, and this choked up the network. Once they removed the cable from that port, the network became stable.

Now we are in a dilemma about what to do.

Keyboard and monitor, I guess, and can it be reloaded?

It may be worth asking them about the firmware upgrade that is available for the appliances, just in case it contains anything that may help.

Mention the following to them:

S2600GZ_BIOS02010002SYM_ME20107231_BMC1195018_FRUSDR111_EWS-NEISYMC3

I did not understand how the keyboard and monitor can be the cause here. I hope you are replying to the correct post.

I meant that if it is all off the network, you would need a keyboard and monitor attached to do anything now, not that it was the case. Sorry for not being clear.

It's OK!

I did the same and ended up spending 5 hours in the data center freeze. It was a hell of a lifetime experience.

Question: What version was installed on the clients? Did you also update the clients to 7.6.0.2? Is the problem persistent with the current client version?

Hello,

Check the bp.conf. Is there an entry like "PREFERRED_NETWORK = 192.168.1.x PROHIBITED" or some other unneeded "PREFERRED_NETWORK =" entries?
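
A quick way to list such entries is sketched below. list_preferred_network is a hypothetical helper name, and the bp.conf location shown in the usage note is the usual UNIX/Linux path, which is assumed here to apply to the appliance as well.

```shell
#!/bin/sh
# Sketch only: list_preferred_network is a hypothetical helper name.
# Prints every PREFERRED_NETWORK entry in the given file with its line number.
# grep exits with status 1 when no such entry exists.
list_preferred_network() {
    grep -n '^[[:space:]]*PREFERRED_NETWORK[[:space:]]*=' "$1"
}
```

For example, "list_preferred_network /usr/openv/netbackup/bp.conf" would show each entry so that unneeded ones can be reviewed.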

 

Hello SymGuy-IT,

Is there a solution from support? We have similar problems, and support tells us our network is the problem. But the problem started exactly after the upgrade, so there must be a change in the 2.6.0.2 software compared with the 2.5.3 version.

Regards,
Andreas

No solution yet, although many backline engineers have worked on it.

We are transitioning to an appliance with the old code now. :( This is very disappointing.

What level of firmware upgrade is this? No one at Symantec support mentioned this.

If there is a solution, please post it, because support tells us our LAN is bad and that they will not investigate further until the LAN is fixed.

You are not alone with this issue; I have experienced it at several customers. I have tried several ideas to solve it, but nothing works, so I expect it to be yet another bug in the software!
I have no solution for it other than to stay with 2.5.x for now until 2.6.0.3 is released.
And do extensive testing of 2.6.0.3 when it is released; hopefully several severe bugs will be fixed by this version.

Has this issue been resolved yet or still outstanding?

 

Sully

Our customer told me that using client dedup works and could be a workaround. Without client dedup (with or without Accelerator), jobs stop with the known error code. So my feeling is that there is something wrong with parts of the network (driver, firmware of 2.6.x).

No resolution yet!

Today we upgraded one of the appliances to 2.6.0.3 and are going to redirect the backups to it from the temporary appliance that we have been using for backups for the last 2 weeks.

I will let you guys know how it goes.

We too are experiencing nightly status 24 errors with 2.6.0.2 on a 5230 Appliance, and I worked for WEEKS with Symantec TS without resolution. The issue began immediately after migration from 7.5.0.4 on a Windows master to a shiny new 5230 master with 2.6.0.2.

Exchange data always causes the issue, whether single or multiple streams, after about 20 minutes, followed by a series of retries that eventually terminate when the backup window ends. Other policy types are affected occasionally. The issue only occurs when the data movement is over the LAN (data moved as FC block is always successful).

The workaround we are using is to back up to an SLP, with the first stop being AdvancedDisk with short retention (on the internal disks), then duplicating to MSDP with long retention. This gets around the status 24 errors.

I am eagerly awaiting the result of 2.6.0.3 relative to this issue !!!

Ken W

There have been different issues when WAN optimization is enabled.

Here is one such issue that also happens to show how to turn it off (very easy in CLISH).

Worth a try.

 

http://www.symantec.com/docs/TECH222380

We are also seeing these issues after upgrading to 2.6.0.2.