After upgrading appliance to 2.6.0.2, backups fail intermittently with error 24 and 14

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Hi All,

We recently upgraded one of our appliances from 2.5.3 directly to 2.6.0.2. Since the upgrade, backups have been failing with status 24 and 14, mainly socket errors, and we can see winsock error 10060 in bpbkar.

The interesting thing is that if we move the backups to an appliance that has not yet been upgraded and is still at 2.5.3, the backups run fine.

We have opened a case with Symantec and have been struggling with this for more than a week.

Has anyone had similar issues after the upgrade? Kindly share and advise.

1 ACCEPTED SOLUTION

ChetteB
Level 3
Partner Accredited

I have seen this before. Here are the steps that should help.

Symptoms:

Network-related errors (status codes 14, 24, 40, and/or 42) are occurring in large numbers on Symantec NetBackup Appliances running 2.6.0.1, 2.6.0.2, or 2.6.0.3 code.

Cause:

Symantec attempted to tune the TCP stack on the appliances in 2.6.0.x, and one of the changed settings, the net.ipv4.tcp_timestamps kernel setting, is resulting in these errors. In prior appliance versions (2.5.x) this setting had a value of 1, but in the 2.6.0.x code it was changed to 0 (zero).

Resolution:

 

1. Using PuTTY, another SSH utility, or the IPMI/remote console connection, log in to the appliance as the admin user and become root in maintenance mode.

    A.  Go to the Main Menu->Support->Maintenance option and enter the maintenance password.

    B.  Run "/opt/Symantec/scspagent/IPS/sisipsoverride.sh" to temporarily override the security policy.  This should take the same maintenance password as entered above.

    C.  Type "elevate" to become root.

2. Confirm the current setting is equal to zero by running

    sysctl -a | grep tcp_timestamps

3. As the root user, make a backup copy of the file to edit:

    cp /etc/sysctl.conf /etc/sysctl.conf.tcptimestampsoff

4. Then, if you are comfortable using vi to edit files, run "vi /etc/sysctl.conf" and change the line that reads:

        net.ipv4.tcp_timestamps = 0

        To instead read:

        net.ipv4.tcp_timestamps = 1

        Save the file using either ESC followed by :wq, or ESC followed by ZZ (capital Z twice)

    Otherwise, run from the command line

        sed -i -e 's/^net\.ipv4\.tcp_timestamps =.*/net.ipv4.tcp_timestamps = 1/' /etc/sysctl.conf

5. Verify that tcp_timestamps now equals 1 in the file by running

    grep timestamps /etc/sysctl.conf

6. Run "sysctl -p" at the command line.

     Note:  This change does not require a reboot.

7. Confirm the setting change by running the command below again

    sysctl -a | grep tcp_timestamps
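
For anyone who prefers to script it, steps 3 through 5 above could be wrapped roughly as follows. This is only a sketch: the function names are made up for illustration, it rewrites the file through a temporary copy instead of using sed -i, and it appends the setting if no matching line exists, which is an assumption beyond the steps above.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the appliance.

# Print the tcp_timestamps value configured in a sysctl-style file
# (prints nothing if the setting is not present).
get_tcp_timestamps() {
    sed -n 's/^net\.ipv4\.tcp_timestamps[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Back up the file (step 3), then set net.ipv4.tcp_timestamps to 1 (step 4).
# $1 = config file, $2 = optional backup path.
enable_tcp_timestamps() {
    _conf="$1"
    _backup="${2:-$_conf.tcptimestampsoff}"
    _new="$_conf.new.$$"
    cp -p "$_conf" "$_backup" || return 1
    if grep -q '^net\.ipv4\.tcp_timestamps[[:space:]]*=' "$_conf"; then
        sed 's/^net\.ipv4\.tcp_timestamps[[:space:]]*=.*/net.ipv4.tcp_timestamps = 1/' "$_conf" > "$_new" || return 1
        # Write back with cat so the original file keeps its owner and mode.
        cat "$_new" > "$_conf" && rm -f "$_new"
    else
        # Assumption: appending is acceptable when the line is missing.
        printf 'net.ipv4.tcp_timestamps = 1\n' >> "$_conf"
    fi
}

# Possible use on the appliance, as root (steps 3 to 7):
#   enable_tcp_timestamps /etc/sysctl.conf
#   get_tcp_timestamps /etc/sysctl.conf      # should print 1
#   sysctl -p
#   sysctl -a | grep tcp_timestamps
```

Try it against a copy of sysctl.conf first; the backup made by the function is what you would restore from if anything looks wrong.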

 

Note:  If jobs still end with a status 14 or 24 after making the above change, make sure the TCP Chimney setting is disabled on the clients experiencing the errors.  This can be done on Windows 2008 and later servers by running "netsh int tcp set global chimney=disabled" from an elevated command prompt; this also does not require a reboot.  There can be other reasons a client gets a status 24 error, for example firewalls or virus scanners, but this KB article does not cover those troubleshooting steps.

36 REPLIES

Mark_Solutions
Level 6
Partner Accredited Certified

It may sound simplistic, especially if support already have this in hand, but I have seen the 7.6 upgrades reset the timeouts on the Media Servers, all back to 300.

Over time we all end up increasing them to 1800 or 3600 for client read timeout on the media servers to keep our connections good during large slow backups. Suddenly having these disappear can cause issues.

It is worth taking a look at all of the timeouts and comparing your 2.5 media servers with the 2.6 ones.

Hope this helps
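
A quick way to compare could look something like the sketch below. It assumes the timeout is stored as a CLIENT_READ_TIMEOUT entry in bp.conf (normally /usr/openv/netbackup/bp.conf on UNIX/Linux media servers), and the function name is made up for illustration. If the entry is missing, the built-in default applies.

```shell
#!/bin/sh
# Sketch only: get_client_read_timeout is a hypothetical helper name.

# Print the CLIENT_READ_TIMEOUT value from a bp.conf-style file,
# or "unset" when the file has no such entry.
get_client_read_timeout() {
    _value=$(sed -n 's/^[[:space:]]*CLIENT_READ_TIMEOUT[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    printf '%s\n' "${_value:-unset}"
}

# Possible use, with bp.conf copies collected from each media server
# (the file names here are placeholders):
#   get_client_read_timeout ./bp.conf.appliance-2.5.3
#   get_client_read_timeout ./bp.conf.appliance-2.6.0.2
```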

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Yes. We increased the timeouts but it did not help us.

Moreover, while speaking with backline, we found that new network modules had been applied along with the new OS. Some of them are tcp_fast and flow director.

As directed by backline, we tried to unload the module, after which a very strange thing happened: the customer's network collapsed. The networking team said that the port where the appliance was connected behaved like a root bridge, and this choked the network. Once they removed the cable from that port, the network became stable.

Now we are in a dilemma about what to do...

Mark_Solutions
Level 6
Partner Accredited Certified

You will need a keyboard and monitor, I guess. Can the module be reloaded?

Maybe it is worth asking them about the firmware upgrade that is available for the appliances, just in case it contains anything that may help?

Mention the following to them:

S2600GZ_BIOS02010002SYM_ME20107231_BMC1195018_FRUSDR111_EWS-NEISYMC3

GulzarShaikhAUS
Level 6
Partner Accredited Certified

I did not understand how a keyboard and monitor could be the cause here. I hope you are replying to the correct post.

Mark_Solutions
Level 6
Partner Accredited Certified

I meant that if it is completely off the network, you would need a keyboard and monitor attached to do anything now, not that they were the cause. Sorry for not being clear.

GulzarShaikhAUS
Level 6
Partner Accredited Certified

It's OK!

I did the same and ended up spending 5 hours freezing in the data center. It was a hell of an experience.

backup-user
Level 3
Partner Accredited Certified
Question: What version was installed on the clients? Did you update the clients to 7.6.0.2 as well? Does the problem persist with the current client version?

RKastner
Level 3
Partner Employee Accredited Certified

Hello,

Check the bp.conf. Is there an entry like "PREFERRED_NETWORK = 192.168.1.x PROHIBITED" or any other unneeded "PREFERRED_NETWORK =" entries?
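
Something like the following sketch could list those entries. The function name is made up for illustration, and the usual UNIX/Linux location /usr/openv/netbackup/bp.conf is assumed.

```shell
#!/bin/sh
# Sketch only: list_preferred_network is a hypothetical helper name.

# Print every PREFERRED_NETWORK entry in a bp.conf-style file,
# prefixed with its line number. Prints nothing if there are none.
list_preferred_network() {
    grep -n '^[[:space:]]*PREFERRED_NETWORK[[:space:]]*=' "$1" || true
}

# Possible use on a media server or appliance:
#   list_preferred_network /usr/openv/netbackup/bp.conf
```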

 

backup-user
Level 3
Partner Accredited Certified
Hello SymGuy-IT, is there a solution from support yet? We have similar problems, and support tells us our network is the problem. But the problems started right after the upgrade, so something must have changed in the 2.6.0.2 software compared with 2.5.3. Regards, Andreas

GulzarShaikhAUS
Level 6
Partner Accredited Certified

No solution yet... many backline engineers have worked on it.

We are transitioning to an appliance with the old code now... :( This is very disappointing...

GulzarShaikhAUS
Level 6
Partner Accredited Certified

What level of firmware upgrade is this? No one at Symantec support mentioned this...

backup-user
Level 3
Partner Accredited Certified
If there is a solution, please post it, because support tells us our LAN is bad and they will not put any further effort in until the LAN is fixed.

Bengt_H
Level 3
Partner Accredited

You are not alone with this issue; I have experienced it at several customers. I have tried several ideas to solve it but nothing works, so I expect it to be yet another bug in the software!
I have no solution for it other than to stay with 2.5.x for now until 2.6.0.3 is released.
And do extensive testing of 2.6.0.3 when it is released; hopefully several severe bugs will be fixed in that version.

Sulivan77
Level 4

Has this issue been resolved yet or still outstanding?

 

Sully

backup-user
Level 3
Partner Accredited Certified
Our customer told me that client-side dedup works and could be a workaround. Without client dedup (with or without Accelerator), jobs stop with the known error code. So my feeling is that something is wrong in part of the network stack (driver or firmware of 2.6.x).

GulzarShaikhAUS
Level 6
Partner Accredited Certified

No resolution yet!

Today we upgraded one of the appliances to 2.6.0.3 and are going to redirect the backups to it from the temporary appliance that we have been using for backups for the last 2 weeks.

I will let you guys know how it goes.

Haniwa
Level 4
Partner Accredited Certified

We too are experiencing nightly status 24 errors with 2.6.0.2 on a 5230 Appliance, and I worked for WEEKS with Symantec TS without resolution. The issue began immediately after migration from 7.5.0.4 on a Windows master to a shiny new 5230 master with 2.6.0.2.

Exchange data always causes the issue, whether single or multiple streams, after about 20 minutes, followed by a series of retries that eventually terminate when the backup window ends. Other policy types are affected occasionally. The issue only occurs when the data movement is over the LAN (data moved as FC block is always successful).

The workaround we are using is to back up to an SLP, with the first stop being AdvancedDisk with short retention (on the internal disks), then duplicating to MSDP with long retention. This gets around the status 24 errors.

I am eagerly awaiting the result of 2.6.0.3 relative to this issue !!!

Ken W

mnolan
Level 6
Employee Accredited Certified

There have been different issues when WAN optimization is enabled.

Here is one such issue that also happens to show how to turn it off (Very easy in clish).

Worth a try.

 

http://www.symantec.com/docs/TECH222380

ejporter
Level 4

We are also seeing these issues after upgrading to 2.6.0.2.