After upgrading appliance to 2.6.0.2, backups fail intermittently with error 24 and 14

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Hi All,

We recently upgraded one of our appliances from 2.5.3 to 2.6.0.2. It was a direct upgrade to 2.6.0.2. Since the upgrade, backups have been failing with errors 24 and 14, mainly socket errors, and we can see winsock error 10060 in bpbkar.

The interesting thing is that if we move the backups to an appliance that has not yet been upgraded and is still at 2.5.3, the backups run fine.

We have opened a case with Symantec and have been struggling with this for more than a week.

Has anyone had similar issues after the upgrade? Kindly share and advise.

36 REPLIES

GulzarShaikhAUS
Level 6
Partner Accredited Certified

This was tried and did not help. The issue is persistent whether WAN optimization is ON or OFF.

Another update: upgrading to 2.6.0.3 did not help, and the issue is still there.

ejporter
Level 4

Same here

sdo
Moderator
Partner VIP Certified
Has anyone tried a fresh install of v2.6.0.x followed by a catalog restore? I ask because I'm wondering whether it's an upgrade-only issue or whether even fresh installs of v2.6.0.2/3 have it. Also wondering whether v2.5.x upgrades to v2.6.0.1 have this issue, and whether v2.6.0.1 patches/upgrades to v2.6.0.2/3 will have it or not.

bmaro
Level 4

We have been chasing this issue around for over a week. Same here: we were told it was our network, ran AppCritical tests, etc. Still no fix. Backups going to basic disk on the same appliance do not experience the slow speeds or socket errors. When we go to PureDisk, the issue starts right away. We still have a ticket open with Symantec, but so far no fix. Has anyone figured out a fix for this?

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Hi. There is an ETRACK created for this. I will let you guys know once it is out.

Haniwa
Level 4
Partner Accredited Certified

Our status 24's started on a freshly installed 5320 (2.6.0.2) without any upgrade or catalog import... just taking over for the old Windows master. When we flipped back to the Windows master (7.5.0.4), there were no issues at all. We did this several times and it is consistent.

Our original client version was 6.5.x; then, in the course of troubleshooting, the clients were upgraded to 7.6.0.2, which had no effect on the issue.

fcurrie
Level 3
Employee

Hi all,

If you have a case opened with support with regard to this issue could you please pass on the number so I can look into it?

Thanks,

frank.

Heip
Level 0

same problem here.

Updated to 2.6.0.3 on a 5230. My Exchange backups end incomplete with 42 and 24 errors.

24 on the backup, 42 on the snapshot. Already tried increasing timeouts and disabling WAN optimization. No effect. I am reverting to backing up to virtual tape to have at least a disaster copy (no granular recovery possible).

I will open a case as well, because this seems to be a real problem.

Regards,

Heino

 

ChetteB
Level 3
Partner Accredited

We have seen customers who are experiencing this issue as well. In several of our cases we made the changes below and the issue immediately went away. 

Symptoms:

Network-related errors (status codes 14, 24, 40, and/or 42) are occurring in large numbers on Symantec NetBackup Appliances running 2.6.0.1, 2.6.0.2, or 2.6.0.3 code.

Cause:

Symantec attempted to tune the TCP stack on the appliances in the 2.6.0.x releases, and one of the changed settings, the net.ipv4.tcp_timestamps kernel setting, is causing these errors. In prior appliance versions (2.5.x) this setting had a value of 1, but in the 2.6.0.x code it was changed to 0 (zero).

Resolution:

 

1. Using Putty, another SSH utility, or via the IPMI/remote console connection, log into the appliance as the admin user and become root in the maintenance mode.

    A.  Go to the Main Menu->Support->Maintenance option and enter the maintenance password.

    B.  run "/opt/Symantec/scspagent/IPS/sisipsoverride.sh" to temporarily override the security policy.  This should take the same maintenance password as entered above.

    C.  type "elevate" to become root

2. Confirm the current setting is equal to zero by running

    sysctl -a | grep tcp_timestamps

3. As the root user, make a backup copy of the file to edit:

    cp /etc/sysctl.conf /etc/sysctl.conf.tcptimestampsoff

4. Then if comfortable using vi to edit files run "vi /etc/sysctl.conf" and change the line that reads: 

        net.ipv4.tcp_timestamps = 0

        To instead read:

        net.ipv4.tcp_timestamps = 1

        Save the file using either ESC + :wq or ESC + Z + Z       *without entering the plus sign

    Else run from the command line

        sed -i -e 's/^net\.ipv4\.tcp_timestamps =.*/net.ipv4.tcp_timestamps = 1/' /etc/sysctl.conf

5. Verify the tcp_timestamps now equals 1 in the file by running

    grep timestamps /etc/sysctl.conf

6. Run "sysctl -p" at the command line.

     Note:  This change does not require a reboot.

7. Confirm the setting change by running the below again

    sysctl -a | grep tcp_timestamps

 

Note:  If jobs still end with a status 14 or 24 after making the above change, make sure the TCP Chimney setting is disabled on the clients experiencing the errors.  On Windows Server 2008 and later, this can be done by running "netsh int tcp set global chimney=disabled" from an elevated command prompt; this also does not require a reboot.  A client can also get a status 24 error for other reasons, for example because of firewalls or virus scanners, but this KB article does not cover those troubleshooting steps.
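
For anyone who would rather script the appliance-side change than edit the file by hand, here is a minimal sketch of steps 3 to 5 above as one shell function. The function name enable_tcp_timestamps and its optional path argument are my own illustration, not Symantec tooling; the path defaults to /etc/sysctl.conf, and the argument exists only so the function can be tried on a copy of the file first. It still has to be run as root from the elevated maintenance shell.

```shell
# Sketch only: set net.ipv4.tcp_timestamps = 1 in a sysctl config file,
# keeping a backup copy with the same suffix used in the steps above.
# enable_tcp_timestamps and its path argument are hypothetical names.
enable_tcp_timestamps() {
    conf="${1:-/etc/sysctl.conf}"
    backup="${conf}.tcptimestampsoff"

    # Step 3: back up the file (source and destination are separate arguments).
    cp "$conf" "$backup" || return 1

    if grep -q '^[[:space:]]*net\.ipv4\.tcp_timestamps[[:space:]]*=' "$backup"; then
        # Step 4: rewrite the existing line, reading from the backup and
        # writing over the original so its ownership and mode are kept.
        sed 's/^[[:space:]]*net\.ipv4\.tcp_timestamps[[:space:]]*=.*/net.ipv4.tcp_timestamps = 1/' \
            "$backup" > "$conf" || return 1
    else
        # No existing entry: append one instead.
        printf '%s\n' 'net.ipv4.tcp_timestamps = 1' >> "$conf" || return 1
    fi

    # Step 5: show the resulting line(s) for verification.
    grep 'tcp_timestamps' "$conf"
}
```

On the appliance you would call enable_tcp_timestamps with no argument, then run "sysctl -p" and re-check with "sysctl -a | grep tcp_timestamps" as in steps 6 and 7.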

Mark_Solutions
Level 6
Partner Accredited Certified

ChetteB .. very useful information - thanks

bmaro
Level 4

Thanks ChetteB, we just received this solution from backline as well and can confirm it works.

We have two issues going on. The first, the socket errors, is now resolved. The second is that when we do tape-outs (disk to tape for month-end needs), backup performance drops to the point where no backups can run because their speeds are so slow. I'm talking about a huge difference: 50,000 KB per second down to 200 KB per second on one of our appliances that houses Windows MS-Standard backups and VMware snapshot backups.

Symantec advised us to limit the hours when we run duplication jobs and to limit concurrent write drives. The problem is that we have so much to duplicate in a small window that, using 2-4 tape drives, it will never finish. Symantec agreed, but unfortunately they cannot do any more; the duplication jobs have a direct, huge performance hit on the backups.

We are strongly considering backing up to advanced disk and then using an SLP to MSDP/tape, just like kwachtler mentioned above, or else we won't be able to fulfill our month-end requirements. When we back up directly to advanced disk with the duplication jobs running, we see no performance hit (no inline dup).

It's just very disappointing. We refreshed our entire NetBackup environment counting on these appliances and have spent a lot of money, and their performance has been very disappointing. I never thought I would dream of the day when we had our FalconStor VTL back...

Beavisrulz
Level 4

I am having the same issue. We are getting not only the status 24 and 14 errors, but also 13 and 87. I made the changes that ChetteB mentions above, including disabling TCP Chimney, but that did not fix the issue for us. I also made the following changes per several documents I found:

- Per TID TECH222380 (also mentioned above), disabled WANOptimization on the appliance under "Network-WANOptimization" using the CLISH. ***Did not help.

- Per TID TECH206337, changed the setting in the NET_BUFFER_SZ file on the appliance to "0". This disables auto-tuning for TCP stacks. ***Did not help.

- Changed the "Client Read Timeout" and "File Browse Timeout" from 300 to 2400 on the media servers (appliances) under "Host Properties-Media Servers-MediaServerName-Timeouts". ***Did not help.

- Per TID TECH55653, ran the following commands to disable other TCP Chimney parameters: "netsh int tcp set global rss=disabled" and "netsh int tcp set global netdma=disabled". The backup finished successfully, although it ran much slower than normal.

I will make the same changes to a few other clients and see what happens. Any thoughts?

Thanks,

Larry

Pawon
Level 2

Hello,

I am experiencing a similar issue: Windows FS backups are failing with SC 14.

I tried the solutions below, but no luck:

1) Changed the setting in the NET_BUFFER_SZ file on the appliance to "0".

2) Increased the timeouts.

3) The solution given by ChetteB.

Please help me fix the issue. Also, I would like to know how to download the ETRACK for this.

 

Thanks

Pavan

 

ChetteB
Level 3
Partner Accredited

If the above fix does not work, I have found these steps to also help.

Symptoms:

Recommended for networking/socket errors, e.g. status 24's, 23's, 14's, 40's, etc.

Cause:

No network is perfect. The following steps can help make things a bit more resilient to minor socket issues.

Resolution:

  1. Disable IPv6.
  2. Increase TCP values to allow more tolerance for socket communications:
    1. Start Registry Editor.
    2. Locate the following key in the registry:
       HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    3. On the Edit menu, click Add Value, and then add the following registry values:

TCP Timed Wait Delay

  1. Value Name: TcpTimedWaitDelay
  2. Data Type: REG_DWORD
  3. Default: 240
  4. Set from default of 240 seconds to 30 seconds.

TCP Max Data Retransmissions

  1. Value Name: TcpMaxDataRetransmissions
  2. Data Type: REG_DWORD
  3. Value: 6

TCP Max Connect Retransmissions

  1. Value Name: TcpMaxConnectRetransmissions
  2. Data Type: REG_DWORD
  3. Default: 2
  4. Set from default of 2 to 3.

Keep Alive Time

  1. Value Name: KeepAliveTime
  2. Key: Tcpip\Parameters
  3. Value Type: REG_DWORD (time in milliseconds)
  4. Valid Range: 1-0xFFFFFFFF
  5. Default: 7,200,000
  6. Set from default of 7,200,000 to 300,000.

    4. Quit Registry Editor and reboot for the changes to take effect.

  3. Next, check the state of RSS, chimney, and autotuning (all three should be disabled):

For Windows 2008 and above:

netsh int tcp show global

To disable:

  1. netsh int tcp set global chimney=disabled
  2. netsh int tcp set global RSS=disabled
  3. netsh int tcp set global autotuning=disabled 
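
If you have many clients to check, one option is to save the output of "netsh int tcp show global" from each client to a text file and scan the files on any Unix host. The sketch below assumes the Windows 2008-style output labels ("Receive-Side Scaling State", "Chimney Offload State", "Receive Window Auto-Tuning Level"); those labels and the function name check_netsh_globals are assumptions on my part, so adjust the pattern if your output reads differently.

```shell
# Sketch only: read saved "netsh int tcp show global" output on stdin and
# report any of the three settings that is not "disabled".
# Exit status is 0 when all settings found are disabled, 1 otherwise.
# The label text matched below is assumed, not taken from this thread.
check_netsh_globals() {
    awk -F':' '
        /Receive-Side Scaling State|Chimney Offload State|Receive Window Auto-Tuning Level/ {
            name = $1
            value = $2
            # Trim padding and any Windows carriage return.
            gsub(/^[ \t]+|[ \t\r]+$/, "", name)
            gsub(/^[ \t]+|[ \t\r]+$/, "", value)
            if (value != "disabled") {
                printf "%s is %s\n", name, value
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

For example, "check_netsh_globals < client01.txt" (client01.txt being a hypothetical saved output file) prints one line per setting that still needs to be disabled.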

For 2003:

netsh int ip set chimney DISABLED

How to disable receive-side scaling in Windows Server 2003

To disable receive-side scaling in the network adapter driver in Windows Server 2003, follow these steps:

  1. Click Start, click Run, type ncpa.cpl, and then click OK.
  2. Right-click a network adapter object, and then click Properties.
  3. Click Configure, and then click the Advanced tab.
  4. In the Property list, click Receive Side Scaling, click Disable in the Value list, and then click OK.
  5. Repeat steps 2 through 4 for each network adapter object.

NOTE: There is no auto-tuning for 2003.
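
As an alternative to adding the four registry values by hand in Registry Editor, the Windows "reg add" command can set them from an elevated command prompt. The sketch below is a small shell helper that only prints the "reg add" lines, using the key and the target values listed in step 2 above, so they can be reviewed and then pasted onto the client or saved as a .cmd file. The function name print_tcp_reg_commands is hypothetical, and the reboot mentioned above is still needed afterwards.

```shell
# Sketch only: print one "reg add" command per registry value from step 2.
# Nothing is changed by this script; it just generates text to review.
KEY='HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'

print_tcp_reg_commands() {
    # Each input line is "<value name> <decimal data>".
    while read -r name value; do
        printf 'reg add "%s" /v %s /t REG_DWORD /d %s /f\n' "$KEY" "$name" "$value"
    done <<'EOF'
TcpTimedWaitDelay 30
TcpMaxDataRetransmissions 6
TcpMaxConnectRetransmissions 3
KeepAliveTime 300000
EOF
}
```

Running "print_tcp_reg_commands > tcp_tuning.cmd" would write the four commands to a file (the file name is just an example) that can be copied to the affected clients.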

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Hello All,

For us, enabling TCP timestamps worked :)

No 14/24 errors in the last 6 days. It looks like the issue is resolved.

Anyone else with the same result?

Haniwa
Level 4
Partner Accredited Certified

Problem resolved here by enabling tcp_timestamps on the 5230 media server:

  1. Make a backup copy of the file to edit:
      # cp /etc/sysctl.conf /etc/sysctl.conf.tcptimestampsoff

  2. Then, in "/etc/sysctl.conf", change the line that reads:
      net.ipv4.tcp_timestamps = 0
     To instead read:
      net.ipv4.tcp_timestamps = 1
     Save the file and execute:
      # sysctl -p
 
 
Hope this helps others out as well..