
Windows 2003 client backup failing with RC 24, help required

vits
Level 3
Certified

Hi all,

A Windows backup client is failing with RC 24.

ping, bptestbpcd and bpclntcmd all work fine from the master and media servers.

The backup is still failing; please help.

 

Regards

Vit

1 ACCEPTED SOLUTION

mph999
Level 6
Employee Accredited

 

In my experience, Status 24 is hardly ever caused by NBU (in fact, I don't think I have ever seen a status 24 failure caused by NetBackup myself).
 
Something below normally fixes it. Yes, it is a lot to read, and it will probably take a number of hours to go through.
 
If this is a Windows client, a very common cause is the TCP Chimney settings - http://www.symantec.com/docs/TECH55653
 
I have given a number of technotes below (the odd one may be 'internal' only) and have shown a summary of the solutions, as well as the odd extra note.
 
 
http://www.symantec.com/docs/TECH76201
 
Possible solution to Status 24 by increasing TCP receive buffer space 
 
http://www.symantec.com/docs/TECH34183 
This technote, although written for Solaris, shows how TCP tunings can
cause status 24s. I am sure your system admins will be aware of the
corresponding settings for the Windows operating system.
 
http://www.symantec.com/docs/TECH55653
This technote is very important. It covers the many issues that can
occur when TCP Chimney Offload, TCP/IP Offload Engine (TOE) or TCP
Segmentation Offload (TSO) is enabled. It is recommended to disable
these, as per the technote.
 
I also understand that we have previously seen MS Patch KB92222 resolve status 24 issues.
 
 
 
http://www.symantec.com/docs/TECH150369
A write operation to a socket failed. These are possible causes of this issue:
 
A high network load.
Intermittent connectivity.
Packet reordering.
Duplex mismatch between client and master server NICs.
Small network buffer size.
 
 
http://support.microsoft.com/kb/942861 
SOLUTION/WORKAROUND:
Contact the hardware vendor of the NIC for the latest updates for their product as a resolution.
 
This problem occurs when the TCP Chimney Offload feature is enabled on the NetBackup Windows 2003 client.  Disable this feature to work around the problem.
 
To do this, at a command prompt, enter the following:
Netsh int ip set chimney DISABLED
 
 
 
http://www.symantec.com/docs/TECH127930
The above messages almost always indicate a networking issue of some sort. In this case it was due to a faulty switch. There are rare occasions when the above messages are not caused by a networking issue, such as those addressed in http://www.symantec.com/docs/TECH72115. 
 
But note that when the technote says the issue is 'almost always' network related, this can also include operating system settings.
 
 
http://www.symantec.com/docs/TECH145223
The issue was that the idle timeout setting on the firewall was too low to allow backups and/or restores to complete. With a DMZ media server backing up a DMZ client, the media server sends only occasional metadata updates back to the master server in order to update the image catalog. If the TCP socket connection between the media server and the master server is idle for longer than the firewall's idle timeout, the firewall breaks the connection between the media server and the master server, and the media server in turn breaks the connection to the client, producing the socket error.
Increasing the idle timeout setting on the firewall to a value larger than the time a typical backup takes to complete should resolve the issue.
Increasing the frequency of the TCP keepalive packets from the server's defaults can also help maintain the socket during idle periods.
 
 
Although you may not have a firewall between the client and the media server, this solution is another demonstration that the issue is network related, as opposed to NetBackup.
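 
To make the keepalive point concrete, here is a minimal Python sketch of what "keepalive more frequent than the firewall idle timeout" means at socket level. It is only an illustration: NetBackup picks up keepalive behaviour from the operating system's settings, not from code like this, and the function names and default timer values are my own assumptions, not anything from the technote.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=5):
    """Turn on TCP keepalive for sock and tune the timers where possible."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific, so each one is
    # applied only if this platform defines it and accepts the value.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass  # option defined but not supported here; keep OS default


def keepalive_beats_firewall(idle_s, firewall_idle_timeout_s):
    """True if the first keepalive probe goes out before the firewall
    would consider the connection idle and drop it."""
    return idle_s < firewall_idle_timeout_s
```

For example, keepalive_beats_firewall(7200, 3600) is False: a two-hour keepalive idle time cannot protect a connection through a firewall that drops idle sessions after one hour.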
 
 
http://www.symantec.com/docs/S:TECH130343  (Internal technote)
 
The issue was found to be due to network congestion at the NIC (that is, the network was overloaded).
 
 
 
http://www.symantec.com/docs/TECH135924  
 
In this instance, the problem was isolated to a single machine, which made the problematic new host the point of failure.
 
If the problem is due to an unidentified corruption / misconfiguration in the new media server's TCP stack and Winsock environment (as was the case in this example), executing these two commands, followed by a reboot, will resolve the problem:
 
netsh int ip reset resetlog.txt   Microsoft Reference:  http://support.microsoft.com/kb/299357 
netsh winsock reset catalog    Microsoft Reference:  http://technet.microsoft.com/en-us/library/cc759700(WS.10).aspx 
 
NOTE: The above two commands will reset the Windows TCP Stack as well as the Windows Winsock environment back to the default values.  This means that if the host is configured with a static IP Address and other customized TCP settings, they will be lost and will need to be re-entered after the reboot.  The default TCP setting is to use DHCP and the host will be using DHCP upon booting up.
 
 
There are 2 possible issues that could be NBU related that could cause this:
 
1.  The client NBU version is higher than the media server's
2.  The communications buffer is set too high (http://www.symantec.com/docs/TECH60570)
 
 
What to do next:
 
 
 
http://www.symantec.com/docs/TECH135924  (mentioned before, MS suggested fix)
http://www.symantec.com/docs/TECH60570  (communications buffer, mentioned above)
http://www.symantec.com/docs/TECH60844
 
 
If these do not resolve the situation, I would recommend you talk with the operating system vendor.  In summary, apart from the client software version and the communications buffer size (set in host properties), I can find no other cause that could be NBU.  However, from the very detailed research I have done, I can find a great many causes that lie in the network or the operating system.
 
 
Martin


6 REPLIES

revarooo
Level 6
Employee

Network connectivity issue?

What do the bpbkar (client), bptm and bpbrm (media server) logs say at the time the error occurred?

Omar_Villa
Level 6
Employee

Windows is the worst OS at socket handling; just reboot the box to clear out all the sockets stuck in TIME_WAIT and CLOSING states.
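
Before rebooting, it may be worth checking how many sockets really are in those states. A rough Python sketch, assuming the column layout that Windows `netstat -an` prints (Proto, Local Address, Foreign Address, State); the function name is made up for illustration:

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in Windows `netstat -an` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Expected row shape: TCP  local:port  remote:port  STATE
        # UDP rows and the header line are skipped.
        if len(parts) >= 4 and parts[0].upper().startswith("TCP"):
            states[parts[3].upper()] += 1
    return states
```

Feed it the saved output of netstat -an from the client and look at the TIME_WAIT and CLOSING counts; a state that never appears simply counts as 0.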

Yogesh9881
Level 6
Accredited

Can you please post the output of

bptestbpcd -client <hostname> -verbose -debug

run from both the master and the media server?

vits
Level 3
Certified

Hi Yogesh,

Below is the output.

bash-3.00$ sudo /usr/openv/netbackup/bin/admincmd/bptestbpcd -client lon-aps-01 -verbose -debug
Password:
15:35:27.841 [5443] <2> bptestbpcd: VERBOSE = 0
15:35:27.873 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:27.874 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:27.874 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/1c8/267593c8+0,1,50,2,2,0+lon-aps-01.txt
15:35:27.898 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:27.898 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/1c8/267593c8+veritas_pbx,1,4,2,2,0+lon-aps-01.txt
15:35:27.923 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:27.924 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/003/103ae03+0,1,50,2,2,0+10.54.182.20.txt
15:35:27.943 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:27.964 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:27.964 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/1c8/267593c8+vnetd,1,4,2,2,0+lon-aps-01.txt
15:35:27.984 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:27.985 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:27.985 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/1c8/267593c8+bpcd,1,4,2,2,0+lon-aps-01.txt
15:35:28.046 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:38.134 [5443] <2> vnet_vnetd_pbx_c_supported: ../../libvlibs/vnet_vnetd.c.4867: 0: VN_REQUEST_PBX_C_SUPPORTED: 13 0x0000000d
15:35:38.301 [5443] <2> do_vnetd_service: ../../libvlibs/vnet_connect.c.1581: 0: remote host supports PBX, but PBX is not running: 0 0x00000000
15:35:38.301 [5443] <2> vnet_vnetd_service_socket: ../../libvlibs/vnet_vnetd.c.2196: 0: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
15:35:38.302 [5443] <2> vnet_vnetd_service_socket: ../../libvlibs/vnet_vnetd.c.2210: 0: service: bpcd
15:35:38.579 [5443] <2> logconnections: BPCD CONNECT FROM 10.49.102.36.51538 TO 10.54.182.20.13724 fd = 5
15:35:38.584 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:38.584 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/003/103ae03+veritas_pbx,1,4,2,2,0+10.54.182.20.txt
15:35:38.607 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:38.607 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5661: 0: fopen() failed: 2 0x00000002
15:35:38.607 [5443] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.5662: 0: fopen() failed: /usr/openv/var/host_cache/003/103ae03+vnetd,1,4,2,2,0+10.54.182.20.txt
15:35:38.627 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:48.711 [5443] <2> vnet_vnetd_pbx_c_supported: ../../libvlibs/vnet_vnetd.c.4867: 0: VN_REQUEST_PBX_C_SUPPORTED: 13 0x0000000d
15:35:48.855 [5443] <2> do_vnetd_service: ../../libvlibs/vnet_connect.c.1581: 0: remote host supports PBX, but PBX is not running: 0 0x00000000
15:35:48.855 [5443] <2> do_vnetd_service: ../../libvlibs/vnet_connect.c.1615: 0: connect: VNETD CONNECT FROM 10.49.102.36.51642 TO 10.54.182.20.13724 fd = 6
15:35:48.894 [5443] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.540: 0: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
15:35:48.896 [5443] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.557: 0: ipc_string: 1855
15:35:49.008 [5443] <2> local_bpcr_connect: expected reserved port, got 13724
1 1 1
10.49.102.36:51538 -> 10.54.182.20:13724
10.49.102.36:51642 -> 10.54.182.20:13724
15:35:49.211 [5443] <2> bpcr_get_peername_rqst: Server peername length = 29
15:35:49.331 [5443] <2> bpcr_get_hostname_rqst: Server hostname length = 10
15:35:49.451 [5443] <2> bpcr_get_clientname_rqst: Server client name length = 10
15:35:49.570 [5443] <2> bpcr_get_version_rqst: bpcd version: 07010000
15:35:49.691 [5443] <2> bpcr_get_platform_rqst: Server client platform length = 2
15:35:49.810 [5443] <2> bpcr_get_version_rqst: bpcd version: 07010000
15:35:49.931 [5443] <2> bpcr_patch_version_rqst: theRest == > <
15:35:49.932 [5443] <2> bpcr_get_version_rqst: bpcd version: 07010000
15:35:50.051 [5443] <2> bpcr_patch_version_rqst: theRest == > <
15:35:50.052 [5443] <2> bpcr_get_version_rqst: bpcd version: 07010000
PEER_NAME = nbu-master.cliffordchance.com
HOST_NAME = lon-aps-01
CLIENT_NAME = lon-aps-01
VERSION = 0x07010000
PLATFORM = nt
PATCH_VERSION = 7.0.1.0
SERVER_PATCH_VERSION = 7.0.1.0
MASTER_SERVER = nbu-master.cliffordchance.com
EMM_SERVER = nbu-master.cliffordchance.com
15:35:50.177 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:35:50.179 [5443] <2> parseNetspec: ../../libvlibs/nbconf.c.1567: vnet_inet_pton fails, assuming hostname: 0 0 0x00000000
15:36:00.263 [5443] <2> vnet_vnetd_pbx_c_supported: ../../libvlibs/vnet_vnetd.c.4867: 0: VN_REQUEST_PBX_C_SUPPORTED: 13 0x0000000d
15:36:00.431 [5443] <2> do_vnetd_service: ../../libvlibs/vnet_connect.c.1581: 0: remote host supports PBX, but PBX is not running: 0 0x00000000
15:36:00.431 [5443] <2> do_vnetd_service: ../../libvlibs/vnet_connect.c.1615: 0: connect: VNETD CONNECT FROM 10.49.102.36.51670 TO 10.54.182.20.13724 fd = 7
15:36:00.431 [5443] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.540: 0: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
15:36:00.433 [5443] <2> vnet_vnetd_connect_forward_socket_begin: ../../libvlibs/vnet_vnetd.c.557: 0: ipc_string: 1865
10.49.102.36:51670 -> 10.54.182.20:13724
<2>bptestbpcd: EXIT status = 0
15:36:00.588 [5443] <2> bptestbpcd: EXIT status = 0

Yogesh9881
Level 6
Accredited

As per the provided logs, everything seems NORMAL.
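
For anyone who wants to check output like this quickly, here is a small Python sketch that pulls out the KEY = value lines, the connection lines and the exit status. The function name is hypothetical and the patterns are based only on the output posted in this thread, so other NetBackup versions may print things differently. Note that an exit status of 0 only shows the bpcd connection test passed; as this thread shows, the backup itself can still fail with status 24.

```python
import re


def summarize_bptestbpcd(output):
    """Collect KEY = value lines, connection lines and the exit status."""
    fields, connections, exit_status = {}, [], None
    for line in output.splitlines():
        line = line.strip()
        # Summary lines such as: CLIENT_NAME = lon-aps-01
        m = re.fullmatch(r"([A-Z_]+) = (.*)", line)
        if m:
            fields[m.group(1)] = m.group(2)
            continue
        # Connection lines such as: 10.49.102.36:51538 -> 10.54.182.20:13724
        m = re.fullmatch(r"(\S+:\d+) -> (\S+:\d+)", line)
        if m:
            connections.append((m.group(1), m.group(2)))
            continue
        # Final line: ... bptestbpcd: EXIT status = 0
        m = re.search(r"bptestbpcd: EXIT status = (\d+)", line)
        if m:
            exit_status = int(m.group(1))
    return fields, connections, exit_status
```

Run against the output above, the fields would include CLIENT_NAME and PATCH_VERSION, which makes it easy to compare the client version with the media server's.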

Can you please update the hosts file as shown below (with the FQDN, just as a try)?

On the master server

10.xx.xx.xx  lon-aps-01.cliffordchance.com    lon-aps-01

On the client

10.xx.xx.xx nbu-master.cliffordchance.com    nbu-master
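
If you want to confirm the entries are written the way you intended, a hosts file is easy to check with a few lines of Python. This is only a sketch: the function name is made up, and "first entry wins" is my assumption about the usual resolver behaviour when a name appears twice.

```python
def parse_hosts(text):
    """Map every name and alias in hosts-file text to its address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption: the first entry for a name takes precedence.
            mapping.setdefault(name.lower(), address)
    return mapping
```

Read the hosts file on each machine, pass its text to parse_hosts, and check that both the short name and the FQDN come back with the address you expect.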

One more thing: is it failing immediately or after some time? (Kindly post the detailed status from the Activity Monitor.)