Constant errors 21, 23, and 25

Eamonn
Level 3

Master + Media servers: Netbackup 7.0.1, Windows Server 2003 R2

Clients: Netbackup 7.0.1 Windows Server 2008-R2 (VMs running IIS)

Getting intermittent failures on 2 clients, both of which are IIS boxes.

Failures are typically 21, 23, 25 (changes each time, and one drive's job might fail with a 21, while another with 23 or 25)

Happens with Full and Cumulative Incremental backups.

Have tried the following:

Rerun the jobs during the day (normally run at night) - Still fails

Created a brand new policy (not copying the old one) - Also fails

Telnet to BPCD port - Successful response from all servers to the clients

Changing the timeout connections - Still fails

Pinging/NSlookup/bptestbpcd from the master/media servers - Everything appears to be fine

Modified connect back under client properties - Still fails

Running bpps while the backups are starting, I can see vnetd and bpcd start on the clients, but the jobs still fail with the status codes listed above.

What makes this more confusing is that the errors are intermittent. Sometimes one of the clients will back up without issue, or a drive or two will back up without issue while the rest fail (i.e. C: will back up, but D: and Shadow Copy Components:\ will fail, while another night D: and C: will back up, but Shadow Copy Components:\ will fail). Other times everything will fail.

The jobs fail more often than they succeed. Reinstalling the software/rebooting the clients doesn't help.

No amount of tweaking settings appears to make any difference to success or failure.

Another odd issue with these clients is that when the jobs start up, they take a very long time to begin backing up data (compared with jobs that are not failing).

Also, when I try to load host properties on either of the clients, it can take up to 15 minutes to bring up the information, whereas our other clients load within a minute or two.

Has anyone run into anything similar and can shed some light on how to resolve this?

 

Example of one of the failed jobs:

19-Oct-2011 2:55:10 AM - requesting resource sg-mslb-ms02-ms04
19-Oct-2011 2:55:10 AM - requesting resource masterserver.NBU_CLIENT.MAXJOBS.client01
19-Oct-2011 2:55:10 AM - requesting resource masterserver.NBU_POLICY.MAXJOBS.VmwareGuests3
19-Oct-2011 2:55:11 AM - granted resource masterserver.NBU_CLIENT.MAXJOBS.client01
19-Oct-2011 2:55:11 AM - granted resource masterserver.NBU_POLICY.MAXJOBS.VmwareGuests3
19-Oct-2011 2:55:11 AM - granted resource V71611
19-Oct-2011 2:55:11 AM - granted resource Drive001
19-Oct-2011 2:55:11 AM - granted resource mediaserver02-hcart-robot-tld-0
19-Oct-2011 2:55:11 AM - estimated 4059396 Kbytes needed
19-Oct-2011 2:55:26 AM - started process bpbrm (26264)
19-Oct-2011 3:15:19 AM - mounting V71611
19-Oct-2011 3:15:19 AM - mounted; mount time: 00:00:00
19-Oct-2011 3:18:39 AM - positioning V71611 to file 120
19-Oct-2011 3:18:39 AM - positioned V71611; position time: 00:00:00
19-Oct-2011 3:18:40 AM - connecting
19-Oct-2011 3:22:34 AM - Error bpbrm(pid=25228) cannot create data socket, The operation completed successfully.  (0)  
19-Oct-2011 3:31:48 AM - mounted
19-Oct-2011 3:35:10 AM - positioning V71611 to file 120
19-Oct-2011 3:35:10 AM - positioned V71611; position time: 00:00:00
19-Oct-2011 3:48:25 AM - mounted
19-Oct-2011 3:48:25 AM - positioning V71611 to file 120
19-Oct-2011 3:55:00 AM - positioned V71611; position time: 00:06:35
19-Oct-2011 4:11:34 AM - mounted
19-Oct-2011 4:11:34 AM - positioning V71611 to file 120
19-Oct-2011 4:11:34 AM - positioned V71611; position time: 00:00:00
19-Oct-2011 4:11:34 AM - end writing
cannot connect on socket(25)
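
The long pauses in a job log like this are easier to spot when the gaps between consecutive timestamps are computed. The following is only a minimal sketch, assuming the activity log has been copied into a list of text lines in the format shown above; `find_gaps` and `sample_log` are illustrative names, not part of NetBackup.

```python
from datetime import datetime

# Timestamp layout used in the job details above, e.g. "19-Oct-2011 2:55:26 AM"
STAMP = "%d-%b-%Y %I:%M:%S %p"

def find_gaps(lines, min_seconds=600):
    """Return (seconds, earlier_message, later_message) for each long pause."""
    events = []
    for line in lines:
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # no timestamp, e.g. the final "cannot connect on socket(25)"
        try:
            when = datetime.strptime(stamp.strip(), STAMP)
        except ValueError:
            continue  # line does not start with a recognisable timestamp
        events.append((when, message.strip()))

    gaps = []
    for (t0, m0), (t1, m1) in zip(events, events[1:]):
        delta = (t1 - t0).total_seconds()
        if delta >= min_seconds:
            gaps.append((int(delta), m0, m1))
    return gaps

# A few lines copied from the failed job above
sample_log = [
    "19-Oct-2011 2:55:11 AM - estimated 4059396 Kbytes needed",
    "19-Oct-2011 2:55:26 AM - started process bpbrm (26264)",
    "19-Oct-2011 3:15:19 AM - mounting V71611",
    "19-Oct-2011 3:18:40 AM - connecting",
    "cannot connect on socket(25)",
]

for seconds, before, after in find_gaps(sample_log):
    print(f"{seconds} s between '{before}' and '{after}'")
```

On these lines the only pause over ten minutes is the one between bpbrm starting and the tape mount, which matches the slow job start described above.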

1 ACCEPTED SOLUTION

Yasuhisa_Ishika
Level 6
Partner Accredited Certified
You should enable debug logging on both the servers and the clients. For details, check the Troubleshooting Guide. In addition, check the following points:

* DNS is stable
* UAC is disabled
* The firewall is disabled, or port 1556 is open

6 REPLIES

Marianne
Moderator
Partner VIP Accredited Certified

Debug logs needed on client: bpcd and bpbkar

Please post these logs as attachments after the next failure (rename the logs to reflect the process name).

Mark_Solutions
Level 6
Partner Accredited Certified

A couple of things here ...

1. If connecting to the client takes that long, then something is going amiss, so it may be a DNS issue. Try using the hosts files: add the Master and Media Servers to the hosts file on the client, and add the client to the hosts files on the Master and Media Servers.

2. Check that the server is not running out of disk space, especially on its system drive, which could affect its performance and VSS capabilities.

3. Also make sure you have increased the client read and connect timeouts in the Media Servers' Host Properties.

4. As they are IIS servers, you may need to exclude or stop the web catalogs during the backups (if that is possible in your environment) - see this tech note:

http://www.symantec.com/docs/TECH19904

Hope these help

Eamonn
Level 3

Attached is the current log for the 2nd client, which I already had in place. I've created log directories for the other client so we can check tonight.

I opened port 1556 as suggested on both clients and loaded host properties, and ran a full backup.

Host properties loaded within a minute or two on both clients, which is a lot better than in the past.

Backups ran successfully for both drives on both clients. I had one client error out with status 13 on the system state, but when I reran the job it completed successfully. The other client skipped a file, but when I reran the job it ran without issue.

Given that these clients have been intermittent, I'll need to watch them for a few days to see whether opening port 1556 was the solution.

Typically our domain policy is set to open only ports 13720, 13724, and 13782, which allows all of our clients to back up, but given these clients are a bit atypical in setup I wouldn't be surprised if opening 1556 was what they needed.

I'll update by Friday afternoon on how the next few days go, and post further logs if we have more failures.
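
For anyone wanting to repeat the port check from a master or media server without using telnet, something like the following could be used. It is only a rough sketch: `check_ports` is an illustrative helper, not a NetBackup tool, and it only shows whether a TCP connection can be opened, not whether the NetBackup daemon behind the port is healthy.

```python
import socket

def check_ports(host, ports, timeout=5.0):
    """Try a TCP connect to each port and return {port: True or False}."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and opens the socket
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # refused, timed out, or the host name did not resolve
            results[port] = False
    return results

# Example call, using the client name from the job log and the ports
# mentioned in this thread (1556 = PBX, 13720 = bprd, 13724 = vnetd,
# 13782 = bpcd):
#
#   for port, ok in check_ports("client01", [1556, 13720, 13724, 13782]).items():
#       print(port, "open" if ok else "unreachable")
```

Running it in both directions (server to client and client to server) would show whether a firewall policy blocks only one side.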

Mark_Solutions
Level 6
Partner Accredited Certified

Hi

Repeatedly in the log you get:

<16> process_requests: get_vnetd_forward_socket failed: 21
 

This indicates that there is still some sort of connection issue here, and it is to do with vnetd rather than pbx.

It may still be a DNS issue, as it is contacted by an FQDN and has to do comparisons with short host names.

Hosts file entries for the short name may help.

As it is an IIS server, you may just be running out of ports, so under

HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\

you could set a new DWORD named TcpTimedWaitDelay with a decimal value of 30 to free the ports up quicker. You also need to run the netsh command from a command line to increase the number of ports (I don't have it to hand just now).

Mark_Solutions
Level 6
Partner Accredited Certified

Hi

Just to clarify the above (it was very late last night when I posted it!)

The entries show the following:

bpcd peer_hostname: Connection from host vicnbums04.viha.ca (10.193.0.236) port 1230
bpcd valid_server: comparing nbuprod and vicnbums04.viha.ca
bpcd valid_server: comparing vicnbums01 and vicnbums04.viha.ca
bpcd valid_server: comparing vicnbums02 and vicnbums04.viha.ca
bpcd valid_server: comparing vicnbums03 and vicnbums04.viha.ca
bpcd valid_server: comparing vicnbums04 and vicnbums04.viha.ca

So you need to ensure that all of those servers still exist and if possible add short name host entries for them on your clients so that it gets through the list quickly.
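
To see whether any name in that server list is slow to resolve or no longer resolves at all, the lookups can be timed from the client. This is a minimal sketch under that assumption; `time_lookups` is an illustrative helper, and the `resolve` parameter exists only so that the function can be exercised without real DNS.

```python
import socket
import time

def time_lookups(names, resolve=socket.gethostbyname):
    """Resolve each name and return a list of (name, address_or_None, seconds)."""
    results = []
    for name in names:
        start = time.perf_counter()
        try:
            address = resolve(name)
        except OSError:
            address = None  # the name did not resolve
        results.append((name, address, time.perf_counter() - start))
    return results

# Example call with the names that appear in the bpcd log above:
#
#   servers = ["nbuprod", "vicnbums01", "vicnbums02", "vicnbums03",
#              "vicnbums04", "vicnbums04.viha.ca"]
#   for name, address, seconds in time_lookups(servers):
#       print(f"{name:25} {address or 'NOT RESOLVED':16} {seconds:.2f} s")
```

A name that takes many seconds to come back, or does not come back at all, is a good candidate for a hosts file entry or for removal from the server list.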

The netsh command you need to use is:

netsh int ipv4 set dynamicport tcp start=10000 num=50000

It is very useful to do this on the Media Servers as well as on a troublesome client, along with the TcpTimedWaitDelay setting.

For Windows O/S other than 2008, rather than using the netsh command, add a registry key in the TCPIP Parameters section: a DWORD with a decimal value of 65534.

The netsh change takes effect instantly; the registry keys require a reboot.

To see what state your ports are in, you could run netstat on the client.
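
If the netstat output is long, counting connections per state makes port exhaustion easier to see (a large number of TIME_WAIT entries, for example). The sketch below assumes the Windows `netstat -an` layout, where TCP rows have the state in the fourth column; `count_tcp_states` is an illustrative helper and the sample text is made up, not taken from these clients.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Count TCP connection states in the text printed by `netstat -an`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows TCP rows look like: TCP  local-address  remote-address  STATE
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states

# Made-up sample in the Windows netstat layout
sample = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:80            10.0.0.9:51000         TIME_WAIT
  TCP    10.0.0.5:80            10.0.0.9:51001         TIME_WAIT
  TCP    10.0.0.5:13782         10.0.0.7:1230          ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""

for state, count in count_tcp_states(sample).most_common():
    print(state, count)

# On a live client the input could come from:
#   import subprocess
#   text = subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
```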

Hope this helps