Error bpbrm: listen for client timeout during acce...

bsb · ‎04-18-2011

I keep receiving the error message in the title even though I have gone through and set my timeout settings to at least 900 seconds. I have also run all the bpclntcmd commands and all of them return appropriately.

Is the 60 second timeout referred to in the error message a client side OS setting? I am running Red-Hat 5.

Thanks

Nicolai · ‎04-18-2011

This is a issue when the OS doesn't return a list of files after 60 seconds. Try do a "ls -l" on the directory with the most files and see if it takes more than 60 seconds complete. Be aware that re-issuing the command properly will return the file list pretty quick since the list is in the file cache.

To find the directory that causes the timeout follow this T/N:

http://www.symantec.com/docs/TECH31513

You can't change the 60 second timeout, with CLIENT_READ_TIME or CLIENT_CONNENCT_TIMEOUT. To remedy the issue you can reduce the number of files, exclude it for backup or tune the file system.

bsb · ‎04-18-2011

Sorry I should have mentioned this is an RMAN backup not filesystem. I looked at the tech doc you linked to though and turned on the higher level tracing as it suggested. I am seeing some errors from vnetd:

13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.507: max_time: 120 0x00000078
13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.508: sock: 8 0x00000008
13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.509: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_byte: vnet.c.173: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_string: vnet.c.268: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_signed: vnet.c.312: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_status: vnet.c.390: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_vnetd_connect_forward_socket_end: vnet_vnetd.c.614: try again: 540 0x0000021c

Thanks

rizwan84tx · ‎04-18-2011

Could you enable bpcd log on the client (Verbose 5) and check if a response was received.

bsb · ‎04-19-2011

I turned on logging. I do not see much in the BPCD logs. In the activity monitor I see a status 54 at 12:57:59 AM and in my RMAN logs I see several errors like the following:

        RMAN-03009: failure of backup command on s1 channel at 04/19/2011 01:14:10
        ORA-19506: failed to create sequential file, name="c-2237647430-20110412-00", parms=""
        ORA-27028: skgfqcre: sbtbackup returned error
        ORA-19511: Error received from media manager layer, error text:
        VxBSACreateObject: Failed with error:
        Server Status: timed out connecting to client
        ORA-19600: input file is backup piece (+ARCHIVE/autobackup/2011_04_12/s_748224764.305.748224765)
        ORA-19601: output file is backup piece (c-2237647430-20110412-00)
        channel s1 disabled, job failed on it will be run on another channel

In the BPCD log at that time I see where the failure occured but there is no actual error message. It just submits a close request then another open.

00:57:59.825 [6374] <2> process_requests: BPCD_OPEN_FOR_WRITE_RQST
00:57:59.826 [6374] <2> process_requests: open for write file name = /usr/openv/netbackup/logs/user_ops/dbext/logs/4769.0.1303189075
00:57:59.828 [6374] <2> process_requests: buffer length = 22
00:57:59.829 [6374] <2> process_requests: our open flags = 0x119
00:57:59.829 [6374] <2> process_requests: client open flags = 0x1441
00:57:59.829 [6374] <2> process_requests: completed the write of the file number to the output socket
00:57:59.833 [6374] <2> process_requests: BPCD_LOG_RQST
00:57:59.900 [6374] <2> process_requests: BPCD_CLOSE_FILE_RQST
00:58:01.858 [6374] <2> process_requests: BPCD_OPEN_FOR_WRITE_RQST
00:58:01.860 [6374] <2> process_requests: open for write file name = /usr/openv/netbackup/logs/user_ops/dbext/logs/4751.0.1303189077
00:58:01.861 [6374] <2> process_requests: buffer length = 22
00:58:01.863 [6374] <2> process_requests: our open flags = 0x119
00:58:01.863 [6374] <2> process_requests: client open flags = 0x1441
00:58:01.863 [6374] <2> process_requests: completed the write of the file number to the output socket
00:58:01.867 [6374] <2> process_requests: BPCD_LOG_RQST
00:58:01.930 [6374] <2> process_requests: BPCD_CLOSE_FILE_RQST
00:58:09.286 [7145] <2> process_requests: BPCD_DISCONNECT_RQST
00:58:09.286 [7145] <2> bpcd exit_bpcd: exit status 0 ----------->exiting

Since the backups write many of the files it is not a connection issue. It is consistent but not constant. Do I need to set one of the timeouts to be as long as the max file write time? I do not see any realted to how long a file takes to write.

Thanks

Nicolai · ‎04-19-2011

Try ramp-up CLIENT_READ_TIMEOUT. RMAN sometimes need very long time to determine what archives is needed. All this time Netbackup need to wait. Try setting CLIENT_READ_TIMEOUT = 86400 and retry the operation. Today where all communication is passed thru 13724, more meaningful messages may be masked as sockets errors because of vnetd.

bsb · ‎04-22-2011

I set the read timeout to the max it would allow me (327667) and it still timed out.

readCommFile: ERR - timed out after 32767 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/20626.0.1303445742

Brian

Nicolai · ‎04-23-2011

Can you confirm that a connection is really established ?

How fragmented is the recover catalog ?. I have seen really long response time from a recover catalog because some household routines of the RMAN catalog wasn't running (index update etc etc). Talk to the DBA about this.

VOX

Error bpbrm: listen for client timeout during accept from data listen socket after 60 seconds