Forum Discussion

bsb's avatar
bsb
Level 3
14 years ago

Error bpbrm: listen for client timeout during accept from data listen socket after 60 seconds

I keep receiving the error message in the title even though I have gone through and set my timeout settings to at least 900 seconds. I have also run all the bpclntcmd commands and all of them return appropriately.

 

Is the 60 second timeout referred to in the error message a client side OS setting? I am running Red-Hat 5.

Thanks

  • This is a issue when the OS doesn't return a list of files after 60 seconds. Try do a "ls -l" on the directory with the most files and see if it takes more than 60 seconds complete. Be aware that re-issuing the command properly will return the file list pretty quick since the list is in the file cache.

    To find the directory that causes the timeout follow this T/N:

    http://www.symantec.com/docs/TECH31513

    You can't change the 60 second timeout, with CLIENT_READ_TIME or CLIENT_CONNENCT_TIMEOUT. To remedy the issue you can reduce the number of files, exclude it for backup or tune the file system.

  • Sorry I should have mentioned this is an RMAN backup not filesystem. I looked at the tech doc you linked to though and turned on the higher level tracing as it suggested. I am seeing some errors from vnetd:

     

    13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.507: max_time: 120 0x00000078
    13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.508: sock: 8 0x00000008
    13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.509: Function failed: 11 0x0000000b
    13:21:56.140 [18374] <2> vnet_pop_byte: vnet.c.173: Function failed: 11 0x0000000b
    13:21:56.140 [18374] <2> vnet_pop_string: vnet.c.268: Function failed: 11 0x0000000b
    13:21:56.140 [18374] <2> vnet_pop_signed: vnet.c.312: Function failed: 11 0x0000000b
    13:21:56.140 [18374] <2> vnet_pop_status: vnet.c.390: Function failed: 11 0x0000000b
    13:21:56.140 [18374] <2> vnet_vnetd_connect_forward_socket_end: vnet_vnetd.c.614: try again: 540 0x0000021c

     

    Thanks

  • Could you enable bpcd log on the client (Verbose 5) and check if a response was received.

  • I turned on logging. I do not see much in the BPCD logs. In the activity monitor I see a status 54 at 12:57:59 AM and in my RMAN logs I see several errors like the following:

     

            RMAN-03009: failure of backup command on s1 channel at 04/19/2011 01:14:10
            ORA-19506: failed to create sequential file, name="c-2237647430-20110412-00", parms=""
            ORA-27028: skgfqcre: sbtbackup returned error
            ORA-19511: Error received from media manager layer, error text:
            VxBSACreateObject: Failed with error:
            Server Status:  timed out connecting to client
            ORA-19600: input file is backup piece  (+ARCHIVE/autobackup/2011_04_12/s_748224764.305.748224765)
            ORA-19601: output file is backup piece  (c-2237647430-20110412-00)
            channel s1 disabled, job failed on it will be run on another channel

     

    In the BPCD log at that time I see where the failure occured but there is no actual error message. It just submits a close request then another open.

     

    00:57:59.825 [6374] <2> process_requests: BPCD_OPEN_FOR_WRITE_RQST
    00:57:59.826 [6374] <2> process_requests: open for write file name = /usr/openv/netbackup/logs/user_ops/dbext/logs/4769.0.1303189075
    00:57:59.828 [6374] <2> process_requests: buffer length = 22
    00:57:59.829 [6374] <2> process_requests: our open flags = 0x119
    00:57:59.829 [6374] <2> process_requests: client open flags = 0x1441
    00:57:59.829 [6374] <2> process_requests: completed the write of the file number to the output socket
    00:57:59.833 [6374] <2> process_requests: BPCD_LOG_RQST
    00:57:59.900 [6374] <2> process_requests: BPCD_CLOSE_FILE_RQST
    00:58:01.858 [6374] <2> process_requests: BPCD_OPEN_FOR_WRITE_RQST
    00:58:01.860 [6374] <2> process_requests: open for write file name = /usr/openv/netbackup/logs/user_ops/dbext/logs/4751.0.1303189077
    00:58:01.861 [6374] <2> process_requests: buffer length = 22
    00:58:01.863 [6374] <2> process_requests: our open flags = 0x119
    00:58:01.863 [6374] <2> process_requests: client open flags = 0x1441
    00:58:01.863 [6374] <2> process_requests: completed the write of the file number to the output socket
    00:58:01.867 [6374] <2> process_requests: BPCD_LOG_RQST
    00:58:01.930 [6374] <2> process_requests: BPCD_CLOSE_FILE_RQST
    00:58:09.286 [7145] <2> process_requests: BPCD_DISCONNECT_RQST
    00:58:09.286 [7145] <2> bpcd exit_bpcd: exit status 0  ----------->exiting

    Since the backups write many of the files it is not a connection issue. It is consistent but not constant. Do I need to set one of the timeouts to be as long as the max file write time? I do not see any realted to how long a file takes to write.

    Thanks

     

  • Try ramp-up CLIENT_READ_TIMEOUT. RMAN sometimes need very long time to determine what archives is needed. All this time Netbackup need to wait. Try setting CLIENT_READ_TIMEOUT = 86400 and retry the operation. Today where all communication is passed thru 13724, more meaningful messages may be masked as sockets errors because of vnetd.

  • I set the read timeout to the max it would allow me (327667) and it still timed out.

     

    readCommFile: ERR - timed out after 32767 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/20626.0.1303445742

     

    Brian

  • Can you confirm that a connection is really established ?

    How fragmented is the recover catalog ?. I have seen really long response time from a recover catalog because some household routines of the RMAN catalog wasn't running (index update etc etc). Talk to the DBA about this.