Forum Discussion

CoMurphy's avatar
CoMurphy
Level 4
10 years ago

Error received from media manager layer

Hello,

 

We recently had a case open with Symantec about the error that I am pasting below. It was thought to be ghost tape drives at the time which we fixed. We had 3 more failures come out this weekend, and was wondering if anyone else has had this issue prior?

We recently upgraded from LTO-4 to LTO-6.

We within the last few months have gone from 7.1.0.4 to 7.6.0.3.

We have added (2) new media servers to our environment to bring us up to (4).

 

Any advice or hints would be welcome.

 

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of backup plus archivelog command at 01/12/2015 01:56:02

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file

ORA-19511: Error received from media manager layer, error text:

   Failed to process backup file <bk_2911_1_868757927>

ORA-19502: write error on file "bk_2911_1_868757927", block number 90785 (block size=8192)

ORA-27030: skgfwrt: sbtwrite2 returned error

ORA-19511: Error received from media manager layer, error text:

   VxBSASendData: Failed with error:

  • Hi,

     

    This seems like a comms issue, there are no errors from the bptm / bpbrm logs. They show that connection/backup was closed from the client side. In the client logs you see this. It was writing then timed out.

     

    Is this a RAC or single Instance?

     

    01:39:12.658 [29981] <2> int_WriteData: INF - writing buffer # 1 of size 262144
    01:39:39.376 [29981] <2> int_WriteData: INF - writing buffer # 1000 of size 262144

    01:40:07.354 [29981] <2> int_WriteData: INF - writing buffer # 2000 of size 262144
    01:56:00.281 [29981] <16> writeToServer: ERR - send() to server on socket failed: Connection timed out (110)
    01:56:00.281 [29981] <16> dbc_put: ERR - failed sending data to server
    01:56:00.281 [29981] <16> VxBSASendData: ERR - Could not do a bsa_put().

    01:56:00.281 [29981] <16> xbsa_SendData: ERR - VxBSASendData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <16> dbc_put: ERR - invalid handle received from the application
    01:56:00.320 [29981] <16> VxBSASendData: ERR - Could not do a bsa_put().
    01:56:00.320 [29981] <16> xbsa_SendData: ERR - VxBSASendData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <16> dbc_put: ERR - invalid handle received from the application
    01:56:00.320 [29981] <16> VxBSASendData: ERR - Could not do a bsa_put().
    01:56:00.320 [29981] <16> xbsa_SendData: ERR - VxBSASendData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <2> int_CloseImage: INF - Backup - closing <bk_2911_1_868757927>
    01:56:00.320 [29981] <16> writeToServer: ERR - send() to server on socket failed: Bad file descriptor (9)
    01:56:00.320 [29981] <16> write_LF_END_tarHeader: ERR - failed writing LF_END_U_LEN_FILE record on DATA socket
    01:56:00.320 [29981] <16> finishTarImage: ERR - write_LF_END_tarHeader() failed.
    01:56:00.320 [29981] <16> VxBSAEndData: ERR - EndData unable to bsa_finishTarImage().
    01:56:00.320 [29981] <16> xbsa_EndData: ERR - VxBSAEndData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <16> int_CloseImage: ERR - Failed to process backup file <bk_2911_1_868757927>

  • Hello in the bptm log I am seeing a lot of no connection could be made even though the server never went down, nor did Netbackup

     

    00:13:25.075 [4692.4960] <2> vnet_pbxConnect: pbxConnectEx Succeeded
    00:13:25.075 [4692.4960] <2> do_pbx_service: [vnet_connect.c:2160] via PBX bpdbm CONNECT FROM 172.28.25.40.59762 TO 172.28.25.25.1556 fd = 728
    00:13:25.075 [4692.4960] <2> async_connect: [vnet_connect.c:1691] connect async CONNECT FROM 172.28.25.40.59762 TO 172.28.25.25.1556 fd = 728
    00:13:25.075 [4692.4960] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO veritasarsenal 172.28.25.25 bpdbm VIA pbx
    00:13:25.075 [4692.4960] <2> logconnections: BPDBM CONNECT FROM 172.28.25.40.59762 TO 172.28.25.25.1556 fd = 728
    00:13:25.075 [4692.4960] <2> vnet_check_vxss_client_magic_with_info_ex: [vnet_vxss_helper.c:943] Ignoring VxSS authentication 2 0x2
    00:13:25.278 [4692.4960] <2> db_CLIENTsend: reset client protocol version from 0 to 8
    00:13:25.278 [4692.4960] <2> db_CLIENTsend: prec->offline=0
    00:13:25.278 [4692.4960] <2> db_CLIENTsend: prec->online_at=0
    00:13:25.278 [4692.4960] <2> db_CLIENTsend: prec->offline_at=0
    00:13:25.278 [4692.4960] <2> db_CLIENTsend: sending 46 bytes
    00:13:25.496 [4692.4960] <2> isDONE: reply DONE 227 <?xml version="1.0" encoding="utf-8"?><StatusMsgDoc><StatusMsg NBUStatus="227" Severity="2" VXULoid="271" VXULtid="227"><MsgString>no&#32;entity&#32;was&#32;found</MsgString></StatusMsg></StatusMsgDoc>
    00:13:25.496 [4692.4960] <2> db_getCLIENT: db_CLIENTreceive: no entity was found 227
    00:13:25.496 [4692.4960] <2> ConnectionCache::connectAndCache: Acquiring new connection for host veritasarsenal, query type 223
    00:13:25.496 [4692.4960] <2> vnet_sortaddrs: [vnet_addrinfo.c:3856] sorted addrs: 1 0x1
    00:13:25.496 [4692.4960] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4754] using interface  ANY
    00:13:25.496 [4692.4960] <2> vnet_sortaddrs: [vnet_addrinfo.c:3856] sorted addrs: 1 0x1
    00:13:25.496 [4692.4960] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4754] using interface  ANY
    00:13:25.496 [4692.4960] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
    00:13:25.496 [4692.4960] <2> vnet_pbxConnect: pbxConnectEx Succeeded
    00:13:25.496 [4692.4960] <2> do_pbx_service: [vnet_connect.c:2160] via PBX bpdbm CONNECT FROM 172.28.25.40.59763 TO 172.28.25.25.1556 fd = 732
    00:13:25.496 [4692.4960] <2> async_connect: [vnet_connect.c:1691] connect async CONNECT FROM 172.28.25.40.59763 TO 172.28.25.25.1556 fd = 732
    00:13:25.496 [4692.4960] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO veritasarsenal 172.28.25.25 bpdbm VIA pbx
    00:13:25.496 [4692.4960] <2> logconnections: BPDBM CONNECT FROM 172.28.25.40.59763 TO 172.28.25.25.1556 fd = 732
    00:13:25.496 [4692.4960] <2> vnet_check_vxss_client_magic_with_info_ex: [vnet_vxss_helper.c:943] Ignoring VxSS authentication 2 0x2
    00:13:25.730 [4692.4960] <2> db_CLIENTsend: reset client protocol version from 0 to 8
    00:13:25.730 [4692.4960] <2> db_CLIENTsend: prec->offline=0
    00:13:25.730 [4692.4960] <2> db_CLIENTsend: prec->online_at=0
    00:13:25.730 [4692.4960] <2> db_CLIENTsend: prec->offline_at=0
    00:13:25.730 [4692.4960] <2> db_CLIENTsend: sending 46 bytes
    00:13:25.949 [4692.4960] <2> isDONE: reply DONE 227 <?xml version="1.0" encoding="utf-8"?><StatusMsgDoc><StatusMsg NBUStatus="227" Severity="2" VXULoid="271" VXULtid="227"><MsgString>no&#32;entity&#32;was&#32;found</MsgString></StatusMsg></StatusMsgDoc>
    00:13:25.949 [4692.4960] <2> db_getCLIENT: db_CLIENTreceive: no entity was found 227
    00:13:25.949 [4692.4960] <2> verify_client: ../bpbrm.c.41740: db_getCLIENT failed for CLIENT: db135
    00:13:25.949 [4692.4960] <2> verify_client: ../bpbrm.c.41910: db_getCLIENT failed: 227 227 0x000000e3
    00:13:25.949 [4692.4960] <2> bpbrm start_bpcd_stat: ../bpbrm.c.23199: bpcd_client: db135
    00:13:25.949 [4692.4960] <2> bpbrm start_bpcd_stat: ../bpbrm.c.23202: bpcd_client_hostname: db135
    00:13:25.949 [4692.4960] <2> local_bpcr_connect: bpcr.c.333: connect_opts = 0x01000100 connect_opts2 = 0x01010100
    00:13:25.949 [4692.4960] <2> local_bpcr_connect: bpcr.c.340: connect_opts = 0x01000100
    00:13:25.949 [4692.4960] <2> local_bpcr_connect: bpcr.c.379: daemon_port_type = 0
    00:13:25.949 [4692.4960] <2> vnet_sortaddrs: [vnet_addrinfo.c:3856] sorted addrs: 1 0x1
    00:13:25.949 [4692.4960] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4754] using interface  ANY
    00:13:25.949 [4692.4960] <2> vnet_sortaddrs: [vnet_addrinfo.c:3856] sorted addrs: 1 0x1
    00:13:25.949 [4692.4960] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4754] using interface  ANY
    00:13:25.949 [4692.4960] <2> vnet_sortaddrs: [vnet_addrinfo.c:3856] sorted addrs: 1 0x1
    00:13:25.964 [4692.4960] <2> vnet_get_pref_netconnection: [vnet_addrinfo.c:4754] using interface  ANY
    00:13:25.964 [4692.4960] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
    00:13:26.963 [4692.4960] <8> async_connect: [vnet_connect.c:1700] getsockopt SO_ERROR returned 10061 0x274d
    00:13:26.963 [4692.4960] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
    00:13:27.977 [4692.4960] <8> async_connect: [vnet_connect.c:1700] getsockopt SO_ERROR returned 10061 0x274d
    00:13:27.977 [4692.4960] <2> async_connect: [vnet_connect.c:1499] connect in progress 1 0x1
    00:13:27.992 [4692.4960] <2> async_connect: [vnet_connect.c:1691] connect async CONNECT FROM 172.28.25.40.785 TO 172.28.26.243.13782 fd = 728
    00:13:27.992 [4692.4960] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS
        status: RETRY, (42) CONNECT_REFUSED; system: (10061) No connection could be made because the target machine actively refused it. ; FROM 0.0.0.0 TO db135 172.28.26.243 bpcd VIA pbx
        status: RETRY, (42) CONNECT_REFUSED; system: (10061) No connection could be made because the target machine actively refused it. ; FROM 0.0.0.0 TO db135 172.28.26.243 bpcd VIA vnetd
        status: SUCCESS; FROM 0.0.0.0 TO db135 172.28.26.243 bpcd

  • Hi,

     

    This seems like a comms issue, there are no errors from the bptm / bpbrm logs. They show that connection/backup was closed from the client side. In the client logs you see this. It was writing then timed out.

     

    Is this a RAC or single Instance?

     

    01:39:12.658 [29981] <2> int_WriteData: INF - writing buffer # 1 of size 262144
    01:39:39.376 [29981] <2> int_WriteData: INF - writing buffer # 1000 of size 262144

    01:40:07.354 [29981] <2> int_WriteData: INF - writing buffer # 2000 of size 262144
    01:56:00.281 [29981] <16> writeToServer: ERR - send() to server on socket failed: Connection timed out (110)
    01:56:00.281 [29981] <16> dbc_put: ERR - failed sending data to server
    01:56:00.281 [29981] <16> VxBSASendData: ERR - Could not do a bsa_put().

    01:56:00.281 [29981] <16> xbsa_SendData: ERR - VxBSASendData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <16> dbc_put: ERR - invalid handle received from the application
    01:56:00.320 [29981] <16> VxBSASendData: ERR - Could not do a bsa_put().
    01:56:00.320 [29981] <16> xbsa_SendData: ERR - VxBSASendData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <16> dbc_put: ERR - invalid handle received from the application
    01:56:00.320 [29981] <16> VxBSASendData: ERR - Could not do a bsa_put().
    01:56:00.320 [29981] <16> xbsa_SendData: ERR - VxBSASendData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <2> int_CloseImage: INF - Backup - closing <bk_2911_1_868757927>
    01:56:00.320 [29981] <16> writeToServer: ERR - send() to server on socket failed: Bad file descriptor (9)
    01:56:00.320 [29981] <16> write_LF_END_tarHeader: ERR - failed writing LF_END_U_LEN_FILE record on DATA socket
    01:56:00.320 [29981] <16> finishTarImage: ERR - write_LF_END_tarHeader() failed.
    01:56:00.320 [29981] <16> VxBSAEndData: ERR - EndData unable to bsa_finishTarImage().
    01:56:00.320 [29981] <16> xbsa_EndData: ERR - VxBSAEndData: Failed with error:
       Server Status:  Communication with the server has not been initiated or the server status has not been retrieved from the serve
    01:56:00.320 [29981] <16> int_CloseImage: ERR - Failed to process backup file <bk_2911_1_868757927>

  • If I understand correctly this backup is the only backup on this host.

  • Please attach debug text as a file next time. The post was held back by the anti spam system.

    Best Regrads

    Nicolai

  • Lets go back to basics and troubleshoot as if this is a new setup.

    Is master also the media server?
    Which OS and NBU version on the client?

    Is there a firewall between server(s) and clients?
    If so, ensure port 1556 is open in all directions.

    If Windows client, is Windows firewall disabled?
    Any security software that will prevent port connectivity or connect-back? (Check AV settings)

    Are NetBackup Client services and Symantec Pivate Branch Exchange service running on the client?

    Test port connectivity and hostname lookup as follows:

    1. Create bpcd log folder on client.
      Create bprd log folder on master if not already there followed by restart of NBU on the master.
    2. On master, run this command:
      bptestbpcd -client <client-name> -verbose -debug
      (Command is in ...netbackup\bin\admincmd)
      Check output for errors as well as client's bpcd log.
      If nothing is logged in bpcd, there is a firewall preventing connection.
      If there is separate media server, repeat above steps.
    3. On client, run this command:
      bpclntcmd -pn
      (Command is in ...netbackup\bin)
      Check output to see if master responds with client name.
      If not, check bprd on the master to see if client connection request was received. Look for IP address from client and how the master is resolving IP address to hostname.
      If nothing is logged, the client request did not get there.
      Check that client has correct hostname configured for Master server in NBU and OS level. Check at OS level on the client if Master server hostname is resolving to correct IP address.
      See : 

      How to verify name resolution for NetBackup (tm) systems, using the "bpclntcmd" command
      http://www.symantec.com/docs/TECH27430