Forum Discussion

EUB's avatar
EUB
Level 4
8 years ago

Communication errors

Hello

I got:

Win2008R2 Master server

Tape Library Quantum Scalar i500

Oracle(RHEL) client machine

3 defferent Windows 2008R2 client machine

 

All 3 Windows client backups fails with error 98:

28.07.2016 13:00:00 - Info nbjm(pid=5476) starting backup job (jobid=494375) for client kzwbackup01.eub.kz, policy Backup_SQL_MUOR, schedule Differential-Inc
28.07.2016 13:00:00 - Info nbjm(pid=5476) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=494375, request id:{7B803017-4C6B-4FF1-B78B-C26E4B9B84DD})
28.07.2016 13:00:00 - requesting resource kzwnb01-hcart3-robot-tld-0
28.07.2016 13:00:00 - requesting resource kzwnb01.eub.kz.NBU_CLIENT.MAXJOBS.kzwbackup01.eub.kz
28.07.2016 13:00:00 - requesting resource kzwnb01.eub.kz.NBU_POLICY.MAXJOBS.Backup_SQL_MUOR
28.07.2016 13:00:00 - granted resource kzwnb01.eub.kz.NBU_CLIENT.MAXJOBS.kzwbackup01.eub.kz
28.07.2016 13:00:00 - granted resource kzwnb01.eub.kz.NBU_POLICY.MAXJOBS.Backup_SQL_MUOR
28.07.2016 13:00:00 - granted resource 000106
28.07.2016 13:00:00 - granted resource HP.ULTRIUM6-SCSI.004
28.07.2016 13:00:00 - granted resource kzwnb01-hcart3-robot-tld-0
28.07.2016 13:00:00 - estimated 161158 Kbytes needed
28.07.2016 13:00:00 - Info nbjm(pid=5476) started backup (backupid=kzwbackup01.eub.kz_1469689200) job for client kzwbackup01.eub.kz, policy Backup_SQL_MUOR, schedule Differential-Inc on storage unit kzwnb01-hcart3-robot-tld-0
28.07.2016 13:00:00 - started process bpbrm (9716)
28.07.2016 13:00:00 - started
28.07.2016 13:00:05 - Info bpbrm(pid=9716) kzwbackup01.eub.kz is the host to backup data from
28.07.2016 13:00:05 - Info bpbrm(pid=9716) reading file list for client
28.07.2016 13:00:05 - connecting
28.07.2016 13:00:08 - Info bpbrm(pid=9716) starting bpbkar32 on client
28.07.2016 13:00:08 - connected; connect time: 0:00:03
28.07.2016 13:00:10 - Info bpbkar32(pid=488) Backup started
28.07.2016 13:00:10 - Info bptm(pid=4040) start
28.07.2016 13:00:10 - Info bptm(pid=4040) using 65536 data buffer size
28.07.2016 13:00:10 - Info bptm(pid=4040) setting receive network buffer to 263168 bytes
28.07.2016 13:00:10 - Info bptm(pid=4040) using 30 data buffers
28.07.2016 13:00:10 - Info bptm(pid=4040) start backup
28.07.2016 13:00:10 - Info bptm(pid=4040) backup child process is pid 10092.7356
28.07.2016 13:00:10 - Info bptm(pid=4040) Waiting for mount of media id 000106 (copy 1) on server kzwnb01.eub.kz.
28.07.2016 13:00:10 - Info bptm(pid=10092) start
28.07.2016 13:00:10 - mounting 000106
28.07.2016 13:00:46 - Info bpbkar32(pid=488) change journal NOT enabled for <D:\sql_backup\MUOR>
28.07.2016 13:31:11 - Error bptm(pid=4040) error requesting media, TpErrno = Robot operation failed
28.07.2016 13:31:11 - Warning bptm(pid=4040) media id 000106 load operation reported an error
28.07.2016 13:31:11 - current media 000106 complete, requesting next resource Any
28.07.2016 14:01:27 - Info bptm(pid=4040) EXITING with status 98 <----------
28.07.2016 14:01:32 - Info bpbkar32(pid=488) done. status: 98: error requesting media (tpreq)
28.07.2016 14:01:32 - end writing
error requesting media (tpreq)(98)
28.07.2016 14:01:35 - Info bpbrm(pid=9020) Starting delete snapshot processing
28.07.2016 14:01:39 - Info bpfis(pid=7628) Backup started
28.07.2016 14:01:39 - Critical bpbrm(pid=9020) from client kzwbackup01.eub.kz: cannot open C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\bpfis.fim.kzwbackup01.eub.kz_1469689200.1.0
28.07.2016 14:01:39 - Info bpfis(pid=7628) done. status: 1542
28.07.2016 14:01:39 - end operation
28.07.2016 14:01:39 - Info bpfis(pid=7628) done. status: 1542: An existing snapshot is no longer valid and cannot be mounted for subsequent operations

Oracle fails with error 6 (not every backup, just few times in a day)

RMAN logs shows next:

channel ORA_SBT_TAPE_1: input backup set: count=9309, stamp=916194140, piece=1
channel ORA_SBT_TAPE_1: starting piece 1 at 26-JUL-16
channel ORA_SBT_TAPE_1: backup piece /archive/archivelog/DB.LOGS.ERP.s9309.p1.t916194140
piece handle=2tr9o0qs_1_3 comment=API Version 2.0,MMS Version 5.0.0.0
channel ORA_SBT_TAPE_1: finished piece 1 at 26-JUL-16
channel ORA_SBT_TAPE_1: backup piece complete, elapsed time: 00:00:45
channel ORA_SBT_TAPE_1: input backup set: count=9297, stamp=916194139, piece=1
channel ORA_SBT_TAPE_1: starting piece 1 at 26-JUL-16
channel ORA_SBT_TAPE_1: backup piece /archive/archivelog/DB.LOGS.ERP.s9297.p1.t916194139
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 07/26/2016 10:45:24
ORA-19506: failed to create sequential file, name="2hr9o0qr_1_3", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: failed to communicate with resource requester
ORA-19600: input file is backup piece (/archive/archivelog/DB.LOGS.ERP.s9297.p1.t916194139)
ORA-19601: output file is backup piece (2hr9o0qr_1_3)

Recovery Manager complete.
Backup ended at 2016-07-26 10:45:33

 

 

Can u provide me how to determine cause of that errors, which logs could help me

RMAN script is old an allways works fine, communication between master and client were checked via script(telneted ports 1556,13724,13782 every 30 secs, and no single interrupt appers in script log)

Master server were restarted few times, updated drivers for tape drives

Another windows and oracle(rhel) backups works well(about 30 clients)

 

 

 

 

17 Replies

  • Here some logs from linux client(oracle):

    more /usr/openv/netbackup/logs/user_ops/dbext/logs/16210.0.1469728327
    Backup started Thu Jul 28 23:52:08 2016


    23:52:09 Initiating backup
    23:52:14 INF - Starting bpbrm
    23:52:18 INF - Data socket = kzwnb01.eub.kz.IPC:/tmp/vnet-23466469728336472524000005994-8CAH0F;d15070320c90dd5ffac89c4ea9cdd487;11;1800
    23:52:18 INF - Name socket = kzwnb01.eub.kz.IPC:/tmp/vnet-23467469728336748882000005994-J6NAJG;0a108b72858ec6d9bd551781ef6f47e2;11;1800
    23:52:18 INF - Job id = 494667
    23:52:18 INF - Backup id = KZLORA04.EUB.KZ_1469728332
    23:52:18 INF - Backup time = 1469728332
    23:52:19 INF - Policy name = KZLORA04_RMAN
    23:52:19 INF - Snapshot = 0
    23:52:19 INF - Frozen image = 0
    23:52:19 INF - Backup copy = 0
    23:52:20 INF - Master server = kzwnb01.eub.kz
    23:52:20 INF - Media server = kzwnb01.eub.kz
    23:52:20 INF - Multiplexing = 0
    23:52:20 INF - New data socket = kzwnb01.eub.kz.IPC:/tmp/vnet-23460469728336212972000005994-KxZxkF;705f029b4a0a3b6cfb04117e0c2fd54b;11;1800
    23:52:20 INF - Encrypt = 0
    23:52:21 INF - Use shared memory = 0
    23:52:21 INF - Compression = 1
    23:52:21 INF - Encrypt = 0
    23:52:21 INF - Keep logs = 28
    23:52:21 INF - Client read timeout = 11800
    23:52:22 INF - Media mount timeout = 0
    23:52:23 INF - Waiting for mount of media id 000031 on server kzwnb01.eub.kz for writing
    00:22:24 INF - Server status = 6
    00:22:25 INF - Backup by oracle on client KZLORA04.EUB.KZ using policy KZLORA04_RMAN, sched Default-Application-Backup:the backup failed to back up the requested files

    more /usr/openv/netbackup/logs/user_ops/dbext/logs/8917.0.1469744559
    Backup started Fri Jul 29 04:22:39 2016


    04:22:40 Initiating backup
    04:22:45 INF - Starting bpbrm
    04:22:49 INF - Data socket = kzwnb01.eub.kz.IPC:/tmp/vnet-09842469744567634243000006003-ej7Pmz;875d7aabfe6ff2b16a21976d8824cb61;11;1800
    04:22:49 INF - Name socket = kzwnb01.eub.kz.IPC:/tmp/vnet-09845469744567914147000006003-rEMi6z;1b6c020056facf0c0385460f839867c9;11;1800
    04:22:49 INF - Job id = 494897
    04:22:49 INF - Backup id = KZLORA04.EUB.KZ_1469744564
    04:22:50 INF - Backup time = 1469744564
    04:22:50 INF - Policy name = KZLORA04_RMAN
    04:22:50 INF - Snapshot = 0
    04:22:50 INF - Frozen image = 0
    04:22:51 INF - Backup copy = 0
    04:22:51 INF - Master server = kzwnb01.eub.kz
    04:22:51 INF - Media server = kzwnb01.eub.kz
    04:22:51 INF - Multiplexing = 0
    04:22:51 INF - New data socket = kzwnb01.eub.kz.IPC:/tmp/vnet-09841469744567347672000006003-hfeiCy;850f6d95072bbe80eb9a6f7d5c7e57ec;11;1800
    04:22:52 INF - Encrypt = 0
    04:22:52 INF - Use shared memory = 0
    04:22:52 INF - Compression = 1
    04:22:52 INF - Encrypt = 0
    04:22:52 INF - Keep logs = 28
    04:22:53 INF - Client read timeout = 11800
    04:22:53 INF - Media mount timeout = 0
    04:22:54 INF - Waiting for mount of media id 000031 on server kzwnb01.eub.kz for writing
    04:23:02 INF - Waiting for mount of media id 000039 on server kzwnb01.eub.kz for writing
    04:23:47 INF - Mounted media id 000039
    04:25:00 INF - Beginning backup on server kzwnb01.eub.kz of client KZLORA04.EUB.KZ
    04:25:09 INF - Server status = 0
    04:25:10 INF - Backup by oracle on client KZLORA04.EUB.KZ using policy KZLORA04_RMAN, sched Default-Application-Backup:the requested operation was successfully completed

    more /usr/openv/netbackup/logs/vnetd/log.072916
    00:00:21.127 [7174] <2> setup_debug_log: switched debug log file for vnetd
    00:22:23.544 [7174] <2> vnet_pbxAcceptSocket: Accepted sock[9] from 172.20.2.42:63851
    00:22:23.544 [4350] <8> ProcessRequests: [vnetd.c:635] msg VNETD ACCEPT FROM 172.20.2.42.63851 TO 172.20.1.66.1556 fd = 9
    00:22:23.549 [4350] <8> vnet_version_accept: [vnetd.c:1092] *actual_version 11 0xb
    00:22:23.590 [4350] <8> ProcessRequests: [vnetd.c:718] msg Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)
    00:22:23.590 [4350] <8> vnet_generate_hashes: [vnet_vnetd.c:1592] hash_str1 d98e9d5f31b49718fc042bc9d50f1083
    00:22:23.590 [4350] <8> vnet_generate_hashes: [vnet_vnetd.c:1593] hash_str2 9bd1cb593c3423ffe14d6a14714c1d3f
    00:22:23.590 [4350] <8> process_connect_forward_socket: [vnetd.c:1972] ipc_string /tmp/vnet-04350469730143590705000005997-AHrPJK
    00:22:23.796 [4350] <8> vnet_send_network_socket: [vnet_vnetd.c:1400] hash_str2 9bd1cb593c3423ffe14d6a14714c1d3f
    00:22:23.797 [4350] <8> ProcessRequests: [vnetd.c:720] status 0 0x0
    00:22:25.052 [7174] <2> vnet_pbxAcceptSocket: Accepted sock[9] from 172.20.2.42:63864
    00:22:25.052 [4362] <8> ProcessRequests: [vnetd.c:635] msg VNETD ACCEPT FROM 172.20.2.42.63864 TO 172.20.1.66.1556 fd = 9
    00:22:25.054 [4362] <8> vnet_version_accept: [vnetd.c:1092] *actual_version 11 0xb
    00:22:25.096 [7174] <2> vnet_pbxAcceptSocket: Accepted sock[9] from 172.20.2.42:63866
    00:22:25.097 [4364] <8> ProcessRequests: [vnetd.c:635] msg VNETD ACCEPT FROM 172.20.2.42.63866 TO 172.20.1.66.1556 fd = 9
    00:22:25.097 [4362] <8> ProcessRequests: [vnetd.c:718] msg Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)
    00:22:25.097 [4362] <8> vnet_generate_hashes: [vnet_vnetd.c:1592] hash_str1 576d95acd0f17fbe6bd549e0aab642ce
    00:22:25.097 [4362] <8> vnet_generate_hashes: [vnet_vnetd.c:1593] hash_str2 3c2f65ced6ec152d45b2eddfa580d20e
    00:22:25.097 [4362] <8> process_connect_forward_socket: [vnetd.c:1972] ipc_string /tmp/vnet-04362469730145097185000005997-6HhwGO
    00:22:25.099 [4364] <8> vnet_version_accept: [vnetd.c:1092] *actual_version 11 0xb
    00:22:25.141 [4364] <8> ProcessRequests: [vnetd.c:718] msg Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)
    00:22:25.141 [4364] <8> vnet_generate_hashes: [vnet_vnetd.c:1592] hash_str1 5b74da90edf0ef5a00c351fa99595d90
    00:22:25.142 [4364] <8> vnet_generate_hashes: [vnet_vnetd.c:1593] hash_str2 d22a419731e051d6da314c20bd947b20
    00:22:25.142 [4364] <8> process_connect_forward_socket: [vnetd.c:1972] ipc_string /tmp/vnet-04364469730145141749000005997-gk5KNO
    00:22:25.303 [4362] <8> vnet_send_network_socket: [vnet_vnetd.c:1400] hash_str2 3c2f65ced6ec152d45b2eddfa580d20e
    00:22:25.303 [4362] <8> ProcessRequests: [vnetd.c:720] status 0 0x0
    00:22:25.352 [4364] <8> vnet_send_network_socket: [vnet_vnetd.c:1400] hash_str2 d22a419731e051d6da314c20bd947b20
    00:22:25.352 [4364] <8> ProcessRequests: [vnetd.c:720] status 0 0x0
    02:26:27.258 [7174] <2> vnet_pbxAcceptSocket: Accepted sock[9] from 172.20.2.42:57941

    more /usr/openv/netbackup/logs/dbclient/log.072916
    00:07:24.487 [16210] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/16210.0.1469728327
    00:07:24.498 [16210] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
    00:07:24.498 [16210] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
    00:22:23.359 [16210] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/16210.0.1469728327
    00:22:23.359 [16210] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/16210.0.1469728327>
    00:22:23.359 [16210] <16> CreateNewImage: ERR - serverResponse() failed
    00:22:23.362 [16210] <16> VxBSACreateObject: ERR - Could not create new image with file /29rbkbg0_1_2.
    00:22:23.362 [16210] <16> xbsa_CreateObject: ERR - VxBSACreateObject: Failed with error:
    Server Status: Communication with the server has not been initiated or the server status has not been retrieved from the serve
    00:22:35.962 [16210] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.436: errno: 2 2 0x00000002
    00:22:35.962 [16210] <2> conf_update_time: ../../libvlibs/nbconf_glue.cpp.437: path: /home/oracle/bp.conf
    00:22:35.963 [16210] <4> sbtend: INF - --- END of SESSION ---
    00:22:35.963 [16210] <8> close_image: Session being terminated abnormally, cleaning up
    00:22:35.963 [16210] <4> close_image: INF - backup FAILED
    00:22:35.963 [16210] <4> close_image: INF ---- end of Backup ---

    00:22:35.963 [16210] <16> VxBSAEndTxn: ERR - Transaction ended with active Backup/Restore.
    00:22:35.963 [16210] <16> xbsa_EndTransaction: ERR - VxBSAEndTxn: Failed with error:
    The transaction was aborted.

     

     

    • Marianne's avatar
      Marianne
      Level 6

      Forget about Oracle backups for now (or start a new discussion for that).

      This is the error that you need to troubleshoot first:

      28.07.2016 13:31:11 - Error bptm(pid=4040) error requesting media, TpErrno = Robot operation failed
      28.07.2016 13:31:11 - Warning bptm(pid=4040) media id 000106 load operation reported an error

      Media id 000106 could not be loaded into the tape drive.

      Have you confirmed that the media and tape drive(s) are compatibile?
      E.g. LTO6 can write to LTO5 and LTO6 tapes.

      Are you seeing the same issue with other media id's or just this one?

      • EUB's avatar
        EUB
        Level 4

        Thanks for quick apply

        All tape media are LTO6, all drives are compatibile

        Errors appers with different drive ID an different media ID

        Tape library reports dont include robot fails(reports via library web-interface)

        Policies are old and works well.