Varma_Chiluvuri
12 years ago

RMAN Backup failing after backup "begin writing"

03/04/2013 05:02:50 - Info nbjm (pid=28573784) starting backup job (jobid=1002560) for client sapm6pdrv-nb, policy ORA_RMAN_GRV_M6P, schedule Default-Application-Backup
03/04/2013 05:02:50 - Info nbjm (pid=28573784) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1002560, request id:{AC5A40B6-84B2-11E2-B1C5-02AA35650000})
03/04/2013 05:02:50 - requesting resource nbu-media-grv-nb-hcart3-robot-tld-3
03/04/2013 05:02:50 - requesting resource sun107-nb.NBU_CLIENT.MAXJOBS.sapm6pdrv-nb
03/04/2013 05:02:50 - requesting resource sun107-nb.NBU_POLICY.MAXJOBS.ORA_RMAN_GRV_M6P
03/04/2013 05:02:50 - Waiting for scan drive stop STK.T10000A.015, Media server: nbu-media-grv-nb
03/04/2013 05:02:51 - granted resource  sun107-nb.NBU_CLIENT.MAXJOBS.sapm6pdrv-nb
03/04/2013 05:02:51 - granted resource  sun107-nb.NBU_POLICY.MAXJOBS.ORA_RMAN_GRV_M6P
03/04/2013 05:02:51 - granted resource  I00184
03/04/2013 05:02:51 - granted resource  STK.T10000A.015
03/04/2013 05:02:51 - granted resource  nbu-media-grv-nb-hcart3-robot-tld-3
03/04/2013 05:02:52 - estimated 0 kbytes needed
03/04/2013 05:02:52 - Info nbjm (pid=28573784) started backup (backupid=sapm6pdrv-nb_1362391372) job for client sapm6pdrv-nb, policy ORA_RMAN_GRV_M6P, schedule Default-Application-Backup on storage unit nbu-media-grv-nb-hcart3-robot-tld-3
03/04/2013 05:02:55 - mounting I00184
03/04/2013 05:02:56 - Info bpbrm (pid=19923062) sapm6pdrv-nb is the host to backup data from
03/04/2013 05:02:56 - Info bpbrm (pid=19923062) telling media manager to start backup on client
03/04/2013 05:02:56 - Info bptm (pid=11206736) using 262144 data buffer size
03/04/2013 05:02:56 - Info bptm (pid=11206736) using 64 data buffers
03/04/2013 05:02:57 - Info bpbrm (pid=19923062) spawning a brm child process
03/04/2013 05:02:57 - Info bpbrm (pid=19923062) child pid: 16842958
03/04/2013 05:02:58 - Info bpbrm (pid=19923062) sending bpsched msg: CONNECTING TO CLIENT FOR sapm6pdrv-nb_1362391372
03/04/2013 05:02:58 - Info bpbrm (pid=19923062) listening for client connection
03/04/2013 05:02:58 - connecting
03/04/2013 05:03:05 - Info bpbrm (pid=19923062) INF - Client read timeout = 3000
03/04/2013 05:03:11 - Info bpbrm (pid=19923062) accepted connection from client
03/04/2013 05:03:11 - connected; connect time: 0:00:00
03/04/2013 05:03:30 - mounted I00184; mount time: 0:00:35
03/04/2013 05:03:31 - positioning I00184 to file 113
03/04/2013 05:03:42 - positioned I00184; position time: 0:00:11
03/04/2013 05:03:42 - begin writing
03/04/2013 05:18:16 - Error bpbrm (pid=16842958) client sapm6pdrv-nb EXIT STATUS = 6: the backup failed to back up the requested files
03/04/2013 05:18:16 - Info bpbkar (pid=0) done. status: 6
03/04/2013 05:18:16 - Info bpbrm (pid=19923062) sending message to media manager: STOP BACKUP sapm6pdrv-nb_1362391372
03/04/2013 05:18:17 - Info bpbrm (pid=19923062) media manager for backup id sapm6pdrv-nb_1362391372 exited with status 150: termination requested by administrator

03/04/2013 05:18:17 - end writing; write time: 0:14:35
the backup failed to back up the requested files  (6)
 


  • The OS backup completes successfully, but the database RMAN backup fails a few minutes after it starts writing. Please suggest a solution, and let me know if you need more details.

    NetBackup Master Server Version: 7.5.0.4

    NetBackup Media Server Version: 7.5.0.4

    NetBackup Client Version: 7.0

     

  • Questions:

    1. Is this a new or existing Oracle client?
    2. Have Oracle backups worked previously?
    3. Which steps were followed on Client to configure and link NBU and RMAN?
    4. Is NBU Policy type Oracle or SAP? (I'm wondering about bpbkar process in job details)
    5. Any reason why client is not on NBU 7.5.x as well?
    6. Seems you are using backup network for client. What is CLIENT_NAME in bp.conf on client? Have you hard-coded backup network name in NB_ORA_CLIENT in RMAN script?

    You need the following logs to troubleshoot:

    On Oracle client: dbclient (if log folder does not exist, create it and remember to chmod 777) as well as the RMAN output file.

    On Media server: bptm and bpbrm

    Please rename logs to reflect process name (e.g. dbclient.txt) and post as File attachments.

    Please also check  Client Connect and Client Read Timeouts on the media server - big databases normally need increased timeouts (e.g. 1800).

  • Answers:

    1. Existing Oracle (11g) client.

    2. Yes. This is a DR backup, so we run it only when required.

    3. The RMAN backup policy is the same as our other policies, and it is an old policy.

    4. Oracle

    5. No. We are planning to upgrade the client soon.

    6. # cat bp.conf
    SERVER = sun107-nb
    SERVER = nbu-media-grv-nb
    SERVER = sun140-nb
    CLIENT_NAME = sapm6pdrv-nb

    Attached dbclient log from the client and bptm, bpbrm logs from the media server.

    Client Connect and Client Read Timeouts on the media server:

    CLIENT_READ_TIMEOUT = 3000
    CLIENT_CONNECT_TIMEOUT = 600
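
As a quick cross-check of the bp.conf posted above, here is a minimal Python sketch that reads the `KEY = value` lines and reports which host the client treats as its master. It assumes the usual NetBackup convention that the first SERVER entry is the master server. The function name `parse_bp_conf` is made up for this illustration and is not part of NetBackup.

```python
def parse_bp_conf(text):
    """Return (servers, client_name) from bp.conf-style 'KEY = value' lines."""
    servers = []
    client_name = None
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not KEY = value
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "SERVER":
            servers.append(value)
        elif key == "CLIENT_NAME":
            client_name = value
    return servers, client_name


# Entries copied from the bp.conf in this thread.
SAMPLE = """\
SERVER = sun107-nb
SERVER = nbu-media-grv-nb
SERVER = sun140-nb
CLIENT_NAME = sapm6pdrv-nb
"""

servers, client_name = parse_bp_conf(SAMPLE)
master = servers[0]  # assumption: first SERVER entry is the master
print("master:", master)
print("other servers:", servers[1:])
print("client name:", client_name)
```

The client name reported here (sapm6pdrv-nb) is the name the master has to be able to resolve and reach, so it is worth comparing it with the name resolution on the master side.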

  • Comms issue between client and master server.
    Client is trying to connect to bprd (via vnetd) on the master.
    Client received no response from master:

     

    08:05:58.341 [19005588] <2> logconnections: BPRD CONNECT FROM 10.48.184.74.55164 TO 100.6.1.107.13724
    08:21:33.399 [18612362] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/18612362.0.1362402358
    08:21:33.400 [18612362] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/18612362.0.1362402358>
    08:21:33.400 [18612362] <16> CreateNewImage: ERR - serverResponse() failed

    Please extract bprd log entries on master server between 08:00 and 08:21.

    We need to see if backup request was received by master server from Client IP address 10.48.184.74 and how the master server interpreted this connection request.

    According to Client config the master server is sun107-nb with IP address 100.6.1.107.
    Is this correct?
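
To put numbers on the dbclient excerpt above, the following Python sketch parses the timestamps and measures the gap between the BPRD connect and the read timeout. The regular expression only reflects the line layout visible in this thread (time, [pid], <level>, function, message), and the sketch assumes both lines were logged on the same day. `parse_line` is a hypothetical helper, not a NetBackup tool.

```python
import re
from datetime import datetime

# Two lines copied from the dbclient log excerpt in this thread.
LOG = """\
08:05:58.341 [19005588] <2> logconnections: BPRD CONNECT FROM 10.48.184.74.55164 TO 100.6.1.107.13724
08:21:33.399 [18612362] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/18612362.0.1362402358
"""

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$")


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, pid, level, func, message = m.groups()
    return {
        "time": datetime.strptime(stamp, "%H:%M:%S.%f"),
        "pid": int(pid),
        "level": int(level),
        "func": func,
        "message": message,
    }


entries = [e for e in map(parse_line, LOG.splitlines()) if e is not None]
connect = next(e for e in entries if "BPRD CONNECT" in e["message"])
timeout = next(e for e in entries if "timed out" in e["message"])

elapsed = (timeout["time"] - connect["time"]).total_seconds()
print(f"{elapsed:.3f} seconds between connect and timeout")
```

The gap comes out slightly longer than the 900 seconds named in the error message, which fits a client that connected, then waited the full timeout without hearing back.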

  • @Marianne I found the below string in the master server BPRD log

     

    logconnections: BPCD CONNECT FROM 100.6.1.107.35229 TO 10.48.184.74.13724 fd = 5
    08:06:14.463 [42860712] <2> vnet_pbxConnect: ../../libvlibs/vnet_pbx.c.666: pbxSetAddrEx/pbxConnectEx return error 73:Connection reset by peer
    08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2034] vnet_pbxConnect() failed 18 0x12
    08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2035] save_errno 73 0x49
    08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2036] use_vnetd 1 0x1
    08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2037] cr->vcr_service vnetd
    08:06:14.463 [42860712] <8> async_connect: [vnet_connect.c:1630] do_service failed 18 0x12
    08:06:14.504 [42860712] <8>

  • The above tells us that the master is trying to connect to bpcd on the client (via vnetd).

    We need to see incoming connection FROM  10.48.184.74.55164 TO 100.6.1.107.13724.

    You can also check connectivity from Client to master as follows:

    bpclntcmd -pn

    Client should get response back from master server with client name that is known by master server.
    Evidence of this connection request can also be seen in master's bprd log.
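
When searching a large bprd log for the right connection, it can help to classify each CONNECT line by direction. The Python sketch below does that for the two lines quoted in this thread. The pattern is inferred from those two lines only, the IP addresses are the ones given in the posts, and `parse_connect` and `direction` are hypothetical helper names.

```python
import re

MASTER_IP = "100.6.1.107"   # sun107-nb, per the thread
CLIENT_IP = "10.48.184.74"  # sapm6pdrv, per the thread

CONNECT_RE = re.compile(
    r"(BPRD|BPCD) CONNECT FROM (\d+\.\d+\.\d+\.\d+)\.(\d+) "
    r"TO (\d+\.\d+\.\d+\.\d+)\.(\d+)"
)


def parse_connect(line):
    """Extract service and endpoints from a CONNECT line, or return None."""
    m = CONNECT_RE.search(line)
    if m is None:
        return None
    service, src_ip, src_port, dst_ip, dst_port = m.groups()
    return {
        "service": service,
        "src_ip": src_ip,
        "src_port": int(src_port),
        "dst_ip": dst_ip,
        "dst_port": int(dst_port),
    }


def direction(conn):
    """Label a parsed connection relative to the master and the client."""
    if conn["src_ip"] == CLIENT_IP and conn["dst_ip"] == MASTER_IP:
        return "client->master"
    if conn["src_ip"] == MASTER_IP and conn["dst_ip"] == CLIENT_IP:
        return "master->client"
    return "other"


lines = [
    "logconnections: BPRD CONNECT FROM 10.48.184.74.55164 TO 100.6.1.107.13724",
    "logconnections: BPCD CONNECT FROM 100.6.1.107.35229 TO 10.48.184.74.13724 fd = 5",
]

results = [parse_connect(line) for line in lines]
for conn in results:
    print(conn["service"], direction(conn), "dst port", conn["dst_port"])
```

Only the first line, the client->master BPRD connect, is the incoming request asked for above. The second line is the master's outbound attempt to reach bpcd on the client.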

     

  • Thanks Marianne. I checked communication from the client, and the result was not what I expected:

    # ./bpclntcmd -pn
    expecting response from server sun107-nb
    sapm6pdrv *NULL* 10.48.184.74 40426

    I then found that the master server had two entries pointing to the client:

    10.48.184.74    sapm6pdrv

    14.4.2.74       sapm6pdrv-nb

    I changed the entry as shown below, because the master server cannot reach the 14.x address: sapm6pdrv is on a different network.

    10.48.184.74    sapm6pdrv       sapm6pdrv-nb

    Now the communication is good and the backup completed successfully.

    # ./bpclntcmd -pn
    expecting response from server sun107-nb
    sapm6pdrv sapm6pdrv-nb 10.48.184.74 54313

    Thank you for the quick response and for guiding me in the right direction.
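
For anyone who wants to script the same check, here is a minimal Python sketch that reads saved `bpclntcmd -pn` output and flags the *NULL* case seen above. It assumes the last line carries four fields in the order peer name, client name known to the master, IP address and port, which is how the two outputs in this thread read. `parse_pn` is a hypothetical helper name.

```python
def parse_pn(output):
    """Parse the last line of `bpclntcmd -pn` output into named fields."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    fields = lines[-1].split()
    if len(fields) != 4:
        raise ValueError(f"unexpected bpclntcmd -pn output: {lines[-1]!r}")
    peer_name, client_name, ip, port = fields
    return {
        "peer_name": peer_name,
        # *NULL* means the master did not match the peer to a known client name
        "client_name": None if client_name == "*NULL*" else client_name,
        "ip": ip,
        "port": int(port),
    }


# Outputs copied from this thread, before and after the fix.
BEFORE = """\
expecting response from server sun107-nb
sapm6pdrv *NULL* 10.48.184.74 40426
"""

AFTER = """\
expecting response from server sun107-nb
sapm6pdrv sapm6pdrv-nb 10.48.184.74 54313
"""

for label, text in (("before", BEFORE), ("after", AFTER)):
    info = parse_pn(text)
    status = "OK" if info["client_name"] else "client name not recognised"
    print(label, info["peer_name"], info["ip"], status)
```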