Forum Discussion

BharathAchar's avatar
5 years ago

bptm consistently fails and restore keeps running and fails finally

 

 

4 Aug, 2020 1:10:19 PM - begin Restore
4 Aug, 2020 1:10:20 PM - Info bprd (pid=6368) Found (12194522) files in (59) images for Restore Job ID 9370081.xxx
4 Aug, 2020 1:10:22 PM - number of images required: 1
4 Aug, 2020 1:10:22 PM - media needed: 6NO423
4 Aug, 2020 1:10:53 PM - restoring from image abcd0306.pqr.xyz.com_1579581936
4 Aug, 2020 1:10:53 PM - Info bpbrm (pid=5182) abcd0306.pqr.xyz.com is the host to restore to
4 Aug, 2020 1:10:53 PM - Info bpbrm (pid=5182) telling media manager to start restore on client
4 Aug, 2020 1:10:54 PM - Info bpbrm (pid=5182) spawning a brm child process
4 Aug, 2020 1:10:54 PM - Info bpbrm (pid=5182) child pid: 5194
4 Aug, 2020 1:10:54 PM - connecting
4 Aug, 2020 1:10:54 PM - Info bpbrm (pid=5194) start tar on client
4 Aug, 2020 1:10:54 PM - Info tar (pid=23413) Restore started.
4 Aug, 2020 1:10:54 PM - connected; connect time: 0:00:00
4 Aug, 2020 1:10:54 PM - requesting resource  6NO423
4 Aug, 2020 1:10:55 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:10:55 PM - started process bptm (pid=5184)
4 Aug, 2020 1:10:55 PM - mounting 6NO423
4 Aug, 2020 1:10:55 PM - granted resource  6NO423
4 Aug, 2020 1:10:55 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:10:56 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:10:56 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:10:57 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423
4 Aug, 2020 1:11:00 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:11:00 PM - started process bptm (pid=5184)
4 Aug, 2020 1:11:00 PM - mounting 6NO423
4 Aug, 2020 1:11:00 PM - granted resource  6NO423
4 Aug, 2020 1:11:00 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:11:00 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:11:00 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:11:01 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423
4 Aug, 2020 1:11:05 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:11:05 PM - started process bptm (pid=5184)
4 Aug, 2020 1:11:05 PM - mounting 6NO423
4 Aug, 2020 1:11:05 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:11:05 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:11:05 PM - granted resource  6NO423
4 Aug, 2020 1:11:05 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:11:06 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423
4 Aug, 2020 1:11:09 PM - granted resource  6NO423
4 Aug, 2020 1:11:09 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:11:10 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:11:10 PM - started process bptm (pid=5184)
4 Aug, 2020 1:11:10 PM - mounting 6NO423
4 Aug, 2020 1:11:10 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:11:10 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:11:11 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423

 

4 Aug, 2020 7:01:02 AM - Error bptm (pid=5450) error requesting media, TpErrno = File name already exists
4 Aug, 2020 7:01:02 AM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.004:Monthly:6NO423
4 Aug, 2020 7:01:10 AM - Error bptm (pid=5450) media manager terminated by parent process
4 Aug, 2020 7:01:10 AM - Info bpbrm (pid=5322) got ERROR 82 from media manager
4 Aug, 2020 7:01:10 AM - Info bpbrm (pid=5322) terminating bpbrm child 5459 jobid=9368355
4 Aug, 2020 7:01:10 AM - Warning bprd (pid=24365) Restore must be resumed prior to first image expiration on Wed 20 Jan 2021 10:15:36 AM IST
4 Aug, 2020 7:01:10 AM - end Restore; elapsed time 16:26:37

 

  • It appears to be bprd was hung,

    Restarted the Netbackup services on master server, didn't help.

    Rebooted the master server, did help and issue was resolved.

  • Hello

    Can you try to stop and start back this media server services where this BPTM fails...? 

    If this won't help, make sure no tape drives on this media server are not used... than go to

    <install dir>\Veritas\NetBackup\db\media\tpreq\

    and remove drive_* these are the files - I think - it is complaining on....

     

  • bptm tail snippet from media server -

     

    14:17:05.768 [12976] <2> db_end: Need to collect reply
    14:17:05.774 [12976] <16> catch_signal: media manager terminated by parent process
    14:17:05.774 [12976] <2> send_MDS_msg: MEDIA_DONE 0 9370164 0 6NO423 0 180
    14:17:05.774 [12976] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 9370164 0 6NO423 0 180 )
    14:17:05.774 [12976] <2> packageBptmResourceDoneMsg: returns 7
    14:17:05.774 [12976] <2> JobInst::sendIrmMsg: returning
    14:17:05.774 [12976] <2> send_MDS_msg: Error from sendIrmMsg, Master abcd0300.iind.intel.com, type 11, returned error -1
    14:17:05.774 [12976] <2> send_brm_msg: ERROR 82
    14:17:05.774 [12976] <2> catch_signal: EXITING with status 82
    14:17:05.774 [12976] <2> cleanup: removing shared memory for mpx, shmid = 177537031

     

  • BharathAchar 

    You need to check bptm log on media server abcd0305.xxx for additional info.
    Snippets do not help. All of the log entries related to this particular job PID (e.g. 5184 in the 1st example) need to be examined. 

    I suggest minimum level 3 logging for bptm process. 

    • BharathAchar's avatar
      BharathAchar
      Level 4

      Changed btm logging level to 3 on media server and deleted the tprequst here's the log

      14:26:10.151 [14093] <2> Media_signal_poll: Read bpbrm message (../MultiResReq.cpp:2644)
      14:26:10.151 [14093] <2> read_brm_msg: SUSPEND RESTORE abcd0306.pqr.xyz.com_1579581936
      14:26:10.151 [14093] <2> KILL_MM_CHILD: Sending SIGUSR2 (kill) to child 14102 (../mpx.c:1769)
      14:26:10.151 [14093] <2> wait_for_sigcld: waiting for child to exit, timeout is 1800
      14:26:10.152 [14102] <2> Media_child_dispatch_signal: calling child_catch_signal for 12 (../child.c:3595) delay 0 seconds
      14:26:10.164 [14093] <2> send_brm_msg: EXIT abcd0306.pqr.xyz.com_1579581936 157
      14:26:10.165 [14093] <2> bct_clear_active: Set BCTable entry 0 to inactive, ACTIVE_GC count decremented to 0
      14:26:12.753 [14093] <2> Media_signal_poll: Read bpbrm message (../MultiResReq.cpp:2644)
      14:26:12.753 [14093] <2> read_brm_msg: TERMINATE
      14:26:12.753 [14093] <2> read_brm_socket: got TERMINATE from bpbrm, ACTIVE_GC = 0
      14:26:12.753 [14093] <2> RequestInitialResources: Cancellation requested
      14:26:12.754 [14093] <2> EndpointSelector_R2::endpoint_from_profile: epsr2: Reusing endpoint, target: 10.223.8.148, source: ANY(Endpoint_Selector.cpp:751)
      14:26:12.754 [14093] <2> EndpointSelector_R2::performCallUsingEndpoint: epsr2: insecure invocation->perform_call(...) to host 10.223.8.148 SUCCESS(Endpoint_Selector.cpp:1331)
      14:26:12.757 [14093] <16> RequestInitialResources: ../MultiResReq.cpp:2720 resource request failed [150]
      14:26:12.757 [14093] <2> RequestInitialResources: Requesting cancellation from jobmgr
      14:26:12.757 [14093] <2> EndpointSelector_R2::endpoint_from_profile: epsr2: Reusing endpoint, target: 10.223.8.148, source: ANY(Endpoint_Selector.cpp:751)
      14:26:12.757 [14093] <2> EndpointSelector_R2::performCallUsingEndpoint: epsr2: insecure invocation->perform_call(...) to host 10.223.8.148 SUCCESS(Endpoint_Selector.cpp:1331)
      14:26:12.760 [14093] <2> RequestInitialResources: returning
      14:26:12.760 [14093] <4> nbjm_media_request: Error from RequestMultipleResources, Master abcd0300.pqr.xyz.com, error 150, resourceAllocated 0
      14:26:12.760 [14093] <4> nbjm_media_request: Shutdown requested
      14:26:12.760 [14093] <2> catch_signal: found no active backup control table, ACTIVE_GC = 0
      14:26:14.761 [14093] <8> vnet_get_user_credential_path: [vnet_vxss.c:1621] status 35 0x23
      14:26:14.761 [14093] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:4096] vnet_get_user_credential_path failed 35 0x23
      14:26:14.761 [14093] <2> ConnectionCache::connectAndCache: Acquiring new connection for host abcd0300.pqr.xyz.com, query type 1
      14:26:14.763 [14093] <2> vnet_pbxConnect_ex: pbxConnectExEx Succeeded
      14:26:14.784 [14093] <2> logconnections: PROXY CONNECT FROM 10.223.8.153.56710 TO 10.223.8.148.1556 fd = 17
      14:26:14.784 [14093] <2> logconnections: BPDBM CONNECT FROM 127.0.0.1.34138 TO 127.0.0.1.44865 fd = 17
      14:26:14.790 [14093] <2> db_end: Need to collect reply
      14:26:14.798 [14093] <16> catch_signal: media manager terminated by parent process
      14:26:14.798 [14093] <2> send_MDS_msg: MEDIA_DONE 0 9370164 0 6NO423 0 180
      14:26:14.798 [14093] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 9370164 0 6NO423 0 180 )
      14:26:14.798 [14093] <2> packageBptmResourceDoneMsg: returns 7
      14:26:14.798 [14093] <2> JobInst::sendIrmMsg: returning
      14:26:14.798 [14093] <2> send_MDS_msg: Error from sendIrmMsg, Master abcd0300.pqr.xyz.com, type 11, returned error -1
      14:26:14.798 [14093] <2> send_brm_msg: ERROR 82
      14:26:14.798 [14093] <2> catch_signal: EXITING with status 82
      14:26:14.798 [14093] <2> cleanup: removing shared memory for mpx, shmid = 177569799
      abcd0305:/usr/openv/netbackup/logs/bptm#

       

      • Marianne's avatar
        Marianne
        Level 6

        Log snippets do not help. 

        Especially if it only includes the suspend and terminate part due to something that happened previously. 

  • It appears to be bprd was hung,

    Restarted the Netbackup services on master server, didn't help.

    Rebooted the master server, did help and issue was resolved.