cancel
Showing results for 
Search instead for 
Did you mean: 

bptm consistently fails and restore keeps running and fails finally

BharathAchar
Level 4

 

 

4 Aug, 2020 1:10:19 PM - begin Restore
4 Aug, 2020 1:10:20 PM - Info bprd (pid=6368) Found (12194522) files in (59) images for Restore Job ID 9370081.xxx
4 Aug, 2020 1:10:22 PM - number of images required: 1
4 Aug, 2020 1:10:22 PM - media needed: 6NO423
4 Aug, 2020 1:10:53 PM - restoring from image abcd0306.pqr.xyz.com_1579581936
4 Aug, 2020 1:10:53 PM - Info bpbrm (pid=5182) abcd0306.pqr.xyz.com is the host to restore to
4 Aug, 2020 1:10:53 PM - Info bpbrm (pid=5182) telling media manager to start restore on client
4 Aug, 2020 1:10:54 PM - Info bpbrm (pid=5182) spawning a brm child process
4 Aug, 2020 1:10:54 PM - Info bpbrm (pid=5182) child pid: 5194
4 Aug, 2020 1:10:54 PM - connecting
4 Aug, 2020 1:10:54 PM - Info bpbrm (pid=5194) start tar on client
4 Aug, 2020 1:10:54 PM - Info tar (pid=23413) Restore started.
4 Aug, 2020 1:10:54 PM - connected; connect time: 0:00:00
4 Aug, 2020 1:10:54 PM - requesting resource  6NO423
4 Aug, 2020 1:10:55 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:10:55 PM - started process bptm (pid=5184)
4 Aug, 2020 1:10:55 PM - mounting 6NO423
4 Aug, 2020 1:10:55 PM - granted resource  6NO423
4 Aug, 2020 1:10:55 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:10:56 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:10:56 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:10:57 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423
4 Aug, 2020 1:11:00 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:11:00 PM - started process bptm (pid=5184)
4 Aug, 2020 1:11:00 PM - mounting 6NO423
4 Aug, 2020 1:11:00 PM - granted resource  6NO423
4 Aug, 2020 1:11:00 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:11:00 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:11:00 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:11:01 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423
4 Aug, 2020 1:11:05 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:11:05 PM - started process bptm (pid=5184)
4 Aug, 2020 1:11:05 PM - mounting 6NO423
4 Aug, 2020 1:11:05 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:11:05 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:11:05 PM - granted resource  6NO423
4 Aug, 2020 1:11:05 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:11:06 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423
4 Aug, 2020 1:11:09 PM - granted resource  6NO423
4 Aug, 2020 1:11:09 PM - granted resource  HP.ULTRIUM6-SCSI.003
4 Aug, 2020 1:11:10 PM - Info bptm (pid=5184) Waiting for mount of media id 6NO423 (copy 1) on server abcd0305.pqr.xyz.com.
4 Aug, 2020 1:11:10 PM - started process bptm (pid=5184)
4 Aug, 2020 1:11:10 PM - mounting 6NO423
4 Aug, 2020 1:11:10 PM - Info bptm (pid=5184) INF - Waiting for mount of media id 6NO423 on server abcd0305.pqr.xyz.com for reading.
4 Aug, 2020 1:11:10 PM - Error bptm (pid=5184) error requesting media, TpErrno = File name already exists
4 Aug, 2020 1:11:11 PM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.003:Monthly:6NO423

 

4 Aug, 2020 7:01:02 AM - Error bptm (pid=5450) error requesting media, TpErrno = File name already exists
4 Aug, 2020 7:01:02 AM - current media 6NO423 complete, requesting next media HP.ULTRIUM6-SCSI.004:Monthly:6NO423
4 Aug, 2020 7:01:10 AM - Error bptm (pid=5450) media manager terminated by parent process
4 Aug, 2020 7:01:10 AM - Info bpbrm (pid=5322) got ERROR 82 from media manager
4 Aug, 2020 7:01:10 AM - Info bpbrm (pid=5322) terminating bpbrm child 5459 jobid=9368355
4 Aug, 2020 7:01:10 AM - Warning bprd (pid=24365) Restore must be resumed prior to first image expiration on Wed 20 Jan 2021 10:15:36 AM IST
4 Aug, 2020 7:01:10 AM - end Restore; elapsed time 16:26:37

 

1 ACCEPTED SOLUTION

Accepted Solutions

BharathAchar
Level 4

It appears to be bprd was hung,

Restarted the Netbackup services on master server, didn't help.

Rebooted the master server, did help and issue was resolved.

View solution in original post

9 REPLIES 9

quebek
Moderator
Moderator
   VIP    Certified

Hello

Can you try to stop and start back this media server services where this BPTM fails...? 

If this won't help, make sure no tape drives on this media server are not used... than go to

<install dir>\Veritas\NetBackup\db\media\tpreq\

and remove drive_* these are the files - I think - it is complaining on....

 

BharathAchar
Level 4

bptm tail snippet from media server -

 

14:17:05.768 [12976] <2> db_end: Need to collect reply
14:17:05.774 [12976] <16> catch_signal: media manager terminated by parent process
14:17:05.774 [12976] <2> send_MDS_msg: MEDIA_DONE 0 9370164 0 6NO423 0 180
14:17:05.774 [12976] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 9370164 0 6NO423 0 180 )
14:17:05.774 [12976] <2> packageBptmResourceDoneMsg: returns 7
14:17:05.774 [12976] <2> JobInst::sendIrmMsg: returning
14:17:05.774 [12976] <2> send_MDS_msg: Error from sendIrmMsg, Master abcd0300.iind.intel.com, type 11, returned error -1
14:17:05.774 [12976] <2> send_brm_msg: ERROR 82
14:17:05.774 [12976] <2> catch_signal: EXITING with status 82
14:17:05.774 [12976] <2> cleanup: removing shared memory for mpx, shmid = 177537031

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

@BharathAchar 

You need to check bptm log on media server abcd0305.xxx for additional info.
Snippets do not help. All of the log entries related to this particular job PID (e.g. 5184 in the 1st example) need to be examined. 

I suggest minimum level 3 logging for bptm process. 

Changed btm logging level to 3 on media server and deleted the tprequst here's the log

14:26:10.151 [14093] <2> Media_signal_poll: Read bpbrm message (../MultiResReq.cpp:2644)
14:26:10.151 [14093] <2> read_brm_msg: SUSPEND RESTORE abcd0306.pqr.xyz.com_1579581936
14:26:10.151 [14093] <2> KILL_MM_CHILD: Sending SIGUSR2 (kill) to child 14102 (../mpx.c:1769)
14:26:10.151 [14093] <2> wait_for_sigcld: waiting for child to exit, timeout is 1800
14:26:10.152 [14102] <2> Media_child_dispatch_signal: calling child_catch_signal for 12 (../child.c:3595) delay 0 seconds
14:26:10.164 [14093] <2> send_brm_msg: EXIT abcd0306.pqr.xyz.com_1579581936 157
14:26:10.165 [14093] <2> bct_clear_active: Set BCTable entry 0 to inactive, ACTIVE_GC count decremented to 0
14:26:12.753 [14093] <2> Media_signal_poll: Read bpbrm message (../MultiResReq.cpp:2644)
14:26:12.753 [14093] <2> read_brm_msg: TERMINATE
14:26:12.753 [14093] <2> read_brm_socket: got TERMINATE from bpbrm, ACTIVE_GC = 0
14:26:12.753 [14093] <2> RequestInitialResources: Cancellation requested
14:26:12.754 [14093] <2> EndpointSelector_R2::endpoint_from_profile: epsr2: Reusing endpoint, target: 10.223.8.148, source: ANY(Endpoint_Selector.cpp:751)
14:26:12.754 [14093] <2> EndpointSelector_R2::performCallUsingEndpoint: epsr2: insecure invocation->perform_call(...) to host 10.223.8.148 SUCCESS(Endpoint_Selector.cpp:1331)
14:26:12.757 [14093] <16> RequestInitialResources: ../MultiResReq.cpp:2720 resource request failed [150]
14:26:12.757 [14093] <2> RequestInitialResources: Requesting cancellation from jobmgr
14:26:12.757 [14093] <2> EndpointSelector_R2::endpoint_from_profile: epsr2: Reusing endpoint, target: 10.223.8.148, source: ANY(Endpoint_Selector.cpp:751)
14:26:12.757 [14093] <2> EndpointSelector_R2::performCallUsingEndpoint: epsr2: insecure invocation->perform_call(...) to host 10.223.8.148 SUCCESS(Endpoint_Selector.cpp:1331)
14:26:12.760 [14093] <2> RequestInitialResources: returning
14:26:12.760 [14093] <4> nbjm_media_request: Error from RequestMultipleResources, Master abcd0300.pqr.xyz.com, error 150, resourceAllocated 0
14:26:12.760 [14093] <4> nbjm_media_request: Shutdown requested
14:26:12.760 [14093] <2> catch_signal: found no active backup control table, ACTIVE_GC = 0
14:26:14.761 [14093] <8> vnet_get_user_credential_path: [vnet_vxss.c:1621] status 35 0x23
14:26:14.761 [14093] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:4096] vnet_get_user_credential_path failed 35 0x23
14:26:14.761 [14093] <2> ConnectionCache::connectAndCache: Acquiring new connection for host abcd0300.pqr.xyz.com, query type 1
14:26:14.763 [14093] <2> vnet_pbxConnect_ex: pbxConnectExEx Succeeded
14:26:14.784 [14093] <2> logconnections: PROXY CONNECT FROM 10.223.8.153.56710 TO 10.223.8.148.1556 fd = 17
14:26:14.784 [14093] <2> logconnections: BPDBM CONNECT FROM 127.0.0.1.34138 TO 127.0.0.1.44865 fd = 17
14:26:14.790 [14093] <2> db_end: Need to collect reply
14:26:14.798 [14093] <16> catch_signal: media manager terminated by parent process
14:26:14.798 [14093] <2> send_MDS_msg: MEDIA_DONE 0 9370164 0 6NO423 0 180
14:26:14.798 [14093] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 9370164 0 6NO423 0 180 )
14:26:14.798 [14093] <2> packageBptmResourceDoneMsg: returns 7
14:26:14.798 [14093] <2> JobInst::sendIrmMsg: returning
14:26:14.798 [14093] <2> send_MDS_msg: Error from sendIrmMsg, Master abcd0300.pqr.xyz.com, type 11, returned error -1
14:26:14.798 [14093] <2> send_brm_msg: ERROR 82
14:26:14.798 [14093] <2> catch_signal: EXITING with status 82
14:26:14.798 [14093] <2> cleanup: removing shared memory for mpx, shmid = 177569799
abcd0305:/usr/openv/netbackup/logs/bptm#

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Log snippets do not help. 

Especially if it only includes the suspend and terminate part due to something that happened previously. 

here's the log from newly started restore job

 

14:33:14.615 [17085] <16> mount_open_media: error requesting media, TpErrno = File name already exists
14:33:14.615 [17085] <4> create_tpreq_file: symlink to path /dev/nst0
14:33:14.615 [17085] <2> manage_drive_before_load: report_attr, fl1 0x00000000, fl2 0x00000000
14:33:14.615 [17085] <2> drivename_write: Called with mode 1
14:33:14.615 [17085] <2> drivename_unlock: unlocked
14:33:14.615 [17085] <2> drivename_close: Called for file HP.ULTRIUM6-SCSI.003
14:33:14.615 [17085] <8> vnet_get_user_credential_path: [vnet_vxss.c:1621] status 35 0x23
14:33:14.615 [17085] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:4096] vnet_get_user_credential_path failed 35 0x23
14:33:14.615 [17085] <2> ConnectionCache::connectAndCache: Acquiring new connection for host ba5lsbk0300.iind.intel.com, query type 1
14:33:14.617 [17085] <2> vnet_pbxConnect_ex: pbxConnectExEx Succeeded
14:33:14.632 [17085] <2> logconnections: PROXY CONNECT FROM 10.223.8.153.40945 TO 10.223.8.148.1556 fd = 0
14:33:14.633 [17085] <2> logconnections: BPDBM CONNECT FROM 127.0.0.1.40391 TO 127.0.0.1.35339 fd = 0
14:33:14.642 [17085] <2> db_end: Need to collect reply

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I trust that YOU are looking at more than the snippets you are sharing. 

Good luck.

quebek
Moderator
Moderator
   VIP    Certified

OK - so this is linux based machine...

I think this is your master issue

mount_open_media: error requesting media, TpErrno = File name already exists

in that case please go to  /usr/openv/netbackup/db/media/tpreq/
and run there ls command

remove any files found there - if no tape is being used...

what is the output from vmoprcmd? please share as well tpconfig -d

BharathAchar
Level 4

It appears to be bprd was hung,

Restarted the Netbackup services on master server, didn't help.

Rebooted the master server, did help and issue was resolved.