Unable to release MDS allocation key

Karthick_S
Level 3
Certified

Media Manager daemons are not running even after bouncing (restarting) the NBU services.

# ./vmoprcmd -d
oprd returned abnormal status (96)
IPC Error: Daemon may not be running

LTID debug log

02:28:41.603 [7063] <16> InitLtidDeviceInfo: Drive XXXX is ACTIVE
02:29:11.603 [7063] <4> CheckShutdownWhileInit:

I am unable to release the allocations for this drive. I have also tried the reset media server option, but it did not work.


Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Forget about releasing MDS allocation for now...

What is wrong with the NBU daemons/processes?
Are the processes perhaps down because of low disk space?

Please show us the output of 'bpps -x'.

/opt/openv has 34% free space (13 GB).

bpps -x output:


NB Processes
------------
root 24332 1 0 02:47:58 ? 0:00 /usr/openv/netbackup/bin/bpcompatd
root 24288 1 0 02:47:54 ? 0:00 /usr/openv/pdde/pdag/bin/mtstrmd
root 24399 1 0 02:48:02 ? 0:01 /usr/openv/netbackup/bin/nbrmms
root 24117 1 0 02:47:42 ? 0:01 /usr/openv/netbackup/bin/nbdisco
root 24106 1 0 02:47:40 ? 0:00 /usr/openv/netbackup/bin/bpcd -standalone
p1wms2d2 18659 24106 0 03:22:21 ? 0:00 /usr/openv/netbackup/bin/bpcd -standalone
root 24450 1 0 02:48:09 ? 0:00 /usr/openv/netbackup/bin/nbsvcmon
root 24415 1 0 02:48:05 ? 0:05 /usr/openv/netbackup/bin/nbsl
root 24103 1 0 02:47:40 ? 0:00 /usr/openv/netbackup/bin/vnetd -standalone


MM Processes
------------
root 24314 1 0 02:47:57 ? 0:00 /usr/openv/volmgr/bin/ltid
root 24320 1 0 02:47:57 ? 0:00 vmd -v


Shared Symantec Processes
-------------------------
root 24085 1 0 02:47:34 ? 0:02 /opt/VRTSpbx/bin/pbx_exchange
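A rough way to spot which Media Manager daemons are missing from bpps -x output is sketched below. It assumes the Solaris "ps -ef" column layout shown above (USER PID PPID C STIME TTY TIME CMD). The expected set used in the example (ltid, vmd, avrd, tldd, acsd) is an assumption for a media server with TLD and ACS robots; the real list depends on the robot types and NetBackup version, and missing_processes is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Assumed Solaris "ps -ef"-style columns, as in the bpps -x output:
# USER PID PPID C STIME TTY TIME CMD [ARGS...]
# STIME may be "02:47:58" or "Oct 12", so it is matched lazily.
PROC_RE = re.compile(
    r"^\s*\S+\s+\d+\s+\d+\s+\d+\s+.+?\s+\S+\s+\d+:\d{2}\s+(\S+)"
)


def missing_processes(bpps_output: str, expected: Iterable[str]) -> List[str]:
    """Return the names in `expected` that have no matching process line."""
    running = set()
    for line in bpps_output.splitlines():
        match = PROC_RE.match(line)
        if match:
            # Keep only the executable's basename, e.g. ".../ltid" -> "ltid".
            running.add(match.group(1).rsplit("/", 1)[-1])
    return [name for name in expected if name not in running]
```

Fed the "MM Processes" section above with that assumed expected set, it reports avrd, tldd and acsd as missing, which matches the symptom: ltid and vmd are up, but none of the robot or drive daemons that ltid would normally start.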

Marianne

Is this a master or media server?

NONE of the daemons/processes that should be running on a master are running here.

If this is a media server, then there is a problem with comms to nbemm on the master server.

Let us first establish what we are troubleshooting here: master or media server?

This is a media server.

If you are referring to the EMM server, connectivity between the EMM server and the problem media server is good.

Marianne

It looks like an issue with EMM communication at the moment, or with the robot control host.

Can you run these commands on the master and look for output specific to this media server?

nbemmcmd -listhosts -verbose

nbemmcmd -getemmserver

Command completed successfully.

nbemmcmd -listhosts -verbose

shows active disk for the problem media server

nbemmcmd -getemmserver

On the media server, it shows the correct EMM server name.

We have ACS and TLD drives configured on the media server. Connectivity from the media server to the robot control host and to the ACSLS server is good.

Marianne

By the looks of it, you have a VERBOSE entry in vm.conf (vmd -v).

Please check /var/adm/messages (assuming this is a Solaris media server) for Media Manager startup.
Issues with process startup and/or comms with the ACSLS server and/or the robot control host should be logged there.

Genericus
Moderator
Moderator
   VIP   

If this is Solaris, as root at the prompt:

run tpconfig

hit Enter

select 2 for robot configuration; it will display the robots. Make sure only the correct robots are assigned.

Press Q to exit each screen back to the prompt. Make sure you only display and make no edits, because edits here will cause issues!

I have seen my Solaris media servers allocate multiple ACSLS servers and then fail to find the correct drives until the wrong ones were removed. Please verify your robots.

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Have you checked this technote: 

http://www.veritas.com/docs/000038499

Try enabling these logs by creating the following directories under /usr/openv/volmgr/debug:

   tpcommand
   robots
   ltid
   daemon
   reqlib

With VERBOSE in vm.conf, you should be able to get some error messages out of them.
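Creating those legacy log directories can be scripted; the sketch below is one way to do it in Python. It only creates whichever of the five directories listed above do not exist yet. The function name enable_volmgr_logs is hypothetical, and the assumption that the Media Manager daemons likely need a restart before they start writing to the new directories is not confirmed in this thread.

```python
from pathlib import Path
from typing import Iterable, List

# Legacy Media Manager log directories suggested in the reply above.
VOLMGR_LOGS = ["tpcommand", "robots", "ltid", "daemon", "reqlib"]


def enable_volmgr_logs(base: str = "/usr/openv/volmgr/debug",
                       names: Iterable[str] = VOLMGR_LOGS) -> List[Path]:
    """Create each log directory under `base` that does not exist yet.

    Returns only the directories that were newly created, so a second
    run reports nothing.
    """
    created = []
    for name in names:
        path = Path(base) / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Run as root on the media server, print(enable_volmgr_logs()) shows which directories were added.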

Hi Marianne & others,

Thanks for your reply.

Logs from /var/adm/messages

1) cat messages | grep vmd

Oct 12 20:59:19 vmd[24320]: [ID 631293 daemon.notice] terminating - successful (0)

Oct 12 20:59:19 vmd[24320]: [ID 715111 daemon.error] volume daemon terminating because it received a signal (15)

Oct 12 20:59:19 vmd[24320]: [ID 164182 daemon.error] terminating - daemon terminated (7)

Oct 12 21:01:13 vmd[2236]: [ID 617826 daemon.notice] ready for connections

2) LTID debug logs

21:01:12.034 [2230] <4> ltid: INITIATING

21:01:12.053 [2230] <2> mm_getnodename: cached_hostname, cached_method 3

21:01:12.080 [2230] <2> mm_getnodename:  (3) hostname <MEDIA SERVER NAME>(from mm_master_config.mm_server_name)

21:01:12.080 [2230] <4> ltid: EMM tuning parameters: emm_retries = 1, emm_connect_timeout = 20, emm_request_timeout = 300

21:01:12.080 [2230] <4> ltid: using Scan Host status interval 300

21:01:12.080 [2230] <4> ltid: auto_correct_paths set to 0

21:01:14.090 [2230] <4> InitLtidEMM: emmserver_name = <EMM SERVER NAME>

21:01:14.090 [2230] <4> InitLtidEMM: emmserver_port = 1556

21:01:14.090 [2230] <4> InitLtidEMM: ThisHost = <MEDIA SERVER NAME>

21:01:14.105 [2230] <4> RedirectNBLoggerToLegacyLog: Setting up redirection of VxUL messages to legacy log

21:01:14.109 [2230] <4> RedirectNBLoggerToLegacyLog: Verbosity from VxUL configuration: 1

21:01:14.109 [2230] <4> RedirectNBLoggerToLegacyLog: logmsg verbosity set: 1

21:01:14.118 [2230] <2> Orb::init: Enabling ORBNativeCharCodeSet UTF-8(Orb.cpp:708)

21:01:14.120 [2230] <2> Orb::init: initializing ORB EMMlib_Orb with: ltid -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static Resource_Factory '-ORBNativeCharCodeSet UTF-8'" -ORBSvcConfDirective "static PBXIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBDefaultInitRef '' -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb EMMlib_Orb'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(Orb.cpp:923)

21:01:14.132 [2230] <2> Orb::init: caching EndpointSelectorFactory(Orb.cpp:938)

21:01:14.134 [2230] <2> Orb::setOrbConnectTimeout: timeout seconds: 50(Orb.cpp:1570)

21:01:14.134 [2230] <2> Orb::setOrbRequestTimeout: timeout seconds: 1800(Orb.cpp:1579)

21:01:14.158 [2230] <4> InitLtidEMM: connected to EMM server

21:01:14.158 [2230] <2> Orb::setOrbRequestTimeout: timeout seconds: 300(Orb.cpp:1579)

21:01:14.158 [2230] <2> mm_getnodename: (0) hostname <MEDIA SERVER NAME>(from cached_hostname)

21:01:14.159 [2230] <2> retrieveLocalPatchVersion: Reading from /usr/openv/netbackup/version

21:01:14.159 [2230] <2> parsePatchVersionString: parsing = >7.6.0.3

<

21:01:14.159 [2230] <2> parsePatchVersionString: theRest = ><

21:01:14.200 [2230] <4> InitLtidEMM: Device Mappings version in EMM database is 1.120

21:01:14.212 [2230] <4> InitLtidEMM: Local device mapping is up-to-date

21:01:14.394 [2230] <4> InitLtidEMM: Releasing media server allocations

21:01:14.402 [2230] <2> Orb::init: Enabling ORBNativeCharCodeSet UTF-8(Orb.cpp:708)

21:01:14.402 [2230] <2> Orb::init: initializing ORB Default_CLIENT_Orb with: Unknown -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static Resource_Factory '-ORBNativeCharCodeSet UTF-8'" -ORBSvcConfDirective "static PBXIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBDefaultInitRef '' -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb Default_CLIENT_Orb'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(Orb.cpp:923)

21:01:14.404 [2230] <2> Orb::init: caching EndpointSelectorFactory(Orb.cpp:938)

21:01:17.791 [2230] <16> InitLtidDeviceInfo: Drive DLT7_53_DP_19 is ACTIVE

21:01:17.792 [2230] <16> InitLtidDeviceInfo: Drives are still assigned on this host

21:01:47.792 [2230] <4> CheckShutdownWhileInit:

21:01:48.041 [2230] <16> InitLtidDeviceInfo: Drive DLT7_53_DP_19 is ACTIVE

21:02:18.042 [2230] <4> CheckShutdownWhileInit:

21:02:18.042 [2230] <4> CheckShutdownWhileInit: Pid=1, Data.Pid=2582, Type=100, Param1=0, Param2=-1, LongParam=2106721584

21:02:18.290 [2230] <16> InitLtidDeviceInfo: Drive DLT7_53_DP_19 is ACTIVE

3) Neither nbrbutil -resetMediaServer <mediaserver> nor nbrbutil -releaseMDS released the allocation for DLT7_53_DP_19.
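The ltid debug log in item 2 shows ltid looping: every 30 seconds or so it reports DLT7_53_DP_19 as ACTIVE and goes back through CheckShutdownWhileInit. A small, hypothetical helper for counting those reports per drive in a longer log is sketched below; it relies only on the "InitLtidDeviceInfo: Drive <name> is ACTIVE" wording seen in this thread.

```python
import re
from collections import Counter

# Matches the message wording seen in the ltid debug log above.
ACTIVE_RE = re.compile(r"InitLtidDeviceInfo: Drive (\S+) is ACTIVE")


def active_drives(log_text: str) -> Counter:
    """Count how often ltid reported each drive as ACTIVE during init."""
    return Counter(ACTIVE_RE.findall(log_text))
```

A drive whose count keeps growing across retries is the one still holding an allocation, which here points at DLT7_53_DP_19.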

Marianne
Grepping for vmd in messages does not help: the problem is not vmd but the rest of the Media Manager processes, which are not starting.
Please post all entries in the messages file from 21:00 onwards.
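Pulling every messages entry from a given time onwards, rather than grepping for one daemon, could look like the sketch below. It assumes the usual syslog "Mon DD HH:MM:SS" prefix seen in this thread. Because syslog omits the year, an arbitrary placeholder year (a leap year, so "Feb 29" still parses) is supplied; entries_since is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, Iterator

FMT = "%Y %b %d %H:%M:%S"


def entries_since(lines: Iterable[str], start: str,
                  year: int = 2000) -> Iterator[str]:
    """Yield syslog lines stamped at or after `start` ("Mon DD HH:MM:SS").

    `year` is only a placeholder: syslog lines carry no year, and adding
    one keeps datetime.strptime unambiguous.
    """
    cutoff = datetime.strptime(f"{year} {start}", FMT)
    for line in lines:
        parts = line.split(None, 3)  # month, day, time, rest of the entry
        if len(parts) < 4:
            continue
        try:
            stamp = datetime.strptime(f"{year} {' '.join(parts[:3])}", FMT)
        except ValueError:
            continue  # not a timestamped syslog line
        if stamp >= cutoff:
            yield line.rstrip("\n")
```

For example, iterating over open("/var/adm/messages") with start "Oct 12 21:00:00" would surface the ltid "can not be started while resources are assigned to the host" entry that a vmd-only grep misses.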

/var/adm/messages

Oct 12 20:02:59 inetd[5532]: [ID 317013 daemon.notice] bgssd[25087] from 150.234.116.184 53306
Oct 12 20:59:19 vmd[24320]: [ID 631293 daemon.notice] terminating - successful (0)
Oct 12 20:59:19 vmd[24320]: [ID 715111 daemon.error] volume daemon terminating because it received a signal (15)
Oct 12 20:59:19 vmd[24320]: [ID 164182 daemon.error] terminating - daemon terminated (7)
Oct 12 21:01:13 vmd[2236]: [ID 617826 daemon.notice] ready for connections
Oct 12 21:01:17 ltid[2230]: [ID 533143 daemon.error] ltid can not be started while resources are assigned to the host.
Oct 12 21:03:01 inetd[5532]: [ID 317013 daemon.notice] bgssd[4375] from 150.234.116.184 39884
Oct 12 22:03:03 inetd[5532]: [ID 317013 daemon.notice] bgssd[13928] from 150.234.116.184 57575
Oct 12 23:03:05 inetd[5532]: [ID 317013 daemon.notice] bgssd[24184] from 150.234.116.184 39725
Oct 12 23:03:46 inetd[5532]: [ID 317013 daemon.notice] bgssd[24404] from 150.234.116.184 44176
Oct 12 23:03:46 inetd[5532]: [ID 317013 daemon.notice] bgssd[24406] from 150.234.116.184 44206
Oct 12 23:03:46  inetd[5532]: [ID 317013 daemon.notice] bgssd[24411] from 150.234.116.184 44251

Marianne
Can you show us the MDS allocation portion of nbrbutil -dump?

Also check for *lock* files in volmgr/misc.
If there are any, stop the running processes, delete the lock files and start NBU again.
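The lock-file check can be done with a simple glob. The sketch below lists regular files matching *lock* in /usr/openv/volmgr/misc and only deletes them when dry_run is explicitly set to False. The helper name is hypothetical, and, per the advice above, deletion should only happen after the NetBackup processes have been stopped.

```python
from pathlib import Path
from typing import List


def clear_lock_files(misc_dir: str = "/usr/openv/volmgr/misc",
                     dry_run: bool = True) -> List[Path]:
    """Find regular files matching *lock* in misc_dir.

    With dry_run=True (the default) nothing is touched; with
    dry_run=False the files are deleted. Only use dry_run=False after
    all NetBackup processes are stopped.
    """
    locks = sorted(p for p in Path(misc_dir).glob("*lock*") if p.is_file())
    if not dry_run:
        for path in locks:
            path.unlink()
    return locks
```

Running it first with the default dry_run=True shows what would be removed before anything is deleted.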

MdsAllocation: allocationKey=55846077 jobType=1 mediaKey=4035269 mediaId=W91587 driveKey=2001252 driveName=DLT7_53_DP_19 drivePath=/dev/rmt/7cbn stuName=DP19_VTL masterServerName=wspebrms01-ebr mediaServerName=<media server name> ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
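The allocation key needed for nbrbutil -releaseMDS is the allocationKey field of this line. A hypothetical parser that splits an MdsAllocation line into key=value pairs and builds the release command (the same nbrbutil -releaseMDS <allocationKey> form used later in this thread) is sketched below. It assumes the values contain no spaces, which holds for real output but not for the redacted "<media server name>" placeholder above.

```python
from typing import Dict, List


def parse_mds_allocation(line: str) -> Dict[str, str]:
    """Split an 'MdsAllocation: key=value key=value ...' line into a dict."""
    _, _, body = line.partition("MdsAllocation:")
    fields = {}
    for token in body.split():
        key, sep, value = token.partition("=")
        if sep:  # keep empty values such as "ndmpTapeServerName="
            fields[key] = value
    return fields


def release_command(fields: Dict[str, str]) -> List[str]:
    """Build (but do not run) the nbrbutil call for this allocation."""
    return ["nbrbutil", "-releaseMDS", fields["allocationKey"]]
```

For the line above this yields allocationKey 55846077, driveName DLT7_53_DP_19 and mediaId W91587, i.e. the command nbrbutil -releaseMDS 55846077.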

Marianne
Is this the only MDS allocation?
If so, have you tried nbrbutil -resetall?
If not, have you tried to release the allocation key explicitly?

Did you look for lock files on the media server?

I have cleared all lock files and started NBU. Sorry, I forgot to mention this in my previous reply.

This is the only MDS allocation for this media server. I tried releasing this allocation explicitly, but it did not work.

I haven't run nbrbutil -resetall yet. Please advise whether it will affect other jobs running on the master.

Marianne
Resetall will kill all running jobs.

No more ideas from my side, other than being curious to see the exact nbrbutil command you ran and whether it produced any output.

Thanks very much for your suggestions.

I have requested downtime to reset all nbrb allocations on the master server. I will keep this thread posted.

nbrbutil -releaseMDS 55846077 

This did not help? What does it say?

I see that tape W91587 is in the drive. Have you managed to get this tape out of the drive? Use robtest, or eject the tape from the library console, and then see if you can release the allocation with nbrbutil -releaseMDS again.