Forum Discussion
1) This phenomenon is very widespread in our NBU environment, and these processes always disappear automatically after a short time.
2) What I meant was: why were tldd/tldcd not designed as multi-threaded, given that a multi-threaded design can run much more efficiently than a single-threaded one?
I don't know why you have multiple tldcd processes. The only thing I can think of, as I mentioned, is that multiple paths 'might' be a reason ... If you want to investigate, you would need to start looking at logs or, probably a better starting point, strace/truss output on the PIDs of the tldcd processes to get some idea of what they might be doing.
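Before attaching strace/truss, it may also be worth checking how the extra tldcd processes relate to each other, for instance whether they share a parent PID. The sketch below is only an illustration and assumes a Linux media server (the /dev/sg path in the log suggests one) where /proc/<pid>/status exposes the Name and PPid fields. The helper names read_status and find_processes are made up for this example and are not part of NetBackup. Whether the extra processes turn out to be children of the main daemon is something to check, not something confirmed in this thread.

```python
import os

def read_status(pid_dir):
    """Parse /proc/<pid>/status into a dict; return None if the process is gone."""
    fields = {}
    try:
        with open(os.path.join(pid_dir, "status")) as fh:
            for line in fh:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
    except OSError:
        return None
    return fields

def find_processes(name, proc_root="/proc"):
    """Return a sorted list of (pid, ppid) for processes whose Name equals `name`."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        status = read_status(os.path.join(proc_root, entry))
        if status and status.get("Name") == name:
            found.append((int(entry), int(status.get("PPid", "0"))))
    return sorted(found)
```

Calling find_processes("tldcd") on the media server returns (pid, ppid) pairs. If the short-lived PIDs all list the long-running tldcd as their parent, that would point to the daemon spawning them itself rather than to duplicate daemons.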
tldcd was written a long time ago, back in the day when NBU mainly wrote to tape and was way less complex than today. Most, if not all, processes back then were single-threaded.
Multithreaded processes appeared with NBU 6, for example nbemm, nbrb, nbjm and nbpem, but for tldcd there is certainly no real need: it doesn't actually do much, it just sends a few commands to the robot every now and then, and it's not 'busy' like nbemm or nbjm. I suspect it's a bit of a case of, if it ain't broke ....
- mph999 6 years ago Level 6
A quick guide to logs, which the wonderful Marianne helped me put together ...
https://vox.veritas.com/t5/Articles/Quick-Guide-to-Setting-up-logs-in-NetBackup/ta-p/811951
- liuyl 6 years ago Level 6
From the debug log, it seems there really can be more than one process holding the same robot!
And each of them works fine without any problem! So why? Here is an example (processes 21923/21924):
22:45:07.679 [24560] <2> robotd_check_magic: Not using VxSS authentication.
22:45:07.679 [24560] <6> tldcd: process_request: ../tldcd.c.3034, process_request(), received command=1, from peername=shzycdb5, version 50
22:45:07.679 [24560] <5> tldcd:mount_unmount_drive: Processing MOUNT, TLD(1) drive 6, slot 18, barcode 000018L6 , vsn 0018L6
22:45:07.697 [21923] <5> tldcd:command_init: TLD(1) opening robotic path /dev/sg231
22:45:09.373 [24560] <6> tldcd:check_unit_attention: command_init to check for UA on robot 0
22:45:09.393 [24560] <3> tldcd:mode_sense: <../tldcd.c:7079> Device geometry: NumDrives = 12 at address 256
22:45:09.393 [24560] <3> tldcd:mode_sense: --> NumSlots = 708 at address 4096
22:45:09.393 [24560] <3> tldcd:mode_sense: --> NumTransports = 1 at address 1
22:45:09.393 [24560] <3> tldcd:mode_sense: --> NumIE = 24 at address 16
22:45:09.398 [24560] <6> tldcd:listen_loop: accept: newfd = 20, error = 0, timersig = 0
22:45:09.399 [24560] <4> peer_hostname_ipi: Connection from host jcyxfdb1, 10.131.24.154, port 63860
22:45:09.399 [24560] <4> peer_hostname_ipi: Connection to host jcbak, 10.131.33.60, port 1556
22:45:09.399 [24560] <2> robotd_check_magic: Not using VxSS authentication.
22:45:09.399 [24560] <6> tldcd: process_request: ../tldcd.c.3034, process_request(), received command=3, from peername=jcyxfdb1, version 50
22:45:09.399 [24560] <5> tldcd:mount_unmount_drive: Processing UNMOUNT, TLD(1) drive 7, slot 90, barcode 000090L6 , vsn 0090L6
22:45:09.417 [21924] <5> tldcd:command_init: TLD(1) opening robotic path /dev/sg231
22:45:10.533 [24560] <6> tldcd:check_unit_attention: command_init to check for UA on robot 0
22:45:10.554 [24560] <3> tldcd:mode_sense: <../tldcd.c:7079> Device geometry: NumDrives = 12 at address 256
22:45:10.554 [24560] <3> tldcd:mode_sense: --> NumSlots = 708 at address 4096
22:45:10.554 [24560] <3> tldcd:mode_sense: --> NumTransports = 1 at address 1
22:45:10.554 [24560] <3> tldcd:mode_sense: --> NumIE = 24 at address 16
22:45:10.556 [24560] <6> tldcd:inquiry: <../tldcd.c:6929> Read device table for QUANTUM Scalar i6000 745Q, type
22:45:23.394 [24560] <3> tldcd:mode_sense: --> NumTransports = 2 at address 1
22:45:23.394 [24560] <3> tldcd:mode_sense: --> NumIE = 16 at address 769
22:45:23.395 [24560] <6> tldcd:inquiry: <../tldcd.c:6929> Read device table for IBM 03584L32 A470, type 8, slots 600 and ie 16
22:45:23.395 [24560] <4> MmDeviceMappings::GetRobotAttributes
: <../../lib/MmDeviceMappings.cpp:976> search robot list (length=456) for IBM 03584L32, type 8
22:45:23.395 [24560] <4> MmDeviceMappings::GetRobotAttributes
: <../../lib/MmDeviceMappings.cpp:1229> found match: "IBM 3584" IBM 03584L
22:45:23.395 [24560] <5> tldcd:inquiry: inquiry() function processing library IBM 03584L32 A470:
22:45:23.395 [24560] <6> tldcd:check_unit_attention: initalizing robot 6
22:45:23.395 [24560] <6> tldcd:initialize_robot_hardware: command_init on robot 6
22:45:23.395 [24560] <5> tldcd:command_init: TLD(6) opening robotic path MISSING_PATH:2U10616001
22:45:23.395 [24560] <6> tldcd:listen_loop: accept: newfd = -1, error = 0, timersig = 1
22:45:23.482 [21924] <6> tldcd:tape_in_drive: valid = 1, sel = 4185, barcode = (000090L6 )
22:45:23.482 [21924] <6> tldcd:read_element_status_drive: RES drive 7
22:45:23.613 [21924] <6> tldcd:read_element_status_slot: RES storage element 90
22:45:23.744 [21924] <5> tldcd:move_medium: TLD(1) initiating MOVE_MEDIUM from addr 262 to addr 4185
22:45:31.853 [21924] <5> tldcd:tld_main: TLD(1) closing/unlocking robotic path
- mph999 6 years ago Level 6
I tested this: there is in fact only one tldcd process, even for two robots.
I can only imagine something is a little messed up in the config. Did you try deleting the misc dir as I suggested previously?
If that doesn't help, I'd delete and re-add the devices.
I would use nbemmcmd -deletealldevices -allrecords
That will delete all libraries and tape drives; then just re-add them and re-inventory.
- liuyl 6 years ago Level 6
But I noticed that the two tldcd processes did not, in fact, take actions at the same point in time, even though they were concurrently holding the same robot!
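One way to check this kind of observation against a longer log is to extract, per PID, when the robotic path was opened and closed. The sketch below is a rough illustration that assumes the tldcd debug log keeps the line layout shown in the excerpt above (time, [PID], <level>, message) and the two message fragments "opening robotic path" and "closing/unlocking robotic path". The function names are made up for this example. The excerpt has gaps, so a PID with no closing line here may simply have closed the path in a part of the log that was not pasted.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 22:45:07.697 [21923] <5> tldcd:command_init: TLD(1) opening robotic path /dev/sg231
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) \[(\d+)\] <(\d+)> (.*)$")

def parse_line(line):
    """Return (millis_since_midnight, pid, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hh, mm, ss, ms, pid, _level, msg = m.groups()
    millis = ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(ms)
    return millis, int(pid), msg

def path_hold_intervals(lines):
    """Return (closed, still_open).

    closed maps pid -> list of (open_ms, close_ms) pairs;
    still_open maps pid -> open_ms for opens with no matching close in `lines`.
    """
    still_open = {}
    closed = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines and anything else that does not fit the layout
        millis, pid, msg = parsed
        if "opening robotic path" in msg:
            still_open[pid] = millis
        elif "closing/unlocking robotic path" in msg and pid in still_open:
            closed[pid].append((still_open.pop(pid), millis))
    return dict(closed), still_open

# A few lines copied from the excerpt in this thread.
SAMPLE = [
    "22:45:07.697 [21923] <5> tldcd:command_init: TLD(1) opening robotic path /dev/sg231",
    "22:45:09.417 [21924] <5> tldcd:command_init: TLD(1) opening robotic path /dev/sg231",
    "22:45:23.482 [21924] <6> tldcd:read_element_status_drive: RES drive 7",
    "22:45:31.853 [21924] <5> tldcd:tld_main: TLD(1) closing/unlocking robotic path",
]

closed, still_open = path_hold_intervals(SAMPLE)
for pid, spans in closed.items():
    for start, end in spans:
        print(f"pid {pid} held the path for {end - start} ms")
for pid in still_open:
    print(f"pid {pid} opened the path; no close line in this excerpt")
```

Run over the full log file rather than these four lines, the intervals show whether two PIDs really had /dev/sg231 open at overlapping times, and the lines between each open and close show when each one actually issued commands.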