Forum Discussion

liuyl
Level 6
6 years ago

The working mechanism of tldd/tldcd

1) Is the number of child tldcd processes determined by the number of job sessions, so that it can exceed the current number of robots?
2) Why don't tldd/tldcd handle their work with their own threads, rather than by running corresponding child processes?

For example:

root@jcbak:/#
root@jcbak:/# ps -ef|grep tldcd
root 21862 24560 0 22:44 ? 00:00:00 tldcd -v
root 21905 24560 0 22:45 ? 00:00:00 tldcd -v
root 21923 24560 0 22:45 ? 00:00:00 tldcd -v
root 21924 24560 0 22:45 ? 00:00:00 tldcd -v
root 21925 24560 0 22:45 ? 00:00:00 tldcd -v
root 21927 23207 0 22:45 pts/26 00:00:00 grep tldcd
root 24560 1 0 Aug29 ? 00:38:20 tldcd -v
root@jcbak:/#
root@jcbak:/# tpconfig -l|grep -i robot
Device Robot Drive Robot Drive Device Second
robot 0 - TLD - - - - /dev/sg230
robot 1 - TLD - - - - /dev/sg231
robot 4 - TLD - - - - /dev/sg234
root@jcbak:/#

22 Replies


    Both processes are single-threaded. They are not written as multi-threaded processes and therefore do not behave as such.

    tldcd runs only on the robot control host; it communicates directly with the robot. There should be one tldcd process per robot, if I recall correctly. Something seems to have gone a bit amiss with your setup:

    Stop ltid (/usr/openv/volmgr/bin/stopltid). If the processes don't disappear after a minute, kill them.

    Under /usr/openv/volmgr (or <install>\veritas\volmgr if on the lesser operating system ;0) ) there is a misc directory; delete it.

    Restart ltid (ltid -v). Hopefully it should then be fine (you will see an error about the misc directory, but it should be recreated).

    Does the robot have multiple paths to the host? If so, this might cause multiple tldcd processes, I think. I'm not 100% sure, but it's easy to check. You should only have one path, or maybe two if you want redundancy, but the second path should be configured as such in robtest.

    tldd runs on each machine that has tape drives; this includes the robot control host, so you have:

    Robot control host - tldcd and tldd (if it has drives, which it usually does)

    Robot control host - tldcd only, if no tape drives

    Hosts with tape drives only - tldd
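The placement rules above can be restated as a tiny lookup. This is a hypothetical helper (`tld_daemons` is not a NetBackup command or API); it only encodes which of the two daemons a host would be expected to run, given whether it is the robot control host and whether it has drives:

```python
from typing import List


def tld_daemons(is_robot_control_host: bool, has_tape_drives: bool) -> List[str]:
    """Hypothetical helper: which TLD daemons a host is expected to run."""
    daemons: List[str] = []
    if is_robot_control_host:
        daemons.append("tldcd")  # talks to the robot itself
    if has_tape_drives:
        daemons.append("tldd")   # runs on every host with tape drives
    return daemons


print(tld_daemons(True, True))    # ['tldcd', 'tldd']
print(tld_daemons(True, False))   # ['tldcd']
print(tld_daemons(False, True))   # ['tldd']
```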

    How it works: when a tape needs to be loaded or unloaded for a job, tldd passes the request to tldcd (over the network, if the RCH is a separate machine).
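A toy model of that request path, assuming nothing about NetBackup's real IPC: the class names mirror the daemons, but `Tldd`, `Tldcd`, `MediaRequest`, the host names and the `submit`/`run_once` methods are invented for illustration. It shows two tldd instances handing requests to one single-threaded tldcd, which works through them one at a time:

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class MediaRequest:
    # Hypothetical request shape; not the real tldd/tldcd protocol.
    origin: str  # host whose tldd sent the request
    action: str  # "load" or "unload"
    slot: int
    drive: int


class Tldcd:
    """Toy robot-control daemon: one queue, one loop, no worker threads."""

    def __init__(self) -> None:
        self.pending: Deque[MediaRequest] = deque()
        self.done: List[str] = []

    def submit(self, req: MediaRequest) -> None:
        self.pending.append(req)

    def run_once(self) -> None:
        # Handle requests strictly in arrival order, one after another.
        while self.pending:
            req = self.pending.popleft()
            # This is where the real daemon would send a SCSI command to the robot.
            self.done.append(f"{req.origin}: {req.action} slot={req.slot} drive={req.drive}")


class Tldd:
    """Toy drive daemon: forwards robot requests to the robot control host."""

    def __init__(self, host: str, rch: Tldcd) -> None:
        self.host = host
        self.rch = rch

    def request(self, action: str, slot: int, drive: int) -> None:
        # A plain method call here; a network hop if the RCH is another machine.
        self.rch.submit(MediaRequest(self.host, action, slot, drive))


rch = Tldcd()
media_a = Tldd("media-a", rch)  # hypothetical host names
media_b = Tldd("media-b", rch)
media_a.request("load", 5, 0)
media_b.request("unload", 3, 1)
rch.run_once()
print(rch.done)  # ['media-a: load slot=5 drive=0', 'media-b: unload slot=3 drive=1']
```

Because robot requests arrive only occasionally, a single serial loop like this is rarely the bottleneck, which is the point made later in the thread.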

    tldcd sends the actual SCSI CDB to the library, for example, 0xA5 for MOVE MEDIUM.
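For reference, this is roughly what a MOVE MEDIUM command looks like at the byte level. The sketch builds the 12-byte CDB following the SCSI Media Changer (SMC) layout; it is not NetBackup code, it only builds the bytes and talks to no device, and the element addresses in the example are made up (real ones are reported by the library itself, e.g. via READ ELEMENT STATUS):

```python
import struct

MOVE_MEDIUM = 0xA5  # SMC operation code for MOVE MEDIUM


def move_medium_cdb(transport: int, source: int, destination: int,
                    invert: bool = False, control: int = 0) -> bytes:
    """Return a 12-byte MOVE MEDIUM CDB (big-endian element addresses)."""
    return struct.pack(
        ">BBHHHHBB",
        MOVE_MEDIUM,
        0,                    # byte 1: reserved
        transport,            # bytes 2-3: medium transport (robot arm) element address
        source,               # bytes 4-5: source element address (e.g. a slot)
        destination,          # bytes 6-7: destination element address (e.g. a drive)
        0,                    # bytes 8-9: reserved
        1 if invert else 0,   # byte 10: bit 0 is INVERT
        control,              # byte 11: CONTROL
    )


# Hypothetical addresses: arm 0x0001, slot 0x1000, drive 0x0100.
cdb = move_medium_cdb(0x0001, 0x1000, 0x0100)
print(cdb.hex())  # a50000011000010000000000
```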

      liuyl
      Level 6

      1) This phenomenon is very common in our NBU environment, and these processes always disappear automatically after a short time.
      2) What I mean is: why weren't tldd/tldcd designed to be multi-threaded, since that can be much more efficient than a single thread?

        mph999
        Level 6

        I don't know why you have multiple tldcd processes. The only thing I can think of, as I mentioned, is multiple paths; that 'might' be a reason ... If you want to investigate it, you would need to start looking at logs, or, probably a better starting point, at strace/truss output on the PIDs of the tldcd processes to get some idea of what they are doing.
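Before pointing strace/truss at anything, it helps to confirm which tldcd processes are short-lived children of the long-running one, as the `ps -ef` output in the question suggests (all the extra PIDs have PPID 24560). This is a hypothetical helper, not a NetBackup tool; it assumes Linux procps `ps -ef` columns (UID PID PPID C STIME TTY TIME CMD) with no spaces inside STIME, which other platforms may format differently:

```python
import os
from typing import Dict, List


def tldcd_children(ps_ef_output: str) -> Dict[int, List[int]]:
    """Map each tldcd PID to the tldcd processes it has spawned."""
    parent_of: Dict[int, int] = {}
    for line in ps_ef_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header, shell prompt or blank line
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if os.path.basename(cmd.split()[0]) == "tldcd":  # skips "grep tldcd"
            parent_of[pid] = ppid
    children: Dict[int, List[int]] = {}
    for pid, ppid in parent_of.items():
        if ppid in parent_of:  # the parent is itself a tldcd
            children.setdefault(ppid, []).append(pid)
    return children


sample = """\
root 21862 24560 0 22:44 ? 00:00:00 tldcd -v
root 21905 24560 0 22:45 ? 00:00:00 tldcd -v
root 21923 24560 0 22:45 ? 00:00:00 tldcd -v
root 21924 24560 0 22:45 ? 00:00:00 tldcd -v
root 21925 24560 0 22:45 ? 00:00:00 tldcd -v
root 21927 23207 0 22:45 pts/26 00:00:00 grep tldcd
root 24560 1 0 Aug29 ? 00:38:20 tldcd -v
"""
print(tldcd_children(sample))  # {24560: [21862, 21905, 21923, 21924, 21925]}
```

The child PIDs are the ones to attach to with `strace -p <PID>` (or `truss -p <PID>` on Solaris) while they are still alive.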

        tldcd was written a long time ago, back in the day when NBU mainly wrote to tape and was far less complex than today. Most, if not all, processes back then were single-threaded.

        Multithreaded processes appeared with NBU 6, for example nbemm, nbrb, nbjm and nbpem. For tldcd, though, there is no real need: it doesn't actually do much, it just sends a few commands to the robot every now and then, and it's not 'busy' like nbemm or nbjm. I suspect it's a bit of a case of "if it ain't broke ...".