EUB
8 years ago, Level 4
Communication errors
Hello, I have the following setup: a Win2008R2 master server, a Quantum Scalar i500 tape library, an Oracle (RHEL) client machine, and 3 different Windows 2008R2 client machines. All 3 Windows client backups fail with error 98: ...
From cmd, check output of 'bpps' on Windows master/media server.
(Command is in ...netbackup\bin)
There should be one tldcd process (robot control) and one tldd process for each robot-controlled tape drive.
Also check output of 'vmoprcmd -d'
(in ....volmgr\bin) to ensure that all drives are in TLD control (not AVR).
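If you want to script the process check, here is a minimal Python sketch. It assumes you have already captured the 'bpps' output as text. The function names are made up for illustration, and the sketch does not rely on any particular bpps column layout: it only looks for the process names as whitespace-separated tokens. It reports a missing tldcd or tldd but does not try to match the number of tldd processes to the number of drives.

```python
from collections import Counter

# Process names taken from the post above.
ROBOT_DAEMONS = ("tldcd", "tldd")


def count_robot_daemons(bpps_text: str) -> Counter:
    """Count lines of captured bpps text that mention each robot daemon."""
    counts = Counter()
    for line in bpps_text.splitlines():
        # Compare whole tokens so that "tldd" never matches inside another name.
        tokens = {tok.lower().removesuffix(".exe") for tok in line.split()}
        for name in ROBOT_DAEMONS:
            if name in tokens:
                counts[name] += 1
    return counts


def robot_control_warnings(bpps_text: str) -> list[str]:
    """Return human-readable warnings for missing robot daemons."""
    counts = count_robot_daemons(bpps_text)
    warnings = []
    if counts["tldcd"] == 0:
        warnings.append("no tldcd process found (robot control is down)")
    if counts["tldd"] == 0:
        warnings.append("no tldd process found")
    return warnings
```

Feed it the saved bpps text, for example robot_control_warnings(open("bpps.txt").read()), where bpps.txt is a hypothetical file name.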
This post is getting a bit lengthy and I now have to constantly page between previous and current posts...
Not sure if I have asked the following:
Add VERBOSE entry to vm.conf on master/media server
Create debug logs on master/media such as ltid and robots (see http://www.veritas.com/docs/000109772 )
and restart Device Manager service.
Look for errors in Event Viewer System and Application logs and debug logs when status 98 is seen.
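The first two steps can be scripted. A minimal Python sketch follows, assuming you pass in the real locations yourself: the vm.conf path and the debug log root are parameters because they depend on the install path, and the function names are hypothetical. The folder names 'ltid' and 'robots' are the ones mentioned above. The Device Manager service still has to be restarted afterwards.

```python
from pathlib import Path


def ensure_verbose(vm_conf: Path) -> bool:
    """Append a VERBOSE line to vm.conf if it is missing.

    Returns True if the file was changed, False if VERBOSE was already there.
    """
    text = vm_conf.read_text() if vm_conf.exists() else ""
    lines = text.splitlines()
    if any(line.strip() == "VERBOSE" for line in lines):
        return False
    lines.append("VERBOSE")
    vm_conf.write_text("\n".join(lines) + "\n")
    return True


def ensure_debug_dirs(debug_root: Path, names=("ltid", "robots")) -> list[Path]:
    """Create the debug log folders under debug_root; return the ones created."""
    created = []
    for name in names:
        folder = debug_root / name
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    return created
```

Both functions are safe to run more than once: existing entries and folders are left alone.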
Thank you!
Here are screenshots of the commands "vmoprcmd -d" and "bpps".
"VERBOSE" was added earlier.
You can see that there is no tldcd process and that tape drives are in AVR (non-robotic) control.
Check Windows Event Viewer System and Application log to see when and why tldcd went down.
Please enable VERBOSE logging and create log folders, then restart NBU Device Management service.
The service restart should hopefully start tldcd again.
Verbose logging should help with future troubleshooting.
PS:
6 tape drives attached to a single master/media server seems a bit much.
Have you confirmed that there are sufficient physical resources (memory, CPU, HBAs, PCI slots, bus speed, etc.) to ensure that a single server can manage and stream 6 x LTO6 tape drives simultaneously?
VERBOSE is included. The services were restarted a few times.
The environment is old, and it has worked well in this configuration.
The master server is powerful enough (CPU: 2 x Intel Xeon E5-2620 2.00GHz, 24 cores; 32 GB RAM; 250 GB HDD; 10 Gbps LAN) and runs Windows Server 2008R2.
We have a robotic library Quantum Scalar i500 with 6 LTO drives in it
Why must a robotic library be non-robotic?
We also have another backup site (a media server attached to kzwnb01) named kzwnb03. It has the same hardware configuration as described above, except that it runs Windows Server 2012R2, and no backups fail with error code 98 on it.
Here is a screenshot from media server kzwnb03, where none of the tape drives are in AVR (and everything works fine).
The services were restarted a few times.
So, by the looks of the vmoprcmd output, the drives are back in TLD control.
This means that the service restart has started tldcd (robot control daemon).
It also seems as if 2 tape drives are loaded - meaning backups are running that previously ended in status 98?
Next time you see status 98, please check Windows System and Application logs for possible reasons why tldcd is going down.
A service restart seems to be only a short-term solution.
You need to find out why tldcd is going down, as this is quite abnormal.
The last time I saw tldcd going down for no apparent reason was about 12 years ago, when that particular NBU version had a problem with the AUTO_UPDATE_ROBOT entry in vm.conf. tldcd went down each time the MAP was opened instead of updating the robot inventory.
Windows System and/or Application logs will tell you if the server has lost connectivity to the robot or if robot is not responding as a result of robot door being opened, etc... etc...
Other helpful logs will be the debug logs under volmgr. Hopefully you have created log folders such as 'robots'?
Why must a robotic library be non-robotic?
Seems you misunderstood what I was trying to say.
The tape drives were in AVR /non-robotic mode because tldcd was not running.
tldcd not running was the reason for the most recent status 98.
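To make the cause and effect explicit, here is a small Python sketch of the reasoning. It is only an illustration: the function name and the drive names are made up, and the inputs are what you read off 'bpps' (is tldcd running?) and 'vmoprcmd -d' (the control mode shown for each drive).

```python
def diagnose(tldcd_running: bool, drive_controls: dict[str, str]) -> str:
    """Summarise robot control state from the tldcd status and drive control modes."""
    # Drives reported as AVR are under non-robotic control.
    avr_drives = sorted(
        name for name, control in drive_controls.items()
        if control.strip().upper() == "AVR"
    )
    if not tldcd_running:
        # Without the robot control daemon, drives drop to AVR.
        return ("tldcd is not running; drives in AVR: "
                + (", ".join(avr_drives) or "none listed"))
    if avr_drives:
        return "tldcd is running but these drives are still in AVR: " + ", ".join(avr_drives)
    return "tldcd is running and no drive is in AVR"


# Hypothetical values matching the failing state described in this thread.
print(diagnose(False, {"Drive1": "AVR", "Drive2": "AVR"}))
```

The first branch is the situation seen with the most recent status 98. The last branch is what the output should look like after a successful restart of the Device Manager service.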