robot is down - drives show SCAN and AVR

randes2000
Level 4

My SLES 10 media server shows robot down and all eight drives in the library either in SCAN control or AVR.  All host names listed below are fictitious, as I cannot post host names or IP addresses.

Master server - Solaris 10 (sparc) Netbackup 7.5.0.6

Robot control media server (media1)- Solaris 10 (sparc) Netbackup 7.5.0.6

Affected media server (media2) - SLES 10 Netbackup 7.1.0  (cannot easily update as it is a Teradata BAR server and software updates require Teradata authorization)

Library - StorageTek SL150

media2 had been working before this incident. media1 is writing to tape OK.

I have bounced both media servers, restarted all Netbackup processes in several attempts to resolve.  Have also cycled processes shown with vmps several times.  scan command shows changer and all eight tape drives.  tpconfig -d shows all drives UP, lists the defined robot, the robot control host and EMM server correctly.

/var/log/messages on media2 lists this: tldd[29508]: TLD(0) [29508] unable to connect to tldcd on media1: cannot connect to the robotic software daemon (42)

However, tldcd is running on media1.

Network connection is good between media1 and media2 and master server.

I cannot copy output of any command as the affected system is on my high side.  I have an identical setup on my low side which is working and I can compare the output of any command to the high side if this would help.  I understand the difficulty of troubleshooting without complete output from commands or logs and I'll work with whoever responds to this request.  If the output is not too extensive, I can retype it into this thread.

1 ACCEPTED SOLUTION

Accepted Solutions

Mark_Solutions
Level 6
Partner Accredited Certified

As you have certain difficulties with access, it may be easiest, as a first step, to re-run the device configuration wizard to see if you can re-establish proper control.

Perhaps the media servers are not using persistent bindings, causing device paths to have changed (LUNs etc.).

To be honest, the issue sounds more like a firewall / communications issue, but you will know if that is the case when you run the wizard, as it will not be able to connect to the media servers.

Try that as a first step and let us know the result

7 REPLIES 7

randes2000
Level 4

Tried to include all I have done in my first post, but I forgot the Device Config Wizard. I have run this several times after trying different things.  In each run the robot and 8 tape drives are found and configured.  The wizard completes without error.  In the Admin Console under Media and Device Management > Devices > Robots, the listing for media2 shows No for Enabled.  All other media servers show Yes for Enabled.

Mark_Solutions
Level 6
Partner Accredited Certified

You haven't used an evaluation license key on that media server, by any chance?

It is worth checking that its licenses are OK, or re-adding them.

 

randes2000
Level 4

Not a license key issue. get_license_key shows no expiration on the Netbackup Enterprise Server key.  This media server is backing up OK to BasicDisk storage units.  Just can't copy to tape.

randes2000
Level 4

While waiting for another post, the library "magically" came up and a duplication that was queued started running.  Thanks for the input, but I don't know what fixed it.....

Mark_Solutions
Level 6
Partner Accredited Certified

OK - sounds like a communications issue, as I mentioned earlier, but I guess it will be hard to find the cause unless something has been logged (bpcd / bptm)

Worth putting the tape drive and SSO licenses on a media server if used

Marianne
Moderator
Partner    VIP    Accredited Certified

So, in summary, if my understanding is correct, you have the following:

master - EMM server
media1 - Robot control host
media2 - media server sharing drives.

If this is correct, you need to verify pbx (port 1556) connectivity between media1 and media2.

Connectivity with master (EMM server) seems fine, that is why the device wizard is working.

media2 needs to connect to tldcd on media1 using port 1556. media1 needs to connect back to media2 on port 1556 to communicate robot status.

Try to telnet from media1 to media2 on port 1556. Do the same for media2 - telnet to port 1556 on media1.
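If telnet is not installed on one of the hosts, the same check can be approximated with a few lines of Python. This is only a sketch: it assumes Python 3 is available on the host where you run it (that may not be true on an older SLES 10 box), and `can_connect` is a name made up for this example, not a NetBackup tool. Like telnet, it only shows that a TCP connection to the port can be opened, not that the daemons behind it are healthy.

```python
import socket

PBX_PORT = 1556  # PBX port named in this thread


def can_connect(host, port=PBX_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper for illustration; roughly what
    'telnet <host> 1556' tells you.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

Run `can_connect("media1")` on media2 and `can_connect("media2")` on media1 (substituting your real host names). A False in either direction points at a firewall, routing or name resolution problem between the two media servers.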

Please also verify forward and reverse name lookup. If you are using hosts files and any FQDNs are added as aliases, please check that ALL entries for master, media1 and media2 have FQDN aliases in hosts files.
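The forward and reverse lookup check can be sketched the same way. Again this assumes Python 3, and the function names are invented for the example. It resolves a name to an IPv4 address, resolves that address back to names, and reports whether the name you started with appears among them. The exact-match rule is a deliberately strict assumption; a mismatch is a hint to look at the hosts file or DNS, not proof of a fault.

```python
import socket


def names_match(hostname, reverse_names):
    """True if hostname appears (case-insensitively) among reverse_names."""
    return hostname.lower() in {name.lower() for name in reverse_names}


def check_name_resolution(hostname):
    """Forward-resolve hostname (IPv4 only), then reverse-resolve the result.

    Hypothetical helper for illustration. Returns a dict with the IP,
    the names returned by the reverse lookup, and a 'consistent' flag.
    """
    result = {"hostname": hostname, "ip": None,
              "reverse_names": [], "consistent": False}
    try:
        ip = socket.gethostbyname(hostname)  # forward lookup
    except OSError:
        return result  # name does not resolve at all
    result["ip"] = ip
    try:
        primary, aliases, _addresses = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return result  # no reverse entry for this address
    result["reverse_names"] = [primary] + list(aliases)
    result["consistent"] = names_match(hostname, result["reverse_names"])
    return result
```

Run `check_name_resolution()` for master, media1 and media2 on each of the three hosts, using both the short names and the FQDNs, and compare the results. If one host returns a different primary name or alias list than the others, that is the hosts file entry to fix.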