02-25-2021 01:20 PM
I'm running Solaris 11 and NetBackup 7.7.2, and my robots and drives have recently stopped working. When I try to "UP" the drives, they appear to come up and then immediately fail. The drives themselves are StorageTek SL150s; they are not showing any errors and are at firmware version 3.74. On the servers, when I run the scan command, this is the output on both systems:
****************SDT_TAPE*************************
********************************SDT_CHANGER********************
and no devices are displayed. I ran tpautoconf -a and got no feedback. When I opened the GUI and tried to reconfigure the storage devices, it found the robot and tape drive, but I still can't bring them UP in the GUI.
Due to the nature of the servers I can't upload logs.
Thanks
02-25-2021 05:21 PM
You most likely have a problem between the OS and the devices, not NetBackup.
scan and tpautoconf only send the SCSI INQUIRY command and display the response sent back by the drive/robot.
Solaris can be a bit of a pain with the sg driver, but if nothing has changed, this shouldn’t be an issue.
Can you see the devices from the OS?
cfgadm -al -o show_FCP_dev I think ...
02-25-2021 10:27 PM
sgscan and scan commands perform a scan at OS-level.
It seems that the OS lost connectivity to the devices.
We often say this:
NBU has no direct connectivity to devices. It relies on the OS for access to devices.
Please check /var/adm/messages file for hardware errors.
(If this happened more than 3 days ago, look at messages.*)
If ALL devices disappeared, I would suspect the HBA.
02-25-2021 11:29 PM
Here is a TN showing how scan works (tpautoconf -t is pretty much the same, just that the output is formatted differently)
https://isearch.veritas.com/internal-search/en_US/article.100047749.html
It shows that when you run scan, the scsi commands received by the library look like this:
Sep 22 04:30:48 vtl-v3 vtllibrary[1568]: CDB (225) (delay 120405): 12 00 00 00 60 00
Sep 22 04:30:48 vtl-v3 vtllibrary[1568]: CDB (226) (delay 805): 12 01 80 00 84 00
Sep 22 04:30:48 vtl-v3 vtllibrary[1568]: CDB (227) (delay 2405): 12 01 83 00 84 00
<snip>
... this is all it does - CDB 0x12 is just SCSI INQUIRY.
The numbers after the '12' are just slight variations on what is being queried; for example, '12 01 80 00 84 00' would return the serial number of the drive.
You could run these same commands from another utility (sgscan, as Marianne mentioned) and you would get the same issue - no output.
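To make the byte layout concrete, here is a minimal Python sketch that decodes the three CDBs from the log excerpt above. It assumes the standard 6-byte INQUIRY layout (opcode, EVPD bit in byte 1, page code in byte 2, allocation length in bytes 3-4); the function name decode_inquiry_cdb is made up for illustration and is not part of NetBackup or any library.

```python
def decode_inquiry_cdb(hex_bytes: str) -> dict:
    """Decode a 6-byte SCSI INQUIRY CDB given as space-separated hex bytes."""
    cdb = [int(b, 16) for b in hex_bytes.split()]
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")

    evpd = bool(cdb[1] & 0x01)            # 1 = a Vital Product Data page is requested
    page = cdb[2] if evpd else None       # page code is only meaningful when EVPD is set
    alloc_len = (cdb[3] << 8) | cdb[4]    # how many response bytes the initiator will accept

    # Only the VPD pages relevant to this thread are named here.
    vpd_names = {
        0x00: "Supported VPD Pages",
        0x80: "Unit Serial Number",
        0x83: "Device Identification",
    }
    if evpd:
        request = vpd_names.get(page, f"VPD page 0x{page:02x}")
    else:
        request = "standard inquiry data"

    return {
        "evpd": evpd,
        "page_code": page,
        "allocation_length": alloc_len,
        "request": request,
    }


if __name__ == "__main__":
    for cdb in ("12 00 00 00 60 00", "12 01 80 00 84 00", "12 01 83 00 84 00"):
        print(cdb, "->", decode_inquiry_cdb(cdb))
```

The first CDB asks for the standard inquiry data (vendor, product, revision); the other two ask for VPD pages 0x80 and 0x83. If the OS cannot reach the device, none of these ever get a response, which is why scan prints only its headers.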
03-02-2021 02:10 PM
I'm seeing the following error repeated on both the media and master servers:
ID702911 daemon error TLD(1) unavailable initialization failed, unable to open robotic path
daemon error, robotic path /dev/sg/c0tw5xxxxx does not exist
03-02-2021 05:17 PM
03-02-2021 10:43 PM - edited 03-02-2021 11:40 PM
Please show us output of this command as per @mph999 's request:
cfgadm -al -o show_FCP_dev
This error:
robotic path /dev/sg/c0tw5xxxxx does not exist
... means one of 2 things:
1. The OS has lost connectivity to the device. You need to check physical connections and switch logs.
2. The device path has changed due to lack of persistent binding.
Running OS command 'cfgadm' will show you what the OS sees.
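If the error is repeated many times in the logs, a small script can pull out the path and confirm whether the device node is really gone. This is only a sketch: the regular expression is based on the wording of the error quoted above, the helper name missing_robot_path is hypothetical, and the path in the example is the masked one from this thread, not a real device.

```python
import os
import re

# Matches the wording of the error quoted in this thread (an assumption about the format).
PATH_RE = re.compile(r"robotic path (\S+) does not exist")


def missing_robot_path(log_line: str):
    """Return the robotic path named in the error line, or None if the line does not match."""
    match = PATH_RE.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Masked path exactly as posted; substitute the real one from your own log.
    line = "daemon error, robotic path /dev/sg/c0tw5xxxxx does not exist"
    path = missing_robot_path(line)
    if path is not None:
        state = "exists" if os.path.exists(path) else "is missing at OS level"
        print(path, state)
```

If the node is missing, that is consistent with either of the two causes listed above; it does not tell you which one.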
03-02-2021 11:24 PM
I doubt robtest will work, as it's complaining about the path to the robot.
robtest is 'almost' independent of NetBackup - but it does use the 'robot config', so it knows the path to the robot(s).
03-03-2021 09:37 AM
Output of cfgadm command on server that's not seeing tape drives:
AP_Id               Type          receptacle   occupant
c7                  FC            Connected    unconfigured
C8                  FC            Connected    unconfigured
C14                 FC            Connected    unconfigured
Good server:
C12                 FC-private    Connected    configured
C12::500104fxxxx    tape          connected    configured
C12::5000104fxxxx   med-changer   connected    configured
C13                 FC            connected    unconfigured
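For anyone comparing several servers, the two listings above can be checked with a short Python sketch. It assumes whitespace-separated columns in the order shown (Ap_Id, Type, Receptacle, Occupant), and the function name unconfigured_controllers is hypothetical. The sample text is copied from this post, with the WWNs masked as posted.

```python
def unconfigured_controllers(cfgadm_output: str) -> list:
    """Return the controller Ap_Ids whose occupant state is 'unconfigured'."""
    result = []
    for raw in cfgadm_output.splitlines():
        fields = raw.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 4 or fields[0].lower() == "ap_id":
            continue
        ap_id, occupant = fields[0], fields[3].lower()
        # Entries containing '::' are attached devices, not controllers.
        if "::" not in ap_id and occupant == "unconfigured":
            result.append(ap_id)
    return result


BAD_SERVER = """AP_Id Type receptacle occupant
c7 FC Connected unconfigured
C8 FC Connected unconfigured
C14 FC Connected unconfigured"""

GOOD_SERVER = """C12 FC-private Connected configured
C12::500104fxxxx tape connected configured
C12::5000104fxxxx med-changer connected configured
C13 FC connected unconfigured"""

if __name__ == "__main__":
    print("bad server: ", unconfigured_controllers(BAD_SERVER))
    print("good server:", unconfigured_controllers(GOOD_SERVER))
```

On the failing server every controller is reported, while on the good server only C13 is, and C12 carries the tape and med-changer entries.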
03-03-2021 10:39 PM
I hope you have noticed the problem?
That all controllers show 'unconfigured' status?
You can reconfigure with cfgadm command:
cfgadm -c configure c7
(Repeat for all controllers)
followed by rebuilding device tree:
devfsadm -Cv
But best to get your OS team to investigate the cause as this might happen again.
Once devices are seen again at OS-level, you will probably need to rebuild the NBU sg driver as well:
cd /usr/openv/volmgr/bin/driver
/usr/openv/volmgr/bin/sg.build all
- Install the new sg driver configuration:
/usr/bin/rm -f /kernel/drv/sg.conf
/usr/openv/volmgr/bin/driver/sg.install
- Check/verify config:
/usr/openv/volmgr/bin/sgscan
If device paths have changed, you will need to re-do NBU device config.
03-04-2021 02:55 PM
I followed your directions and ran the commands quoted below to configure the controllers, but the output of cfgadm -al -o show_FCP_dev is still the same.
You can reconfigure with cfgadm command:
cfgadm -c configure c7
(Repeat for all controllers)
followed by rebuilding device tree:
devfsadm -Cv
03-04-2021 10:17 PM
You certainly have a problem between the OS and the device paths.
There is nothing to be done from NBU side.
As long as devices are not seen at OS-level, NBU cannot use them.
You need to check physical connections, switch logs, HBA logs and/or errors in the messages file.
Get OS and hardware support involved.