Media Server - LTO5 IN Solaris 11 (LDOMS) Issues

Nuno_Grave
Level 4
Partner Accredited

I have a problem with the configuration of two Media Servers.

OS: Solaris 11 (LDOM)
Tape drive: IBM LTO5


The devices are properly configured and presented to the two Media Servers:

root@plpsvm19 # sgscan tape
#
#WARNING: detected StorEdge Network Foundation connected devices not in
#         SG configuration file:
#
#    Device World Wide Port Name 500601613ea012c4
#    Device World Wide Port Name 500601683ea012c4
#    Device World Wide Port Name 500601613ea012c4
#
#    See /usr/openv/volmgr/NetBackup_DeviceConfig_Guide.txt chapter
#    "Special configuration for "Sun StorEdge Network Foundation" HBA/Driver"
#    for information on how to use sg.build and sg.install to
#    configure these devices
#

/dev/sg/c0tw500507604481cc32l0: (/dev/rmt/10): "IBM     ULT3580-TD5"
/dev/sg/c0tw500507604481cc34l0: (/dev/rmt/20): "IBM     ULT3580-TD5"
/dev/sg/c0tw500507604481cc39l0: (/dev/rmt/30): "IBM     ULT3580-TD5"
/dev/sg/c0tw500507604481cc3bl0: (/dev/rmt/40): "IBM     ULT3580-TD5"
root@plpsvm19 # sgscan changer
#
#WARNING: detected StorEdge Network Foundation connected devices not in
#         SG configuration file:
#
#    Device World Wide Port Name 500601613ea012c4
#    Device World Wide Port Name 500601683ea012c4
#    Device World Wide Port Name 500601613ea012c4
#
#    See /usr/openv/volmgr/NetBackup_DeviceConfig_Guide.txt chapter
#    "Special configuration for "Sun StorEdge Network Foundation" HBA/Driver"
#    for information on how to use sg.build and sg.install to
#    configure these devices
#

/dev/sg/c0tw500507604481cc32l1: "IBM     03584L32"
root@plpsvm19 #

NetBackup setup:

root@plpsvm19 # vmglob -listall -b | egrep -i "svm"
robot   ROBOT1                0000078252490403          plpsvm22.bk.xplocal
drive   IBM.ULT3580-TD5.003   00078A219A                plpsvm22.bk.xplocal
drive   IBM.ULT3580-TD5.002   00078A21A1                plpsvm22.bk.xplocal
drive   IBM.ULT3580-TD5.001   00078A2195                plpsvm22.bk.xplocal
drive   IBM.ULT3580-TD5.000   00078A2197                plpsvm22.bk.xplocal
robot   ROBOT1                0000078252490403          plpsvm19.bk.xplocal
drive   IBM.ULT3580-TD5.003   00078A219A                plpsvm19.bk.xplocal
drive   IBM.ULT3580-TD5.002   00078A21A1                plpsvm19.bk.xplocal
drive   IBM.ULT3580-TD5.001   00078A2195                plpsvm19.bk.xplocal
drive   IBM.ULT3580-TD5.000   00078A2197                plpsvm19.bk.xplocal
root@plpsvm19 #
root@plpsvm19 # tpconfig -l
Device Robot Drive       Robot                    Drive                Device          Second
Type     Num Index  Type DrNum Status  Comment    Name                 Path            Device Path
robot      1    -    TLD    -       -  -          -                    /dev/sg/c0tw500507604481cc32l1
  drive    -    0 hcart2    1    DOWN  -          IBM.ULT3580-TD5.000  /dev/rmt/10cbn
  drive    -    1 hcart2    2      UP  -          IBM.ULT3580-TD5.001  /dev/rmt/20cbn
  drive    -    2 hcart2    3      UP  -          IBM.ULT3580-TD5.002  /dev/rmt/30cbn
  drive    -    3 hcart2    4      UP  -          IBM.ULT3580-TD5.003  /dev/rmt/40cbn
root@plpsvm19 #

The status of the drives:

root@plpsvm19 #
root@plpsvm19 # mt -f /dev/rmt/10cbn stat
/dev/rmt/10cbn: no tape loaded or drive offline            # Reported as normal, but this drive has a tape mounted (see the robtest output below)
root@plpsvm19 # mt -f /dev/rmt/20cbn stat
/dev/rmt/20cbn: no tape loaded or drive offline
root@plpsvm19 # mt -f /dev/rmt/30cbn stat
/dev/rmt/30cbn: no tape loaded or drive offline
root@plpsvm19 # mt -f /dev/rmt/40cbn stat
/dev/rmt/40cbn: no tape loaded or drive offline
root@plpsvm19 #
root@plpsvm19 # robtest
Configured robots with local control supporting test utilities:
  TLD(1)     robotic path = /dev/sg/c0tw500507604481cc32l1

Robot Selection
---------------
  1)  TLD 1
  2)  none/quit
Enter choice: 1

Robot selected: TLD(1)   robotic path = /dev/sg/c0tw500507604481cc32l1

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -rn 1 -r /dev/sg/c0tw500507604481cc32l1

Opening /dev/sg/c0tw500507604481cc32l1
MODE_SENSE complete
Enter tld commands (? returns help information)
s d
drive 1 (addr 257) access = 0 Contains Cartridge = yes
Source address = 1035 (slot 9)
Barcode = F42814                               ; # tape mounted in drive 1, as noted above
SCSI ID from drive 1 is 66
drive 2 (addr 258) access = 1 Contains Cartridge = no
SCSI ID from drive 2 is 68
drive 3 (addr 259) access = 1 Contains Cartridge = no
SCSI ID from drive 3 is 73
drive 4 (addr 260) access = 1 Contains Cartridge = no
SCSI ID from drive 4 is 75
READ_ELEMENT_STATUS complete
unload d1                                      ; when I unload
Opening /dev/rmt/10cbn, on the local host, please wait...
Error - cannot open /dev/rmt/10cbn (I/O error)       ; I get an error
q

Robot Selection
---------------
  1)  TLD 1
  2)  none/quit
Enter choice: 2
root@plpsvm19 #

Nor is this a case of SCSI reservations caused by the drives being shared through SSO:

root@plpsvm19 # vmoprcmd -crawlreleasebyname IBM.ULT3580-TD5.000
Host plpsvm19.bk.xplocal returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
Host plpsvm22.bk.xplocal returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found

Host plpsvm19.bk.xplocal returned: No PERSISTENT REGISTRATIONS or SPC-2 RESERVATION found
root@plpsvm19 #

There is an inconsistency between what the system reports and the actual state of the device.

There are no errors and the status is normal in:

SAN
Robot
Tape drives

NetBackup uses the drives because their state shows UP. It mounts the tape, but then it cannot write data to it and cannot dismount it.
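
A minimal sketch of how the mismatch above could be spotted from a captured robtest session: it filters `s d` output for the drives the robot reports as holding a cartridge, so each one can then be compared by hand with `mt -f ... status` on the matching device. The function name `loaded_drives` is made up for illustration, and the sample text is copied from the output in this post.

```shell
# Reads tldtest "s d" output on stdin and prints "<drive number> <barcode>"
# for every drive the robot reports as holding a cartridge.
# Drives that hold a cartridge but show no Barcode line are not printed.
loaded_drives() {
  awk '
    /^drive [0-9]+ .*Contains Cartridge = yes/ { drive = $2; next }
    /^drive [0-9]+ /                           { drive = "" }
    /^Barcode = / && drive != ""               { print drive, $3; drive = "" }
  '
}

# Sample input taken from the robtest session above.
loaded_drives <<'EOF'
drive 1 (addr 257) access = 0 Contains Cartridge = yes
Source address = 1035 (slot 9)
Barcode = F42814
SCSI ID from drive 1 is 66
drive 2 (addr 258) access = 1 Contains Cartridge = no
SCSI ID from drive 2 is 68
EOF
```

For the sample this prints `1 F42814`, which is the drive whose device (`/dev/rmt/10cbn` in the tpconfig listing) still answers "no tape loaded or drive offline".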

22 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

It seems tape is already in the 'unload' position in the drive and can be moved back to its 'home' slot.

m d1 s9

Andy_Welburn
Level 6

Maybe the library has been loaded manually and *all* empty slots have been filled, forgetting that some are actually 'occupied' by the tapes loaded in the drive(s).

NB/robot cannot move it back and therefore leaves it in the drive in the 'unloaded' state.

Nuno_Grave
Level 4
Partner Accredited

unload d1 ; when I unload
Opening /dev/rmt/10cbn, on the local host, please wait...
Error - cannot open /dev/rmt/10cbn (I/O error) ; I get an error
q

Andy_Welburn
Level 6

as Marianne has stated?

Try Marianne's suggestion to move it back to its original slot.

Rui_Almeida
Level 2

The tape is not accessible, so it cannot be moved to its home slot, as shown here:

drive 1 (addr 257) access = 0 Contains Cartridge = yes

The unload didn't work because it seems the Media Server doesn't show the correct status of the tape device, despite the correct configuration.

On another LDOM with IBM LTO4 tape drives and the same configuration, we don't have any problems.

Are there any known issues related to IBM LTO5 in Solaris 11?

 

Marianne
Level 6
Partner    VIP    Accredited Certified

The robot believes there is a tape in the drive.

Maybe reboot the robot?

Other alternative: go to the robot, open the door and physically check the drive.
Press the eject button on the drive.

Opening and closing the door will also force the robot to inventory itself.

 

Rui_Almeida
Level 2

Hi Marianne,

No, the robot is OK. I can mount and dismount the tape on that drive from the robot GUI without problems.

The tape drive status there also shows the tape as mounted and OK.

At the OS level, the status of the tape drive doesn't change when a tape is mounted or dismounted:

root@plpsvm19:/>mt -f /dev/rmt/10cbn stat
/dev/rmt/10cbn: no tape loaded or drive offline
root@plpsvm19:/>

I believe it is something on the Media Server, with SCSI commands not reporting the correct status of the drive to the OS. Something at the driver or OS level.

I already reinstalled the sg driver, with some reconfiguration reboots as well, and the result is that the tape drive is visible to the OS, as we correctly set up in the config. We can mount the tape from a NetBackup job, but we can't write to it or dismount it.

I have 2 LDom Media Servers with IBM LTO4 working fine, and 2 LDom Media Servers with IBM LTO5, neither of which works. All of them have the same OS tape device configuration.

I wonder if someone has had the same issues with LTO5 drives in Solaris 11...

Marianne
Level 6
Partner    VIP    Accredited Certified

Can we take a step back?

Are we talking about Guest or Control Domain?
 

Seems only Disk Storage Units are supported in Guest Domains (probably because of issues such as this one).

See http://www.symantec.com/docs/TECH162994

All NetBackup components supported with Solaris 10 SPARC physical servers are supported in a Solaris 10 LDoms Control Domain ....
Guest domain support is limited to standard client, database agents, master server and disk media server.

Since Solaris 11 is not explicitly mentioned, we can assume that the same will apply to Solaris 11.

Nuno_Grave
Level 4
Partner Accredited
I am talking about an I/O Domain.

Marianne
Level 6
Partner    VIP    Accredited Certified

You may want to request clarity about I/O domain from Symantec Support.

The way I read TN http://www.symantec.com/docs/TECH162994   is that tape devices are supported in Control domain only.

Nuno_Grave
Level 4
Partner Accredited

On Solaris 11 there is no entry in "/kernel/drv/sparcv9/st" for "IBM     ULT3580-TD5" or "IBM     03584L32". Is that normal?

root@plpsvm19 # strings /kernel/drv/sparcv9/st |grep -i IBM
IBM Ultrium Gen 4 LTO
IBM     ULTRIUM-TD4
*0IBM Ultrium Gen 4 LTO
IBM     ULT3580-TD4
*0IBM Ultrium Gen 3 LTO
IBM     ULTRIUM-TD3
IBM Ultrium Gen 3 LTO
IBM     ULT3580-TD3
IBM Ultrium Gen 2 LTO
IBM     ULTRIUM-TD2
#dIBM Ultrium Gen 2 LTO
IBM     ULT3580-TD2
#dIBM Ultrium LTO
IBM     ULTRIUM-TD1
IBM Ultrium LTO
IBM     ULT3580-TD1
IBM 3592 Cartridge
IBM     03592


I tried adding this to /kernel/drv/st.conf:

"IBM     ULT3580-TD5", "IBM     ULT3580-TD5     ", "CFGIBMULT3580TD5";
CFGIBMULT3580TD5 = 2,0x3B,0,0x1018619,4,0x46,0x46,0x58,0x58,3,60,1500,600,16920,780,780,16380;

#tape-config-list=
# "VENDOR  A Product ID", "A Prettier Name to Display", "A-Config-Name",
# "VENDORB Other Product ID", "An Other Pretty Name", "Other-Config-Name";
#
# Then for each config-Name there must be a setup string that looks like this.
#
# A-Config-Name = \
#   2,0x34,0,0x18659,4,0x47,0x47,0x47,0x47,1,120,120,3600,3600,3600,3600,3600;
# Other-Config-Name = 1,0x3B,0,0x18659,4,0x40,0x40,0x40,0x40,3;
tape-config-list=
"IBM     ULT3580-TD5", "IBM     ULT3580-TD5     ", "CFGIBMULT3580TD5";
CFGIBMULT3580TD5 = \
  2,0x3B,0,0x1018619,4,0x46,0x46,0x58,0x58,3,60,1500,600,16920,780,780,16380;

But there was no change.
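
For reference, the 17 comma-separated values in the `CFGIBMULT3580TD5` line can be labelled field by field. The sketch below assumes the version 2 layout described in the Solaris st(7D) man page (version, type, block size, options, number of densities, four density codes, default density, then seven timeouts). The function name `decode_st_entry` and the field labels are illustrative, not official names, and the labels only fit entries that carry four density codes like the one posted here.

```shell
# Labels each value of a version 2 st.conf property string.
# Assumes exactly 17 fields (four density codes), as in the entry above.
decode_st_entry() {
  printf '%s\n' "$1" | tr -d ' ;' | awk -F, '
    BEGIN {
      n = split("version type bsize options num_densities density0 density1 " \
                "density2 density3 default_density non_motion_timeout " \
                "io_timeout rewind_timeout space_timeout load_timeout " \
                "unload_timeout erase_timeout", name, " ")
    }
    {
      if (NF != n) { printf "expected %d fields, got %d\n", n, NF; exit 1 }
      for (i = 1; i <= NF; i++) printf "%s=%s\n", name[i], $i
    }'
}

# The property string exactly as posted above.
decode_st_entry '2,0x3B,0,0x1018619,4,0x46,0x46,0x58,0x58,3,60,1500,600,16920,780,780,16380;'
```

This only makes the entry easier to read and compare against a known-good one; it does not check whether the values are right for an LTO5 drive.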

Robot Selection
---------------
  1)  TLD 1
  2)  none/quit
Enter choice: 1

Robot selected: TLD(1)   robotic path = /dev/sg/c0tw500507604481cc32l1

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -rn 1 -r /dev/sg/c0tw500507604481cc32l1

Opening /dev/sg/c0tw500507604481cc32l1
MODE_SENSE complete
Enter tld commands (? returns help information)
s d
drive 1 (addr 257) access = 1 Contains Cartridge = no
SCSI ID from drive 1 is 66
drive 2 (addr 258) access = 0 Contains Cartridge = yes
Source address = 1039 (slot 13)
Barcode = F42824
SCSI ID from drive 2 is 68
drive 3 (addr 259) access = 0 Contains Cartridge = yes
Source address = 1036 (slot 10)
Barcode = F42817
SCSI ID from drive 3 is 73
drive 4 (addr 260) access = 1 Contains Cartridge = no
SCSI ID from drive 4 is 75
READ_ELEMENT_STATUS complete
unload d2
Opening /dev/rmt/20cbn, on the local host, please wait...
Error - cannot open /dev/rmt/20cbn (I/O error)
q

root@plpsvm19 # mt -f /dev/rmt/20cbn status
/dev/rmt/20cbn: no tape loaded or drive offline
root@plpsvm19 #

Mark_Solutions
Level 6
Partner Accredited Certified

Have you done an s s in robtest to see if anything else is in slot 13?

If so, just find an empty slot, do an m d2 s xx, and then update the NBU inventory.

Hope this helps

GlenG
Level 4

The reply I received in the past from Sun's NetBackup support indicated that NetBackup depends on the O/S for tape access and control.  Have you tried Oracle's support - MOS or MOSC?

I had NetBackup/tape library (SL48 w/LTO4s) problems for many months.  NetBackup would fail to talk to the robot.  In the end I believe it was a firmware patch to the FC switch that resolved it.

GlenG

Rui_Almeida
Level 2

Hi GlenG,

Our problem is different, because robotic control works just fine but the LTO5 drives don't.

Can you tell us more about your FC switch firmware issue? Our FC switches are Brocade 5300s with Fabric OS v6.4.0b. We are also planning to apply more recent firmware. Still, there is no issue for other operating systems such as SLES 10/11 and Windows in the same SAN with the same tape drives, so for me the problem lies in Solaris 11 or in the LDom hardware...

We already asked IBM about support for LTO5 in Solaris 11, but got no definitive conclusion about which driver we can use (native or IBM driver).

 

GlenG
Level 4

xmaavx,

-> Can you tell us more about your FC SW firmware issue?

I am using a Brocade 5000 - I do not know the patch level offhand.  The problem we saw was that NetBackup could not see the robot or saw it as busy.  It could happen at just about any time of the day or week, but would happen about once every 6 to 10 weeks.  After the SE replaced all the hardware but the case, he suggested it might be a software problem.  Solaris support had me move FC cables, send them dumps from the library and tape drives, dumps from the FC switch and of course many explorers.  I updated the drive and library firmware several times - nothing helped.

 

The problems did not stop until I bugged the SE to patch the 5000.  I think it has been about 18 months without problems.  I see errors in the log from time to time but NetBackup seems to understand and keeps on going.

 

Isn't it just amazing how hard it is to get reliable hardware specific info?

 

have a good day,

GlenG

Rui_Almeida
Level 2

Hi GlenG,

Thanks for your input. We will also go for FC switch firmware updates.

In this case I found:

Symantec has a tested matrix with Media Servers in Solaris11 with IBM LTO5...

IBM told us that the native Solaris drivers should work, but that we can install the IBM drivers supported in Solaris 10.

Oracle didn't confirm native LTO5 supportability in Solaris 11.

After we reinstalled the I/O Domains with Solaris 10 and reapplied the same configuration as before (the same as with Solaris 11), we got this working without any issues.

Thank you all for your help and feedback.

peter12
Level 4
Partner Accredited

Hi everyone:

I've got a similar problem on Solaris 11, with no local zones or LDOMs installed.

I can configure the robot and drives with multiple paths to the same master.

The backup stays at mounting; when I try robtest, it mounts the tape. I can see in the graphical interface of the library that the tape is mounted, but mt -f doesn't return anything, or if I'm lucky I can see the tape in one of the drives.

While this behavior points more to an incompatibility between IBM LTO5 and Solaris 11, I'd like to know if someone else has run into this trouble, or whether patching the switch is the only solution we have so far (what was the firmware on the switch that worked?)

Best regards  

   

 

jim_dalton
Level 6

Me, I would go through the whole setup again: sgscan, etc.

I've seen this kind of thing before: the drive config was wrong, and a tape in a drive was not in the drive NetBackup thought it was, i.e. the robot's view of the drives didn't align with NetBackup's.

Way to unpick it: physically put a tape in each drive in turn, run mt -f /dev/rmt/xxxcbn status, and record what you see.

Try reading data from the tape. This will check that your Solaris-side devices are correct. That's your baseline.

Then correlate with what happens when mounting via robtest and/or NetBackup. As I recall, I had a drive in the wrong place from NetBackup's viewpoint. You can fix this when running the stunit wizard: drag and drop the drive to its right location.

If all this is fine, you could well be looking at LTO5 specifics.

Jim
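
The baseline step above could be scripted roughly as follows. This is only a sketch: `drive_baseline` and the `MT_CMD` override are made-up names, and the "not ready" test simply looks for a failing exit status or the "no tape loaded or drive offline" message seen earlier in this thread.

```shell
# Runs "mt -f <device> status" for each device given and prints one line per
# device: "<device><TAB>ready" or "<device><TAB>not ready".
# MT_CMD is a hypothetical override (not a Solaris or NetBackup setting) so the
# loop can be dry-run with a stub instead of the real mt(1).
drive_baseline() {
  for dev in "$@"; do
    if out=$("${MT_CMD:-mt}" -f "$dev" status 2>&1); then
      case $out in
        *"no tape loaded or drive offline"*) state="not ready" ;;
        *) state="ready" ;;
      esac
    else
      state="not ready"
    fi
    printf '%s\t%s\n' "$dev" "$state"
  done
}

# Example call, using the device paths from the tpconfig listing in this thread:
#   drive_baseline /dev/rmt/10cbn /dev/rmt/20cbn /dev/rmt/30cbn /dev/rmt/40cbn
```

Run it once with a tape physically loaded in each drive in turn and note which device changes state; that mapping is what gets compared with the robtest and NetBackup view.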

peter12
Level 4
Partner Accredited

If I run mt -f /dev/rmt/xxx in Solaris 11, I get "no tape loaded", even when I can see that a tape is loaded.

To corroborate, I zoned another server, this one running Solaris 10, to that tape, and it responded (unit attention, that's OK).

If I try to use the native Solaris 11 driver, it does not work. If I install the IBM tape driver, I can use the /dev/rmt/xst device at the OS level, but NBU detects it as unusable.

Is there a way to tell NBU to work with the IBM driver for the LTO5 tape drive?

Best regards.