Drives going in MIXED mode again and again

wannawin
Level 6

Hello Team.

 

The drives keep going into MIXED mode again and again.

[80092345@usgaub5500 bin]$ ./tpconfig -l
Device Robot Drive       Robot                    Drive                Device     Second
Type     Num Index  Type DrNum Status  Comment    Name                 Path       Device Path
robot      0    -    TLD    -       -  -          -                    /dev/sg2
  drive    -    0  hcart    6    DOWN  -          IBM.ULTRIUM-TD4.261  /dev/nst7
  drive    -    2  hcart    8      UP  -          IBM.ULTRIUM-TD4.263  /dev/nst6
  drive    -    3  hcart    2      UP  -          IBM.ULTRIUM-TD4.257  /dev/nst4
  drive    -    4  hcart    7      UP  -          IBM.ULTRIUM-TD4.262  /dev/nst3
  drive    -    5  hcart    1    DOWN  -          IBM.ULTRIUM-TD4.256  /dev/nst0
  drive    -    6  hcart    5    DOWN  -          IBM.ULTRIUM-TD4.260  /dev/nst2
  drive    -    7  hcart    3      UP  -          IBM.ULTRIUM-TD4.258  /dev/nst1
 

[80092345@usgaub5500 bin]$ ./vmoprcmd  -d

                                PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart  DOWN-TLD             -                     No       -         0
  2 hcart    TLD               Yes   HS3352  HS3352   Yes     Yes        0
  3 hcart    TLD               Yes   HS3233  HS3233   Yes     Yes        0
  4 hcart    TLD               Yes   HS3167  HS3167   Yes     Yes        0
  5 hcart  DOWN-TLD             -                     No       -         0
  6 hcart  DOWN-TLD             -                     No       -         0
  7 hcart    TLD               Yes   HS3248  HS3248   Yes     Yes        0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
  0 IBM.ULTRIUM-TD4.261   Yes      -
  2 IBM.ULTRIUM-TD4.263   Yes      usgaub5500
  3 IBM.ULTRIUM-TD4.257   Yes      usgaub5500
  4 IBM.ULTRIUM-TD4.262   Yes      usgaub5500
  5 IBM.ULTRIUM-TD4.256   Yes      -
  6 IBM.ULTRIUM-TD4.260   Yes      -
  7 IBM.ULTRIUM-TD4.258   Yes      usgaub5500
 

[80092345@usgaub5500 bin]$ ./tpautoconf -report_disc
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.261
 Drive Path = /dev/nst7
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1022003168
 TLD(0) definition Drive = 6
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.263
 Drive Path = /dev/nst6
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
 Serial Number = 1024003168
 TLD(0) definition Drive = 8
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.257
 Drive Path = /dev/nst4
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1012003168
 TLD(0) definition Drive = 2
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.262
 Drive Path = /dev/nst3
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1023003168
 TLD(0) definition Drive = 7
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.256
 Drive Path = /dev/nst0
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1011003168
 TLD(0) definition Drive = 1
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.260
 Drive Path = /dev/nst2
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
 Serial Number = 1021003168
 TLD(0) definition Drive = 5
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.258
 Drive Path = /dev/nst1
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1013003168
 TLD(0) definition Drive = 3
 Hosts configured for this device:
  Host = usgaub5500
=========== Missing Device or no local control path (Robot) ===========
 Defined as robotic TLD(0)
 Inquiry = "SPECTRA PYTHON          2000"
 Serial Number = 9110003168
 Robot Path = /dev/sg2
 Drive = 6, Drive Name = IBM.ULTRIUM-TD4.261, Serial Number = 1022003168
 Hosts configured for this device:
  Host = usgaub5500

 

Please Help

1 ACCEPTED SOLUTION

mph999
Level 6
Employee Accredited

Hardware/ Media errors.

 

Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954
Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954
Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954

Nothing can be done in NBU to fix this, you have to talk with the hardware vendor.
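
To see at a glance which drive and tape the alerts belong to, the bptm lines above can be grouped by drive and media ID. This is only a minimal sketch: the regular expression is modelled on the three lines quoted in this post, and the names `TAPEALERT`, `summarize_tapealerts` and `SAMPLE` are made up for the illustration, not part of NetBackup.

```python
import re
from collections import defaultdict

# Pattern modelled on the bptm TapeAlert lines quoted above (an assumption,
# not an official log format definition).
TAPEALERT = re.compile(
    r"TapeAlert Code: (?P<code>0x[0-9a-fA-F]+), Type: (?P<type>\w+), "
    r"Flag: (?P<flag>.+?), from drive (?P<drive>\S+) \(index (?P<index>\d+)\), "
    r"Media Id (?P<media>\S+)"
)


def summarize_tapealerts(lines):
    """Group (code, type, flag) tuples by (drive name, media id)."""
    summary = defaultdict(list)
    for line in lines:
        m = TAPEALERT.search(line)
        if m:
            key = (m.group("drive"), m.group("media"))
            summary[key].append((m.group("code"), m.group("type"), m.group("flag")))
    return dict(summary)


# The three syslog lines from this post.
SAMPLE = [
    "Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954",
    "Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954",
    "Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954",
]

for (drive, media), alerts in summarize_tapealerts(SAMPLE).items():
    print(drive, media)
    for code, kind, flag in alerts:
        print("   ", code, kind, flag)
```

If the same drive shows up with many different tapes, the drive is the more likely suspect; if the same tape shows up on many drives, suspect the media.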

 

Martin

29 REPLIES

Dan4
Level 5
Certified
wannawin, it seems there are missing drive paths. You would need to delete those drives and reconfigure them in NBU. That should work. Let us know. Best regards, Dan

Marianne
Level 6
Partner    VIP    Accredited Certified

You need to find out what is wrong with server usgaub5500 at OS level.

NBU needs OS for device access - nothing can be done in NBU to fix device access problems.
Check physical connections between HBA and switch.

Check /var/log/messages for errors.

Vickie
Level 6

 

The cause of this issue might be one of the following:
 
1) Reservation conflicts.
2) A restart of a Media Server's daemons or services.
3) Some Media Servers sharing tape drives (i.e. SSO) being restarted while others are still up and running (sometimes even doing backups).
 
Proposed resolution :
 
A) Restart affected Media Server and Master Server.
 
OR
 
B) The best method to fix this would be to get a window when NBU daemons/services can be stopped and started on ALL the Media Servers and the Master Server.
I would suggest these steps to 'clean up' the drive control modes.
 
1. Cancel All Jobs                                         (Master Server)

    "bpdbjobs -cancel_all"

2. Suspend jobs, reset allocations and close the GUI.           (Master Server)

    "nbpemreq -suspend_scheduling"

    "nbrbutil -resetAll"

    Close all NBU GUIs everywhere.                         (yes Everywhere)

3. Shut down NBU services/daemons                (Master and ALL Media Servers)

    "bp.kill_all"    or  "netbackup stop"

4. Terminate all NBU Processes if any are found lingering around.(Master and ALL Media Servers)

     "bpps -x"   or "bpps"

     "kill -9 <PID>" 

5.  Start NBU daemons/services on Master

    "bp.start_all"  or "netbackup start"

6. Start NBU daemons/services on Media Servers which have Robotic Control

    "bp.start_all"  or "netbackup start"

7. Start NBU daemons/services on remaining Media Servers

    "bp.start_all"  or "netbackup start"

8. Open the GUIs NOW and check the drive status.
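
The start order in steps 5 to 7 (Master first, then Media Servers with robotic control, then the remaining Media Servers) can be sketched as a small sort. This is only an illustration of the ordering; the `Host` class and the host names `master01`, `media01` and `media02` are hypothetical and not taken from this environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str                    # hypothetical host name
    role: str                    # "master" or "media"
    robot_control: bool = False  # True if this host has robotic control


def start_order(hosts):
    """Return hosts in the order of steps 5-7: master, robot control hosts, the rest."""
    def rank(host):
        if host.role == "master":
            return 0
        return 1 if host.robot_control else 2
    # sorted() is stable, so hosts with the same rank keep their input order.
    return sorted(hosts, key=rank)


HOSTS = [
    Host("media02", "media"),
    Host("media01", "media", robot_control=True),
    Host("master01", "master"),
]

for step, host in enumerate(start_order(HOSTS), start=5):
    print(f"step {step}: start NBU daemons on {host.name}")
```

The shutdown in step 3 has no such ordering in the list above; it applies to the Master and all Media Servers.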

CRZ
Level 6
Employee Accredited Certified

Hey wannawin, does your company have a support contract with us? 

You may need to start opening cases instead of Connect threads.

wannawin
Level 6

Hello CRZ.

I am very sad to say that my company does not have a direct support contract with Symantec, which is why I am facing so many problems. I joined this company 20 days ago, and my project manager told me to resolve each and every issue ASAP, but I am the only one there who works on troubleshooting, and I face a new issue every day... :( I want to clean all of that up.

 

Hello Marianne..

How do I proceed and check at the OS level? There are 7 drives in total and 3 are down; backups are failing daily.

 

Hello Netbackup_user.

Today I will follow the steps you mentioned above and will report back after completion.

Marianne
Level 6
Partner    VIP    Accredited Certified

I have already told you where to start:

 

Check physical connections between HBA and switch.

Check /var/log/messages for errors.

Further troubleshooting/actions depend on errors seen in this OS syslog file.

 

Will_Restore
Level 6
20 problems in 20 days, hope you ask for 20% bonus! :)

wannawin
Level 6

Hello Marianne/ALL

I have done all of it (Master server, media server, and reset all), but the drives are going into MIXED mode again.

 

Below are the entries from /var/log/messages:

[80092345@usgaub5500 log]$ tail -500 messages | grep DOW
Apr  2 06:37:13 usgaub5500 ltid[9591]: Request for media ID HS3985 is being rejected because mount requests are disabled (reason = robotic daemon going to DOWN state)
Apr  2 10:16:23 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.261 (device 0)
Apr  2 10:28:11 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.262 (device 4)
Apr  2 10:35:11 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.260 (device 6)
Apr  2 10:38:48 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.262 (device 4)
Apr  2 10:40:05 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.256 (device 5)
 

Please update

 

 

Mark_Solutions
Level 6
Partner Accredited Certified
I see your tape drives are all IBM LTO4 but have different firmware releases (97F9 and BBH4). Have you had any work done on the library recently? Firmware upgrades could cause the drives' SCSI inquiry string to change, so the drives would need re-adding to the system. Start at the O/S to make sure it is seeing all drives, and then reconfigure within NetBackup. If possible, get all your drives on the same firmware release (preferably the latest).
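
The firmware split can be read straight from the `tpautoconf -report_disc` output posted at the top of the thread. The sketch below groups drive names by the last field of each Inquiry string; treating that last field as the firmware revision is an assumption based on the output shown, and `firmware_by_drive` and `REPORT` are names invented for the example.

```python
from collections import defaultdict


def firmware_by_drive(report_text):
    """Map firmware revision -> drive names, from pasted `tpautoconf -report_disc` text."""
    groups = defaultdict(list)
    drive = None
    for raw in report_text.splitlines():
        line = raw.strip()
        if line.startswith("Drive Name ="):
            drive = line.split("=", 1)[1].strip()
        elif line.startswith("Inquiry =") and drive is not None:
            inquiry = line.split("=", 1)[1].strip().strip('"')
            # Assumption: the last whitespace-separated field is the revision.
            groups[inquiry.split()[-1]].append(drive)
            drive = None  # the robot's Inquiry line has no preceding Drive Name
    return dict(groups)


# Excerpt of the output posted earlier in this thread.
REPORT = '''
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.261
 Drive Path = /dev/nst7
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.263
 Drive Path = /dev/nst6
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.257
 Drive Path = /dev/nst4
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
=========== Missing Device or no local control path (Robot) ===========
 Defined as robotic TLD(0)
 Inquiry = "SPECTRA PYTHON          2000"
'''

for revision, drives in firmware_by_drive(REPORT).items():
    print(revision, drives)
```

On the full output from the first post this would put drives .263 and .260 on BBH4 and the other five on 97F9.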

Dan4
Level 5
Certified
Wannawin, as per Marianne's post, check the physical connections between HBA and switch. There is possibly a zoning issue with the drives at the switch level. Please get your storage team to look at the zoned drives and their status. Something must be going wrong there. + Dan

wannawin
Level 6

Hello Mark.

All drives are visible at the OS level, and I have reconfigured the drives via the drive configuration wizard. Yes, they recently upgraded the drive firmware.

 

Found something more

mv d3 s72

Initiating MOVE_MEDIUM from address 258 to 4167

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d3 s73

Initiating MOVE_MEDIUM from address 258 to 4168

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d3 s73

Initiating MOVE_MEDIUM from address 258 to 4168

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d3 s74

Initiating MOVE_MEDIUM from address 258 to 4169

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d4 s78

Initiating MOVE_MEDIUM from address 259 to 4173

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d4 s78

Initiating MOVE_MEDIUM from address 259 to 4173

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

 

mv d6 s77

Initiating MOVE_MEDIUM from address 261 to 4172

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d6 s76

Initiating MOVE_MEDIUM from address 261 to 4171

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d7 s75

Initiating MOVE_MEDIUM from address 262 to 4170

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE
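
To check whether every failed move reports the same sense data, the robtest transcript above can be tallied as follows. This is a rough sketch that assumes the transcript has exactly the shape pasted here (an `mv` line, then a `sense key` line on failure); `failed_moves` and `TRANSCRIPT` are names made up for the example. In SCSI, sense key 0x5 is ILLEGAL REQUEST.

```python
import re
from collections import Counter

MOVE = re.compile(r"^mv (d\d+) (s\d+)$")
SENSE = re.compile(
    r"sense key = (0x[0-9a-fA-F]+), asc = (0x[0-9a-fA-F]+), "
    r"ascq = (0x[0-9a-fA-F]+), (.+)$"
)


def failed_moves(transcript):
    """Count failed moves, keyed by (source, destination, sense text)."""
    failures = Counter()
    current = None
    for raw in transcript.splitlines():
        line = raw.strip()
        move = MOVE.match(line)
        if move:
            current = (move.group(1), move.group(2))
            continue
        sense = SENSE.search(line)
        if sense and current is not None:
            failures[(current[0], current[1], sense.group(4))] += 1
            current = None
    return failures


# First three attempts from the transcript above.
TRANSCRIPT = '''
mv d3 s72
Initiating MOVE_MEDIUM from address 258 to 4167
move_medium failed
sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE
mv d3 s73
Initiating MOVE_MEDIUM from address 258 to 4168
move_medium failed
sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE
mv d3 s73
Initiating MOVE_MEDIUM from address 258 to 4168
move_medium failed
sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE
'''

for (src, dst, text), count in failed_moves(TRANSCRIPT).items():
    print(f"{src} -> {dst}: {text} (x{count})")
```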

And in /var/log/messages I found this:

Apr  2 06:37:13 usgaub5500 tldd[10292]: TLD(0) going to DOWN state, status: Robotic arm has no addressable holder

Apr  2 06:37:13 usgaub5500 ltid[9591]: Request for media ID HS3985 is being rejected because mount requests are disabled (reason = robotic daemon going to DOWN state)

What does "robotic daemon going to DOWN state" mean?

 

Apr  2 10:16:23 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.261 (device 0)

The robotic control host is the Master server.

Marianne
Level 6
Partner    VIP    Accredited Certified

Please copy the entire messages file to messages.txt and post as File attachment.

To just grep for DOWN does not help and does not tell us ANYTHING about what went wrong to cause the DOWN state.
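
As an illustration of why the surrounding lines matter, here is a minimal sketch that pairs each DOWN line with the few syslog lines logged just before it. The function name `context_before` and the window size are arbitrary choices for the example; the two sample lines are taken from this thread.

```python
from collections import deque


def context_before(lines, needle="DOWN", before=5):
    """Yield (matching line, list of up to `before` preceding lines)."""
    window = deque(maxlen=before)
    for line in lines:
        if needle in line:
            yield line, list(window)
        window.append(line)


# Two lines from this thread, in timestamp order.
SAMPLE = [
    "Apr  2 10:28:08 usgaub5500 tldcd[31000]: TLD(0) cannot dismount drive 7, slot 1 already is full",
    "Apr  2 10:28:11 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.262 (device 4)",
]

for hit, context in context_before(SAMPLE):
    for line in context:
        print("   ", line)
    print(">>>", hit)
```

On the command line, `grep -B 5 DOWN messages` gives a similar view with GNU grep.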

About robtest - did you 'unload' drives before trying to move tapes back to slot?

e.g. 

unload d3      (wait for drive to unload tape - you will receive a message...  then move to slot)

mv d3 s72

 

As per Mark's excellent post - please ensure that all tape drives are on the same firmware level. At the moment they are not.

The following says to me that you need to log a call with your hardware vendor:

 Robotic arm has no addressable holder

Mark_Solutions
Level 6
Partner Accredited Certified
If the O/S sees everything but you have had firmware upgrades, you may need to delete everything from NetBackup (drives and then robotics) and then re-add them using the device wizard. But I would get all drive and robotic firmware up to date first, and also check the library interface itself to make sure everything is OK. Perhaps the firmware upgrade has affected the partition of the library and it needs re-setting via the web GUI for the tape library, as it sounds like the robotics are disjoined from the tape drives. Sort out the library and its firmware, and then delete and re-add everything in NetBackup.

wannawin
Level 6

Hello Mark.

I will log a case with the vendor and will upgrade the firmware to the latest version. Does re-setting the library mean a hard reboot of the library, or something different?

Mark_Solutions
Level 6
Partner Accredited Certified
By re-setting I meant that you may need to re-create the partition within the library (if it uses partitions etc.). Just check that the web GUI is showing the robot, drives, magazines and load ports all in the same partition and not leaving any parts orphaned. Also check that the robotic path is correct, as many use robotic pass-through via a drive. If the library vendor is coming out, they can probably assist with all of this.

wannawin
Level 6

Hello Marianne/All.

Please find /var/log/messages..

Marianne
Level 6
Partner    VIP    Accredited Certified

Seems 'someone' has been manually moving tapes around in the robot?

 

Apr  2 10:15:33 usgaub5500 tldcd[27744]: TLD(0) cannot dismount drive 6, slot 14 already is full

Apr  2 10:15:39 usgaub5500 tldcd[26783]: TLD(0) cannot dismount drive 5, slot 11 already is full

Apr  2 10:28:08 usgaub5500 tldcd[31000]: TLD(0) cannot dismount drive 7, slot 1 already is full

Apr  2 10:28:33 usgaub5500 kernel: st 7:0:3:0: reservation conflict

Apr  2 10:28:33 usgaub5500 ltid[31227]: Operator requested SCSI Release of Drive IBM.ULTRIUM-TD4.261 was successful

Apr  2 10:28:48 usgaub5500 tldcd[31147]: TLD(0) expected barcode (HS2790          ) in slot 7, found barcode (HS3985          )

Apr  2 10:48:39 usgaub5500 tldcd[877]: TLD(0) expected barcode (HS4057          ) in slot 1, found barcode (HS2790          )
 
Please suspend all backups and remove the tapes from slots 1, 7, 11 and 14.
 
It may be possible to use robtest to move these tapes to the cap. 
 
Use robtest to 'unload' drives 5, 6 and 7, then move those tapes to their correct slots.
 
When you are sure that there are no tapes in any of the drives, power cycle the robot.
This will force an inventory of the robot when it starts up.
 
Wait for tld(0) to go to UP state in NBU, then do inventory with NBU. Select 'empty media access port' before you select 'Start'.
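
The slot numbers named above (1, 7, 11 and 14) come directly from the tldcd messages. A small sketch of how to pull them out of a longer messages file follows; the two patterns are modelled only on the lines quoted in this post, and `suspect_slots` and `SAMPLE` are hypothetical names.

```python
import re

# Patterns modelled on the tldcd lines quoted above (assumed, not official).
FULL = re.compile(r"cannot dismount drive (\d+), slot (\d+) already is full")
MISMATCH = re.compile(
    r"expected barcode \((\S+)\s*\) in slot (\d+), found barcode \((\S+)\s*\)"
)


def suspect_slots(lines):
    """Return the sorted slot numbers that tldcd reported as full or mislabelled."""
    slots = set()
    for line in lines:
        for pattern in (FULL, MISMATCH):
            m = pattern.search(line)
            if m:
                slots.add(int(m.group(2)))
    return sorted(slots)


# The tldcd lines from this post.
SAMPLE = [
    "Apr  2 10:15:33 usgaub5500 tldcd[27744]: TLD(0) cannot dismount drive 6, slot 14 already is full",
    "Apr  2 10:15:39 usgaub5500 tldcd[26783]: TLD(0) cannot dismount drive 5, slot 11 already is full",
    "Apr  2 10:28:08 usgaub5500 tldcd[31000]: TLD(0) cannot dismount drive 7, slot 1 already is full",
    "Apr  2 10:28:48 usgaub5500 tldcd[31147]: TLD(0) expected barcode (HS2790          ) in slot 7, found barcode (HS3985          )",
    "Apr  2 10:48:39 usgaub5500 tldcd[877]: TLD(0) expected barcode (HS4057          ) in slot 1, found barcode (HS2790          )",
]

print(suspect_slots(SAMPLE))
```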
 
Let us know how it goes...
 
 

wannawin
Level 6

Hello Marianne.

 

I am trying to unload the drives, but it gives an error:

unload d1
Opening /dev/nst0, on the local host, please wait...
Error - cannot open /dev/nst0 (Input/output error)
unload d5
Opening /dev/nst2, on the local host, please wait...
Error - cannot open /dev/nst2 (Input/output error)
unload 7
Opening /dev/nst3, on the local host, please wait...
Error - cannot open /dev/nst3 (Input/output error)

Please suggest..