Forum Discussion

liuyl
Level 6
6 years ago

tpautoconf -report_disc output unchanged after a successful tpautoconf -replace_drive

After a faulty tape drive was replaced successfully via tpautoconf on the Media Server, why does -report_disc still output the previous mismatch record?
Note: On the corresponding Master Server, tpautoconf -report_disc returns empty output, and the drive also works fine on the Media Server!

#
#
# tpautoconf -report_disc
======================= Missing Device (Drive) =======================
Drive Name = HP.ULTRIUM5-SCSI.002
Drive Path = /dev/rmt6.1
Inquiry = "HP Ultrium 5-SCSI I6RZ"
Serial Number = HU1248TF0U
TLD(3) definition Drive = 10
Hosts configured for this device:
Host = JCERPDB1
======================= New Device (Drive) =======================
Inquiry = "HP Ultrium 5-SCSI I5GZ"
Serial Number = HU1213MTPL
Drive Path = /dev/rmt6.1
#
#
#
# tpconfig -l|tail -28
drive - 11 hcart3 6 UP - IBM.ULT3580-TD3.001 /dev/rmt19.1
drive - 12 hcart3 3 UP - IBM.ULT3580-TD3.006 /dev/rmt2.1
drive - 13 hcart3 8 UP - IBM.ULT3580-TD3.000 /dev/rmt20.1
drive - 14 hcart3 5 UP - IBM.ULT3580-TD3.005 /dev/rmt3.1
drive - 15 hcart3 7 UP - IBM.ULT3580-TD3.004 /dev/rmt4.1
robot 2 - TLD - - - - jchxbak
drive - 21 hcart3 1 UP - HP.ULTRIUM6-SCSI.001 /dev/rmt21.1
drive - 22 hcart3 2 UP - HP.ULTRIUM6-SCSI.007 /dev/rmt22.1
drive - 23 hcart3 3 UP - HP.ULTRIUM6-SCSI.003 /dev/rmt23.1
drive - 24 hcart3 4 UP - HP.ULTRIUM6-SCSI.005 /dev/rmt24.1
drive - 25 hcart3 5 UP - HP.ULTRIUM6-SCSI.002 /dev/rmt25.1
drive - 26 hcart3 6 UP - HP.ULTRIUM6-SCSI.004 /dev/rmt26.1
drive - 27 hcart3 7 UP - HP.ULTRIUM6-SCSI.000 /dev/rmt27.1
drive - 28 hcart3 8 UP - HP.ULTRIUM6-SCSI.006 /dev/rmt28.1
robot 3 - TLD - - - - jchxbak
drive - 2 hcart2 2 UP - HP.ULTRIUM5-SCSI.009 /dev/rmt10.1
drive - 3 hcart2 3 UP - HP.ULTRIUM5-SCSI.010 /dev/rmt11.1
drive - 4 hcart2 4 UP - HP.ULTRIUM5-SCSI.011 /dev/rmt12.1
drive - 5 hcart2 5 UP - HP.ULTRIUM5-SCSI.006 /dev/rmt13.1
drive - 6 hcart2 6 UP - HP.ULTRIUM5-SCSI.007 /dev/rmt14.1
drive - 7 hcart2 7 UP - HP.ULTRIUM5-SCSI.005 /dev/rmt15.1
drive - 8 hcart2 8 UP - HP.ULTRIUM5-SCSI.004 /dev/rmt16.1
drive - 16 hcart2 9 UP - HP.ULTRIUM5-SCSI.003 /dev/rmt5.1
drive - 17 hcart2 10 UP - HP.ULTRIUM5-SCSI.002 /dev/rmt6.1
drive - 18 hcart2 11 UP - HP.ULTRIUM5-SCSI.001 /dev/rmt7.1
drive - 19 hcart2 12 UP - HP.ULTRIUM5-SCSI.000 /dev/rmt8.1
drive - 20 hcart2 1 UP - HP.ULTRIUM5-SCSI.008 /dev/rmt9.1
drive - 0 pcd - DISABL - IBM.DDSGEN6.000 /dev/rmt0.1
#
#
#
# tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1
Found a matching device in global DB, HP.ULTRIUM5-SCSI.002 on host JCERPDB1
#
#
#
# tpautoconf -report_disc|grep -Ei "124|1213"
Serial Number = HU1248TF0U
Serial Number = HU1213MTPL
#
#
#
# stopltid
#
#
#
# ltid -v
#
#
#
# tpautoconf -report_disc|grep -Ei "124|1213"
Serial Number = HU1248TF0U
Serial Number = HU1213MTPL
#
#
#
# netbackup stop
stopping the NetBackup Service Monitor
stopping the NetBackup Service Layer
stopping the NetBackup Remote Monitoring Management System
stopping the NetBackup compatibility daemon
stopping the Media Manager device daemon
stopping the Media Manager volume daemon
stopping the NetBackup client daemon
stopping the NetBackup network daemon
#
#
#
# bpps -a
NB Processes
------------


MM Processes
------------
#
#
#
# netbackup start
NetBackup network daemon started.
NetBackup client daemon started.
NetBackup SAN Client Fibre Transport daemon started.
NetBackup Database Server started.
NetBackup Event Manager started.
NetBackup Audit Manager started.
NetBackup Enterprise Media Manager started.
NetBackup Resource Broker started.
Media Manager daemons started.
NetBackup request daemon started.
NetBackup compatibility daemon started.
NetBackup Job Manager started.
NetBackup Policy Execution Manager started.
NetBackup Storage Lifecycle Manager started.
NetBackup Remote Monitoring Management System started.
NetBackup Key Management daemon started.
NetBackup Service Layer started.
NetBackup Agent Request Server started.
NetBackup Bare Metal Restore daemon not started.
NetBackup Vault daemon started.
NetBackup Service Monitor started.
NetBackup Bare Metal Restore Boot Server daemon started.
#
#
#
# bpps -a
NB Processes
------------
root 17301572 1 0 15:22:04 - 0:00 /usr/openv/netbackup/bin/vnetd -standalone
root 4194620 1 0 15:22:08 - 0:00 /usr/openv/netbackup/bin/nbsvcmon
root 4784508 1 0 15:22:06 - 0:00 /usr/openv/netbackup/bin/nbrmms
root 7537248 1 0 15:22:04 - 0:00 /usr/openv/netbackup/bin/bpcd -standalone
root 4522834 1 0 15:22:08 - 0:00 /usr/openv/netbackup/bin/bmrbd
root 5178226 1 0 15:22:06 - 0:00 /usr/openv/netbackup/bin/bpcompatd
root 7930840 1 0 15:22:07 - 0:00 /usr/openv/netbackup/bin/nbsl


MM Processes
------------
root 21692480 3212062 0 15:22:09 - 0:00 tldd -v
root 6226440 1 0 15:22:06 - 0:00 vmd -v
root 9568958 3212062 0 15:22:11 - 0:00 avrd -v
root 3212062 1 0 15:22:06 - 0:00 /usr/openv/volmgr/bin/ltid
#
#
#
# tpautoconf -report_disc|grep -Ei "124|1213"
Serial Number = HU1248TF0U
Serial Number = HU1213MTPL
#
#
#

  • I do not know the answer to that - I suspect if you dug through NBDB you would perhaps find something amiss with regard to which machines are mapped to the drive.

    I have never seen this not work before, as I mentioned.  It's a simple concept that has been working for years, and from what I saw, it appears that the new drive may have been added before the replace_drive was run and it has got itself all upset.

    The only way to fix this, well two ways ...

    Delete the missing drive and possibly the new drive from the config and re-add - this should clear the missing device from the output.

    Manual SQL commands to remove it from NBDB - but this is very last resort and would only be used if the method above fails.

    I don't think there is much else I can add to this, because ultimately the fix will be as I mention, and I am confident that in the future if a drive is swapped - running only tpautoconf -replace_drive and not adding the drive, will be successful.
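
For the delete-and-re-add route, it may help to first confirm which `tpconfig -l` entry the missing drive maps to. The sketch below reads a `tpconfig -l` listing on stdin and prints the robot number, drive index, robot drive number, status and path for one drive name. The function name `find_drive` is made up for this illustration, and the column positions are an assumption taken from the listing posted above, where HP.ULTRIUM5-SCSI.002 shows robot drive number 10 under robot 3, matching "TLD(3) definition Drive = 10".

```shell
#!/bin/sh
# Hypothetical helper: locate one drive in `tpconfig -l` output (read from stdin).
# Assumed columns, inferred from the listing in this thread:
#   drive  -  <index>  <type>  <robot drive no.>  <status>  -  <name>  <path>
#   robot  <robot no.>  -  <robot type>  ...
find_drive() {
    awk -v want="$1" '
        BEGIN { robot = "?" }               # robot line may be cut off (e.g. by tail)
        $1 == "robot" { robot = $2 }        # remember the most recent robot number
        $1 == "drive" && $(NF - 1) == want {
            # A "-" robot drive number means the drive is not in a robot.
            r = ($5 == "-") ? "-" : robot
            printf "robot=%s index=%s robot_drive=%s status=%s path=%s\n", r, $3, $5, $6, $NF
        }
    '
}
```

Usage would be along the lines of `tpconfig -l | find_drive HP.ULTRIUM5-SCSI.002`. Against the listing posted above this prints `robot=3 index=17 robot_drive=10 status=UP path=/dev/rmt6.1`.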

  • Have you tried restarting the ltid service or the Media Server? Oftentimes this solves the problem.

  •  

    From report_disc

    ======================= Missing Device (Drive) =======================
    Drive Name = HP.ULTRIUM5-SCSI.002
    Drive Path = /dev/rmt6.1
    Inquiry = "HP Ultrium 5-SCSI I6RZ"
    Serial Number = HU1248TF0U
    TLD(3) definition Drive = 10
    Hosts configured for this device:
    Host = JCERPDB1
    ======================= New Device (Drive) =======================
    Inquiry = "HP Ultrium 5-SCSI I5GZ"
    Serial Number = HU1213MTPL
    Drive Path = /dev/rmt6.1


    If the new drive shown is the replacement for the missing drive (I presume it is, but this may not be the case) ... then run ...

    tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1

    The issue happens because NBU does not automatically detect swapped drives; it needs to be told that <drivename> has been replaced by the drive at <new path>. In this case, the path is the same.

    I think you need to restart ltid afterwards: stopltid, then ltid -v
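
The pairing described above (a missing drive and a new drive on the same path) can be picked out of the report mechanically. The following is a minimal sketch, not an official tool: it reads `tpautoconf -report_disc` output on stdin and prints a suggested `tpautoconf -replace_drive ... -path ...` command for every Missing/New pair that shares a Drive Path. The function name `suggest_replace` is hypothetical, and the parsing assumes the "Key = Value" layout shown in the report above. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: pair Missing and New drive records by Drive Path.
# Input: `tpautoconf -report_disc` output on stdin (layout assumed from this thread).
suggest_replace() {
    awk '
        /Missing Device \(Drive\)/ { section = "missing"; name = ""; next }
        /New Device \(Drive\)/     { section = "new"; next }
        /^[ \t]*=====/             { section = ""; next }   # any other record type
        {
            # Split "Key = Value" lines; skip everything else.
            pos = index($0, " = ")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 3)
            sub(/^[ \t]+/, "", key)
            sub(/[ \t]+$/, "", val)
        }
        section == "missing" && key == "Drive Name" { name = val }
        section == "missing" && key == "Drive Path" { missing[val] = name }
        section == "new" && key == "Drive Path" && (val in missing) {
            printf "tpautoconf -replace_drive %s -path %s\n", missing[val], val
        }
    '
}
```

Run as `tpautoconf -report_disc | suggest_replace`. For the report quoted above it prints `tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1`. Check the serial numbers by hand before running the suggested command, since a shared path alone does not prove the new drive is the replacement.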

    • Genericus
      Moderator

      mph999 - the process you describe is exactly correct - it is what is supposed to happen.

      However - it does not always work

    • liuyl
      Level 6

      From my post above, you can see that I had already done so (plus restarting all of the NBU services)!
      But the same mismatch is still reported!

      • Marianne
        Level 6

        It is always best to delete the drive that was removed, restart ltid, then run device config, followed by another restart of ltid.

        If the drive is shared, you need to delete the drive on all media servers. Best to do this on the master server. 

        This made me wonder if the drive is shared: 

        tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1
        Found a matching device in global DB, HP.ULTRIUM5-SCSI.002 on host JCERPDB1
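
One way to check the shared-drive question from the media server is to list the hosts that the report attaches to the missing drive. The sketch below is only an illustration: it reads `tpautoconf -report_disc` output on stdin and prints every "Host = ..." entry found inside a Missing Device (Drive) record. The function name `hosts_for_missing` is hypothetical, and the layout is assumed from the report posted earlier in this thread. More than one host in the output would suggest the drive is configured on several servers.

```shell
#!/bin/sh
# Hypothetical helper: list hosts configured for missing drives.
# Input: `tpautoconf -report_disc` output on stdin (layout assumed from this thread).
hosts_for_missing() {
    awk '
        /Missing Device \(Drive\)/ { in_missing = 1; next }
        /^[ \t]*=====/             { in_missing = 0; next }   # any other record header
        in_missing && /^[ \t]*Host = / {
            sub(/^[ \t]*Host = /, "")    # keep only the host name
            sub(/[ \t]+$/, "")
            print
        }
    '
}
```

For the report in the original post, `tpautoconf -report_disc | hosts_for_missing` prints only `JCERPDB1`. That is the view from this one media server, so comparing it with the device configuration on the master server is still worthwhile.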