12-16-2018 03:26 AM - edited 12-16-2018 05:52 AM
After a faulty tape drive (TD) was replaced successfully via tpautoconf on the media server, why does its -report_disc still output the previous mismatch record?
Note: On the corresponding master server, tpautoconf -report_disc returns nothing, and the drive also works fine on the media server!
#
#
# tpautoconf -report_disc
======================= Missing Device (Drive) =======================
Drive Name = HP.ULTRIUM5-SCSI.002
Drive Path = /dev/rmt6.1
Inquiry = "HP Ultrium 5-SCSI I6RZ"
Serial Number = HU1248TF0U
TLD(3) definition Drive = 10
Hosts configured for this device:
Host = JCERPDB1
======================= New Device (Drive) =======================
Inquiry = "HP Ultrium 5-SCSI I5GZ"
Serial Number = HU1213MTPL
Drive Path = /dev/rmt6.1
#
#
#
# tpconfig -l|tail -28
drive - 11 hcart3 6 UP - IBM.ULT3580-TD3.001 /dev/rmt19.1
drive - 12 hcart3 3 UP - IBM.ULT3580-TD3.006 /dev/rmt2.1
drive - 13 hcart3 8 UP - IBM.ULT3580-TD3.000 /dev/rmt20.1
drive - 14 hcart3 5 UP - IBM.ULT3580-TD3.005 /dev/rmt3.1
drive - 15 hcart3 7 UP - IBM.ULT3580-TD3.004 /dev/rmt4.1
robot 2 - TLD - - - - jchxbak
drive - 21 hcart3 1 UP - HP.ULTRIUM6-SCSI.001 /dev/rmt21.1
drive - 22 hcart3 2 UP - HP.ULTRIUM6-SCSI.007 /dev/rmt22.1
drive - 23 hcart3 3 UP - HP.ULTRIUM6-SCSI.003 /dev/rmt23.1
drive - 24 hcart3 4 UP - HP.ULTRIUM6-SCSI.005 /dev/rmt24.1
drive - 25 hcart3 5 UP - HP.ULTRIUM6-SCSI.002 /dev/rmt25.1
drive - 26 hcart3 6 UP - HP.ULTRIUM6-SCSI.004 /dev/rmt26.1
drive - 27 hcart3 7 UP - HP.ULTRIUM6-SCSI.000 /dev/rmt27.1
drive - 28 hcart3 8 UP - HP.ULTRIUM6-SCSI.006 /dev/rmt28.1
robot 3 - TLD - - - - jchxbak
drive - 2 hcart2 2 UP - HP.ULTRIUM5-SCSI.009 /dev/rmt10.1
drive - 3 hcart2 3 UP - HP.ULTRIUM5-SCSI.010 /dev/rmt11.1
drive - 4 hcart2 4 UP - HP.ULTRIUM5-SCSI.011 /dev/rmt12.1
drive - 5 hcart2 5 UP - HP.ULTRIUM5-SCSI.006 /dev/rmt13.1
drive - 6 hcart2 6 UP - HP.ULTRIUM5-SCSI.007 /dev/rmt14.1
drive - 7 hcart2 7 UP - HP.ULTRIUM5-SCSI.005 /dev/rmt15.1
drive - 8 hcart2 8 UP - HP.ULTRIUM5-SCSI.004 /dev/rmt16.1
drive - 16 hcart2 9 UP - HP.ULTRIUM5-SCSI.003 /dev/rmt5.1
drive - 17 hcart2 10 UP - HP.ULTRIUM5-SCSI.002 /dev/rmt6.1
drive - 18 hcart2 11 UP - HP.ULTRIUM5-SCSI.001 /dev/rmt7.1
drive - 19 hcart2 12 UP - HP.ULTRIUM5-SCSI.000 /dev/rmt8.1
drive - 20 hcart2 1 UP - HP.ULTRIUM5-SCSI.008 /dev/rmt9.1
drive - 0 pcd - DISABL - IBM.DDSGEN6.000 /dev/rmt0.1
#
#
#
# tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1
Found a matching device in global DB, HP.ULTRIUM5-SCSI.002 on host JCERPDB1
#
#
#
# tpautoconf -report_disc|grep -Ei "124|1213"
Serial Number = HU1248TF0U
Serial Number = HU1213MTPL
#
#
#
# stopltid
#
#
#
# ltid -v
#
#
#
# tpautoconf -report_disc|grep -Ei "124|1213"
Serial Number = HU1248TF0U
Serial Number = HU1213MTPL
#
#
#
# netbackup stop
stopping the NetBackup Service Monitor
stopping the NetBackup Service Layer
stopping the NetBackup Remote Monitoring Management System
stopping the NetBackup compatibility daemon
stopping the Media Manager device daemon
stopping the Media Manager volume daemon
stopping the NetBackup client daemon
stopping the NetBackup network daemon
#
#
#
# bpps -a
NB Processes
------------
MM Processes
------------
#
#
#
# netbackup start
NetBackup network daemon started.
NetBackup client daemon started.
NetBackup SAN Client Fibre Transport daemon started.
NetBackup Database Server started.
NetBackup Event Manager started.
NetBackup Audit Manager started.
NetBackup Enterprise Media Manager started.
NetBackup Resource Broker started.
Media Manager daemons started.
NetBackup request daemon started.
NetBackup compatibility daemon started.
NetBackup Job Manager started.
NetBackup Policy Execution Manager started.
NetBackup Storage Lifecycle Manager started.
NetBackup Remote Monitoring Management System started.
NetBackup Key Management daemon started.
NetBackup Service Layer started.
NetBackup Agent Request Server started.
NetBackup Bare Metal Restore daemon not started.
NetBackup Vault daemon started.
NetBackup Service Monitor started.
NetBackup Bare Metal Restore Boot Server daemon started.
#
#
#
# bpps -a
NB Processes
------------
root 17301572 1 0 15:22:04 - 0:00 /usr/openv/netbackup/bin/vnetd -standalone
root 4194620 1 0 15:22:08 - 0:00 /usr/openv/netbackup/bin/nbsvcmon
root 4784508 1 0 15:22:06 - 0:00 /usr/openv/netbackup/bin/nbrmms
root 7537248 1 0 15:22:04 - 0:00 /usr/openv/netbackup/bin/bpcd -standalone
root 4522834 1 0 15:22:08 - 0:00 /usr/openv/netbackup/bin/bmrbd
root 5178226 1 0 15:22:06 - 0:00 /usr/openv/netbackup/bin/bpcompatd
root 7930840 1 0 15:22:07 - 0:00 /usr/openv/netbackup/bin/nbsl
MM Processes
------------
root 21692480 3212062 0 15:22:09 - 0:00 tldd -v
root 6226440 1 0 15:22:06 - 0:00 vmd -v
root 9568958 3212062 0 15:22:11 - 0:00 avrd -v
root 3212062 1 0 15:22:06 - 0:00 /usr/openv/volmgr/bin/ltid
#
#
#
# tpautoconf -report_disc|grep -Ei "124|1213"
Serial Number = HU1248TF0U
Serial Number = HU1213MTPL
#
#
#
12-31-2018 02:11 PM
I do not know the answer to that. I suspect that if you dug through NBDB you would find something amiss with regard to which machines are mapped to the drive.
I have never seen this not work before, as I mentioned. It's a simple concept that has been working for years, and from what I saw, it appears that the new drive may have been added before the replace_drive was run and it has got itself all upset.
The only way to fix this, well two ways ...
Delete the missing drive and possibly the new drive from the config and re-add them - this should clear the missing device from the output.
Manual SQL commands to remove it from NBDB - but this is very last resort and would only be used if the method above fails.
I don't think there is much else I can add to this, because ultimately the fix will be as I mention, and I am confident that in the future if a drive is swapped - running only tpautoconf -replace_drive and not adding the drive, will be successful.
12-16-2018 10:52 PM
Have you tried restarting the ltid service or the media server? This often solves the problem.
12-16-2018 11:45 PM
From report_disc
======================= Missing Device (Drive) =======================
Drive Name = HP.ULTRIUM5-SCSI.002
Drive Path = /dev/rmt6.1
Inquiry = "HP Ultrium 5-SCSI I6RZ"
Serial Number = HU1248TF0U
TLD(3) definition Drive = 10
Hosts configured for this device:
Host = JCERPDB1
======================= New Device (Drive) =======================
Inquiry = "HP Ultrium 5-SCSI I5GZ"
Serial Number = HU1213MTPL
Drive Path = /dev/rmt6.1
If the new drive shown is the replacement for the missing drive (I presume it is, but this may not be the case) ... then run ...
tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1
The issue happens because NBU does not automatically detect swapped drives; it needs to be told that <drivename> has been replaced by the drive at <new path>. In this case, the path is the same.
I think you need to restart ltid afterwards: stopltid, then ltid -v.
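To illustrate the point above, here is a minimal sketch that reads tpautoconf -report_disc output and pairs each missing drive with a new device found on the same path, which is the situation -replace_drive is meant to resolve. The function name pair_replaced_drives is hypothetical, and the parsing assumes the "Key = Value" layout of the report quoted in this thread; other NetBackup versions may format the report differently.

```sh
# Hypothetical helper: reads "tpautoconf -report_disc" output on stdin and
# prints one line per swapped drive: <drive name> <path> <old S/N> <new S/N>.
# Assumes the "Key = Value" layout shown in this thread.
pair_replaced_drives() {
    awk '
        # Store the record collected so far under its drive path.
        function save_record() {
            if (path != "") {
                if (section == "missing") { mname[path] = name; mserial[path] = serial }
                if (section == "new")     { nserial[path] = serial }
            }
            name = ""; serial = ""; path = ""
        }
        /Missing Device \(Drive\)/ { save_record(); section = "missing"; next }
        /New Device \(Drive\)/     { save_record(); section = "new"; next }
        # Any other section header ends the current record.
        /^=====/                   { save_record(); section = ""; next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "Drive Name")         name = val
            else if (key == "Drive Path")    path = val
            else if (key == "Serial Number") serial = val
        }
        END {
            save_record()
            # A missing drive and a new device on the same path form a pair.
            for (p in mname)
                if (p in nserial) print mname[p], p, mserial[p], nserial[p]
        }
    '
}
```

Run as "tpautoconf -report_disc | pair_replaced_drives". For the report quoted above it prints "HP.ULTRIUM5-SCSI.002 /dev/rmt6.1 HU1248TF0U HU1213MTPL", which gives the drive name and path to pass to -replace_drive.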
12-17-2018 06:35 AM
As you can see from my post above, I had already done so (plus restarting all the NBU services)!
But the mismatch result is still the same!
12-18-2018 12:59 AM
Always best to delete the drive that was removed, restart ltid, then run device config, followed by restart of ltid.
If the drive is shared, you need to delete the drive on all media servers. Best to do this on the master server.
This made me wonder if the drive is shared:
tpautoconf -replace_drive HP.ULTRIUM5-SCSI.002 -path /dev/rmt6.1
Found a matching device in global DB, HP.ULTRIUM5-SCSI.002 on host JCERPDB1
12-18-2018 03:28 AM - edited 12-28-2018 11:58 PM
That is my question too!
It is indeed an SSO (shared) tape drive, so why does my tpautoconf output list only this one host entry?
https://www.veritas.com/support/en_US/article.000027601
http://symcnbu.blogspot.com/2010/04/updating-replaced-tape-drive-in-nbu.html
12-18-2018 12:47 PM
The tpautoconf -replace_drive command is supposed to swap in the new path on the fly, but I sometimes have the same issue with my SSO drives.
I have to stop netbackup - it is vitally important the drive is clear and no reservations remain - especially in shared drives.
I use "nbrbutil -dump" and grep for the drive name to make sure there is nothing internal to NetBackup manipulating the drive - reservations and unload commands can remain hidden there!
Once all is clear, you can run the replace drive command - I like to also do the tpautoconf -a and recycle netbackup once the drive is online.
I have replaced the drive using just the replace_drive command, and verified the serial number in NetBackup is changed, and 5 - 10 minutes later it reverts to the old one! This causes drive issues due to the serialization, so unmount commands are executed and NetBackup thinks it has emptied the drive but the path is bad, so the tape stays in the drive - tapes get frozen because they fail to load.
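The nbrbutil check described in this post can be wrapped in a small helper. This is only a sketch: the function name drive_is_clear is hypothetical, and because the layout of the nbrbutil -dump output is not shown in this thread, the dump is treated as opaque text and searched for the drive name, exactly as the grep above does.

```sh
# Hypothetical helper: reads dump text on stdin and succeeds (exit 0) only if
# no line mentions the given drive name. Matching lines are printed.
drive_is_clear() {
    drive=$1
    # -F: match the name literally (drive names contain dots); -i: ignore case.
    if matches=$(grep -iF -- "$drive"); then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}
```

A possible use is "nbrbutil -dump | drive_is_clear HP.ULTRIUM5-SCSI.002 && echo clear", run before the replace_drive command; any lines printed are the entries that still reference the drive.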
12-18-2018 12:53 PM
Now, since this non-intrusive command is not consistent, I am unable to use it as a non-intrusive command.
Whenever I have to swap a drive and rescan it, I am forced to do a complete NetBackup shutdown.
The good news is that if you totally stop netbackup and rescan the drives and restart the media servers, the drive rescan usually works.
The bad news - it can look like it is replaced, then revert back - usually in about 10 minutes.
12-18-2018 12:56 PM
12-18-2018 01:01 PM
liuyl - I have the same issue. I did not have the time or inclination to solve the issue for NetBackup, so I found a workaround.
My support thought it was caused by the SSO - somebody has cached information about the drive and overwrites your drive replacement command.
12-19-2018 05:34 AM
From the link in a prior post, I noticed a few KEY points
Down the drive - this ensures nothing is using it. I would add the check using "nbrbutil -dump | grep -i drive" from my post.
Run tpautoconf -replace_drive and tpautoconf -a from the robot control host! I think this is where I may have gone wrong!
1 Down the drive. In the Device Monitor, select the drive to swap or update. From the Actions menu, select Down Drive.
2 Replace the drive or physically update the firmware for the drive. If you replace the drive, specify the same SCSI ID for the new drive as the old drive.
3 To produce a list of new and missing hardware, run tpautoconf -report_disc on one of the reconfigured servers. This command scans for new hardware and produces a report that shows the new and the replaced hardware.
4 Ensure that all servers that share the new hardware are up and that all NetBackup services are active.
5 Run tpautoconf with the -replace_drive drive_name -path path_name options or the -replace_robot robot_number -path robot_path options. The tpautoconf command reads the serial number from the new hardware device and then updates the EMM database.
6 If the new device is an unserialized drive, run the device configuration wizard on all servers that share the drive. If the new device is a robot, run the device configuration wizard on the server that is the robot control host.
7 Up the drive. In the Device Monitor, select the new drive. From the Actions menu, select Up Drive.
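The command-line part of the seven steps above can be sketched as a dry run that only prints what would be executed. The function name print_replace_plan is hypothetical; the tpautoconf invocations are the ones used in this thread, the manual Device Monitor steps are left as comments, and the second -report_disc is an extra check (not part of the quoted procedure) that mirrors what the original poster did.

```sh
# Hypothetical dry-run helper: prints the command sequence for swapping one
# drive. Nothing is executed; review the output before running any of it.
print_replace_plan() {
    drive=$1   # existing drive name, e.g. HP.ULTRIUM5-SCSI.002
    path=$2    # device path of the replacement, e.g. /dev/rmt6.1
    cat <<EOF
# step 1 (manual): down drive $drive in the Device Monitor
# step 2 (manual): swap the hardware, keeping the same SCSI ID
tpautoconf -report_disc
# step 4 (manual): check that every server sharing the drive is up
tpautoconf -replace_drive $drive -path $path
# step 6 applies only to unserialized drives or robots
# extra check: the old and new serial numbers should no longer be listed
tpautoconf -report_disc
# step 7 (manual): up drive $drive in the Device Monitor
EOF
}
```

For example, "print_replace_plan HP.ULTRIUM5-SCSI.002 /dev/rmt6.1" prints the sequence for the drive discussed here. Printing rather than executing keeps the sketch safe to try on a production server.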
12-19-2018 05:38 AM
12-20-2018 06:42 AM
In fact, this problem is very hard to avoid, even though we followed the conditions and steps above!
It seems that the replace_drive option can only find its own local S/N mismatch records, and it does not update them with the new S/N at all!
12-20-2018 10:45 PM
You have 2 choices:
We won't be able to solve your issue in this forum.
12-23-2018 11:50 PM - edited 12-23-2018 11:59 PM
If you reconfigure the drives via the wizard, we won't be able to troubleshoot this, as any evidence will have disappeared.
As a very minimum, we would need:
Add -zr SQL in /usr/openv/var/global/server.conf
Add VERBOSE to /usr/openv/volmgr/vm.conf
Create dir /usr/openv/volmgr/debug/tpcommand
Restart services
nbdb_unload output (/usr/openv/db/bin/nbdb_unload /tmp/output.before)
Recreate issue
nbdb_unload output command again (/usr/openv/db/bin/nbdb_unload /tmp/output.after)
tpcommand log
server.log (from /usr/openv/db/log)
12-25-2018 07:06 AM - edited 12-26-2018 06:33 PM
OK!
But I am a bit afraid that running nbdb_unload could make the situation worse!
https://vox.veritas.com/t5/NetBackup/Help-needed-nbdb-unload/td-p/843803
Note: it seems my worry about it was unfounded, so I will do that soon!
Now I have done and uploaded all the logs you need!
12-26-2018 06:48 PM
From my tpcommand.log, I can see that the replace_drive with the new TD S/N did fail!
09:32:07.182 [25886860] <16> update_drive: (0) UpdateDrive failed, emmError = 2009005, nbError = 0
09:32:07.182 [25886860] <16> MMreplace_hw: (-) Translating EMM_ERROR_DriveSerialNumberAlreadyExists(2009005) to 91 in the Device Config context
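When a tpcommand log is long, the EMM error name and code can be pulled out mechanically. A minimal sketch, assuming the "EMM_ERROR_Name(code)" pattern seen in the second line above; the function name emm_errors is hypothetical.

```sh
# Hypothetical helper: reads log text on stdin and prints each distinct
# "EMM_ERROR_<Name> <code>" pair found in it.
emm_errors() {
    # Keep only lines containing EMM_ERROR_Name(digits) and reduce them to
    # "Name code"; sort -u removes repeats of the same error.
    sed -n 's/.*\(EMM_ERROR_[A-Za-z0-9_]*\)(\([0-9][0-9]*\)).*/\1 \2/p' | sort -u
}
```

For example, "cat /usr/openv/volmgr/debug/tpcommand/* | emm_errors" scans every log in the tpcommand debug directory mentioned earlier. For the two lines quoted above it prints "EMM_ERROR_DriveSerialNumberAlreadyExists 2009005".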
12-26-2018 11:52 PM
This is the missing device in the NBDB table that 'defines' devices:
'2000423',0x16FFD58A4E1211E68000FE9DA945725D,'2','10','0','128','1','NetBackup HCART2','NetBackup HCART','523118080','16176','6','0','HP.ULTRIUM5-SCSI.002','','2000420','3','8','','10','HP','Ultrium 5-SCSI','I6RZ','','','','HU1248TF0U','','','HP Ultrium 5-SCSI I6RZ','','0','','1000015','1000014','1970-01-01 00:00:00.000000','1970-01-01 00:00:00.000000','2018-12-23 18:12:07.000000','0','3663947','0','1',0x00000000,0x00000000000000000000000000000000,'-1','-1','1970-01-01 00:00:00.000000','0','0','-1','-1','-1','-1','','','','82','0','0','8388608','2016-07-19 08:37:08.362446','2018-12-27 01:02:50.048002'
This is the new one ...
'2000792',0x4148AA1CE45411E880009852720722F7,'2','10','0','128','1','NetBackup HCART2','NetBackup HCART','523118080','16176','6','0','HP.ULTRIUM5-SCSI.012','','2000420','3','8','','10','HP','Ultrium 5-SCSI','I5GZ','','','','HU1213MTPL','','','HP Ultrium 5-SCSI I5GZ','','0','','1000003','1000002','1970-01-01 00:00:00.000000','1970-01-01 00:00:00.000000','2018-12-27 00:38:52.000000','0','207705','0','0',0x00000000,0x00000000000000000000000000000000,'-1','-1','1970-01-01 00:00:00.000000','0','0','-1','-1','-1','-1','','','','82','0','0','8388608','2018-11-09 03:18:35.846744','2018-12-27 00:45:23.609944'
So, quite simply, it seems the new drive was added via the wizard or manually before the tpautoconf -replace_drive command was run.
If you manually delete the drive with name HP.ULTRIUM5-SCSI.002 hopefully it will resolve the issue.
12-27-2018 12:12 AM
1) Why does replace_drive have no effect once the new TD S/N has already been added via the Device Configuration Wizard or tpautoconf -a?
2) Are the S/Ns of all SSO tape drives registered with every media server? In other words, why does tpautoconf on each media server find only that server's own mismatch records?
Note: that would mean I must run tpautoconf on every media server that shares the SSO drive.
12-27-2018 02:20 AM
1. As per the log message you found, it can't take effect because it already exists.
2. The drive is only referenced once in the device table, where it is given a unique device key (a number). Another table maps that device key to each media server the drive is associated with.
In theory therefore, you should only need to run tpautoconf on one server ...