Forum Discussion

stan56
Level 4
16 years ago
Solved

drives serial numbers mismatch

I'm using NetBackup 5.1 with the SSO option (planning to upgrade to 6.5 very soon). Solaris master and Linux media servers, FC tape library.

Recently I've been getting a bizarre issue with a couple of tape drives not being usable on the master, while they work fine on the media servers. I've been more or less ignoring them, since the master server is not running any tape backups anyway, but now I have another drive that all of a sudden doesn't work on the media servers either. I poked around a bit and found some pretty bad mismatches in drive serial numbers between what "scan" is seeing and what "tpautoconf -t" reports. Oddly enough, one of the drives in tpautoconf is listed twice. In the "scan" output, I have 2 drives that are listed twice (same rmt number, but different sg names). Now I'm starting to wonder if I should rebuild the sg devices along with the globDB. Am I thinking of the right solution?

Also, just to make sure I understand how "scan" and "tpautoconf" work: if they report drastically different results, which one should I assume is correct? I'm thinking "scan" would be the one to trust, but I'm not sure. The reason is that the rmt device links reported by "scan" for the 3 drives that are not usable on the master server still don't work (I get an error when issuing an mt command to them), while I know the same drives do work on the media servers.
  • Thanks for all the help. I finally fixed the problem with the serials mismatch by using the procedure for drive replacement, which was suggested by several people in the thread. I also found an updated version of the procedure; it's much easier to follow than the original, which was rather confusing: http://seer.entsupport.symantec.com/docs/271366.htm

    Now I'm wondering how those serials got changed. The drives may have been replaced: every now and then we get a broken drive and IBM comes out to replace it. Operations take care of it, and they don't always inform us when it happens since there's no downtime. I was always under the impression that the serial numbers don't change, just like WWNs don't change. But I may be wrong. We did have a few situations where some drives would suddenly become unusable and the way to fix it was basically to rebuild the globDB. Perhaps this was caused by a serial number change. Next time I run into this situation I'll be sure to check for a serials mismatch first.
  • -> On the problem media servers, rebuild the OS device tree (to make sure that the device files are pointing to the correct devices)
    -> Run the scan command to make sure that the output looks correct (no duplicate devices, no missing devices, etc.)
    -> Run tpautoconf -report_disc (to report the discrepancies)
    -> Delete all the drive paths that are seen in the output of the above command and reconfigure them.
    -> If there is any sort of change to the hardware, update to the latest device mappings file


    Reasons:

    Drive paths (as seen by the OS) usually change after a reboot or a firmware upgrade.
    This tends to create a discrepancy in the NetBackup configuration, because it stores the drive paths in the globDB.
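One way to see whether drive paths moved after a reboot is to keep a small snapshot of serial-to-path pairs and compare it afterwards. This is only a rough sketch: the `moved_drives` helper and the "<serial> <device-path>" snapshot format are assumptions made for illustration, not something NetBackup produces. You would have to build the snapshot files yourself from the scan output.

```shell
#!/bin/sh
# Hypothetical helper: report drives whose device path differs between two
# snapshots. Each snapshot line is "<serial> <device-path>", an assumed
# format that you would extract from the scan output yourself.
moved_drives() {
    # $1 = snapshot taken before the reboot, $2 = snapshot taken after it
    awk '
        NR == FNR { before[$1] = $2; next }      # first file: remember old path
        ($1 in before) && before[$1] != $2 {     # same serial, different path
            print $1 ": " before[$1] " -> " $2
        }
    ' "$1" "$2"
}
```

Any line it prints is a drive whose configured path is probably stale and worth checking against tpautoconf -report_disc. Note that it assumes the first snapshot is not empty.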
    Anonymous
    Here are in-depth docs to reference relating to drive changes in the NetBackup config:

    How to replace devices in a shared storage option configuration on NetBackup media servers
    http://seer.entsupport.symantec.com/docs/271280.htm

    How to update VERITAS NetBackup (tm) if a tape drive is replaced in a robot
    http://seer.entsupport.symantec.com/docs/259835.htm

    How to upgrade tape drive firmware (Flash video)
    http://seer.entsupport.symantec.com/docs/277872.htm

  • Back when I had SCSI drives I learned the lesson that you have to correct NetBackup whenever you change a tape drive.
    As removing and adding a drive is so easy, I just do it by hand.
    Remove drive from NetBackup,
    replace drive,
    add drive to NetBackup.

    This can be a big issue with SCSI drives, as you could:

    Have drive serial 1234
    remove it and replace it with 5678
    have drive 4567 removed and replaced with 1234
    (as the drives are refurbished). Now NetBackup gets really confused, as it thinks it has two drives with the same serial, because I never removed any of the old ones and just kept adding new ones.


    Fibre drives are a bit easier, but I still make a habit of correcting the configuration every time I swap out a drive.
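The refurbished-drive scenario above can be walked through with a plain text file standing in for the configuration. This is a simulation only: the `configured` file is a made-up stand-in for what NetBackup has recorded, and no NetBackup command is run.

```shell
#!/bin/sh
# Simulation only: "configured" stands in for the serial numbers NetBackup
# has recorded; nothing here touches a real NetBackup configuration.
configured=$(mktemp)
printf '%s\n' 1234 4567 > "$configured"   # two drives configured at the start

# Swap 1: physical drive 1234 is replaced by 5678, but the old entry is
# never removed from the configuration.
echo 5678 >> "$configured"

# Swap 2: physical drive 4567 is replaced by a refurbished unit that happens
# to carry serial 1234, again without removing the old entry.
echo 1234 >> "$configured"

# Serials that now appear more than once in the configuration.
duplicates=$(sort "$configured" | uniq -d)
echo "duplicate serials: $duplicates"
rm -f "$configured"
```

Removing the old entry before each add would leave only 5678 and 1234 in the file, with no duplicate.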
  • Thanks, the tpautoconf -report_disc command helped quite a bit! I have now pinpointed the 3 drives that have wrong serial numbers on the master. I remember having an issue a few weeks back where all of a sudden those drives became unusable on the media servers (bringing them up would return an error), so I ended up deleting and re-adding them on the media servers. Perhaps I can just do the same on the master? I'm not sure I want to rebuild the globDB completely for something that can be fixed so easily, assuming it's a proper way to fix it.

    Of course it doesn't fix the duplicate device issue, but I'm not sure how big an issue that really is; it may be just cosmetic. On second thought, there are still 3 drives not usable by the master (not the same 3 drives that have the serial number mismatch). Perhaps rebuilding the device tree would fix them. I know the drives work because they do work on the media servers.

    I also figured out why I was getting discrepancies between tpautoconf and scan. To make a long story short, it was a "user error", I was looking at the wrong output. They do match, as they should, since both commands actually scan the device tree.
  • Hi,

    A quick way to get your drive configuration nice and clean is to delete the shared drives but leave one instance of every drive (on one of your working media servers). Rebuild the OS devices (rm /dev/rmt/*;update_drv st), then run scan -changer to find out which serial number is in which slot. Next, run scan -tape to find out which serial number corresponds to which device path. Use the gathered information to configure the devices correctly. On the other servers, just rebuild the OS devices and run tpautoconf -a: the drives are already known on the media server, so the sharing is configured automatically.
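The "use the gathered information" step amounts to matching two lists on the serial number. A minimal sketch, assuming you have noted the results by hand into two files: the `map_drives` helper and both file formats are hypothetical, and the script does not parse real scan output.

```shell
#!/bin/sh
# Hypothetical helper: combine two hand-made lists keyed by serial number.
#   $1: lines of "<serial> <drive-position>"  (noted from scan -changer)
#   $2: lines of "<serial> <device-path>"     (noted from scan -tape)
# Prints "<drive-position> <serial> <device-path>" for serials found in both,
# ordered by drive position.
map_drives() {
    awk '
        NR == FNR { pos[$1] = $2; next }     # first file: serial -> position
        $1 in pos { print pos[$1], $1, $2 }  # second file: add the device path
    ' "$1" "$2" | sort -n
}
```

A serial that appears in only one of the two lists is silently skipped, so a short result is itself a hint that a drive is missing from one view.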
  • I use steps similar to those listed by others:
    (especially useful tech note: http://seer.entsupport.symantec.com/docs/271280.htm)
    • Change to the /usr/openv/volmgr/bin directory
    • Run the command: tpautoconf -report_disc
    • In the output of the tpautoconf command you should see some information about a ‘Missing Device (Drive)’ and a ‘New Device (Tape)’.
    • Note the ‘Drive-Name’ and ‘Drive-Path’ information in the ‘Missing Device (Drive)’ section
    • Next run the command: tpautoconf -replace_drive <Drive-Name> -path <Drive-Path>
    • This command will update the serial number and drive information in the device manager databases of all servers
    • Once the command completes, restart the NetBackup services/daemons on all servers
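The replace step above can be wrapped so the command is printed for review before anything is changed. This is a sketch, not a supported tool: the `replace_drive` function and the `DRY_RUN` switch are made up for illustration, and only the tpautoconf options already quoted in the steps are used.

```shell
#!/bin/sh
# Sketch: build the replace_drive command from the values noted in the
# "Missing Device (Drive)" section. It only prints the command unless
# DRY_RUN=0 is set, so it can be reviewed first.
VOLMGR_BIN=/usr/openv/volmgr/bin   # directory named in the steps above

replace_drive() {
    drive_name=$1
    drive_path=$2
    if [ -z "$drive_name" ] || [ -z "$drive_path" ]; then
        echo "usage: replace_drive <Drive-Name> <Drive-Path>" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = 0 ]; then
        # Really run it; assumes NetBackup is installed under VOLMGR_BIN.
        "$VOLMGR_BIN/tpautoconf" -replace_drive "$drive_name" -path "$drive_path"
    else
        # Default: show what would be run.
        echo "$VOLMGR_BIN/tpautoconf -replace_drive $drive_name -path $drive_path"
    fi
}
```

For example, `replace_drive Drive0 /dev/rmt/3cbn` (both values invented here) prints the full command line; running it with `DRY_RUN=0` would execute it.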
    Other things to take into consideration: firmware levels of the new drives; try to do all this work during a 'quiet' period for NBU; if all else fails, delete all the drive devices and then re-run the Storage Config Wizard through the Java GUI (there's nothing wrong with deleting and recreating all of your devices from time to time; just make a note of any special configuration choices that exist).  Can be nerve wracking, but eventually things will straighten themselves out.

    Had to deal with a "Cannot synchronize global device database" issue once and that was a lot of fun!
    http://support.veritas.com/docs/269133

    GENERAL ERROR: Error occurs when trying to rebuild the globDB: "Cannot synchronize global device database".
