Forum Discussion

engops-rob
Level 3
2 years ago

Same Media server, new storage

I have a relatively new and unused Media Server VM connected successfully to our master. For a datastore I was initially using a (local to the VM) virtual disk. The next day I noticed that all of my backups had failed, because the virtual disk for the backups had fatal problems and was keeping the VM from booting. I removed the disk, confirmed that the Media server could connect and added new local storage, but backup failures make me believe that the new storage (slightly smaller and mounted using the same mount point) is not successfully configured. 

The Java console confirms that the media server is in successful contact with the master. 

Can someone please describe the process I should be using to inform the master that the old storage on the media server is defunct and that new storage is available? 

 

This is the log for the backup I attempted:

 

  • Aug 02, 2023 11:27:07 AM - Info bpdbm (pid=4272) image catalog cleanup
  • Aug 02, 2023 11:27:07 AM - Info bpdbm (pid=4272) deleting images which expire before Wed Aug 2 11:27:07 2023 (1690990027)
  • Aug 02, 2023 11:27:08 AM - Info nbdelete (pid=5492) deleting expired images. Media Server: nbu Media: @aaaaj
  • Aug 02, 2023 11:27:08 AM - requesting resource @aaaaj
  • Aug 02, 2023 11:27:08 AM - granted resource MediaID=@aaaaj;DiskVolume=nbu;DiskPool=DP-HCP;Path=nbu;StorageServer=ark.archivas.com;MediaServer=nbu
  • Aug 02, 2023 11:27:09 AM - Info bpdm (pid=5502) started
  • Aug 02, 2023 11:27:09 AM - started process bpdm (pid=5502)
  • Aug 02, 2023 11:27:11 AM - Info bpdm (pid=5502) initial volume nbu: Kbytes total capacity: 9007199254740991, used space: 0, free space: 9007199254740991
  • Aug 02, 2023 11:27:32 AM - Info bpdm (pid=5502) ending volume nbu: Kbytes total capacity: 9007199254740991, used space: 0, free space: 9007199254740991
  • Aug 02, 2023 11:27:32 AM - Info bpdm (pid=5502) EXITING with status 0
  • Aug 02, 2023 11:27:32 AM - Info nbdelete (pid=5492) deleting expired images. Media Server: svpscrck8nbu01.sc.eng.hitachivantara.com Media: @aaaaq
  • Aug 02, 2023 11:27:32 AM - Error nbdelete (pid=5492) Cannot obtain resources for this job : error [2074]
  • Aug 02, 2023 11:27:32 AM - Warning bpdbm (pid=4272) nbdelete failed with status (1)
  • Aug 02, 2023 11:27:32 AM - requesting resource @aaaaq
  • the requested operation was partially successful(1)
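
When reading a job log like the one above, it can help to pull out only the Warning and Error lines together with the media ID they most likely refer to. The following is a minimal sketch, assuming the log has been pasted as plain text in the format shown; the function name `problem_lines` and the rule "use the most recent Media: value seen" are illustrative assumptions, not NetBackup behavior.

```python
import re

# Shape of the severity-tagged lines in the pasted job details:
#   "<timestamp> - <Info|Warning|Error> <process> (pid=N) <message>"
LINE_RE = re.compile(
    r"^\w{3} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M - "
    r"(?P<severity>Info|Warning|Error) (?P<process>\w+) \(pid=\d+\) "
    r"(?P<message>.*)$"
)
MEDIA_RE = re.compile(r"Media: (@\w+)")


def problem_lines(log_text):
    """Return (severity, process, message, media_id) for Warning/Error lines.

    media_id is the most recent "Media: @..." value seen so far (a heuristic
    chosen for this sketch), or None if no media ID has appeared yet.
    """
    results = []
    last_media = None
    for raw in log_text.splitlines():
        # Drop the forum's bullet marker and surrounding whitespace.
        line = raw.strip().lstrip("•").strip()
        match = LINE_RE.match(line)
        if not match:
            continue  # e.g. "requesting resource ..." lines carry no severity
        media = MEDIA_RE.search(match["message"])
        if media:
            last_media = media.group(1)
        if match["severity"] in ("Warning", "Error"):
            results.append(
                (match["severity"], match["process"], match["message"], last_media)
            )
    return results


# Three lines copied from the log in the post.
SAMPLE = """\
  • Aug 02, 2023 11:27:32 AM - Info nbdelete (pid=5492) deleting expired images. Media Server: svpscrck8nbu01.sc.eng.hitachivantara.com Media: @aaaaq
  • Aug 02, 2023 11:27:32 AM - Error nbdelete (pid=5492) Cannot obtain resources for this job : error [2074]
  • Aug 02, 2023 11:27:32 AM - Warning bpdbm (pid=4272) nbdelete failed with status (1)
"""

for severity, process, message, media in problem_lines(SAMPLE):
    print(f"{severity:7} {process:8} {media}: {message}")
```

On this log the sketch associates the 2074 error with @aaaaq, the media ID named on the line just before it, while the earlier cleanup against @aaaaj finished with status 0.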

 

  • The proper way is to delete all the NetBackup images in the non-working storage unit, remove the storage unit, and create a new one. 

    But here are some questions to help debug the issue. 

    • Does the new disk contain the old NetBackup images that were written to the old disk?
    • What is the storage unit type of this disk unit? It looks like AdvancedDisk.

    To get all the log details for the job, run the command:

    # vxlogview -p 51216 -X "jobid=12345"

    Here 12345 is the Activity Monitor job ID. If you choose to share the debug output with the community, don't paste it directly into a post, as it will be marked as spam.

    Do you also get a 2074 when running:

    # bpimage -cleanup -allclients

    • engops-rob
      Level 3

      Thanks for the reply.

      This is the part that I feel like I need to do: "remove the storage unit and create a new one." Maybe I'm making this too hard, but I can't see where I would do this on an existing media server after the old, non-responsive storage was deleted from the media server.

      That said, with the timing of the disk failure so close after adding the MS, there were no backups on the old disk.
      Under Storage Units the category is MSDP, and under Disk Pools the storage server type is PureDisk.

      This command hangs on the master:
      vxlogview -p 51216 -X "jobid=30814"
      ...and I get this on the client media server (maybe this is to be expected):
      "V-1-45 No log files found."

      I do not get a 2074 when running
      # bpimage -cleanup -allclients

      Thanks for the help, I am very new to NBU.

      • davidmoline
        Level 6

        Hi engops-rob 

        If there was nothing on the old disk, then the way to go is as Nicolai states: remove the old device and re-create it. (There is a lot of data on the MSDP drive that is required to make it all work; if this wasn't copied or recreated, it is no surprise that it is not working.) 

        To remove the old device, try deleting the objects in the Java GUI in the following order (if anything doesn't delete, it will give you the reason why):

        1. SLPs using the device (or modify them to remove the STU, then wait 24 hours or delete the old versions)
        2. Storage unit(s) associated with the device
        3. Disk pool
        4. Storage Server

        Once the storage server is gone, clean up (remove) any residual files on the MSDP mount, and use the wizard to create the MSDP pool again.

        Cheers
        David
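
The ordering in the reply above (dependents first, storage server last) can be written down as a small planning helper. This is only a sketch: the kind labels and the object names in the example are hypothetical placeholders, and the code does not call NetBackup; it just sorts what you intend to delete into the order David lists.

```python
# Removal order from the reply above: SLPs, then storage units,
# then the disk pool, then the storage server.
TEARDOWN_ORDER = ["slp", "storage_unit", "disk_pool", "storage_server"]


def teardown_plan(objects):
    """Sort (kind, name) pairs so each object is removed before what it uses.

    The sort is stable, so objects of the same kind keep their input order.
    """
    rank = {kind: position for position, kind in enumerate(TEARDOWN_ORDER)}
    unknown = sorted({kind for kind, _ in objects if kind not in rank})
    if unknown:
        raise ValueError(f"unknown object kind(s): {unknown}")
    return sorted(objects, key=lambda item: rank[item[0]])


# Hypothetical names, for illustration only.
inventory = [
    ("storage_server", "example-media-server"),
    ("disk_pool", "example-pool"),
    ("storage_unit", "example-stu"),
]

for step, (kind, name) in enumerate(teardown_plan(inventory), start=1):
    print(f"{step}. delete {kind}: {name}")
```

An empty SLP list simply drops step 1, which matches the case where no lifecycle policies reference the device.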

  • Thanks David, 

    "To remove the old device - in the Java GUI try deleting the devices in the following order (if anything doesn't delete it will give you the reason why)"

    "SLP using the device (or modify them to remove the STU, then wait for 24 hours or delete the old versions)"
    nbstl -L told me there were none, so I moved on to the storage units.

    "Storage unit(s) associated with the device", "Disk pool", "Storage Server"
    I was able to delete the appropriate one with no issues.

    "Once the storage server is gone, clean up (remove) any residual files on the MSDP mount, and use the wizard to create the MSDP pool again."
    I found the option to remove the media server under Media and Device Management -> Media Servers -> Remove Device Host. The media server's status is "offline", and the action fails with an error: "the media is allocated for use (199)".
    As for the MSDP mount, it's already been replaced with a fresh, empty disk.

    I still see a disk pool in the web interface, but when I attempt to remove it (after "downing" it) I get this error:

    Status Code: 134, localizedMsg: unable to process request because the server resources are busy, additionalMsg: Disk Pool cannot be deleted because Disk Volume (PureDiskVolume) contains image fragments waiting to be deleted

    Rob

    • davidmoline
      Level 6

      Hi engops-rob 

      Okay - an additional step is required, which is to purge the deletion list for the particular volume. (Background: when an image expires, it goes into a deletion list which is processed on a regular basis; these show in the Activity Monitor as the Image Cleanup jobs.) Anyway, as the device is broken, it is unable to process the deletion of the images/fragments. 

      Use nbdelete -list to see what is waiting to be deleted. If the entries relate only to your broken device (only one @aaaaX device listed), then use the "-purge_deletion_list" option to remove them. If there are other entries, then you will need to add the option to target just your broken device ("nbdelete -purge_deletion_list -media_id @aaaaX").

      Once this is done, you should be able to carry on with the disk pool deletion. 

      David
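
The choice David describes between a plain purge and a purge scoped to one media ID can be sketched as a small helper. It assumes you have already read the pending media IDs off the `nbdelete -list` output by hand (the thread does not show that output, so nothing here parses it), and it assumes @aaaaq is the broken device because that is the ID next to the 2074 error in the original log. The function name `purge_command` is hypothetical; the nbdelete options are the ones quoted in the reply.

```python
def purge_command(pending_media_ids, broken_media_id):
    """Build the nbdelete argument list for clearing the broken device's entries.

    Returns None when nothing is queued for the broken device.
    """
    pending = set(pending_media_ids)
    if broken_media_id not in pending:
        return None  # nothing to purge for this device
    command = ["nbdelete", "-purge_deletion_list"]
    if pending != {broken_media_id}:
        # Other devices still have pending deletions: limit the purge
        # to the broken device so their entries are left alone.
        command += ["-media_id", broken_media_id]
    return command


# Only the broken device is listed: a plain purge is enough.
print(purge_command(["@aaaaq"], "@aaaaq"))

# Another device also has pending deletions: scope the purge.
print(purge_command(["@aaaaj", "@aaaaq"], "@aaaaq"))
```

The helper only builds the argument list; running it is left to the operator, since purging the deletion list discards pending cleanup work for that device.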

      • engops-rob
        Level 3

        That did it, thanks David.

        Something else... now when I try to re-introduce the media server I observe a few things:

        1. The media server is identified in the Java console under Media and Device Management > Devices > Media Servers but is offline.
        2. The media server is identified in the Java console under NetBackup Management > Host Properties > Media Servers and has sufficient connectivity to achieve a green checkmark.
        3. If I try to add the new disk-based Storage Unit in the Java console, I see historical data when browsing for the mount point, e.g., it shows the vpfs and vpfs_shares folders from the old media. The new disk media doesn't have those folders, so it seems to me the console is using cached information.

        At this point I feel like I'm in a decent place with the recovery, so I moved to the web interface to finish up with protection plans, etc. While the Storage Unit is shown there, the Media Server is not. On a whim I tried to add it via the web interface (again with the cached folders), but I'm getting a strange "Specified servers <redacted> are not reachable or are offline." error. I say strange because I can ping the target, services seem OK, and the certificate checks out as trusted. I restarted the master's services to force a re-read; I figured it couldn't hurt and might help. Any thoughts?

        I appreciate the coaching, I'm learning from this process.

         

  • Scorched earth FTW. I ended up recreating from scratch and going through the whole certs-and-tokens exercise again since I never seem to get this right the first time. Thanks for the assist.

  • It looks like you’re encountering issues with backup failures due to changes in your storage configuration. To address the problem and inform the master that the old storage is defunct and the new storage is available, you should:

    • Log into the master server console where your storage servers are managed.
    • Remove the old storage references from the master’s list of available storage servers.
    • Configure the new storage server on the master, ensuring the mount point and settings match those on the media server.

    Next, confirm that the new storage is accessible and properly mounted on the media server. Make sure the mount point for the new storage is correctly set up and matches the configuration specified in your storage management settings.

    On the master server, refresh or re-synchronize the storage server list to include the new storage configuration and update any backup policies or job settings to use the new storage server.

    Review the backup logs for any new error messages or warnings that might indicate issues with the new storage setup. Ensure resources are correctly allocated and that the new storage is recognized as available.

    The log snippet suggests a resource allocation issue, indicated by error [2074]. It’s possible the master server is still trying to use the old storage configuration or there’s a mismatch in resource definitions.

    If these steps don’t resolve the issue, consult the documentation specific to your backup and storage solution or reach out to support for more detailed troubleshooting.

    Let me know if you need further assistance!

    • Nicolai
      Moderator

      You are replying to an issue opened 2 years ago.