08-02-2023 11:19 AM - edited 08-02-2023 11:22 AM
I have a relatively new and unused media server VM that connects successfully to our master. For storage I initially used a virtual disk local to the VM. The next day I noticed that all of my backups had failed: the virtual disk holding the backups had fatal problems and was keeping the VM from booting. I removed the disk, confirmed that the media server could connect, and added new local storage. However, the continuing backup failures make me believe the new storage (slightly smaller, and mounted at the same mount point) is not configured correctly.
The Java console confirms that the media server is in successful contact with the master.
Can someone please describe the process I should use to tell the master that the old storage on the media server is defunct and that new storage is available?
This is the log for the backup I attempted:
08-03-2023 03:15 AM
The proper way is to delete all the NetBackup images on the storage unit that isn't working, remove the storage unit, and create a new one.
That said, here are some questions to help debug the issue.
To get all the log details for the job, run:
# vxlogview -p 51216 -X "jobid=12345"
where 12345 is the Activity Monitor job ID. If you choose to share the debug output with the community, don't paste it into a post; it will be marked as spam.
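A minimal sketch of the step above, assuming a POSIX shell on the master with vxlogview on the PATH. It checks that the job ID is numeric and writes the log to a local file so it can be shared privately rather than pasted into a post. The function name nbu_job_log, the job_<id>.log file name, and the DRY_RUN switch are hypothetical; only the vxlogview arguments come from the command above.

```shell
#!/bin/sh
# Hypothetical helper: collect the unified logs for one Activity Monitor job.
# Assumes vxlogview is on PATH. Set DRY_RUN=1 to print the command instead.
nbu_job_log() {
    jobid=$1
    # Reject anything that is not a plain numeric job ID.
    case $jobid in
        ''|*[!0-9]*) echo "usage: nbu_job_log <numeric job id>" >&2; return 2 ;;
    esac
    out="job_${jobid}.log"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "vxlogview -p 51216 -X \"jobid=${jobid}\" > ${out}"
    else
        # Write to a file so the output can be shared privately, not posted.
        vxlogview -p 51216 -X "jobid=${jobid}" > "$out"
    fi
}
```

For example, running it with DRY_RUN=1 and job 30814 only prints the vxlogview command it would run, which is a quick way to check the arguments before a potentially slow query.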
Do you also get a status 2074 when you run the following?
# bpimage -cleanup -allclients
08-03-2023 06:54 AM - edited 08-03-2023 06:56 AM
Thanks for the reply.
This is the part I feel I need to do: "remove the storage unit and create a new one." Maybe I'm making this too hard, but I can't see where to do this on an existing media server once the old, non-responsive storage has been removed from it.
That said, because the disk failed so soon after I added the media server, there were no backups on the old disk.
Under Storage Units the category is MSDP, and under Disk Pools the storage server type is PureDisk.
This command hangs on the master:
vxlogview -p 51216 -X "jobid=30814"
...and on the media server (maybe this is expected) I get:
"V-1-45 No log files found."
I do not get a 2074 when running:
# bpimage -cleanup -allclients
Thanks for the help, I am very new to NBU.
08-03-2023 03:31 PM
Hi @engops-rob
If there was nothing on the old disk, then the way to go is as @Nicolai states: remove the old device and re-create it. There is a lot of data on the MSDP drive that is required to make it all work, so if this wasn't copied or recreated, it is no surprise it is not working.
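To remove the old device, try deleting the devices in the Java GUI in the following order (if anything doesn't delete, it will give you the reason why): first any SLPs using the device, then the storage unit(s) associated with the device, then the disk pool, and finally the storage server.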
Once the storage server is gone, clean up (remove) any residual files on the MSDP mount, and use the wizard to create the MSDP pool again.
Cheers
David
08-08-2023 12:12 PM - edited 08-08-2023 12:14 PM
Thanks David,
To remove the old device, in the Java GUI try deleting the devices in the following order (if anything doesn't delete, it will give you the reason why)
SLPs using the device (or modify them to remove the STU, then wait 24 hours or delete the old versions)
nbstl -L told me there were none, so I moved on to the storage units.
Storage unit(s) associated with the device
Disk pool
Storage server
I was able to delete the appropriate storage unit with no issues.
Once the storage server is gone, clean up (remove) any residual files on the MSDP mount, and use the wizard to create the MSDP pool again.
I found the option to remove the media server under Media and Device Management -> Media Servers -> Remove Device Host. The media server's status is "offline", and the action fails with the error: "the media is allocated for use (199)".
As for the MSDP mount, it has already been replaced with a fresh, empty disk.
I still see a disk pool in the web interface, but when I try to remove it (after downing it), I get this error:
Status Code: 134, localizedMsg: unable to process request because the server resources are busy, additionalMsg: Disk Pool cannot be deleted because Disk Volume (PureDiskVolume) contains image fragments waiting to be deleted
Rob
08-08-2023 04:03 PM
Hi @engops-rob
Okay, an additional step is required: purge the deletion list for that particular volume. (Background: when an image expires, it goes onto a deletion list that is processed on a regular basis; these runs show in the Activity Monitor as Image Cleanup jobs.) Because the device is broken, it is unable to process the deletion of the images/fragments.
Use nbdelete -list to see what is waiting to be deleted. If the entries relate only to your broken device (only one @aaaaX device listed), use the "-purge_deletion_list" option to remove them. If there are other entries, add the option that targets just your broken device: "nbdelete -purge_deletion_list -media_id @aaaaX".
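The two nbdelete steps can be sketched as a pair of shell functions. This is a minimal sketch, assuming a POSIX shell on the master with nbdelete on the PATH; the function names, the run wrapper, and the DRY_RUN switch are hypothetical, and the flags are exactly the ones quoted in the reply above.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Step 1: review what is waiting on the deletion list.
nbu_show_deletion_list() {
    run nbdelete -list
}

# Step 2: purge. With a media ID (e.g. @aaaaX) only that volume is targeted;
# with no argument the option is used on its own, as suggested above when
# the broken device is the only one listed.
nbu_purge_deletion_list() {
    if [ -n "$1" ]; then
        run nbdelete -purge_deletion_list -media_id "$1"
    else
        run nbdelete -purge_deletion_list
    fi
}
```

Keeping the targeted form as the one that takes an argument makes it harder to purge entries for healthy devices by accident when the list contains more than one media ID.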
Once this is done, you should be able to carry on with the disk pool deletion.
David
08-09-2023 09:38 AM
That did it, thanks David.
Something else... now when I try to re-introduce the media server I observe a few things:
At this point I feel I'm in a decent place with the recovery, so I moved to the web interface to finish up protection plans, etc. The storage unit is shown there, but the media server is not. On a whim I tried to add it via the web interface (again with the cached folders), but I get a strange "Specified servers <redacted> are not reachable or are offline." error. I say strange because I can ping the target, the services seem OK, and the certificate checks out as trusted. I also restarted the master's services to force a re-read; I figured it couldn't hurt and might help. Any thoughts?
I appreciate the coaching, I'm learning from this process.
08-09-2023 03:25 PM
Hi @engops-rob
At this point it may be simpler and safer to remove the media server completely, uninstall NetBackup from it, and start again. As an aside, though I'm not sure, the stale mounts that are showing may be cleared by running "bpclntcmd -clear_host_cache" on the master server.
To remove the media server, check the output of "nbemmcmd -listhosts" for the media server. There will probably be two entries: one labelled as media server, and a second as ndmp (which represents the storage server). Both should be deleted, ndmp first: "nbemmcmd -deletehost -machinename <media_srv> -machinetype ndmp", then "nbemmcmd -deletehost -machinename <media_srv> -machinetype media".
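The deletion order above can be sketched as a small shell function. This is a minimal sketch, assuming a POSIX shell on the master with nbemmcmd on the PATH; the function name nbu_remove_media_host and the DRY_RUN switch are hypothetical. It is worth checking the "nbemmcmd -listhosts" output first, as described above.

```shell
#!/bin/sh
# Hypothetical helper: delete a media server's EMM host entries in the order
# given above (ndmp entry first, then the media entry).
# Assumes nbemmcmd is on PATH. Set DRY_RUN=1 to print the commands only.
nbu_remove_media_host() {
    host=$1
    [ -n "$host" ] || { echo "usage: nbu_remove_media_host <media_srv>" >&2; return 2; }
    for mtype in ndmp media; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "nbemmcmd -deletehost -machinename $host -machinetype $mtype"
        else
            # Stop at the first failure so the media entry is not removed
            # while the ndmp (storage server) entry still exists.
            nbemmcmd -deletehost -machinename "$host" -machinetype "$mtype" || return 1
        fi
    done
}
```

Stopping on the first failure keeps the two deletions in step: if the ndmp entry cannot be removed, the error from nbemmcmd is left on screen and the media entry is untouched.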
Uninstalling NetBackup: on Windows this is simple; on Linux use the "rpm -e" command with the list of packages. I often use this compound command to remove all Veritas packages: "rpm -qa | grep VRTS | xargs rpm -e" (best check first that nothing else from Veritas is installed). Finally, clear up any remaining files/folders from the installation location.
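One way to build in the "check first" advice is to print the rpm -e command for review instead of piping straight into xargs. This is a minimal sketch, assuming a POSIX shell; the function name vrts_remove_cmd is hypothetical, and it applies the same "grep VRTS" filter as the compound command above.

```shell
#!/bin/sh
# Hypothetical helper: read package names (one per line, e.g. from "rpm -qa")
# on stdin, keep those matching VRTS, and print the rpm -e command for review.
# It never runs rpm itself.
vrts_remove_cmd() {
    pkgs=$(grep VRTS | tr '\n' ' ')
    pkgs=${pkgs% }                # drop the trailing space left by tr
    [ -n "$pkgs" ] || return 0    # nothing matched: print nothing
    echo "rpm -e $pkgs"
}
```

Usage: "rpm -qa | vrts_remove_cmd" shows exactly which packages would be removed; once the list contains only NetBackup packages, the printed command can be run as-is.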
When reinstalling NetBackup, you will probably need to create a reissue token so the media server can get its host certificate again (right-click the server in the certificate management section and select reissue token).
Cheers
David
08-18-2023 06:38 AM
Scorched earth FTW. I ended up recreating from scratch and going through the whole certs-and-tokens exercise again since I never seem to get this right the first time. Thanks for the assist.