05-28-2021 02:03 AM
Hi all,
Our media server (puredisk) had a problem with one of the hard drives in its storage.
We replaced the drive and rebuilt the RAID with no apparent problem.
But since then, all the backups that should run on this media server fail with error 2074.
When I try to update the Disk_Pool I get error 37 (see the attached screenshot).
I have already checked that this file exists: <Install_path>\NetBackup\bin\ost-plugins\srvrname.cfg
I have already managed to bring the Disk_Pool UP with a command.
I also checked this:
05-28-2021 02:24 AM
And now, after rebooting both the media and master servers, I get this:
05-28-2021 06:36 PM
Hi @Aurelien59
The second image from your first post indicates that the system is unable to see the disk volume.
Are you able to see the contents of the MSDP volume - is it mounted back onto the same drive/mountpoint as before?
Is the size of the volume consistent with what you are expecting?
Can you perform a file system check on the volume successfully?
Doesn't look good at the moment, but hopefully it will be something simple.
As an aside - this pool appears to be 94% full - which is never a great idea for a dedupe pool (best to keep [well] below 90%).
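A minimal sketch of how the 90% guideline could be checked from a script, assuming the pool lives on its own volume. The function names and the threshold default are illustrative, and the figure is the file system's view of the volume, which may differ from what NetBackup itself reports for the pool.

```python
import shutil


def pool_usage_percent(path):
    """Return the used percentage of the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_over_threshold(used_percent, threshold=90.0):
    """True when usage is at or above the chosen fill threshold."""
    return used_percent >= threshold
```

For example, `is_over_threshold(pool_usage_percent("S:\\Dedup_NBU"))` could be run on a schedule to raise a warning before the pool fills up.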
Good luck
David
05-31-2021 04:32 AM
Hi @davidmoline
"Are you able to see the contents of the MSDP volume - is it mounted back onto the same drive/mountpoint as before?"
Yes and yes.
"Is the size of the volume consistent with what you are expecting?"
I am not sure, but I have the feeling that the used space is smaller than it was before we replaced the defective disk. I have no record of the storage usage before the crash.
"Can you perform a file system check on the volume successfully?"
I ran SFC /scannow successfully, with no errors.
"As an aside - this pool appears to be 94% full - which is never a great idea for a dedupe pool (best to keep [well] below 90%)."
True, it is precisely one of the projects in progress for the weeks to come.
05-31-2021 05:51 AM
Is it a communication error?
Run this from master server: bptestbpcd -client <media-server-name> -debug -verbose
What output does it give you?
05-31-2021 07:11 AM
Here is the result of the command:
M:\Veritas\NetBackup\bin\admincmd>bptestbpcd -client srv-nbumed01-ft -debug -verbose
16:09:42.581 [9400.18876] <2> bptestbpcd: VERBOSE = 0
16:09:42.596 [9400.18876] <2> vnet_pbxConnect: pbxConnectEx Succeeded
16:09:42.596 [9400.18876] <2> logconnections: BPCD CONNECT FROM 123.1.3.184.29240 TO 123.8.54.180.1556 fd = 528
16:09:42.612 [9400.18876] <2> vnet_pbxConnect: pbxConnectEx Succeeded
16:09:42.643 [9400.18876] <8> do_pbx_service: [vnet_connect.c:2186] via PBX VNETD CONNECT FROM 123.1.3.184.29241 TO 123.8.54.180.1556 fd = 548
16:09:42.643 [9400.18876] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:455] VN_REQUEST_CONNECT_FORWARD_SOCKET 10 0xa
16:09:42.846 [9400.18876] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:480] ipc_string 62484
16:09:43.766 [9400.18876] <2> bpcr_get_version_rqst: bpcd version: 08000000
1 1 1
123.1.3.184:29240 -> 123.8.54.180:1556
123.1.3.184:29241 -> 123.8.54.180:1556
16:09:43.969 [9400.18876] <2> bpcr_get_peername_rqst: Server peername length = 26
16:09:44.172 [9400.18876] <2> bpcr_get_hostname_rqst: Server hostname length = 27
16:09:44.390 [9400.18876] <2> bpcr_get_clientname_rqst: Server clientname length = 27
16:09:44.609 [9400.18876] <2> bpcr_get_version_rqst: bpcd version: 08000000
16:09:44.812 [9400.18876] <2> bpcr_get_platform_rqst: Server platform length = 7
16:09:45.030 [9400.18876] <2> bpcr_get_version_rqst: bpcd version: 08000000
16:09:45.264 [9400.18876] <2> bpcr_patch_version_rqst: theRest == > <
16:09:45.264 [9400.18876] <2> bpcr_get_version_rqst: bpcd version: 08000000
16:09:45.685 [9400.18876] <2> bpcr_patch_version_rqst: theRest == > <
16:09:45.685 [9400.18876] <2> bpcr_get_version_rqst: bpcd version: 08000000
PEER_NAME = srv-nbumast-ai.process.dkm
HOST_NAME = srv-nbumed01-ft.process.dkm
CLIENT_NAME = srv-nbumed01-ft.process.dkm
VERSION = 0x08000000
PLATFORM = win_x64
PATCH_VERSION = 8.0.0.0
SERVER_PATCH_VERSION = 8.0.0.0
MASTER_SERVER = srv-nbumast-ai.process.dkm
EMM_SERVER = srv-nbumast-ai.process.dkm
NB_MACHINE_TYPE = MEDIA_SERVER
16:09:45.919 [9400.18876] <2> vnet_pbxConnect: pbxConnectEx Succeeded
16:09:45.950 [9400.18876] <8> do_pbx_service: [vnet_connect.c:2186] via PBX VNETD CONNECT FROM 123.1.3.184.29253 TO 123.8.54.180.1556 fd = 560
16:09:45.950 [9400.18876] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:455] VN_REQUEST_CONNECT_FORWARD_SOCKET 10 0xa
16:09:46.169 [9400.18876] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:480] ipc_string 62486
123.1.3.184:29253 -> 123.8.54.180:1556
<2>bptestbpcd: EXIT status = 0
16:09:46.590 [9400.18876] <2> bptestbpcd: EXIT status = 0
Looks like everything is OK here?
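When comparing this output across several media servers, the useful parts are the KEY = value lines and the final EXIT status. A minimal sketch for pulling those out of saved output, assuming the layout shown above (the function name is made up for illustration):

```python
import re


def parse_bptestbpcd(output):
    """Collect KEY = value lines and the last EXIT status from bptestbpcd output."""
    fields = {}
    exit_status = None
    for line in output.splitlines():
        status = re.search(r"EXIT status = (-?\d+)", line)
        if status:
            exit_status = int(status.group(1))
            continue
        # Summary lines start with an upper-case key; timestamped debug lines do not.
        pair = re.match(r"^([A-Z_]+) = (.+)$", line.strip())
        if pair:
            fields[pair.group(1)] = pair.group(2).strip()
    return fields, exit_status
```

An exit status of 0 together with the expected PEER_NAME and HOST_NAME is what the output above shows.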
05-31-2021 04:03 PM
Hi @Aurelien59
Comms look fine, as expected.
My reading of the situation is that the media server is unable to reattach the MSDP volume for some reason.
Review these two articles and check whether all the various configuration settings in your system are correct:
https://www.veritas.com/support/en_US/article.100024630
https://www.veritas.com/support/en_US/article.100032612
In particular the contents of the registry key are what tells the media server where to look for the MSDP volume.
Then the etc directory indicates how the volume is setup (directory locations etc.).
Cheers
David
06-01-2021 01:58 AM
M:\Veritas\pdde>spad --test
Warning: 25002: __openReaderA: could not open file S:\Dedup_NBU\etc\puredisk\spa.cfg
Error: 25002:
BadValue : (none)
File : S:\Dedup_NBU\etc\puredisk\spa.cfg
Section : Logging
Entry : HistoryPath
Reason : Expected a non-empty value.
S:\Dedup_NBU\etc\puredisk\spa.cfg: 5 error(s)
Seeing this, I checked the path, and spa.cfg was missing. So I copied a spa.cfg from another media server and edited it. In the registry everything was OK.
contentrouter.cfg was also missing, so I did the same.
M:\Veritas\pdde>spoold --trace
Error [0000000000265DA0]: -1: Failed to load storage format version from S:\Dedup_NBU\data\.format
Error [0000000000265DA0]: -1: The storage format file S:\Dedup_NBU\data\.format is lost or corrupted, please run the following command to fix it:
Error [0000000000265DA0]: -1: M:\Veritas\\pdde\stconv.exe --fixformatfile
Error [0000000000265DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000265DA0]: -1: NetConnectByAddr: Failed to connect to spad on port 10102 using the following interface(s): [ 123.8.54.180 ::1 ] (Aucune connexion n'a pu être établie car l'ordinateur c
-92
Error [0000000000265DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000265DA0]: 25053: Connection failed connection actively refused
Error [0000000000265DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000265DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000265DA0]: 25053: Connection failed connection actively refused
Error [0000000000265DA0]: 26016: Storage Format: Check failure.
M:\Veritas\pdde>stconv.exe --fixformatfile
then...
M:\Veritas\pdde>spoold --trace
Error [00000000004B3390]: -1: _dcHeaderRead: invalid version of container header 2863311530
Error [00000000004B3390]: 25032: _storeCheckContainers: failed to read index headerfrom container 111615 (data corrupt)
Error [00000000004B3390]: -1: _dcHeaderRead: invalid version of container header 2863311530
Error [00000000004B3390]: 25032: _storeCheckContainers: failed to read index headerfrom container 135245 (data corrupt)
Error [00000000004B3390]: -1: _dcHeaderRead: invalid version of container header 2863311530
Error [00000000004B3390]: 25032: _storeCheckContainers: failed to read index headerfrom container 150585 (data corrupt)
[...]
Error [0000000000445DA0]: 25002: OpenId: Could not open S:\Dedup_NBU\spool\.tlogid to read the current tlogid.
Warning [0000000000445DA0]: 25004: Could not initialize performance counter for \Process(spoold)\% Processor Time (no such object)
Warning [0000000000445DA0]: 25004: Could not initialize performance counter for \Processor(_Total)\% Processor Time (no such object)
Warning [0000000000445DA0]: 25004: Could not initialize performance counter for \Processor(0)\% Processor Time (no such object)
Warning [0000000000445DA0]: 25004: Could not initialize performance counter for \Processor(1)\% Processor Time (no such object)
[...]
Warning [0000000000445DA0]: 25004: Could not initialize performance counter for \Processor(23)\% Processor Time (no such object)
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to spad on port 10102 using the following interface(s): [ 123.8.54.180 ::1 ] (Aucune connexion n'a pu être établie car l'ordinateur
-92
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
Warning [0000000000445DA0]: 25053: Failed to get startup CR modes from SPA after 1 attempt, retrying in 10 seconds
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
Warning [0000000000445DA0]: 25053: Failed to get startup CR modes from SPA after 2 attempts, retrying in 10 seconds
[...]
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
Warning [0000000000445DA0]: 25053: Failed to get startup CR modes from SPA after 8 attempts, retrying in 10 seconds
Error [00000000004B3390]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [00000000004B3390]: -1: NetConnectByAddr: Failed to connect to spad on port 10102 using the following interface(s): [ 123.8.54.180 ::1 ] (Aucune connexion n'a pu être établie car l'ordinateur
-92
Error [00000000004B3390]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [00000000004B3390]: 25053: Connection failed connection actively refused
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
Error [0000000000445DA0]: 25053: Failed to get startup CR modes from SPA
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
Error [0000000000445DA0]: 26016: Configuration Manager: Start failure.
Error [0000000000445DA0]: -1: NetConnectByAddr: Failed to connect to host: Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. (10061)
Error [0000000000445DA0]: 25053: Could not establish a connection to SRV-NBUMED01-FT:10102: connect failed (Aucune connexion n'a pu être établie car l'ordinateur cible l'a expressément refusée. )
Error [0000000000445DA0]: 25053: Connection failed connection actively refused
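Since the trace is long and repetitive, a small script can reduce it to the two things that matter: which containers are reported as corrupt, and which endpoint keeps refusing connections. This is only a sketch that assumes the message wording shown above; the function name and patterns are illustrative, not part of any Veritas tool.

```python
import re
from collections import Counter

# Patterns follow the wording in the trace above (including the missing
# space in "headerfrom"); other versions may word these messages differently.
CORRUPT_RE = re.compile(
    r"failed to read index header\s*from container (\d+) \(data corrupt\)"
)
REFUSED_RE = re.compile(r"Could not establish a connection to ([^\s:]+):(\d+)")


def summarize_trace(text):
    """Return (sorted corrupt container IDs, Counter of refused host:port)."""
    containers = sorted({int(n) for n in CORRUPT_RE.findall(text)})
    refused = Counter(f"{host}:{port}" for host, port in REFUSED_RE.findall(text))
    return containers, refused
```

As a side note, `hex(2863311530)` evaluates to `'0xaaaaaaaa'`, so the "invalid version" value is a repeating bit pattern rather than a plausible version number. That may point to the header area having been overwritten, but it is an observation, not a diagnosis.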
06-01-2021 05:50 AM
Here is the result of "spad --test" command :
M:\Veritas\pdde>spad --test
S:\Dedup_NBU\etc\puredisk\spa.cfg: verified OK
NB: spa.cfg was missing, so I copied one from another media server and edited it, and the result now looks good. I did the same for contentrouter.cfg, which was missing too.
Attached is a .txt file containing the result of "spoold --trace".
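The files that turned out to be missing or unreadable in this thread can be checked in one pass. A minimal sketch, assuming the storage path is S:\Dedup_NBU as in the output above; the relative paths come from the error messages, except the location of contentrouter.cfg next to spa.cfg, which is an assumption.

```python
from pathlib import Path

# Relative paths taken from the error messages in this thread.
# contentrouter.cfg living next to spa.cfg is an assumption.
EXPECTED = [
    "etc/puredisk/spa.cfg",
    "etc/puredisk/contentrouter.cfg",
    "data/.format",
    "spool/.tlogid",
]


def missing_files(storage_root, expected=EXPECTED):
    """Return the expected relative paths that are not regular files."""
    root = Path(storage_root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files("S:\\Dedup_NBU")` returns an empty list when all four files are present. It only checks for presence, not whether the contents are valid.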
06-01-2021 05:35 PM
Hi @Aurelien59
I'd suggest you log a support case - the output is indicating (rightly or not) that there is some possible corruption. The fact that some key files were missing is a concern.
Are you running dedupe catalog backups on that pool (if not, why not)? You may be able to retrieve the original contents of the various pool configuration files from there.
David
06-02-2021 02:58 AM
Hi David,
We are not running dedupe catalog backups on that pool.
We are only running a CATALOG_DRIVEN_BACKUP policy. Maybe that is the same thing you are talking about?
06-02-2021 04:11 PM
Hi @Aurelien59
The Dedupe catalog backup policy is not something that is automatically set up (unless you install a NetBackup appliance). It is mentioned in the Dedupe Guide but is not well known. The command used to create this policy is drcontrol (<INSTALL_PATH>\Veritas\pdde\drcontrol.exe or /usr/openv/pdde/pdcr/bin/drcontrol).
This utility can be used to create a policy that protects critical files associated with the dedupe pool (including the files you have had to recreate - spa.cfg & contentrouter.cfg). Running the command with no arguments shows which options are available and where to run it.
As I said before, I'd strongly recommend opening a support case and having them look into recovering your dedupe pool.
Cheers
David
06-03-2021 01:15 AM
Hi David,
Unfortunately we cannot benefit from the support because we are still using version 8.0.
We will have to plan an upgrade to 8.2.
Anyway thank you for your help.
06-03-2021 04:18 PM
Hi @Aurelien59
NetBackup 8.0 is under extended support; it is not unsupported. I would still attempt to log a case, explaining that you want to upgrade to 8.2 (or higher) but that this problem is blocking that upgrade path. I can't guarantee that they will help you, but it is worth the attempt.
David
06-06-2021 08:21 PM
Hello @Aurelien59
Error 37 usually indicates a missing SERVER entry in bp.conf / the Windows registry. If this is the case, the following commands should give us clear hints.
Could you post the output of the following commands?
From Master Server --> bptestbpcd -host media-server -verbose
From Media Server --> bptestbpcd -host master-server -verbose
From Media Server --> bpclntcmd -pn -verbose -debug
From Media Server --> bpclntcmd -self
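As a rough illustration of the missing SERVER entry check, here is a sketch that scans bp.conf-style text for SERVER lines and reports which required hosts are absent. It assumes the usual "SERVER = hostname" layout of bp.conf; on Windows the same list is kept in the registry, so the text would have to be exported first. The function names are made up for this example.

```python
def server_entries(bp_conf_text):
    """Return host names from 'SERVER = host' lines, in file order."""
    servers = []
    for line in bp_conf_text.splitlines():
        key, sep, value = line.partition("=")
        # Match SERVER exactly, so MEDIA_SERVER and similar keys are ignored.
        if sep and key.strip().upper() == "SERVER":
            servers.append(value.strip().lower())
    return servers


def missing_servers(bp_conf_text, required):
    """Return the required host names that have no SERVER entry."""
    present = set(server_entries(bp_conf_text))
    return [host for host in required if host.lower() not in present]
```

With the host names from this thread, `missing_servers(text, ["srv-nbumast-ai.process.dkm", "srv-nbumed01-ft.process.dkm"])` would list whichever of the two is not declared on the machine being checked.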