
SQL Instance Registration fails - error 25 - Validation ErrorStatus code: 25 cannot connect on socke

William_Jansen_
Level 5
Partner Certified

Hi

When we try to register a lot of SQL servers, at some point the SQL Instance registration feature seems to become unstable. Even servers that did work stop working. When we try to register new servers, it waits a long time for a timeout and then we get the attached error message...

Is there a single process we can restart to get it working again, without cycling NetBackup entirely?

1 ACCEPTED SOLUTION

Accepted Solutions

Seems to be a bug in NBARS. After restarting nbars (the "NetBackup Agent Request Server"), the GUI and the command line are working again. I haven't seen a fix for this in 7.7.3, but after this workaround the registration process is working as designed.

Thanks for all the help and patience guys!

New screenshot for those interested: NBARS working as designed.


13 REPLIES

William_Jansen_
Level 5
Partner Certified

Screenshot with a registration process and bptestbpcd

Marianne
Moderator
Partner    VIP    Accredited Certified

Nothing easy when you post.... ;)

It is not even clear where the status 25 is coming from - master or client...

I would look at bpcd log on master and client for clues... 

Michal_Mikulik1
Moderator
Partner    VIP    Accredited Certified

Hello,

A few notes:

1/ In such a situation, test whether a plain filesystem backup works for the SQL client.

2/ You are talking about registering a lot of servers, but there are only two instances in the screenshot. Is registration performed via CLI or GUI?

3/ If you are talking about SQL servers which worked before but don't work now, do you mean their re-registration, or regular backups?

Overall this looks like something "environment specific" to me.

Michal

William_Jansen_
Level 5
Partner Certified

Thanks :)

Will take a look now. Don't know why that didn't occur to me. I have a feeling it would be from the master side, as it fails on all clients after I've tried to register a few.

Hello Michal

1/ In such a situation, test whether a plain filesystem backup works for the SQL client.

FS backups work; even SQL backups that were set up before the timeout occurs still work. I can't register new ones, so I don't know if they'll work after that.

2/ You are talking about registering a lot of servers, but there are only two instances in the screenshot. Is registration performed via CLI or GUI?

Via GUI. It seems to happen consistently after I try to register a lot. There might be one instance in the batch causing the issue. Last time I tried registering around 100 instances. This time I was doing 10 at a time. After the 3rd batch it started timing out. Now it times out even when I register just 1, as per the screenshot.

3/ If you are talking about SQL servers which worked before but don't work now, do you mean their re-registration, or regular backups?

SQL Application Registration. It seems that after a restart of the NBU services you can register them again, up to a point.

Overall this looks like something "environment specific" to me.

 

Marianne
Moderator
Partner    VIP    Accredited Certified
Have you tried to increase Client Connect Timeout on the master server?
Something like 15 minutes (900) perhaps?

William_Jansen_
Level 5
Partner Certified

All I see is this in the master BPCD log:

09:20:20.570 [21468.17684] <2> process_requests: BPCD_LOG_RQST_NO_STATUS
09:20:20.570 [21468.17684] <2> validate_fd_and_transfer_count: Received file_num=5,transfer_count=159 for validation
09:20:20.570 [21468.17684] <2> process_requests: BPCD_LOG_RQST_NO_STATUS
09:20:20.570 [21468.17684] <2> validate_fd_and_transfer_count: Received file_num=5,transfer_count=190 for validation
09:20:20.570 [21468.17684] <2> process_requests: BPCD_LOG_RQST_NO_STATUS
09:20:20.570 [21468.17684] <2> validate_fd_and_transfer_count: Received file_num=5,transfer_count=190 for validation
09:20:20.601 [21468.17684] <2> process_requests: BPCD_LOG_RQST_NO_STATUS
09:20:20.601 [21468.17684] <2> validate_fd_and_transfer_count: Received file_num=5,transfer_count=191 for validation
09:20:20.601 [18608.15376] <2> bpcd main: accept sock = 732
09:20:20.601 [21468.17684] <2> process_requests: BPCD_LOG_RQST_NO_STATUS
09:20:20.601 [21468.17684] <2> validate_fd_and_transfer_count: Received file_num=5,transfer_count=191 for validation
09:20:20.616 [21468.17684] <2> process_requests: BPCD_LOG_RQST_NO_STATUS
09:20:20.616 [21468.17684] <2> validate_fd_and_transfer_count: Received file_num=5,transfer_count=121 for validation
09:20:20.632 [26564.3984] <2> setup_debug_log: switched debug log file for bpcd
09:20:20.632 [26564.3984] <4> bpcd main: VERBOSE=1, BPCD VERBOSE=1 Timestamp=1472799136
09:20:20.632 [26564.3984] <2> logparams: E:\Veritas\NetBackup\bin\bpcd.exe -standalone
09:20:20.632 [26564.3984] <2> process_requests: offset to GMT -7200
09:20:20.632 [26564.3984] <2> logconnections: BPCD ACCEPT FROM 10.1.14.233.63708 TO 10.1.14.233.13782 fd = 732
09:20:20.632 [26564.3984] <2> process_requests: setup_sockopts complete
09:20:20.632 [26564.3984] <2> bpcd peer_hostname: Connection from host master (10.1.14.233) port 63708
09:20:20.632 [26564.3984] <2> bpcd valid_server: comparing master and master
09:20:20.632 [26564.3984] <4> bpcd valid_server: hostname comparison succeeded
09:20:20.632 [26564.3984] <2> process_requests: output socket port number = 1374
09:20:20.632 [26564.3984] <2> process_requests: <---- NetBackup 7.7.2 0 ------------initiated
09:20:20.632 [26564.3984] <2> process_requests: VERBOSE = 1

 

And not even a connection in the BPCD log on the client.

 

 

bptestbpcd from the master > client works fine
g:\Veritas\Netbackup\logs\bpcd>bptestbpcd -verbose -client ctscomdbd01
1 1 1
10.1.14.233:54184 -> 10.3.210.85:1556
10.1.14.233:54185 -> 10.3.210.85:1556
PEER_NAME = master
HOST_NAME = CTSCOMDBD01
CLIENT_NAME = CTSCOMDBD01
VERSION = 0x07720000
PLATFORM = win_x64
PATCH_VERSION = 7.7.2.0
SERVER_PATCH_VERSION = 7.7.2.0
MASTER_SERVER = master
EMM_SERVER = master
NB_MACHINE_TYPE = CLIENT
10.1.14.233:54196 -> 10.3.210.85:1556

Michal_Mikulik1
Moderator
Partner    VIP    Accredited Certified

Hello,

One thing: if you launch the console on a computer other than the master (e.g. on some PC), try launching it directly on the master and repeat the operation. Communication between the components is simpler that way.

Then I recommend using the CLI.

If that doesn't help, I recommend logging a support call with Veritas. It sounds like a very specific issue.

Regards

Michal

Hi

I'm doing the registration from the master. I suspect it's a process that gets stuck. I'm hoping to find which one it is so I can "restart" it without cycling all the services.

 

Do you have the command line usage for sql registration handy perhaps? Else I'll look it up and let you know.

Michal_Mikulik1
Moderator
Partner    VIP    Accredited Certified

Hello,

refer to nbsqladm -add_instance in NetBackup7.7_RefGuide_Commands.pdf

Michal

We have a different error:

g:\Veritas\Netbackup\logs\bpcd>nbsqladm -register_instance MSSQLSERVER -host ctscomdbd01 -local_credentials
Registering instance MSSQLSERVER failed.
EXIT STATUS 23: socket read failed

 

OK, I created an nbsqladm log and found this:

16:09:38.729 [21156.24700] <2> logconnections: BPRD CONNECT FROM 10.1.14.233.65337 TO 10.1.14.233.13720 fd = 652
16:09:38.729 [21156.24700] <4> bprd_request: Request String: 19 master aquaboy MSSQLSERVER ctscomdbd01 NONE NONE NONE NONE 9962 1 0 0
16:10:09.337 [21156.24700] <2> get_long_base: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
16:10:09.337 [21156.24700] <16> bprd_request: Could not read instance infomation.
16:10:09.337 [21156.24700] <16> bprd_request: h_errno = 10054 - An existing connection was forcibly closed by the remote host.
16:10:09.352 [21156.24700] <4> Exit: EXIT STATUS 23: socket read failed
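
A small aside for anyone reading along: legacy debug logs like the nbsqladm excerpt above get long, and the lines that matter here are the <16> ones. Below is a minimal Python sketch for pulling those out. The line layout (time, bracketed number pair, <severity>, text) is inferred only from the excerpts in this thread. Reading the bracketed pair as pid.tid and treating higher numbers as more severe are assumptions, and `parse_lines` / `at_least` are hypothetical helper names, not part of NetBackup.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Layout inferred from the log excerpts in this thread (an assumption):
#   HH:MM:SS.mmm [pid.tid] <severity> text
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<message>.*)$"
)


class LogLine(NamedTuple):
    time: str
    pid: int        # first number in the brackets (assumed to be a process id)
    tid: int        # second number in the brackets (assumed to be a thread id)
    severity: int   # number between the angle brackets
    message: str


def parse_lines(lines: Iterable[str]) -> Iterator[LogLine]:
    """Yield one LogLine per line that matches the layout; skip the rest."""
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # blank line, wrapped continuation, or unrelated text
        yield LogLine(m["time"], int(m["pid"]), int(m["tid"]),
                      int(m["severity"]), m["message"])


def at_least(lines: Iterable[str], min_severity: int = 16) -> List[LogLine]:
    """Keep entries whose severity number is >= min_severity."""
    return [e for e in parse_lines(lines) if e.severity >= min_severity]


# Lines copied from the nbsqladm log in this thread.
SAMPLE = """
16:09:38.729 [21156.24700] <2> logconnections: BPRD CONNECT FROM 10.1.14.233.65337 TO 10.1.14.233.13720 fd = 652
16:10:09.337 [21156.24700] <2> get_long_base: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
16:10:09.337 [21156.24700] <16> bprd_request: Could not read instance infomation.
16:10:09.337 [21156.24700] <16> bprd_request: h_errno = 10054 - An existing connection was forcibly closed by the remote host.
16:10:09.352 [21156.24700] <4> Exit: EXIT STATUS 23: socket read failed
"""

if __name__ == "__main__":
    for entry in at_least(SAMPLE.splitlines()):
        print(entry.time, entry.message)
```

To use it on a real log, pass the open file object to `at_least` instead of `SAMPLE.splitlines()`.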

 

 BPRD reports:

16:16:48.294 [24104.15896] <4> msgbackup: waiting for response from nbpem
16:16:48.434 [3760.24048] <2> Orb::createInsecureObjectRef: corbaloc being used
is: corbaloc:pbxiop:localhost:1556:nbars/DARSManagement(Orb.cpp:1339)
16:16:48.434 [3760.24048] <2> EndpointSelector_R2::performCallUsingEndpoint: eps
r2: insecure invocation->perform_call(...) to host 127.0.0.1 FAILED(Endpoint_Sel
ector.cpp:1259)
16:16:48.434 [3760.24048] <2> Orb::connectToObjectReference: connection attempt
failed: system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0' OMG minor code (2
), described as '*unknown description*', completed = NO (OrbConnect.cpp:191)
16:16:49.448 [3760.24048] <2> Orb::createInsecureObjectRef: corbaloc being used
is: corbaloc:pbxiop:localhost:1556:nbars/DARSManagement(Orb.cpp:1339)
16:16:49.448 [3760.24048] <2> EndpointSelector_R2::performCallUsingEndpoint: eps
r2: insecure invocation->perform_call(...) to host 127.0.0.1 FAILED(Endpoint_Sel
ector.cpp:1259)
16:16:49.448 [3760.24048] <2> Orb::connectToObjectReference: connection attempt
failed: system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0' OMG minor code (2
), described as '*unknown description*', completed = NO (OrbConnect.cpp:191)
16:16:49.448 [3760.24048] <8> Orb::connectToObjectRetries: Object was never init
ialized before the max timeout

 

 

NBARS reports the following, which seems to be the problem:

0,51216,321,362,257,1480322689658,7992,17060,0:,181:(48.20376) EnableScrollableQueries failed: SQL - native=<0> sqlerror=<[Microsoft][ODBC Driver Manager] Invalid cursor state> sqlstate=<24000> retval=<2007009>(DbConnection.cpp:2052),36:DbConnection::ExecuteScrollableQuery,1
0,51216,321,362,258,1480322689658,7992,17060,0:,181:(48.20376) EnableScrollableQueries failed: SQL - native=<0> sqlerror=<[Microsoft][ODBC Driver Manager] Invalid cursor state> sqlstate=<24000> retval=<2007009>(DbConnection.cpp:2052),36:DbConnection::ExecuteScrollableQuery,1
0,51216,137,362,3261581,1480322689674,7992,15240,0:,72:[vnet_addrinfo.c:1360] vnet_cached_getaddrinfo_and_update() failed 6 0x6,23:vnet_cached_getaddrinfo,1
0,51216,137,362,3261582,1480322689674,7992,15240,0:,78:[vnet_addrinfo.c:1087] getaddrinfo() failed RV=10109 NAME=ctscomdbd01 SVC=nbcs,26:retry_getaddrinfo_for_real,1
0,51216,137,362,3261583,1480322689674,7992,15240,0:,90:[vnet_addrinfo.c:922] retry_getaddrinfo_for_real failed RV=10109 NAME=ctscomdbd01 SVC=nbcs,17:retry_getaddrinfo,1
0,51216,137,362,3261584,1480322689674,7992,15240,0:,84:[vnet_addrinfo.c:1732] retry_getaddrinfo() failed RV=10109 NAME=ctscomdbd01 SVC=nbcs,34:vnet_cached_getaddrinfo_and_update,1
0,51216,137,362,3261585,1480322689674,7992,15240,0:,72:[vnet_addrinfo.c:1360] vnet_cached_getaddrinfo_and_update() failed 6 0x6,23:vnet_cached_getaddrinfo,1
0,51216,137,362,3261586,1480322689674,7992,15240,0:,110:[vnet_connect.c:1020] vnet_cached_getaddrinfo() failed, status=6, OS ret=10109, host=ctscomdbd01, service=nbcs,24:init_remote_connect_recs,1
0,51216,137,362,3261587,1480322689674,7992,15240,0:,60:[vnet_connect.c:939] init_remote_connect_recs() failed 6 0x6,17:init_connect_recs,1
0,51216,137,362,3261588,1480322689674,7992,15240,0:,65:[vnet_connect.c:951] Ignoring some failures. Success count 2 0x2,17:init_connect_recs,1
0,51216,137,362,3261589,1480322689674,7992,15240,0:,22:pbxConnectEx Succeeded,15:vnet_pbxConnect,1
0,51216,137,362,3261590,1480322689877,7992,15240,0:,51:[vnet_vnetd.c:1995] VN_REQUEST_SERVICE_SOCKET 6 0x6,25:vnet_vnetd_service_socket,1
0,51216,137,362,3261591,1480322689877,7992,15240,0:,32:[vnet_vnetd.c:2009] service nbcs,25:vnet_vnetd_service_socket,1
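
The raw NBARS lines above are hard to read by eye, so here is a rough Python sketch for splitting one apart. It rests on what can be observed in the excerpt only: eight comma-separated numbers, then strings written as `length:text` (for example `26:retry_getaddrinfo_for_real`, where the name is exactly 26 characters long). Reading the sixth number as a Unix timestamp in milliseconds is an assumption, and `parse_raw_line` and `key_values` are hypothetical helper names, not NetBackup tools.

```python
import re
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple


class RawEntry(NamedTuple):
    numbers: List[int]   # the eight leading numeric fields, meaning not assumed
    when: datetime       # sixth field read as epoch milliseconds (assumption)
    strings: List[str]   # length-prefixed strings, in order of appearance
    trailer: str         # whatever follows the last length-prefixed string


def parse_raw_line(line: str) -> RawEntry:
    """Split one raw line into its numeric header and length-prefixed strings."""
    parts = line.strip().split(",", 8)
    if len(parts) != 9:
        raise ValueError("expected 8 numeric fields plus a payload")
    numbers = [int(p) for p in parts[:8]]
    payload = parts[8]

    strings: List[str] = []
    pos = 0
    while True:
        colon = payload.find(":", pos)
        if colon == -1 or not payload[pos:colon].isdigit():
            break  # no further "N:" prefix, the rest is the trailer
        length = int(payload[pos:colon])
        start = colon + 1
        # Take exactly `length` characters, so commas and colons inside
        # the text do not confuse the split.
        strings.append(payload[start:start + length])
        pos = start + length
        if payload.startswith(",", pos):
            pos += 1  # skip the comma between fields

    when = datetime.fromtimestamp(numbers[5] / 1000, tz=timezone.utc)
    return RawEntry(numbers, when, strings, payload[pos:])


def key_values(message: str) -> Dict[str, str]:
    """Collect KEY=value pairs such as RV=10109 from a message."""
    return dict(re.findall(r"(\w+)=(\S+)", message))


# One line copied from the NBARS log in this thread.
SAMPLE = (
    "0,51216,137,362,3261582,1480322689674,7992,15240,0:,"
    "78:[vnet_addrinfo.c:1087] getaddrinfo() failed RV=10109 "
    "NAME=ctscomdbd01 SVC=nbcs,26:retry_getaddrinfo_for_real,1"
)

if __name__ == "__main__":
    entry = parse_raw_line(SAMPLE)
    print(entry.when.isoformat())
    print(entry.strings)
    print(key_values(entry.strings[1]))
```

Reading by length rather than splitting on commas is deliberate, since the message text itself can contain commas.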

 
