10-21-2010 08:47 AM
Hi all,
As the title suggests, we have a DB backup that has been failing consistently. The file system backup runs fine, but when we try to back up the MSSQL DB we get a status 2 on the parent job and status 25 and 58 on the spawned default backup jobs.
I've tested communication from both master and media to the client using bpclntcmd (-hn / -ip) and everything resolves correctly. bpgetconfig -M <client> -x returns the correct information, so I'm pretty sure it's not connection related.
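For anyone wanting to repeat the same check outside NetBackup: the forward/reverse lookups that bpclntcmd -hn / -ip exercise can be approximated with a short script. A minimal sketch (the hostname passed in is whatever client you want to verify):

```python
import socket

def check_resolution(name):
    """Forward-resolve a hostname to its IPv4 addresses, then
    reverse-resolve each address - roughly the consistency check
    that bpclntcmd -hn / -ip performs."""
    addrs = sorted({ai[4][0] for ai in
                    socket.getaddrinfo(name, None, socket.AF_INET)})
    results = {}
    for addr in addrs:
        try:
            results[addr] = socket.gethostbyaddr(addr)[0]
        except OSError:
            results[addr] = None  # no PTR record for this address
    return results

# Any forward/reverse mismatch reported here is worth chasing
# before suspecting NetBackup itself.
print(check_resolution("localhost"))
```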
Any ideas? Thanks in advance for any and all help.
Specs
Master: Windows Server 2003 with NB 6.5.5
Media: RHEL 5 with NB 6.5.5
Client: Windows Server 2003 with NB 6.5.5
Cheers,
PL
10-21-2010 09:03 AM
Is your SQL DB running in a cluster? It's almost certainly network; we just need to figure out where. Can you telnet to bpcd from master > client, media > client, client > media, and client > master? Try with both name and IP.
Do you have more than one NIC/IP configured on the client?
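The telnet matrix above can also be scripted so it's easy to rerun from each host. A minimal sketch, assuming the default legacy ports (13782 for bpcd, 13724 for vnetd); the hostnames in the demo loop are placeholders:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds
    within the timeout - the same reachability test as telnet."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this from each host toward each peer, by name and by IP,
# mirroring the master/media/client telnet matrix above.
for host in ("client.example.com", "10.11.11.60"):  # placeholders
    for port in (13782, 13724):  # bpcd, vnetd default ports
        status = "open" if can_connect(host, port, timeout=1.0) else "CLOSED"
        print(f"{host}:{port} {status}")
```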
10-21-2010 09:38 AM
Thanks Riaan for the quick reply.
The DB is not running in a cluster. I've tried telnet from both master/media <-> client using both name (hostname and FQDN) and IP, and I can connect just fine.
As far as NICs go the client has just 1. Both master/media and client are going through a firewall directly with just 1 hop between them.
Like I said before, the filesystem backup runs just fine and I can access the client properties in the "Host Properties" view.
Cheers,
PL
10-21-2010 09:52 AM
Interesting. Come to think of it, if the child jobs spawn it means it connected, and then kicked off the default application streams.
Can you post the bpbrm log from the media server, and the bpcd and dbclient logs from the client? For a start...
10-21-2010 09:53 AM
Did this work, and then start failing, or did it never work?
10-21-2010 10:18 AM
Please post the following output from client:
bpclntcmd -pn
Please ensure you have a bprd log on the master. Check the log for the incoming request from the client.
10-21-2010 01:50 PM
From the client side:
connect to the BAR GUI and recheck your SQL login information.
10-21-2010 02:09 PM
The problem was a corrupted nblog.conf file on the master server. Verify this file.
10-21-2010 02:56 PM
@scerreti
just out of interest... were any modifications made to the nblog.conf file manually,
or were vxlog... commands run to perhaps change log settings?
...trying to ascertain if this is a bug in NB7
10-22-2010 12:56 AM
To narrow down the problem, if possible, create a test DB in your SQL Server and try to back it up.
10-22-2010 03:17 AM
Hi all and thanks for the replies so far.
Here's a quick update on my progress up till now.
I'm not quite sure what happened overnight, but the filesystem backups that were previously working now give me a status 58 too. We've noticed that after rebooting everything works OK for a few days, and then this behaviour creeps back.
@Riaan
Sorry, I can't get those logs for you because it's not connecting at all now, and yesterday's logs were rotated out.
@Marianne
The output of bpclntcmd -pn on the client is:
expecting response from server ecbck01.echiron.com fcgssm01.gulbenkian.pt fcgssm01.gulbenkian.pt 10.11.11.60 4258
and the bprd log on the master shows:
09:05:49.067 [28932.29868] <2> logconnections: BPRD ACCEPT FROM 10.11.11.60.3047 TO 10.123.15.107.13724
09:05:49.067 [28932.29868] <2> process_request: setsockopt SO_LINGER on 416 succeeded.
09:05:49.067 [28932.29868] <2> connected_peer: Connection from host fcgssm01.gulbenkian.pt, 10.11.11.60, on non-reserved port 3047
09:05:49.067 [28932.29868] <2> db_valid_master_server: fcgssm01.gulbenkian.pt is not a valid server
09:05:49.176 [28932.29868] <2> process_request: command C_CLIENT_ID (45) received
09:05:49.192 [28932.29868] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
09:05:49.192 [28932.29868] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpdbm
09:05:49.207 [28932.29868] <2> logconnections: BPDBM CONNECT FROM 10.123.15.107.2468 TO 10.123.15.107.13724
09:06:34.004 [18660.20680] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
09:06:34.004 [18660.20680] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpdbm
09:06:34.004 [18660.20680] <2> logconnections: BPDBM CONNECT FROM 10.123.15.107.2549 TO 10.123.15.107.13724
@Scerreti
I'm assuming nblog.conf is OK, otherwise other backups on this master would fail in a similar manner. I tried vxlogview and vxlogmgr, and both work as expected.
Trying bptestbpcd -verbose -debug -client fcgssm01 I get:
10:41:46.837 [30476.29600] <2> bptestbpcd: VERBOSE = 0
10:41:46.884 [30476.29600] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
10:41:46.884 [30476.29600] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpcd
10:41:47.056 [30476.29600] <2> logconnections: BPCD CONNECT FROM 10.123.15.107.1871 TO 10.11.11.60.13724
10:41:47.071 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1872 TO 10.11.11.60.13724 fd = 1784
10:41:47.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:47.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1873 TO 10.11.11.60.13724 fd = 1784
10:41:48.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:48.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1874 TO 10.11.11.60.13724 fd = 1784
10:41:49.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:49.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:50.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1880 TO 10.11.11.60.13724 fd = 1784
10:41:50.149 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:50.149 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_connect_to_vnetd: vnet_vnetd.c.152: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> nb_vnetd_connect: comm.c.1887: vnet_connect_to_vnetd failed: 43
10:41:50.149 [30476.29600] <2> bpcr_vnetd_connect_forward_socket_begin: nb_vnetd_connect(10.11.11.60) failed: 25
10:41:50.149 [30476.29600] <2> local_bpcr_connect: bpcr_vnetd_connect_forward_socket_begin failed: 25
10:41:50.149 [30476.29600] <2> ConnectToBPCD: bpcd_connect_and_verify(fcgssm01, fcgssm01) failed: 25
<16>bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
10:41:50.149 [30476.29600] <16> bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
<2>bptestbpcd: cannot connect on socket
10:41:50.149 [30476.29600] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
10:41:50.149 [30476.29600] <2> bptestbpcd: EXIT status = 25
and the bpcd log on the client shows:
10:41:46.622 [5856.3152] <2> bpcd main: offset to GMT 0
10:41:46.622 [5856.3152] <2> bpcd main: Got socket for input 456
10:41:46.622 [5856.3152] <2> logconnections: BPCD ACCEPT FROM 10.123.15.107.1871 TO 10.11.11.60.13724
10:41:46.622 [5856.3152] <2> bpcd main: setup_sockopts complete
10:41:46.622 [5856.3152] <2> bpcd peer_hostname: Connection from host ecbck01.echiron.com (10.123.15.107) port 1871
10:41:46.622 [5856.3152] <2> bpcd valid_server: comparing ecbck01.echiron.com and ecbck01.echiron.com
10:41:46.622 [5856.3152] <4> bpcd valid_server: hostname comparison succeeded
10:41:46.778 [5856.3152] <2> bpcd main: output socket port number = 1
10:41:49.653 [5856.3152] <2> get_long: (2) premature end of file (byte 1)
10:41:49.653 [5856.3152] <2> get_vnetd_forward_socket: get_string ipc_string failed: 5
10:41:49.653 [5856.3152] <16> bpcd main: get_vnetd_forward_socket failed: 23
10:48:00.666 [7012.7984] <2> bpcd main: offset to GMT 0
10:48:00.666 [7012.7984] <2> bpcd main: Got socket for input 460
10:48:00.666 [7012.7984] <2> logconnections: BPCD ACCEPT FROM 10.123.15.107.2294 TO 10.11.11.60.13724
10:48:00.666 [7012.7984] <2> bpcd main: setup_sockopts complete
10:48:00.682 [7012.7984] <2> bpcd peer_hostname: Connection from host ecbck01.echiron.com (10.123.15.107) port 2294
10:48:00.682 [7012.7984] <2> bpcd valid_server: comparing ecbck01.echiron.com and ecbck01.echiron.com
10:48:00.682 [7012.7984] <4> bpcd valid_server: hostname comparison succeeded
10:48:00.838 [7012.7984] <2> bpcd main: output socket port number = 1
10:48:03.651 [7012.7984] <2> get_long: (2) premature end of file (byte 1)
10:48:03.651 [7012.7984] <2> get_vnetd_forward_socket: get_string ipc_string failed: 5
10:48:03.651 [7012.7984] <16> bpcd main: get_vnetd_forward_socket failed: 23
Any ideas, guys and gals?
Cheers,
PL
10-22-2010 03:18 AM
Those are network connection errors - have a look at this technote:
http://www.symantec.com/docs/TECH45382
10-22-2010 04:27 AM
Hi Pal,
Check the following:
10-22-2010 05:26 AM
10:41:49.653 [5856.3152] <16> bpcd main: get_vnetd_forward_socket failed: 23
See the below link. It might help you
http://www.symantec.com/business/support/index?page=content&id=TECH128319
10-22-2010 06:29 AM
Hi all once again.
I managed to schedule some downtime at lunch and rebooted the client server. As I expected, both the filesystem and DB backups are working just fine now. If the current trend continues, they will work fine for a few days and then just stop working with the errors stated so far.
I'm stumped, I must admit. If it were a communication problem, a simple reboot wouldn't solve it, so I'm guessing something more obscure is going on that I haven't managed to put my finger on yet.
I'll keep you all updated!
Thanks once again,
PL
10-22-2010 06:31 AM
No problem palobo. Hope it will be fixed :)
10-22-2010 06:47 AM
I'd suggest monitoring this client very closely over the next few days until it starts to fail as you envisage. Maybe resources are getting exhausted for one reason or another (disk space? memory leak? NIC errors?) until a point where NB can no longer perform. Don't just look at it from a NetBackup point of view; maybe there are other performance issues or failing components.
Just a suggestion....Good Luck!
10-22-2010 02:03 PM
I would agree with Andy about monitoring other resources as well.
Have a look through the event log in the interim and check if anything stands out.
10-22-2010 10:23 PM
The TN that Zahid posted yesterday is for a Linux client, but if one reads it carefully, it seems that a similar problem is experienced on this Windows client - 'something' was preventing socket setup or the client ran out of available socket connections. That's why the reboot worked.
This section in the TN seems very relevant (although there's no xinetd.conf on Windows):
********************************************
In this case, the xinetd.conf file on the client host was limiting the number of concurrent socket instances with other hosts. The default value of 60 had been lowered to a value of 2.
At least 3 concurrent sockets must be allowed per concurrent multiplex job.
User-directed jobs require an additional socket for the server-bound request and two additional sockets if progress file updates are requested.
Jobs associated with DB2/Informix/Oracle/SAP/SQL-Server/Sybase/Teradata/XBSA-DataStore may utilize up to 10 concurrent sockets per concurrent job.
*********************************************
If the same problem returns, check output of 'netstat -a'.
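To make the `netstat -a` check more than eyeballing, the output can be tallied per socket state on the NetBackup ports. A rough sketch; the port set (13724/13782 defaults) and the `netstat -an`-style line format are assumptions about a default setup:

```python
from collections import Counter

NB_PORTS = {"13724", "13782"}  # vnetd / bpcd default ports

def count_nb_sockets(netstat_output):
    """Tally TCP socket states for connections touching the NetBackup
    ports, given text in the usual `netstat -an` form, e.g.:
      TCP  10.11.11.60:3047  10.123.15.107:13724  TIME_WAIT
    """
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0].upper() != "TCP":
            continue  # skip headers and stateless UDP lines
        local, remote, state = parts[1], parts[2], parts[3]
        if (local.rsplit(":", 1)[-1] in NB_PORTS
                or remote.rsplit(":", 1)[-1] in NB_PORTS):
            states[state] += 1
    return states
```

Feed it the captured `netstat -an` output; a large pile of TIME_WAIT or CLOSE_WAIT entries toward 13724 just before the failures start would support the socket-exhaustion theory.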
10-23-2010 10:59 AM
I had an issue with a BMR restore previously; it turned out to be a Windows problem after Symantec pointed to this technote:
Network connectivity tuning to avoid network read/write failures and increase performance
http://www.symantec.com/docs/TECH60844