cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Backups fail with status 2, 25 and 58. Filesystem backups OK

palobo
Level 3

Hi all,

As the title suggest, we have a DB backup that has been failing systematically. The file system backup runs on but when we try and backup MSSQL DB we get a status 2 on the parent job and status 25 and 58 on spawned default backup jobs.

I've tested communication from both master/media to from client using bpclntcmd (-hn / -ip) and all resolves correctly.  bpgetconfig -M <client> -x returns correct information so I'm pretty sure it's not connection related.

Any ideas? Thanks in advance for any and all help.


Specs

Master: Windows Server 2003 with NB 6.5.5
Media: RHEL 5 with NB 6.5.5
Client: Windows Server 2003 with NB 6.5.5

 

Cheers,
PL

23 REPLIES 23

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Is your SQL DB running in a cluster? Its almost certainly network, we just need to figure out where. Can you telnet to bpcd from the master > client, media > client, Client > media, client > master. Try with both name and IP.

 

Do you have more than one NIC/IP configured on the client?

palobo
Level 3

Thanks Riaan for the quick reply.

The DB is not runnig in cluster. I've tried telnet from both master/ media <-> client using both name (hostname and FQDN) and IP and I can connect just fine.

As far as NICs go the client has just 1. Both master/media and client are going through a firewall directly with just 1 hop between them. 

Like I said before, the filesystem backup runs just fine and I can access the client properties on the on the "Host Properties" view.

Cheers,
PL

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Interesting, come to think about it, if the child jobs spawn it means that it connected, and then kicked of the default application streams.

 

Can you post the bpbrm logs from media server, bpcd and dbclient from the client. For a start....

RiaanBadenhorst
Moderator
Moderator
Partner    VIP    Accredited Certified

Did this work, and then start failing, or did it never work?

Marianne
Level 6
Partner    VIP    Accredited Certified

Please post the following output from client:

bpclntcmd -pn

Please ensure you have bprd log on master. check log for incoming request from client.

MOHAMED_PATEL
Level 5
Partner Accredited Certified

From the client side:

connect to BAR gui and recheck your SQL login information -

scerreti
Level 4

the problem was the nblog.conf file corrupted in the master server. Verify this file.

MOHAMED_PATEL
Level 5
Partner Accredited Certified

@scerreti

just out of interest... were any modifications made to the nblog.conf file - (manually)

or were vxlog... commands run to perhaps change log settings...?

....trying to ascertain if this is a bug on NB7

Zahid_Haseeb
Moderator
Moderator
Partner    VIP    Accredited

To narrow your problem, if possible, create a test DB in your SQL and try to take the backup.

palobo
Level 3

Hi all and thanks for the replies so far.

Here's a quick update on my progress up till now. 

I'm not quite sure what happened over night but the filesystem backups that were previously working now give me a status 58 too. We've managed to see that after rebooting all works ok for a few days and then this behaviour creeps up.

@Riaan
Sorry can't get those logs for you because it's not connecting at all now and yesterday's logs were rotated out.

@Marianne
The output of bpclntcmd -pn on the client is:

 expecting response from server ecbck01.echiron.com
fcgssm01.gulbenkian.pt fcgssm01.gulbenkian.pt 10.11.11.60 4258and the bprd log on the 

master bprd log shows:

 09:05:49.067 [28932.29868] <2> logconnections: BPRD ACCEPT FROM 10.11.11.60.3047 TO 10.123.15.107.13724
09:05:49.067 [28932.29868] <2> process_request: setsockopt SO_LINGER on 416 succeeded.
09:05:49.067 [28932.29868] <2> connected_peer: Connection from host fcgssm01.gulbenkian.pt, 10.11.11.60, on non-reserved port 3047
09:05:49.067 [28932.29868] <2> db_valid_master_server: fcgssm01.gulbenkian.pt is not a valid server
09:05:49.176 [28932.29868] <2> process_request: command C_CLIENT_ID (45) received
09:05:49.192 [28932.29868] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
09:05:49.192 [28932.29868] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpdbm
09:05:49.207 [28932.29868] <2> logconnections: BPDBM CONNECT FROM 10.123.15.107.2468 TO 10.123.15.107.13724
09:06:34.004 [18660.20680] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
09:06:34.004 [18660.20680] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpdbm
09:06:34.004 [18660.20680] <2> logconnections: BPDBM CONNECT FROM 10.123.15.107.2549 TO 10.123.15.107.13724 

@Scerreti
I'm supposing nblog.conf is OK otherwise other backups on this master would fail in a similar manner. I tried vxlogview, vxlogmr and all work as expected.   

Trying bptestbpcd -verbose -debug -client fcgssm01 I get:

10:41:46.837 [30476.29600] <2> bptestbpcd: VERBOSE = 0
10:41:46.884 [30476.29600] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
10:41:46.884 [30476.29600] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpcd
10:41:47.056 [30476.29600] <2> logconnections: BPCD CONNECT FROM 10.123.15.107.1871 TO 10.11.11.60.13724
10:41:47.071 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1872 TO 1
0.11.11.60.13724 fd = 1784
10:41:47.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:47.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1873 TO 1
0.11.11.60.13724 fd = 1784
10:41:48.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:48.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1874 TO 1
0.11.11.60.13724 fd = 1784
10:41:49.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:49.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:50.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1880 TO 1
0.11.11.60.13724 fd = 1784
10:41:50.149 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:50.149 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_connect_to_vnetd: vnet_vnetd.c.152: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> nb_vnetd_connect: comm.c.1887: vnet_connect_to_vnetd failed: 43
10:41:50.149 [30476.29600] <2> bpcr_vnetd_connect_forward_socket_begin: nb_vnetd_connect(10.11.11.60) failed: 25
10:41:50.149 [30476.29600] <2> local_bpcr_connect: bpcr_vnetd_connect_forward_socket_begin failed: 25
10:41:50.149 [30476.29600] <2> ConnectToBPCD: bpcd_connect_and_verify(fcgssm01, fcgssm01) failed: 25
<16>bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
10:41:50.149 [30476.29600] <16> bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
<2>bptestbpcd: cannot connect on socket
10:41:50.149 [30476.29600] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
10:41:50.149 [30476.29600] <2> bptestbpcd: EXIT status = 25

with bpcd log on client

10:41:46.622 [5856.3152] <2> bpcd main: offset to GMT 0
 10:41:46.622 [5856.3152] <2> bpcd main: Got socket for input 456
10:41:46.622 [5856.3152] <2> logconnections: BPCD ACCEPT FROM 10.123.15.107.1871 TO 10.11.11.60.13724
10:41:46.622 [5856.3152] <2> bpcd main: setup_sockopts complete
10:41:46.622 [5856.3152] <2> bpcd peer_hostname: Connection from host ecbck01.echiron.com (10.123.15.107) port 1871
10:41:46.622 [5856.3152] <2> bpcd valid_server: comparing ecbck01.echiron.com and ecbck01.echiron.com
10:41:46.622 [5856.3152] <4> bpcd valid_server: hostname comparison succeeded
10:41:46.778 [5856.3152] <2> bpcd main: output socket port number = 1
10:41:49.653 [5856.3152] <2> get_long: (2) premature end of file (byte 1)
10:41:49.653 [5856.3152] <2> get_vnetd_forward_socket: get_string ipc_string failed: 5
10:41:49.653 [5856.3152] <16> bpcd main: get_vnetd_forward_socket failed: 23
10:48:00.666 [7012.7984] <2> bpcd main: offset to GMT 0
10:48:00.666 [7012.7984] <2> bpcd main: Got socket for input 460
10:48:00.666 [7012.7984] <2> logconnections: BPCD ACCEPT FROM 10.123.15.107.2294 TO 10.11.11.60.13724
10:48:00.666 [7012.7984] <2> bpcd main: setup_sockopts complete
10:48:00.682 [7012.7984] <2> bpcd peer_hostname: Connection from host ecbck01.echiron.com (10.123.15.107) port 2294
10:48:00.682 [7012.7984] <2> bpcd valid_server: comparing ecbck01.echiron.com and ecbck01.echiron.com
10:48:00.682 [7012.7984] <4> bpcd valid_server: hostname comparison succeeded
10:48:00.838 [7012.7984] <2> bpcd main: output socket port number = 1
10:48:03.651 [7012.7984] <2> get_long: (2) premature end of file (byte 1)
10:48:03.651 [7012.7984] <2> get_vnetd_forward_socket: get_string ipc_string failed: 5
10:48:03.651 [7012.7984] <16> bpcd main: get_vnetd_forward_socket failed: 23 

Any ideas guys and gals?   Cheers, PL

MOHAMED_PATEL
Level 5
Partner Accredited Certified

There are network connection errors - have a look at technote

http://www.symantec.com/docs/TECH45382

 

10:41:50.149 [30476.29600] <16> bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
<2>bptestbpcd: cannot connect on socket
10:41:50.149 [30476.29600] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
10:41:50.149 [30476.29600] <2> bptestbpcd: EXIT status = 25

Venkatesh_K
Level 5

Hi Pal,

check following,

  • verify client services are running on client, try restarting them again and test backup.
  • check for port bpcd and vnet open on client and ur able to connect on these ports
  • from master execute
  •               telnet client bpcd
  •               telnet client vnetd
  • try if ur able to resolve the hostname and IP
  •            
  • from master
  •  
  •         bpclntcmd -hn client
  •         bpclntcmd -ip <ip address of client>
    from client
  •         bpclntcmd –pn
  •         bpclntcmd -hn master
  •         bpclntcmd -ip <ip address of master>

  •  
  • to verify bpcd and vnetd are open run following from clienty
  • Netstat –a | grep bpcd                      
    Netstat –a | grep vnetd
  •  

 

Zahid_Haseeb
Moderator
Moderator
Partner    VIP    Accredited

 

10:41:49.653 [5856.3152] <16> bpcd main: get_vnetd_forward_socket failed: 23

 

 

 

See the below link. It might help you

http://www.symantec.com/business/support/index?page=content&id=TECH128319

palobo
Level 3

Hi all once again.

I managed to schedule some downtime at lunch and rebooted the client server. As I expected backups both filesystem and DB are working just fine now. If the current trend continues they will work fine for a few days and then just stop working with the errors stated so far.

 

I'm stumped I must admit. If it were a problem with communication a simple reboot wouldn't solve it so I'm guessing something more obscure is going on that I haven't managed to put my finger on it yet.

I'll keep you all update!

 

Thanks once again,
PL

Zahid_Haseeb
Moderator
Moderator
Partner    VIP    Accredited

No problem palobo. Hope it will be fixed :)

Andy_Welburn
Level 6

monitoring this client very closely over the next few days until it starts to fail as you envisage. Maybe resources getting exhausted for one reason or another (space? memory leak? nic errors?) until a point where NB can no longer perform? Don't just look at it from a NetBackup point of view, maybe there are other performance issues or failing components?

Just a suggestion....Good Luck!

MOHAMED_PATEL
Level 5
Partner Accredited Certified

I would agree with Andy - to monitor other resources as well -

Have a look through the event log in the interim and check if anything stands out.

Marianne
Level 6
Partner    VIP    Accredited Certified

The TN that Zahid posted yesterday is for a Linux client, but if one reads it carefully, it seems that a similar problem is experienced on this Windows client - 'something' was preventing socket setup or the client ran out of available socket connections. That's why the reboot worked.

This section in the TN seems very relevant (although there's no xinetd.conf on Windows):

********************************************

In this case, the xinetd.conf file on the client host was limiting the number of concurrent socket instances with other hosts.  The default value of 60 had been lowered to a value of 2.  

At least 3 concurrent sockets must be allowed per concurrent multiplex job.  

User-directed jobs require an additional socket for the server-bound request and two additional sockets if progress file updates are requested.

Jobs associated with DB2/Informix/Oracle/SAP/SQL-Server/Sybase/Teradata/XBSA-DataStore may utilize up to 10 concurrent sockets per concurrent job.

*********************************************

If the same problem returns, check output of 'netstat -a'.

MOHAMED_PATEL
Level 5
Partner Accredited Certified

Had an issue with BMR restore previously, turned out to be a Windows problem after Symantec pointed to technote:

Network connectivity tuning to avoid network read/write failures and increase performance

http://www.symantec.com/docs/TECH60844