<16> dtcp_read: TCP - failure: recv socket (532) (TCP 10053: Software caused connection abort)

captain_harlock
Level 4

We've recently had a problem with our NetBackup server. The server runs Windows Server 2012 R2 with NetBackup 8.1.1 (build 0103); the SQL server runs on Windows 2008 R2.
The backup is configured as follows: a full backup once a week and a differential every day. For some time now, the differential has been failing with the following error:
03:20:51.501 [11548.9348] <16> dtcp_read: TCP - failure: recv socket (532) (TCP 10053: Software caused connection abort)
03:20:51.501 [11548.9348] <16> dtcp_read: TCP - failure: recv socket (528) (TCP 10058: Can't send after socket shutdown)
At first, after the weekly full backup, the differential ran for a couple of days; then the error appeared, and the differential kept failing with it until the next full backup.
Now, for the past couple of weeks, the differential has stopped working altogether, even right after a full backup.
Since I am new to administering this kind of server, I have spent the past two weeks reading many posts on similar topics here and on other sites, but the proposed solutions did not help.
If you need bptm, bpbkar or other logs, I'm ready to post them here. It would be much appreciated if you could tell me which parts of the logs are needed, because they are very large.
Again, I'm a beginner, so to avoid cluttering the forum with long posts, please tell me which logs are needed for analysis. The log level is set to the maximum, 5.
The bundle is standard - NetBackup + SQL. NetBackup version 8.1.1 and SQL - 2008R2.
The errors are the same in the logs on the SQL server and on the Netbackup server.
I no longer know what to do.
Please advise.
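
For readers unfamiliar with the numbers in the two messages above: 10053 and 10058 are Windows Sockets error codes, known as WSAECONNABORTED and WSAESHUTDOWN. The snippet below is only a minimal sketch of how such codes could be pulled out of a log line for counting or grouping. The function name `tcp_errors` and the small lookup table are assumptions made for illustration and are not part of NetBackup.

```python
import re

# Only the two codes that appear in this thread; extend as needed.
WINSOCK_NAMES = {
    10053: "WSAECONNABORTED",
    10058: "WSAESHUTDOWN",
}

# Matches the "(TCP <code>: <text>)" part of a dtcp_read message.
TCP_ERROR = re.compile(r"\(TCP (\d+): ([^)]*)\)")


def tcp_errors(line):
    """Return (code, symbolic name, text) for every TCP error in a line."""
    return [
        (int(code), WINSOCK_NAMES.get(int(code), "unknown"), text)
        for code, text in TCP_ERROR.findall(line)
    ]


LINE = ("03:20:51.501 [11548.9348] <16> dtcp_read: TCP - failure: "
        "recv socket (532) (TCP 10053: Software caused connection abort)")

for code, name, text in tcp_errors(LINE):
    print(code, name, text)
```

Both codes only say that the connection was torn down locally; they do not say which side or which component caused it.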

43 REPLIES

Nicolai
Moderator
Partner    VIP   

Have you set the NetBackup TCP_KEEP_ALIVE setting on the client, the master and the media server?

https://www.veritas.com/support/en_US/doc/18716246-126559472-0/v40569182-126559472

Start with a value of 1800; if that doesn't work, try 3600.

Do you have a firewall between NetBackup and the client?

Thank you for your quick response )

I have already increased both timeouts to 600 seconds from the default of 300 seconds.

I will try increasing them to 1800 and let you know tomorrow.

No, there is no firewall at all. What's more, other backups (including the full one on that particular server) run fine on a daily basis...

 

Nicolai
Moderator
Partner    VIP   

hi @captain_harlock 

In my experience, neither 300 nor 600 seconds is sufficient for SQL backup.

If a CLIENT_READ_TIMEOUT of 1800 doesn't work, test with 3600.

Please be aware that when the client read timeout is increased, a backup that stops receiving client responses will take longer to fail.

Marianne
Level 6
Partner    VIP    Accredited Certified

@captain_harlock 

My experience with NBU logging is that level 5 takes up a lot of space and is difficult to read.

I only ever enable level 5 logs when a Support call is logged (and even then I will negotiate a lower level).
Level 3 logs are 99.9 % sufficient to troubleshoot problems.

Please note that NBU legacy logs do not exist by default - the log folders need to be created under <install-path>\netbackup\logs.

Logs that will be useful are bptm and bpbrm on the media server (level 3).
bpbrm will log connections and also metadata received from SQL client and sent to bpdbm on the master.
bptm will log each data buffer received from the client and updates to bpbrm.
These 2 logs are important to see how often data (and metadata) is received from the SQL client and if timeouts are applied due to no data received from client within the timeout period.

On the client, we need level 3 dbclient log in the netbackup\logs folder.
SQL logs that may be useful are SQL ERRORLOG and the VDI log. Ask your SQL dba for assistance.

If you need assistance to look at logs, please copy them to .txt files (e.g. bpbrm.txt, etc) and upload as attachments.
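
As a rough illustration of the "how often is data received" check described above, the sketch below looks for pauses between consecutive log timestamps that are at least as long as the timeout. It assumes the `HH:MM:SS.mmm` timestamp format seen in the log lines quoted earlier in this thread and that all timestamps fall on the same day. The function name `find_gaps` and the sample timestamps are invented for illustration and do not come from a real log.

```python
from datetime import datetime, timedelta


def find_gaps(stamps, timeout_seconds):
    """Return (start, end, seconds) for every pause >= timeout_seconds.

    `stamps` is a list of "HH:MM:SS.mmm" strings in log order.
    """
    parsed = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    limit = timedelta(seconds=timeout_seconds)
    gaps = []
    for earlier, later in zip(parsed, parsed[1:]):
        pause = later - earlier
        if pause >= limit:
            gaps.append((earlier.time(), later.time(), pause.total_seconds()))
    return gaps


# Hypothetical timestamps, not taken from an actual bptm or bpbrm log.
STAMPS = ["03:00:00.000", "03:00:05.250", "03:10:30.250", "03:10:31.000"]

for start, end, seconds in find_gaps(STAMPS, 600):
    print(f"no log activity from {start} to {end} ({seconds:.0f} s)")
```

A long pause in bptm that lines up with the configured timeout would suggest the client stopped sending data; it would not by itself explain why.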

Dear Marianne, thank you for your response.

Yes, I have created folders for the bptm and bpbkar logs as instructed in the Veritas documentation.

As for the log level, they advise raising it to 5, so I did...

The main problem is that the error is not, how to say, constant. It's intermittent. It may appear today and not tomorrow, and then appear again two days later.

I did as you said and created the dbclient folder in the NetBackup logs directory on the client system. Thank you for your advice.

The next diff backup is tomorrow night, so if it fails again, I will attach the required logs in my next post here.

Should I change the log level back to 3?

As for the SQL servers, I manage them in the office )) so there are no corresponding errors that should be mentioned here.

Marianne
Level 6
Partner    VIP    Accredited Certified

@captain_harlock 

You only need the bpbkar log on a client for file-level backups, or if you are performing off-host snapshot backups using the NBU Snapshot Client feature, or if this is a VMware backup where SQL databases are included as part of the VM backup.
For 'regular' agent-based backups like SQL, the dbclient process will log backup activity on the client.
For SQL Intelligent Policy, there are probably other logs that can be enabled, but your issue does not seem to be with initial job setup, rather with backup processing on the SQL client after the backup has started.

On a media server, you need bpbrm and bptm logs.
(You can find process flow diagrams in the Logging Reference Guide (unfortunately no diagrams for SQL) - links to manuals in Handy NBU links in my signature.)

I know Veritas normally asks for level 5 logs, but then you send/upload them to the Support portal.
Veritas Support staff have tools to analyse these massive logs.
Trust me - 80% of issues can be traced with level 0 logs.
In my 20+ years with NBU, I have not come across any issue with SQL backup that we could not troubleshoot with level 3 logs.
I have only once experienced a serious problem where Support could only identify the problem in level 5 logs.

If you plan to log a call with Support, then please keep the logs at level 5.

If you want forum members to assist, please drop the logging level to 3 on the media server and the client.

The fact that the problem is intermittent says to me that 'something else' is happening on the SQL client that is causing SQL and NBU processing to time out. So, NBU logs may only tell you that a timeout occurred, not what caused it.
Event viewer logs, SQL Errorlog and/or SQL VDI logs may help.
Ask server owner, SQL dba and even the network team to monitor the SQL server for a week or so.
It may even be helpful to monitor processes on the SQL server using Task Manager while the backup is running.
(In my early years with 1st-line customer support, I have often done this - sitting at night at a customer site, staring at NBU server and client screens!)

 

Thanks again for your efforts to help me ))

Here is another error that could help:

03:20:32.647 [18532.15008] <16> non_mpx_backup_archive_verify_import: from client srvdfs2: ERR - Can't open object. Aborting backup: Enterprise Vault Resources:\EV Site (plant (srvSQL1))\EV Vault Store Group (plantStore)\FingerPrint Databases\FingerPrint DB (srvSQL1/EVVSGEVStore_1_1)\EVVSGEVStore_1_1 (BEDS 0xE000FEA9: The Backup Exec data store encountered a problem during the operation. See the job log for details.).

So where should I go next?

PS: dropped the logging level to 3 )

Thank you, ma'am, for your kind advice )

PPS: The SQL server event logs are clean. And I mean really clean - only SCCM error messages, but that is because it needs a restart before applying updates...

Marianne
Level 6
Partner    VIP    Accredited Certified

Wait!

What exactly are we troubleshooting here - MS-SQL or EV backup?
What is the backup policy type?
MS-SQL?
Enterprise Vault?

Troubleshooting steps are different for these 2 policy types because there is more than one server involved in the backup process for EV - there is the SQL server and the EV server.

I think you should start from scratch and tell us as much as possible about the environment -
EV version?
SQL version?
What is the Policy type?
What is in the Backup Selection?

If EV - have you configured EV backup as per the best practice documentation?
Do you have a copy of the NBU for EV Admin Guide?

NB host Server version is 2012 r2, NetBackup version is 8.1.1, build 0103.

Microsoft SQL Server 2008 R2 (SP3-GDR) (KB4057113) - 10.50.6560.0 (X64) Dec 28 2017 15:03:48 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor) (hosted on windows 2008R2).

as for EV server - Enterprise Vault, Symantec Corporation, Version: 10.0.0.1316 (runs on windows server 2008R2).

As for the Policy type and Backup selection, sorry, I cannot quite follow you...

Unfortunately, it wasn't me who deployed and tuned these servers, so I am not sure...

But NB backs up EV through the SQL DB involved...

Marianne
Level 6
Partner    VIP    Accredited Certified

The last error showed an Enterprise Vault path.
Can you please post ALL text in the Job Details of the failing backup?

Please also show us the policy config as this will show us all relevant info (Policy type, Backup selection, Schedules, etc) .
Firstly, get the exact name for the policy in the GUI.
On the master server, open cmd, cd to <install-path>\netbackup\bin\admincmd and run this command:

bppllist <policy-name> -U
(Please note that NBU is case sensitive)
Copy the text output and post here.

Errr... it seems that it wouldn't allow me to post the output...

So I attached it )

Marianne
Level 6
Partner    VIP    Accredited Certified

@captain_harlock wrote:


....But NB backs up EV through the SQL DB involved...


@captain_harlock 

Actually, it works the other way round. If the EV policy is configured correctly, the SQL database will be backed up along with EV.

Also EV10 is VERY old - the EOSL page does not even list versions older than 11.0:
https://sort.veritas.com/eosl

@Marianne 
I wouldn't argue with you, ma'am )

This version is old, indeed, but this is all I have and I have to live with that.

Maybe I can upload something else here to clarify the problem?

It worked just fine until last month...((

Marianne
Level 6
Partner    VIP    Accredited Certified

To troubleshoot right now, you will still need bptm and bpbrm logs on the media server.

An EV policy works differently from an SQL policy, so you will indeed need the bpbkar log folder on the SQL server and the EV server.
(No need for dbclient since this is not MS-SQL policy).
My other advice to monitor activity and Event Viewer logs on SQL and EV servers still stands.

In the meantime, please download the NBU for Enterprise Vault manual.
Read through this manual to understand components and configuration.

There is also this TN that you can use to compare your policies against the best practice advice:
https://www.veritas.com/support/en_US/article.100007570

@Marianne 

Should I upload all the required logs?

I will check the configuration against the article you've recommended, of course. Thank you.

 

Marianne
Level 6
Partner    VIP    Accredited Certified

You may upload the logs.
If time permits, someone will try and assist.
(All of us assisting here are doing so in between our full-time jobs.)

When you upload logs, be sure to include all text in Job Details of the failed job.

Job Details is important because it gives us PIDs and timestamps to trace in the relevant logs.
Please have a look at my post over here about reading logs:
https://vox.veritas.com/t5/NetBackup/How-to-use-logs/m-p/447331#M98462
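
To make the PID-and-timestamp tracing mentioned above concrete, here is a minimal sketch that filters log lines by PID and time window. It assumes the line layout seen in the excerpts quoted in this thread (`HH:MM:SS.mmm [PID.TID] <severity> message`); the helper names `parse_line` and `trace` are hypothetical, and real logs may contain lines that do not match this pattern.

```python
import re
from datetime import time

# Layout assumed from the log excerpts quoted in this thread.
LOG_LINE = re.compile(
    r"^(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<ms>\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] <(?P<sev>\d+)> (?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into fields, or return None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    stamp = time(int(m["h"]), int(m["m"]), int(m["s"]), int(m["ms"]) * 1000)
    return {
        "time": stamp,
        "pid": int(m["pid"]),
        "tid": int(m["tid"]),
        "severity": int(m["sev"]),
        "message": m["msg"],
    }


def trace(lines, pid, start, end):
    """Return parsed entries for one PID between start and end (inclusive)."""
    hits = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec["pid"] == pid and start <= rec["time"] <= end:
            hits.append(rec)
    return hits


# Lines copied from earlier posts in this thread.
SAMPLE = [
    "03:20:51.501 [11548.9348] <16> dtcp_read: TCP - failure: recv socket (532) (TCP 10053: Software caused connection abort)",
    "03:20:51.501 [11548.9348] <16> dtcp_read: TCP - failure: recv socket (528) (TCP 10058: Can't send after socket shutdown)",
    r"03:20:32.647 [18532.15008] <16> non_mpx_backup_archive_verify_import: from client srvdfs2: ERR - Can't open object. Aborting backup: Enterprise Vault Resources:\EV Site (plant (srvSQL1))\EV Vault Store Group (plantStore)\FingerPrint Databases\FingerPrint DB (srvSQL1/EVVSGEVStore_1_1)\EVVSGEVStore_1_1 (BEDS 0xE000FEA9: The Backup Exec data store encountered a problem during the operation. See the job log for details.).",
]

for rec in trace(SAMPLE, 11548, time(3, 20, 0), time(3, 21, 0)):
    print(rec["time"], rec["severity"], rec["message"])
```

In practice the PID and the time window would come from the Job Details of the failed job, and the lines from the matching log file on the media server or client.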

*** PS:

NBU logs will still only give us the NBU perspective.

You say everything worked until a month ago.
My guess is that NOTHING changed on the NBU side, right?
You now need to do some investigation to find out what is different.

Were any OS and/or AntiVirus updates installed on the NBU server, or the EV server or the SQL server?

Is everything on EV working as expected?
Were logs truncated the last time a Diff Inc backup ran successfully?
Have you checked EV logs and/or Event Viewer for issues?
All okay on the SQL server?
Any errors in Event logs?
Any space issues on any of the servers?

@Nicolai 

@Marianne 

Hello again

Well, increasing the time intervals gave an unexpected result:

Dec 14, 2020 11:40:22 AM backup1.Hyundai.local Warning 0 General failed to connect to bpjobd, status=25 (cannot connect on socket)
This error kept appearing from the time I changed the timeouts until today, when I changed them back to 600 s.

I am not sure...

Please advise.

PS: As a result of all this, there are no fresh logs available, because no backups were attempted...

PPS: Well, your advice brought me bad luck... I got another error: Dec 15, 2020 11:43:54 AM bachkup1.hyundai.local dfs1.hyundai.local Error 0 requestor dfs1.hyundai.local is not a valid server for query 223

Marianne
Level 6
Partner    VIP    Accredited Certified

I have honestly never seen increasing timeouts cause this error.

Did ALL backups fail with this error?
Did backups run successfully when you changed the timeouts back?

Something else seems to be wrong on the master server. Almost as if bpjobd is not running... or the OS on the server is running out of TCP ports...

Can you run 'bpps' from cmd on the master server and post output?
(command is in ...\netbackup\bin)

What do you see in 'netstat -a' output?
(no need to post output)

Well, when I set the values back to 600 s, backups started and everything seems to run normally now, but this could cost me my job ((( because backups didn't run for almost a day...

Yeah, none of the backups ran...


C:\Program Files\Veritas\NetBackup\bin>bpps
* RS1BKMROA1 12/15/20 12:24:35.333
COMMAND PID LOAD TIME MEM START
nbcssc 1196 0.000% 7.687 27M 11/20/20 20:44:49.069
bpcompatd 1300 0.000% 40.343 14M 11/20/20 20:44:50.535
dbsrv16 1404 0.000% 54:24.421 84M 11/20/20 20:44:50.941
dbsrv16 1516 0.000% 5:46:16.984 240M 11/20/20 20:44:51.554
nbdisco 2004 0.000% 30:08.468 36M 11/20/20 20:44:55.090
nbevtmgr 1352 0.000% 2:09.671 29M 11/20/20 20:44:56.082
vnetd 2184 0.000% 17.828 9.7M 11/20/20 20:44:58.624
vnetd 2212 0.000% 2:33.015 27M 11/20/20 20:44:58.751
vnetd 2220 0.000% 12:47.671 34M 11/20/20 20:44:58.813
vnetd 2228 0.000% 44.187 18M 11/20/20 20:44:58.813
nbrmms 2236 0.000% 3:06.046 33M 11/20/20 20:44:58.829
nbsl 2528 3.068% 34:39.265 58M 11/20/20 20:44:59.489
nbstserv 2656 0.000% 2:49.890 34M 11/20/20 20:45:00.196
nbemm 3952 0.000% 10:57.546 59M 11/20/20 20:45:12.080
nbim 1724 0.000% 14.328 26M 11/20/20 20:45:12.534
nbproxy 7880 0.000% 19.609 22M 11/20/20 20:46:34.831
nbproxy 7988 0.000% 19.593 23M 11/20/20 20:46:35.818
nbproxy 7852 0.000% 1:17.718 23M 11/20/20 20:46:42.039
nbproxy 6184 0.000% 15.093 21M 11/20/20 20:46:45.436
nbproxy 7448 0.000% 15.546 20M 11/20/20 20:46:46.340
bpjava-susvc 8216 0.000% 0.187 16M 11/20/20 20:49:37.110
bpjava-susvc 3020 0.000% 0.078 15M 11/20/20 20:49:37.393
bpjava-susvc 8184 0.000% 23.328 19M 11/20/20 20:49:37.674
bpjava-susvc 7652 0.000% 0.125 16M 11/20/20 20:49:47.394
nbproxy 7624 0.000% 12.234 23M 11/20/20 20:49:54.536
nbproxy 8508 0.000% 12.781 22M 11/20/20 20:49:54.855
bpjava-susvc 19800 0.000% 0.171 17M 12/11/20 14:05:18.031
bpjava-susvc 14348 0.000% 0.062 15M 12/11/20 14:05:18.266
bpjava-susvc 20912 0.000% 0.218 16M 12/11/20 14:05:18.493
nbproxy 3716 0.000% 0.500 23M 12/15/20 10:57:47.275
nbrb 21280 0.000% 0.890 28M 12/15/20 11:07:56.779
nbsvcmon 19472 0.000% 0.625 16M 12/15/20 11:08:02.177
nbwmc 9884 0.000% 0.062 6.6M 12/15/20 11:08:27.393
nbars 19864 0.000% 7.750 37M 12/15/20 11:20:59.094
nbaudit 19300 0.000% 0.765 27M 12/15/20 11:21:04.174
nbatd 20704 0.000% 0.671 13M 12/15/20 11:21:08.878
bpinetd 18460 0.000% 0.625 9.0M 12/15/20 11:22:47.429
bpcd 19352 0.000% 0.375 13M 12/15/20 11:22:48.639
bpdbm 18240 0.000% 0.187 17M 12/15/20 11:22:49.871
bpjobd 10160 0.000% 2.281 25M 12/15/20 11:22:49.935
nbjm 5356 0.000% 1.343 30M 12/15/20 11:22:51.080
nbproxy 14344 0.000% 0.234 22M 12/15/20 11:22:51.993
nbpem 5676 0.000% 2.437 33M 12/15/20 11:22:52.305
nbproxy 4016 0.000% 0.421 24M 12/15/20 11:22:52.798
bpdbm 4020 0.000% 0.328 21M 12/15/20 11:22:53.428
bprd 10948 0.000% 0.781 23M 12/15/20 11:22:53.521
vmd 19648 0.000% 0.171 17M 12/15/20 11:22:54.751
ltid 5212 0.000% 0.265 19M 12/15/20 11:22:55.962
tldd 16580 0.000% 0.078 14M 12/15/20 11:22:56.478
avrd 20316 0.000% 0.078 14M 12/15/20 11:22:56.478
tldcd 18428 0.000% 0.203 16M 12/15/20 11:22:57.655
bpjava-susvc 20732 0.000% 0.093 17M 12/15/20 11:23:19.454
bpjava-susvc 19888 0.000% 0.078 15M 12/15/20 11:23:19.689
bpjava-susvc 4008 0.000% 0.578 19M 12/15/20 11:23:19.924
nbproxy 18732 0.000% 0.250 20M 12/15/20 11:52:53.763
bpjava-susvc 3288 0.000% 1.609 19M 12/15/20 11:54:13.565
nbproxy 12816 0.000% 0.296 22M 12/15/20 11:55:26.580
nbproxy 18916 0.000% 0.468 21M 12/15/20 12:00:52.259
bpbrm 17824 1.534% 0.593 20M 12/15/20 12:02:54.262
bptm 3712 0.000% 0.296 25M 12/15/20 12:03:00.123
bptm 8520 6.135% 1:00.625 19M 12/15/20 12:03:00.435
nbtelesched 17980 0.000% 0.000 5.4M 12/15/20 12:22:52.463
nbtelesched 17380 0.000% 1.281 54M 12/15/20 12:22:52.478
nbtelemetry 4332 0.000% 0.015 7.2M 12/15/20 12:22:54.214
nbtelemetry 18988 96.632% 1:29.812 471M 12/15/20 12:22:54.230
bpps 15620 0.000% 0.078 7.2M 12/15/20 12:24:34.237
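
A listing this long is easier to scan with a small script. The sketch below assumes each process row has the seven whitespace-separated fields visible above (command, PID, load, CPU time, memory, start date, start time) and skips anything else. The function names `parse_bpps`, `busiest` and `started_on` are hypothetical helpers, not NetBackup tools, and the sample rows are copied from the output above.

```python
def parse_bpps(text):
    """Parse process rows from bpps output; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Process rows have 7 fields and a LOAD column ending in '%'.
        if len(parts) != 7 or not parts[2].endswith("%"):
            continue
        command, pid, load, cpu_time, mem, start_date, start_time = parts
        rows.append({
            "command": command,
            "pid": int(pid),
            "load": float(load.rstrip("%")),
            "cpu_time": cpu_time,
            "mem": mem,
            "start_date": start_date,
            "start_time": start_time,
        })
    return rows


def busiest(rows, threshold):
    """Rows whose LOAD is at or above the threshold (in percent)."""
    return [r for r in rows if r["load"] >= threshold]


def started_on(rows, date):
    """Command names of processes whose START date equals `date`."""
    return [r["command"] for r in rows if r["start_date"] == date]


SAMPLE = """\
* RS1BKMROA1 12/15/20 12:24:35.333
COMMAND PID LOAD TIME MEM START
nbemm 3952 0.000% 10:57.546 59M 11/20/20 20:45:12.080
bpjobd 10160 0.000% 2.281 25M 12/15/20 11:22:49.935
bptm 8520 6.135% 1:00.625 19M 12/15/20 12:03:00.435
nbtelemetry 18988 96.632% 1:29.812 471M 12/15/20 12:22:54.230
"""

rows = parse_bpps(SAMPLE)
print([r["command"] for r in busiest(rows, 50.0)])
print(started_on(rows, "12/15/20"))
```

Applied to the full listing, this kind of filter would separate the daemons that have been running since 11/20/20 from those with a 12/15/20 start time, such as bpjobd, and would single out the one process with a very high load figure.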