03-08-2012 07:25 AM
Hey everyone, I recently started getting error 40 on my media servers. The exact error is below:
Error bpbrm(pid=2572) could not write FILE ADDED message to stderr network connection broken(40)
The error seems to happen about 2 hours after the job starts, so I thought there was a firewall issue on the connection back to my media server. Our network team says the timeout for idle sockets is 14 hours, so I don't believe that is the problem. I am currently rerunning the backup with bpbrm logging enabled on the media server to see if I can get more details. Does anyone have any ideas about what would cause this? Is there a timeout setting in NetBackup that I can change to help fix this issue?
I am running NetBackup 6.5.5
My master server is running Windows Server 2003 x64.
My media server is running Windows Server 2008 R2 and has 10Gb NICs.
03-08-2012 08:03 AM
Do you have CLIENT_READ_TIMEOUT set on that media server to 7200?
How many clients is this affecting? All of them or just some?
If all of them, I suspect a network issue between the media server and the clients.
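To make the 2-hour correlation concrete, here is a minimal Python sketch that converts the timeouts mentioned so far into hours and flags the ones that line up with the roughly 2-hour failure point. The helper name `matching_timeouts` and the 15-minute tolerance are hypothetical choices for illustration; the 7200-second CLIENT_READ_TIMEOUT and the 14-hour idle-socket timeout are the values quoted in this thread, not values read from NetBackup or the network gear.

```python
# Hypothetical helper: compare configured timeouts against the observed
# failure time. The values below are copied from this thread; nothing here
# queries NetBackup or the network devices.


def matching_timeouts(timeouts_seconds, failure_hours, tolerance_minutes=15):
    """Return (sorted) names of timeouts within tolerance of the failure time."""
    failure_seconds = failure_hours * 3600
    tolerance_seconds = tolerance_minutes * 60
    return sorted(
        name
        for name, seconds in timeouts_seconds.items()
        if abs(seconds - failure_seconds) <= tolerance_seconds
    )


timeouts = {
    "CLIENT_READ_TIMEOUT (if set to 7200)": 7200,  # seconds
    "network idle-socket timeout": 14 * 3600,      # 14 hours, per the network team
}

for name, seconds in timeouts.items():
    print(f"{name}: {seconds / 3600:.1f} h")

print("Matches ~2 h failure:", matching_timeouts(timeouts, failure_hours=2))
```

Only the 7200-second read timeout lands near the 2-hour mark, which is why it is worth checking first; the 14-hour network timeout is far outside the window.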
03-08-2012 08:04 AM
By default, idle sockets can only last 2 hours (exactly 2 hours, in fact), and in my experience this mostly affects things like SQL backups.
Try this on your media servers (a reboot is needed for it to take effect):
In HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
Add a new DWORD named KeepAliveTime with a decimal value of 510000
Add a new DWORD named KeepAliveInterval with a decimal value of 3
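If you would rather apply these values with a .reg file than type them into regedit by hand, note that .reg files store DWORDs in hexadecimal, not decimal. The minimal Python sketch below (the function name `reg_file` is hypothetical) turns the decimal values from this post into .reg file text, and also shows that 510000 ms works out to 8.5 minutes, well under the 2-hour default mentioned above. It only generates text and does not touch the registry, so review the output before importing it.

```python
# Sketch: build .reg file text for the TCP/IP keepalive values suggested
# above. .reg files encode REG_DWORD values as 8 hex digits ("dword:xxxxxxxx").
# This only produces text; it does not modify the registry.

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def reg_file(values):
    """Return .reg file text for a dict of {value_name: decimal_dword}."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    for name, decimal in values.items():
        if not 0 <= decimal <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD")
        # Decimal value -> zero-padded lowercase hex, as regedit exports it
        lines.append(f'"{name}"=dword:{decimal:08x}')
    # Windows-style line endings
    return "\r\n".join(lines) + "\r\n"


keepalive = {"KeepAliveTime": 510000, "KeepAliveInterval": 3}
print(reg_file(keepalive))
print(f"KeepAliveTime = {keepalive['KeepAliveTime'] / 60000} minutes")
```

For example, 510000 decimal becomes `dword:0007c830`, which is easy to get wrong if you paste the decimal number straight into a .reg file.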
Hope this helps
03-08-2012 08:25 AM
Good point by revaroo too.
03-08-2012 09:51 PM
The 'could not write FILE ADDED message' error looks to me like bpbrm is unable to add the file list to bpdbm on the master.
The same error is described here (where I also shared my personal experience):
https://www-secure.symantec.com/connect/forums/backup-failing-status-40
03-09-2012 04:54 AM
Thanks, everyone, for all the replies. I will make sure to implement some of the suggestions. The problem just went away without my doing anything.
Thanks again.
03-09-2012 05:03 AM
If it just went away, you may have had blocked ports on your master or media server.
These are always worth adding. Both can have the first key, but only the Windows 2003 servers should get the second key:
In HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
Create a new DWORD named TcpTimedWaitDelay with a Decimal Value of 30
Create a new DWORD named MaxUserPort with a Decimal Value of 65534
For the Windows 2008 servers, instead of the MaxUserPort key, use an administrative command prompt to run:
netsh int ipv4 set dynamicport tcp start=10000 num=50000
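As a rough sanity check on these settings, the Python sketch below works out which ports the `netsh` command above reserves and how many new outbound connections per second a host could sustain before every dynamic port is sitting in TIME_WAIT. It assumes each connection ties up one local port for the full TcpTimedWaitDelay, which is a simplification; the function names are hypothetical, and the 49152-65535 range used for comparison is the Windows Server 2008 default dynamic port range.

```python
# Sketch: rough ephemeral-port arithmetic for the settings above.
# Assumes every outbound connection holds one local port for the full
# TIME_WAIT period after it closes; real traffic patterns will vary.


def port_range(start, num):
    """Return (first, last) port for `netsh ... start=<start> num=<num>`."""
    last = start + num - 1
    if start < 1 or last > 65535:
        raise ValueError("range falls outside the valid TCP port space")
    return start, last


def max_conn_rate(num_ports, time_wait_seconds):
    """New connections/second before all ports are stuck in TIME_WAIT."""
    return num_ports / time_wait_seconds


suggested = port_range(10000, 50000)
default_2008 = port_range(49152, 16384)  # Windows Server 2008 default range

print("suggested range:", suggested)   # (10000, 59999)
print("default range:", default_2008)  # (49152, 65535)
print(f"{max_conn_rate(50000, 30):.0f} conn/s with 50000 ports, 30 s TIME_WAIT")
print(f"{max_conn_rate(16384, 30):.0f} conn/s with 16384 ports, 30 s TIME_WAIT")
```

The point of pairing a wider port range with a shorter TcpTimedWaitDelay is that both raise this ceiling: more ports to hand out, and each one is returned to the pool sooner.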
Hope this also helps
04-02-2012 08:20 AM
The problem has started again and now seems to be worse than before. I have done a couple of things to try to resolve this.
I realized that I was running version 6.5.5 on my media servers, which run Windows 2008 R2, and I just found out that this combination is not supported. I have upgraded my master and media servers to 6.5.6, which is supported on a media server running Windows 2008 R2.
When I upgraded my media servers, I also upgraded the NICs to 10 Gb. Could this cause any problems? Are there any tests I can run to rule this out? I also saw that people were getting these errors when the master server is really busy. Is there any way to test that?
I have a case open with Symantec, and they had me run AppCritical. That test showed packet loss, but I later found out that all AppCritical does is send ICMP and UDP packets to all the network devices along that path. Because those devices apply traffic policing, the results are skewed and inaccurate.
I did increase the CLIENT_READ_TIMEOUT, and so far that has not helped.
04-02-2012 09:07 AM
Is it still after 2 hours?
Is it file system or application backups?
Did you put in place the KeepAlive settings I listed previously?
04-02-2012 09:49 AM
I do not have this in place currently. I do have backups that run longer than 2 hours without any issues. It sounds like this fix only applies if all my backups fail after 2 hours. Is that correct?
04-02-2012 01:31 PM
AppCritical is usually pretty accurate. Personally, I have not seen it produce erroneous reports.
Is this just happening on one media server? If so, what happens when you send the client backups to other media servers? Do they work OK?
What type of backups are you performing on the clients?