Error 40 on my media server

kproehl
Level 5

Hey everyone, I recently started getting error 40 on my media servers. The exact error is below.

Error bpbrm(pid=2572) could not write FILE ADDED message to stderr network connection broken(40)

The error seems to happen about 2 hours after the job starts, so I thought there was a firewall issue on the connection back to my media server. Our network team says the timeout for idle sockets is 14 hours, so I don't believe that is the problem. I am currently rerunning the backup with bpbrm logging enabled on the media server to see if I can get more details. Does anyone have any ideas as to what would cause this? Is there a timeout setting in NetBackup that I can change to fix this issue?

 

I am running NetBackup 6.5.5

My Master server is running Windows Server 2003 X64

My Media server is running Windows Server 2008 R2 and has 10Gb NICs

10 REPLIES

revarooo
Level 6
Employee

Do you have CLIENT_READ_TIMEOUT set on that media server to 7200?

How many clients is this affecting? All of them or just some?

If all of them, I suspect a network issue between Media server and clients.

Mark_Solutions
Level 6
Partner Accredited Certified

By default an idle socket can last only 2 hours (in fact exactly 2 hours, which is the Windows default KeepAliveTime), and I have seen that this usually has the most effect on things like SQL backups.

Try this on your Media Servers (needs a reboot to take effect)

In HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\

Add a new DWORD named KeepAliveTime with a Decimal value of 510000

Add a new DWORD named KeepAliveInterval with a Decimal Value of 3
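For anyone who would rather script this than click through regedit, here is a rough Python sketch. It only builds the `reg add` command lines for the two values above and shows the arithmetic behind them; it does not touch the registry. The helper names (`reg_add_command`, `ms_to_minutes`) are made up for illustration, and it assumes both values are read by Windows as milliseconds, with a default KeepAliveTime of 7,200,000 ms.

```python
# Sketch only: builds (does not run) reg.exe commands for the two values above.
# Helper names are hypothetical; review the printed commands before using them.

TCPIP_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values from the post. Windows interprets both as milliseconds.
KEEPALIVE_VALUES = {"KeepAliveTime": 510000, "KeepAliveInterval": 3}

# Windows default KeepAliveTime: 7,200,000 ms, i.e. the "exactly 2 hours".
WINDOWS_DEFAULT_KEEPALIVE_MS = 7_200_000


def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60000


def reg_add_command(name, value, key=TCPIP_PARAMS):
    """Return a reg.exe command line that sets a DWORD value (decimal data)."""
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'


if __name__ == "__main__":
    print("default idle time before first keepalive (min):",
          ms_to_minutes(WINDOWS_DEFAULT_KEEPALIVE_MS))
    print("proposed idle time before first keepalive (min):",
          ms_to_minutes(KEEPALIVE_VALUES["KeepAliveTime"]))
    for name, value in KEEPALIVE_VALUES.items():
        print(reg_add_command(name, value))
```

The point of the change is that the first keepalive probe goes out after 8.5 minutes of silence instead of 120, so anything in the path that drops idle connections sooner than 2 hours keeps seeing traffic.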

Hope this helps

Mark_Solutions
Level 6
Partner Accredited Certified

Good point by revarooo too, yes.

Marianne
Level 6
Partner    VIP    Accredited Certified

'could not write FILE ADDED message' looks to me like bpbrm is unable to add the file list to bpdbm on the master.

Same error is described here (where I also shared personal experience):

https://www-secure.symantec.com/connect/forums/backup-failing-status-40

kproehl
Level 5

Thanks everyone for all the replies.  I will make sure to add some of the suggestions.  The problem just went away without me doing anything.  

 

Thanks again.

Mark_Solutions
Level 6
Partner Accredited Certified

You may have had blocked ports on your Master or Media Server if it just went away

It is always worth adding these to them. Both can have the first key; only the Windows 2003 servers get the second key:

In HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\

Create a new DWORD named TcpTimedWaitDelay with a Decimal Value of 30

Create a new DWORD named MaxUserPort with a Decimal Value of 65534

For the Windows 2008 servers, rather than the MaxUserPort key, use an administrative command prompt to run:

netsh int ipv4 set dynamicport tcp start=10000 num=50000
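As a sanity check on those numbers, a small Python sketch of the arithmetic follows. It assumes the dynamic range is `start` through `start + num - 1` and must end at or below port 65535; the function names are invented for the example and it only formats the command rather than running it.

```python
# Sketch only: checks the port arithmetic behind the netsh command above.
# Function names are hypothetical; nothing here changes system settings.

MAX_TCP_PORT = 65535


def dynamic_port_range(start, num):
    """Return (first, last) port covered by a 'start=... num=...' pair."""
    last = start + num - 1
    if last > MAX_TCP_PORT:
        raise ValueError(f"range ends at {last}, above {MAX_TCP_PORT}")
    return start, last


def netsh_command(start, num):
    """Format the netsh command after validating the range."""
    dynamic_port_range(start, num)  # raises ValueError if the range is invalid
    return f"netsh int ipv4 set dynamicport tcp start={start} num={num}"


if __name__ == "__main__":
    # Values from the post: 50000 ports, 10000 through 59999.
    print(dynamic_port_range(10000, 50000))
    # Windows 2008 default for comparison: 16384 ports, 49152 through 65535.
    print(dynamic_port_range(49152, 16384))
    print(netsh_command(10000, 50000))
```

So the suggested setting roughly triples the number of ephemeral ports compared with the Windows 2008 default. You can confirm what a server is currently using with `netsh int ipv4 show dynamicport tcp`.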

Hope this also helps

kproehl
Level 5

The problem has started again and now seems to be worse than before.  I have done a couple of things to try to resolve this.

I realized that I was running version 6.5.5 on my media servers, which are running Windows 2008 R2.  I just found out that this is not supported.  I have upgraded my master and media servers to 6.5.6, which is supported on a media server running Windows 2008 R2.

 

When I upgraded my media servers I also upgraded the NICs to 10 Gb.  Would this cause any problems?  Are there any tests I can run to rule this out?  I also saw where people were getting these errors when the master server is really busy.  Is there any way to test that?
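One test that is independent of NetBackup is to check whether a plain TCP connection survives sitting idle for as long as the failing jobs do. Below is a minimal Python sketch of that idea, not a NetBackup tool: a tiny echo server, and a probe that connects, stays silent for a chosen time, then checks that the connection still works. The function names are made up, and the host, port, and idle time are placeholders to be replaced with real values (for a real test the echo server would run on one machine and the probe on the other, on a port the firewall allows).

```python
# Sketch only: tests whether an idle TCP connection survives a given silence.
# Function names are hypothetical; host/port/idle values are placeholders.
import socket
import threading
import time


def start_echo_server(host="127.0.0.1", port=0):
    """Start a one-connection echo server in a thread; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))  # port 0 lets the OS pick a free port
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:  # peer closed the connection
                    break
                conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return srv, srv.getsockname()[1]


def idle_probe(host, port, idle_seconds, timeout=30):
    """Connect, stay silent for idle_seconds, then check the echo still works."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"before")
        if s.recv(1024) != b"before":
            return False
        time.sleep(idle_seconds)  # no traffic at all during this window
        try:
            s.sendall(b"after")
            return s.recv(1024) == b"after"
        except OSError:  # reset, broken pipe, or timeout
            return False


if __name__ == "__main__":
    # Loopback demo with a short idle time; use e.g. 7500 seconds between
    # the real hosts to cover the 2-hour mark.
    server, chosen_port = start_echo_server()
    print("connection survived:", idle_probe("127.0.0.1", chosen_port, 2))
    server.close()
```

If the probe fails between the real hosts at a little over 2 hours but works for shorter idle times, that points at an idle timeout somewhere in the path rather than at NetBackup itself.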

 

I have a case open with Symantec where they had me run AppCritical.  That test showed packet loss, but I later found out that all AppCritical does is send ICMP and UDP packets to all the network devices along the path.  Because those devices police that traffic, the values are getting skewed, causing inaccurate results.

I did increase CLIENT_READ_TIMEOUT, and so far that has not helped.

Mark_Solutions
Level 6
Partner Accredited Certified

Is it still after 2 hours?

Is it file system or application backups?

Did you put in place the KeepAlive settings I listed previously?

kproehl
Level 5

I do not have this in place currently.  I do have backups that run longer than 2 hours without any issues.  It sounds like this fix applies only if all my backups fail after 2 hours.  Is that correct?

revarooo
Level 6
Employee

AppCritical is usually pretty accurate. Personally, I have not seen it report erroneous results.

Is this just happening on one media server? If so, what happens when you send the client backups to other media servers? Do they work OK?

What type of backups are you performing on the clients?