
Media server losing connection randomly

panthony
Level 4
Partner

Hello,

We are using NetBackup 7.6.0.2 with two media servers; all servers run Windows Server 2008 R2.

At random, one of the media servers (always the same one) loses its connection, and all backups fail with the error "disk media server is not active". For example, we had this problem at the beginning of this month, then nothing until last weekend, and again yesterday.

I've read that McAfee could cause this issue (see this technote: https://support.symantec.com/en_US/article.TECH56658.html), but as I said, the two media servers are identical, with the same version of McAfee, and the other one is fine. Moreover, the processes are in the exception list.

I checked the logs with vxlogview going back 6 hours and found this:

"25/08/2015 08:17:15.914 [DA_Thread_Pool::CheckForHeartbeat] Found machine < asaus112 > not sending heartbeat

25/08/2015 08:17:15.914 [Error] V-144-1049 EMMServer generic error = Machine < asaus112 > not sending heartbeat, changing to OFFLINE"
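One way to check whether the drops are truly random is to list every time the host was marked OFFLINE and compare those times with the McAfee scan schedule or network events. The sketch below assumes the log text has been exported from vxlogview to a plain-text file (for example `emm_log.txt`, a hypothetical name) and that the lines match the format quoted above; other NetBackup versions may word or format these messages differently.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the "changing to OFFLINE" message in the format quoted above
# (assumption: other NetBackup versions may word or format it differently).
OFFLINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Machine < (?P<host>\S+) > not sending heartbeat, changing to OFFLINE"
)


def offline_events(lines):
    """Return (timestamp, host) for every 'changing to OFFLINE' line."""
    events = []
    for line in lines:
        # Drop surrounding whitespace and stray quotation marks.
        match = OFFLINE_RE.search(line.strip().strip('"'))
        if match:
            # Timestamps in the quoted log are day/month/year.
            ts = datetime.strptime(match.group("ts"), "%d/%m/%Y %H:%M:%S.%f")
            events.append((ts, match.group("host")))
    return events


def events_per_day(events):
    """Count OFFLINE transitions per calendar day."""
    return Counter(ts.date() for ts, _ in events)
```

Reading the exported file line by line and passing it to `offline_events`, then `events_per_day`, gives a per-day count that can be lined up against scan times or switch logs.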

Any idea what I can check?

Thanks!


Nicolai
Moderator
Partner    VIP   

What about suspending McAfee completely on the failing host, to determine whether the problem could be something else?

Could it be periodic full scans or on-access scans?

panthony
Level 4
Partner

It's impossible to suspend McAfee on the hosts because of the security policies.

It's only on-access scanning.

sdo
Moderator
Partner    VIP    Certified

1) Most sites configure McAfee to ignore I/O-intensive applications. On a master or media server, I would exclude "*:\Program Files*\Veritas\" (including sub-folders) from all scanning: write, read, and backup.

2) Talk to your Network Team... do you have Cisco LAN switches?  Have they left the default Cisco LAN switch setting of 'Close idle connections' enabled?  NetBackup doesn't like this setting.

panthony
Level 4
Partner

1/ McAfee has all the NBU processes listed in the Low Risk Processes tab. The scan runs every day at 20:00, and the problem is completely random. For example, last night everything was OK.

2/ The switch setting "Close idle connections" is disabled.

sdo
Moderator
Partner    VIP    Certified

Are the media servers and/or master really busy with lots of TCP ports open?  Like, over 16,000 ports open?

> netstat -an | find /v /c ""
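`netstat -an | find /v /c ""` counts every line of output, including the header and blank lines, so it gives a rough total but no breakdown. A count per TCP state (for example, many `TIME_WAIT` or `CLOSE_WAIT` entries) may be more informative. The minimal sketch below assumes the usual Windows `netstat -an` column layout (Proto, Local Address, Foreign Address, State); UDP rows, which have no state column, are skipped.

```python
from collections import Counter


def tcp_states(netstat_output: str) -> Counter:
    """Count TCP rows per state in Windows 'netstat -an' text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have exactly four columns: Proto, Local, Foreign, State.
        # IPv6 rows are also labelled "TCP" on Windows, so they are counted too.
        if len(fields) == 4 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states
```

On the server itself, the text can be captured with `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout` and passed to `tcp_states`; summing the counter's values gives the total number of TCP sockets without the header lines.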

panthony
Level 4
Partner

The master server has 1,635 ports open, and the media server has 307.

sdo
Moderator
Partner    VIP    Certified

What's the TCP keep-alive setting on each server?

Reduce the TCP keep-alive from the default of 2 hours (i.e. 7,200 seconds, i.e. 7,200,000 ms).

Some sites choose 5 minutes; Symantec says 15 minutes:

reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" | sort

rem Default is 7200000 ms (2 hours); 300000 ms = 5 minutes
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v "KeepAliveTime" /t REG_DWORD /d 300000
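The registry value is in milliseconds, so each candidate interval is minutes × 60 × 1,000. This small sketch only does that arithmetic for the intervals mentioned in the thread and checks that the result fits in a REG_DWORD (an unsigned 32-bit value); the labels are descriptive, not official names.

```python
# Keep-alive intervals mentioned in this thread, in minutes.
CANDIDATES = {
    "Windows default (2 hours)": 120,
    "some sites": 5,
    "Symantec recommendation": 15,
}


def keepalive_ms(minutes: int) -> int:
    """Convert minutes to the millisecond value used for KeepAliveTime."""
    ms = minutes * 60 * 1000
    # REG_DWORD holds an unsigned 32-bit integer; zero is rejected as meaningless.
    if not 0 < ms <= 0xFFFFFFFF:
        raise ValueError(f"{minutes} min does not fit in a REG_DWORD")
    return ms


for label, minutes in CANDIDATES.items():
    print(f"{label}: {minutes} min -> /d {keepalive_ms(minutes)}")
```

For the 15-minute value, the `reg add` command would use `/d 900000`.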

panthony
Level 4
Partner

There was no keep-alive value set; I have added it on the master and the media server.

Elango_G1
Level 4

Maybe check the following forum discussion:

https://www-secure.symantec.com/connect/forums/netbackup-76-antivirus-exclusion

sdo
Moderator
Partner    VIP    Certified
@panthony - did you solve the problem?