08-25-2015 06:49 AM
Hello,
We are using NetBackup 7.6.0.2 with 2 media servers; all of them run on Windows 2008 R2.
Randomly, one of the media servers loses its connection (always the same one), and all the backups fail with the error "disk media server is not active". For example, we had this problem at the beginning of this month, then nothing until last weekend and yesterday.
I've read that McAfee could cause this issue (see this technote: https://support.symantec.com/en_US/article.TECH56658.html), but as I said the two media servers are identical, with the same version of McAfee, and the other one is OK. Moreover, the processes are in the exception list.
I've checked the logs with vxlogview going back 6 hours, and I saw this:
"25/08/2015 08:17:15.914 [DA_Thread_Pool::CheckForHeartbeat] Found machine < asaus112 > not sending heartbeat
25/08/2015 08:17:15.914 [Error] V-144-1049 EMMServer generic error = Machine < asaus112 > not sending heartbeat, changing to OFFLINE"
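To check whether the OFFLINE events really are random or cluster around certain times, the log output could be filtered with a short script. This is only a minimal Python sketch: the pattern is built from the two lines quoted above, the day/month/year date order is an assumption based on that sample, and the function name `offline_events` is made up for illustration.

```python
import re
from datetime import datetime

# Pattern derived from the quoted sample only; other log layouts are not handled.
OFFLINE = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d+ \[Error\] V-144-1049 .*"
    r"Machine < (\S+) > not sending heartbeat, changing to OFFLINE"
)

def offline_events(lines):
    """Return (timestamp, host) pairs for every 'changing to OFFLINE' error line."""
    events = []
    for line in lines:
        match = OFFLINE.search(line)
        if match:
            # Assumes dd/mm/yyyy, as in the sample (25/08/2015).
            when = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
            events.append((when, match.group(2)))
    return events

sample = [
    "25/08/2015 08:17:15.914 [DA_Thread_Pool::CheckForHeartbeat] Found machine < asaus112 > not sending heartbeat",
    "25/08/2015 08:17:15.914 [Error] V-144-1049 EMMServer generic error = Machine < asaus112 > not sending heartbeat, changing to OFFLINE",
]

for when, host in offline_events(sample):
    print(when.isoformat(), host)
```

Run over a longer log export, the resulting timestamps can be compared with the backup schedule and the daily scan time.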
Any idea what I can check?
Thanks!
08-25-2015 08:03 AM
What about suspending McAfee completely on the host that is not working, to either confirm it as the cause or show that the problem is something else?
Could it be the periodic full scan and not the on-access scans?
08-26-2015 12:54 AM
It's impossible to suspend McAfee on the hosts because of the security policies.
It's only on-access scan.
08-26-2015 01:07 AM
1) Most sites will configure McAfee to ignore IO intensive applications. On a master or media server, I would exclude "*:\Program Files*\Veritas\" (including sub-folders) from all scanning; write + read + backup.
2) Talk to your Network Team... do you have Cisco LAN switches? Have they left the default Cisco LAN switch setting of 'Close idle connections' enabled? NetBackup doesn't like this setting.
08-26-2015 06:37 AM
1/ McAfee has all the NBU processes listed in the Low Risk Processes tab. The scan runs every day at 20:00, and the problem is completely random. For example, last night everything was OK.
2/ The switch setting "Close idle connections" is disabled.
08-26-2015 06:47 AM
Are the media servers and/or master really busy with lots of TCP ports open? Like, over 16,000 ports open?
> netstat -an | find /v /c ""
08-26-2015 07:06 AM
The master server has 1635 ports, and the media servers have 307 ports.
08-26-2015 07:39 AM
What's the keep-alive setting on each server?
TCP keep-alive: reduce it from the default of 2 hours (i.e. 7,200 seconds, i.e. 7,200,000 ms). Some sites choose 5 mins, Symantec say 15 mins. The commands below show the current values and then set 5 mins (300000 ms; the default is 7200000):
> reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" | sort
> reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v "KeepAliveTime" /t REG_DWORD /d 300000
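KeepAliveTime is given in milliseconds, so it is easy to be off by a factor of 1000. The following minimal Python sketch just does the unit conversion for the three intervals mentioned above; the helper name `keepalive_ms` is hypothetical and not part of any NetBackup or Windows tool.

```python
def keepalive_ms(minutes):
    """Convert a keep-alive interval in minutes to milliseconds."""
    return int(minutes * 60 * 1000)

# Intervals mentioned in this thread: the 2-hour default, 5 mins and 15 mins.
for label, minutes in [("default (2 hours)", 120), ("5 mins", 5), ("15 mins", 15)]:
    print(f"{label}: /d {keepalive_ms(minutes)}")
```

The printed number is what would go after /d in the reg add command.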
08-26-2015 08:00 AM
There was no key for the keep-alive; I have added it on the master and the media servers.
10-17-2015 11:55 PM
Maybe try checking the following forum discussion:
https://www-secure.symantec.com/connect/forums/netbackup-76-antivirus-exclusion
10-18-2015 05:53 AM