Media Servers going offline
After upgrading all my media servers from 6.0MP4 to 7.1.0.2, I'm seeing some of my remote media servers going offline. When I check, the services are running, and a right-click and Activate brings them right back up again. Let me fill in some details.
The master and (8) media servers are all Linux, both RHEL 4 and 5. Yes, I know they are old. There are plans to replace them, but things around here move slower than syrup uphill in the Arctic winter. Of the 8, 3 have small tape libraries, and these seem to stay online. The other five only have basic disk storage units pointing to NFS-mounted DataDomain storage. These are the remote media servers, and they have the most trouble. It is not always the same one. When I log into the affected media server, it looks OK. The services are up and running.
I know the network infrastructure for these is not the best. From what I've been able to find out, we are only using 100 Mbps links from the master to these remote servers. No backups are coming across these links, just the metadata for the catalog and the master/media handshaking.
These are only being used with basic disk groups, so I don't have ltid running to do the heartbeat. I have not seen anything that talks about a media server with just vmd running as the MM process. I checked the emm logs for missing heartbeat messages and did not see any. I did see a bunch of "Heartbeat received from host abc123" messages. So I checked for each media server, and found that three of them are not providing any heartbeats, while two of them are. This is really strange.
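For anyone wanting to repeat that per-host check, here is a rough sketch of how it could be scripted. It only assumes the log lines contain the "Heartbeat received from host ..." text quoted above; the function names, the log file you feed it, and the host list are all hypothetical, and host names are compared exactly as written (no short-name versus FQDN matching).

```python
import re
from collections import Counter

# Matches the message text quoted in the post; the surrounding log layout is not assumed.
HEARTBEAT_RE = re.compile(r"Heartbeat received from host (\S+)")


def count_heartbeats(log_lines):
    """Count heartbeat messages per host name found in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            # Strip trailing punctuation or quotes that may follow the host name.
            host = match.group(1).rstrip('".,')
            counts[host] += 1
    return counts


def silent_hosts(log_lines, expected_hosts):
    """Return the expected hosts that never appear in a heartbeat message."""
    counts = count_heartbeats(log_lines)
    return sorted(host for host in expected_hosts if counts[host] == 0)


# Hypothetical usage: pass an open log file and your own media server names.
# with open("/path/to/emm/log") as fh:
#     print(silent_hosts(fh, ["media1", "media2", "media3"]))
```

A host showing up in the silent list only means no matching line was found in the log you scanned, so it is worth checking that the log covers a long enough time window before drawing conclusions.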
The backups are all working every night, so the media servers must be in some sort of communication with the master. I'm going to dig some more and see what I can find out. I will post any further info here as I discover it. If anyone has any clues or suggestions, please let me know.
On the master and media servers, it is useful to add the following touch file:
/usr/openv/netbackup/db/config/DPS_PROXYDEFAULTRECVTMO
with a value of 800 in it.
Try this and see if it helps.
If this does not resolve it, you can also add:
/usr/openv/netbackup/db/config/DPS_PROXYNOEXPIRE
but I would just try the first one before doing this.
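If you have several servers to touch, the first file could be created with something like the sketch below. The path, file name, and the value 800 come from the suggestion above; the function name and the `config_dir` parameter are just for illustration, and it assumes the file only needs to contain the number on a single line. It has to be run with enough privileges to write under /usr/openv.

```python
from pathlib import Path

# Directory and file name as given in the reply above.
CONFIG_DIR = Path("/usr/openv/netbackup/db/config")
RECV_TMO_FILE = "DPS_PROXYDEFAULTRECVTMO"


def write_recv_timeout(value=800, config_dir=CONFIG_DIR):
    """Write the touch file containing the given value and return its path."""
    path = Path(config_dir) / RECV_TMO_FILE
    # Assumption: the file holds just the number followed by a newline.
    path.write_text(f"{value}\n")
    return path


# Hypothetical usage on each master and media server:
# print(write_recv_timeout())
```

The second file (DPS_PROXYNOEXPIRE) is left out of the sketch on purpose, since the reply does not say what, if anything, should go in it.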