03-27-2012 09:58 AM
03-28-2012 12:27 PM
I think I saw something similar back in the early days of the 5000 appliances but thought the issue died out in v1.3. If you look in the "Disk Logs" report and see messages about the disk pool/volume going down and then coming back up 2-5 mins (or so) later, tell your support guy to look up the internal doc with doc ID 316680 (or have him contact me).
Long story short, IF this is the problem, NBU is pestering the appliance for an "are you alive" message every 60 seconds, and the appliance may be too busy to respond that frequently. We can adjust the timeout by creating the following touch file:
Configuration file: <InstallPath>\VERITAS\NetBackup\db\config\DPS_PROXYDEFAULTRECVTMO
You'd want to put a value in the touch file that is larger than the time you're seeing the disk pool as offline in your disk logs. I don't think I've ever seen one go longer than 5 mins, so if you put "600" (for 600 secs) in your touch file you should be more than good. The trade-off is that NBU may not see the storage pool as down for up to 10 mins after it's actually offline. Your backups will stop writing during that time, but it won't cause any data consistency problems. NBU just won't see the disk as down for 10 minutes instead of 1 minute, and your false positives on the pool being down should also go to 0.
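As a rough sketch of how you might pick and write that value from a shell: CONFIG_DIR, LONGEST_OUTAGE and the "double it" rule are my own assumptions for illustration, not anything from support. CONFIG_DIR defaults to a scratch directory here so nothing real gets touched; point it at the db/config directory shown above when you do it for real.

```shell
#!/bin/sh
# Hypothetical helper: write a receive timeout into the touch file.
# CONFIG_DIR is an assumption; it defaults to a temp directory for a dry run.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"
TOUCH_FILE="$CONFIG_DIR/DPS_PROXYDEFAULTRECVTMO"

# Longest offline period seen in the Disk Logs report, in seconds (5 minutes).
LONGEST_OUTAGE=300

# Assumed rule of thumb: double the longest outage for headroom (gives 600).
TIMEOUT=$((LONGEST_OUTAGE * 2))

# The touch file holds just the number of seconds.
printf '%s\n' "$TIMEOUT" > "$TOUCH_FILE"
echo "wrote $TIMEOUT to $TOUCH_FILE"
```

If your disk logs show longer outages than 5 minutes, change LONGEST_OUTAGE to match before writing the file.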
Hope this helps!
03-29-2012 02:02 AM
03-29-2012 02:41 AM
I have had similar issues on a customer's site with a 5200 unit (so slightly different).
It was resolved by three things, and it has not had an issue in the months since:
1. The DPS_PROXYDEFAULTRECVTMO touch file set to a value of 800, with the other two removed. This needs a full service restart to take effect (I usually reboot)
2. The SIZE and NUMBER DATA_BUFFERS files removed
3. The keepalive settings changed:
# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
These need to be made persistent, though.
That can be done with an addition such as the following to /etc/sysctl.conf:
## Keepalive at 8.5 minutes
# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time=510
# close connection after 3 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes=3
# wait 3 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl=3
The echo changes don't need a restart to take effect. For the /etc/sysctl.conf entries to be applied at boot, then run:
chkconfig boot.sysctl on
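To give a feel for what those three numbers add up to, here is a small sketch of the arithmetic. It assumes the usual approximation that a dead idle connection is dropped after the idle time plus one interval per unanswered probe; keepalive_total is just a name I made up for the example.

```shell
#!/bin/sh
# Approximate seconds before the kernel gives up on a dead idle connection:
# idle time before the first probe + (number of probes * seconds between probes).
keepalive_total() {
    # $1 = tcp_keepalive_time, $2 = tcp_keepalive_probes, $3 = tcp_keepalive_intvl
    echo $(( $1 + $2 * $3 ))
}

TUNED=$(keepalive_total 510 3 3)       # values from the settings above
DEFAULT=$(keepalive_total 7200 9 75)   # kernel defaults quoted in the comments

echo "tuned:   ${TUNED}s"
echo "default: ${DEFAULT}s"
```

With the tuned values a dead connection is noticed in roughly 519 seconds, against roughly 7875 seconds with the defaults.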
Hope this helps
04-03-2012 03:53 AM
Hi Mark, thank you for your reply. After applying the parameters with the values specified by Symantec Support, I have experienced NO MORE stops (even if we still have some quite painful performance problems), so I still haven't applied your values.
Once our environment is up and running with no more performance problems, I think I'll try applying them.
regards,
Alberto