Troubleshooting MSDP with Disk Pool regularly going down
Master is the Storage Server: NetBackup 8.0 with Windows 2008 R2 SP1
Disk used for MSDP disk pool: External storage connected to the Windows server through a dedicated HBA. Uses a SAN switch and Microsoft MPIO.
Our PureDisk Disk Pool goes down a couple of times a month during backup windows, after weeks of uptime and normal performance. This causes the NetBackup backup jobs directed to the associated MSDP storage unit to fail with status 2074. The Disk Pool refuses to come back up even after restarting the NB services, but a Windows reboot brings everything back online. The problem started a year and a half ago, before the upgrade to NetBackup 8.0.
Since I haven't been able to catch the incident as it happens, I tried replicating it (with NB services down) by disconnecting the external storage. That produced logs in the FC switch and MPIO errors in Windows Event Viewer, neither of which appeared on the earlier occasions when NetBackup marked the disk pool down. The storage itself reports zero errors and is unaware of any problem. My theory, therefore, is that the disk has always been online and something is happening in the software.
So far, the best NetBackup logs I have are from Disk Reports \ Disk Logs, where the following lines appear right before all the backup jobs start failing with status 2074:
Volume <Disk Pool>:PureDiskVolume monitored by <Storage Server> is down
Volume <Storage Server>:<Disk Pool>:PureDiskVolume marked down
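To line these events up against Windows Event Viewer or FC switch logs, it may help to pull just the "down" lines out of an exported copy of the Disk Logs report. The following is only a minimal sketch: it assumes the report has been saved as a plain text file, and the function name and file name are hypothetical, not part of NetBackup.

```python
import re
import sys
from typing import Iterable, List

# Phrases taken from the two Disk Logs lines quoted above.
DOWN_PATTERN = re.compile(r"PureDiskVolume.*\b(is down|marked down)\b")


def find_down_events(lines: Iterable[str]) -> List[str]:
    """Return the lines that report the PureDisk volume going down."""
    return [line.rstrip("\n") for line in lines if DOWN_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_down.py disk_logs_export.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for event in find_down_events(fh):
            print(event)
```

If the exported lines carry timestamps, the matching lines give the times to search for in the other logs.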
I tried looking in <MSDP-path>\log\spoold\spoold.log but couldn't find the reason why the Disk Pool was marked down in the first place. Which logs should I be looking at, and how should I configure their verbosity if required?
Can you confirm that this master/media server has sufficient memory for the Master server load plus the Media server processes plus Dedupe (1 to 1.5 GB of memory per TB), as well as the OS requirements?
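As a rough way to put a number on the Dedupe part of that question, the rule of thumb can be turned into a quick calculation. This sketch assumes the ratio is read as 1 to 1.5 gigabytes of RAM per terabyte of pool capacity; the function name and the 20 TB pool size are made up for illustration, and the result covers the dedupe share only, not the Master, Media, or OS requirements.

```python
from typing import Tuple


def dedupe_ram_range_gb(
    pool_tb: float,
    low_gb_per_tb: float = 1.0,   # assumed lower bound of the rule of thumb
    high_gb_per_tb: float = 1.5,  # assumed upper bound of the rule of thumb
) -> Tuple[float, float]:
    """Return the (low, high) RAM estimate in GB for the dedupe pool alone."""
    return pool_tb * low_gb_per_tb, pool_tb * high_gb_per_tb


# Illustrative pool size only; substitute the real MSDP pool capacity.
low, high = dedupe_ram_range_gb(20)
print(f"Dedupe alone: {low:.0f} to {high:.0f} GB of RAM")
```

Whatever the server has installed needs to cover this range on top of everything else running on the box.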
Have you checked the TechNote (TN) with the requirements for MSDP on Windows?