Does anyone here have any experience yet using Cisco UCS blades as NBU media servers?
We are migrating our existing Oracle x86 and Solaris SPARC media servers, which write to Data Domain appliances over NFS, to Cisco UCS blades running Red Hat Linux with Fibre Channel-attached storage, using NBU Media Server Deduplication Pools (MSDP). We are running NBU 220.127.116.11 on all servers, and most clients are running 18.104.22.168.
We have been seeing sporadic status 2074 errors (disk volume is down) during heavy backup loads on one MSDP. But the MSDP is not actually down, and the jobs automatically retry 1-2 hours later and succeed. I am not manually setting the MSDP back up, and I can't find any indication that anything else is cycling the services or rebooting the server.
I don't see a storaged.log or spoold.log on the server. A little while ago I uncommented the line in /usr/openv/lib/ost-plugins/pd.conf that enables PureDisk logging and cycled NBU on that server, so I don't have any logs from that yet.
If you have worked with this combination of hardware/OS/NBU, do you have any advice on UCS tuning or settings? We're also having other issues with TCP tuning (socket reads/writes failing).
The DPS (disk polling service) tries to get a status from the disk pool every 60 seconds, and if it takes longer than 60 seconds to get a response, it marks the disk as down. As soon as a few jobs fail and the load on the server drops, DPS is able to connect and get a response quickly enough, and backups work correctly again. Usually the disks show as down for only about 2 or 3 minutes.
This is quite common when a lot of backups are running at once, because the heavily loaded network interface on the server delays the response. The media server may also be slow to respond because its disk subsystem is under load.
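To make the timing concrete, here is a minimal sketch that models the polling behaviour described above. It is a toy model, not NetBackup code: the response times are made-up illustrative values, and the 300-second "relaxed" timeout is a hypothetical number chosen only to show the effect of a longer timeout, not a recommended setting.

```python
# Toy model of the disk polling behaviour described above.
# NOT NetBackup code; all response times and the 300 s timeout are
# hypothetical values for illustration only.

POLL_INTERVAL_S = 60  # one poll per minute, per the description above


def poll_states(response_times_s, timeout_s=60.0):
    """Map each poll's response time to the state the poller would record.

    A response slower than timeout_s counts as a failed poll, so the disk
    pool is marked DOWN; any timely response marks it UP again.
    """
    return ["DOWN" if t > timeout_s else "UP" for t in response_times_s]


def minutes_down(states):
    """Approximate time the pool is reported down: one poll interval per DOWN poll."""
    return states.count("DOWN") * POLL_INTERVAL_S / 60


if __name__ == "__main__":
    # Hypothetical response times (seconds) during a backup peak: the server
    # slows down, some jobs fail, load drops, and the polls recover.
    # (The model does not simulate that feedback; the values just reflect it.)
    peak = [5, 20, 75, 90, 80, 30, 8]
    default = poll_states(peak)                 # 60 s timeout
    relaxed = poll_states(peak, timeout_s=300)  # hypothetical longer timeout
    print(default, minutes_down(default))
    print(relaxed, minutes_down(relaxed))
```

With the default 60-second timeout, the three slow polls mark the pool down for about 3 minutes, matching the 2-3 minute window described above. With the longer timeout, the same slow responses never cross the threshold.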
Check out this technote, and see the solution at the bottom for modifying the DPS timeouts.
This gives DPS polling longer to get a response before it marks the disk pool as down. We have also raised these defaults in more recent NetBackup releases for just this reason.
If you have any further questions, please ask.
This is a good indication that you are pushing the MSDP server too hard. Check the Limit I/O streams setting on the disk pool and set it to an appropriate value. Depending on the speed of your disk, you can start at 50 and increase it slowly until you find the breaking point.
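The "start at 50 and slowly increase" approach can be sketched as a simple step-up search. This is only an outline under assumptions: is_healthy is a hypothetical check you would supply yourself (for example, "did a full backup window at this limit finish without status 2074 errors?"), and the step size of 10 and ceiling of 200 are arbitrary illustrative values. In practice each step is a real backup window, so the search runs over days, not seconds.

```python
from typing import Callable


def find_stream_limit(is_healthy: Callable[[int], bool],
                      start: int = 50, step: int = 10, ceiling: int = 200) -> int:
    """Raise the stream limit step by step and return the last healthy value.

    is_healthy(limit) is a hypothetical, user-supplied check for one test
    window at that limit. step and ceiling are illustrative defaults.
    Returns 0 if even the starting value is unhealthy.
    """
    last_good = 0
    limit = start
    while limit <= ceiling:
        if not is_healthy(limit):
            break  # breaking point found; keep the previous value
        last_good = limit
        limit += step
    return last_good


if __name__ == "__main__":
    # Simulated server that starts failing polls at 85 or more streams.
    print(find_stream_limit(lambda n: n < 85))
```

Backing off to the last value that stayed healthy, rather than stopping exactly at the failure, leaves some headroom for load spikes.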