Forum Discussion

RonCaplinger
12 years ago

MSDP status 2074, but volume is not down

Does anyone here have any experience yet using Cisco UCS blades as NBU media servers?

We are migrating our existing media servers (Oracle x86 and Solaris SPARC hardware that connects to Data Domains over NFS) to Cisco UCS blades running Red Hat Linux with Fibre Channel-attached storage, using NBU's Media Server Deduplication Pools.  We are running NBU 7.5.0.5 on all servers, and most clients are running 7.5.0.4.

We have been experiencing sporadic status 2074 errors during heavy backup loads on one MSDP.  But the MSDP is not actually down, and the jobs auto-retry 1-2 hours later and are successful.  I am not manually bringing the MSDP back up, and I can't find any indication that anything else is cycling the services or rebooting the server.

I cannot find a storaged.log or spoold.log on the server. I just uncommented the line in /usr/openv/lib/ost-plugins/pd.conf that enables PureDisk logging and cycled NBU on that server a little while ago, so I don't have any logs from that yet.

If you have worked with this combination of hardware/OS/NBU, do you have any advice on UCS tuning or settings?  We're also having other issues with TCP tuning (socket reads/writes failing).

2 Replies

  • Hello 

    The DPS (disk polling service) tries to get a status from the disk pool every 60 seconds, and if it takes longer than 60 seconds to get a response it marks the disk as down. As soon as a few jobs die and the load on the server drops, DPS is able to connect and get a response in time, and backups work correctly again. Usually the disks only show as down for about 2 or 3 minutes.

    This is quite common when a lot of backups are running at once, because the network interface on the server is under load and responses are delayed. The media server may also be taking longer to respond because the disk subsystem is under load.

    Check out this technote, in particular the solution at the bottom for modifying the DPS timeouts.

    http://www.symantec.com/docs/TECH156743

    This will give DPS polling a longer time to respond before marking the disk pool as down. We have also raised the default timeouts in more recent releases of NetBackup for just this reason.

    If you have any further questions, please ask.

    Thanks
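
To illustrate the polling behaviour described in the reply, here is a minimal Python sketch. It is not NetBackup code: `poll_states`, `samples`, and the 120-second value are hypothetical and chosen only for illustration. The only figure taken from the reply is the 60-second limit. The sketch shows why a pool can be reported as down for a few polls under load and then recover on its own, and why a longer timeout hides those brief slowdowns.

```python
from typing import List


def poll_states(response_secs: List[float], timeout_secs: float = 60.0) -> List[str]:
    """Return 'UP' or 'DOWN' for each poll, given how long each status reply took.

    A poll counts as DOWN only when the reply took longer than the timeout,
    mirroring the behaviour described in the reply (an assumption, not the
    actual DPS implementation).
    """
    return ["UP" if t <= timeout_secs else "DOWN" for t in response_secs]


# Hypothetical reply times in seconds: load builds up, a few jobs fail,
# load drops, and replies become fast again.
samples = [5, 20, 75, 90, 70, 30, 8]

default_view = poll_states(samples)                     # 60-second limit
relaxed_view = poll_states(samples, timeout_secs=120)   # arbitrary longer limit

print(default_view)
print(relaxed_view)
```

With the 60-second limit, the three slow polls in the middle are reported as down even though the storage never went away; with the longer limit, the same reply times never trip the check.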