Forum Discussion

RonCaplinger
12 years ago

MSDP status 2074, but volume is not down

(Reposting here, didn't mean for this to be posted under PureDisk...) Does anyone here have any experience yet using Cisco UCS blades as NBU media servers? We are migrating our existing...
  • Brook_Humphrey
    12 years ago

    This was written to handle things exactly like this:

    http://www.symantec.com/docs/TECH156743

    When these errors are intermittent, the reason is that the Disk Polling Service (DPS) checks the status of the disk and then marks it up or down depending on whether or not it gets a response. The default timeout is 60 seconds. So when the disk is under load, the network interface is overwhelmed, or the system is experiencing any other performance issue that delays the response to DPS, the disk gets marked as down (status 2074, although other status codes can be reported as well).
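    To illustrate the polling decision, here is a minimal sketch in Python. It is a hypothetical model, not NetBackup code: the function name `poll_status` and the sample latencies are assumptions for illustration. Only the 60-second default timeout comes from the post. A poll that is answered after the timeout is treated as "DOWN", which is the situation that surfaces as status 2074 here.

```python
# Hypothetical model of the DPS up/down decision; not NetBackup code.
# Names and latency values are illustrative assumptions.

DEFAULT_TIMEOUT_S = 60  # default DPS response timeout mentioned in the post


def poll_status(response_latency_s: float, timeout_s: float = DEFAULT_TIMEOUT_S) -> str:
    """Return 'UP' if the disk answered within the timeout, else 'DOWN'."""
    return "UP" if response_latency_s <= timeout_s else "DOWN"


# Simulated response latencies (seconds) for successive polls; the spikes
# stand in for periods when the disk or network is saturated.
latencies = [5, 8, 75, 90, 12, 6]
statuses = [poll_status(t) for t in latencies]
print(statuses)  # ['UP', 'UP', 'DOWN', 'DOWN', 'UP', 'UP']
```

    Nothing is wrong with the disk itself during the "DOWN" polls in this model; the response simply arrived too late.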

    If this is due to performance issues, you can easily tell, because the device will come back up within a few minutes and backups will start running again. That makes sense: as soon as all the backups fail, performance returns to normal, the server responds to DPS within 60 seconds again, and NetBackup marks the device as up again.
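    The down/up cycle described here is a feedback loop, which a small deterministic simulation can make concrete. This is a sketch under assumed numbers: the idle latency, the extra latency per concurrent backup, and the number of jobs started per polling interval are all hypothetical, and the names are illustrative rather than anything in NetBackup. Each tick, latency grows with the number of running backups; a "DOWN" poll fails every running job, which drops the load so the next poll succeeds.

```python
# Hypothetical simulation of the load -> DOWN -> jobs fail -> UP cycle.
# All constants are illustrative assumptions, not NetBackup values.

TIMEOUT_S = 60          # default DPS timeout mentioned in the post
BASE_LATENCY_S = 5      # response latency with an idle disk (assumed)
PER_JOB_LATENCY_S = 15  # extra latency added by each running backup (assumed)
JOBS_PER_TICK = 2       # backups the scheduler starts per interval while UP (assumed)


def simulate(ticks: int) -> list[tuple[int, int, str]]:
    """Return (active_jobs, latency_s, status) for each polling interval."""
    active_jobs = 0
    history = []
    for _ in range(ticks):
        latency = BASE_LATENCY_S + PER_JOB_LATENCY_S * active_jobs
        status = "UP" if latency <= TIMEOUT_S else "DOWN"
        history.append((active_jobs, latency, status))
        if status == "DOWN":
            active_jobs = 0  # running backups fail, so the load disappears
        else:
            active_jobs += JOBS_PER_TICK  # more backups start on a healthy disk
    return history


for jobs, latency, status in simulate(6):
    print(jobs, latency, status)
# 0 5 UP
# 2 35 UP
# 4 65 DOWN
# 0 5 UP
# 2 35 UP
# 4 65 DOWN
```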

    I'm not going to get into the infrastructure issues that can cause this, but anything with a shared architecture, such as blade servers or VMs, is going to be more susceptible to this type of behavior unless it has dedicated resources (dedicated bandwidth, network interfaces, raw mapped devices, etc.). Also, as noted, the manufacturer of the network interface you are using can make a huge difference, and it has been known for some time that some cards require you to turn off offloading or other features to get them to perform as expected.

    This is why increasing the DPS timeouts works so well: when the server is under load and not responding as quickly as you would like, a longer timeout gives it more time to respond before it is marked as down.
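    A quick way to see the effect of a longer timeout is to replay the same set of response latencies against several timeout values and count how often the disk would be marked down. This is a hypothetical sketch: the latencies are made up, and 180 and 600 seconds are illustrative values only, not recommended settings; the actual DPS timeout configuration is described in the technote above.

```python
# Hypothetical comparison of DPS timeout values over the same latencies.
# Latencies and the 180 s / 600 s timeouts are illustrative assumptions.

def count_down_marks(latencies_s: list[float], timeout_s: float) -> int:
    """Count polls whose response arrived after the timeout (marked DOWN)."""
    return sum(1 for t in latencies_s if t > timeout_s)


# Response latencies (seconds) observed while the server is under heavy load.
latencies = [10, 45, 75, 120, 300, 20, 90]

results = {timeout: count_down_marks(latencies, timeout) for timeout in (60, 180, 600)}
print(results)  # {60: 4, 180: 1, 600: 0}
```

    The trade-off is detection time: with a longer timeout, a disk that has really failed also takes longer to be marked down.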

    The technote above tries to cover a lot of territory in a small amount of space, but the solutions at the bottom cover many, if not all, of the reasons you may see this. I need to update it, as it is a little out of date: we now allow iSCSI for MSDP as long as it runs over a dedicated 10 Gb iSCSI network. I also need to add 2074 to the list of status codes that can be reported in this situation.

    Thanks 

    Let me know if you have further questions.