Forum Discussion

backup-botw
9 years ago

Buffer Size Data for My Site

I am sure this has been answered 100 times over and I have read a bunch of different documents, forum posts, whitepapers and anything else I could find. In my environment we are having issues where w...
  • sdo
    9 years ago

    @backup-botw - in another recent post, user deeps posted a link to this:

    NetBackup Media Server Deduplication (MSDP) in the Cloud

    https://www.veritas.com/support/en_US/article.000004584

    ...which contains some useful information about the requirements for MSDP media servers, and is clearly still relevant whether or not the MSDP media server is situated in a cloud estate.

    How do the CPUs in your MSDP media servers compare with the minimum requirements? (For example, 8 cores at 2 GHz are better for MSDP than 2 cores at 8 GHz.)
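
    To make the CPU point concrete, here's a rough sketch in Python. It assumes, purely from the example above, a minimum of 8 cores at 2 GHz (check the linked doc for the real figures), and the function name is made up for illustration:

```python
# Hypothetical sketch: compare a media server's CPU against an ASSUMED MSDP
# minimum of 8 cores at 2 GHz (taken from the example in the post, not the doc).
MIN_CORES = 8    # assumed minimum core count
MIN_GHZ = 2.0    # assumed minimum per-core clock speed


def meets_msdp_cpu_minimum(cores: int, ghz: float,
                           min_cores: int = MIN_CORES,
                           min_ghz: float = MIN_GHZ) -> bool:
    """True only if BOTH the core count and the per-core clock meet the minimum.

    Total clock (cores * ghz) is deliberately not used: per the post, MSDP
    favours more cores over faster ones, so the two can't be traded off.
    """
    return cores >= min_cores and ghz >= min_ghz


for cores, ghz in [(8, 2.0), (2, 8.0)]:
    verdict = "meets" if meets_msdp_cpu_minimum(cores, ghz) else "misses"
    print(f"{cores} cores @ {ghz} GHz ({cores * ghz:.0f} GHz total): {verdict} minimum")
```

    Both configurations add up to 16 GHz in total, but only the 8-core box passes, which is the point: aggregate clock speed isn't the thing to compare.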

    Is your VNX array capable of delivering a sustained minimum of 3000 IOPS for MSDP? The linked doc doesn't say whether the 3000 IOPS figure covers MSDP backup ingest alone, or whether 3000 IOPS is enough to sustain bulk MSDP ingest and bulk MSDP re-hydration together. It seems unlikely that anyone would be using tape in the cloud, so to me 3000 IOPS looks like a minimum for MSDP ingest only, and adding an MSDP re-hydration workload would therefore demand a higher level of sustained IOPS. Then again, Symantec/Veritas could be playing a very conservative game by quoting quite a high minimum.
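
    A back-of-the-envelope way to check this is to add up every MSDP workload that runs at the same time and compare the total with what the array can actually sustain. In this sketch only the 3000 IOPS ingest figure comes from the linked doc; the re-hydration load and the array's capability are hypothetical placeholders you'd replace with your own measurements:

```python
# Hypothetical sketch: does the array's sustained IOPS cover every MSDP
# workload running at the same time? Only the 3000 IOPS ingest figure comes
# from the linked doc; the other numbers are invented placeholders.
from typing import Dict


def iops_headroom(array_sustained_iops: int, workloads: Dict[str, int]) -> int:
    """Spare IOPS after all concurrent workloads (negative means a shortfall)."""
    return array_sustained_iops - sum(workloads.values())


concurrent = {
    "msdp_ingest": 3000,       # minimum quoted in the linked doc
    "msdp_rehydration": 1500,  # HYPOTHETICAL: measure your own duplication load
}
array_iops = 3000  # HYPOTHETICAL: an array that only just meets the ingest figure

headroom = iops_headroom(array_iops, concurrent)
print(f"needed {sum(concurrent.values())} IOPS, headroom {headroom}")
```

    With those made-up numbers it reports a shortfall of 1500 IOPS, i.e. an array that only just meets the ingest minimum would fall behind as soon as re-hydration for tape duplication kicks in.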

    .

    As a separate point, here's something I've noticed when monitoring the disk underneath MSDP: the disk IO queue depth never gets very high. It's as if the IO layer at the very bottom of MSDP is sensitive to response time, latency and queue depth, and actively avoids swamping the disk driver/interface/HBA layer with lots of outstanding, incomplete, queued disk IO. I see MSDP make disks 100% busy, yet the queue depth never gets very high; to me, the software is deliberately limiting how many disk IOs it has in flight.

    That means you could have a situation where the disks don't look overly strained and appear to respond at fairly reasonable levels, but because the VNX IO is not actually that responsive (from your graphs, around 10 ms rather than sub-3 ms), the CPU won't get swamped either: it spends most of its time waiting for the few disk IOs that have been queued to complete. So the issue won't look like a disk issue, because you can't see oodles of disk IO queuing up or being delivered late. It's as if MSDP is actively trying to avoid tripping itself up. The visible net effect is a system that doesn't look overstrained and doesn't look like it has a problem, yet isn't doing very much, while being capable of so much more if only the disk subsystem were more responsive (<3 ms).
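
    One way to see why a short queue plus ~10 ms latency caps throughput is Little's Law applied to disk IO: sustained IOPS is roughly the number of IOs in flight divided by the response time. This is my own framing, not anything documented about MSDP internals, and the queue depth of 8 is a hypothetical figure:

```python
# Sketch using Little's Law (IOs in flight = IOPS x response time) to show why
# a low, capped queue depth plus ~10 ms latency limits throughput even when
# the disks look "100% busy". The queue depth of 8 is HYPOTHETICAL.
import math


def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Upper bound on IOPS when at most `queue_depth` IOs are ever in flight."""
    return queue_depth * 1000 / latency_ms


def queue_depth_needed(target_iops: float, latency_ms: float) -> int:
    """IOs that must be in flight to sustain `target_iops` at a given latency."""
    return math.ceil(target_iops * latency_ms / 1000)


QUEUE_DEPTH = 8  # hypothetical cap; the post only says it "never gets very high"
for latency_ms in (10.0, 3.0):
    print(f"{latency_ms:>4} ms: at most {max_iops(QUEUE_DEPTH, latency_ms):.0f} IOPS "
          f"with {QUEUE_DEPTH} in flight; "
          f"{queue_depth_needed(3000, latency_ms)} in flight needed for 3000 IOPS")
```

    With 8 IOs in flight, 10 ms latency caps you at about 800 IOPS, whereas 3 ms allows about 2667. Put the other way round, sustaining 3000 IOPS needs roughly 30 IOs in flight at 10 ms but only 9 at 3 ms, so if MSDP keeps its queue short, latency ends up setting the ceiling.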

    .

    Another thing to look for: are the media servers re-hydrating and duplicating/sending data between themselves? A quick way to check is to look for 'media-server-A -> media-server-B' in the 'Media Server' column of the Activity Monitor for the duplication jobs. If you see '->', then the re-hydrated data is being sent, full fat, across the LAN from one media server (the MSDP source) to another (the tape writer). This could slow things down horribly, and could potentially swamp your LAN NICs and/or LAN switches, which are at the same time trying to move oodles of incoming backup data from clients.
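
    If you export the Activity Monitor job list, a quick script can do the '->' check for you. This is only a sketch: the record layout, field names and job IDs are assumptions, not a real NetBackup export format, so you'd need to adapt the parsing to whatever your export actually looks like:

```python
# Hypothetical sketch: flag duplication jobs whose 'Media Server' value shows
# "source -> destination", i.e. re-hydrated data crossing the LAN. The job
# records below are invented examples, not a real NetBackup export format.
from typing import Dict, List


def cross_server_duplications(jobs: List[Dict[str, str]]) -> List[str]:
    """Return IDs of duplication jobs that move data between media servers."""
    return [
        job["job_id"]
        for job in jobs
        if job["type"] == "Duplication" and "->" in job["media_server"]
    ]


jobs = [
    {"job_id": "101", "type": "Backup", "media_server": "media-server-A"},
    {"job_id": "102", "type": "Duplication",
     "media_server": "media-server-A -> media-server-B"},
    {"job_id": "103", "type": "Duplication", "media_server": "media-server-B"},
]

for job_id in cross_server_duplications(jobs):
    print(f"job {job_id}: re-hydrated data is crossing the LAN")
```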