MSDP Compaction slow
What goes on behind the scenes when a crcontrol --compactstart is issued?
My complaint is that it is really slow and doesn't seem to make full use of the system's available resources. We hit a wall when our remote AIR target filled up and replication stopped. We did some cleanup, removed a bunch of dev/test images, and kicked off a compaction that is going painfully slowly.
In the compactd_node_0.log I see the following types of entries:
- Container ####### is prepared for compaction.
- Container ####### is referred by too many POs (127), dsicard it.
- release space 0 from container ########
- release space 61669520 from container #########
- 1350 PO references from 46 containers are merged into 615 PO references.
- Single container released(MB): 1000.53 Time (sec): 2966.17
- update_refdb_for_compaction 10 refdbs in total were updated for compaction
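To put numbers on how the run splits between "prepare" and "release space" entries, a small script can tally the entry types listed above. This is only a rough sketch: the regular expressions are guesses based on the sample lines in this post, the container IDs in the demo are made up, and the function name summarize_compaction_log is my own, not anything shipped with NetBackup.

```python
import re
from collections import Counter

# Patterns inferred from the sample entries quoted in this post (assumption:
# real log lines carry extra prefixes such as timestamps, so search() is used).
PREPARED = re.compile(r"Container \S+ is prepared for compaction")
DISCARDED = re.compile(r"Container \S+ is referred by too many POs \((\d+)\)")
RELEASED = re.compile(r"release space (\d+) from container \S+")
SUMMARY = re.compile(
    r"Single container released\(MB\): ([\d.]+) Time \(sec\): ([\d.]+)"
)


def summarize_compaction_log(lines):
    """Count entry types and total the space released, from an iterable of lines."""
    counts = Counter()
    released_bytes = 0
    summary_mb = 0.0
    summary_sec = 0.0
    for line in lines:
        if PREPARED.search(line):
            counts["prepared"] += 1
        elif DISCARDED.search(line):
            counts["discarded"] += 1
        elif (match := RELEASED.search(line)):
            size = int(match.group(1))
            # Separate containers that freed space from those that freed none.
            counts["released" if size else "released_nothing"] += 1
            released_bytes += size
        elif (match := SUMMARY.search(line)):
            summary_mb += float(match.group(1))
            summary_sec += float(match.group(2))
    return {
        "counts": dict(counts),
        "released_bytes": released_bytes,
        # MB per second over all "Single container released" summary lines.
        "mb_per_sec": summary_mb / summary_sec if summary_sec else None,
    }


if __name__ == "__main__":
    # Hypothetical container IDs; the message text mirrors the entries above.
    sample = [
        "Container 1001 is prepared for compaction.",
        "Container 1002 is referred by too many POs (127), dsicard it.",
        "release space 0 from container 1003",
        "release space 61669520 from container 1004",
        "Single container released(MB): 1000.53 Time (sec): 2966.17",
    ]
    print(summarize_compaction_log(sample))
```

To run it against a real log, pass an open file handle for compactd_node_0.log instead of the sample list. A high count of "prepared" and "released_nothing" entries against a low "mb_per_sec" would back up the impression that most of the time goes into walking containers rather than freeing space.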
I observe that when it's doing "prepare for compaction" cycles, it slowly walks the containers and does not tax the system.
When I see a string of "release space" entries in the log, I see a corresponding spike in disk I/O and memory utilization. However, for the majority of the time the process is "preparing" a container for compaction, and it seems it could be making better use of available system resources to get the job done faster. Does anyone know of anything, such as active replication jobs perhaps, that would cause a compaction to throttle itself? Is there a way to speed it up?
Thanks
Cycling the NetBackup services on the target host appears to have fixed the issue.
The only error I've found so far is a log entry inside the /<msdp>/log/spad/spad.log file.
March 11 20:31:05 WARNING [140351687415584]: 25062: NetWaitForRemote: getnameinfo(): Temporary failure in name resolution (err = -3)
I suspect this was caused by the network being down when the host last rebooted, since NBU is set to autostart. The network problem was fixed shortly after boot, and the host was verified to have name resolution and network connectivity, but NBU was not restarted. I'm guessing this is some sort of bug in NBU.
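For anyone who wants to rule out name resolution before restarting services, here is a quick check using Python's standard socket module. It is a sketch under my own assumptions: the function name check_name_resolution is made up, and it only exercises the OS resolver through getaddrinfo() and getnameinfo(); it says nothing about what spad does internally. On Linux with glibc, the "Temporary failure in name resolution" message and err = -3 correspond to EAI_AGAIN, which Python raises as socket.gaierror.

```python
import socket


def check_name_resolution(hostname):
    """Resolve hostname forward, then reverse-resolve each address found."""
    result = {"hostname": hostname, "forward": [], "reverse": {}, "errors": []}
    try:
        # Forward lookup; restricting to TCP avoids one duplicate per socket type.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        result["errors"].append(f"forward lookup of {hostname} failed: {exc}")
        return result

    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address in result["reverse"]:
            continue  # already checked this address
        result["forward"].append(address)
        try:
            # NI_NAMEREQD turns "no name found" into an error instead of
            # silently returning the numeric address.
            name, _service = socket.getnameinfo(sockaddr, socket.NI_NAMEREQD)
            result["reverse"][address] = name
        except socket.gaierror as exc:
            result["reverse"][address] = None
            result["errors"].append(f"reverse lookup of {address} failed: {exc}")
    return result
```

Calling check_name_resolution(socket.gethostname()) on the target host, and again with the name of each replication peer, gives a list of errors that should be empty once the network is healthy. If it is empty while spad.log still shows the getnameinfo() warning, that would fit the theory that the daemons kept a bad state from boot and needed the restart.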