Forum Discussion

rrsiemers
Level 2
9 years ago

MSDP Compaction slow

What goes on behind the scenes when a crcontrol --compactstart is issued?

My complaint is that it is really slow and doesn't seem to make full use of the system's available resources. We hit a wall when our remote AIR target filled up and replication stopped; we did some cleanup, removed a bunch of dev/test images, and kicked off a compaction that is going painfully slow.
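For context, compaction is driven by the crcontrol utility. A minimal sketch of checking its state, assuming the standard Linux MSDP install path; the path and option names should be verified against your version's documentation:

```shell
# Hedged sketch, not a definitive reference: the crcontrol binary normally
# lives under the MSDP install tree; this path is an assumption for a
# standard Linux media server.
CRBIN=/usr/openv/pdde/pdcr/bin/crcontrol
if [ -x "$CRBIN" ]; then
    "$CRBIN" --dsstat        # data store usage summary
    "$CRBIN" --compactstate  # whether compaction is currently running
else
    echo "crcontrol not found at $CRBIN"
fi
```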

In the compactd_node_0.log I see the following types of entries:

  • Container ####### is prepared for compaction.
  • Container ####### is referred by too many POs (127), dsicard it.
  • release space 0 from container ########
  • release space 61669520 from container #########
  • 1350 PO references from 46 containers are merged into 615 PO references.
  • Single container released(MB): 1000.53          Time (sec): 2966.17
  • update_refdb_for_compaction 10 refdbs in total were updated for compaction
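Those "release space" lines can be summed to gauge how much compaction has reclaimed so far. A quick sketch; the heredoc stands in for the real log, and the actual path (something like /&lt;msdp&gt;/log/compactd/compactd_node_0.log) is an assumption to adjust per install:

```shell
# Sum the byte counts from "release space N from container X" lines.
# Point LOG at your real compactd_node_0.log; the sample below mirrors
# the log format quoted above.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
release space 0 from container 12345678
release space 61669520 from container 23456789
EOF
awk '/^release space/ { total += $3 }
     END { printf "reclaimed so far: %.2f MB\n", total / 1048576 }' "$LOG"
```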

I observe that when it's doing the prepare-for-compaction cycles, it is slowly walking the containers and not taxing the system.

When I do see a string of "release space" cycles in the log, I see a corresponding spike in disk I/O and memory utilization. However, for the majority of the time the process is "preparing" a container for compaction, and it seems it could be making better use of the available system resources to get the job done faster. Does anyone know what, if anything (active replication jobs, perhaps?), would trigger compaction to throttle itself down? Is there a way to speed it up?

 

Thanks

 


4 Replies

  • You didn't mention what version of NetBackup / Appliance you're running?

    Looks like you have multiple partitions (compactd_node_0.log), so I guess it's 7.6.1 (or 2.6.1)?  Was it only slow on this partition or on all of them?

    I would suggest going to 7.7.1 (or 2.7.1) or above, or logging a support case.
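If it helps, one hedged way to confirm the installed version on a Linux media server; the path and file format are assumptions, and appliances may differ:

```shell
# Assumption: on Linux installs the release string is usually recorded in a
# version file under the NetBackup install tree; verify for your environment.
VERSION_FILE=/usr/openv/netbackup/version
[ -r "$VERSION_FILE" ] && cat "$VERSION_FILE" || echo "version file not found"
```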

  • Correct: 7.6.1.2 MSDP servers running under Red Hat as the targets; the source is an Appliance on 2.6.1.2.

    I am not familiar with partitions; I tried to Google it to learn more, but came up empty-handed. I would presume we have only one partition, and node_0 must be the default. My AIR replication is slow, and my compaction is slow.

    I hit a gold mine of recovered space earlier today: I got back about 10 TB in the span of an hour (this has been compacting for 3 full days now, and it's still going). I have plenty of space now, so I enabled another SLP policy. The new replications appear to be trickling data over: the WAN pipe is at less than 20% utilization, the target disk is underutilized, and %memused has been hovering at 80% (144 GB total for a 54 TB MSDP pool). I am having trouble identifying the system bottleneck.
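A rough sketch for chasing that bottleneck from the shell, assuming a Linux host with /proc and, optionally, the sysstat tools; nothing here is NetBackup-specific:

```shell
# Reproduce the %memused figure from /proc/meminfo (Linux-only assumption;
# MemAvailable requires a reasonably modern kernel).
awk '/^MemTotal/ {t=$2} /^MemAvailable/ {a=$2}
     END {printf "%%memused: %.1f\n", (t-a)*100/t}' /proc/meminfo

# If the sysstat package is installed, watch disk and NIC utilization while
# replication runs; a device pegged near 100 %util is the likely bottleneck.
command -v iostat >/dev/null && iostat -xz 1 2 || true
command -v sar    >/dev/null && sar -n DEV 1 2 || true
```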

  • Cycling netbackup services on the target host appears to have fixed the issue. 

    The only error I've found so far is a log entry inside the /<msdp>/log/spad/spad.log file.

    March 11 20:31:05 WARNING [140351687415584]: 25062: NetWaitForRemote: getnameinfo(): Temporary failure in name resolution (err = -3)

     

    I suspect this was caused by the network being down when the host last rebooted, since NBU is set to autostart. The network problem was fixed shortly after boot-up, and the host was verified to have name resolution and network connectivity, but NBU was not restarted. I'm guessing this is some sort of bug in NBU.
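The recovery described above could be sketched roughly as follows; the peer host name is hypothetical, and the bp.kill_all / bp.start_all paths assume a standard Linux NBU install (verify for your version):

```shell
# Sketch with assumptions: TARGET_HOST is a hypothetical replication peer,
# and the commented-out restart scripts assume a standard Linux install.
TARGET_HOST=${TARGET_HOST:-replication-peer.example.com}
if getent hosts "$TARGET_HOST" >/dev/null; then
    echo "name resolution OK for $TARGET_HOST; safe to cycle NetBackup services"
    # /usr/openv/netbackup/bin/bp.kill_all
    # /usr/openv/netbackup/bin/bp.start_all
else
    echo "name resolution still failing for $TARGET_HOST; fix DNS before restarting"
fi
```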

  • Oh.. good to know and thanks for sharing the update.

    My bad about the multiple partitions; I just found out that compactd_node_0.log is always there regardless of how many partitions you have, as the primary partition is always node 0.