Forum Discussion

rrsiemers
Level 2
9 years ago

MSDP Compaction slow

What goes on behind the scenes when a crcontrol --compactstart is issued?

My complaint is that it is really slow and doesn't seem to make full use of the system's available resources. We hit a wall when our remote AIR target filled up and replication stopped; we did some cleanup, removed a bunch of dev/test images, and kicked off a compaction that is going painfully slow.
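For context, compaction is driven by the crcontrol utility. A minimal sketch of checking its state, assuming the standard Linux MSDP install path; the path and option names should be verified against your version's documentation:

```shell
# Hedged sketch, not a definitive reference: the crcontrol binary normally
# lives under the MSDP install tree; this path is an assumption for a
# standard Linux media server.
CRBIN=/usr/openv/pdde/pdcr/bin/crcontrol
if [ -x "$CRBIN" ]; then
    "$CRBIN" --dsstat        # data store usage summary
    "$CRBIN" --compactstate  # whether compaction is currently running
else
    echo "crcontrol not found at $CRBIN"
fi
```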

In the compactd_node_0.log I see the following types of entries:

  • Container ####### is prepared for compaction.
  • Container ####### is referred by too many POs (127), dsicard it.
  • release space 0 from container ########
  • release space 61669520 from container #########
  • 1350 PO references from 46 containers are merged into 615 PO references.
  • Single container released(MB): 1000.53          Time (sec): 2966.17
  • update_refdb_for_compaction 10 refdbs in total were updated for compaction
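Those "release space" lines can be summed to gauge how much compaction has reclaimed so far. A quick sketch; the heredoc stands in for the real log, and the actual path (something like /&lt;msdp&gt;/log/compactd/compactd_node_0.log) is an assumption to adjust per install:

```shell
# Sum the byte counts from "release space N from container X" lines.
# Point LOG at your real compactd_node_0.log; the sample below mirrors
# the log format quoted above.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
release space 0 from container 12345678
release space 61669520 from container 23456789
EOF
awk '/^release space/ { total += $3 }
     END { printf "reclaimed so far: %.2f MB\n", total / 1048576 }' "$LOG"
```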

I observe that when it's doing the prepare-for-compaction cycles, it is slowly walking the containers and not taxing the system.

When I do see a string of "release space" cycles in the log, I see a corresponding spike in disk I/O and memory utilization. However, for the majority of the time the process is "preparing" a container for compaction, and it seems it could be making better use of the available system resources to get the job done faster. Does anyone know what, if anything (active replication jobs, perhaps?), would trigger compaction to throttle itself down? Is there a way to speed it up?

 

Thanks

 


4 Replies

  • You didn't mention what version of NetBackup / Appliance you're running?

    Looks like you have multiple partitions (compactd_node_0.log), so I guess it's 7.6.1 (or 2.6.1)?  Was it only slow on this partition or on all of them?

    I would suggest going to 7.7.1 (or 2.7.1) or above, or logging a support case.
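If it helps, one hedged way to confirm the installed version on a Linux media server; the path and file format are assumptions, and appliances may differ:

```shell
# Assumption: on Linux installs the release string is usually recorded in a
# version file under the NetBackup install tree; verify for your environment.
VERSION_FILE=/usr/openv/netbackup/version
[ -r "$VERSION_FILE" ] && cat "$VERSION_FILE" || echo "version file not found"
```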

  • Correct: 7.6.1.2 MSDP servers running under Red Hat as the targets; the source is an Appliance on 2.6.1.2.

    I am not familiar with partitions; I tried to Google it to learn more, but came up empty-handed. I would presume we have only one partition, and node_0 must be the default. My AIR replication is slow, and my compaction is slow.

    I hit a gold mine of recovered space earlier today: I got back about 10 TB in the span of an hour (this has been compacting for 3 full days now, and it's still going). I have plenty of space now, so I enabled another SLP policy. The new replications appear to be trickling data over: the WAN pipe is at less than 20% utilization, the target disk is underutilized, and %memused has been hovering at 80% (144 GB total for a 54 TB MSDP pool). I am having trouble identifying the system bottleneck.
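A rough sketch for chasing that bottleneck from the shell, assuming a Linux host with /proc and, optionally, the sysstat tools; nothing here is NetBackup-specific:

```shell
# Reproduce the %memused figure from /proc/meminfo (Linux-only assumption;
# MemAvailable requires a reasonably modern kernel).
awk '/^MemTotal/ {t=$2} /^MemAvailable/ {a=$2}
     END {printf "%%memused: %.1f\n", (t-a)*100/t}' /proc/meminfo

# If the sysstat package is installed, watch disk and NIC utilization while
# replication runs; a device pegged near 100 %util is the likely bottleneck.
command -v iostat >/dev/null && iostat -xz 1 2 || true
command -v sar    >/dev/null && sar -n DEV 1 2 || true
```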

  • Cycling netbackup services on the target host appears to have fixed the issue. 

    The only error I've found so far is a log entry inside the /<msdp>/log/spad/spad.log file.

    March 11 20:31:05 WARNING [140351687415584]: 25062: NetWaitForRemote: getnameinfo(): Temporary failure in name resolution (err = -3)

     

    I suspect this was caused by the network being down when the host last rebooted, since NBU is set to autostart. The network problem was fixed shortly after boot-up, and the host was verified to have name resolution and network connectivity, but NBU was not restarted. I'm guessing this is some sort of bug in NBU.
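The recovery described above could be sketched roughly as follows; the peer host name is hypothetical, and the bp.kill_all / bp.start_all paths assume a standard Linux NBU install (verify for your version):

```shell
# Sketch with assumptions: TARGET_HOST is a hypothetical replication peer,
# and the commented-out restart scripts assume a standard Linux install.
TARGET_HOST=${TARGET_HOST:-replication-peer.example.com}
if getent hosts "$TARGET_HOST" >/dev/null; then
    echo "name resolution OK for $TARGET_HOST; safe to cycle NetBackup services"
    # /usr/openv/netbackup/bin/bp.kill_all
    # /usr/openv/netbackup/bin/bp.start_all
else
    echo "name resolution still failing for $TARGET_HOST; fix DNS before restarting"
fi
```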

  • Oh.. good to know and thanks for sharing the update.

    My bad about the multiple partitions; I just found out that compactd_node_0.log is always there regardless of how many partitions you have, as the primary partition is always node 0.