MSDP -> MSDP SLP Duplications slow - what to check?

BirtyB
Level 4

Hi all,

I wonder if someone might be able to help us with this.  I can always open a support case if required but thought I'd ask here first.

First a little about our configuration:  We have a 7.6.1 Windows 2012 R2 Master server which is also a Media server hosting an MSDP.  This server processes all overnight backups (accelerator enabled vmware policies) to its own MSDP.  These backups all complete easily within the defined window.  An SLP duplication job commences the following day at 08:00 and copies all backups to another 7.6.1 Windows 2012 R2 Media server also hosting an MSDP.

The problem we have is that the SLP Duplication window is closing before all images have been processed, causing error 196.  During the duplication window we observe lower than expected resource usage on both media servers.  RAM, CPU and page file usage are minimal.  Also the disk subsystem doesn't appear to be at all stressed.  How can we check that everything is working to its full potential?

Any help at all would be great.  Apologies if I have forgotten to add something.

1 ACCEPTED SOLUTION

Accepted Solutions

sdo
Moderator
Moderator
Partner    VIP    Certified

Other things to check:

5) Are the MSDP DB and MSDP Data locations on the same drive volume letter?

6) Are they both on the same type of storage?

7) Are the drives local within the server, perhaps on a SAS RAID HBA, or external via SAS, or external via SAN?

8) If SAN, are they iSCSI or FC?

9) If an external array is being used, does the array have any tools to let you monitor the throughput of the array front end controllers, or array parity groups?

10) For the NTFS volumes, is 'disablelastaccess' set?  Last-access timestamp updates should already be disabled by default in Windows 2012 R2.  Use the 'fsutil' tool to confirm (for example, 'fsutil behavior query disablelastaccess').

11) Is 8.3 short-name creation disabled ('disable8dot3')?  Again, use 'fsutil' to check.

12) Again, use 'fsutil' to check the NTFS cluster size (allocation unit size).  It's usually a good idea to use 64KB on the volume hosting the MSDP Data, instead of the typical 4KB for NTFS.

13) Check that the NTFS volume itself is not an 'NTFS dedupe volume'.

14) Is NTFS Change Journal enabled on the volume?  Check for the presence of VxCJ*.dat files at the root of the drive.  If these files are seen then you may have, for some reason, enabled 'Use Change Journal' in the NetBackup Client properties of the 'media server', and so all file changes are also being tracked by NTFS - which is probably not required.

.

The idea of all of the above is to gain a perspective, and understanding of the configuration of the storage layer.
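Two of the checks above (item 12, cluster size, and item 14, VxCJ*.dat files) lend themselves to a small script.  The following is only a rough sketch: the function names are made up for illustration, it assumes the output of 'fsutil fsinfo ntfsinfo <drive>' contains a "Bytes Per Cluster" line, and the fsutil call itself only works on Windows from an elevated prompt.

```python
import re
import subprocess
from pathlib import Path


def parse_bytes_per_cluster(ntfsinfo_text):
    """Return the cluster size in bytes found in 'fsutil fsinfo ntfsinfo'
    output, or None if no 'Bytes Per Cluster' line is present."""
    match = re.search(r"Bytes Per Cluster\s*:\s*([\d,]+)", ntfsinfo_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def find_change_journal_files(volume_root):
    """List VxCJ*.dat file names sitting directly in the volume root."""
    return sorted(p.name for p in Path(volume_root).glob("VxCJ*.dat"))


def query_cluster_size(drive):
    """Run fsutil against a drive such as 'E:' (Windows only, needs an
    elevated prompt) and return the cluster size in bytes."""
    result = subprocess.run(
        ["fsutil", "fsinfo", "ntfsinfo", drive],
        capture_output=True, text=True, check=True,
    )
    return parse_bytes_per_cluster(result.stdout)
```

On a media server you might call query_cluster_size("E:") and find_change_journal_files("E:\\"), where E: is a placeholder for whichever volume hosts the MSDP Data.  A result of 65536 would correspond to the 64KB suggestion in item 12, and an empty list would suggest the NetBackup change journal option is not in use on that volume.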


8 REPLIES 8

sdo
Moderator
Moderator
Partner    VIP    Certified

During duplications, on both source and target servers, open TaskMgr, click the Performance tab, click Open Resource Monitor, maximize, click the Disk tab, expand the lower "Storage" panel...

...if you see Active Time (%) near 100% and a Disk Queue Length > 1.0  then this implies that the disk storage underneath the Windows NTFS drive/volume letter is not able to respond quickly enough to the reading/writing process(es).

If you see the above, then use "PerfMon.msc" to view the disk queues in more detail, i.e. check for long (i.e. > 1) read disk queue and/or long (i.e. > 1) write disk queues.
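If you note down a few readings from Resource Monitor, the rule of thumb above can be written as a small check.  This is just a sketch: the 90% cut-off for "near 100%" is my assumption, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    active_time_pct: float   # "Active Time (%)" from Resource Monitor
    queue_length: float      # "Disk Queue Length" from Resource Monitor


def looks_saturated(samples, active_threshold=90.0, queue_threshold=1.0):
    """Return True when, averaged over the samples, the disk is busy
    nearly all the time AND requests are queuing behind it.

    active_threshold=90.0 is an assumed stand-in for "near 100%".
    """
    if not samples:
        return False
    avg_active = sum(s.active_time_pct for s in samples) / len(samples)
    avg_queue = sum(s.queue_length for s in samples) / len(samples)
    return avg_active >= active_threshold and avg_queue > queue_threshold
```

Both conditions are required on purpose: a busy disk with no queue is simply well used, while a busy disk with a queue building behind it is not keeping up with the reading/writing processes.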

Michael_G_Ander
Level 6
Certified

Might be an idea to look at the SLP Parameters under the master server properties, would guess Disk resource multiplier is what you want to change.

nbdevquery can show you the number of IO streams on the dedup volumes; I was told that 98 was the max for a 64 GB/64 TB system under NBU 7.5.

Think there are some technotes about improving the duplication speed between MSDPs

 

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

BirtyB
Level 4

@sdo

Thanks for that.  I did what you said - results below:

Source Media server

Current Disk Queue Length on MSDP Volume: 8 (Average)

Active Time: 100%

Target Media server

Current Disk Queue Length on MSDP Volume: 9 (Average)

Active Time: 100%.

OK, so this tells us that the disk subsystems on both servers are possibly underperforming.  Is there anything we can do within NBU to address or help reduce these figures?

sdo
Moderator
Moderator
Partner    VIP    Certified

Quick checks:

1) Is anything else using the volumes (drive letters) for anything else?  i.e. any other applications on the backup servers?

2) Is AV enabled on the server?  Scan on write?  Scan on read?  AV really should be disabled for MSDP DB and MSDP data.

3) Has Windows 2012 R2 auto 'defrag' maintenance scheduled been left enabled?  (i.e. has it not been manually disabled?).

4) Is Windows 2012 R2 performing auto scheduled 'maintenance' which involves scans for malware etc?

.

Those disk queue lengths are quite long.  The queue length is the number of outstanding disk IO reads/writes that the OS has issued and is still waiting for the underlying disk to service/complete, so any number over "1.0" means requests are queuing up.

What you could try next is to use "Perfmon.msc" to graph the 'Avg. Disk sec/Read' and 'Avg. Disk sec/Write' counters on the logical disk and physical disk objects - which will reveal the 'latency' or 'response times' to disk IO.  Any sustained number over 10ms can sometimes be construed as a problem.
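To make "sustained" a bit more concrete, here is a rough sketch that takes latency samples (in seconds, as perfmon reports them) and looks for a run of consecutive readings over 10ms.  The choice of five consecutive samples is purely my assumption, and the function names are illustrative only.

```python
def longest_run_over(samples_sec, threshold_sec=0.010):
    """Length of the longest run of consecutive samples above the threshold."""
    longest = 0
    current = 0
    for value in samples_sec:
        if value > threshold_sec:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_sustained_latency(samples_sec, threshold_sec=0.010, min_consecutive=5):
    """True if latency stayed above the threshold for at least
    min_consecutive samples in a row (min_consecutive is an assumed value)."""
    return longest_run_over(samples_sec, threshold_sec) >= min_consecutive
```

A single spike over 10ms would not trip this check, whereas a stretch of slow responses lasting the whole run of samples would, which is closer to what matters for a duplication job that runs for hours.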

sdo
Moderator
Moderator
Partner    VIP    Certified

Q) Is there anything we can do within NBU to address or help reduce these figures?

A) Well, I probably wouldn't fiddle with anything until I understood the problem in minute detail.

.

Take a look at this post:

https://www-secure.symantec.com/connect/forums/duplication-tape-0

...which shows where the MSDP 'IO' buffer sizes are specified.  But I really do NOT want to steer you towards modifying anything.  It could be made a lot worse.

What I think you may find useful is to gain an understanding of how MSDP works, and how to do a simple IO load test - so gain a simple base line maybe?  If so, do this when the backup server is very quiet, or better still shutdown NetBackup for the duration of any IO base lining that you do.

You can roughly confirm the disk IO sizes by using perfmon to look at the 'average' IO size for reads and writes - and you should see that the actual disk patterns match the settings from the content router cfg file.
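If you only have throughput and operation-rate figures to hand (for example perfmon's 'Disk Read Bytes/sec' and 'Disk Reads/sec'), the average IO size is just one divided by the other.  A minimal sketch, with an illustrative function name:

```python
def average_io_size_kib(bytes_per_sec, ops_per_sec):
    """Average size of one disk IO in KiB, derived from throughput
    (bytes/sec) divided by operation rate (IOs/sec)."""
    if ops_per_sec <= 0:
        return 0.0
    return bytes_per_sec / ops_per_sec / 1024
```

For instance, 6,553,600 bytes/sec at 100 reads/sec works out to 64 KiB per read, which you could then compare against whatever buffer size is configured.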

Ahhh - I've just realised that the post linked above is for NetBackup v7.5 - yet you are running v7.6.1 - and we know that MSDP has changed somewhat in v7.6.   So, the link above might no longer be relevant for v7.6.

sdo
Moderator
Moderator
Partner    VIP    Certified

Be very careful with any disk IO exerciser tools... if you get the construct of the parameters wrong you can end up accidentally writing random junk all over the disk / volume / partition - and destroy not only your backup data but also the volume meta-data / volume constructs.

BirtyB
Level 4

1) Is anything else using the volumes (drive letters) for anything else?  i.e. any other applications on the backup servers?

No, it's dedicated for MSDP on both servers.

2) Is AV enabled on the server?  Scan on write?  Scan on read?  AV really should be disabled for MSDP DB and MSDP data.

No AV installed on target media server.  Source media server has AV installed but on-access scanner disabled!

3) Has Windows 2012 R2 auto 'defrag' maintenance scheduled been left enabled?  (i.e. has it not been manually disabled?).

It was enabled on both media servers, however both showed 0% fragmented.  I have since disabled it.

4) Is Windows 2012 R2 performing auto scheduled 'maintenance' which involves scans for malware etc?

Was enabled but now set to disabled on both media servers.

Indexing and shadow copy is also disabled on both MSDP volumes.