Solved: SQL COMPRESSION with slow transfer

klunkySQL · ‎07-14-2015

Hello,

I'm with a problem about slow backup's using SQLCOMPRESSION. My database runs on a SQL Server 2012 Enterprise edition x64 with 40 CPU's and 256GB RAM.

If i start a backup without compression, my database is like 7.8TB size, and my NIC adapter ( 10Gbit ) reaches like 70% ( between 650mb\s and 750mb\s )

If i use my backup with compression, my database get's like 1.8TB which is very good, but my NIC adapter just reaches 12% ( between 100mb\s and 150mb\s ) maximum, and always transfer in a constant speed, with no up's and downs.

Part of my "batch" file:

STRIPES 24
MAXTRANSFERSIZE 6
BLOCKSIZE 7
NUMBUFS 32
SQLCOMPRESSION TRUE
ENDOPER TRUE

This behaviour is normal ? I don't expect with compression to get the same speeds, but 12% of my 10GBIT NIC is very low...

Thanks !

sdo · ‎07-14-2015

Do both types of backups take roughly the same amount of time? if so, then your backup client may be source disk IO bound. And so using compression just makes the backup smaller, which in turn uses less network bandwidth, but a little more CPU - but because the rest of the system is so capable, this exposes the source disk as the bottleneck no matter which backup job configurations you choose. If this is true then NetBackup Client, your network, and your backup target are capable of moving more data and more quickly but are unable to do so because the source disk is not fast enough to put pressure on orher resources. If this is true then you might consider yourself lucky... because... you appear to have only one bottleneck... instead imagine the life of the unfortunate backup admin having to care for a configuration situation where you know you have a bottleneck, but you're not sure where it is... but really one has two different bottlenecks both of which cause a similar IO throughput limit yet one doesn't know it... then what can happen is that... one resolves one bottleneck yet is left with the same IO throughput bottleneck. What I'm trying to get across is that sometimes there are hidden bottlenecks. Remember, there will always be at least one bottleneck somewhere - but sometimes there can be two or more bottlenecks all of which result in the same IO throughput limitations. Also remember that bottlenecks are not a problem worthy of investigation or investment (in effort, or kit/software) unless you are failing to meet your SLAs. You can prove whether your backup job is source disk IO bound by using Resource Monitor and the "Disk" tab. If you see disk volumes at 100% busy with an average sustained IO queue above 1.0 then those volumes are likely to be IO bound. if you see thisnthen you should use PerfMon.msc to investigate further. Before then perhaps investigating infrastructure elements outside of the backup client.

View solution in original post

DG-2005 · ‎07-14-2015

how much of the CPU is being utilized by SQL ? does it have enough compute for the compression threads?

klunkySQL · ‎07-14-2015

Hello DG-2005

When my backup starts, my server is using 30% CPU (average) and i have like 40GB of free RAM.

Thanks !

sdo · ‎07-14-2015

Do both types of backups take roughly the same amount of time? if so, then your backup client may be source disk IO bound. And so using compression just makes the backup smaller, which in turn uses less network bandwidth, but a little more CPU - but because the rest of the system is so capable, this exposes the source disk as the bottleneck no matter which backup job configurations you choose. If this is true then NetBackup Client, your network, and your backup target are capable of moving more data and more quickly but are unable to do so because the source disk is not fast enough to put pressure on orher resources. If this is true then you might consider yourself lucky... because... you appear to have only one bottleneck... instead imagine the life of the unfortunate backup admin having to care for a configuration situation where you know you have a bottleneck, but you're not sure where it is... but really one has two different bottlenecks both of which cause a similar IO throughput limit yet one doesn't know it... then what can happen is that... one resolves one bottleneck yet is left with the same IO throughput bottleneck. What I'm trying to get across is that sometimes there are hidden bottlenecks. Remember, there will always be at least one bottleneck somewhere - but sometimes there can be two or more bottlenecks all of which result in the same IO throughput limitations. Also remember that bottlenecks are not a problem worthy of investigation or investment (in effort, or kit/software) unless you are failing to meet your SLAs. You can prove whether your backup job is source disk IO bound by using Resource Monitor and the "Disk" tab. If you see disk volumes at 100% busy with an average sustained IO queue above 1.0 then those volumes are likely to be IO bound. if you see thisnthen you should use PerfMon.msc to investigate further. Before then perhaps investigating infrastructure elements outside of the backup client.

sdo · ‎07-14-2015

Is your backup job failing to meet its SLA?

.

See also some Windows type IO thoughts:

https://www-secure.symantec.com/connect/forums/msdp-msdp-slp-duplications-slow-what-check

.

Is everything on just few volumes (e.g. less than two or three drive letters)? Or is your OS (1), application binaries (2), database (3), transaction logs (4), and application text logs (5) all assigned/installed to different drive letters? More than five different volumes?

The more I think about this, the less I/we know about your configuration. It's very hard to comment on what changes/tweaks/improvements to make. All I can suggest are some things to check and/or consider.

Is the SQL database on local disk within the server? Or on a SAN? If it's from a SAN, then is the NTFS volume that the database resides upon from one big SAN LUN, or many smaller SAN LUNs? Usually an NTFS volume made from many smaller LUNs will usually result in more possible IO in flight (i.e. more IO queued pending completion), i.e. more disk IO request can be in progress at any one time... but then there are also always limits on the far side of storage - i.e. at the storage array itself. And there's buffer to buffer credit exhaustion to consider inside a SAN fabric. Hmmm, this list is near endless. It all depends how far you are willing to go to investigate all of this?

We haven't identified where the bottleneck is (and there is always at least one bottleneck). Maybe your backup media server (and/or its target storage) cannot ingest any quicker. If your source backup client disk volumes are not 100% busy and have low IO queues (less than 1.0), and your total CPU usage is quiet-ish (but then have you checked for a single max'ed out logical CPU?) and your LAN (backup client to LAN switch) is not maxed out, then this hints at a bottleneck outside of the SQL server - so.. where is it? Are there multipel LAN switches with ISL (inter-switch link) between them? Are they maxed out? The link from switch to media server HIC - is that maxed out? Is the NetBackup media server maxed out? Search this forum for the terms 'waited delayed' for examples on how to spot where delays are occuring.

VOX

SQL COMPRESSION with slow transfer