Poor performances with a Data Domain DD boost OST device on Solaris

Gdd
Level 6
Partner Accredited

Our configuration is:

= NetBackup 6.5.6 on a Solaris 10 server based on T2+ processors

= A Data Domain OST device with DD Boost

= Network is 10 Gb/s Ethernet

Maximum performance of a single stream is about 30 MB/s... we can configure a lot of streams, but the performance of each individual stream remains low!

On Linux processors, or on Solaris processors but without DD Boost, performance is good (between 100 MB/s and 200 MB/s for each stream).

Has anyone experienced this?

18 REPLIES

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Turn boost off and have a look at the performance..........

Gdd
Level 6
Partner Accredited

It is better without boost...

"On Solaris processors without DD Boost, performance is good (between 100 MB/s and 200 MB/s for each stream)."

However, as some of my Solaris media servers have only 1 Gb/s Ethernet attachments, I need DD Boost to reach the target throughput!

 

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

I saw the exact same thing (on Windows). The problem is that the performance figures you might have been promised relate more to an environment with multiple SAN media servers, and not to a few media servers with lots of clients sending data to them.

 

What you must remember is that DD Boost is NOT client-side dedupe. What you're doing is turning on a deduplication engine on the media server, which talks to the DD device and then sends the deduped data from the media server to the DD. The problem is that when you turn it on, the time taken by Boost to process the data on the media server causes a delay. Your clients still send their FULL data set to the media server, whether you use Boost or not. The only difference is that with Boost it takes longer to process the same piece of data before it can be sent (deduped) to the DD device. This might alleviate the load on the DD, but it does absolutely nothing for your backup window; in fact, with Boost on, the window increases.

 

That's my understanding of it from what I've seen. I look forward to hearing your views.

Gdd
Level 6
Partner Accredited

Yes, I agree, but it works much faster on Linux (x86) than on Solaris (SPARC T2+).
I am trying to:
1 / Check that I am not alone in having problems on Solaris (I may have forgotten a prerequisite?)
2 / Check that there is no trick to go faster anyway?

The Data Domain documentation says that Solaris T+ is almost as fast as x86, but it doesn't say whether that holds for a small or a large number of streams!

Mouse
Moderator
Partner    VIP    Accredited Certified

I'd check couple of things.

First, the version of the OST plug-in. Recent versions (at least 2.2.3 and 2.3.1) are able to leverage the built-in cryptographic accelerator in T2+ processors (if libpkcs11.so is available). This lowers the load on the media server compared to traditional Boost, as well as compared to transferring the entire stream to the DD box, thus speeding up the backup if the link between the DD and the media server is the bottleneck.

Second, network buffering settings. The OST plug-in admin guide explains in detail how the system should be tuned for best performance.

Mouse
Moderator
Partner    VIP    Accredited Certified

Client side dedupe doesn't shrink the backup window in NBU with built-in dedupe client either. It will still read the entire data set from disk without caching, therefore the only tangible result you could expect is reduction in traffic between the client and the media server.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Not on a single client level, yes, but a reduction in traffic could allow you to send more data at the same time. The logical (call it that if you will) bandwidth increases, which means clients that might have been scheduled later, can run earlier as the media server will be able to accept their data (network not congested).

 

Same as an incremental really (just on another level).

 

What I actually wanted to get across is that the DD does not offer this, even though people are led to believe the 5 TB/hour quoted speed is possible in a LAN client environment....

Mouse
Moderator
Partner    VIP    Accredited Certified

DD does offer this, but with SAN media server environments (where you have an option to install the OST plug-in and use local storage unit). There are also other backup apps out there whose clients are able to use boost without having to use full-blown media server, but those are offtopic for this forum :)

Mouse
Moderator
Partner    VIP    Accredited Certified

By the way, there is no technical limitation on putting an OST plug-in into the NBU client, given that the PureDisk plug-in for the client works this way, but SYMC are too greedy to allow OST-on-client and combinations like Boost-on-client along with DD, just because they wouldn't be able to charge for SAN media server licenses anymore.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Yeah, sorry, I should have phrased it differently. It's possible, but they should tell the customer that he needs to redesign his NetBackup setup to use SAN media servers (or, as you said, DD Boost on client :P). These days that isn't really an issue, given per TB / Platform licensing. It's just that they don't tell customers this when they're selling it (in my experience), and this causes unhappiness later when a redesign is called for.

 

It would all work well if we all talk to each other and not be so competitive and cryptic, LOL

 

Thanks for the chat :)

RonCaplinger
Level 6

In short, if you get low dedupe rates, you will not see much advantage using Boost.

We also tried Boost, but because we didn't have a lot of duplicate data yet, the media server was taking a longer time to process the same amount of data and it still had to send it anyway, negating the use of Boost.

The OST plugin first sends a request to the Data Domain appliance to check the hash of the data packet.  If the DD has already seen it, the full data packet is discarded and the block's expiration is updated on the Data Domain.  This is where Boost comes in handy.  BUT, when the Data Domain does NOT find a duplicate block already stored, the media server must then send the second packet with all of the data.  So, twice the number of data packets if your data does not dedupe well, which leads to poor performance and high CPU utilization all around.
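
The exchange described above can be sketched as a toy model. This is only an illustration of the "ask by hash first, send the data on a miss" pattern from the post, not the actual DD Boost protocol: the function name, the fixed 4096-byte segments, and the in-memory set standing in for the appliance are all assumptions.

```python
import hashlib

def boost_like_transfer(segments, stored_fingerprints):
    """Toy model: query by fingerprint first, send the payload only on a miss."""
    messages = 0
    payload_bytes = 0
    for seg in segments:
        fingerprint = hashlib.sha1(seg).digest()
        messages += 1                      # fingerprint lookup request
        if fingerprint not in stored_fingerprints:
            messages += 1                  # second message carrying the data
            payload_bytes += len(seg)
            stored_fingerprints.add(fingerprint)
    return messages, payload_bytes

# Hypothetical workloads: 100 distinct segments vs. one segment repeated 100 times
unique = [bytes([i]) * 4096 for i in range(100)]
repeated = [b"\x00" * 4096] * 100

cold = boost_like_transfer(unique, set())
warm = boost_like_transfer(repeated, set())
print(cold)  # (200, 409600)
print(warm)  # (101, 4096)
```

With data that does not dedupe, the model sends two messages per segment plus the full payload; with highly redundant data, it sends close to one message per segment and almost no payload.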

Gdd
Level 6
Partner Accredited

I use the latest version of DD Boost, Release 2.3.1.0.

I test with GEN_DATA - and with NetBackup 6.5.6, generated data have a very high deduplication ratio, so there is very little data sent over the network.

The optimization parameters (system and network) have been set, but since very little data is sent, they are probably not the reason for the poor performance! In addition, the NetBackup data buffer size and number of buffers have been increased.

However, as the network throughput and the CPU load are low, I wonder about the cryptographic accelerator: how can I check that the cryptographic accelerator is running, and that libpkcs11.so is available?

Mouse
Moderator
Partner    VIP    Accredited Certified

Ok, let's do the following:

Check how many SHA-1 operations your box has done so far with the help of the crypto accelerator:

 kstat -n n2cp
OR 
kstat -n n2cp0 

This will give you an idea of whether your DD Boost uses this module or not. As far as libpkcs11.so is concerned, you can search for this library on your system using find and check whether it is present.

 

Certain options could be disabled in the libpkcs11.so library, run

 cryptoadm list -p

to list enabled ones

The following command displays all the counters available for real-time monitoring:

 cputrack -h

Use cputrack with the required counter to track SHA-1 calculations in real time.

Gdd
Level 6
Partner Accredited

I did the following test:

Several " kstat -n n2cp0 " during a backup: 

The SHA1 counter keeps increasing, so I think the crypto accelerator is being used?

    Line 16883:         sha1                            13329276
    Line 38492:         sha1                            13383773
    Line 38644:         sha1                            13891339
    Line 38779:         sha1                            13999579
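
To see how much the counter moved between samples, the four values above can be differenced. This is a minimal sketch: the sample times were not recorded in the post, so only the increments are computed, not a rate.

```python
# sha1 counter values copied from the four kstat samples above
samples = [13329276, 13383773, 13891339, 13999579]

# Difference between each consecutive pair of samples
deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
total = samples[-1] - samples[0]

print(deltas)  # [54497, 507566, 108240]
print(total)   # 670303
```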

According to the "cryptoadm" result, all the options seem to be enabled:

root# cryptoadm list -p

User-level providers:
=====================
/usr/lib/security/$ISA/pkcs11_kernel.so: all mechanisms are enabled. random is enabled.
/usr/lib/security/$ISA/pkcs11_softtoken_extra.so: all mechanisms are enabled. random is enabled.

Kernel software providers:
==========================
des: all mechanisms are enabled.
aes256: all mechanisms are enabled.
arcfour2048: all mechanisms are enabled.
blowfish448: all mechanisms are enabled.
sha1: all mechanisms are enabled.
sha2: all mechanisms are enabled.
md5: all mechanisms are enabled.
rsa: all mechanisms are enabled.
swrand: random is enabled.

I have not yet tried "cputrack" but I found the following command:

root# /usr/sfw/bin/openssl speed -engine pkcs11 sha1
engine "pkcs11" set.
Doing sha1 for 3s on 16 size blocks: 68954 sha1's in 2.70s
Doing sha1 for 3s on 64 size blocks: 65603 sha1's in 2.72s
Doing sha1 for 3s on 256 size blocks: 58471 sha1's in 2.76s
Doing sha1 for 3s on 1024 size blocks: 37216 sha1's in 1.53s
Doing sha1 for 3s on 8192 size blocks: 27082 sha1's in 1.32s
OpenSSL 0.9.7d 17 Mar 2004 (+ security fixes for: CVE-2005-2969 CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339 CVE-2006-4343 CVE-2007-5135 CVE-2007-3108 CVE-2008-5077 CVE-2009-0590 CVE-2009-3555)
built on: date not available
options:bn(64,32) md2(int) rc4(ptr,char) des(ptr,risc1,16,long) aes(partial) blowfish(ptr)
compiler: information not available
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1               408.62k     1543.60k     5423.40k    24907.96k   168072.53k

So, with larger blocks, it is possible to go faster... but I don't know which block size is used by DD Boost ?
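
As a cross-check, the summary row of the openssl output can be recomputed from the per-block-size lines. This is a small sketch using only the figures printed above; the helper name is made up for illustration.

```python
# (block size in bytes, number of sha1 operations, seconds) from the run above
runs = [
    (16, 68954, 2.70),
    (64, 65603, 2.72),
    (256, 58471, 2.76),
    (1024, 37216, 1.53),
    (8192, 27082, 1.32),
]

def kbytes_per_second(block_size, operations, seconds):
    """Throughput in units of 1000 bytes per second, as openssl reports it."""
    return block_size * operations / seconds / 1000

throughput = {size: kbytes_per_second(size, ops, secs) for size, ops, secs in runs}
for size, value in throughput.items():
    print(f"{size:>5} bytes: {value:.2f}k")
```

The results agree with the table: about 0.4 MB/s at 16-byte blocks and about 168 MB/s at 8192-byte blocks, so the per-operation overhead dominates at small block sizes.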

Mouse
Moderator
Partner    VIP    Accredited Certified

I guess this means the crypto accelerator is ok and doing its job.

Anyway, DD uses variable-size segments, not fixed-size blocks.

What about the ingestion rate, can you please verify values for NET_BUFFER_SZ and others, so we can be sure you're allocating enough memory buffers to feed the box with data?

Gdd
Level 6
Partner Accredited

I only back up local data on the Sun media server, and all the buffer sizes (SIZE_DATA_BUFFER;...) have been increased according to the "Boost" recommendations... and EMC support has no idea how to solve this problem!

So I am afraid it will never go faster, because NetBackup 6.5 uses a 32-bit plug-in?

 

 

Yasuhisa_Ishika
Level 6
Partner Accredited Certified
SPARC T2+ consists of many *slow* cores and is not well suited to stream processing such as encryption or encoding. Each backup stream is processed in a single thread, so stream speed is limited by the speed of one core. Worse, each core runs 4 or 8 hardware threads (similar to hyper-threading), which makes the reported CPU usage look lower than the real load. For example, when you run 4 heavy tasks simultaneously on a 4-core processor with 8 threads per core, the cores are fully busy but CPU usage is displayed as 13%.
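
The arithmetic behind that figure can be written out as follows. This is a simplified sketch that assumes the reported utilization is just the share of busy hardware threads; the function name is hypothetical.

```python
def reported_cpu_percent(busy_threads, cores, threads_per_core):
    """Share of hardware threads that are busy, as a percentage."""
    return 100.0 * busy_threads / (cores * threads_per_core)

# 4 heavy single-threaded tasks on 4 cores with 8 hardware threads each
example = reported_cpu_percent(busy_threads=4, cores=4, threads_per_core=8)
print(example)  # 12.5
```

The 12.5% result is what gets rounded to 13% in the example above, even though every core already has one thread running flat out.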

Claudio_Veronez
Level 6
Partner Accredited

Let's assume that your environment is a common one, like:

Clients -> ETH -> Media/Master Server -> SAN -> DD

Possible bottlenecks:

Client -> processor load, how big the file is, how many files,

ETH -> Switches (iSCSI, jumbo frames)

Master/Media Server -> Where dedup starts (processor, simultaneous jobs). Your dedup gain will be between the media server and the DD. Maybe your bottleneck is ETH / Client.

DD -> Depending on the connection (8 Gb/s FC, CIFS, NFS), throughput may vary.

Are you using the DD as a VTL?

(Try mapping a CIFS/NFS share on the master server and running a backup to disk, or copying something from this server directly to your DD.)

 

I hope it helps.