Forum Discussion

sdo
Moderator
10 years ago

Duplication, performance of "bpdm"...

Env: NetBackup v7.6.0.1 on RHEL 6.4.  Master/media server (single node).  Fairly new build, about 3 weeks old.

Issue: Duplication to tape (over FC) from MSDP disk pool (itself on SAN/FC storage) seems slow.

One small MSDP disk pool of c. 10TB, which now holds several weeks of backups and is about 55% occupied.  Currently duplicating a set of full backups (circa 300 backup images, totalling about 7TB of backup data) via two 'bpduplicate' commands.  The backups being duplicated are the first set of full backups ever ingested by this environment.

No other backups, replications or duplications are running.

No legacy log folders exist.  VxUL logging was at the usual defaults of Debug=1 and Diagnostic=6.

Nothing scary seen in /var/log/messages.

I'm seeing about 50 MB/s to each of two tape drives.

Also seeing two "bpdm" processes regularly hitting 99-100% of a CPU core each, with a total of about 25% of machine CPU in use.  RHEL reports the server has 8 CPU cores.

NetBackup KMS is configured, and the duplications are writing to an "ENCR_" volume pool for KMS.

NetBackup application daemons were restarted recently (a few days ago).

ulimit and semaphores checked and are ok.

The CPUs appear to be two physical sockets, each quad-core, with a single thread per core.

[user@server ~]# cat /proc/cpuinfo | egrep "MHz|physical id|core id|^$"
cpu MHz         : 1200.000
physical id     : 0
core id         : 0

cpu MHz         : 1800.000
physical id     : 0
core id         : 1

cpu MHz         : 1200.000
physical id     : 0
core id         : 2

cpu MHz         : 1800.000
physical id     : 0
core id         : 3

cpu MHz         : 1200.000
physical id     : 1
core id         : 0

cpu MHz         : 1200.000
physical id     : 1
core id         : 1

cpu MHz         : 1800.000
physical id     : 1
core id         : 2

cpu MHz         : 1200.000
physical id     : 1
core id         : 3

(I think the CPUs must be 1.2 GHz with a boost to 1.8 GHz, because for each physical CPU two cores show 1.2 GHz and two cores show 1.8 GHz.)
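
(Though I note that the MHz value in /proc/cpuinfo is only the current clock, which power management moves around - so if the cpufreq sysfs interface happens to be present on this build, the rated maximum is probably better read from:)

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq    # maximum clock for core 0, in kHz
grep -m1 'model name' /proc/cpuinfo                          # advertised nominal speed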

I wasn't involved in the environment sizing, design, build or implementation - so I've no idea (yet) what the server tin actually is.

I suspect that each "bpdm" process maxing out a CPU core is what limits each duplication to tape to around 50 MB/s.

I would have expected "bpdm" to simply be a data mover, receiving data from spoold and forwarding it to bptm, so I wouldn't have expected to see each process consuming 100% of a CPU core.

Can anyone offer any reason as to why bpdm maxes out a CPU core?

Thanks in advance.

  • As a really simple test, back up some data to a BasicDisk STU, then duplicate that - how fast is it?

  • 1: We had an in-house Linux admin look at our setup, and this was his advice:

    2: You can check whether hyper-threading is enabled in the BIOS; you enable/disable it there as well.

    Regarding the use of swap space, set

    vm.swappiness = 1 in /etc/sysctl.conf

    Explained here: https://lonesysadmin.net/2013/12/11/adjust-vm-swappiness-avoid-unneeded-disk-io/
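
    A quick way to check both of these from a running shell (no trip into the BIOS needed) - this assumes RHEL's lscpu and sysctl are available, and note the sysctl -w change only lasts until reboot unless it also goes into /etc/sysctl.conf:

    lscpu | grep -i 'thread(s) per core'    # 2 = hyper-threading enabled, 1 = disabled
    sysctl vm.swappiness                    # show the current value
    sysctl -w vm.swappiness=1               # apply the suggested value immediately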

     

  • I would increase NUMBER_DATA_BUFFERS to 128 or 256

    MSDP encryption is documented in the deduplication guide

    http://www.symantec.com/docs/DOC6466

    Page 34. As I read it, it's NetBackup that does the encryption/decryption, but NetBackup may very well use library calls to other modules in the OS. The manual says "Deduplication uses the Blowfish algorithm for encryption".
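
    For reference, NUMBER_DATA_BUFFERS (and its companion SIZE_DATA_BUFFERS) are just touch files on the media server. A minimal sketch, assuming a default /usr/openv install path - newly started jobs pick up the changed values:

    echo 256 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
    echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS    # 256 KB buffers - check the tuning guide for your tape drives first
    cat /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS            # confirm what bptm/bpdm will read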

15 Replies

  • Yes - I tried 128 and 256 for NUMBER_DATA_BUFFERS, without much gain.  I'll see how it goes with 64 for this config, and then perhaps try 128 next month.

    Re: where the encryption is performed, I found this regarding Intel CPUs:

    http://en.wikipedia.org/wiki/AES_instruction_set

    ...so it looks like certain Intel CPUs can assist AES encryption, but there's no mention of Blowfish.

    And this paper describes the possibility of extending Intel MMX and SSE instructions to provide hardware assistance to other encryption algorithms:

    http://www.dii.unisi.it/~giorgi/papers/Bartolini08c.pdf

    And this short note lists how the AVX and AVX2 instruction sets of some Intel CPUs can be used by the Blowfish algorithm:

    https://www1.cs.fau.de/avx.crypto

    All way beyond me.
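
    That said, whether the CPU even advertises those instruction sets is easy to check from the flags line in /proc/cpuinfo - whether NetBackup's Blowfish code actually uses any of them is the separate, unanswered question:

    grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -x -E 'aes|avx|avx2|sse4_2'    # prints only the features this CPU reports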

    It would be really cool if there were a NetBackup command (maybe undocumented until now) which reports whether NetBackup will make use of hardware (in-CPU) acceleration for Blowfish encryption within MSDP, or whether it has to fall back to a (probably slower) implementation entirely in software.

    Maybe Symantec could supply a standalone tool so that customers can check/report whether their server tin is appropriate for encryption (for MSEO and/or MSDP even)?

    And I still haven't found out why the "bpdm" processes that I saw were flat-lining at 100% of a CPU core.

    Does anyone know of any commands I can run against the PID of a bpdm process to reveal what it's doing inside?

    Thanks.

  • Hi Ken,

    The quickest would be to literally take the 'actual elapsed duration' run time and the KB from the Activity Monitor and drop these into Excel, then apply some simple cell formulae - but this can be a bit crude, and doesn't take into consideration tape mount delays, resource contention, etc.  It should not require any scripting, though.

    The second method might be to script it up: grab the contents of the Activity Monitor job record (via bpdbjobs), extract it, parse it (use programming code to detect and handle resource delays), and spit what's required into an Excel CSV file - there's a rough sketch of this after the third method below.  Extracting the 'job log' can be tricky if you've not done it before - but easy once you know how.

    The third would be to parse the actual bpdm and bptm (and other) logs, and programmatically collate the stats from the 'sets' of related process IDs (PIDs) and munge the data to produce truly accurate performance stats per media server, or per policy, or per policy type, or per client, or per client type, or per day, etc. - basically write your own performance reporting tool.  Not easy, but definitely doable, as the NetBackup logs are nicely structured and fairly consistent.
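
    Purely to show the shape of the second (bpdbjobs) approach, here's a very rough sketch - the field positions (jobtype, elapsed seconds, kilobytes) and the jobtype code for duplications are assumptions against a 7.6 '-all_columns' layout, so verify them against your own output before trusting any numbers it prints:

    bpdbjobs -report -all_columns | awk -F, '
        $2 == 4 && $10 > 0 {                                   # assumed: field 2 = jobtype (4 = duplication), field 10 = elapsed seconds
            printf "job %s : %.1f MB/s\n", $1, ($15/1024)/$10  # assumed: field 15 = kilobytes moved
        }'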

    As for testing - here's some thoughts:

    In a NetBackup installation I usually like to create two 'sets' of test policies, which permanently remain configured within the list of policies, but are usually de-activated, and have policy names that clearly distinguish them as either 'functionality' test policies or 'performance' test policies, e.g. ZZZ_FUNC_TEST_MSDP_1, ZZZ_FUNC_TEST_MSDP_2, ...   ZZZ_PERF_TEST_TAPE_BEST, ZZZ_PERF_TEST_TAPE_WORST, ZZZ_PERF_TEST_TAPE_TYPICAL, etc.  The trick is to think about what it is you are trying to achieve and to name the policies such that they relate to each other.

    For the functional test policies, I usually use GEN_DATA to generate small amounts of test data (around 1GB) and use very short 1 day retentions on the schedules.  If your media servers are Windows, then the trick is to leave a folder called ?:\NBU-DO-NOT-DELETE on a drive on each media server, drop a 1GB file in there and specify that folder path in your functional test policies.

    You could create some functional test policies configured such that they perhaps test all tape drive paths, or test cross-site connectivity.  If you have a site with two or more media servers, then you can also do something like have one test policy per media server, which uses the previous media server as its client, e.g.

    policy 1 - storage is MSDP on media server 1, but client is media server 3

    policy 2 - storage is MSDP on media server 3, but client is media server 1

    policy 3 - storage is MSDP on media server 3, but client is media server 2

    And perhaps another bunch of SSO functional test policies, configured in such a way that all media servers test IO to all tape drives in all tape libraries.  If configured correctly, these should not consume more scratch media than there are drives (i.e. make sure all schedules in all test policies always use the same short 1-day retention).

    This way I always have a set of functional test policies to hand ready to prove that NetBackup appears to be ok - simply by running them.  Ideally a complete run of all 'functional' test policies should take no longer than a few minutes to run.

    For performance tests:

    If the master, or master/media, or media server is Unix/Linux based - or if I have a decently powerful and well connected (>= 4 Gb/s) Unix/Linux client - then for the performance test policies (again using a 1-day retention on the schedule) I like to use the GEN_DATA policy include directives to generate medium-sized amounts of data (around 20GB).  I tend to have three sets of performance policies: best-case compression and de-dupe, worst-case compression and de-dupe, and then typical.  Specifying the best and worst cases is easy - just use GEN_RANDOM=0 or =100, and GEN_DEDUPE=100 or 0.
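
    As an illustration only, the Backup Selections for a hypothetical worst-case policy (say ZZZ_PERF_TEST_MSDP_WORST) might contain nothing more than the directives below - the GEN_* directives are undocumented, so prove the exact syntax (and how the total data size is controlled) on a throwaway policy first:

    GEN_DATA
    GEN_RANDOM=100
    GEN_DEDUPE=0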

    For simulated real world tests:

    The difficult part is getting the 'typical' GEN_DATA settings to resemble something useful.  At the end of the day, it's probably best to use real data if you can.

    To try to generate data patterns that look like real data - it's a case of sitting down - looking at the number of files in a typical client, looking at how well it de-dupes, or how well it compresses to tape, and trying to create a GEN_DATA profile which gets close to that.

    HTH.

  • strace can do it - but you need kernel skills in decoding the output.

    http://en.wikipedia.org/wiki/Strace
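
    A minimal sketch of attaching it to a running bpdm (the PID is made up) - note that if bpdm is burning its CPU in user space, e.g. in the encryption code, strace will show comparatively little time spent in system calls:

    strace -c -p 12345                            # attach, then Ctrl-C to get a summary table of syscall counts and times
    strace -tt -T -p 12345 -o /tmp/bpdm.strace    # full trace with timestamps and per-call durations, written to a file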

  • SDO,

    Thank you for your detailed reply on duplication rates and testing. Your willingness (and that of others) to share info is commendable.

    In addition to the duplication rate methods you stated, I have used two other (rough) methods...

    1. When tape drives are dedicated to "tape-out", look at the "Tape Drive Throughput" report in OpsCenter. It will show an average KB/sec for each tape drive, and for all tape drives. Use that number as the duplication rate.

    2. Divide the "Total MB" output of 'nbstlutil -report' by the elapsed time of all duplications (start time of the first job to finish time of the last job). You may have to 'diff' the nbstlutil output, before and after, if it is a running total.
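
    As a worked example of the second method, suppose (made-up numbers) 4,320,000 MB were duplicated between 08:00 and 20:00, i.e. 43,200 seconds elapsed:

    echo '4320000 / 43200' | bc -l    # = 100 MB/s aggregate across all drives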

     

    Lastly, there is a Tech Note on MSDP read performance (you probably know about it, but it's here for the benefit of others):

    http://www.symantec.com/business/support/index?page=content&id=HOWTO101014

    KW

  • I hadn't seen that tech note.  So thank you for that.

    Personally, I'm not sure about the validity of using sequential IO to test raw performance of the volumes upon which MSDP resides, as to my mind, MSDP IO will be entirely random.

    With v7.6.0.x it appears to me that MSDP keeps itself busy re-organizing - all the time - whilst there is no IO from/to NetBackup.  So I suspect all of this background IO is entirely random.

    Does anyone else have a view on whether normal backup data arriving at MSDP results in IO that is all sequential, mostly sequential, mostly random, or entirely random?
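
    One way to get a feel for this is to watch the MSDP file system's underlying device while a backup or duplication runs - a consistently small average request size (avgrq-sz, in 512-byte sectors) together with a high await and modest MB/s is the usual signature of random IO.  A sketch, with the device name being an assumption for your layout:

    iostat -xm sdX 5    # from the sysstat package; watch avgrq-sz, await, %util and the rMB/s / wMB/s columns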