on 08-01-2013 03:07 PM
Problem:

With its default configuration, the Solaris 10 ZFS ARC cache can gradually degrade NetBackup performance at the memory level, forcing NBU to use a lot of swap even when several gigabytes of RAM appear to be "available". On the following Solaris 10 server we initially see that 61% of the memory is owned by ZFS File Data (the ARC cache):

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1960930             15319   24%
ZFS File Data             5006389             39112   61%
Anon                       746499              5832    9%
Exec and libs               37006               289    0%
Page cache                  22838               178    0%
Free (cachelist)           342814              2678    4%
Free (freelist)            103593               809    1%

Total                     8220069             64219
Physical                  8214591             64176

Using the ARChits.sh script we can see how often the OS hits or requests memory from the ARC cache; in our sample it is at 100%, meaning we have a middleman between NBU and physical memory.

# ./ARChits.sh
        HITS       MISSES   HITRATE
  2147483647       692982    99.99%
         518            4    99.23%
        2139            0   100.00%
        2865            0   100.00%
         727            0   100.00%
         515            0   100.00%
         700            0   100.00%
        2032            0   100.00%
        4529            0   100.00%
        1040            0   100.00%
…
…

To find out which processes are hitting the ARC cache or requesting memory, we use dtrace to count the hits and misses per process:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() }'
...
...
  nbproxy                1099
  nbpem                  1447
  nscd                   1649
  bpstsinfo              1785
  find                   1806
  fsflush                2065
  bpclntcmd              2257
  bpcompatd              2394
  perl                   2945
  bpimagelist            4019
  bprd                   4268
  avrd                   8899
  grep                   9249
  dbsrv11               20782
  bpdbm                 37955

As we can see, dbsrv11 and bpdbm are the main consumers of ARC cache memory. The next step is to find out the sizes of the memory requests, in order to measure the impact of the ARC cache on NBU requests, because by nature the ARC cache slices memory into small blocks.

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'

  bytes
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@                                     10934
            1024 |                                          1146
            2048 |                                          467
            4096 |                                          518
            8192 |@@@@                                      9485
           16384 |@                                         1506
           32768 |                                          139
           65536 |                                          356
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             67561
          262144 |                                          0

The majority of memory requests are 128KB (131072-byte) blocks and a few are very small; this is while there are no major requests at the NBU level. Things change when a lot of NBU requests come in, suddenly raising the number of small-block requests. The following output shows a Master pulling some data while running several vmquery commands:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'

  bytes
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@                              78938
            1024 |@                                         7944
            2048 |                                          1812
            4096 |@                                         3751
            8192 |@@@@@@@@@@@@                              76053
           16384 |@                                         9030
           32768 |                                          322
           65536 |                                          992
          131072 |@@@@@@@@@@@@                              77239
          262144 |                                          0

vmquery drains all the memory requests, and the OS is forced to rehydrate the memory into bigger blocks to meet the NBU block-size requirements, impacting application performance mainly at the NBDB/EMM database level.
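To tie those request sizes back to the processes issuing them, the two one-liners above can be combined; the following is only a sketch along the same lines (not part of the original capture), keying the size distribution by process name:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'

This prints one block-size distribution per process, so the small-block pattern can be attributed directly to the processes driving it.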
Counting per-process ARC events again under this load confirms that vmquery is now the top requester:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() }'
...
...
  avrd                   1210
  bpimagelist            2865
  dbsrv11                2970
  grep                   4971
  bpdbm                  6662
  vmquery               94161

The memory rehydration forces the OS to use a lot of swap even when there is plenty of memory sitting under "ZFS File Data":

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 s4   in   sy   cs us sy id
 0 0 0 19244016 11342680 432 1518 566 604 596 0 0 8 -687 8 -18 8484 30088 9210 10 5 84
 0 2 0 11441128 3746680 44 51 8 23 23 0 0 0 0 0 0 6822 19737 7929 9 3 88
 0 1 0 11436168 3745440 14 440 8 23 23 0 0 0 0 0 0 6460 18428 7038 9 4 87
 0 2 0 11440808 3746856 6 0 15 170 155 0 0 0 0 0 0 6463 18163 6996 9 4 87
 0 2 0 11440808 3747000 295 822 15 147 147 0 0 0 0 0 0 7604 27577 8989 11 5 84
 0 1 0 11440552 3746872 122 683 8 70 70 0 0 0 0 0 0 5926 20430 6444 9 3 88

In this case there are 39GB of RAM allocated to ZFS File Data (the ARC cache) that are supposed to be freed whenever an application needs them. The problem is that the ARC cache by nature slices the memory into small pieces, and when the OS reclaims some of that memory it takes a long time to respond to any memory request.

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1960930             15319   24%
ZFS File Data             5006389             39112   61%
Anon                       746499              5832    9%
Exec and libs               37006               289    0%
Page cache                  22838               178    0%
Free (cachelist)           342814              2678    4%
Free (freelist)            103593               809    1%

Total                     8220069             64219
Physical                  8214591             64176

When the Master is rebooted there is initially no ZFS File Data allocation and NBU runs perfectly; performance then degrades slowly, depending on how fast the ARC cache eats the memory.

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     479738              3747    6%
Anon                       422140              3297    5%
Exec and libs               45443               355    1%
Page cache                  83530               652    1%
Free (cachelist)          2200908             17194   27%
Free (freelist)           4988310             38971   61%

Total                     8220069             64219
Physical                  8214603             64176

Solution:

We ran into this problem quite often with heavily loaded systems running Solaris 10 and ZFS. To address it, we limited the ZFS ARC cache on each problematic system. To determine the limit value we followed the procedure below.

NOTE: As with any change of this nature, bear in mind that the setting may have to be tweaked to accommodate additional load and/or memory changes. Just monitor and adjust as needed.

1. After the system is fully loaded and running backups, sample the total memory use. Example:

   prstat -s size -a
    NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
       32 sybase     96G   96G    75%  42:38:04 0.2%
       72 root      367M  341M   0.3%   9:38:11 0.0%
        6 daemon   7144K 9160K   0.0%   0:01:01 0.0%
        1 smmsp    2048K 6144K   0.0%   0:00:22 0.0%

2. Compare the percentage of memory in use to the total physical memory:

   prtdiag | grep -i Memory
   Memory size: 131072 Megabytes

3. In the example above, approximately 75% of the physical memory is used under typical load. Add a few percent for headroom (call it 80%).

4. That leaves 20% of 128GB for the ARC, roughly 26GB = 27917287424 bytes.

5. Configure the ZFS ARC cache limit in /etc/system:

   set zfs:zfs_arc_max=27917287424

6. Reboot the system. (A quick way to confirm the new cap after the reboot is shown after the references below.)

References:
https://forums.oracle.com/thread/2340011
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/
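A quick sanity check, not part of the original procedure: after the reboot, the effective cap and the current ARC size can be read from the arcstats kstats (assuming the standard zfs:0:arcstats statistic names):

# kstat -p zfs:0:arcstats:c_max zfs:0:arcstats:size

c_max should report the 27917287424 bytes configured above, and size should settle at or below that value once the system has been running for a while.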
Thanks for sharing the information!
1) Where can ARChits.sh be found?
2) The dtrace invocations seem to run indefinitely until one interrupts them with ^C; it might be worth mentioning that (a self-terminating variant is sketched at the end of this post).
3) You only mention Solaris 10, but it'd be nice to know the equivalent for Solaris 11:
# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'
dtrace: invalid probe specifier sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }: syntax error near ")"
4) A similar analysis for ZoL on RH-family distributions would be very welcome.
I currently have NBU running on an RHEL system with storage accessed via NFS to a Sol11 system running ZFS over Coraid storage. Performance is not what I'd like, which is what drew me to this post. In the future I may factor out the Sol11 system and run ZoL locally on the RHEL system currently with 128GB; this would of course change the memory equation quite a bit.
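On point 2, a self-terminating variant would be something like the following (just a sketch, my addition: a tick probe makes the one-liner exit on its own after about 30 seconds and print the aggregation):

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() } tick-30s { exit(0); }'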
Hi Anthony,
ARChits.sh can be found at http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/, but here is the script as well, just to keep it handy for you:
# cat archits.sh
#!/usr/bin/sh

interval=${1:-5}        # 5 secs by default

kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses $interval | awk '
BEGIN {
        printf "%12s %12s %9s\n", "HITS", "MISSES", "HITRATE"
}
/hits/ {
        hits = $2 - hitslast
        hitslast = $2
}
/misses/ {
        misses = $2 - misslast
        misslast = $2
        rate = 0
        total = hits + misses
        if (total)
                rate = (hits * 100) / total
        printf "%12d %12d %8.2f%%\n", hits, misses, rate
}
'
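If it helps, the script takes the sampling interval as an optional argument (5 seconds by default), so a quick one-second run would look like:

# chmod +x archits.sh
# ./archits.sh 1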
Regarding Solaris 11, I don't have any domain on that OS version yet, but considering dtrace is a read-only tool, I'm sure you can run the scripts and share the output; maybe we can see whether the impact is the same.
Regards.
Thank you for sharing this information.
We discovered this issue by following this tech note and applied the fix.
We found that 75% of our memory was grabbed by ZFS File Data.
32GB RAM in server
EMEA::: echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     383910              2999    9%
ZFS File Data             3059244             23900   75%
Anon                       122603               957    3%
Exec and libs               43187               337    1%
Page cache                  52134               407    1%
Free (cachelist)           223203              1743    5%
Free (freelist)            217671              1700    5%

Total                     4101952             32046
Physical                  4096473             32003
Since applying the fix, this has now dropped to 4% usage.
Thank you!