on 08-01-2013 03:07 PM
Problem:

With its default configuration, the Solaris 10 ZFS ARC cache can gradually degrade NetBackup performance at the memory level, forcing NBU to use a lot of swap even when several gigabytes of RAM appear to be "available". On the following Solaris 10 server we initially see that 61% of the memory is owned by ZFS File Data (the ARC cache):

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1960930             15319   24%
ZFS File Data             5006389             39112   61%
Anon                       746499              5832    9%
Exec and libs               37006               289    0%
Page cache                  22838               178    0%
Free (cachelist)           342814              2678    4%
Free (freelist)            103593               809    1%

Total                     8220069             64219
Physical                  8214591             64176

Using the ARChits.sh script we can see how often the OS hits or requests memory from the ARC cache; in our sample it is at 100%, meaning we have a middleman between NBU and physical memory.

# ./ARChits.sh
        HITS       MISSES   HITRATE
  2147483647       692982    99.99%
         518            4    99.23%
        2139            0   100.00%
        2865            0   100.00%
         727            0   100.00%
         515            0   100.00%
         700            0   100.00%
        2032            0   100.00%
        4529            0   100.00%
        1040            0   100.00%
…
…

To find out which processes are hitting the ARC cache or requesting memory, we use dtrace to count the hits and misses per process:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() }'
...
...
  nbproxy                1099
  nbpem                  1447
  nscd                   1649
  bpstsinfo              1785
  find                   1806
  fsflush                2065
  bpclntcmd              2257
  bpcompatd              2394
  perl                   2945
  bpimagelist            4019
  bprd                   4268
  avrd                   8899
  grep                   9249
  dbsrv11               20782
  bpdbm                 37955

As we can see, dbsrv11 and bpdbm are the main consumers of ARC cache memory. The next step is to find out the sizes of the memory requests, in order to measure the impact of the ARC cache on NBU requests, because by nature the ARC cache slices memory into small blocks.

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'

  bytes
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@                                     10934
            1024 |                                          1146
            2048 |                                          467
            4096 |                                          518
            8192 |@@@@                                      9485
           16384 |@                                         1506
           32768 |                                          139
           65536 |                                          356
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             67561
          262144 |                                          0

The majority of memory requests are 128KB (131072-byte) blocks and a few are very small; this is while there are no major requests at the NBU level. Things change when a lot of NBU requests come in, suddenly raising the number of small-block requests. The following output shows a Master pulling some data while running several vmquery commands:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'

  bytes
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@                              78938
            1024 |@                                         7944
            2048 |                                          1812
            4096 |@                                         3751
            8192 |@@@@@@@@@@@@                              76053
           16384 |@                                         9030
           32768 |                                          322
           65536 |                                          992
          131072 |@@@@@@@@@@@@                              77239
          262144 |                                          0

vmquery drains all the memory requests, and the OS is forced to rehydrate the memory into bigger blocks to meet the NBU block-size requirements, impacting application performance mainly at the NBDB/EMM database level.
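To tie those request sizes back to the processes issuing them, the two one-liners above can be combined; the following is only a sketch along the same lines (not part of the original capture), keying the size distribution by process name:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'

This prints one block-size distribution per process, so the small-block pattern can be attributed directly to the processes driving it.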
Counting per-process ARC events again under this load confirms that vmquery is now the top requester:

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() }'
...
...
  avrd                   1210
  bpimagelist            2865
  dbsrv11                2970
  grep                   4971
  bpdbm                  6662
  vmquery               94161

The memory rehydration forces the OS to use a lot of swap even when there is plenty of memory sitting under "ZFS File Data":

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 s4   in   sy   cs us sy id
 0 0 0 19244016 11342680 432 1518 566 604 596 0 0 8 -687 8 -18 8484 30088 9210 10 5 84
 0 2 0 11441128 3746680 44 51 8 23 23 0 0 0 0 0 0 6822 19737 7929 9 3 88
 0 1 0 11436168 3745440 14 440 8 23 23 0 0 0 0 0 0 6460 18428 7038 9 4 87
 0 2 0 11440808 3746856 6 0 15 170 155 0 0 0 0 0 0 6463 18163 6996 9 4 87
 0 2 0 11440808 3747000 295 822 15 147 147 0 0 0 0 0 0 7604 27577 8989 11 5 84
 0 1 0 11440552 3746872 122 683 8 70 70 0 0 0 0 0 0 5926 20430 6444 9 3 88

In this case there are 39GB of RAM allocated to ZFS File Data (the ARC cache) that are supposed to be freed whenever an application needs them. The problem is that the ARC cache by nature slices the memory into small pieces, and when the OS reclaims some of that memory it takes a long time to respond to any memory request.

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1960930             15319   24%
ZFS File Data             5006389             39112   61%
Anon                       746499              5832    9%
Exec and libs               37006               289    0%
Page cache                  22838               178    0%
Free (cachelist)           342814              2678    4%
Free (freelist)            103593               809    1%

Total                     8220069             64219
Physical                  8214591             64176

When the Master is rebooted there is initially no ZFS File Data allocation and NBU runs perfectly; performance then degrades slowly, depending on how fast the ARC cache eats the memory.

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     479738              3747    6%
Anon                       422140              3297    5%
Exec and libs               45443               355    1%
Page cache                  83530               652    1%
Free (cachelist)          2200908             17194   27%
Free (freelist)           4988310             38971   61%

Total                     8220069             64219
Physical                  8214603             64176

Solution:

We ran into this problem quite often with heavily loaded systems running Solaris 10 and ZFS. To address it, we limited the ZFS ARC cache on each problematic system. To determine the limit value we followed the procedure below.

NOTE: As with any change of this nature, bear in mind that the setting may have to be tweaked to accommodate additional load and/or memory changes. Just monitor and adjust as needed.

1. After the system is fully loaded and running backups, sample the total memory use. Example:

   prstat -s size -a
    NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
       32 sybase     96G   96G    75%  42:38:04 0.2%
       72 root      367M  341M   0.3%   9:38:11 0.0%
        6 daemon   7144K 9160K   0.0%   0:01:01 0.0%
        1 smmsp    2048K 6144K   0.0%   0:00:22 0.0%

2. Compare the percentage of memory in use to the total physical memory:

   prtdiag | grep -i Memory
   Memory size: 131072 Megabytes

3. In the example above, approximately 75% of the physical memory is used under typical load. Add a few percent for headroom (call it 80%).

4. That leaves 20% of 128GB for the ARC, roughly 26GB = 27917287424 bytes.

5. Configure the ZFS ARC cache limit in /etc/system:

   set zfs:zfs_arc_max=27917287424

6. Reboot the system. (A quick way to confirm the new cap after the reboot is shown after the references below.)

References:
https://forums.oracle.com/thread/2340011
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/
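A quick sanity check, not part of the original procedure: after the reboot, the effective cap and the current ARC size can be read from the arcstats kstats (assuming the standard zfs:0:arcstats statistic names):

# kstat -p zfs:0:arcstats:c_max zfs:0:arcstats:size

c_max should report the 27917287424 bytes configured above, and size should settle at or below that value once the system has been running for a while.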
Thanks for sharing the information!
1) Where can ARChits.sh be found?
2) The dtrace invocations seem to run indefinitely until one interrupts them with ^C; it might be worth mentioning that (a self-terminating variant is sketched at the end of this post).
3) You only mention Solaris 10, but it'd be nice to know the equivalent for Solaris 11:
# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }'
dtrace: invalid probe specifier sdt:zfs::arc-hit,sdt:zfs::arc-miss { @["bytes"] = quantize(((arc_buf_hdr_t *)arg0)->b_size); }: syntax error near ")"
4) A similar analysis for ZoL on RH-family distributions would be very welcome.
I currently have NBU running on an RHEL system with storage accessed via NFS to a Sol11 system running ZFS over Coraid storage. Performance is not what I'd like, which is what drew me to this post. In the future I may factor out the Sol11 system and run ZoL locally on the RHEL system currently with 128GB; this would of course change the memory equation quite a bit.
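On point 2, a self-terminating variant would be something like the following (just a sketch, my addition: a tick probe makes the one-liner exit on its own after about 30 seconds and print the aggregation):

# dtrace -n 'sdt:zfs::arc-hit,sdt:zfs::arc-miss { @[execname] = count() } tick-30s { exit(0); }'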
Hi Anthony,
ARChits.sh can be found at http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/, but here is the script as well, just to keep it handy for you:
# cat archits.sh
#!/usr/bin/sh

interval=${1:-5}        # 5 secs by default

kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses $interval | awk '
BEGIN {
        printf "%12s %12s %9s\n", "HITS", "MISSES", "HITRATE"
}
/hits/ {
        hits = $2 - hitslast
        hitslast = $2
}
/misses/ {
        misses = $2 - misslast
        misslast = $2
        rate = 0
        total = hits + misses
        if (total)
                rate = (hits * 100) / total
        printf "%12d %12d %8.2f%%\n", hits, misses, rate
}
'
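If it helps, the script takes the sampling interval as an optional argument (5 seconds by default), so a quick one-second run would look like:

# chmod +x archits.sh
# ./archits.sh 1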
Regarding Solaris 11, I don't have any domain on that OS version yet, but considering dtrace is a read-only tool, I'm sure you can run the scripts and share the output; maybe we can see whether the impact is the same.
Regards.
Thank you for sharing this information.
We discovered this issue by following this tech note and applied the fix.
We found that 75% of our memory was grabbed by ZFS File Data.
32GB RAM in server
EMEA::: echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     383910              2999    9%
ZFS File Data             3059244             23900   75%
Anon                       122603               957    3%
Exec and libs               43187               337    1%
Page cache                  52134               407    1%
Free (cachelist)           223203              1743    5%
Free (freelist)            217671              1700    5%

Total                     4101952             32046
Physical                  4096473             32003
Since applying the fix, this has now dropped to 4% usage.
Thank you!