08-13-2017 03:17 PM
Just wanted to see what others' experiences might be regarding swap usage on their appliances - I have one *very* busy appliance which recently was exhibiting performance issues... and upon investigation I saw that there was over 30GB of swap in use (of 64GB configured).
Upon investigation of my other appliances, I do see swap in use, but it is quite a bit lower (10-12GB).
I'm actually pretty surprised that there is *any* reported swap in use, since there always seems to be a bit of free memory reported on each of the appliances.
5230 Media Servers, Appliance Version 3.0
08-15-2017 02:23 PM
It would require closer monitoring, but at some point your appliance is running low on memory and falling back to swap.
Too many concurrent jobs can cause that. Also maybe you have a job that requires more memory than the standard job.
See if you can identify why memory is getting exhausted and what is running at the time, and maybe reduce the number of concurrent operations.
Otherwise engage NetBackup support for a closer investigation.
08-15-2017 02:58 PM
I forgot to mention you can look into vm.swappiness value on the appliance.
swappiness value is recommended for database workloads. For example, for Oracle databases, Red Hat recommends a
08-15-2017 03:06 PM
Like I said - all of my appliances are using swap space. The average seems to be around 10GB of swap space that is reported being used.
I've been keeping an eye on them, with a focus on the swap usage. The busy appliance that had the 30+ GB of use is pretty steady right now at 12GB. I'm going to see if at any point that starts to head up...
08-16-2017 03:06 AM
@elanmbx could you tell us exactly what you are measuring and exactly how, so that we can also feed back any figures and compare apples with apples. Thank you.
08-16-2017 08:15 AM
Is your 'busy' appliance doing VADP backups? Those are rather memory intensive.
In my experience, swap usage isn't unusual, especially on a system that is deduping and has a high concurrent job count. 'Excessive' is very subjective and a high level of swap use over time does NOT always indicate a problem, though it can be a symptom. I don't see appliances running over 50% of allocated swap very often, but 10% - 20% isn't unusual, especially if the systems have been up for a long period of time.
If you're running into backup window issues on the 'busy' appliance, I'd suggest logging a support ticket to see if there are balancing recommendations or other tuning recommendations to be made in your specific scenario/config.
08-16-2017 08:46 AM
As far as monitoring goes - I'm simply watching "Top" and/or "MemoryStatus" from the CLISH. Nothing terribly scientific.
Mem: 131929932k total, 114189420k used, 17740512k free, 4436k buffers
Swap: 68272124k total, 14399800k used, 53872324k free, 14337192k cached
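For anyone who wants to compare figures the same way, here is a minimal sketch that turns a `top` summary line into a swap percentage. It assumes the line format shown above (values suffixed with `k`); the helper name `parse_top_line` is made up for illustration and is not part of any appliance tooling.

```python
import re

def parse_top_line(line):
    """Return {label: kibibytes} for a 'Mem:' or 'Swap:' summary line from top."""
    # Each field looks like "14399800k used"; capture the number and its label.
    return {label: int(value) for value, label in re.findall(r"(\d+)k (\w+)", line)}

# The Swap line quoted in this post.
swap = parse_top_line(
    "Swap: 68272124k total, 14399800k used, 53872324k free, 14337192k cached"
)
swap_pct = 100.0 * swap["used"] / swap["total"]
print(f"swap used: {swap['used'] / 1024**2:.1f} GiB ({swap_pct:.1f}% of configured)")
```

With the numbers above this works out to roughly a fifth of the configured swap.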
08-16-2017 08:49 AM
Yes, sir - the busy appliance is doing a LOT of VADP backups - including lots of duplications and replications of those VM images. It's quite busy - almost all day long.
I am going to continue monitoring this appliance - it doesn't seem to have issues with meeting our regular backup window until it has been up for a while (last couple times was 7-10 days after a reboot) and then we start to see quite a few 196s as things just generally seem to slow down.
I'm going to keep an eye on this behavior and see if the swap usage and appliance bogging down are correlated...
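One way to check that correlation is to log swap usage with a timestamp and line it up against when the 196s start. The sketch below is only an illustration: it assumes a standard Linux `/proc/meminfo` and that you are able to run Python somewhere with access to it, which may not hold on a locked-down appliance. The function names and the log file name are hypothetical.

```python
import time

def swap_used_kb(meminfo_text):
    """Compute swap in use (kB) from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]

def log_swap(path="/proc/meminfo", out="swap_usage.log"):
    """Append one 'timestamp,swap_used_kb' line; call from cron or a loop."""
    with open(path) as f:
        used = swap_used_kb(f.read())
    with open(out, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')},{used}\n")
    return used
```

Calling `log_swap()` every few minutes gives a simple CSV that can be compared against job failure times.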
08-16-2017 09:17 AM
Just out of curiosity - how big of a memory footprint does spad use up on your appliance(s) - in my environment it seems to be all over the map:
I presume it may be related to dedupe/rehydration tasks going on?
08-17-2017 09:58 AM
It's interesting - swap usage has been crawling up on the busiest appliance. Currently I'm north of 16GB of used swap.
All my other appliances are staying steady at 10-12GB of swap.
I'm going to continue to monitor to see if performance goes into the weeds as this swap increases.
08-17-2017 10:18 AM
Please review documentation on swappiness:
Try changing the value to 10 and see how it affects your swap usage as:
Swappiness is the kernel parameter that defines how much (and how often) your Linux kernel will copy RAM contents to swap. This parameter's default value is “60” and it can take anything from “0” to “100”. The higher the value of the swappiness parameter, the more aggressively your kernel will swap.
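Before changing anything, it may help to confirm what the current value actually is. This is a read-only sketch, assuming the standard Linux location `/proc/sys/vm/swappiness`; the function names are illustrative, and whether the appliance lets you reach this path is something to verify.

```python
def parse_swappiness(text):
    """Convert the contents of the swappiness file (e.g. '60\\n') to an int."""
    return int(text.strip())

def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the current vm.swappiness value without modifying it."""
    with open(path) as f:
        return parse_swappiness(f.read())

if __name__ == "__main__":
    try:
        print("vm.swappiness =", read_swappiness())
    except OSError as exc:
        # Not Linux, or the path is not readable in this environment.
        print("could not read swappiness:", exc)
```

Nothing here writes to the kernel parameter, so it is safe to run on a production system.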
08-17-2017 11:05 AM
looking at two appliances near me:
appl.  vers   model  shelves   RAM    uptime  swap size  swap used
1      2.7.3  5230   4 x 32GB  125GB  94days  65GB       21GB
2      2.7.3  5230   4 x 32GB  125GB  92days  65GB       21GB
08-17-2017 12:23 PM
I'm generally a bit hesitant to make changes such as this, since this is in my production NetBackup environment...
I shall take it under advisement, however.
08-18-2017 06:01 AM - edited 08-18-2017 06:05 AM
Well - overnight the appliance swap usage went right out the window. Currently over 35GB of reported swap use. And it appears to be due to a number of HUGE VADP jobs (bpbkar processes specifically).
I'm going to try to kill those particular VM jobs and see if the swap space recovers.
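To see which processes are actually holding the swap, one option is to read the `VmSwap` field from each `/proc/<pid>/status`. This is a rough sketch under the assumption that the appliance kernel exposes `VmSwap` there (most reasonably recent Linux kernels do) and that you have root access; the function names are made up for illustration.

```python
import glob

def parse_status(status_text):
    """Return (process name, swapped-out kB) from the text of /proc/<pid>/status."""
    name, swap_kb = "", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmSwap:"):
            swap_kb = int(line.split()[1])
    # Kernel threads have no VmSwap line, so they report 0.
    return name, swap_kb

def top_swappers(limit=10):
    """List the processes with the most swapped-out memory, largest first."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                name, swap_kb = parse_status(f.read())
        except OSError:
            continue  # process exited while we were scanning
        pid = path.split("/")[2]
        rows.append((swap_kb, name, pid))
    return sorted(rows, reverse=True)[:limit]

if __name__ == "__main__":
    for swap_kb, name, pid in top_swappers():
        print(f"{swap_kb / 1024:10.1f} MiB  {name} (pid {pid})")
```

If bpbkar processes dominate the list, that would support the suspicion about the large VADP jobs.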
08-18-2017 06:04 AM
08-18-2017 06:14 AM
Using NBU/Appliance 8.0/3.0 version on all systems.
The only EEB applied is the fix for the Apache Struts vulnerability: ET3913599
08-18-2017 06:22 AM
Contact NetBackup support to see if this issue applies to your environment.
Your top output shows 6GB for one backup; it might be worth identifying what VM PID 279420 was backing up to determine why it was using so much memory.
08-18-2017 08:50 AM
I'd agree with this recommendation. That definitely falls into my definition of 'excessive' now :-).
I can't directly comment on whether it is applicable, but Support should be able to make the determination. At first glance it seems that it would apply to your situation, IMO.
09-06-2017 02:16 PM
Just as a follow-up. I have not had a recurrence of this issue since my last reboot (about 3 weeks ago). I continue to monitor the situation, however, and will probably open a case should the issue return.
Thanks for everyone's input.
01-15-2018 03:58 PM
Well, this has been an on-again, off-again issue for a while. We finally got some extra capacity (a new 5240) and I have moved some large bits of the environment over to this new media server - including a few large policies from the above-affected 5230.
And guess what? The problem has now moved from the 5230 to the 5240. Once this system gets up to about 25GB of swap in use *ALL* operations seem to get horrendously slow - until I restart both NetBackup services and Infrastructure Services (mongod, tomcat, rabbitmq) and get the swap use back down to reasonable levels.
But the swap usage seems to always crawl up and eventually cause issues.
I've got a case open - the 1st thing that was suggested was to reduce the cache % in contentrouter.cfg from 75% to 50%. I was initially optimistic, but it did not seem to address the issue of excessive swap usage.
Will keep this thread updated as I learn more.