5230: Excessive swap usage?

elanmbx · 08-13-2017

Just wanted to see what others' experiences are regarding swap usage on their appliances - I have one *very* busy appliance which recently exhibited performance issues, and on investigation I found over 30GB of swap in use (of 64GB configured).

My other appliances also show swap in use, but quite a bit less (10-12GB).

I'm actually pretty surprised that there is *any* reported swap in use, since there always seems to be a bit of free memory reported on each of the appliances.

5230 Media Servers, Appliance Version 3.0

eduncan · 08-15-2017

It would require closer monitoring, but at some point your appliance is running low on memory and falling back to swap.

Too many concurrent jobs can cause that. It may also be that one job needs more memory than a typical job.

See if you can identify what is running when memory gets exhausted, and consider reducing the number of concurrent operations.

Otherwise engage NetBackup support for a closer investigation.

eduncan · 08-15-2017

I forgot to mention: you can also look at the vm.swappiness value on the appliance.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Gui...

swappiness

A value from 0 to 100 which controls the degree to which the system favors anonymous memory or the page cache. A high value improves file-system performance, while aggressively swapping less active processes out of physical memory. A low value avoids swapping processes out of memory, which usually decreases latency, at the cost of I/O performance. The default value is 60.

A low swappiness value is recommended for database workloads. For example, for Oracle databases, Red Hat recommends a swappiness value of 10.
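If you want to check the current value before deciding anything, here is a minimal Python sketch. It assumes a Linux shell on the appliance with read access to the standard /proc/sys/vm/swappiness file; the function names are hypothetical, and the reference points 60 and 10 are simply the ones in the Red Hat text quoted above.

```python
# Sketch only: assumes a Linux shell on the appliance that can read /proc.
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")
RHEL_DEFAULT = 60      # default quoted from the Red Hat tuning guide above
ORACLE_SUGGESTED = 10  # value Red Hat suggests for Oracle database workloads


def parse_swappiness(text: str) -> int:
    """Convert the file contents (a number plus a newline) to an int."""
    return int(text.strip())


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Read the kernel's current vm.swappiness setting."""
    return parse_swappiness(path.read_text())


def describe_swappiness(value: int) -> str:
    """Place a value relative to the two reference points quoted above."""
    if value == RHEL_DEFAULT:
        return "default (60)"
    if value <= ORACLE_SUGGESTED:
        return "low: strongly prefers keeping process memory in RAM"
    if value < RHEL_DEFAULT:
        return "below default"
    return "above default: swaps more aggressively"


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        current = read_swappiness()
        print(f"vm.swappiness = {current} ({describe_swappiness(current)})")
    else:
        print(f"{SWAPPINESS_PATH} not found (not a Linux host?)")
```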

elanmbx · 08-15-2017

Like I said, all of my appliances are using swap space. The average reported usage seems to be around 10GB.

I've been keeping an eye on them, with a focus on the swap usage. The busy appliance that had the 30+ GB of use is pretty steady right now at 12GB. I'm going to see if at any point that starts to head up...

sdo · 08-16-2017

@elanmbx, could you tell us exactly what you are measuring and how, so that we can report our own figures and compare apples with apples? Thank you.

vtas_chas · 08-16-2017

Is your 'busy' appliance doing VADP backups? Those are rather memory intensive.

In my experience, swap usage isn't unusual, especially on a system that is deduping and has a high concurrent job count. 'Excessive' is very subjective and a high level of swap use over time does NOT always indicate a problem, though it can be a symptom. I don't see appliances running over 50% of allocated swap very often, but 10% - 20% isn't unusual, especially if the systems have been up for a long period of time.

If you're running into backup window issues on the 'busy' appliance, I'd suggest logging a support ticket to see if there are balancing recommendations or other tuning recommendations to be made in your specific scenario/config.

Charles
VCS, NBU & Appliances

elanmbx · 08-16-2017

As far as monitoring goes - I'm simply watching "Top" and/or "MemoryStatus" from the CLISH. Nothing terribly scientific.

Mem: 131929932k total, 114189420k used, 17740512k free, 4436k buffers
Swap: 68272124k total, 14399800k used, 53872324k free, 14337192k cached
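To compare apples with apples, it helps to put these summary lines into common units and express swap as a share of configured swap. The Python sketch below parses the two header lines exactly as pasted above (older procps top format, values in KiB). The 10-20% and 50% cut-offs in rough_assessment are only the rule of thumb vtas_chas gives in this thread, not an official threshold, and all function names are hypothetical. Note also that in this older top layout the "cached" figure printed on the Swap line is, as far as I know, the page cache rather than anything stored in swap.

```python
# Sketch only: parses the older procps-style top header shown above (KiB values).
import re

# Matches fields such as "131929932k total".
FIELD_RE = re.compile(r"(\d+)k\s+(\w+)")
KIB_PER_GIB = 1024 * 1024


def parse_top_header(text: str) -> dict[str, dict[str, int]]:
    """Parse top's 'Mem:' and 'Swap:' summary lines into {label: {field: KiB}}."""
    result: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if sep and label in ("Mem", "Swap"):
            result[label] = {field: int(kib) for kib, field in FIELD_RE.findall(rest)}
    return result


def swap_used_percent(header: dict[str, dict[str, int]]) -> float:
    """Swap in use as a percentage of configured swap."""
    swap = header["Swap"]
    return 100.0 * swap["used"] / swap["total"]


def rough_assessment(pct: float) -> str:
    """Rule of thumb from this thread (10-20% common, >50% rare); not official."""
    if pct > 50:
        return "unusually high"
    if pct > 20:
        return "above the commonly seen 10-20%"
    return "within the commonly seen range"


SAMPLE = """\
Mem: 131929932k total, 114189420k used, 17740512k free, 4436k buffers
Swap: 68272124k total, 14399800k used, 53872324k free, 14337192k cached
"""

if __name__ == "__main__":
    hdr = parse_top_header(SAMPLE)
    pct = swap_used_percent(hdr)
    print(f"RAM total: {hdr['Mem']['total'] / KIB_PER_GIB:.1f} GiB")
    print(f"swap used: {hdr['Swap']['used'] / KIB_PER_GIB:.1f} GiB "
          f"= {pct:.1f}% of swap ({rough_assessment(pct)})")
```

For the figures above this works out to roughly 13.7 GiB of swap in use, about 21% of the configured 65 GiB.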

elanmbx · 08-16-2017

Yes, sir - the busy appliance is doing a LOT of VADP backups - including lots of duplications and replications of those VM images. It's quite busy - almost all day long.

I am going to continue monitoring this appliance. It doesn't seem to have issues meeting our regular backup window until it has been up for a while (the last couple of times it was 7-10 days after a reboot); then we start to see quite a few status 196 errors as things generally seem to slow down.

I'm going to keep an eye on this behavior and see if the swap usage and appliance bogging down are correlated...

elanmbx · 08-16-2017

Just out of curiosity, how big a memory footprint does spad have on your appliance(s)? In my environment it seems to be all over the map:

Media Server 1: 330MB
Media Server 2: 2.8GB
Media Server 3: 3.4GB
Media Server 4: 3.5GB
Media Server 5: 160MB

I presume it may be related to dedupe/rehydration tasks going on?
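So that everyone measures spad the same way rather than eyeballing top, a small Python sketch can read the resident set size (VmRSS) from /proc/<pid>/status for every process with a given name. It assumes a Linux shell on the appliance with rights to read /proc; the helper names are hypothetical, and a raw RSS figure won't by itself show whether the memory belongs to dedupe or rehydration work.

```python
# Sketch only: assumes Linux /proc and permission to read other processes' status.
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def kib_field(fields: dict[str, str], key: str) -> int:
    """Return a field such as 'VmRSS: 337920 kB' in KiB, or 0 if it is absent."""
    parts = fields.get(key, "").split()
    return int(parts[0]) if parts else 0


def rss_by_name(name: str, proc: Path = Path("/proc")) -> dict[int, int]:
    """Map PID -> resident memory (KiB) for each process whose Name matches."""
    usage = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            fields = parse_status((entry / "status").read_text())
        except OSError:  # process exited or is unreadable
            continue
        if fields.get("Name") == name:
            usage[int(entry.name)] = kib_field(fields, "VmRSS")
    return usage


if __name__ == "__main__":
    if Path("/proc").is_dir():
        for pid, kib in sorted(rss_by_name("spad").items()):
            print(f"spad pid {pid}: {kib / 1024:.0f} MiB resident")
```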

elanmbx · 08-17-2017

It's interesting - swap usage has been crawling up on the busiest appliance. Currently I'm north of 16GB of used swap.

All my other appliances are staying steady at 10-12GB of swap.

I'm going to continue to monitor to see if performance goes into the weeds as this swap increases.
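To see whether swap really crawls up over days and lines up with the slowdowns, logging a timestamped sample at a fixed interval is more reliable than glancing at top now and then. A minimal Python sketch, assuming a Linux shell with access to /proc/meminfo (SwapTotal and SwapFree are standard fields there); the function names, the CSV layout, the interval and the alert level are arbitrary illustrative choices.

```python
# Sketch only: assumes Linux /proc/meminfo; interval and alert level are placeholders.
import time
from datetime import datetime
from pathlib import Path

KIB_PER_GIB = 1024 * 1024


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines such as 'SwapFree:  53872324 kB' into KiB ints."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_used_kib(meminfo: dict[str, int]) -> int:
    """Swap in use = configured swap minus free swap."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def format_sample(meminfo: dict[str, int], now: datetime, alert_gib: float) -> str:
    """One CSV row: timestamp, swap used in GiB, and ALERT when above alert_gib."""
    used_gib = swap_used_kib(meminfo) / KIB_PER_GIB
    flag = "ALERT" if used_gib > alert_gib else ""
    return f"{now.isoformat(timespec='seconds')},{used_gib:.2f},{flag}"


def sample_forever(interval_s: int = 300, alert_gib: float = 25.0,
                   path: Path = Path("/proc/meminfo")) -> None:
    """Print a CSV header, then one row every interval_s seconds until interrupted."""
    print("timestamp,swap_used_gib,alert", flush=True)
    while True:
        row = format_sample(parse_meminfo(path.read_text()), datetime.now(), alert_gib)
        print(row, flush=True)
        time.sleep(interval_s)
```

Calling sample_forever() in the background with its output redirected to a file gives a record you can line up against the times jobs started failing. The five-minute interval and 25 GiB alert level are placeholders to adjust for your own appliance.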

eduncan · 08-17-2017

Please review documentation on swappiness:

https://www.howtoforge.com/tutorial/linux-swappiness/

Try changing the value to 10 and see how it affects your swap usage. From that tutorial:

Swappiness is the kernel parameter that defines how much (and how often) your Linux kernel will copy RAM contents to swap. This parameter's default value is “60” and it can take anything from “0” to “100”. The higher the value of the swappiness parameter, the more aggressively your kernel will swap.
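If you do decide to try 10, the runtime change amounts to writing the new value to /proc/sys/vm/swappiness, which is what `sysctl -w vm.swappiness=10` does; it is lost at the next reboot unless the value is also set in /etc/sysctl.conf. Since this is production, the sketch below defaults to a dry run and only reports what it would change. It assumes root shell access to the appliance's Linux OS, which the appliance may not give you by default, and set_swappiness is a hypothetical helper, not an appliance command.

```python
# Sketch only: assumes root shell access to the appliance OS; defaults to a dry run.
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def set_swappiness(new_value: int, path: Path = SWAPPINESS_PATH,
                   dry_run: bool = True) -> str:
    """Validate and, only if dry_run is False, apply a runtime vm.swappiness change.

    Writing to /proc/sys/vm/swappiness is not persistent: the setting is lost
    at reboot unless /etc/sysctl.conf sets it too.
    """
    # 0-100 is the range given in the documentation quoted in this thread.
    if not 0 <= new_value <= 100:
        raise ValueError(f"swappiness must be 0-100, got {new_value}")
    old_value = int(path.read_text().strip())
    if dry_run:
        return f"would change vm.swappiness from {old_value} to {new_value}"
    path.write_text(f"{new_value}\n")
    return f"changed vm.swappiness from {old_value} to {new_value}"


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        print(set_swappiness(10))  # dry run: reports only, changes nothing
```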

sdo · 08-17-2017

Looking at two appliances near me:

appl.  vers    model  shelves    RAM     uptime   swap size   swap used
1      2.7.3   5230   4 x 32GB   125GB   94days   65GB        21GB
2      2.7.3   5230   4 x 32GB   125GB   92days   65GB        21GB

elanmbx · 08-17-2017

I'm generally a bit hesitant to make changes such as this, since this is in my production NetBackup environment...

I shall take it under advisement, however.

elanmbx · 08-18-2017

Well - overnight the appliance swap usage went right out the window. Currently over 35GB of reported swap use. And it appears to be due to a number of HUGE VADP jobs (bpbkar processes specifically).

I'm going to try to kill those particular VM jobs and see if the swap space recovers.
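To confirm which processes actually hold the swapped-out memory (for example, whether it really is the bpbkar processes), one approach is to rank processes by the VmSwap field in /proc/<pid>/status. This is a sketch: it assumes the appliance kernel exposes VmSwap (recent Linux kernels do, but I haven't verified it on Appliance 3.0), and swap_per_process is a hypothetical helper.

```python
# Sketch only: assumes Linux /proc and a kernel that reports VmSwap per process.
from pathlib import Path

KIB_PER_GIB = 1024 * 1024


def swap_per_process(proc: Path = Path("/proc")) -> list[tuple[int, int, str]]:
    """Return (swap_kib, pid, name) for every process, largest swap user first.

    Processes without a VmSwap line (e.g. kernel threads) count as 0.
    """
    rows = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "status").read_text()
        except OSError:  # the process exited or is unreadable
            continue
        name, swap_kib = "?", 0
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("VmSwap:"):
                swap_kib = int(line.split()[1])
        rows.append((swap_kib, int(entry.name), name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    proc = Path("/proc")
    if proc.is_dir():
        for swap_kib, pid, name in swap_per_process(proc)[:10]:
            print(f"{swap_kib / KIB_PER_GIB:7.2f} GiB  pid {pid:<8} {name}")
```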

eduncan · 08-18-2017

What NetBackup version are you using? Do you have any EEBs installed?

elanmbx · 08-18-2017

Using NBU/Appliance 8.0/3.0 version on all systems.

The only EEB applied is the fix for the Apache Struts vulnerability: ET3913599

eduncan · 08-18-2017

Please see:

https://www.veritas.com/support/en_US/article.000125566

Contact NetBackup support to see if this issue applies to your environment.

Your top output shows 6GB for one backup. It might be worth identifying what VM PID 279420 was backing up, to determine why it was using so much memory.
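One way to see what a process was working on while it is still running is to read its full command line from /proc/<pid>/cmdline, where the arguments are separated by NUL bytes. I'm assuming the bpbkar arguments identify the client or policy being backed up; I haven't confirmed what they contain, so check the actual output rather than relying on that. The helper names are hypothetical, and the PID is the one from this thread, which will differ on your system.

```python
# Sketch only: assumes Linux /proc; the PID below is the example from this thread.
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []
    if raw.endswith(b"\0"):  # the final argument is NUL-terminated too
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def process_cmdline(pid: int, proc: Path = Path("/proc")) -> list[str]:
    """Return the argument list of a running process (empty if it has exited)."""
    try:
        return parse_cmdline((proc / str(pid) / "cmdline").read_bytes())
    except OSError:
        return []


if __name__ == "__main__":
    pid = 279420  # PID from the top output discussed in this thread
    args = process_cmdline(pid)
    print(" ".join(args) if args else f"PID {pid} is not running")
```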

vtas_chas · 08-18-2017

I'd agree with this recommendation. That definitely falls into my definition of 'excessive' now :-).

I can't directly comment on whether it is applicable, but Support should be able to make the determination. At first glance it seems that it would apply to your situation, IMO.


elanmbx · 09-06-2017

Just as a follow-up. I have not had a recurrence of this issue since my last reboot (about 3 weeks ago). I continue to monitor the situation, however, and will probably open a case should the issue return.

Thanks for everyone's input.

elanmbx · 01-15-2018

Well, this has been an on-again, off-again issue for a while. We finally got some extra capacity (a new 5240), and I have moved some large parts of the environment over to this new media server, including a few large policies from the affected 5230 mentioned above.

And guess what? The problem has now moved from the 5230 to the 5240. Once this system gets up to about 25GB of swap in use *ALL* operations seem to get horrendously slow - until I restart both NetBackup services and Infrastructure Services (mongod, tomcat, rabbitmq) and get the swap use back down to reasonable levels.

But the swap usage seems to always crawl up and eventually cause issues.

I've got a case open. The first suggestion was to reduce the cache percentage in contentrouter.cfg from 75% to 50%. I was initially optimistic, but it did not seem to address the excessive swap usage.
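For a sense of scale of that change: if the cache percentage is applied to total physical memory (my assumption; the setting isn't described in more detail here), going from 75% to 50% lowers the ceiling by a quarter of RAM. Using the 131929932k total that top reported on the 5230 earlier in this thread, purely as an example (the 5240's own total would need to be substituted):

```python
# Sketch only: ASSUMES the contentrouter.cfg cache percentage is of total physical RAM.
KIB_PER_GIB = 1024 * 1024


def cache_cap_gib(total_ram_kib: int, percent: float) -> float:
    """Cache ceiling in GiB under the assumption stated above."""
    return total_ram_kib * percent / 100 / KIB_PER_GIB


total_kib = 131929932  # 'Mem: ... total' from the 5230's top output earlier in the thread
before = cache_cap_gib(total_kib, 75)
after = cache_cap_gib(total_kib, 50)
print(f"75%: {before:.1f} GiB, 50%: {after:.1f} GiB, difference: {before - after:.1f} GiB")
```

On those numbers it prints `75%: 94.4 GiB, 50%: 62.9 GiB, difference: 31.5 GiB`, which is why the suggestion looked promising; that it didn't help suggests the swap growth comes from something other than the cache ceiling.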

Will keep this thread updated as I learn more.
