mph999
Level 6
Employee Accredited

Data is sent from the client to a 'buffer'.  From here it is sent to storage (tape or disk).

 

Think of data as 'water' and the buffer as a 'bucket' (in fact multiple buckets).  The data (water) fills up the bucket, as you say, a temporary storage area, before the water is tipped out of the bucket to the storage.  Fairly simple.

 

The buffer is in fact 'shared memory', actual RAM.  The size of these buffers is controlled by the SIZE_DATA_BUFFERS config file, and the number of them by NUMBER_DATA_BUFFERS.  The larger these values, the more memory you need.
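As a rough sketch, these tuning files are just plain text files holding a single number.  The directory below is an assumption based on the usual UNIX media server install location, so check the tuning guide for your platform:

from pathlib import Path

config_dir = Path("/usr/openv/netbackup/db/config")   # assumed location on a UNIX media server

# 256 KB per buffer and 256 buffers - the values suggested later in this article
(config_dir / "SIZE_DATA_BUFFERS").write_text("262144\n")
(config_dir / "NUMBER_DATA_BUFFERS").write_text("256\n")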

 

Shared memory is not unique to NBU; it is one of the ways that processes can communicate with each other.  Other ways include message queues, sockets and named pipes.

 

The config files above are for tape.  There are other files for disk, eg SIZE_DATA_BUFFERS_DISK, which usually works well with higher values, and also files for fibre transport (SAN clients).

A misconception I will mention concerns restore issues.  Quite often, changes are made to SIZE_DATA_BUFFERS when a block size error is seen.  This has no effect, as it is impossible to change the block size once the data is written.  The block size used for the restore is the same as was used for the backup.

Which processes write to / read from the data buffers changes depending on a couple of factors - this is where it gets a bit confusing.

 

1.  Remote client (eg over the network) - bptm child and parent processes

bpbkar sends the data from the client to a tcp socket on the media server.  A child bptm process sends it from here to the shared memory.  A separate bptm process then reads it from shared memory and sends it to the OS, which then sends it to storage (I couldn't resist mentioning it - NBU does NOT write to storage, the OS does, which is why I've said bptm sends the data to the OS).  For the record, NBU does not read data from storage either, guess what, yep - it's that pesky OS again ...

 

2.  Local client (eg media server backing itself up) - only the bptm parent process

Here bpbkar sends the data direct to the shared memory.  There is no need to go to a tcp socket first as we're not going over the network, so this is more efficient.  Then, like before, bptm reads it from shared memory.

 

If you look in the manuals, you will see we don't talk about process names (bpbkar etc); we talk about a data 'provider' and a data 'consumer'.  In (1), bptm is both the provider and the consumer; in (2), bpbkar is the provider and bptm the consumer (ie which process 'provides' the data to shared memory and which process 'consumes' the data from shared memory).
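If it helps to picture the provider/consumer idea, here is a small toy sketch in Python (purely an illustration of the shared memory pattern, nothing to do with the actual bpbkar/bptm code) where one process 'provides' data into a shared memory segment and another 'consumes' it:

from multiprocessing import Process, shared_memory

def provider(name):
    # Attach to the existing segment and 'provide' some data into it.
    shm = shared_memory.SharedMemory(name=name)
    data = b"backup image data"
    shm.buf[0] = len(data)                 # first byte holds the payload length
    shm.buf[1:1 + len(data)] = data
    shm.close()

def consumer(name):
    # Attach to the same segment and 'consume' the data back out of it.
    shm = shared_memory.SharedMemory(name=name)
    length = shm.buf[0]
    print(bytes(shm.buf[1:1 + length]))    # b'backup image data'
    shm.close()

if __name__ == "__main__":
    segment = shared_memory.SharedMemory(create=True, size=1024)
    p = Process(target=provider, args=(segment.name,))
    p.start(); p.join()
    c = Process(target=consumer, args=(segment.name,))
    c.start(); c.join()
    segment.close()
    segment.unlink()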

 

There is a 3rd possibility.

 

3.  Local client (eg media server backing itself up) but with the NOshm touch file.

 

There is a belief that NOshm means shared memory is not used, which is understandable given the name of the touch file.  Unfortunately, that is not true.

 

What NOshm does is make a local client behave like a remote client.  In other words, you will get another bptm process.  So, like a remote client, bpbkar sends the data to a tcp socket (even though we're not going over a network) and a child bptm process is used to send it to the shared memory.  It is usually used for troubleshooting.

 

There are a few other misconceptions about shared memory, or buffers as we generally call them.  The size and number of buffers affect performance.  Generally, for LTO drives, a buffer size of 262144 (= 256k) and 256 buffers work well.

 

Increasing these further is generally a waste of memory (remember, you need more memory if these values increase).  If the backup is already running at maximum performance (that is, data is filling the buckets at about the same rate as it is leaving them), then having more, or bigger, buckets isn't going to help: either the storage can't write any quicker than it already is, or the client (bpbkar) can't send the data any quicker (usually a network limitation, less often a disk read speed limitation).

 

The amount of shared memory required can be calculated:

 

Total memory = (size data buffer x number data buffers) x mpx_value x number_drives - so we use a chunk of memory for each data stream sent to each tape drive, and it soon adds up if you start using mpx.
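As a worked example of that formula (the numbers are purely illustrative, using the LTO values suggested above with an mpx of 4 and two drives):

size_data_buffers = 262144     # 256 KB per buffer
number_data_buffers = 256      # buffers per stream
mpx_value = 4                  # streams multiplexed to each drive
number_drives = 2

total_bytes = size_data_buffers * number_data_buffers * mpx_value * number_drives
print(total_bytes // (1024 * 1024), "MB")   # 512 MB of shared memory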

 

In fact, this equation is not quite correct (it's close enough, and is what is in the manuals), but for the full story ...

 

NBU rounds up the number of streams to the next multiple of 4.

It's easier to show by example:

 

From my bptm log, we see I have 12 data buffers, each with a size of 131072 ...

08:55:45.665 [18200] <2> io_init: using 131072 data buffer size

08:55:45.665 [18200] <2> io_init: CINDEX 0, sched Kbytes for monitoring = 60000

08:55:45.665 [18200] <2> io_init: using 12 data buffers

... therefore, each tape drive, or rather each stream to a tape drive, will require 131072 x 12 = 1572864 bytes.

Now, this example is actually from an MPX backup with 2 streams (mpx = 2), so you might think that the amount of shared memory will be 1572864 x 2 = 3145728.

Now, here is the catch ...

Looking down my bptm log I find these lines ...

08:55:45.748 [18200] <2> mpx_setup_shm: buf control for CINDEX 0 is ffffffff79a00000

08:55:45.748 [18200] <2> mpx_setup_shm: shared memory address for group 0 is ffffffff76800000, size is 6291456, shmid is 117440636

08:55:45.748 [18200] <2> mpx_setup_shm: shared memory address for CINDEX 0 is ffffffff76800000, group 0, num_active 1

So we see the amount of memory allocated is 6291456.

Now, 6291456 / 1572864 = 4

 

So, what has happened is that even though I have one tape drive and 2 streams, the amount of memory NBU allocates is the amount required by 4 streams - NBU has 'rounded it up'.  In fact, it will round up to the next multiple of 4, so if you have 5 streams, it will allocate the same amount of memory as if there were 8 streams.  NBU has always worked this way and it is done for efficiency reasons.
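Here is a small sketch of that sizing, using the numbers from the bptm log above (the rounding rule is to the next multiple of 4, as described):

import math

def shared_memory_for_group(buf_size, buf_count, streams):
    # NBU sizes the allocation for the stream count rounded up to the
    # next multiple of 4 (a sketch of the behaviour described above).
    rounded_streams = math.ceil(streams / 4) * 4
    return buf_size * buf_count * rounded_streams

# Values from the bptm log above: 12 buffers of 131072 bytes, mpx = 2
print(shared_memory_for_group(131072, 12, 2))   # 6291456 - matches the log
print(shared_memory_for_group(131072, 12, 5))   # 12582912 - sized as if 8 streams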

Comments
kwakou
Level 4

Thanks for this very helpful article.

I have a question though: do the config files that control the buffer settings work in AIX?  I've read that they have no effect due to the dynamic memory management in AIX.

Do you know anything about it ?

mph999
Level 6
Employee Accredited

Sorry for the delay in my response.

The settings NUMBER_DATA_BUFFERS / SIZE_DATA_BUFFERS should work fine in AIX.
