Why does NET_BUFFER_SZ dramatically improve the backup speed?

Gavin_Tu
Level 4
Employee

Hi All,

 

I have a real-world issue that I managed to resolve, but I am more interested in why NET_BUFFER_SZ works.

 

Below is the description:

 

Issue

There is a newly installed NBU 7.5 client on the HP-UX 11.31 IA64 platform.

When I try to back up the Oracle database, I find the backup speed is slow, and the same thing happens for file backups.

The file backup policy is defined to back up the path /stand, which holds about 366 MB.

Error

The Job Details show the speed is about 1 MB/s for the file backup, but manually copying the files between the client and the master is faster.

The bpbrm, bptm and bpbkar processes run fine without any issue, but bpbkar spends 7 minutes sending 366 MB of data: 366 MB / 7 min ≈ 52 MB/min ≈ 0.87 MB/s.

13:00:33.197 [20457] <2> logparams: bpbkar32 -r 604800 -ru root -dt 0 -to 0 -bpstart_time 1394686961 -clnt domsgs-BackupDB -class test_stand -sched full -st FULL -bpstart_to 300 -bpend_to 300 -read_to 300 -blks_per_buffer 512 -use_otm -fso -b domsgs-BackupDB_1394686661 -kl 28 -use_ofb 

13:00:33.301 [20457] <2> bpbkar SelectFile: INF - Resolved_path = /stand 
13:00:33.301 [20457] <4> bpbkar SelectFile: INF - VxFS filesystem is /stand for /stand 
13:00:33.335 [20457] <2> bpbkar resolve_path: INF - Actual mount point of /stand is /stand 
13:00:33.353 [20457] <2> fscp_is_tracked: disabled tla_init 
13:07:22.596 [20457] <2> bpbkar resolve_path: INF - Actual mount point of / is / 
13:07:22.596 [20457] <4> bpbkar expand_wildcards: end backup for filelist /stand 
13:07:22.596 [20457] <4> bpbkar main: INF - Client completed sending data for backup

 

However, running bpbkar to just read the data from /stand takes only about 5 seconds:

bpbkar -nocont -dt 0 -nofileinfo -nokeepalives /stand 


15:25:00.411 [22525] <2> bpbkar SelectFile: INF - Resolved_path = /stand 
15:25:00.411 [22525] <4> bpbkar SelectFile: INF - VxFS filesystem is /stand for /stand 
15:25:00.412 [22525] <2> bpbkar resolve_path: INF - Actual mount point of /stand is /stand 
15:25:00.419 [22525] <2> fscp_is_tracked: disabled tla_init 
15:25:05.063 [22525] <2> bpbkar resolve_path: INF - Actual mount point of / is / 
15:25:05.063 [22525] <4> bpbkar expand_wildcards: end backup for filelist /stand 
15:25:05.063 [22525] <4> bpbkar main: INF - Client completed sending data for backup 

15:25:05.063 [22525] <2> bpbkar main: INF - Total Size:366915603 

Reading the bptm log, I found:

12:57:46.126 [4748.8120] <2> io_set_recvbuf: setting receive network buffer to 1049600 bytes

The Job Details show:

bptm(pid=2572) waited for full buffer 2370 times, delayed 92131 times   

 

 

Environment:

Master/media server: Windows 2008 R2, NBU 7.5

Client: HP-UX 11.31 IA64, NBU 7.5

 

Solution:

On the client:

echo "0" > /usr/openv/netbackup/NET_BUFFER_SZ

On the master:

Create the NET_BUFFER_SZ file under /install_path/netbackup and put the value 0 in it.

 

NET_BUFFER_SZ=0 means the operating systems on the client and the master negotiate with each other and choose better TCP parameters for sending the data between them.

 

My question is: why does NET_BUFFER_SZ=0 dramatically improve my backup speed, from 1 MB/s to 40 MB/s?

Please explain this in depth, including the TCP/IP side of it.

 

Thanks for your help in advance!

Gavin


5 REPLIES

Mark_Solutions
Level 6
Partner Accredited Certified

It is all about hitting the sweet spot on all of your buffer settings. The values can be different for every backup type and environment, and there is a correlation between the network and data buffers (both size and number).

The 7.1 tuning guide goes through it all. It gets pretty heavy, but it shows how all of the buffers interact with each other. It sounds like you have found the perfect set of buffers for your setup.

The tuning guide is here:

http://www.symantec.com/docs/DOC4483

ontherocks
Level 6
Partner Accredited Certified

When a backup is initiated, the client packages data in chunks of the size specified by the Buffer_size value and transfers it to the media server, which in turn buffers that data in its NET_BUFFER_SZ receive buffer. When the NET_BUFFER_SZ buffer is full, the data is transferred into the pool of shared memory created by the combination of NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS. As soon as at least one of those data buffers is full, its contents are written to the tape drive.
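
To make that pipeline concrete, here is a minimal sketch of how the related touch files are usually set on a UNIX/Linux media server (paths assume a default /usr/openv install; a Windows media server keeps the same files under its install_path\NetBackup tree). The values are illustrative examples only, not recommendations:

# Receive buffer for data arriving from the client (must be a multiple of 1024)
echo 262144 > /usr/openv/netbackup/NET_BUFFER_SZ

# Shared-memory data buffers that feed the tape/disk write
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
echo 64 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS

# Rough shared memory used per busy drive = SIZE_DATA_BUFFERS x NUMBER_DATA_BUFFERS
echo $((262144 * 64))   # 16777216 bytes (16 MB)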

NET_BUFFER_SZ

Note that the NET_BUFFER_SZ value is the size of the buffer on the media server that receives data from the client.

To change NET_BUFFER_SZ:

1. Create the NET_BUFFER_SZ file in the <INSTALL_PATH>\NetBackup directory.

Note: The file name is case sensitive and must have no extension. The value in the file must be a multiple of 1024. If the file is not present, the default value 262144 (or 256K) is used.

2. Add the appropriate value to this file.
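
Once the file is in place, a quick way to confirm the value bptm actually used is to check its debug log for the io_set_recvbuf/io_set_sendbuf entries after the next backup. A sketch for a UNIX/Linux media server (the log directory must exist before bptm will write to it; on a Windows media server the log lives under install_path\NetBackup\logs\bptm):

# Legacy bptm debug logging only happens if this directory exists
mkdir -p /usr/openv/netbackup/logs/bptm

# After the next backup, look for the buffer entries seen earlier in this thread
grep -E "io_set_(recvbuf|sendbuf)" /usr/openv/netbackup/logs/bptm/log.*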

mph999
Level 6
Employee Accredited

From http://www.symantec.com/docs/TECH28339

The NET_BUFFER_SZ file is configured on a media server or UNIX/Linux client, but appears to be ignored. 

Similarly, the HKEY_LOCAL_MACHINE\SOFTWARE\VERITAS\NetBackup\CurrentVersion\Config\Buffer_Size

registry key (Communication Buffer Size GUI setting) is configured for a Windows client, but appears to be ignored.

Historically, these settings have been used to adjust the TCP SO_SNDBUF and SO_RCVBUF on media server and client hosts.  Those adjustments allowed the sending TCP stack to accept additional outbound application data while waiting for TCP acknowledgements, and allowed the receiving TCP stack to buffer a larger amount of inbound data while either waiting to properly sequence missing frames or waiting for the receiving application to read the already sequenced data. 
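
As an illustration of what those socket options sit on top of, the operating system defaults that SO_SNDBUF/SO_RCVBUF would override can be inspected from the shell. A sketch for Linux (sysctl) and HP-UX 11.x (ndd); the parameter names are the usual ones on those platforms but are worth verifying against your own release:

# Linux: default and maximum socket buffer sizes, in bytes
sysctl net.core.rmem_default net.core.rmem_max
sysctl net.core.wmem_default net.core.wmem_max

# HP-UX 11.x: default TCP receive/send high-water marks, in bytes
ndd -get /dev/tcp tcp_recv_hiwater_def
ndd -get /dev/tcp tcp_xmit_hiwater_def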

Is there a reason these settings sometimes do not work or cause unexpected behaviors?

 

Error

A review of the debug logs, or of the TCP window sizes in the packets of a network trace, suggests the operating system (O/S) is ignoring the setsockopt API call to adjust the TCP memory for the socket.

 

The bptm debug log shows that NetBackup detected the new setting and called the setsockopt API to change the TCP send space to 256 KB, but the subsequent call to the getsockopt API indicates that the value remains unchanged.

 

<2> io_set_sendbuf: setting send network buffer to 262144 bytes

<2> io_set_sendbuf: send network buffer is 65536 bytes

 

The results are similar during the restore.

 

<2> io_set_recvbuf: setting receive network buffer to 262144 bytes

<2> io_set_recvbuf: receive network buffer is 65535 bytes

 

Similar log entries will appear in the Job Details at NB 7.1+ and also in the client side debug logs.

 

To see the debug log entries, turn up TCP logging on Windows clients and VERBOSE logging on media servers and UNIX/Linux clients.
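
On the UNIX/Linux client side that boils down to raising the legacy logging level and making sure the process log directories exist; a rough sketch (Windows clients set the equivalent TCP logging level through Host Properties):

# Raise the legacy debug verbosity for NetBackup processes on this client (0-5)
echo "VERBOSE = 5" >> /usr/openv/netbackup/bp.conf

# bpbkar only writes its debug log if the directory already exists
mkdir -p /usr/openv/netbackup/logs/bpbkar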

 

 

Environment

NetBackup 3.x - 7.x

Cause

A bit of history and evolution

 

These configurable settings were added to an early version of NetBackup because operating systems (O/S) at that time did not commonly allocate very much TCP memory.  The O/S was concerned with conserving an expensive and limited resource (RAM) and most network traffic was small bursts of data compared to the huge amount of bandwidth consumed by a full system backup.  It was useful to have NetBackup request the larger amount of TCP memory to smooth out the data flow and provide better performance if a connection had significant latency or dropped frames frequently [requiring a wait for retransmission].

 

In addition, the early versions of NetBackup used what is now called legacy callback from the client to bptm; this allowed bptm to adjust the TCP memory before the socket was used.

 

Since those days, networking and NetBackup have evolved significantly.  Network stacks now use complex algorithms to shape data transmission based on real-time network latency, to recover smoothly and more accurately when frames are lost, and to dynamically adjust the TCP memory based on overall real-time system load.

 

Similarly, to accommodate firewalls, NetBackup has newer processes (PBX and vnetd) which listen on behalf of other processes and exchange a small amount of protocol over the connection before it is handed off to the end processes, specifically bptm and the client processes.

 

The resulting potential conflicts

 

Some newer operating system versions (Linux especially) do not allow TCP memory to be adjusted by the application once a connection has been established.  This negates the efforts of bptm to adjust the TCP memory.  See the TCP(7) man page for details.

 

Some operating systems (Linux and Windows especially) disable memory auto-tuning if an application manually adjusts the TCP memory, thus preventing the system from automatically making the data transfer even more efficient when additional memory is available.
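
On Linux, for example, the auto-tuning this paragraph refers to can be seen from the shell: net.ipv4.tcp_moderate_rcvbuf enables per-connection receive-buffer auto-tuning, and the tcp_rmem/tcp_wmem triples are the min/default/max ranges the kernel tunes within. Per the tcp(7) man page, a socket that explicitly sets SO_RCVBUF or SO_SNDBUF opts out of that tuning. A quick check:

# 1 = the kernel auto-tunes each connection's receive buffer
sysctl net.ipv4.tcp_moderate_rcvbuf

# min / default / max bytes the auto-tuner may grow the buffers to
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem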

 

Network algorithms are increasingly complex, and some driver versions are not well written and do not react well when applications request to resize the TCP memory.

 

The results can be unexpected and/or undesirable.  In some cases the operating system

 

  • silently ignores the setsockopt request (as in the examples above).
  • tries to adjust the memory but under-performs thereafter (see Related Articles).
  • tries to adjust the memory but has timing and/or data tracking problems and drops the connection after some time (see Related Articles).

 

Solution

A modern, well configured operating system with properly written TCP drivers is unlikely to need TCP memory tuning by NetBackup.  Accordingly, the best NetBackup configuration is to disable tuning by placing a zero (0) into the NET_BUFFER_SZ file on media servers and UNIX/Linux clients.  Simply deleting the file is not equivalent, because some NetBackup processes have default setsockopt API calls configured to overcome past external problems with various platforms and drivers.
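
A minimal sketch of that recommendation on a UNIX/Linux media server or client (the Windows master in this thread needs the equivalent file under its install path, as described above):

# Disable NetBackup's own TCP buffer tuning and let the OS negotiate
echo "0" > /usr/openv/netbackup/NET_BUFFER_SZ

# Confirm the file exists and holds a literal 0; deleting it is not equivalent
cat /usr/openv/netbackup/NET_BUFFER_SZ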

Gavin_Tu
Level 4
Employee

Hi mph999, thanks a lot for your reply. I see you referred to the KB article TECH28339, which I have already reviewed.

I will update this topic later.

Thanks

Gavin

Gavin_Tu
Level 4
Employee

Hi All,

There is another KB, TECH189457, which describes using strace on the bpbkar process to show where the delay occurs.

You may need to review it in Inquira manager.

Thanks

Gavin