AAlmroth
Level 6
Partner Accredited
  


Introduction

Symantec NetBackup is an enterprise-class data protection product with a huge portfolio of features and broad platform and application support. In heterogeneous environments, however, the out-of-the-box operating system and NetBackup settings are often not sufficient. NetBackup supports many platforms for the master server, media servers, and clients, including Windows 2003 and Windows 2008 (and 2008 R2 as of version 7). Unix and Linux platforms seem to cope better with the I/O load in almost all cases, whereas the Windows platform is not equally well suited.

Fortunately, there are many tuning parameters available in the NT kernel, and some are relevant to NetBackup. This article will cover a few of these parameters and settings that I have discovered over the years working with NetBackup. Microsoft decided, for some reason, not to use good default values for high I/O load, although with Windows 2008, a lot more parameters are auto-tuned for this type of load. This article covers Windows 2003 as well as Windows 2008, and differences in tuning will be pointed out.

Since NetBackup is a backup product, the I/O paths are the primary concern: we need to move data as efficiently as possible in order to minimize the strain on the infrastructure and keep backup windows short. What not everyone considers is that we also have to restore all that data efficiently, so fast I/O from tape and disk is just as important.

Note: All numeric values given for Windows registry parameters are decimal, not hexadecimal (the default in the registry editor).


General considerations

I/O paths

For an application such as NetBackup, a key factor for success is to design the I/O paths so that the maximum throughput is possible on the server’s backplane. Typically the data enters through the network interfaces, goes via the CPU, and is then sent on to the tape drives or disks.

With regard to network interfaces, it is preferable to team multiple interfaces for incoming traffic. This requires configuring the network switch to allow IEEE 802.3ad link aggregation. It is important to let the switch distribute incoming packets in order to fully utilize the bandwidth.

Normal host-based teaming usually supports only failover and outbound traffic load balancing. For NetBackup, outbound load balancing is seldom useful, unless vaulting between media servers or using a network-based Disk Pool appliance such as PureDisk, NetApp or DataDomain.

With regard to HBAs for SAN connectivity, the I/O for disk and tape should be split. Tape I/O is synchronous and can impact the disk I/O severely. Also, use several HBA ports in order to distribute traffic to the tape drives. For example, a 4 Gbit HBA port can serve up to four LTO3 drives, but real-world experience shows that a maximum of two drives per port works better, due to I/O interrupt handling and other hardware and kernel constraints. If possible, use several single-port HBAs rather than one multi-port HBA, and distribute them over the available I/O slots in the server. This typically improves the balancing of the I/O on the backplane, CPU, and memory.
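
The drives-per-port reasoning can be sketched as simple arithmetic. The figures below are assumptions and not taken from this article: roughly 400 MB/s nominal for a 4 Gbit Fibre Channel port, and 80 MB/s native streaming rate for an LTO3 drive. Compression and real-world overhead will change the numbers.

```python
# Assumed nominal figures (not from the article):
PORT_MB_S = 400          # approximate throughput of one 4 Gbit FC port
LTO3_NATIVE_MB_S = 80    # native (uncompressed) LTO3 streaming rate


def port_utilisation(drives, drive_mb_s, port_mb_s):
    """Fraction of the port bandwidth used when `drives` drives stream at once."""
    return drives * drive_mb_s / port_mb_s


# Four drives fit on paper, but leave little headroom for compressible data.
print(f"{port_utilisation(4, LTO3_NATIVE_MB_S, PORT_MB_S):.0%}")  # 80%
# Two drives per port leave plenty of room for bursts and compression.
print(f"{port_utilisation(2, LTO3_NATIVE_MB_S, PORT_MB_S):.0%}")  # 40%
```

On these assumed numbers, two drives per port keep the port well below saturation, which fits the practical recommendation above.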


Persistent bindings

It is of utmost importance to configure every HBA with persistent bindings, so as not to “confuse” the Windows kernel if a path disappears and later becomes active again, or if the server is rebooted. In many cases the kernel will allocate a new internal path name, making the path NetBackup is using non-functional. The symptom seen in NetBackup is MISSING PATH in the Device Monitor. The various HBA vendors have their own tools for configuring the settings on the HBA, so refer to the respective vendor’s documentation and tools when configuring persistent bindings.


Software

Windows 2003 R2 with all applicable updates is preferred. Additional software for SAN connectivity is required when disk and/or tape drives are SAN attached, and should be the latest recommended versions from the respective vendors.

Anti-virus software is common on Windows, and in order to maintain some performance it must be configured to exclude NetBackup processes and directories. For instance, see tech note 295599 (Symantec, 2008b) for further information on excluding directories and processes for McAfee.


Services

On any Windows server there are many services started automatically. Some can safely be left started, but one service in particular should always be disabled: the Removable Storage service. This service tends to interfere with NetBackup’s device management and should be disabled in the Services section of the Computer Management Console. Follow the instructions in tech note 245559 (Symantec, 2003).

As a consequence of disabling the Removable Storage service, the system may log events in the system log regarding DCOM errors. These errors are harmless, and the workaround is presented in tech note 240378 (Symantec, 2008a).

Another consequence is that when the NetBackup servers themselves are backed up, the bpbkar process logs an error because RSM is not running. This can be solved by excluding the <system_drive>:\WINDOWS\system32\ntmsdata directory on those servers. Please see tech note 247001 (Symantec, 2004) for more information.


Device drivers

Always strive to use the latest supported combination of device drivers and firmware for network adapters and HBAs. Today, most drivers come with default settings that are close to optimal. We can, however, tune the OS kernel further in order to remove some overhead and avoid some issues.


Disable device driver verification

Windows 2003 and 2008 perform random testing of device drivers by default. By disabling this we can gain better performance, as we really don't want the kernel to spend time on randomly testing, for debugging purposes, drivers that we know are working fine. This parameter is documented for Windows 2003 and 2008, but I have not yet found conclusive evidence for Windows 2008R2.


Parameter: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\DontVerifyRandomDrivers
Value: 1
 


Disable Test Unit Ready

When using tape libraries and tape drives on a SAN where the tape drives are shared among several media servers, it is highly recommended to disable Test Unit Ready (TUR) functionality for the tape device drivers. Follow the procedure documented by Microsoft (2009a). The impact is primarily where NetBackup is configured for Shared Storage Option (SSO) for tape drives, as any Windows based media server potentially will send SCSI commands to the drives to check whether they are ready. In SSO configurations, a tape drive may very well be in use by another host, and any SCSI commands sent from another server would interfere, and backups and restore operations will experience problems such as slow performance or even failures.


Virtual memory

It is important to properly size the virtual memory swap file prior to installing NetBackup. A general recommendation is to have a swap file of at least two times the size of physical memory, preset to that size rather than auto-extended.

The reason for this is that when a swap file must be extended automatically, the I/O operation in memory is denied and a failure is reported in NetBackup (most likely a status 81 for the jobs). This in turn effectively aborts the backup job on the media server. This is a behavior of the Windows operating system, and can only be avoided by pre-sizing the swap file.
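
As a minimal sketch of the sizing rule above (the function name and the 8 GB example are illustrative assumptions), the initial and maximum size are set to the same value so that Windows never has to extend the file:

```python
def swap_file_size_mb(physical_memory_mb, factor=2):
    """Return (initial, maximum) swap file size in MB.

    Both values are identical, so the file is pre-sized and never auto-extended.
    """
    size = physical_memory_mb * factor
    return size, size


# Hypothetical server with 8 GB of physical memory.
initial, maximum = swap_file_size_mb(8 * 1024)
print(f"Initial size: {initial} MB, maximum size: {maximum} MB")
```

The two values are then entered as the custom initial and maximum size in the virtual memory settings.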


Storage system

Cache

By default the Windows 2003 operating system is optimized for file services, and thus prioritizes the file system cache in memory. For media servers sending data directly to tape, a NAS device, or other OpenStorage devices, it may be better to optimize the kernel for applications instead. Media servers with Disk Storage Units (DSU) of Basic or Enterprise type may be better off with the default setting, though, in order to have a file cache.

Two registry variables are of interest when tuning the file system cache:

Parameter: HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\Size
Value: 3
 
The default for the Size variable is 3, which maximizes throughput for file sharing as well as for network applications in general.

Parameter: HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache
Value: 0
 
The LargeSystemCache variable should be set to 0 in order to minimize the file system cache and thus allow more memory for network applications. On servers with plenty of memory, say 8GB or more, the settings may very well be left unchanged.


Disabling Last accessed

The NTFS file system records the last accessed time for each file and directory, adding to the I/O operations required when accessing files. An access is defined as any type of operation, such as listing a directory, reading or writing, or otherwise updating a file or directory.

If the last access information is not required by company or audit policies, the NetBackup master server can benefit from disabling it. As the catalog database consists of many thousands if not millions of files, having the kernel update the time stamp on every file access adds overhead.

Parameter: HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\NTFSDisableLastAccessUpdate
Value: 1
 
The variable has to be added, using the DWORD type. Set the value to 1 in order to disable last access time stamping.


Disabling 8.3 file names

The NTFS file system keeps a short (8.3) name for every file in order to maintain compatibility with older operating systems. This is not required for a NetBackup master server, and by disabling it we decrease the number of I/O operations needed per file creation. Once it is disabled, no 16-bit applications may be run on the master server.

Parameter: HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\NTFSDisable8dot3NameCreation
Value: 1
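
For those who prefer to script the change rather than use the registry editor, here is a small sketch that only builds the corresponding `reg add` command lines for the two NTFS settings; it does not touch the registry itself. The helper name `reg_add_dword` is an illustrative assumption. Run the printed commands from an elevated prompt and test on a non-production server first.

```python
FILESYSTEM_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"


def reg_add_dword(key, name, value):
    """Build a `reg add` command line that sets a REG_DWORD value (decimal data)."""
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'


# Print one command per NTFS setting discussed above.
for name in ("NTFSDisableLastAccessUpdate", "NTFSDisable8dot3NameCreation"):
    print(reg_add_dword(FILESYSTEM_KEY, name, 1))
```

The `/f` switch overwrites an existing value without prompting, so drop it if a confirmation is preferred.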
 


Networking

There are a number of TCP parameters that can be tuned in order to accommodate typical NetBackup I/O. The I/O pattern for a Windows server is normally not a sustained data transfer, but rather short bursts of I/O.


TCP keepalive time

In certain situations there can be a delay before a NetBackup master server detects that the connection to a media server has been lost. For example, if a media server goes down while running a backup, it may take a while before the master server detects that the media server is no longer available. While at first it may appear that there is a problem with the NetBackup master server, the delay is actually the result of a TCP/IP configuration parameter called KeepAliveTime, which is set to 7,200,000 (two hours, in milliseconds) by default. Decrease the value to 900000 (15 minutes).

The effect of this delay is that NetBackup jobs running on that media server appear to be active for a period of time after the connection to the media server has gone down. In some cases this can result in an undesirable delay before the current backup job fails and is subjected to the normal NetBackup retry logic for execution on a different media server, if one is available.

Another scenario where it is important to use a low timeout is where a firewall is in the I/O path. Typically this is the case in secure networks or when backing up servers in a DMZ or otherwise untrusted network.

Firewalls typically drop a session if no traffic occurs for a set time. NetBackup does not respond very well to this, and the jobs will fail. This usually happens during incremental backups, as it can take a very long time before the client sends any data to the media server. Setting KeepAliveTime to a value lower than the firewall's timeout solves this problem.

Parameter: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime
Value: 900000 (0xDBBA0 in hexadecimal)
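
KeepAliveTime is expressed in milliseconds, and the registry editor shows hexadecimal by default, so it is easy to mistype. A minimal sketch of the conversion follows; the helper `keepalive_below_firewall` and its 5 minute safety margin are illustrative assumptions, not a NetBackup or Microsoft recommendation.

```python
def minutes_to_ms(minutes):
    """Convert minutes to milliseconds, the unit used by KeepAliveTime."""
    return minutes * 60 * 1000


def keepalive_below_firewall(firewall_timeout_min, margin_min=5):
    """KeepAliveTime (ms) that stays `margin_min` minutes below a firewall idle timeout."""
    if firewall_timeout_min <= margin_min:
        raise ValueError("firewall timeout must exceed the safety margin")
    return minutes_to_ms(firewall_timeout_min - margin_min)


default_ms = minutes_to_ms(120)
tuned_ms = minutes_to_ms(15)
print(default_ms, hex(default_ms))  # 7200000 0x6ddd00
print(tuned_ms, hex(tuned_ms))      # 900000 0xdbba0

# Hypothetical firewall that drops idle sessions after 60 minutes.
print(keepalive_below_firewall(60))  # 3300000
```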
 


TCPWindowSize and Window Scaling

In Windows 2003, TcpWindowSize should be set to the maximum value of 65535 for gigabit network interfaces.
Windows 2008 (and R2): this parameter is obsolete and disregarded by the kernel.

Parameter: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize
Value: 65535
 
For Windows 2003, it may also be useful to enable TCP window scaling in order to allow windows larger than 64 KB. Tuning this may not actually be necessary; only testing will show whether it improves the I/O throughput. Windows supports the RFC 1323 options.

Parameter: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts
Value: 1
 
With window scaling enabled, the TcpWindowSize variable can be set to a value of up to 1 GB. Once the variables are set and the system has been rebooted, the TCP/IP stack will support large windows.
Windows 2008/2008R2: TcpWindowSize is obsolete in Windows 2008 (and R2), and the same holds true for Tcp1323Opts.


MaxHashTableSize

On media servers with many concurrent connections, for example with high multiplexing and many concurrent sessions to disk, it may be useful to set this variable to a higher value than the default. The default is calculated as 128 * CPUs^2. The maximum value is 65535 (type DWORD).

Windows 2008/2008R2: this parameter is obsolete and disregarded by the kernel.

Parameter: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\MaxHashTableSize
Value: 65535
 


NumTcbTablePartitions

By default this variable is calculated as CPU^2. This may not be the best setting for servers with 8 or more CPUs. For most large servers it is better to use a value equal to 4 x CPU.

Windows 2008/2008R2: this parameter is obsolete and disregarded by the kernel.

Parameter: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\NumTcbTablePartitions
Value: 16
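
To see how the defaults quoted in this and the previous section scale with CPU count, here is a small sketch that applies the formulas exactly as stated in the text (128 * CPUs^2, CPU^2 and 4 x CPU). The function names are illustrative assumptions. Note that the value 16 shown above matches a server with 4 CPUs.

```python
def default_max_hash_table_size(cpus):
    """Default MaxHashTableSize as described in the text: 128 * CPUs^2."""
    return 128 * cpus ** 2


def default_tcb_partitions(cpus):
    """Default NumTcbTablePartitions as described in the text: CPU^2."""
    return cpus ** 2


def suggested_tcb_partitions(cpus):
    """Value suggested in the text for large servers: 4 x CPU."""
    return 4 * cpus


print("CPUs  hash-default  tcb-default  tcb-suggested")
for cpus in (2, 4, 8, 16):
    print(f"{cpus:4d}  {default_max_hash_table_size(cpus):12d}  "
          f"{default_tcb_partitions(cpus):11d}  {suggested_tcb_partitions(cpus):13d}")
```

From 8 CPUs upward the default partition count grows much faster than the suggested 4 x CPU, which is where the manual setting starts to matter.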
 


MaxUserPort

The default number of ports per IP address is only 5000. For a large NetBackup domain this may not be sufficient to allow a large number of concurrent connections between the master server, media servers, and clients. The variable is really only useful on master and media servers, unless the client is heavily loaded as well, for example when it also serves as a web or database server.

Windows 2003 supports up to 65534 concurrent ports per IP address. The variable does not exist by default, and must be created manually. The first 1024 ports are reserved, so it makes little sense to set it to the maximum value. If a host has more than 60000 concurrent connections, we probably have other problems such as CPU and disk bottlenecks, but a value of 60000 at least leaves us ample room.

Parameter: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort
Value: 60000
 
In Windows 2008, including Windows 2008R2, the way of setting this has changed, and we use the netsh command to configure the start port and the size of the range. By default, the start port is 49152 and the end port is 65535, which leaves us with 16384 usable dynamic ports. If the NetBackup environment is very large, we may still have to tune the available range. This is done by entering the following commands, which make ports 10000 through 59999 (50000 ports) available:

netsh int ipv4 set dynamicport tcp start=10000 num=50000
netsh int ipv4 set dynamicport udp start=10000 num=50000
netsh int ipv6 set dynamicport tcp start=10000 num=50000
netsh int ipv6 set dynamicport udp start=10000 num=50000
 
The UDP ports are set to the same range for consistency only; NetBackup does not really use UDP ports.
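
The range arithmetic is easy to get wrong by one, so here is a short sketch that checks a start/num pair and generates the four netsh commands in the same form as above. The helper names are illustrative assumptions, and the lower bound of 1024 simply reflects the reserved ports mentioned earlier.

```python
MAX_PORT = 65535


def port_range(start, num):
    """Return (first, last) port of a dynamic range, rejecting invalid ranges."""
    last = start + num - 1
    if start < 1024 or num < 1 or last > MAX_PORT:
        raise ValueError(f"invalid dynamic port range: start={start} num={num}")
    return start, last


def netsh_commands(start, num):
    """Build the netsh commands for IPv4/IPv6 and TCP/UDP."""
    port_range(start, num)  # validate before generating anything
    return [
        f"netsh int {ip} set dynamicport {proto} start={start} num={num}"
        for ip in ("ipv4", "ipv6")
        for proto in ("tcp", "udp")
    ]


print(port_range(49152, 16384))  # (49152, 65535), the Windows 2008 default
print(port_range(10000, 50000))  # (10000, 59999)
for command in netsh_commands(10000, 50000):
    print(command)
```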


Processes

Kernel threads

By default, the Windows operating system does not optimize the kernel settings for many concurrent threads. When the OS starts, the kernel allocates structures for the kernel worker threads, which carry out the actual work required by the running processes, device driver I/O, the kernel itself, and other internal components.

NetBackup puts a very high load on the master and media servers, as many processes are started for each active job. Typically, a master server is maxed out with the default kernel thread settings once the domain reaches approximately 300 clients.

We could spread the backup window for the clients, but that may not always be possible due to other constraints. What we can do is to allocate the maximum possible kernel threads, so that the kernel can serve as many processes as possible at any time.

We are interested in three variables covering kernel threads:

• DefaultNumberofWorkerThreads
• AdditionalDelayedWorkerThreads
• AdditionalCriticalWorkerThreads

The DefaultNumberofWorkerThreads variable controls the number of threads allocated for each work queue in the kernel. Note: Allocating too many threads may use more system resources than is optimal.
Delayed worker threads are used for work that is not real-time or otherwise time-critical. Memory for these threads may be swapped out from CPU cache and memory while in the queue.
Critical worker threads serve time-critical work with high priority, and their memory pages must stay in CPU cache or memory.

Parameter: HKLM\SYSTEM\CurrentControlSet\Services\RpcXdr\Parameters\DefaultNumberofWorkerThreads
Value: 64

Parameter: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive\AdditionalDelayedWorkerThreads
Value: 16

Parameter: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive\AdditionalCriticalWorkerThreads
Value: 16
 
All three variables use DWORD as type. The AdditionalDelayedWorkerThreads and AdditionalCriticalWorkerThreads variables should already exist, but the RpcXdr\Parameters\DefaultNumberofWorkerThreads path and variable will have to be created.

The AdditionalDelayedWorkerThreads and AdditionalCriticalWorkerThreads variables should be set to a value of 16, and DefaultNumberofWorkerThreads to 64.
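
Since the values in this article are decimal while .reg files store DWORDs in hexadecimal, here is a hedged sketch that renders the three settings as .reg file text for review before importing. The function names and the SETTINGS layout are illustrative assumptions. Note the space in "Session Manager" in the key path.

```python
# Key paths and decimal values as discussed in the text.
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\RpcXdr\Parameters": {
        "DefaultNumberofWorkerThreads": 64,
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Executive": {
        "AdditionalDelayedWorkerThreads": 16,
        "AdditionalCriticalWorkerThreads": 16,
    },
}


def dword(value):
    """Format a decimal integer the way .reg files store a DWORD (8 hex digits)."""
    return f"dword:{value:08x}"


def render_reg_file(settings):
    """Render a {key: {name: value}} mapping as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            lines.append(f'"{name}"={dword(value)}')
        lines.append("")
    return "\n".join(lines)


print(render_reg_file(SETTINGS))
```

Importing such a file creates the RpcXdr\Parameters key if it does not exist yet; as with any registry change, take a backup first and reboot afterwards.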


CPU affinity

On media servers with many CPUs it can be beneficial for the I/O throughput to control which CPUs handle network I/O and which CPUs handle tape or disk I/O. By controlling this we can tell the OS kernel thread scheduler not to do unnecessary context switches, but to let the various I/O threads stay on their respective CPUs. Context switching and memory page faults are very expensive in high I/O load applications such as NetBackup.

The CPU affinity can be configured by using the Interrupt Filter Configuration Tool (intfiltr.exe), available in the Windows 2003 Resource Kit Tools.
NOTE: Use great care when using this tool, and be on the physical console.

The tool allows selecting the various devices present in the system. Select a network device and add it to the interrupt filter. Note: It may be necessary to select “Don’t Restart Device when Making Changes” prior to adding it to the filter in order to avoid a service interruption or a crashed system.

Once the device is present in the filter, the CPU mask can be set by clicking the “Set Mask” button in the “Interrupt Affinity Mask” box.

NOTE: Some devices may not work with the affinity setting. A reboot may be necessary, and if the device still does not work after a reboot, removal of the filter is required, and no CPU affinity can be used for that device.
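
An affinity mask is a bitmask in which bit n stands for CPU n. As a minimal sketch (the CPU split below is a hypothetical example, not a recommendation), the masks for separating network and storage interrupts can be worked out like this:

```python
def affinity_mask(cpus):
    """Build a bitmask with bit n set for every CPU index n in `cpus`."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU indexes start at 0")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """List the CPU indexes whose bits are set in `mask`."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Hypothetical 4-CPU media server: CPUs 0-1 for network, CPUs 2-3 for HBAs.
network_mask = affinity_mask([0, 1])
storage_mask = affinity_mask([2, 3])
print(hex(network_mask), hex(storage_mask))  # 0x3 0xc
print(cpus_in_mask(storage_mask))            # [2, 3]
print(network_mask & storage_mask == 0)      # True: the two sets do not overlap
```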

On Windows 2008R2, the kernel provides better control of resources using the NUMA (non-uniform memory access) architecture. Applications that demand high performance can be written so that their threads are distributed over several cores or kept on one CPU. In general, applying the principle of locality generates fewer context switches on the CPUs.

In Windows 2008, the intfiltr.exe tool has been replaced by the IntPolicy tool (Microsoft, 2007).



References

Microsoft (2003) Performance Tuning Guidelines for Windows Server 2003. [Online]. Available from: http://download.microsoft.com/download/2/8/0/2800a518-7ac6-4aac-bd85-74d2c52e1ec6/tuning.doc (Accessed: 22 July, 2010)

Microsoft (2007) Interrupt-Affinity Policy Tool. [Online]. Available from: http://www.microsoft.com/whdc/system/sysperf/IntPolicy.mspx (Accessed: 2 August, 2010)

Microsoft (2009a) Windows Server 2003 cannot perform backup jobs to tape devices on a storage area network. [Online]. Available from: http://support.microsoft.com/kb/842411 (Accessed: 22 July, 2010)

Microsoft (2009b) Performance Tuning Guidelines for Windows Server 2008 R2. [Online]. Available from: http://www.microsoft.com/whdc/system/sysperf/Perf_tun_srv-R2.mspx (Accessed: 21 July, 2010)

Symantec (2003) How to disable the Removable Storage Manager service to avoid conflict with VERITAS NetBackup. [Online]. Available from: http://seer.entsupport.symantec.com/docs/245559.htm

Symantec (2004) Problems report showing Removable Storage Management Win32 1058 error. [Online]. Available from: http://seer.entsupport.symantec.com/docs/247001.htm

Symantec (2008a) GENERAL ERROR: After disabling Removable Storage Management (RSM) services on Windows 2000 and 2003, the system event viewer log reports Evt ID: 10005. NtmsSvc DCOM errors. [Online]. Available from: http://seer.entsupport.symantec.com/docs/240378.htm

Symantec (2008b) 3RD PARTY: NetBackup Services are randomly shutting down on Windows servers. [Online]. Available from: http://seer.entsupport.symantec.com/docs/295599.htm

Symantec (2010a) Symantec NetBackup™ Backup Planning and Performance Tuning Guide - UNIX, Windows, and Linux - Release 6.5. [Online]. Available from: ftp://exftpp.symantec.com/pub/support/products/NetBackup_Enterprise_Server/307083.pdf (Accessed: 21 July, 2010)
 
Comments
TROE
Level 4
Thanks for the article!  One quick note - it appears that Removable Storage Manager is not installed by default with Server 2008 (at least not in Server 2008 x64 - haven't checked x32).  It does show up as an optional Feature in Server Manager.  One less thing to worry about if you're running on Server 2008.
StefanosM
Level 6
Partner    VIP    Accredited Certified
2008 (and windows 7) does not support tape devices for the embedded backup application. That's why there is no RSM.
Very good
Nicolai
Moderator
Partner    VIP   

Nice Article Andreas - Will we see one for Linux/Unix also ?
AAlmroth
Level 6
Partner Accredited

Isn't Unix and Linux already top tuned!?! :)

No, seriously, I do have a document for Solaris and Linux, and working on a section for HP/UX and AIX. I've decided to wait with publishing that article until those sections are in place.

/A

AAlmroth
Level 6
Partner Accredited
Thanks for the feedback. I will make a note and update the article in a future revision regarding the RSM in 2008.

/A
Saran_P
Not applicable
This article will help Windows NetBackup administrators improve their NetBackup performance!
alazanowski
Level 5

Out of curiosity, what about Windows 2008R2 master/media servers with LARGE amounts of memory? like 16GB or more? I've noticed the memory utilization barely ever peaking 3GB, and was curious if more memory could be assigned for Netbackup to use.

Any idea what the network_buffer_sz setting should be?
AAlmroth
Level 6
Partner Accredited
In most environments, NBU doesn't use much memory really. Anything above 8GB is usually a waste. But there are exceptions...

In very large environments where each media server concurrently drives a lot of backup jobs, you should have enough memory to cover each bptm couple and their shared memory queue. If you use 64 data buffers of 256KB each, each bptm couple allocates at least 16MB of memory. For disk-based backups I use 1MB buffers, so there it would be 64MB. Let's say a media server runs 150 concurrent jobs: at 16MB each we would have roughly 2.4GB used, and with 1MB buffers to disk about four times that.

When using local attached disk, be it BasicDisk, OST, then Windows can cache a lot of file system blocks. This is particularly useful when duplicating off from disk just after a backup.


I usually set NET_BUFFER_SZ to 256KB on master and media, and if multiplexing, I choose 32-64KB on the clients, else just 256KB on clients as well.

/A
UFO
Level 6

Like the article. Adding to Favorites. References is also useful.
Keep it that way, thank you!

riva11
Level 6

Nice article, thanks for sharing.

Pravs
Level 4
Employee

Nice piece of information. Thanks!

Kiran_Bandi
Level 6
Partner Accredited

Nice Article. Thanks for sharing.

JayDhillon
Level 4

Great article. Thanks

Zahid_Haseeb
Moderator
Partner    VIP    Accredited

 

Request to Symantec:

if this is really a useful way of tuning Veritas Netbackup Server and there may be some more ways to tune the Netbackup . So I request to Symantec that design a utility (or add this functionality in the next release of Netbackup) which may run on Netbackup and the Netbackup set all these parameters if all or few are required on Veritas Netbackup Server to tune it

Just like select a parameter and click on SET

AAlmroth
Level 6
Partner Accredited

Hi,

 

It could very well be that SYMC is searching in all the forums, articles, etc, but I think the best way for you to request this, is to create an idea in the Idea section on Connect. People can then vote if they like the idea, and I know that SYM product management use the Ideas section to find new features and improvements for future releases.

 

/A

Zahid_Haseeb
Moderator
Partner    VIP    Accredited

hmmm thats a good idea. Mr. AALMROTH i am giving you a VOTE

teiva-boy
Level 6

This idea would be awesome!  A GUI wrapper that you could change the variable, and hit SET or APPLY...  It would be the "NetBackup Accelerator Accelerator."  Okay, I'm sure the PM's could come up with a better name...

One that worked cross platform with BackupExec would be nice too....

LucSkywalker195
Level 4
Certified

Thank you ! This was such an awesome help!

AAlmroth
Level 6
Partner Accredited

Hi all,

 

I wrote this article back in 2010, and since then you will find the bulk part, and some additional information in the Symantec NetBackup™ Backup Planning and Performance Tuning Guide, chapter 11.

I'm glad that Symantec used this input as a foundation for further tuning as documented in their documentation.

/A

 

 

Version history
Last update:
‎08-02-2010 08:59 AM