Netbackup 5.1 MP5 Storage Unit Group optimization problems?

Jason_Voyles · ‎11-09-2006

Quick description: We have a large environment, about 160 IP clients backing up to two media servers that we tried to set up in a Storage Unit Group. Once this was done, the jobs took over twice as long to complete with two systems as with just ONE media server doing all the work. I know, pretty incredible, but true.

The problem occurs after a job finishes writing but before the job closes out. Each backup then waits about 4-5 minutes while 3000+ jobs are queued. As backups run through, the job completion times improve to the point where the wait is only about 5 seconds. This happens on all jobs, even jobs for excluded directories that only back up the 32k mount path.

Once the policies are switched back to using only one storage unit on either system, the problem goes away and backup jobs complete within seconds.
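
To put numbers on this for a support case, the close-out wait can be summarized per configuration. The following is only a minimal sketch: it assumes you can extract, for each job, the time the write finished and the time the job ended (as epoch seconds) from your own logs. The function names and the sample values are hypothetical and are not taken from this environment or from any NetBackup tool.

```python
from statistics import mean, median

def closeout_delays(jobs):
    """Return the close-out wait in seconds for each job.

    jobs: iterable of (write_done_epoch, job_end_epoch) pairs.
    Pairs where the end precedes the write time are skipped as bad data.
    """
    return [end - done for done, end in jobs if end >= done]

def summarize(delays):
    """Summarize a list of waits; returns None when there is nothing to report."""
    if not delays:
        return None
    return {
        "count": len(delays),
        "mean_s": mean(delays),
        "median_s": median(delays),
        "max_s": max(delays),
    }

# Hypothetical sample data, for illustration only.
group_jobs = [(0, 270), (10, 300), (20, 25)]   # storage unit group
single_jobs = [(0, 5), (10, 14), (20, 23)]     # single storage unit

print("group :", summarize(closeout_delays(group_jobs)))
print("single:", summarize(closeout_delays(single_jobs)))
```

Comparing the median and maximum for the two configurations over the same backup window would show whether the wait scales with the number of queued jobs.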

Also to note, we have two other smaller master server environments that use storage unit groups that do not appear to be having this problem.

** Question **
Has anyone seen problems with overall run-times on jobs that run with storage unit groups vs. directly assigned storage units?

Our environment is highly tuned and optimized as noted below. We have been fighting this issue for over a week with Symantec Support and no real progress has been made. Any suggestions would be highly appreciated. Thanks!

So here are the details of the environment:

**** Configuration Notes: ****
Software: Netbackup 5.1 MP5
Master OS: Solaris 9 with patches.
Media server OS (backs up IP clients): (2) Solaris 9 with patches.
Additional dedicated media servers (back themselves up): (24) Solaris 8 and 9

The following have been implemented on all systems:
http://support.veritas.com/docs/238063 (/etc/system settings)
Our Shared Memory settings are half of physical memory (8GB) on both IP media servers.

Testing and logs show that the problem does not appear to be related to:
http://support.veritas.com/docs/275976 (SLAVE_CONNECT_TIMEOUT)

In addition, the TCP connect timeout value has been tuned lower on the Master Server to help with available socket connections.

The SSO Scan host is currently set to a media server that is not backing up IP clients and is not on our master server. Our network utilization is expectedly high during backups and no network issues appear to be the cause of the issue. The two IP media servers are also on separate subnets.

Dennis_Strom · ‎11-09-2006

I have no idea. Looks like you have done your homework. If nothing else, do not let this die. When you find the answer, please post. I could set up a storage unit but have not, just to keep things simple.

h_m · ‎11-10-2006

If you've got a call open, ask what level of support you are dealing with. By now it should be quite high, and they should be talking to the back-line engineers. Don't be afraid to ask to speak to the duty manager and explain your situation and that you need the issue resolved ASAP; they should then give it more priority.
