We have a Windows Server 2003 R2 machine, which primarily runs as our NetBackup Master server.
In addition to this, we have a second machine which is/was set up in a cluster with it, using Veritas Storage Foundation's VVR to replicate the data and VCS to handle the application side of things.
About three weeks ago we noticed that the disk write performance was very bad. Using a tool provided by NetBackup to write zeros to the file systems, we noted that it would start off fast, and then performance would drop significantly, to the point where it was writing at about 0Mb/s or 1Mb/s. This happens on both SAN volumes and local volumes; the same write profile is observed on each.
We also see disk queue lengths go up to 130, but no higher than this.
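For anyone who wants to reproduce this kind of write profile without the NetBackup tool, here is a minimal sketch in Python. It is not the NetBackup utility, only a stand-in under stated assumptions: the function name `write_zeros`, the 1 MiB chunk size, and the reporting interval are all made up for illustration. It writes zeros to a file and records the throughput of each window, so a drop-off over time shows up in the samples.

```python
import os
import tempfile
import time


def write_zeros(path, total_bytes, chunk_bytes=1024 * 1024, report_every=16):
    """Write zeros to `path` and return a list of (bytes_written, MB_per_s) samples.

    Hypothetical helper for illustration; chunk size and interval are assumptions.
    """
    chunk = bytes(chunk_bytes)  # a buffer of zero bytes
    samples = []
    written = 0
    window_bytes = 0
    window_start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
            window_bytes += n
            if window_bytes >= report_every * chunk_bytes or written == total_bytes:
                # Flush to the device so the rate reflects the disk, not the cache.
                f.flush()
                os.fsync(f.fileno())
                elapsed = max(time.perf_counter() - window_start, 1e-9)
                samples.append((written, window_bytes / elapsed / 1e6))
                window_bytes = 0
                window_start = time.perf_counter()
    return samples


if __name__ == "__main__":
    # Small demo against a temporary file; point `path` at the volume under test instead.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "zeros.bin")
        for done, rate in write_zeros(path, 64 * 1024 * 1024):
            print(f"{done / 1e6:8.1f} MB written, {rate:8.1f} MB/s in last window")
```

Running the same script against a volume on a server with Storage Foundation and one without should make the two profiles directly comparable.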
This is the case on both of the servers running VCS/VSF, but is not present on any other servers running NBU but not running the storage foundation software.
There is no VxCache memory config enabled on the server or any disks.
All the disks are reporting as being misaligned.
Some time ago VVR was disabled/broken and has not been reinstated, as we do not have Symantec support on the product and no one here knows how to use it.
So I am reaching out to anyone who may be able to point me in the right direction. I have read the admin guide and it has given me a bit more understanding of the components in the software, but has not provided much in the way of troubleshooting this write performance problem.
Today I will run IOMeter to test reads/writes and a combination of both.
VCS/VSF version is 5.1 SP1.
When it comes to performance-related items, the issue typically comes from somewhere further down the disk storage stack, as Storage Foundation does not control or manage I/O.
You mentioned that this began 3 weeks ago. Did anything change before then on the servers or the environments? Any patches or updates?
Also, you mentioned that the disks are reporting as "misaligned". I am not familiar with this term. Exactly what is reporting this?
Also, you mentioned that VVR is disabled/broken. Can you be a little more specific? What state is replication in? In VEA > Replication Network, what do you see? Make sure to connect to both servers within VEA when checking this.
Since it is Windows, have you checked for fragmentation?
It would be good to bring VVR up again and switch NBU over to the other node to check whether the problem persists. This can be used to validate whether or not the performance issue is within the SF stack.
Thanks to you both for your replies.
The disk is not fragmented; we were testing this with a brand new SAN-attached disk with no data on it. We originally thought it might be the tier of disk we were using; however, we presented a tier 1 disk (7+1 RAID 5 on 15k SAS) and it showed the same performance profile.
A file system is misaligned when its starting offset is not a multiple of 8 (most arrays have 32k blocks), so a starting offset of 32.5 means a single file system block will cross this boundary, resulting in two blocks being written for every single file system block (causing a 100% write overhead).
We checked this on the other NBU server (at the other end of the VVR) by connecting a new disk to it and running the same test, and it yielded the same results. We then performed the same test on a server without SF and it performed very well, without the same write profile!
Supposedly, 'three weeks ago' a failover test between the VVR nodes was performed (but failed), and we think this is when the performance issues started. I did not perform the test, so exactly what was done is not known.
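To make the arithmetic concrete, here is a rough sketch in Python. The function names are made up for illustration. It assumes the "32.5" above means a 32.5 KB starting offset, that the array uses 32 KB blocks, and that each file system block is the same size as an array block. When file system blocks are smaller than the array block, only some of them cross a boundary, so the overhead would be lower than 100%.

```python
ARRAY_BLOCK_BYTES = 32 * 1024  # assumed array block size (32 KB)


def is_aligned(offset_bytes, array_block_bytes=ARRAY_BLOCK_BYTES):
    """True when the file system's starting offset sits on an array block boundary."""
    return offset_bytes % array_block_bytes == 0


def array_blocks_touched(offset_bytes, fs_block_index, fs_block_bytes,
                         array_block_bytes=ARRAY_BLOCK_BYTES):
    """Count the array blocks that one file system block overlaps."""
    start = offset_bytes + fs_block_index * fs_block_bytes
    end = start + fs_block_bytes - 1  # last byte of the file system block
    return end // array_block_bytes - start // array_block_bytes + 1


misaligned_offset = int(32.5 * 1024)  # 33280 bytes
aligned_offset = 32 * 1024            # 32768 bytes

print(is_aligned(misaligned_offset))                            # False
print(array_blocks_touched(misaligned_offset, 0, 32 * 1024))    # 2
print(is_aligned(aligned_offset))                               # True
print(array_blocks_touched(aligned_offset, 0, 32 * 1024))       # 1
```

With the misaligned offset every 32 KB file system block overlaps two array blocks, which is where the doubled write count comes from.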