G'day all,
We're having some fun trying to figure out where the bottleneck is in our backup environment, in particular when backing up our NetApp NAS box.
Our master server is 5.1 MP5 on 2K3 Ent Ed SP1 with 8 GB RAM and dual single-core 3.0 GHz CPUs (HP DL580 G2), with 2 x LSI Logic U320 SCSI HBAs attached to 6 LTO2 drives in a Quantum P4000 (2 drives per channel, plus one channel controlling the robot). There are another 2 media servers, but they're lower-spec and have a total of 4 drives between them, all in the same P4000 unit (I can post more on that config if you need it).
The master server is also the client we back up our NAS CIFS shares through: in the policy, the master server is the client that connects to the shares, and the STU we use is on the master server as well.
The STU is set up for 3 drives, MPX 20 and no maximum fragment size. We're using the HP LTO2 drivers at the OS level.
The policy in question is set up with 14 streams (CIFS shares ranging from a few hundred GB to nearly 2 TB), MPX 5 and no limit on jobs per policy, so we actually have all 14 streams running at once across 3 drives (a 5/5/4 split).
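For anyone checking the arithmetic, the 5/5/4 split is just 14 streams packed onto drives up to the policy's MPX of 5. The sketch below assumes the lower of the STU setting (20) and the policy setting (5) is the one that applies; `split_streams` is a hypothetical helper that reproduces only the arithmetic, not how NetBackup's scheduler actually assigns jobs to drives.

```python
import math


def split_streams(streams: int, mpx: int, drives: int) -> list[int]:
    """Pack streams onto drives, filling each drive up to `mpx` before the next.

    Hypothetical helper: it mirrors the arithmetic only, not NetBackup's
    real drive allocation.
    """
    needed = math.ceil(streams / mpx)
    if needed > drives:
        raise ValueError(
            f"{streams} streams at MPX {mpx} need {needed} drives, "
            f"only {drives} available"
        )
    split = []
    remaining = streams
    for _ in range(needed):
        take = min(mpx, remaining)
        split.append(take)
        remaining -= take
    return split


# Assumption: the effective MPX is the lower of the STU (20) and policy (5) values.
effective_mpx = min(20, 5)
print(split_streams(14, effective_mpx, 3))  # [5, 5, 4]
```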
NUMBER_DATA_BUFFERS on the master server is set to 224. SIZE_DATA_BUFFERS isn't set, so it's the default of 65536, and NET_BUFFER_SZ isn't set either, meaning the default of 256 Kb (or is it KB? I know it makes a difference!).
That said, in the master server's client properties I've set "Communications buffer" to 257 kilobytes, as suggested by Veritas (and a Veritas engineer).
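To put rough numbers on those buffer settings, here's a sketch. It assumes the shared-memory formula usually quoted from the NetBackup tuning guide (SIZE_DATA_BUFFERS x NUMBER_DATA_BUFFERS x drives x MPX), so treat the totals as an estimate under that assumption. It also shows what 256 or 257 comes to in bytes if read as kilobytes (KB) versus kilobits (Kb), using the binary 1024 convention. The helper names are made up for illustration.

```python
# Values from the post; SIZE_DATA_BUFFERS is the default because it isn't set.
SIZE_DATA_BUFFERS = 65536   # bytes per buffer
NUMBER_DATA_BUFFERS = 224   # buffers per stream (assumed to be per stream)
DRIVES = 3
MPX = 5


def shared_memory_bytes(size: int, number: int, drives: int, mpx: int) -> int:
    # Assumed tuning-guide formula: one full buffer set for every
    # multiplexed stream on every drive.
    return size * number * drives * mpx


def kilobytes_to_bytes(kb: int) -> int:
    # KB (kilobytes), binary convention: 1 KB = 1024 bytes.
    return kb * 1024


def kilobits_to_bytes(kbit: int) -> int:
    # Kb (kilobits), binary convention: 1 Kb = 1024 bits = 128 bytes.
    return kbit * 1024 // 8


per_stream = SIZE_DATA_BUFFERS * NUMBER_DATA_BUFFERS
total = shared_memory_bytes(SIZE_DATA_BUFFERS, NUMBER_DATA_BUFFERS, DRIVES, MPX)
print(per_stream / 2**20, "MiB per stream")           # 14.0 MiB per stream
print(total / 2**20, "MiB for 3 drives x MPX 5")      # 210.0 MiB for 3 drives x MPX 5
print(kilobytes_to_bytes(256), kilobits_to_bytes(256))  # 262144 32768
print(kilobytes_to_bytes(257))                        # 263168
```

So the KB/Kb question is a factor of 8: 256 KB is 262144 bytes, while 256 Kb would be only 32768 bytes.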
Currently this little scenario results in backups taking anywhere from 20-odd hours (for the shares of a few hundred GB) to 3-plus days (for the nearly 2 TB one). That bytes! (Pardon the pun.)
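Turning those run times into an average rate makes them easier to compare against what the drives are rated for (the rated speed isn't given here). This sketch assumes decimal units and hypothetical sizes: "a few hundred GB" is taken as 300 GB and "nearly 2 TB" as 2 TB over 72 hours.

```python
def throughput_mb_s(size_bytes: float, hours: float) -> float:
    """Average throughput in MB/s (decimal megabytes) for one stream."""
    return size_bytes / (hours * 3600) / 1e6


# Hypothetical figures standing in for the sizes described above.
small = throughput_mb_s(300e9, 20)  # a 300 GB share in 20 hours
large = throughput_mb_s(2e12, 72)   # a 2 TB share in 3 days
print(f"{small:.1f} MB/s, {large:.1f} MB/s")  # 4.2 MB/s, 7.7 MB/s
```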
Here's my question. I've looked at the bptm and bpbkar logs on the master server, and in the bptm log I'm getting a lot of what I think is a good result:
"fill_buffer: socket is closed, waited for empty buffer 0 times, delayed 0 times, read 747618304 bytes"
BUT in the bpbkar log, I get a lot of this:
"<4> tar_backup::OVPC_EOFSharedMemory: INF - bpbkar waited 7727 times for empty buffer, delayed 11643 times" and worse!
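With lots of these lines scattered through the logs, it can help to total the wait/delay counters rather than eyeball them. This is a minimal sketch that only matches the exact wording of the two sample lines quoted above; other NetBackup versions may phrase these messages differently, and `sum_waits` is a hypothetical helper. Feed it the lines of a bptm or bpbkar log file.

```python
import re

# Patterns built from the two sample messages quoted above.
BPTM_RE = re.compile(r"waited for empty buffer (\d+) times, delayed (\d+) times")
BPBKAR_RE = re.compile(r"waited (\d+) times for empty buffer, delayed (\d+) times")


def sum_waits(lines):
    """Total (waits, delays) per process across all matching log lines."""
    totals = {"bptm": [0, 0], "bpbkar": [0, 0]}
    for line in lines:
        for name, pattern in (("bptm", BPTM_RE), ("bpbkar", BPBKAR_RE)):
            match = pattern.search(line)
            if match:
                totals[name][0] += int(match.group(1))
                totals[name][1] += int(match.group(2))
    return {name: tuple(counts) for name, counts in totals.items()}


sample = [
    "fill_buffer: socket is closed, waited for empty buffer 0 times, "
    "delayed 0 times, read 747618304 bytes",
    "<4> tar_backup::OVPC_EOFSharedMemory: INF - bpbkar waited 7727 times "
    "for empty buffer, delayed 11643 times",
]
print(sum_waits(sample))  # {'bptm': (0, 0), 'bpbkar': (7727, 11643)}
```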
I've done some reading, but I can't get my head around where the problem actually lies: with the client or the server?? My interpretation (and I'm probably wrong) is that we can't get the data from the client (NAS) quickly enough to feed the tapes?
If someone could clarify this for me, that would be fantastic and MUCH appreciated. Any other thoughts or suggestions are very welcome too.
Thanks for listening!
Cheers
Mike