
TS3200 : 4 Drives SAS : How many HBA's ?

Matthew_Cowen
Level 3
Partner Accredited

 

Hi All,

Pretty much all in the title, but how many HBAs are necessary to drive 4 LTO drives in a TS3200 simultaneously?

Evidently, one 3 Gb/s SAS HBA with an interposer doesn't work... lots of errors when trying to run 2 simultaneous backups using NetBackup 7 :(

Any pointers would be helpful as I haven't found anything explicit in the docs, Redbooks etc.... unless my blindness has caught up with me again ;)

Cheers.

Matthew

 

6 REPLIES 6

mph999
Level 6
Employee Accredited

LTO-4 at full speed is (approx) 120 MB/s

= about 960 Mb/s

So you need about 960 Mb/s of throughput available per drive.
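To put the per-drive figure next to the HBA, here is a rough back-of-envelope sketch. It assumes (this is not stated in the thread) that a 3 Gb/s SAS lane uses 8b/10b encoding and so carries about 300 MB/s of payload, and it ignores compression and protocol overhead. The function and constant names are made up for illustration.

```python
import math

# Figures from the thread: LTO-4 streaming rate and number of drives.
DRIVE_MB_PER_S = 120
DRIVES = 4

# Assumption (not from the thread): a 3 Gb/s SAS lane uses 8b/10b encoding,
# so every data byte costs 10 line bits -> 3000 Mb/s / 10 = 300 MB/s.
LANE_MB_PER_S = 3000 / 10


def aggregate_mb_per_s(drives, drive_mb_per_s):
    """Total streaming demand if every drive runs at full speed."""
    return drives * drive_mb_per_s


def lanes_needed(drives, drive_mb_per_s, lane_mb_per_s):
    """Smallest whole number of lanes that covers the aggregate demand."""
    return math.ceil(aggregate_mb_per_s(drives, drive_mb_per_s) / lane_mb_per_s)


total = aggregate_mb_per_s(DRIVES, DRIVE_MB_PER_S)
print(f"aggregate demand: {total} MB/s")
print(f"one lane carries: {LANE_MB_PER_S:.0f} MB/s")
print(f"lanes needed: {lanes_needed(DRIVES, DRIVE_MB_PER_S, LANE_MB_PER_S)}")
```

Under these assumptions, four drives at full speed want roughly 480 MB/s, which is more than a single 3 Gb/s lane can carry but within reach of a multi-lane (wide) port. Whether the interposer in this setup sits behind one lane or several is not something the thread tells us.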

 

Martin

jim_dalton
Level 6

LTO-4 drives typically come with a 4 Gbit HBA... so somewhere between mph999's numbers and mine.

Naturally you'll need to be able to drive these with data incoming at the same rate, otherwise the point is moot.

My concern would be why you get errors... that seems like an unusual way for a system to react. More normal would be a throughput bottleneck or plateau, but then I'm not familiar with your configuration or your hardware.

If you think the errors are relevant, don't hesitate to share them.

Jim

Matthew_Cowen
Level 3
Partner Accredited

Martin,

 

Cheers for the reply but unfortunately that doesn't really help.

 

The HBA is 3 Gb/s and should be plenty fast enough, no?

But when 4 drives are connected to it (using the 4:1 SAS Interposer designed for the job) there are errors in NBU, like:

 

"allow overwrite operation failed to media id NU0106, drive index 2, address 2548826"

 

Apologies for not including the error message earlier (just got it from my client)...

 

Matthew

mph999
Level 6
Employee Accredited

Hi Matt,

OK, I see now.

OK, if you had performance issues, then I think you would have exactly that, poor performance, but no error.

This - "allow overwrite operation failed to media id NU0106, drive index 2, address 2548826" -

does not suggest a performance issue ...

That looks to me like some issue overwriting the labels (various types of tape labels can be overwritten by NBU).

You say, though, that the issue only happens when running 2 simultaneous backup jobs.  At various times, NBU does actually overwrite the media header with a new one, so this might explain why it is attempting to do this.

Can you fill in a bit more detail:

We understand that you see the errors with 2 jobs running.  Is this happening on tapes that have already been successfully used? And can those tapes be used successfully after 'a failure' if they are run as a single job?

Also, how are the drives attached? Are multiple drives going into one port on the HBA? I'm wondering if this is a firmware problem of some sort, BUT only if the issue is exactly as you describe (which I am sure it is).

It makes little sense why such an error would appear if 2 jobs run, but not for 1 job (which kinda rules out a config issue).

Martin

 

 

Matthew_Cowen
Level 3
Partner Accredited

Martin,

Thanks again for the quick reply.

I'll get onto my client right away to get the details you need and post here ASAP.

 

Cheers,

 

Matthew

mph999
Level 6
Employee Accredited

Hmmm 

This is from another post I found on Connect ...

 

4/17/2011 5:35:59 PM - positioning 3144L1 to file 64

4/17/2011 5:38:24 PM - Error bptm(pid=6068) allow overwrite operation failed to media id 3144L1, drive index 1, address 3139778

So - we see in this example it is failing when positioning to a tape that has previously been used.

So - although I don't know the details of your environment at the moment, this is useful.  I haven't seen this error before, and believe it or not, I can't find it in the Symantec database ...

At first I thought it was related to the media header. Maybe it can be, but we see now that it is also related to when a media that has images is 'positioned' (it is positioning to file 64).

There are a couple of tapemarks at the end of the last backup on the tape.   The tape is positioned to the end, then, if I recall correctly, it rewinds 2 and goes forward 1 - so it ends up positioned between the two tape marks.  The backup is then written from that point.  It then 'overwrites' the last tapemark.
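As a toy illustration of that sequence (a sketch of the description above only, not of how bptm or any tape driver is actually implemented), the tape can be modelled as a Python list where "TM" stands for a tape mark. The names `position_for_append` and `append_backup` are made up for this example.

```python
TM = "TM"  # stands in for a tape mark


def position_for_append(tape):
    """Return the index where the next backup would start."""
    pos = len(tape)  # positioned at the end of the written data
    pos -= 2         # back over the two trailing tape marks
    pos += 1         # forward over one tape mark
    return pos       # now sitting between the two tape marks


def append_backup(tape, blocks):
    """Write a new backup from the append position, replacing the last mark."""
    pos = position_for_append(tape)
    if tape[pos - 1] != TM or tape[pos] != TM:
        raise ValueError("tape does not end with two tape marks")
    # Everything from pos onward (the final tape mark) is overwritten,
    # and the new backup is again terminated by two tape marks.
    return tape[:pos] + list(blocks) + [TM, TM]


old_tape = ["header", TM, "image1", TM, TM]
new_tape = append_backup(old_tape, ["image2"])
print(position_for_append(old_tape))
print(new_tape)
```

In this model the write begins exactly on top of the final tape mark, which is the 'overwrite' step that the error message refers to.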

All these operations (positioning) are carried out by the operating system, not actually by NetBackup - though it is seen in the bptm log (we'll need this as well please, at VERBOSE = 5 / GENERAL = 2).

I can't honestly suggest this is going to be NBU ... why ...

1.  Apparently, so I am told, NBU has the odd bug here and there ...

2.  Tape positioning isn't one of them ... why ...

a)  We aren't positioning anything, the OS is (on behalf of NBU)

This isn't to say NBU cannot cause issues with positioning. It can, but it is very, very rare: almost everyone who has NBU positions tapes, and the fact that I can't find this 'error' in the Symantec database suggests we haven't seen it very often ...

b)  As everyone who uses tape has to position them, it is one of the most frequently used features of NBU, so we've had plenty of practice ...  As mentioned, we don't actually do the positioning, but we do request it, so I guess there is room for error, however small ...

3.

As I am sure you will understand, 'how likely' something is to be caused by NBU is a valid part of troubleshooting.  If NBU is the least likely cause, it is not very sensible to try to find the error in NBU, as ...

It probably isn't there 

Even if it is, it will probably be very hard to find

It makes sense to eliminate the most likely causes first ...

 

 

It would actually be easier if I could just blame NBU - that way I could find a quick solution.

However, when tapes are positioned (or requests sent ..) the same code is used whether one drive is in use, or more than one ..

I have however seen very very strange issues caused by HBAs (usually a firmware issue, but also when faulty) so this is where I would start looking.

So, let's go with this ...

 

Details I previously requested

bptm log (at verbose 5, and general 2 if it is a Windows media server)

tpcommand log (put VERBOSE into vm.conf and create an empty file /usr/openv/volmgr/DRIVE_DEBUG)

(+ restart ltid)

system message log 
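For the vm.conf / DRIVE_DEBUG part on a UNIX media server, a minimal sketch follows. It assumes vm.conf lives in the same /usr/openv/volmgr directory as the DRIVE_DEBUG file (only the DRIVE_DEBUG path is given above), and the helper name `enable_drive_debug` is made up. It does not restart ltid or touch the bptm verbosity; those are still manual steps. The demo at the bottom runs against a scratch directory, not a real install.

```python
import tempfile
from pathlib import Path


def enable_drive_debug(volmgr_dir):
    """Add a VERBOSE entry to vm.conf and create an empty DRIVE_DEBUG file.

    volmgr_dir is assumed to be /usr/openv/volmgr on a real media server.
    """
    volmgr = Path(volmgr_dir)
    vm_conf = volmgr / "vm.conf"
    text = vm_conf.read_text() if vm_conf.exists() else ""
    entries = [line.strip() for line in text.splitlines()]
    if "VERBOSE" not in entries:
        # Keep the new entry on its own line even if the file
        # did not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with vm_conf.open("a") as handle:
            handle.write(prefix + "VERBOSE\n")
    (volmgr / "DRIVE_DEBUG").touch()


# Demo against a temporary directory.
with tempfile.TemporaryDirectory() as scratch:
    enable_drive_debug(scratch)
    enable_drive_debug(scratch)  # a second call must not duplicate the entry
    demo_lines = (Path(scratch) / "vm.conf").read_text().splitlines()
    demo_flag_exists = (Path(scratch) / "DRIVE_DEBUG").exists()

print(demo_lines, demo_flag_exists)
```

On a real system you would call it with the volmgr directory and then restart ltid, as noted above, so the new settings are picked up.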

 

I don't think the logs are going to show much (will probably just show the error we already know) but we'll look anyway ...

Certainly an odd one ...

Martin