
NBU Encryption issues - Quantum and Symantec

HoldTheLine
Level 4

I could use some advice here - currently have cases open with the vendors but wanted to get some outside perspectives as well.

 

We are using NetBackup for a data migration, nothing fancy: Backup the current data at site A, ship the tapes to site B and restore.

Info for both sites:

site A:  Spectra T120, LTO5 drives, NBU 7.0 (I could not convince them to upgrade and we are doing catalog recoveries, so stuck at 7.0)

site B:  Quantum i500, LTO5 drives, NBU 7.0

 

It's been problematic - since day one the restores have been painfully slow. Using LTO5 drives and 2-8gb connections, I was seeing restore speeds no better than 5mb/sec.  There was a lot of troubleshooting done, cases opened and closed, optimizing, etc.  Nothing seems to break that barrier of horrible throughput.  We tested outside of NBU: network and disk speeds are just fine, so there is no reason those restores should perform so poorly.

Then I had the idea to rule out the incoming tapes, try a local backup and restore with fresh tapes.  Bingo!  Decent speeds, no errors, etc.  Until I tried to use KMS - trying to write encrypted backups fails with these types of errors:

 

 Error bptm(pid=2272) FREEZING media id <Media ID>, Encryption unavailable for an ENCR pool 

Now this is where it gets odd - KMS has been replicated from site A to site B, and from the beginning there were never any indications that anything was misconfigured; the tapes coming from site A were always able to be read, and running nbkmsutil on both sites shows identical info.  I followed the simple instructions on exporting/importing keys so no surprises there.

The case I opened with Symantec found errors in bptm that point to the hardware being the issue - Quantum got involved and are confused as well.  Heck, I am confused too - the tapes I am not able to create encrypted backups on are from the same pool we use in our production site, and there are no issues with them there.  I looked at them personally: they are in fact LTO5 tapes, no damage, and so far six of them give me that ENCR error, yet I can run a non-encrypted backup and restore with them with no problems at respectable speeds.

 

While I wait for Symantec and Quantum to review logs I am still poking around trying to find clues, if anybody has seen anything like this please let me know.

 

 

 

 

 

43 REPLIES

jim_dalton
Level 6

WR is confused, me too! It's catching.

I think we all need a rundown of what does what where, followed by a stiff drink.

I had another idea: if the restore does work, is there a chance that you have a monumentally large MPX setting, combined with a huge number of small files? This scenario could slow things down majorly, but it would need to be MPX 100 and sub-1k files. I'm guessing at the numbers, but both slow things down, so in combo you'd get the (un)desired effect.

Jumping ahead a bit tell us about the lifecycle of a single large file from backup through to restore, preferably carried out in isolation to everything else.

Jim 

HoldTheLine
Level 4

Wait a sec... Does this mean you can't write a new tape?

 

Sorry if it's confusing - I am still able to write to new tapes.  Just not encrypted - i.e. if I set the volume pool to anything but ENCR_ it works.  As soon as I try to write to the encrypted pool which matches the keys in KMS I get the encryption errors.

HoldTheLine
Level 4

I had another idea: if the restore does work, is there a chance that you have a monumentally large MPX setting, combined with a huge number of small files?

 

At first the backups were configured at an MPX of 8 - which was my first suspicion when the restores were going so slow.  Since that time they have been kicked down to 2 - about as low as they can go in production.

 

Good thought, MPX can be killer on restores but for this particular issue it has already been addressed.

HoldTheLine
Level 4

Can you advise of the tape brand ? This may be different from what is written on the tape (Eg. Oracle branded tapes are actually made by Imation (or at least they used to be)). If you look in the bptm log, you should see the manufacture listed - I guess searching for 'man' might narrow down the search.

 

Looks like they are Fuji:

 

 10:08:40.000 [2876.3080] <2> manage_drive_attributes: Reported medium manufacturer [FUJIFILM], sn [EPAMR4GD2U]
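
For anyone wanting to pull the same information out of a larger bptm log, here is a minimal Python sketch. The regular expression is modeled only on the single log line quoted above, so treat the pattern as an assumption; other NBU versions may format the entry differently. The function name `find_media_manufacturers` is purely illustrative.

```python
import re

# Pattern inferred from the one log line quoted in this post (an assumption,
# not a documented format).
MANUFACTURER_RE = re.compile(
    r"manage_drive_attributes: Reported medium manufacturer "
    r"\[(?P<maker>[^\]]*)\], sn \[(?P<serial>[^\]]*)\]"
)


def find_media_manufacturers(lines):
    """Return (manufacturer, serial) pairs found in an iterable of log lines."""
    found = []
    for line in lines:
        match = MANUFACTURER_RE.search(line)
        if match:
            found.append((match.group("maker"), match.group("serial")))
    return found


sample = [
    "10:08:40.000 [2876.3080] <2> manage_drive_attributes: "
    "Reported medium manufacturer [FUJIFILM], sn [EPAMR4GD2U]",
]
print(find_media_manufacturers(sample))
```

Feeding it an open log file instead of `sample` should list every cartridge the drive reported during that day's jobs.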

 

There is a case open, I am not sure if it would be kosher to post it publicly - would it?  The tech I am working with is pretty good so it's not like I want to bash him or anything :)

HoldTheLine
Level 4

Update - I was finally able to get a good encrypted backup, using the tapes that were sent from the site with the T120.  Still investigating why we can write to those tapes and no others; it might be the barcodes.

The barcodes for the media that work are 9 characters in this format:

 

#######LA

 

Where # = a digit

L is static

A = alpha

 

So an example would be

2880248LA

The only way to get NBU to work with them is to set up a media ID generation rule to chop off the first 2 characters and the last character, so the ID above would show up in NBU as 80248L

 

The tapes that I was never able to write to have a format of S51234L5 and would get read in as S51234

Sort of scratching my head here but we are not done looking into it, especially since we have a lot of data to move and need to make sure we get media that the T120 can write to and the i500 can read from....
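
To make the two barcode-to-media-ID mappings concrete, here is a small Python sketch of the idea. It only illustrates picking characters by position; `apply_media_id_rule` is a made-up helper, not an NBU command, and the position lists are my reading of the rules described above.

```python
def apply_media_id_rule(barcode, positions):
    """Build a media ID from 1-based character positions in the barcode."""
    return "".join(barcode[p - 1] for p in positions)


# 9-character barcodes: drop the first two characters and the last one.
NINE_CHAR_RULE = [3, 4, 5, 6, 7, 8]
# 8-character barcodes: keep the first six characters.
EIGHT_CHAR_RULE = [1, 2, 3, 4, 5, 6]

print(apply_media_id_rule("2880248LA", NINE_CHAR_RULE))   # 80248L
print(apply_media_id_rule("S51234L5", EIGHT_CHAR_RULE))   # S51234
```

Either way the result is a six-character media ID, so on paper both barcode formats end up looking the same to NBU.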

 

jim_dalton
Level 6

Struggling to believe it's the barcode rules...that determines how the robot recognises media, picks them out and mounts, verifies the labels. If the media weren't the right media they wouldn't be picked nor mounted, yet media have been selected, so we are beyond that point...but I'm happy to be enlightened!

There is something else yet to be revealed, I feel.

jim

HoldTheLine
Level 4

I agree, seems like either a barcode is readable or it's not.  The issues we are seeing are just bizarre - Quantum took it to IBM and they are stumped as well. 

 

This is really an odd one.

mph999
Level 6
Employee Accredited
Just occasionally, you find an issue with a cause that makes no sense. If there is one thing I have learnt, never ever rule anything out ... For example, the issue I saw with 0's on the VTL tape where the header should have been, caused as far as we know by OS tuning settings - how the hell does that happen?

Will_Restore
Level 6

>>Quantum took it to IBM and they are stumped as well.

 

OK we are 'dying' here.  Any progress??

HoldTheLine
Level 4

Sorry, it's been really crazy around here.  No real progress - still working with Quantum, and we installed some IBM tool to get logs for the drives after the failures.  I did receive a couple of test tapes from them, sort of a "Try these tapes that SHOULD work", and am seeing the exact same results -

 

Encryption unavailable for an ENCR pool  

 

Is there some double checking I can do in KMS?

 

 

Will_Restore
Level 6

Verify bptm log output per Article URL http://www.symantec.com/docs/TECH87444

A backup policy is configured to use media from a pool name with the prefix "ENCR".

This is the trigger for the bptm process to enable encryption in the tape drive. The bptm process mounts its tape then checks that encryption is possible, given the selected tape and drive.

It logs the results of its checks in its bptm log file; for example:
  16:54:17.552 [8584] <2> manage_drive_attributes: report_attr, fl1 0x00010049, fl2 0x0000000c

<snip>

Check the value for "fl1" in the bptm log. In the example above it is 0x00010049 and this was for an LTO3 media. When the correct media is loaded, the value is 0x20000 greater. In this example, if LTO4 media is used, the fl1 value is 0x00030049

 
Bit 0x00010000 indicates the Drive supports Encryption.
Bit 0x00020000 indicates the Media supports Encryption.
If both the drive and media support encryption, these values will be added together (0x00030000) in the fl1 field.
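
As a quick way to read those two bits without doing hex arithmetic by hand, here is a minimal Python sketch. The bit values are the ones quoted above; the regular expression is modeled on the example log line only, and `decode_fl1` is an illustrative name, not anything shipped with NBU.

```python
import re

# Bit meanings as described in the article text quoted above.
DRIVE_SUPPORTS_ENCRYPTION = 0x00010000
MEDIA_SUPPORTS_ENCRYPTION = 0x00020000

# Pattern inferred from the example log line (an assumption).
FL1_RE = re.compile(r"report_attr, fl1 (0x[0-9a-fA-F]+)")


def decode_fl1(log_line):
    """Return the encryption-related flags from a report_attr line, or None."""
    match = FL1_RE.search(log_line)
    if match is None:
        return None
    fl1 = int(match.group(1), 16)
    return {
        "fl1": fl1,
        "drive_encryption": bool(fl1 & DRIVE_SUPPORTS_ENCRYPTION),
        "media_encryption": bool(fl1 & MEDIA_SUPPORTS_ENCRYPTION),
    }


line = ("16:54:17.552 [8584] <2> manage_drive_attributes: "
        "report_attr, fl1 0x00010049, fl2 0x0000000c")
print(decode_fl1(line))
```

For the example line this reports the drive bit set and the media bit clear, which matches the LTO3 case described above; a value of 0x00030049 would report both as set.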

HoldTheLine
Level 4

It logs the results of its checks in its bptm log file; for example:
16:54:17.552 [8584] <2> manage_drive_attributes: report_attr, fl1 0x00010049, fl2 0x0000000c

<snip>

 

This is very familiar to me - we have been looking at these exact entries.  The thing is, we always DO see the fl1 entry after the backup failure but it doesn't really help pinpoint what the problem is.  Why?  Because:

 

- These are LTO5 drives that most certainly DO support encryption

- Using LTO5 tapes, ditto.  They DO support encryption.  Even when that entry says otherwise.  

 

When we see the fl1 entry in BPTM that says "Sorry, this drive and/or tape does not support encryption" yet we know that the drive and tapes in fact should support encryption, we have a false positive and are back to the drawing board.

 

Hope I didn't muddle things even more - short story is, in this case, that bptm entry is not useful in shedding any light on the issue because we know that all the components are capable of encryption.

Will_Restore
Level 6

If fl1 in your log is not 0x00030049 then encryption is not supported.  It's not a false positive. It's a positive negative :)

Computers are very logical and we just have to figure out what they are telling us.  We had discovered some media that we thought was LTO4 was really LTO3.

 

HoldTheLine
Level 4

If fl1 in your log is not 0x00030049 then encryption is not supported. It's not a false positive. It's a positive negative

Computers are very logical and we just have to figure out what they are telling us. We had discovered some media that we thought was LTO4 was really LTO3.

 

No argument here about the logical nature.

 

But it just makes no sense - a tape gets encrypted backups written to it in one library.  I load that same tape into another library,  do a long erase just to be safe, try to run an encrypted backup and see the BPTM entry above -

 

Since at one point it was able to be encrypted, the vendor doesn't see anything wrong with the new library/drives, the only thing left I can think of to look at is KMS.

?

 

mph999
Level 6
Employee Accredited
manage_drive_attributes - looks like that comes from the drives, so my thoughts ...

1. The drive is screwing up somehow, e.g. a firmware issue.

2. Could NBU be getting it wrong ... as in, what is in the log isn't what is reported by the drive?

I can only think of one way to check 1: a SAN analyzer. The problem with this is, one, you probably don't have one, and two, understanding the output (no offence intended, but they throw out a lot of data ...). The vendors will almost certainly be able to lend one, and if you are not familiar with the output (and I'm not ...), between the vendors the understanding part can be done.

As for 2, if the values come from the drives I doubt we do any processing on them, but I'll have a look in the code. I'm not a programmer, so if I can't conclude anything I suggest a case is opened with us (if not done already) and we work together via TSAnet if necessary to see what is going on. I imagine (but cannot promise) that this could go pretty much straight to Engineering.

Just to save me reading the complete thread again, is there anything common that has been spotted, or are the failures pretty much certain - i.e. are there any 'patterns' to the behaviour?

Magpie1888
Not applicable

Looks like you have been digging pretty deep here, apologies if this is obvious, but have you checked the encryption settings for the library partition on the i500 itself?

There are 3 possible settings; it needs to be 'Allow Application Managed'. If this is not set correctly you cannot create an encrypted backup, even if KMS is correctly installed.

There is also a FIPS setting, if this is set I believe that the drives are only allowed to read/write encrypted tapes, so make sure this option is un-set.

Clutching at straws here, but this would explain why you can’t create encrypted tapes

Getting even more desperate: during restore, the drive will make a SCSI call to the media server for the key. It's obvious that KMS is returning the key (you're getting data back), so is the library seeing the drive go to encrypted mode and having a nasty library/drive handshaking argument which kills restore performance?

Must be the musings of a madman, let us know

HoldTheLine
Level 4

Sorry for the lack of  updates - have been away a while and am dusting this off again.  To answer some of the questions:

 

Just to save me reading the complete thread again, is there anything common that has been spotted, or are the failures pretty much certain - i.e. are there any 'patterns' to the behaviour?

 

I just finished some testing and do see some common things:

- Backups to LTO5 tapes work with no encryption

- Backups to LTO5 tapes fail using encryption

- Backups to LTO4 tapes are good, with or without encryption.

 

Am still working with the vendors and have sent my latest findings - it seems that if I use an LTO4 tape everything works, so that rules out any issues with KMS, library, drive, etc.

 

 

There are 3 possible settings, it need to be ‘Allow Application Managed’, if this is not set correctly you cannot create an encrypted backup, even if KMS is correctly installed.

 

Confirmed, it's at Application Managed.

 

There is also a FIPS setting, if this is set I believe that the drives are only allowed to read/write encrypted tapes, so make sure this option is un-set.

 

Not sure where this is, but since we can encrypt to LTO4 tapes I would guess this doesn't apply. 

 

HoldTheLine
Level 4

If they are fuji, we might be in luck as I know someone who works there who might be able to confirm a couple of things.

 

We are seeing some interesting things here - the LTO5s that we cannot write to are all Fuji.  Soon I will have some HP LTO5s to test, sort of a shot in the dark but who knows maybe that Fuji EEPROM is causing the libraries some problems.  If you have any inside information I am all ears :)

 

I was able to put one of the LTO5s that failed in the i500 into our i6000 and it wrote encrypted backups with no problems.

 

The further we get into this the more it looks like tape - will know more over the next couple of days.

HoldTheLine
Level 4

Back to the drawing board - tested the same backups with a different brand of LTO5, HP to be exact and get the same results :(

 

 

mph999
Level 6
Employee Accredited

There is a known firmware bug related to multiplexed backups; that is the only 'issue' I can find that relates to KMS failures.  The problem doesn't happen for non-multiplexed backups.

The issue is that the drive firmware under certain conditions reports the tape is not encrypted when in fact it is - when we fsf past the tape backup headers.  If we read through the backup headers the issue doesn't happen - very odd.  Although it is a firmware issue, NBU gets around the problem by reading through the headers as opposed to fsf.

This surely has to be a firmware issue ...