cancel
Showing results for 
Search instead for 
Did you mean: 

Multiplex w/ KMS

Jevon
Level 4

NetBackup 7.1.0.4 with the IBM TS3310 tape library.

It would seem I cannot perform Multiplex w/ Encryption (KMS).  The backup runs without error (false positive?). The tape is unreadable afterwards.

Anyone else share this experience?  How did you fix it?

 

3 ACCEPTED SOLUTIONS

Accepted Solutions

mph999
Level 6
Employee Accredited

If you look in bptm for a job of the failed restore you should see lines containing this :

 

establish_decryption_key: next block encryption

This line should appear after this line:

 

io_position_for_read: successfully positioned 

The question:  how many 'establish_decryption' lines are there directly after the postioning line, there should only be 1.

If there are more, you are probably looking at an issue similar to eTrack 3093882, and will need to log a call.

The bptm log, KMS log and nbsu -c -t outputs will be required, as well as a exact description of the issue.

The casue apeears to be firmware related, and an EEB was released to work around the issue.

Martin

View solution in original post

Rick_Stone
Level 2
Employee Certified

It is unlikely that the retention level or timestamps had anything to do with not being able to decrypt your data.  With KMS encryption, during a read process, the tape drive will encounter an encrypted block of data on the tape and notify the BPTM process via SCSI commands.  BPTM will then communicate with NBKMS on the master server to retrieve the encryption key.  Without actually seeing a restore failure from a full BPTM log we can't know if the tape drive is telling us if the data is encrypted or not.

One of the comments earlier mentioned that this is likely a firmware issue, and I would agree to that.
What model of tape drive are you using and what is the firmware version?  It wouldn't be the first time that we have encountered defective firmware that would cause this issue.  Check out the following technote:

www.symantec.com/docs/TECH204085

View solution in original post

mph999
Level 6
Employee Accredited
There is a bug that affects certain makes of drive due to a firmware issue. The drive is unable to correctly position after reading the backup header for the restore that is requested. Although this is not a NetBackup fault, there is a EEB that provides a workaround by getting the drive to read trough the headers as opposed to an ' mt -f ' which is what is failing. I'll try and find the details for his issue tomorrow (Fri) Martin

View solution in original post

30 REPLIES 30

Nicolai
Moderator
Moderator
Partner    VIP   

You can have Netbackup KMS and multiplexing enabled - no problem. Done that myself. If you have configured KMS the intended way backup images will be encrypted wit AES256 in the tape drive.

What tape library does not matter - just a tape drive with that support the T10 encryption standard.

http://www.symantec.com/docs/HOWTO56305

Do not try to mix library based encryption with Netbackup based encryption. It's like oil and water - they do not mix.

Jevon
Level 4

I confirmed the drives do support encryption back when I first setup KMS.  Infact, when I don't use multiplexing, the tapes can be read just fine.

Nicolai
Moderator
Moderator
Partner    VIP   

When you don't multiplex are the tapes encrypted then ?. Verify with bpimagelist or using the GUI

The volume group must have the prefix ENCR_ else no encryption take place.

Jevon
Level 4

Yes, tapes show up encrypted.

"Flags:            0x40  (Tape Encrypted)"

ENCR_ is the prefix.  Even the IBM tape utiltiies show it is encrypted.

I have also not had a problem with mutiplexed tapes until encryption came along.

I'm getting the feeling that what I am experiencing is a little unique.  I was hoping someone else may have had this experience before.

bills
Level 5

I have used KMS with MPX before as well.  You say the tape is unreadable afterwards - what do you mean?  The restore process can't read it?  Check the KMS logs (unified logs only, if I recall).  It should say something about retrieving keys, etc.

Bill

Nicolai
Moderator
Moderator
Partner    VIP   

You are on Linux - right?.

What status code are you seeing - status 85 ?

Jevon
Level 4

Linux, Yes.

Status 85.  Yes.

I'll have to check the logs out tomorrow.  I'll see if I can post something then.

Nicolai
Moderator
Moderator
Partner    VIP   

Sound Good.

And you are 100% sure you did not change the encryption key somwhere in time (yes I made that one in a lab - did cost me 2 hours of debugging in vain) ?.

Jevon
Level 4

Yup.  Only ever made one key.


[root@bumaster ~]# nbkmsutil -listkgs
 
Key Group Name        : ENCR_PCI-Encrypted
Supported Cipher      : AES_256
Number of Keys        : 1
Has Active Key        : Yes
Creation Time         : Wed Sep  5 16:05:04 2012
Last Modification Time: Wed Sep  5 16:05:04 2012
Description           : -
 
 
Number of Key Groups: 1
 
[root@bumaster ~]# nbkmsutil -listkeys -kgname ENCR_PCI-Encrypted
 
Key Group Name        : ENCR_PCI-Encrypted
Supported Cipher      : AES_256
Number of Keys        : 1
Has Active Key        : Yes
Creation Time         : Wed Sep  5 16:05:04 2012
Last Modification Time: Wed Sep  5 16:05:04 2012
Description           : -
 
  Key Tag    : 20e16a269394992f92db19cbd4bd2ea2b09cbbbb0bdc4838ef7174c6897b773b2f92db
  Key Name              : nbuttckey
  Current State         : ACTIVE
  Creation Time         : Wed Sep  5 16:07:57 2012
  Last Modification Time: Wed Sep  5 16:07:57 2012
  Description           : TTC Backup Key
 
 
Number of Keys: 1
 

bills
Level 5

One of the verbose image listings will list the key tag for the encrypted image - might be a good idea to make sure that matches what's in your keystore....

Bill

mph999
Level 6
Employee Accredited

If you look in bptm for a job of the failed restore you should see lines containing this :

 

establish_decryption_key: next block encryption

This line should appear after this line:

 

io_position_for_read: successfully positioned 

The question:  how many 'establish_decryption' lines are there directly after the postioning line, there should only be 1.

If there are more, you are probably looking at an issue similar to eTrack 3093882, and will need to log a call.

The bptm log, KMS log and nbsu -c -t outputs will be required, as well as a exact description of the issue.

The casue apeears to be firmware related, and an EEB was released to work around the issue.

Martin

Nicolai
Moderator
Moderator
Partner    VIP   

Interesting !

No TN for eTrack 3093882 ?. I could not find any.

Jevon
Level 4

I do get those.

 

 

09:59:53.290 [12822] <2> io_position_for_read: successfully positioned 000170 to file number 5, mpx_header = 2

09:59:53.291 [12822] <2> establish_decryption_key: next block encryption status: LON 0x000000000000331e, algorithm index 1, encryption status 0x3
 

 

This eventually happens in the job.

 

04/23/2013 10:14:54 - Error bptm (pid=12822) cannot read image from media id 000170, drive index 0, Input/output error

04/23/2013 10:14:54 - Error bptm (pid=12828) The following files/folders were not restored:

04/23/2013 10:14:54 - Error bptm (pid=12828) UTF - /root/anaconda-ks.cfg

04/23/2013 10:14:55 - Info bpbrm (pid=12813) got ERROR 85 from media manager

04/23/2013 10:14:55 - Info bpbrm (pid=12813) terminating bpbrm child 12833 jobid=198940

04/23/2013 10:14:55 - Error bpbrm (pid=12833) bpcd on torbk01 exited with status 21: socket open failed

 

This comes up almost immediately in bptm

 

 

09:59:54.509 [12822] <2> mpx_read_data: attempting read error recovery, err = 5

09:59:54.509 [12822] <2> tape_error_rec: error recovery to block 13086 requested

09:59:54.509 [12822] <2> tape_error_rec: attempting error recovery, delay 3 minutes before next attempt, tries left = 5
 
It happens 5 times and then
 
10:14:54.663 [12822] <2> set_job_details: Tfile (198940): LOG 1366726494 16 bptm 12822 cannot read image from media id 000170, drive index 0, Input/output error


10:14:54.696 [12822] <16> mpx_read_data: cannot read image from media id 000170, drive index 0, Input/output error

10:14:54.696 [12822] <2> send_MDS_msg: DEVICE_STATUS 1 83395 bumaster 000170 4001353 IBM.ULT3580-TD4.000 2000283 READ_ERROR 0 0
 
 
 

 

Jevon
Level 4

Oh, and yes, there is only one 'establish_decryption_key' line.

 

I have just found out that the original admin for this system did use the generic 'SCSI Tape' driver that comes with RHEL 5.3.  Apparently, back when this backup server was created, using the IBM drivers made the drives undetectable by NetBackup.  Fastforward to today, and the result of that is likely biting me in the butt.

Seems like I'll have to do an OS upgrade and/or driver update.

Nicolai
Moderator
Moderator
Partner    VIP   

I am using the inbox (system default) driver on RH 5.6 - no problems. I think i have read somwhere the inbox driver is the only supported driver by Symantec.

Jevon
Level 4

Going to upgrade to OEL6.4 on Monday.  Will update how that goes.

Jevon
Level 4

Hello again.  Just an update.

 

I went for RHEL 5.3 to Oracle Linux 5.9.  I confirmed with the IBM Express Update utility that I have the latest BIOS on the QLogic HBA.

Still doesn't work.

I have opened up a support ticket with Symantec.  Let's see how they do.

Nicolai
Moderator
Moderator
Partner    VIP   

Thanks for the update. If Symantec can help you, Plese post the soloution to the issues.

Jevon
Level 4

Months later, and the problem has solved*.

What was the solution?  I'm really not sure.

I did notice during my testing that the data expiration dates were slightly off.  I had made a retention level of 2 hours, and the expiration date of newly assigned media would be set to yesterday.  Retention level of 2 days, expires today.  Somewhere, a time calculation was way off.  If that is the case, and timestamps are factored into the decryption process, than this could be a hint as to the cause of the problem.  But why Multiplexed tapes only?

How did I fix the timestamp issue?  No idea.  I restarted all NetBackup on the master/media server, and that behavior stopped.  For all I know, it could come back again, and when it does, it could cause this to happen again.  I have performed this restart several times prior.

I also upgraded to 7.5.0.6.  Could there be anything in there that fixed it?  I didn't see anything related in the patch notes.

If it comes up again, I'll share.

Edit: It is also very uncomfortable to lose trust in the integrity of ones backup because of something like this.