
Frozen Tapes - Lots of them

Toddman214
Level 6

NetBackup 7.1.0.4, Windows 2008 R2 with two Windows 2008 R2 media servers, two Dell ML6000 libraries with 6 drives each.

 

 

Hello all,

 

I'm seeing something odd with media freezing.

I loaded up both of our Dell tape libraries with enough media to run through the entire weekend, into Monday. I do this every Friday afternoon. As of typing this (the very next morning), I started seeing some status 96 errors showing up. I know what that means, and I knew it should not be happening. I looked and saw that about 50 tapes total between my two libraries had gone into "frozen" status. What is especially odd to me is that in Library 1, about 30 tapes all went frozen at exactly 7:52:09. I know there are various reasons why tapes can go into a frozen state, but this happened in both libraries and to multiple tapes, so I think I can discount bad media or drive issues, especially since they all went frozen in the same second. Have any of you seen this behavior before?

Thank you!

 

 

Todd    

6 REPLIES

sksujeet
Level 6
Partner Accredited Certified

There could be a lot of reasons for that:

Normally write errors, media positioning or allocation errors, a media mismatch, or a barcode issue.
As you said, 30 tapes froze at the same moment, so could you check what operations were running at that time? You only have 6 drives per library, so 30 tapes freezing at once is strange... are you sure no one froze them manually?

Check the logs to see exactly what they say. Are these new media, and might they be incompatible with your current drives? I once had a whole box of LTO2 tapes delivered instead of LTO4.

Marianne
Level 6
Partner    VIP    Accredited Certified

Perhaps someone loaded Write Protected tapes in the robot(s)?
Or dropped the whole lot of tapes on the way to the robots?
Or put new labels on previously used tapes?

We can keep on guessing, but you need logs to tell you why tapes were frozen.

bptm logs on the media servers will tell us why.

Another place to look for hints is the 'Tape Logs' report. Run the report from Friday afternoon to current date. (You may want to filter the report to exclude 'Info' type events.)

RonCaplinger
Level 6

If they all froze at the same time, the problem was likely that your tape library could not load the tapes into a drive.

I've seen this before when a robot was physically broken, and we did not see any notification until we looked at the library and saw the robot arm was crooked and a tape was lying on the floor of the library.

It also happened many times when a media server was rebooted and had not previously been configured with "SCSI persistence" on the HBAs. When the server came back up, the drive paths were no longer correct because they had been restored in a different order after the reboot.

If the robot can't load a tape for whatever reason, it reports that back to NBU, which then freezes the tape and requests another one. If the problem still isn't fixed (as in the two cases above), it just keeps freezing scratch tapes until you have no more in your library and you are left with a rash of status code 96 errors from all subsequent backups. And if you have more than one media server and are sharing drives, you may have some backups that run successfully (on the media servers that had not been rebooted) and some that don't (on the rebooted servers).
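The cascade described above can be pictured with a small toy model. This is only an illustrative sketch, not NetBackup code: the function `run_backups`, the `can_load` callback, and the tape IDs are all hypothetical, and the only detail taken from this thread is that a job with no usable scratch media ends in status 96.

```python
def run_backups(scratch_tapes, jobs, can_load):
    """Toy model: each job asks for scratch tapes until one loads.

    scratch_tapes: list of hypothetical media IDs in the scratch pool
    jobs:          list of hypothetical job names
    can_load:      callable(tape) -> bool, True if the robot can mount the tape
    Returns (frozen_tapes, {job: status}), where 0 = success and 96 = no media.
    """
    available = list(scratch_tapes)
    frozen = []
    results = {}
    for job in jobs:
        status = 96  # assume failure until a tape actually loads
        while available:
            tape = available.pop(0)
            if can_load(tape):
                status = 0  # tape is now in use by this job
                break
            frozen.append(tape)  # load failed: freeze it and try the next one
        results[job] = status
    return frozen, results


# A broken robot (nothing ever loads) drains the whole scratch pool.
frozen, results = run_backups(
    ["A00001", "A00002", "A00003"],
    ["job1", "job2"],
    can_load=lambda tape: False,
)
print(frozen)   # all three tapes end up frozen
print(results)  # both jobs end with status 96
```

In this model a single persistent load failure is enough to freeze every scratch tape within moments, which would be consistent with many tapes sharing the same freeze timestamp.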

mph999
Level 6
Employee Accredited

If the tapes froze at the same time, I would expect a config issue to be the more likely cause.

Personally, I'd just remove the config and add it back.

If you only have a couple of libraries:

nbemmcmd -deletealldevices -allrecords

Check all is OK at the OS level.

Re-add with the wizard.

Re-inventory.

If this doesn't fix it, then at least you have eliminated the config as a cause.

Martin

Toddman214
Level 6

Uhhhhg! OK, I figured it out. It's kinda stupid, really. In our storage room was a tote full of tapes. These were set aside as a sort of "rainy day" supply in case we ran low on inventory and needed an emergency stock. Well, I needed to tap into those, so I inventoried a batch, and they went straight into the scratch pool and started writing while I commenced loading more batches. As it turns out, once upon a time, long before I took over backups for the company, NetBackup was set up to use the first six characters of the barcodes. I use the last six, i.e. 004132 vs 4132L3. All I can figure is that the change was made sometime in the middle of that batch of tapes. I suppose NetBackup could not read the tapes whose internal media ID had been set from the first six characters, so it immediately froze those tapes. I now have my new tapes and labels, and all is well. Live and learn.
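To make the first-six vs. last-six difference concrete, here is a minimal sketch in Python. It is not how NetBackup implements media ID generation; the function names are hypothetical, and the full barcode `004132L3` is an assumption pieced together from the two IDs mentioned above.

```python
def media_id_from_barcode(barcode: str, rule: str) -> str:
    """Derive a six-character media ID from a barcode.

    rule is "first" (leading six characters) or "last" (trailing six).
    Both rule names are illustrative labels, not NetBackup settings.
    """
    if rule == "first":
        return barcode[:6]
    if rule == "last":
        return barcode[-6:]
    raise ValueError(f"unknown rule: {rule!r}")


def label_matches(recorded_media_id: str, barcode: str, rule: str) -> bool:
    """True if the ID written on the tape agrees with the current rule."""
    return recorded_media_id == media_id_from_barcode(barcode, rule)


barcode = "004132L3"  # assumed full barcode for the example tape

print(media_id_from_barcode(barcode, "first"))  # 004132
print(media_id_from_barcode(barcode, "last"))   # 4132L3

# A tape labeled under the old "first six" rule no longer matches
# what a "last six" configuration expects for the same barcode.
print(label_matches("004132", barcode, "last"))  # False
```

The same physical tape therefore ends up with two different IDs depending on the rule in effect, which is the kind of mismatch I think I ran into.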

 

Thanks all. This can be resolved.

 

 

Todd

mph999
Level 6
Employee Accredited

Oh well, config issue, just not quite where I thought.

Well done for finding it.

M