On my new customer's network:
I've got a lot of frozen media caused by drives going down (LTO4 drives with SSO).
We are currently freezing 4-5 tapes per week. The only thing I can do is duplicate the images stored on those tapes.
If anyone has an idea, please share it. I've had a case open with Sun/Oracle and Symantec for a year, and nobody has given me a solution.
Our workaround is to have Oracle replace the drive (each time they say the drive was okay :) ), and the replacement drive is fine for another 6 months. After about 6 months of use, the drive starts hanging, freezing tapes and going down.
So now, each time a tape gets frozen or we see a path go down, we replace the drive. It works, but to my mind it is heresy to replace a drive because of frozen tapes.
Originally referenced: https://www-secure.symantec.com/connect/forums/tape-device-going-down
Is it always the same tapes that are freezing, or the same drive(s) that are going down?
How old are the tapes? They can become unreliable after a few years.
There is also no guarantee that the replacements are any better than the drives they replaced! Do the drives get cleaned when required?
You need proper logging to determine the reason for the errors.
Enabling bptm logs on all media servers is first on the list.
Next, add a VERBOSE entry to vm.conf on all media servers and restart NBU. Media Manager/device errors will then be logged in /var/adm/messages. (I assume your media servers are Solaris; if not, please let us know.)
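On a UNIX media server, those two steps could be scripted roughly as follows. This is a minimal sketch that assumes the default install paths: creating the directory /usr/openv/netbackup/logs/bptm is what switches on legacy bptm logging, and vm.conf lives in /usr/openv/volmgr. The helper name enable_device_logging is hypothetical, the root parameter only exists so you can try it against a scratch directory first, and the NetBackup restart is left to you.

```python
from pathlib import Path

# ASSUMPTION: default UNIX install locations, relative to `root`.
BPTM_LOG_DIR = Path("usr/openv/netbackup/logs/bptm")
VM_CONF = Path("usr/openv/volmgr/vm.conf")


def enable_device_logging(root: Path = Path("/")) -> None:
    """Create the bptm legacy log directory and make sure vm.conf contains VERBOSE."""
    # Legacy logging for a NetBackup process is enabled by creating its log directory.
    (root / BPTM_LOG_DIR).mkdir(parents=True, exist_ok=True)

    conf = root / VM_CONF
    lines = conf.read_text().splitlines() if conf.exists() else []
    # Only append VERBOSE once, so running the sketch twice is harmless.
    if "VERBOSE" not in (line.strip() for line in lines):
        conf.parent.mkdir(parents=True, exist_ok=True)
        lines.append("VERBOSE")
        conf.write_text("\n".join(lines) + "\n")

# Usage (as root on each media server, then restart NetBackup there):
#   enable_device_logging()
```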
As per Andy's post, you need to determine whether the errors occur with certain tapes or with certain drives. Have a look at the following file, which exists on each media server: /usr/openv/netbackup/db/media/errors
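To see whether errors cluster on particular tapes or on particular drives, you could tally that file. A minimal sketch, assuming each line begins with whitespace-separated date, time, media ID, drive index and error type fields. I have not verified that layout for every NBU version, so check a few real lines first and adjust the field positions; tally_errors is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Tuple

ERRORS_FILE = Path("/usr/openv/netbackup/db/media/errors")


def tally_errors(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count error lines per media ID and per drive index.

    ASSUMPTION: each line looks like
        <date> <time> <media_id> <drive_index> <error_type> ...
    """
    by_media: Counter = Counter()
    by_drive: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        media_id, drive_index = fields[2], fields[3]
        by_media[media_id] += 1
        by_drive[drive_index] += 1
    return by_media, by_drive

# Usage on a media server (reads the real file):
#   by_media, by_drive = tally_errors(ERRORS_FILE.read_text().splitlines())
#   print(by_media.most_common(10), by_drive.most_common(10))
```

A handful of media IDs dominating the counts points at tapes; a handful of drive indexes dominating points at drives.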
Please also let us know which NBU version you are running.
You may not need a cleaning tape for LTO4 drives, assuming your tapes are rotated.
In fact, each tape gives the drive a little clean-up each time it is mounted, but if a tape is mounted more than 1000 times, it becomes dirty itself and will make the drive dirtier.
With a good pool of tapes and good rotation, you may never need to clean the drive.
But if you have too few tapes, you need to buy a cleaning tape and check your drive statistics with: tpclean -L
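The 1000-mount rule of thumb above is easy to check once you have a mount count per tape. A minimal sketch, assuming you have already exported media IDs and their mount counts into a dictionary (how you export them is up to you; overused_tapes is a hypothetical name). The default limit is the figure from this post, not a vendor specification.

```python
from typing import Dict, List


def overused_tapes(mounts: Dict[str, int], limit: int = 1000) -> List[str]:
    """Return media IDs mounted more than `limit` times, most-mounted first."""
    heavy = [media for media, count in mounts.items() if count > limit]
    return sorted(heavy, key=lambda media: mounts[media], reverse=True)
```

Passing a lower limit (for example 500) gives an earlier warning list.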
First we need to determine whether it's a media problem or a drive problem.
Back when I used to deal with Quantum libraries, we kept having drive-down issues. It's easy to rule out a media problem by testing the tape in another library/tape drive: if it works, the media is fine.
We didn't think NetBackup was the problem: as software, NetBackup keeps retrying, but beyond a certain limit it simply brings the drive down. If the media is bad, it freezes the media instead.
We kept a checklist of which drives went down most frequently and reported them to Quantum. Usually a replacement drive would fix the issue but, as you said, it may not last long. Six months is actually quite good; sometimes the same drive went down again after just 2 weeks. After a long troubleshooting process, we found it's important to keep track of the drive firmware version as well. Some vendor firmware can have problems on certain drives: it may work on day 1 but not on other days.
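A checklist like that can be kept as simple records. A minimal sketch with a hypothetical DownEvent record (drive name, firmware revision, date) that you would fill in by hand or from your logs. The per-firmware figure is divided by the number of drives running each revision, so a revision installed on many drives doesn't look worse just because it is more common.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DownEvent:
    """One 'drive went DOWN' occurrence (hypothetical record layout)."""
    drive: str     # drive name as configured in NetBackup, e.g. "DC1-Drive06"
    firmware: str  # drive firmware revision at the time of the event
    date: str      # free-form date, only used for reporting


def worst_drives(events: Iterable[DownEvent], top: int = 5) -> List[Tuple[str, int]]:
    """Drives that went down most often, with their counts."""
    return Counter(e.drive for e in events).most_common(top)


def downs_per_firmware(events: Iterable[DownEvent], drives_on_fw: Dict[str, int]) -> Dict[str, float]:
    """Down events per drive for each firmware revision.

    drives_on_fw maps each firmware revision to how many drives currently run it.
    """
    counts = Counter(e.firmware for e in events)
    return {fw: counts[fw] / n for fw, n in drives_on_fw.items() if n > 0}
```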
Similarly for media issues: that's where the /usr/openv/netbackup/db/media/errors file comes in handy.
We did the same thing a while back to monitor drive issues and used it as "evidence" to get drives replaced.
We also physically swapped drive locations to see whether the error "followed" the drive; if it didn't, that could indicate a connectivity issue (e.g. physical cabling).
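The swap test boils down to a simple comparison. A minimal sketch, assuming you log each error seen after the swap as a (drive serial, location) pair and that the suspect drive really was moved out of its original location; diagnose_swap is a hypothetical helper.

```python
from typing import Iterable, Tuple


def diagnose_swap(
    suspect_serial: str,
    original_location: str,
    errors_after_swap: Iterable[Tuple[str, str]],
) -> str:
    """Decide whether errors followed the physical drive or stayed with its old location.

    errors_after_swap: one (drive_serial, location) pair per error seen after the
    suspect drive was moved out of `original_location`.
    """
    errors = list(errors_after_swap)
    on_drive = sum(1 for serial, _ in errors if serial == suspect_serial)
    at_old_location = sum(1 for _, loc in errors if loc == original_location)
    if on_drive > at_old_location:
        return "drive"         # the fault moved with the drive hardware
    if at_old_location > on_drive:
        return "location"      # the fault stayed put: suspect cabling or the path
    return "inconclusive"      # no errors yet, or an even split: keep collecting
```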
Thanks Laurent. (I just remembered that I attended one of your training courses a few years ago at La Défense.)
Is it officially documented that LTO4 tapes give the drive a little clean-up during use?
My tape rotation seems to be good: only 6 or 7 tapes out of a total of 5200 have been mounted more than 500 times.
Drive Name   Type    Mount Time  Frequency  Last Cleaned      Comment
***********  ******  **********  *********  ****************  *******
DC1-Drive03  hcart*      2695.6          0  00:44 09/15/2010
DC1-Drive06  hcart*      6682.5          0  N/A
DC1-Drive02  hcart*        32.7          0  N/A
DC1-Drive01  hcart*       558.2          0  01:21 04/03/2011
DC1-Drive08  hcart*        31.6          0  N/A
DC2-Drive08  hcart*       883.9          0  19:36 03/01/2011
DC2-Drive03  hcart*        28.4          0  N/A
DC1-Drive04  hcart*      1693.2          0  18:09 11/25/2010
DC1-Drive07  hcart*       146.4          0  07:40 05/03/2011
DC2-Drive02  hcart*      1866.0          0  13:49 11/24/2010
DC2-Drive04  hcart*       608.9          0  06:34 03/22/2011
DC2-Drive07  hcart*       884.6          0  17:16 03/01/2011
DC1-Drive05  hcart*      2363.8          0  N/A
DC2-Drive01  hcart*         5.9          0  22:05 05/10/2011
DC2-Drive05  hcart*      1878.5          0  15:31 11/16/2010
DC2-Drive06  hcart*        68.0          0  N/A
Sorry for the bad formatting.
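With 16 drives, it helps to sort that listing. A minimal sketch that parses tpclean -L text laid out like the listing above and returns the drives with no recorded cleaning, highest Mount Time first. The row pattern is inferred from this one output sample only, and parse_tpclean / busiest_uncleaned are hypothetical names. If Mount Time is the usage accumulated since the last cleaning (my understanding, not confirmed in this thread), the drives at the top of that list are the first candidates for a cleaning.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One data row as it appears in the listing above: name, type, mount time, frequency,
# then either "N/A" or "HH:MM MM/DD/YYYY", then an optional comment.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<type>\S+)\s+(?P<mount>\d+(?:\.\d+)?)\s+(?P<freq>\d+)\s+"
    r"(?P<cleaned>N/A|\d{2}:\d{2} \d{2}/\d{2}/\d{4})\s*(?P<comment>.*)$"
)


@dataclass
class DriveStat:
    name: str
    mount_time: float
    last_cleaned: Optional[str]  # None when tpclean reports N/A


def parse_tpclean(text: str) -> List[DriveStat]:
    """Parse tpclean -L output; header and separator lines simply don't match ROW."""
    stats = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            cleaned = None if m["cleaned"] == "N/A" else m["cleaned"]
            stats.append(DriveStat(m["name"], float(m["mount"]), cleaned))
    return stats


def busiest_uncleaned(stats: List[DriveStat]) -> List[DriveStat]:
    """Drives with no recorded cleaning, highest mount time first."""
    uncleaned = [s for s in stats if s.last_cleaned is None]
    return sorted(uncleaned, key=lambda s: s.mount_time, reverse=True)
```

For example, save the output with tpclean -L > tpclean.txt and pass the file's contents to parse_tpclean.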