Frozen tape

simz123
Level 4

Hi, I have 2 frozen tapes.

I checked the tape logs in the Admin Console for the media and they show "media id xxx load operation reported an error".

I checked the \veritas\netbackup\db\media\errors and found the following:

03/07/18 14:29:26 502322 10 POSITION_ERROR 0003
03/07/18 15:04:04 502322 10 POSITION_ERROR 0003
03/08/18 10:16:14 504921 10 WRITE_ERROR 0003
03/23/18 10:54:23 502513 10 POSITION_ERROR 0003
04/23/18 15:01:36 500780 1 WRITE_ERROR 0014
04/23/18 15:01:39 500780 1 TAPE_ALERT 0014 0x34001000 0x00000000
04/30/18 06:44:28 502412 -1 RESERVE_ERROR 0004 0 1 0 0
05/07/18 06:00:51 500529 -1 RESERVE_ERROR 0006 0 1 0 0
05/07/18 06:54:11 505158 -1 RESERVE_ERROR 0002 0 1 0 0

Does this mean something to anyone? Should I be looking somewhere else?

Is there a way to find out which drive was used for a particular tape?

I have used robtest to move tapes from slots to drives in the library and everything seems to work there. The drives don't need cleaning either, as per the SL Console. Automatic cleaning is enabled.
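For the question about which drive a tape was in, the errors file entries can be tallied per media ID. This is only a minimal sketch. It assumes each line follows the layout `date time media_id drive_index error_type drive_name [extra fields]`, which fits the entries pasted above but should be confirmed against the Veritas documentation. `parse_error_line` and `summarize` are hypothetical helper names, not NetBackup tools.

```python
from collections import defaultdict
from datetime import datetime


def parse_error_line(line):
    """Split one errors-file line into a dict, or return None if it does not fit.

    Assumed layout: date time media_id drive_index error_type drive_name [extra...]
    """
    parts = line.split()
    if len(parts) < 6:
        return None
    try:
        when = datetime.strptime(parts[0] + " " + parts[1], "%m/%d/%y %H:%M:%S")
        drive_index = int(parts[3])
    except ValueError:
        return None
    return {
        "when": when,
        "media_id": parts[2],
        "drive_index": drive_index,  # -1 appears on the RESERVE_ERROR lines
        "error_type": parts[4],
        "drive_name": parts[5],
        "extra": parts[6:],
    }


def summarize(lines):
    """Group parsed entries by media ID as (timestamp, drive index, error type)."""
    by_media = defaultdict(list)
    for line in lines:
        rec = parse_error_line(line)
        if rec is not None:
            by_media[rec["media_id"]].append(
                (rec["when"], rec["drive_index"], rec["error_type"])
            )
    return dict(by_media)


if __name__ == "__main__":
    # Sample lines copied from the errors file excerpt in this post.
    sample = [
        "03/07/18 14:29:26 502322 10 POSITION_ERROR 0003",
        "04/23/18 15:01:36 500780 1 WRITE_ERROR 0014",
        "04/30/18 06:44:28 502412 -1 RESERVE_ERROR 0004 0 1 0 0",
    ]
    for media_id, events in summarize(sample).items():
        for when, drive_index, error_type in events:
            print(media_id, when.isoformat(), "drive index", drive_index, error_type)
```

If a frozen media ID never shows up in the summary, the errors file being read is probably not the one on the media server that mounted the tape.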

21 REPLIES

sclind
Moderator    VIP

I would check the tape itself

Check the tape for physical damage, or something else?

This is what I got from bpmedialist -m 502427:
Server Host = brm-up-nbu-6

 id     rl  images   allocated        last updated      density  kbytes restores
           vimages   expiration       last read         <------- STATUS ------->
           On Hold
--------------------------------------------------------------------------------
502427   5      3   05/08/2018 00:10  05/11/2018 07:16  hcart2   452767786     0
                3   08/12/2018 06:00        N/A         FROZEN
           0

Amol_Nair
Level 6
Employee
The SCSI errors reported do not match the time frame when this tape was assigned, nor does the media ID match, so you may be mixing up two different issues.

The other write errors are a few months old.

Take a look at the system logs on the media server; they should have more information on the load operation failure. If you have the robot's debug logs or the ltid logs, they would also give you some insight into why the load operation failed.

The output does show that the tape has 3 images on it. It is possible that the job mounted it on an unsupported drive (physical density mismatch).

Marianne
Level 6
Partner    VIP    Accredited Certified

Are you looking at 'errors' file on media server brm-up-nbu-6? 

What is the timestamp for the tape logs error? 

Do you have bptm log folder on brm-up-nbu-6? 
This log is the best place to look (at least level 3 logging), along with the OS system log (/var/log/messages on Linux).

Hi @Marianne @Amol_Nair,

On the media server I tried to run the following, but it didn't return any output.

 egrep "501636|502542" /usr/openv/netbackup/db/media/errors

I've attached the bptm logs from the media server, along with the timestamps from the tape logs.

Yesterday I only had 2 frozen tapes and today I have 20 in total.

Alexis_Jeldrez
Level 6
Partner    VIP    Accredited Certified

That's a lot of tape errors.

  • Check if there are errors in the operating system logs (/var/log/messages, I guess). Grep -i for warnings, alerts and errors.
  • If it's a bunch of new tapes they could have a problem that's not replicable with old tapes.
  • With NetBackup down check if you can use the tape library with its own (web) interface. Try positioning a tape in the tape drive, for example. Could be a problem with a specific tape drive.
  • Also, try rebooting the library.
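The log check in the first bullet can be sketched as a small script, which is roughly what a case-insensitive grep does. The keyword list and the helper names `find_suspect_lines` and `scan_file` are assumptions for illustration, and the log path differs between distributions.

```python
import re

# Case-insensitive match for the three kinds of messages worth a first look.
KEYWORDS = re.compile(r"warn|alert|error", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) for lines mentioning warnings, alerts or errors."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if KEYWORDS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Run find_suspect_lines over a log file, tolerating odd bytes in the file."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_file("/var/log/messages")` on the media server (as a user allowed to read the file) returns the matching lines with their line numbers, which makes it easier to read the surrounding context afterwards.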

 

@Alexis_Jeldrez these are old tapes that came back from Iron Mountain.

I'm not sure how this actually works, but when I ran /usr/openv/volmgr/bin/robtest on my media server the output was:

No locally-controlled robots with test utilities are configured

From my master server:

brm-up-nbum-1:~ root # tpconfig -d
Id  DriveName           Type   Residence
      Drive Path                                                       Status
****************************************************************************

Currently defined robotics are:
  TLD(0)     robot control host = brm-up-nbu-1
  TLD(1)     robot control host = brm-up-nbu-1

EMM Server = brm-up-nbum-1

Does that mean only brm-up-nbu-1 can run robtest? Is that how it should be set up, or is that up to preference?

I ran robtest on brm-up-nbu-1 (media server) and was able to move tapes to two different drives. I only did two because I've heard it gets stuck if you use it for more than 5 minutes.

The bptm logs are at verbose 1, so they do not have much info.

Moreover, you seem to have done a grep on them and shared just the lines that reference the media ID, so the only lines that come up are like:

/usr/openv/netbackup/logs/bptm/root.051418_00001.log:20:41:27.745 [58433] <2> report_drives: MEDIA = 501636
/usr/openv/netbackup/logs/bptm/root.051418_00001.log:20:48:47.451 [53251] <8> write_backup: media id 501636 load operation reported an error

We would need the complete bptm file, of course at higher verbosity.

And mainly the system logs (/var/log/messages) from the same media server, as we have all been saying since the beginning, but I guess you may have missed that part.

Marianne
Level 6
Partner    VIP    Accredited Certified

I agree with @Amol_Nair - log snippets do not help. The media id is not always part of the error. 
We need full logs or at least ALL entries for a particular job/PID (e.g. 53251).
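Pulling all entries for one PID out of a bptm log could look like the sketch below. It assumes raw log lines begin with `HH:MM:SS.mmm [pid] <level>` as in the lines quoted in this thread (without the file-name prefix that grep adds); `entries_for_pid` and `pids_mentioning` are hypothetical helper names.

```python
import re

# Start of a bptm line as quoted in this thread, e.g.
# "20:48:47.451 [53251] <8> write_backup: ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")


def entries_for_pid(lines, pid):
    """Return every log line written by the given process ID, in file order."""
    wanted = str(pid)
    matched = []
    for line in lines:
        text = line.rstrip("\n")
        m = LINE_RE.match(text)
        if m and m.group(2) == wanted:
            matched.append(text)
    return matched


def pids_mentioning(lines, needle):
    """Return the PIDs (in first-seen order) whose messages contain `needle`."""
    pids = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and needle in m.group(4) and m.group(2) not in pids:
            pids.append(m.group(2))
    return pids


if __name__ == "__main__":
    # The two lines quoted earlier in the thread, without the grep file prefix.
    log = [
        "20:41:27.745 [58433] <2> report_drives: MEDIA = 501636",
        "20:48:47.451 [53251] <8> write_backup: media id 501636 load operation reported an error",
    ]
    for pid in pids_mentioning(log, "501636"):
        print(pid, len(entries_for_pid(log, pid)))
```

The idea is to use the media ID only to find the PIDs involved, then collect everything those PIDs logged, since the lines around the error often do not mention the media ID at all.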

For load operation error, you need syslog (/var/log/messages) on the robot control host - brm-up-nbu-1.
Note that you need Media Manager processes to run in VERBOSE mode in order to have meaningful logging in messages file: 
Add VERBOSE to vm.conf on robot control host plus ALL media servers, followed by restart of NBU.

If we look at the RESERVE_ERROR entries in your opening post, it seems that there may be drive reservation conflicts between media servers in an SSO environment.

So it is extremely important to verify that SSO is configured correctly and that Persistent Binding is in place between HBA and OS on all media servers.

You may want to check vmdareq output on the master server at regular intervals during the peak backup window to view drive assignments.

Hi @Amol_Nair @Marianne,

I am very new to NetBackup. I did find the log files you were asking for.

@Marianne you mentioned "it seems that there may be reservation conflicts of drives between media servers in SSO environment. So, extremely important to verify that SSO is configured correctly and that Persistent Binding is in place between HBA and OS on all media servers."

How do I go about this? Where should I start? Any documents would be helpful.

I really appreciate everyone's help.

The /var/log/messages files you shared are simply filled with reservation conflict messages. It is probably a good time to reach out to whoever manages the storage, who can give you more details on how the configuration is done in your environment.


I guess the below link should give you an overview of what this reservation conflict message means

https://www.veritas.com/support/en_US/doc/24437881-126559615-0/v95674354-126559615

And probably the below link to outline the overview of what is persistent binding

https://www.veritas.com/support/en_US/article.100016290

Marianne
Level 6
Partner    VIP    Accredited Certified

I am curious to see vmdareq output on the master server.
(In /usr/openv/volmgr/bin.)

Please remember to add VERBOSE entry to vm.conf on robot control host and all other media servers, followed by NBU restart. 
This will ensure detailed logging around tape movement and other NBU Media Manager actions.

The reservation conflicts are what is causing your issues.

You need to verify that Persistent Binding is in place. 
Once this is done, you can redo the SSO config by running the Device Config Wizard: select the robot control host and all media servers sharing the drives. Once completed, the SSO config should be fine.

Hi @Marianne,

Is this the correct way to add the VERBOSE entry to vm.conf:

MM_SERVER_NAME=hostname

MM_SERVER_NAME=hostname

MM_SERVER_NAME=hostname

DAYS_TO_KEEP_LOGS = number

VERBOSE

Would I add all the media servers (not the master) on the robot control host media server only, or do I do the above on all media servers? Or do I add all media servers on the robot control host and just add VERBOSE on the other media servers? For days to keep I can put 3; that way it will keep the logs for 3 days?

I attached the vmdareq output; it looks weird to me.

@Amol_Nair I read the links you provided, but when I clicked on the Emulex article for more info it wasn't loading.

So I'm unsure how to do Persistent Binding.

FYI, I have drive paths going down every day. I just UP them. Should I be tackling this first and then moving on to the frozen tapes?

The person who was working on NBU is no longer here, and I'm filling in with hardly any background.

I saved the vmdareq output in Notepad, but it keeps saying the contents of the attachment don't match its file type, even though I saved it as all file types. Hopefully Word works.

Marianne
Level 6
Partner    VIP    Accredited Certified
There must only be one MM_SERVER_NAME entry in vm.conf for the local hostname.
Add VERBOSE to vm.conf on robot control host and all media servers.

Days to keep logs is only needed if you create debug logs.

So, this is all you need:
MM_SERVER_NAME=hostname
VERBOSE

Do this on robot control host and all media servers.
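The two rules above (no more than one MM_SERVER_NAME entry, and a VERBOSE entry present) can be checked with a short script before restarting NBU. This is a sketch only: `check_vm_conf` is a hypothetical helper, and skipping lines that start with `#` as comments is an assumption.

```python
def check_vm_conf(text):
    """Return a list of problems found in vm.conf content (empty list means OK)."""
    names = []
    verbose = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key == "MM_SERVER_NAME":
            names.append(value.strip())
        elif key == "VERBOSE":
            verbose = True
    problems = []
    if len(names) > 1:
        problems.append(f"more than one MM_SERVER_NAME entry ({len(names)} found)")
    if not verbose:
        problems.append("VERBOSE entry is missing")
    return problems


if __name__ == "__main__":
    good = "MM_SERVER_NAME=hostname\nVERBOSE\n"
    print(check_vm_conf(good))
```

Reading the real file into `text` and running the check on the robot control host and on each media server gives a quick confirmation that the edit was made everywhere.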

vmdareq output is supposed to be readable text. I see nothing that is clear text. What do you see when you run the command?

Thanks @Marianne, I'll do that.

The vmdareq output is all over the place. Some is readable while the rest is just junk. I edited the doc and removed all the junk in case you wanted a look.

Hi @Marianne,

After adding VERBOSE to the vm.conf file, do I need to restart the media servers or just restart the NBU services?

Marianne
Level 6
Partner    VIP    Accredited Certified
Restart NBU, not the server.

@simz123

That Emulex link just redirects you to the guide for the HBAnywhere utility.

Here is one of the PDF files, I believe for version 3.x of the utility. You can always find newer guides depending on the version of the utility you are using.

http://pdfstream.manualsonline.com/4/43c2be87-9cb7-401c-af87-ad8816ed0c90.pdf