status: 83: media open error

Stanleyj
Level 6

I came in this morning and found that all of my backups from 4:08 am onward had failed with error 83. I have rebooted my systems and even upgraded to version 7.1.0.3, because I found an article saying it fixed a dedup issue when using 5200 appliances. I have seen nothing in my environment to indicate what could have caused this.

My environment is a single 7.1.0.3 master server backing up to two 5200 (2.1) appliances.  Everything has been working flawlessly for weeks.   

 

Does anyone have any ideas?

1 ACCEPTED SOLUTION

Stanleyj
Level 6

After I had worked with support for several days, they sent me an EEB file that I applied to both of my appliances. Now everything is back to normal.

I don't really know what the patch was for, but it worked. They are still telling me this has nothing to do with updating my master server to 7.1.0.3, but that is the only thing that changed in my NetBackup environment within the last few weeks.

Oh well, it's fixed. Thank you guys so much for all your help here.

 

Sorry I can't provide better information for anyone else who might run into this.

20 REPLIES

VirtualED
Level 5
Employee Accredited Certified

Your Dedup Environment is down.  Maybe it is out of free space, or Content Router queue processing has failed 5 times.

Do the following commands run from the appliance?

# /usr/openv/pdde/pdcr/bin/crcontrol --dsstat

# /usr/openv/pdde/pdcr/bin/crcontrol --getmode

Is the Disk Volume up:

# /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype PureDisk -U

Is the Disk Pool Up:

# /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -stype PureDisk -U
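
The four checks above can be run in one pass. This is only a sketch: the `run_check` helper and its labels are made up for illustration, and it skips any binary that is not installed on the host where it runs, which also shows quickly whether you are on the right machine for each command.

```shell
#!/bin/sh
# Run each health check from the post above in order, skipping any binary
# that is not present on this host. run_check is a hypothetical helper.

run_check() {
    label="$1"; shift
    if [ -x "$1" ]; then
        echo "== $label"
        # Run the command exactly as given; report a non-zero exit status.
        "$@" || echo "   (exit status $?)"
    else
        echo "== $label: skipped, $1 not found on this host"
    fi
}

run_check "dedup store statistics" /usr/openv/pdde/pdcr/bin/crcontrol --dsstat
run_check "dedup store mode"       /usr/openv/pdde/pdcr/bin/crcontrol --getmode
run_check "disk volume state"      /usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype PureDisk -U
run_check "disk pool state"        /usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -stype PureDisk -U
```

A "skipped" line means the binary is not at that path on the current host, so the command has to be run somewhere else.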

Stanleyj
Level 6

VirtualED,

I thought I had documented the new maintenance password for the appliance after the 2.1 update, but apparently not (grr). So that means I am unable to try the commands you sent me.

I do know that the disk volume is up, and it appears that my appliances are not full. They are 40 TB appliances, and both my NetBackup console and the admin page for the appliance show them using only about 20%.

 

The weird thing is that I can restart all the services on the appliances and my backups start working as normal for about two streams, and then everything goes back to status 83. I don't know, buddy, but thank you for helping.

I contacted support, and 5 hours later they called me and asked that I send them some logs. But they have yet to email me what they want.

Stanleyj
Level 6

I found some errors on the appliance saying that the pure_disk volume has been marked down. So Ed, you're right, but I don't know how to stop this from happening.

 

I'm guessing this is all caused by me upgrading to 7.1.0.2. Everything was working fine until I did that two weeks ago. Dang it!!

Mark_Solutions
Level 6
Partner Accredited Certified

Stanley

It may just be a coincidence, as the appliances have been known to suddenly have issues and they do tend to need a little tuning.

Obviously, to do that you will need to log into the appliance via PuTTY (you can try root or admin with P@ssw0rd, as the password may have been reset during the patch).

Once logged in go to the Support menu, then Maintenance, then type elevate - this will give you file level access to the appliance

Now work through the following. There is a lot here, so don't skip anything ...

First, change the keepalive settings if they are not already set to 510, 3 and 3:

# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9

Change the settings:

# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
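
To confirm all three values in one go, before or after changing them, something like the following can be used. It is a sketch only: `check_keepalive` is a made-up helper, and its optional directory argument exists purely so it can be pointed at a copy of the files for a dry run.

```shell
#!/bin/sh
# Compare the live TCP keepalive settings with the values recommended above.
# check_keepalive is a hypothetical helper; with no argument it reads the
# real files under /proc/sys/net/ipv4.

check_keepalive() {
    dir="${1:-/proc/sys/net/ipv4}"
    for pair in tcp_keepalive_time:510 tcp_keepalive_intvl:3 tcp_keepalive_probes:3; do
        name="${pair%%:*}"    # text before the colon
        want="${pair##*:}"    # text after the colon
        if [ -r "$dir/$name" ]; then
            have=$(cat "$dir/$name")
        else
            have="unreadable"
        fi
        if [ "$have" = "$want" ]; then
            echo "$name: $have (ok)"
        else
            echo "$name: $have (want $want)"
        fi
    done
}

check_keepalive
```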

To make the changes persistent across a reboot, use the vi editor to add something like the following to /etc/sysctl.conf:

## Keepalive at 8.5 minutes

# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time=510

# close connection after 3 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes=3

# wait 3 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl=3

The echo changes take effect immediately, with no restart needed. To have /etc/sysctl.conf applied at boot, then run: chkconfig boot.sysctl on
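
As a rough worked example of what these numbers mean: the kernel waits the idle time before the first probe, then sends probes at the given interval until the probe count is exhausted. The `keepalive_timeout` function below is just an illustration of that arithmetic, not anything shipped with the appliance.

```shell
#!/bin/sh
# Approximate seconds before the kernel gives up on a dead, idle connection:
# idle time before the first probe + (probe interval * unanswered probes).

keepalive_timeout() {
    echo $(( $1 + $2 * $3 ))
}

echo "tuned:   $(keepalive_timeout 510 3 3) seconds"     # 510 + 3*3   = 519
echo "default: $(keepalive_timeout 7200 75 9) seconds"   # 7200 + 75*9 = 7875
```

So with the tuned values a dead connection is noticed after roughly 8.5 minutes instead of a little over two hours.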

Second, create this file with the value 800 inside:

/usr/openv/netbackup/db/config/DPS_PROXYDEFAULTRECVTMO
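
One way to create that file from the elevated shell is sketched below. `write_recv_timeout` is a made-up helper name, and the directory argument is there only so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Write the value 800 into the file named in the step above and echo it back.
# write_recv_timeout is a hypothetical helper; with no argument it targets
# the real NetBackup config directory.

write_recv_timeout() {
    dir="${1:-/usr/openv/netbackup/db/config}"
    echo 800 > "$dir/DPS_PROXYDEFAULTRECVTMO" && cat "$dir/DPS_PROXYDEFAULTRECVTMO"
}

# On the appliance, from the elevated shell:
#   write_recv_timeout
```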

Third, enable compression for de-dupe:

/usr/openv/lib/ost-plugins/pd.conf

Edit the Compression line so that it reads Compression = 1
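
If you would rather not edit the file by hand, the change can be scripted roughly as follows. This is a sketch under assumptions: `enable_compression` is a made-up helper, the key is matched case-insensitively because the exact capitalisation in your pd.conf may differ from how it is written above, and a line that is commented out with # is deliberately left alone. It keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Set the compression entry in a pd.conf-style file to 1, keeping a backup.
# enable_compression is a hypothetical helper; pass it the path to the file.

enable_compression() {
    conf="$1"
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite only the value of an uncommented "compression = ..." line,
    # leaving the key exactly as it is spelled in the file.
    awk '{
        if (tolower($0) ~ /^[ \t]*compression[ \t]*=/) sub(/=.*/, "= 1")
        print
    }' "$conf.bak" > "$conf"
}

# On the appliance, from the elevated shell:
#   enable_compression /usr/openv/lib/ost-plugins/pd.conf
```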

Fourth, make sure that the automatic tuning has been done (it should run on first boot but has been known not to):

cd to: /opt/NBUAppliance/scripts/
type the following once in that directory:
../bin/perl tune.pl

Fifth, go to /usr/openv/netbackup/db/config and make sure that there are no SIZE or NUMBER DATA BUFFER files present. If there are, make a note of their values and then rename them (you can try putting these back later if you have any performance issues).
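
The note-then-rename step can be sketched as below. `park_buffer_files` and the `.disabled` suffix are my own choices for illustration; any name other than the original should do, since the files are looked up by exact name. The printed lines are your record of the old values.

```shell
#!/bin/sh
# Print the value of every SIZE_DATA_BUFFERS* / NUMBER_DATA_BUFFERS* file in
# the config directory, then rename it out of the way.
# park_buffer_files is a hypothetical helper; with no argument it targets
# the real NetBackup config directory.

park_buffer_files() {
    dir="${1:-/usr/openv/netbackup/db/config}"
    for f in "$dir"/SIZE_DATA_BUFFERS* "$dir"/NUMBER_DATA_BUFFERS*; do
        [ -f "$f" ] || continue                    # pattern matched nothing
        case "$f" in *.disabled) continue ;; esac  # already parked earlier
        echo "$(basename "$f") = $(cat "$f")"
        mv "$f" "$f.disabled"
    done
}

# On the appliance, from the elevated shell:
#   park_buffer_files
```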

Now type exit twice to come back to the normal menu and issue a reboot command.

See how it goes. I have found that all of these settings together make the appliances totally stable. The BUFFERS are the only part I have not checked recently, as I believe 2.1a adds some in.

Hope this helps

Stanleyj
Level 6

Mark,

Thank you so much for the detailed instructions. They are spot on!!! I am rebooting my first appliance now and will run some backups to see what happens.

Every time I upgrade my master, it takes about a week of doing these "performance tuning" steps before everything starts working again.

Once again, thank you so much for taking the time to type these instructions out. The password you posted is correct, and I am going to document that for later use.

Stanleyj
Level 6

Mark,

I double-checked that all the settings you provided stayed after the reboot and then attempted to run some backups. They are still failing with status 213 and 2074. I am at a loss as to what is causing this. I have requested that my ticket be escalated, so hopefully I will get with support soon and figure out what's going on.

I will update after i hear something back from support.

Mark_Solutions
Level 6
Partner Accredited Certified

After you had done this work, did you UP the disk pools? They may still have been marked down in NetBackup.

They are a bit odd sometimes as the GUI shows them as UP when they aren't so set them down and then set them back up in the GUI and try again.

Stanleyj
Level 6

Ed,

I tried the commands you suggested, but they did not run. It said they were invalid.

Also my appliance is at 2.1a patch level.  I may have forgotten to mention that.

Mark_Solutions
Level 6
Partner Accredited Certified

Which commands were invalid?

I guessed it was 2.1a as the password worked.

Also down the pool and up it again to see if that works

Stanleyj
Level 6

I tried all the commands and I get a "command not found" message.

Yes P@ssw0rd is the new root password for 2.1a

 

I have restarted my appliances several times. Is this the same as "downing" the pools?

Mark_Solutions
Level 6
Partner Accredited Certified

I am still unsure which commands you could not run as initially you said it was all done and you were rebooting - please be specific about what did not work.

To down the pool, go to the Device Monitor in the Admin Console, open the Disk Pool tab, right-click a pool and select Down Disk Pool. Leave it for a few seconds, then right-click again and select Up Disk Pool.

Rebooting the appliance won't always change its status in NetBackup.
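
For anyone who prefers the command line to the GUI, the same down/up cycle can probably be done with nbdevconfig -changestate. Treat this as a sketch and check it against the NetBackup Commands Reference for your version: `bounce_pool_cmds` is a made-up helper that only prints the commands rather than running them, `my_dedup_pool` is a placeholder for the pool name that nbdevquery -listdp reports, and the 10-second pause is an arbitrary stand-in for "a few seconds".

```shell
#!/bin/sh
# Print (not run) the commands that would down and then up a PureDisk disk
# pool, followed by the status query already used earlier in this thread.
# bounce_pool_cmds and the pool name are hypothetical.

ADMINCMD=/usr/openv/netbackup/bin/admincmd

bounce_pool_cmds() {
    pool="$1"
    echo "$ADMINCMD/nbdevconfig -changestate -stype PureDisk -dp $pool -state DOWN"
    echo "sleep 10"
    echo "$ADMINCMD/nbdevconfig -changestate -stype PureDisk -dp $pool -state UP"
    echo "$ADMINCMD/nbdevquery -listdp -stype PureDisk -U"
}

bounce_pool_cmds my_dedup_pool
```

Printing first makes it easy to review the exact commands before pasting them into a shell.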

Stanleyj
Level 6

I'm an idiot. I was trying to run the commands on the appliance. That was completely my fault. I didn't even realize I was running them in the wrong place until support ran the exact same commands you provided on the master server.

And YES, you are correct: my PureDisk pool is down on my primary appliance. I wish I could share all the troubleshooting support was doing, but he was issuing commands and switching screens so fast I couldn't keep up, and that went on for 2 hours.

But in the end my PureDisk pool is still down. He could get it started, but then 5 minutes later it goes back down again without any real messages in the logs other than "connection refused".

Support keeps mentioning that it looks like a network issue, but I have asked my network admins and no changes have been made to any policies or routing for 3 weeks.

 

I am waiting on a call back.

Mark_Solutions
Level 6
Partner Accredited Certified

Stanley

All of the commands I gave you, and the files to create / edit, were to be run on the appliance.

You need to log into it via PuTTY, go to the Support menu, type Maintenance (which usually asks you to log in again), which puts you at a command prompt, and then type elevate, which gives you root access to the O/S on the appliance in order to check / create the files and settings as per my first post.

None of what I gave you was for the Master Server, just the appliance.

Stanleyj
Level 6

Mark,

The commands and files you provided worked just fine. It's the commands that VirtualED asked me to run in the very first response that will not run on the appliance. I think they are meant for the master server. I'm sorry if I have confused you guys, because I am VERY confused myself. This is driving me crazy. It doesn't make any sense why all of a sudden my services and PureDisk pools just start failing.

I followed some documentation about spad failing to stay up, which I had used when I upgraded to 2.1a a while ago, and now I'm getting error 83 for my backups and duplications.

I really do appreciate you guys helping me!!

Stanleyj
Level 6

Mark,

All of the commands and file edits you provided worked on the appliance. It was the commands that VirtualED supplied that I believe can only be run on the master server. I'm sorry if I'm making this confusing. I'm just really frustrated and confused at this point because this doesn't make any sense.

It seems that about every 2 months I'm on the phone with support trying to get these stupid appliances working again because an update or something "unexplained" seems to cause them to fail.

I really do appreciate all the help from you guys!!

Mark_Solutions
Level 6
Partner Accredited Certified

So did you DOWN and UP the pools in the device monitor to see if it came back to life?

What critical / warning errors are you getting in the All Log Entries report now?

Mark_Solutions
Level 6
Partner Accredited Certified

One other thing ... after my last upgrade to 2.1a, spad and spoold were not coming up correctly after a reboot. It was some sort of timeout issue.

Go back onto the appliance using Putty and go to the Support Menu

Then use:

Processes Show

If you cannot see anything for spoold or spad then use:

Processes Stop

Processes Start

Then try Processes Show again to see if they are then running

See if it now comes back to life
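
If you are already at the elevated shell, a quick way to see whether the two daemons are alive without going back through the menu is sketched below. `check_daemons` is a made-up helper, and it assumes pgrep is available on the appliance.

```shell
#!/bin/sh
# Report whether each named process is currently running.
# check_daemons is a hypothetical helper; pgrep -x matches the exact name.

check_daemons() {
    for name in "$@"; do
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name: running"
        else
            echo "$name: NOT running"
        fi
    done
}

check_daemons spoold spad
```

Running it again a few minutes after a backup starts would show whether one of the daemons is dropping out under load.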

Stanleyj
Level 6

The Spad service seems to be the one failing.  Funny thing is it will start after issuing the start command but then fails again after my backups try and kick off. 

I have used my google fu to find several articles on this very same issue but none of the changes seem to make any difference.

 

I don't know, buddy. I am waiting on support to call me back. Watch this be something really stupid that has nothing to do with the appliance.

Stanleyj
Level 6

Mark and Ed,

After working with me, support has determined that the issue is not that I upgraded my master server to 7.1.0.2, but that the optimized duplication to my second appliance is somehow erroring out, and that is what's causing SPAD to stop working. According to support, the issue was corrected in version 7.1.0.3 (which I have installed), but a patch for the appliance has not been issued.

They have taken my information and are currently working on a patch to see if that will correct my problem. 

I really don't know how my appliances just stop talking to each other when they have been working flawlessly for months.

I can back up directly to my onsite and offsite appliances. I can back up to my onsite appliance and then duplicate to my tape libraries. But if I set up a duplication to go from my onsite to my offsite appliance, after about 15 minutes the duplications fail and cause spad to stop on the appliance that is sending the duplication.

I will keep updating this post until a solution is figured out.