Drives keep going down

simz123
Level 4

Hi,

Drive paths keep going down daily, and I just UP them again. I need to find a solution soon. I've read that enabling automatic path correction should solve my problem. Any ideas?

Do I just add the following to the vm.conf file?

Add the following AUTO_PATH_CORRECTION entry to the file:

AUTO_PATH_CORRECTION = YES

Which logs should I be looking at?

I ran the following command on a media server and attached the output, in case it is useful.

grep -i down /var/log/mess*


simz123
Level 4

I ran tpautoconf -report_disc on all media servers and didn't get any output, so there aren't any discrepancies.

But when I ran tpautoconf -report_disc on the master server, it gave the following:

tpautoconf -report_disc
=========== Missing Device or no local control path (Robot) ===========
 Defined as robotic TLD(0)
 Inquiry = "STK     SL500           1483"
 Serial Number = 559000200613
 Robot Path = /dev/sg22
 Robot Control Host = brm-up-nbu-1
 Hosts configured for this device:
  Host = brm-up-nbum-1
=========== Missing Device or no local control path (Robot) ===========
 Defined as robotic TLD(1)
 Inquiry = "STK     SL500           1483"
 Serial Number = 559000201820
 Robot Path = /dev/sg28
 Robot Control Host = brm-up-nbu-1
 Hosts configured for this device:
  Host = brm-up-nbum-1

I do not see /dev/sg28 or /dev/sg22 in the GUI admin console under Drives. Is there another location I should be looking at?

I've used robtest and everything works perfectly: I moved tapes to all drives and back to their original positions.

The SL Console (SL500) also shows that the drives are good and do not need cleaning.

Marianne
Level 6
Partner    VIP    Accredited Certified

Please show us the output of the 'scan' command as well as the full messages file. We need to see what is leading up to the 'downing' of the robot and drive.
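For anyone following along: a plain `grep -i down` shows only the matching lines, not what happened just before them. Below is a minimal sketch for pulling the lead-up to each match out of a messages file. The function name `down_context` and the default of 20 lines of context are assumptions made for illustration, and `-B` requires a grep that supports context lines (GNU grep on Linux does). It is not a substitute for posting the full file.

```shell
# Hypothetical helper: print every line mentioning "down" (case-insensitive)
# together with the N lines that precede it, so the lead-up is visible.
# Usage: down_context [-n LINES] FILE...
down_context() {
    ctx_lines=20                      # assumed default, adjust as needed
    if [ "$1" = "-n" ]; then
        ctx_lines=$2
        shift 2
    fi
    grep -i -B "$ctx_lines" 'down' "$@"
}
```

For example, `down_context -n 30 /var/log/messages*` prints 30 lines of context before each match across all rotated messages files.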

Genericus
Moderator
   VIP   

You may find the simplest solution is to just idle NetBackup, delete all the drives, and rescan the media servers for drives.

I have had issues where "phantom" drive paths occur. Go to Media and Device Management/Devices/Drives and check whether you have multiple paths, or whether any have "MISSING" in the path (right-click and filter for "contains MISSING").

I have had issues with the tpautoconf -replace_drive command, from 7.6 to 8.0, where the updated path did not stay updated. My only solution was tpautoconf -a and recycle, or delete and rescan.

Sounds like time to get out the "junior detective" hat and sleuth on!

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Mike_Gavrilov
Moderator
Partner    VIP    Accredited Certified

No, it won't fix your problem. This is a very old discussion: users want NetBackup to UP drives automatically instead of fixing the root cause of the problem. NetBackup doesn't UP paths automatically because it might bring up a failed drive that can damage tapes.

Hi @Marianne @Mike_Gavrilov @Genericus,

Sorry for the late reply.

I added VERBOSE to all media servers.

Attached are the messages from one media server.

Is it better if I post them from all media servers?

I've attached another /var/log/messages excerpt from another media server.

Hopefully we can get to the bottom of this.

Thanks, everyone, for your input.

Tape_Archived
Moderator
   VIP   

From the logs, it looks like the drives are configured for NDMP backups. Can you verify that your credentials for logging in to the NDMP filer are set correctly? I see the following error in the log files:

ndmp_public_session_create() failed: NDMP server login failed - verify attributes with tpautoconf -verify

Go to Java console => Media and Device Management => Credentials => NDMP Hosts and ensure the credentials are set correctly. Once they are set, run tpautoconf -probe ndmp_filer_name and see if you get the list of drives.
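The verify-then-probe sequence can be sketched as a small shell function. This is only an illustration: `check_ndmp_host` is a hypothetical name, the sketch assumes `tpautoconf` is on the PATH (on UNIX/Linux installs it typically lives under /usr/openv/volmgr/bin), and it assumes the command exits non-zero when verification fails, so read the command output as well rather than relying on the exit code alone. Only the `-verify` and `-probe` options mentioned in this thread are used.

```shell
# Hypothetical wrapper around the two tpautoconf calls mentioned in this thread.
# Assumes tpautoconf is on PATH and exits non-zero on a failed verify.
# Usage: check_ndmp_host NDMP_HOST
check_ndmp_host() {
    ndmp_host=$1
    if [ -z "$ndmp_host" ]; then
        echo "usage: check_ndmp_host NDMP_HOST" >&2
        return 2
    fi
    # Step 1: confirm the stored credentials can log in to the filer.
    if ! tpautoconf -verify "$ndmp_host"; then
        echo "verify failed for $ndmp_host: fix the NDMP credentials first" >&2
        return 1
    fi
    # Step 2: list the devices the filer reports.
    tpautoconf -probe "$ndmp_host"
}
```

Run it once per filer from each media server that writes NDMP backups to the drives, for example `check_ndmp_host ndmp_filer_name`.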

Hi @Tape_Archived,

The password for the NetApp was changed. I went in and updated it, so that should be good. I verified it using tpautoconf -verify brm-us-ntap-1

But was that the reason the drives kept being DOWN'ed?

Tape_Archived
Moderator
   VIP   

The drives were configured to back up the NDMP volumes, and NetBackup could not authenticate with the NetApp filer when the backup job was attempted. NetBackup realized something was wrong and, to the best of my understanding, DOWNs the drives as a preventive measure to let us know there is a problem.

A drive going down does not necessarily mean the drive is bad; it's how NetBackup tries to tell us there is some issue with the drive configuration.

Marianne
Level 6
Partner    VIP    Accredited Certified

This is certainly the reason for these drives being DOWN'ed:

Jun 22 07:02:28 brm-up-nbu-1 tldd[36524]: TLD(0) DismountTape ****** from drive 1
Jun 22 07:02:28 brm-up-nbu-1 tldd[56962]: TLD(0) ndmp_public_session_create_wCred failed with error code -1009
Jun 22 07:02:28 brm-up-nbu-1 tldd[36524]: DecodeDismount: TLD(0) drive 1, Actual status: Unable to open drive

Jun 22 07:02:29 brm-up-nbu-1 tldd[36524]: TLD(0) DismountTape ****** from drive 2
Jun 22 07:02:29 brm-up-nbu-1 tldd[56963]: TLD(0) ndmp_public_session_create_wCred failed with error code -1009
Jun 22 07:02:29 brm-up-nbu-1 tldd[36524]: DecodeDismount: TLD(0) drive 2, Actual status: Unable to open drive

We see a similar error in the messages file in this TN: https://www.veritas.com/support/en_US/article.100018880

SOLUTION/WORKAROUND: 
The resolution to this issue is to fix the network route between the network attached storage (NAS) filer and the NetBackup media server.
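
To check whether the same pattern recurs on other media servers, the tldd lines quoted above can be counted with a short awk sketch. The function name `summarize_tldd_errors` and the output wording are assumptions for illustration; the patterns are taken only from the log lines shown in this thread, so other message formats will not be counted.

```shell
# Hypothetical helper: count tldd NDMP session failures and
# "Unable to open drive" dismount results in one or more messages files.
# Usage: summarize_tldd_errors FILE...
summarize_tldd_errors() {
    awk '
        /tldd\[[0-9]+\]:/ && /ndmp_public_session_create.* failed/ { ndmp++ }
        /tldd\[[0-9]+\]:/ && /Actual status: Unable to open drive/ {
            # The drive number follows the word "drive", e.g. "TLD(0) drive 1,"
            for (i = 1; i < NF; i++)
                if ($i == "drive" && $(i + 1) ~ /^[0-9]+,?$/) {
                    d = $(i + 1)
                    sub(/,$/, "", d)
                    unable[d]++
                }
        }
        END {
            printf "NDMP session failures: %d\n", ndmp
            # Per-drive lines are printed in no particular order.
            for (d in unable)
                printf "drive %s: %d unable to open\n", d, unable[d]
        }
    ' "$@"
}
```

Running `summarize_tldd_errors /var/log/messages*` on each media server gives a quick view of which drives are affected before and after the credential or network fix.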