Troubleshoot Drive Down issue

Velmabii
Level 4

Hi All,

I'm new to NetBackup. Kindly help me fix this issue.

We are using NetBackup 7.5. In our environment the master server and media server are the same machine. We have 4 drives, 2 of which go down frequently.

We even tried bringing the drives up manually, but they go down again.

Kindly help me with this issue, including the basic troubleshooting steps.

54 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

So... now you have a problem with the robot....

The OS has lost connection to the robot. One of these could be the issue:

  • Robot door was opened and robot still in process of coming online
  • Connectivity issue - check physical connection at both ends
  • Faulty robot - check robot display panel or web console for errors

 

It is probably time that you log a call with the hardware vendor. 
Almost 2 weeks and close to 40 comments later and we are no closer to finding a solution than the day you started this discussion...... 

Velmabii
Level 4

Dear Marianne,

I have already logged a call with the h/w vendor and shared the library logs. They said there are no h/w errors, and that 3 drives show cleaning needed.

They suggested cleaning the drives using cleaning tapes and rebooting the TL.

As I'm new to Symantec as well as to the TL, I'm struggling here. :(

Marianne
Level 6
Partner    VIP    Accredited Certified

Have you tried in the meantime to see why the OS lost connection to the robot?

When troubleshooting devices, best to 'forget' about NetBackup and troubleshoot at OS-level.

To quote from an old Status 84 In-depth Troubleshooting doc:

As an application, NetBackup has no direct access to a device, instead relying on the operating system (OS) to handle any communication with the device.

This means that during a write operation NetBackup asks the OS to write to the device and report back the success or failure of that operation. If there is a failure, NetBackup will merely report that a failure occurred, and any troubleshooting should start at the OS level.

If the OS is unable to perform the write, there are three likely causes; OS configuration, a problem on the SCSI path, or a problem with the device.

Velmabii
Level 4

Dear All,

I found that the drive control shows AVR (which means the robot is not communicating with the backup server).

I ran 'scan' from cmd and got the attached output (when I ran the scan command 2 days ago, I got a different output).

 

Velmabii
Level 4

Kindly help me: how can I confirm that the issue is in the communication between the robot and the backup server, and what steps should I take next?

mph999
Level 6
Employee Accredited

Reboot the library (proper power cycle, off/ on) and the robot control host as well.

Do you then see the library in :

1) Device manager

2) scan

(1) MUST work before (2) will work.
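
The ordering rule above (the OS must see the library before NetBackup's scan can) can be sketched as a small ordered checklist. This is only an illustrative sketch: the check names and the placeholder results are hypothetical, and each lambda would need to be replaced with what you actually observe in Device Manager and in the scan output.

```python
from typing import Callable, List, Optional, Tuple

# A check is a (description, function returning True on success) pair.
Check = Tuple[str, Callable[[], bool]]


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the description of the first failure.

    Later checks are not run once an earlier one fails, because a
    failure at the OS level makes the NetBackup-level result meaningless.
    Returns None when every check passes.
    """
    for description, check in checks:
        if not check():
            return description
    return None


if __name__ == "__main__":
    # Placeholder results (assumptions, not real output): record here
    # what you actually see after the power cycle.
    checks = [
        ("library visible in Device Manager", lambda: True),
        ("library visible in scan output", lambda: False),
    ]
    print(first_failing_check(checks))
```

If the first entry fails, troubleshooting stays at the OS and hardware level; only when it passes is the scan result worth looking at.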

Marianne
Level 6
Partner    VIP    Accredited Certified
  • Check the physical connection between the server and the robot.
  • Check the robot display panel for errors.
  • Check the Event Viewer logs for errors, including HBA and other device-related errors.
  • Power cycle the robot, then reboot the server.

If none of the above helps, get your hardware vendor on site.

Velmabii
Level 4

Thanks Marianne and mph. As the DC is remote, I have to go there in person to check the physical connectivity and reboot.

Will update if it works.

Velmabii
Level 4

Martin/Marianne,

Do you have any links (SOP) for gracefully shutting down and starting up the IBM ULT3580-HH5 TL?

That would help me with the graceful shutdown and startup.

Marianne
Level 6
Partner    VIP    Accredited Certified

Search IBM site or Google for IBM ULT3580-HH5 TL user manual.

Velmabii
Level 4
Dear Marianne, I have rebooted the library, and the robot is now communicating with the backup server. I was about to clean the drives in the TL, but backups have started. How can I stop the running backups?

Marianne
Level 6
Partner    VIP    Accredited Certified

Right-click on the Active job in Activity Monitor and select Cancel.
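
For anyone who prefers the command line, the same cancellation can usually be done with `bpdbjobs -cancel`. The sketch below only builds the command; it does not run it. It assumes `bpdbjobs` is on the PATH (on a master server it normally lives under the NetBackup admincmd directory) and that the job IDs are the ones shown in Activity Monitor. The IDs used here are made up, and the exact syntax should be checked against the Commands Reference Guide for your NetBackup version.

```python
from typing import Iterable, List


def build_cancel_command(job_ids: Iterable[int], bpdbjobs: str = "bpdbjobs") -> List[str]:
    """Build a 'bpdbjobs -cancel id1,id2,...' argument list.

    The 'bpdbjobs' default is an assumption about the PATH; pass the
    full path to the binary if it is not found.
    """
    ids = [str(int(job_id)) for job_id in job_ids]
    if not ids:
        raise ValueError("at least one job ID is required")
    return [bpdbjobs, "-cancel", ",".join(ids)]


if __name__ == "__main__":
    # 101 and 102 are hypothetical job IDs used only for illustration.
    print(build_cancel_command([101, 102]))
```

The returned list can be passed to `subprocess.run` once you have confirmed the job IDs.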

Velmabii
Level 4

Dear Marianne & All,

Thanks for your help.

After rebooting the TL, communication between the robot and the backup server was restored. Once that was working, I cleaned the drives from the IBM TL itself (not from Symantec). All drive cleaning is done, and the drives are now up and running fine.

Once again, thanks all.

Marianne
Level 6
Partner    VIP    Accredited Certified

Glad we could help! 

I will be relieved to see this discussion closed off!

sdo
Moderator
Partner    VIP    Certified

Surely not just yet? Isn't there the question of a split solution, like six ways! :D