
Would like some feedback on a problem regarding device status recognition with NetBackup on AIX

S_bastien_DEBEA
Level 3
Partner Accredited Certified

 

Hello,

 

Here is my problem. We work with an NBU 6.5.6 environment on AIX 5.3.

There is a clustered two-node VTL (Protectier 2.5 emulating a V-TS3500 library with some IBM LTO3 drives).

We want to test every kind of problem to check how NBU handles them.

Each VTL node owns drives and a path to the robotic library arm.

We've been able to manage the control path failover through NetBackup (robtest, etc.).

When we simulate the loss of a VTL cluster node, NBU accesses the control path through the other VTL node (that works fine), and all the drives that were attached to the downed node become "Defined" at the OS level (instead of "Available"). The problem is that NBU does not detect these downed drives.

When new backups start, some are OK because they go to the surviving node and its remaining drives. For the others, NBU tries to mount a tape in one of the lost drives, waits for the mount timeout, and returns status code 52, which is interpreted as a mount problem. So it downs the library, restarts it, retries mounting a tape in these drives, and so on.

So NBU never detects the downed drives, even when they are not accessible from the OS side (and disconnected from the SAN).

It only sees that the drive never answers "OK, I'm ready with your tape", and treats this as if the robotic library had not mounted the tape in the lost drive.

We're not able to share the virtual tape drives among VTL nodes, so we can't use multipath failover with ATAPE.

Has anyone already faced this problem? How could NBU be made more aware (automatically) of the real state of these disconnected drives?

Thanks a lot.


6 REPLIES

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

Please check the SCSI reservation setting configured under Media Server > Media Properties. I've not had a chance to test this, but from what I've read SPC-3 would be a better fit for your environment (if the devices support it).

 

http://www.symantec.com/docs/HOWTO13888

S_bastien_DEBEA
Level 3
Partner Accredited Certified

 

Thanks for your answer.

Unfortunately, I've already done this test (Media Server Host Properties: changed from SPC-2 to the other option).

It does not work any better...

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

OK, can you give more details on the configuration? From what I've read, I understand the robot control is clustered and fails over fine when you lose a node. As for the drives, did you split them between the nodes (not clustered), i.e. 10 on node 1 and 10 on node 2? Or are there 20 clustered drives (possibly 10 active on node 1 and 10 active on node 2) that should fail over as well?

S_bastien_DEBEA
Level 3
Partner Accredited Certified

The only "clustered" resource is the robotic arm.

The drives are split across both nodes: 8 on VTL node 1 and 8 on VTL node 2 (different drives).

So we would like to see NBU down the 8 drives on node 1.

The OVPASS/ROBTEST/MULTIPATH setup seems to work correctly.

We have an OVPASS0 on one port of VTL node 1 and an OVPASS1 on one port of VTL node 2.

Multipath is set to manual with both paths defined manually.

 

When the first node is disconnected (VTL services stopped, or node disconnected from the SAN), the drives change from AVAILABLE to DEFINED (we must run CFGMGR on the AIX NBU server; the change is not dynamic), and NetBackup never changes the state of these drives from UP to DOWN-TLD.

From a driver point of view, we use OVPASS (not SMC) for the robotic library (model V-TS3500), and the tape drives use ATAPE (IBM ULTRIUM TD3).
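As a possible workaround for the gap described here, a small script could watch the OS-level state and produce the commands to down the matching NetBackup drives. The sketch below is only an illustration under assumptions: it parses text in the usual `lsdev -Cc tape` layout (name, state, location, description), the sample output is made up, and the mapping from AIX device names to NetBackup drive names (`DRIVE_NAMES`) is hypothetical and would have to be built from the real configuration. It only prints `vmoprcmd -downbyname` commands rather than running them; check that option against your NetBackup release before relying on it.

```python
from typing import Dict, List

# Illustrative text in the shape of `lsdev -Cc tape` output.
# These lines are made up, not captured from the server in this thread.
SAMPLE_LSDEV = """\
rmt0 Available 04-08-02 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 Defined   04-08-02 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 Defined   05-08-02 IBM 3580 Ultrium Tape Drive (FCP)
"""

# Hypothetical mapping: AIX device name -> NetBackup drive name.
DRIVE_NAMES: Dict[str, str] = {
    "rmt0": "DRIVE_A",
    "rmt1": "DRIVE_B",
    "rmt2": "DRIVE_C",
}


def defined_tape_devices(lsdev_output: str) -> List[str]:
    """Return the device names whose state column reads 'Defined'."""
    lost = []
    for line in lsdev_output.splitlines():
        fields = line.split()
        # First field is the device name, second is its state.
        if len(fields) >= 2 and fields[1] == "Defined":
            lost.append(fields[0])
    return lost


def down_commands(devices: List[str], drive_names: Dict[str, str]) -> List[str]:
    """Build one 'down' command per lost device that has a known drive name."""
    commands = []
    for device in devices:
        name = drive_names.get(device)
        if name is not None:
            commands.append("/usr/openv/volmgr/bin/vmoprcmd -downbyname " + name)
    return commands


if __name__ == "__main__":
    for command in down_commands(defined_tape_devices(SAMPLE_LSDEV), DRIVE_NAMES):
        print(command)
```

On a real server the sample string would be replaced by the actual `lsdev -Cc tape` output, and the script would need to run after `cfgmgr`, since the state change is not dynamic.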

Hope this helps.

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

 

Thanks for the info. What does /usr/openv/volmgr/bin/scan -tape show when you've turned off one node?

S_bastien_DEBEA
Level 3
Partner Accredited Certified

I'm not on site anymore but I'll ask for this.

By the way, we've had some trouble with the AIX HBA (fscsiX) settings.

If the "Dynamic Tracking" and/or "FAST_FAIL" options are activated, the "OVPASS multipath" function works "randomly".

Hard to explain. The detection of "inaccessible" drives still fails, and NBU considers it a mounting problem (error 52).
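To make the HBA settings mentioned above easier to compare between tests, the attributes can be read from `lsattr -El fscsiX` output and recorded with each run. This is a minimal sketch, assuming the standard AIX attribute names `dyntrk` (Dynamic Tracking) and `fc_err_recov` (fast_fail or delayed_fail); the sample text is illustrative and not taken from the server in this thread.

```python
from typing import Dict

# Illustrative text in the shape of `lsattr -El fscsi0` output
# (attribute, value, description, user-settable flag).
SAMPLE_LSATTR = """\
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
"""


def parse_lsattr(output: str) -> Dict[str, str]:
    """Map each attribute name to its current value."""
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        # First field is the attribute, second is its value.
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def failover_settings(attrs: Dict[str, str]) -> Dict[str, bool]:
    """Report whether Dynamic Tracking and fast_fail are switched on."""
    return {
        "dynamic_tracking": attrs.get("dyntrk") == "yes",
        "fast_fail": attrs.get("fc_err_recov") == "fast_fail",
    }


if __name__ == "__main__":
    print(failover_settings(parse_lsattr(SAMPLE_LSATTR)))
```

Logging this alongside each failover test would show which combination of the two options was active when the multipath behaviour looked "random".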

Thanks.