Forum Discussion

solom
11 years ago

Failover time more than 1 minute

Hi,

I have a Veritas cluster 6.1 configured on Red Hat 6.4, and it is taking more than one minute to fail over even though no agents are configured yet.

The only resources configured are the Disk Groups and Mounts. The cluster is not generating any errors, but it still takes a very long time either to deport a disk group or to import it, and sometimes even the volumes take a long time to come online.

At first, the cluster service had only one disk group as a resource, and the failover time was around 25 sec, but when I added another disk group, failover time increased to 1.5 to 2 minutes.

Any advice on what might be causing this slowness?

Thanks

  • vxdisk list
    DEVICE       TYPE            DISK         GROUP        STATUS
    disk_0       auto:LVM        -            -            online invalid
    eva64000_0   auto:cdsdisk    -            -            online
    eva64000_1   auto:cdsdisk    -            -            online
    eva64000_2   auto:cdsdisk    -            -            online
    eva64000_3   auto:cdsdisk    -            -            online
    eva64000_4   auto:cdsdisk    -            -            online
    eva64000_59  auto:cdsdisk    eva64000_59  PRI-INT      online
    eva64000_60  auto:cdsdisk    eva64000_60  PRI-LAB      online
    eva64000_61  auto:cdsdisk    eva64000_61  PRI-TC       online
    eva64000_62  auto:cdsdisk    eva64000_62  PRI-LAB      online
    eva64000_63  auto:cdsdisk    eva64000_63  PRI-INT      online
    eva64000_64  auto:cdsdisk    eva64000_64  PRI-TC       online

     

    Yes, it is.

    The same configuration on the other side is working well. Maybe the problem is in the network, if there is more traffic on the VLAN.

     

  • Hi, 

     

    You need to check what was happening when this message was logged in VCS:

    ==========

    2014/04/22 16:23:21 VCS WARNING V-16-6-16100 (TCPRI-CLU1) chkvxconfigd:The VxVM process vxconfigd seems to be un-responsive. Stopping vxnotify process, so that resources get unregistered from AMF monitoring

    ========

     

     

    Find out what happened to vxconfigd at that time.

    Check /var/log/messages and /etc/vx/dmpevents.log.

    If needed, check the debug log.
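One way to start on these checks is to pull the relevant lines out of the two logs. The following is only a sketch: `search_log` is a hypothetical helper name, not a VCS or VxVM tool, and the example patterns assume the usual syslog timestamp format (`Apr 22 16:23:21`).

```shell
#!/bin/sh
# search_log PATTERN FILE
# Hypothetical helper: prints every line of FILE that matches PATTERN
# (case-insensitive), prefixed with its line number in the file.
# Returns 2 if FILE cannot be read, 1 if nothing matched.
search_log() {
    pattern=$1
    file=$2
    if [ -r "$file" ]; then
        grep -n -i -- "$pattern" "$file"
    else
        echo "cannot read $file" >&2
        return 2
    fi
}

# Example calls (run as root on the affected node):
#   search_log vxconfigd /var/log/messages
#   search_log 'Apr 22 16:2' /var/log/messages
#   search_log vxconfigd /etc/vx/dmpevents.log
```

The line numbers make it easier to open the log at the right place and read what was logged just before vxconfigd stopped responding.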

  • Hi, we'd better check the system log for Apr 22.

    Anyway, look at the log from Apr 27:

    =========

    Apr 27 03:31:30 TCPRI-CLU1 multipathd: mpathbm: load table [0 629145600 multipath 1 queue_if_no_path 0 3 2 round-robin 0 1 1 68:16 1 round-robin 0 4 1 68:192 1 67:96 1 65:80 1 66:176 1 round-robin 0 3 1 69:112 1 8:160 1 66:0 1]
    Apr 27 03:31:50 TCPRI-CLU1 multipathd: mpathbm: load table [0 629145600 multipath 1 queue_if_no_path 0 2 1 round-robin 0 4 1 68:192 1 67:96 1 65:80 1 66:176 1 round-robin 0 4 1 68:16 1 69:112 1 8:160 1 66:0 1]    <<<<<<<<<<<<<<
    Apr 27 03:37:31 TCPRI-CLU1 kernel: __ratelimit: 1 callbacks suppressed
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:3:3: [sdcc] Unhandled error code<<<<<<<<<<<<
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:3:3: [sdcc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:3:3: [sdcc] CDB: Read(10): 28 00 00 00 01 20 00 00 10 00
    Apr 27 03:37:31 TCPRI-CLU1 kernel: __ratelimit: 1 callbacks suppressed
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Unhandled error code
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] CDB: Read(10): 28 00 14 01 01 40 00 00 02 00
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Unhandled error code
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] CDB: Read(10): 28 00 14 01 01 10 00 00 10 00
    Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Unhandled error code

    =========

     

    Suggestions:

    1. If possible, stop multipathd, since DMP may not work well together with other multipathing software.

    2. Check whether something is abnormal, given the many "Unhandled error code" messages.
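Both suggestions can be sketched in shell. `count_unhandled` below is a hypothetical helper, not part of any Veritas or Red Hat package; it assumes the kernel message format shown in the excerpt above. For that excerpt it would report sdbw three times and sdcc once. The multipathd commands in the trailing comments are the standard RHEL 6 service commands, and stopping the daemon is only safe if nothing on the host (for example the boot disk) depends on a device-mapper multipath device.

```shell
#!/bin/sh
# count_unhandled FILE
# Hypothetical helper: counts "Unhandled error code" kernel messages
# per sd device and prints "<count> <device>", most frequent first.
count_unhandled() {
    grep 'Unhandled error code' "$1" |
        sed -n 's/.*\[\(sd[a-z]*\)\] Unhandled error code.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2}'
}

# Example call:
#   count_unhandled /var/log/messages
#
# To see whether multipathd is running and what it has mapped (RHEL 6):
#   service multipathd status
#   multipath -ll
#
# To stop it and keep it off across reboots, only if nothing depends on it:
#   service multipathd stop
#   chkconfig multipathd off
```

If one or two devices account for most of the errors, that points to specific paths worth checking on the SAN side.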

     

     

     

  • suggestions:

    1. If possible, stop multipathd, since DMP may not work well together with other multipathing software.

     

    This problem was .

     

    I'm sorry for the delay in replying.

    Thank you very much