04-21-2014 07:31 AM
Hi,
I have a Veritas cluster 6.1 configured on Red Hat 6.4, and it is taking more than one minute to failover even though no agents are configured yet.
The only resources configured are the Disk Groups and Mounts. The cluster is not generating any errors, yet it still takes a very long time either to deport the Disk Group or to import it, and sometimes even the volumes take a long time to go online.
At first, the cluster service had only one disk group as a resource, and the failover time was around 25 sec, but when I added another disk group, failover time increased to 1.5 to 2 minutes.
Any advice on what might be causing this slowness?
Thanks
04-21-2014 07:36 AM
04-21-2014 11:22 AM
It takes 12 seconds.
04-21-2014 08:10 PM
Hi Solom,
This is probably because the second DG you added to VCS contains many disks and volumes.
The time it takes to import a DG depends on how many disks and volumes it contains, because during the import all disks are read and all volumes are started at the same time. So typically, when there are many disks and volumes, importing the DG (or bringing the DiskGroup resource online) takes a long time to complete.
Another thing to check is whether you configured all volumes as "Volume" resources. If so, the VCS Volume agent waits until every volume has started or stopped before the service group online or offline completes.
04-22-2014 12:34 AM
I have one disk in each DG and 5 volumes mounted from it, which I don't think is many.
And when I force a failover from node1 or node2, it takes 10 seconds.
Regards
04-22-2014 01:31 AM
Can you attach engine_A.log and main.cf for us?
G
04-22-2014 06:37 AM
I have attached them.
Thanks
04-24-2014 12:27 AM
Hi Solom,
Do you mean this log:
======
2014/04/22 16:10:32 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch INT-PRI TCPRI-CLU2 localclus from ::ffff:10.100.208.76
2014/04/22 16:10:32 VCS NOTICE V-16-1-10208 Initiating switch of group INT-PRI from system TCPRI-CLU1 to system TCPRI-CLU2
2014/04/22 16:10:32 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:10:32 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:10:32 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:10:32 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:10:32 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:10:33 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:10:34 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:10:35 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:10:35 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:10:41 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:10:41 VCS NOTICE V-16-1-10300 Initiating Offline of Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:10:48 VCS INFO V-16-1-10305 Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:10:48 VCS NOTICE V-16-1-10446 Group INT-PRI is offline on system TCPRI-CLU1
2014/04/22 16:10:48 VCS NOTICE V-16-1-10301 Initiating Online of Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:10:48 VCS NOTICE V-16-10031-1513 (TCPRI-CLU2) DiskGroup:PRI-INT:online:Diskgroups will be imported with reservations.
2014/04/22 16:10:54 VCS WARNING V-16-10031-1509 (TCPRI-CLU2) DiskGroup:PRI-INT:online:vxdg import succeeded on Disk Group PRI-INT.
2014/04/22 16:10:54 VCS NOTICE V-16-10031-1559 (TCPRI-CLU2) DiskGroup:PRI-INT:online:Volumes in DiskGroup PRI-INT will be started automatically as part of import command,the system level autostartvolume is set On
2014/04/22 16:10:55 VCS INFO V-16-1-10298 Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:10:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:10:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:10:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:10:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:10:55 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:10:58 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:10:59 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:11:00 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:11:01 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:11:02 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:11:02 VCS NOTICE V-16-1-10447 Group INT-PRI is online on system TCPRI-CLU2
==============
or this time:
===========
2014/04/22 16:20:16 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch INT-PRI TCPRI-CLU2 localclus from ::ffff:10.100.208.76
2014/04/22 16:20:16 VCS NOTICE V-16-1-10208 Initiating switch of group INT-PRI from system TCPRI-CLU1 to system TCPRI-CLU2
2014/04/22 16:20:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:20:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:20:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:20:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:20:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:20:17 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:20:18 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:20:19 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:20:19 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:20:20 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:20:20 VCS NOTICE V-16-1-10300 Initiating Offline of Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU1
2014/04/22 16:23:21 VCS WARNING V-16-6-16100 (TCPRI-CLU1) chkvxconfigd:The VxVM process vxconfigd seems to be un-responsive. Stopping vxnotify process, so that resources get unregistered from AMF monitoring
2014/04/22 16:23:21 VCS INFO V-16-2-13717 (TCPRI-CLU1) Output of the completed operation (imf_getnotification)
==============================================
Cannot continue monitoring event
Got notification for group: PRI-INT
==============================================
2014/04/22 16:25:22 VCS WARNING V-16-2-13011 (TCPRI-CLU1) Resource(PRI-INT): offline procedure did not complete within the expected time.
2014/04/22 16:25:22 VCS ERROR V-16-2-13063 (TCPRI-CLU1) Agent is calling clean for resource(PRI-INT) because offline did not complete within the expected time.
2014/04/22 16:26:23 VCS ERROR V-16-2-13006 (TCPRI-CLU1) Resource(PRI-INT): clean procedure did not complete within the expected time.
2014/04/22 16:27:24 VCS INFO V-16-1-10305 Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) is offline on TCPRI-CLU1 (VCS initiated)
2014/04/22 16:27:24 VCS NOTICE V-16-1-10446 Group INT-PRI is offline on system TCPRI-CLU1
2014/04/22 16:27:24 VCS NOTICE V-16-1-10301 Initiating Online of Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:27:24 VCS NOTICE V-16-10031-1513 (TCPRI-CLU2) DiskGroup:PRI-INT:online:Diskgroups will be imported with reservations.
2014/04/22 16:27:45 VCS WARNING V-16-10031-1509 (TCPRI-CLU2) DiskGroup:PRI-INT:online:vxdg import succeeded on Disk Group PRI-INT.
2014/04/22 16:27:45 VCS NOTICE V-16-10031-1559 (TCPRI-CLU2) DiskGroup:PRI-INT:online:Volumes in DiskGroup PRI-INT will be started automatically as part of import command,the system level autostartvolume is set On
2014/04/22 16:27:45 VCS INFO V-16-2-13717 (TCPRI-CLU2) Output of the completed operation (imf_getnotification)
==============================================
Got notification for group: PRI-INT
==============================================
2014/04/22 16:27:46 VCS INFO V-16-1-10298 Resource PRI-INT (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:27:46 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:27:46 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:27:46 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:27:46 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:27:46 VCS NOTICE V-16-1-10301 Initiating Online of Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) on System TCPRI-CLU2
2014/04/22 16:27:49 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INT-HS (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:28:05 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:28:06 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INTBACKUP (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:28:07 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:28:08 VCS INFO V-16-1-10298 Resource TRAKPRIVOL-INT-DB (Owner: Unspecified, Group: INT-PRI) is online on TCPRI-CLU2 (VCS initiated)
2014/04/22 16:28:08 VCS NOTICE V-16-1-10447 Group INT-PRI is online on system TCPRI-CLU2
===============
04-24-2014 01:30 AM
The last one.
04-24-2014 03:18 AM
Hi Solom,
Even in the last one, the online procedure took only a few seconds.
What took a long time was the offline procedure because, according to the logs, vxconfigd is either not running or not responding:
2014/04/22 16:23:21 VCS WARNING V-16-6-16100 (TCPRI-CLU1) chkvxconfigd:The VxVM process vxconfigd seems to be un-responsive. Stopping vxnotify process, so that resources get unregistered from AMF monitoring
vxconfigd is the main VxVM daemon, which manages the import/deport of disk groups and the start/stop of volumes.
Can you check if vxconfigd is running?
# ps -ef | grep vxconfigd
If it is running, check if it is enabled:
# vxdctl mode
If it is disabled, try to enable it:
# vxdctl enable
or
# vxconfigd -m enable
If vxconfigd is not running or can't be enabled, try to restart it:
# vxconfigd -k -x syslog
regards,
Dan
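
To make the gap between "Initiating Offline" and "is offline" easier to spot, the engine_A.log excerpts above can be summarised per resource. The following is only a rough sketch: the function name resource_durations is made up for illustration, and it assumes the log line layout shown in this thread and that all lines fall on the same day.

```shell
#!/bin/sh
# resource_durations (hypothetical helper): read engine_A.log lines on stdin
# and print, for each resource, the seconds between "Initiating Offline/Online"
# (V-16-1-10300 / V-16-1-10301) and the matching "is offline/online" line
# (V-16-1-10305 / V-16-1-10298).
# Assumption: all lines are from the same day, so only HH:MM:SS is compared.
resource_durations() {
    awk '
    function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
    $5 == "V-16-1-10300" || $5 == "V-16-1-10301" {
        # $7 is "Offline" or "Online", $10 is the resource name
        start[tolower($7) SUBSEP $10] = secs($2)
        next
    }
    $5 == "V-16-1-10305" || $5 == "V-16-1-10298" {
        # $7 is the resource name; the state is the word after "is"
        state = ""
        for (i = 8; i < NF; i++) if ($i == "is") { state = $(i + 1); break }
        key = state SUBSEP $7
        if (key in start) {
            printf "%s %s %d\n", $7, state, secs($2) - start[key]
            delete start[key]
        }
    }'
}
# Example: resource_durations < /var/VRTSvcs/log/engine_A.log
```

Fed the two excerpts above, this should show the PRI-INT offline taking 7 seconds in the first switch and 424 seconds in the second, while the online steps stay short in both.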
04-24-2014 04:36 AM
[root@TCPRI-CLU1 ~]# vxdctl mode
mode: enabled
[root@TCPRI-CLU1 ~]#
04-24-2014 04:40 AM
Can you do another simple check to see if vxconfigd is responsive?
# vxdisk list
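
Since a hung vxconfigd can make such a command block rather than fail, one option is to run the check with a time limit. This is a minimal sketch, assuming timeout(1) from GNU coreutils is available; the function name responds_within and the 30-second limit in the example are arbitrary choices for illustration.

```shell
#!/bin/sh
# responds_within SECONDS COMMAND... (hypothetical helper): run COMMAND with a
# time limit and report whether it finished in time.
# Relies on timeout(1), which exits with status 124 when the limit is hit.
responds_within() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "no response within ${limit}s"
        return 1
    fi
    echo "completed with exit code $rc"
    return 0
}
# Example (needs VxVM installed): responds_within 30 vxdisk list
```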
04-24-2014 06:54 AM
vxdisk list
DEVICE       TYPE            DISK         GROUP        STATUS
disk_0       auto:LVM        -            -            online invalid
eva64000_0   auto:cdsdisk    -            -            online
eva64000_1   auto:cdsdisk    -            -            online
eva64000_2   auto:cdsdisk    -            -            online
eva64000_3   auto:cdsdisk    -            -            online
eva64000_4   auto:cdsdisk    -            -            online
eva64000_59  auto:cdsdisk    eva64000_59  PRI-INT      online
eva64000_60  auto:cdsdisk    eva64000_60  PRI-LAB      online
eva64000_61  auto:cdsdisk    eva64000_61  PRI-TC       online
eva64000_62  auto:cdsdisk    eva64000_62  PRI-LAB      online
eva64000_63  auto:cdsdisk    eva64000_63  PRI-INT      online
eva64000_64  auto:cdsdisk    eva64000_64  PRI-TC       online
Yes, it is.
The same configuration on the other side is working well. Maybe the problem is in the network, if there is more traffic on the VLAN.
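
Because import time was earlier tied to the number of disks per disk group, it may help to count them from the listing above. A small sketch, assuming the default vxdisk list column layout (GROUP as the 4th field, "-" for disks in no imported group); disks_per_group is a made-up name.

```shell
#!/bin/sh
# disks_per_group (hypothetical helper): read `vxdisk list` output on stdin
# and print how many disks belong to each imported disk group.
# Assumption: columns are DEVICE TYPE DISK GROUP STATUS, whitespace separated.
disks_per_group() {
    awk '$1 != "DEVICE" && $4 != "-" { n[$4]++ }
         END { for (g in n) print g, n[g] }' | sort
}
# Example (needs VxVM installed): vxdisk list | disks_per_group
```

On the listing above this would report two disks each for PRI-INT, PRI-LAB and PRI-TC.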
04-25-2014 02:59 AM
Hi,
We need to find out what happened to vxconfigd at the time VCS logged this message:
==========
2014/04/22 16:23:21 VCS WARNING V-16-6-16100 (TCPRI-CLU1) chkvxconfigd:The VxVM process vxconfigd seems to be un-responsive. Stopping vxnotify process, so that resources get unregistered from AMF monitoring
========
Check /var/log/messages and /etc/vx/dmpevents.log for that time.
If needed, check the debug log as well.
04-27-2014 04:08 AM
See the messages log from 27 April.
I have attached it.
04-28-2014 09:48 PM
Hi, it would be better to check the system log from Apr 22.
Anyway, here is what the log from April 27 shows:
=========
Apr 27 03:31:30 TCPRI-CLU1 multipathd: mpathbm: load table [0 629145600 multipath 1 queue_if_no_path 0 3 2 round-robin 0 1 1 68:16 1 round-robin 0 4 1 68:192 1 67:96 1 65:80 1 66:176 1 round-robin 0 3 1 69:112 1 8:160 1 66:0 1]
Apr 27 03:31:50 TCPRI-CLU1 multipathd: mpathbm: load table [0 629145600 multipath 1 queue_if_no_path 0 2 1 round-robin 0 4 1 68:192 1 67:96 1 65:80 1 66:176 1 round-robin 0 4 1 68:16 1 69:112 1 8:160 1 66:0 1] <<<<<<<<<<<<<<
Apr 27 03:37:31 TCPRI-CLU1 kernel: __ratelimit: 1 callbacks suppressed
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:3:3: [sdcc] Unhandled error code<<<<<<<<<<<<
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:3:3: [sdcc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:3:3: [sdcc] CDB: Read(10): 28 00 00 00 01 20 00 00 10 00
Apr 27 03:37:31 TCPRI-CLU1 kernel: __ratelimit: 1 callbacks suppressed
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Unhandled error code
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] CDB: Read(10): 28 00 14 01 01 40 00 00 02 00
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Unhandled error code
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] CDB: Read(10): 28 00 14 01 01 10 00 00 10 00
Apr 27 03:37:31 TCPRI-CLU1 kernel: sd 2:0:2:8: [sdbw] Unhandled error code
=========
Suggestions:
1. If possible, stop multipathd, since DMP may not work well together with other multipathing software.
2. Check whether something is abnormal, since there are many "Unhandled error code" messages.
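
For the second suggestion, a quick way to see which devices are affected is to count the "Unhandled error code" kernel messages per sd device. This is only a sketch based on the message format shown above; the function name unhandled_errors_by_device is made up.

```shell
#!/bin/sh
# unhandled_errors_by_device (hypothetical helper): read syslog lines on stdin
# and count "Unhandled error code" kernel messages per sd device, e.g. [sdcc].
unhandled_errors_by_device() {
    awk '/Unhandled error code/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^\[sd[a-z]+\]$/) { n[$i]++; break }
         }
         END { for (d in n) print d, n[d] }' | sort
}
# Example: unhandled_errors_by_device < /var/log/messages
```

Devices that show up repeatedly here are the first ones to look at on the path side.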
04-29-2014 01:54 AM
DMP??
05-12-2014 12:55 AM
Suggestions:
1. If possible, stop multipathd, since DMP may not work well together with other multipathing software.
This was the problem.
I'm sorry for the delay in replying.
Thank you very much