Recurring unmount of disk groups

chuckchang23
Level 3

Hello Experts,

After finding that some of our DGs were unmounted, I ran the VCS clear command on the faulted DG resources and then brought them online. The DGs were mounted again and the mount points were visible to the OS. After 2-3 days, we noticed that these DGs had been unmounted again because their resources failed.
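
The clear-then-online sequence described above (visible in the engine log below as `hares -clear` on each faulted CVMVolDg resource followed by a manual online of the group) could be scripted so it is repeatable. This is a minimal sketch, assuming the standard `hares -clear resource -sys system` and `hagrp -online group -sys system` syntax; `recover_dg`, `run` and `DRY_RUN` are hypothetical names introduced here. With `DRY_RUN` set, it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): clear faulted resources on one node,
# then bring their service group online there.
# Set DRY_RUN=1 to print the VCS commands instead of running them.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"          # dry run: show the command only
    else
        "$@"               # real run: execute the VCS command
    fi
}

# Usage: recover_dg SYSTEM GROUP RESOURCE...
recover_dg() {
    sys=$1
    grp=$2
    shift 2
    for res in "$@"; do
        run hares -clear "$res" -sys "$sys"   # clear the FAULTED state on this node
    done
    run hagrp -online "$grp" -sys "$sys"      # VCS onlines the group's resources in dependency order
}
```

For example, `DRY_RUN=1 recover_dg EMMDPD03 lic_DG cvmvoldg5 cvmvoldg6` should print the two `hares -clear` commands and the `hagrp -online` command without executing anything.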

bash-3.00$ cat /var/VRTSvcs/log/engine_A.log | grep 2013/05/ | more
2013/05/02 06:47:26 VCS ERROR V-16-10001-1004 (EMMDPD03) CVMVxconfigd:???:monitor:Core file saved once already
2013/05/02 06:47:27 VCS INFO V-16-2-13001 (EMMDPD03) Resource(cvm_vxconfigd): Output of the completed operation (monitor) 
2013/05/02 06:47:27 VCS ERROR V-16-2-13067 (EMMDPD03) Agent is calling clean for resource(cvm_vxconfigd) because the resource became OFFLINE unexpectedly, on its own.
2013/05/02 06:47:28 VCS INFO V-16-2-13068 (EMMDPD03) Resource(cvm_vxconfigd) - clean completed successfully.
2013/05/02 06:47:28 VCS ERROR V-16-2-13073 (EMMDPD03) Resource(cvm_vxconfigd) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 5) the resource.
2013/05/02 06:47:28 VCS WARNING V-16-10001-1006 (EMMDPD03) CVMVxconfigd:???:online:attempting to start vxconfigd
2013/05/02 06:47:51 VCS NOTICE V-16-2-13076 (EMMDPD03) Agent has successfully restarted resource(cvm_vxconfigd).
2013/05/03 05:25:52 VCS ERROR V-16-10001-1004 (EMMDPD03) CVMVxconfigd:???:monitor:Core file saved once already
2013/05/03 05:25:53 VCS INFO V-16-2-13001 (EMMDPD03) Resource(cvm_vxconfigd): Output of the completed operation (monitor) 
2013/05/03 05:25:53 VCS ERROR V-16-2-13067 (EMMDPD03) Agent is calling clean for resource(cvm_vxconfigd) because the resource became OFFLINE unexpectedly, on its own.
2013/05/03 05:25:54 VCS INFO V-16-2-13068 (EMMDPD03) Resource(cvm_vxconfigd) - clean completed successfully.
2013/05/03 05:25:54 VCS ERROR V-16-2-13073 (EMMDPD03) Resource(cvm_vxconfigd) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 5) the resource.
2013/05/03 05:25:54 VCS WARNING V-16-10001-1006 (EMMDPD03) CVMVxconfigd:???:online:attempting to start vxconfigd
2013/05/03 05:26:13 VCS NOTICE V-16-2-13076 (EMMDPD03) Agent has successfully restarted resource(cvm_vxconfigd).
2013/05/03 11:59:12 VCS ERROR V-16-10001-1004 (EMMDPD03) CVMVxconfigd:???:monitor:Core file saved once already
2013/05/03 11:59:13 VCS INFO V-16-2-13001 (EMMDPD03) Resource(cvm_vxconfigd): Output of the completed operation (monitor) 
2013/05/03 11:59:13 VCS ERROR V-16-2-13067 (EMMDPD03) Agent is calling clean for resource(cvm_vxconfigd) because the resource became OFFLINE unexpectedly, on its own.
2013/05/03 11:59:14 VCS INFO V-16-2-13068 (EMMDPD03) Resource(cvm_vxconfigd) - clean completed successfully.
2013/05/03 11:59:14 VCS ERROR V-16-2-13073 (EMMDPD03) Resource(cvm_vxconfigd) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 5) the resource.
2013/05/03 11:59:14 VCS WARNING V-16-10001-1006 (EMMDPD03) CVMVxconfigd:???:online:attempting to start vxconfigd
2013/05/03 11:59:32 VCS NOTICE V-16-2-13076 (EMMDPD03) Agent has successfully restarted resource(cvm_vxconfigd).
2013/05/03 14:05:48 VCS INFO V-16-1-50135 User root fired command: hagrp -flush ServerGroup1_DG  EMMDPD03  from localhost
2013/05/03 14:06:27 VCS INFO V-16-1-50135 User root fired command: MSG_RES_PROBE cfsmount2  EMMDPD03  from localhost
2013/05/03 14:06:41 VCS INFO V-16-1-50135 User root fired command: hagrp -offline ServerGroup1_DG  EMMDPD03  from localhost
2013/05/03 14:06:41 VCS NOTICE V-16-1-10167 Initiating manual offline of group ServerGroup1_DG on system EMMDPD03
2013/05/03 14:07:03 VCS INFO V-16-1-50135 User root fired command: hares -clear cvmvoldg2  EMMDPD03  from localhost
2013/05/03 14:07:03 VCS INFO V-16-1-10307 Resource cvmvoldg2 (Owner: unknown, Group: ServerGroup1_DG) is offline on EMMDPD03 (Not initiated by VCS)
2013/05/03 14:07:09 VCS INFO V-16-1-50135 User root fired command: hares -clear cvmvoldg4  EMMDPD03  from localhost
2013/05/03 14:07:09 VCS INFO V-16-1-10307 Resource cvmvoldg4 (Owner: unknown, Group: ora_DG) is offline on EMMDPD03 (Not initiated by VCS)
2013/05/03 14:07:12 VCS INFO V-16-1-50135 User root fired command: hares -clear cvmvoldg5  EMMDPD03  from localhost
2013/05/03 14:07:12 VCS INFO V-16-1-10307 Resource cvmvoldg5 (Owner: unknown, Group: lic_DG) is offline on EMMDPD03 (Not initiated by VCS)
2013/05/03 14:07:15 VCS INFO V-16-1-50135 User root fired command: hares -clear cvmvoldg6  EMMDPD03  from localhost
2013/05/03 14:07:15 VCS INFO V-16-1-10307 Resource cvmvoldg6 (Owner: unknown, Group: lic_DG) is offline on EMMDPD03 (Not initiated by VCS)
2013/05/03 14:20:11 VCS INFO V-16-1-50135 User root fired command: hagrp -flush ServerGroup1_DG  EMMDPD03  from localhost
2013/05/03 14:20:12 VCS INFO V-16-1-10306 Resource cvmvoldg2 (Owner: unknown, Group: ServerGroup1_DG) is offline on EMMDPD03 (Previous State = OFFLINE)
2013/05/03 14:28:05 VCS ERROR V-16-2-13027 (EMMDPD05) Resource(Server3) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:09 VCS ERROR V-16-2-13027 (EMMDPD06) Resource(Server5) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:09 VCS ERROR V-16-2-13027 (EMMDPD06) Resource(Server4) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:11 VCS ERROR V-16-2-13027 (EMMDPD03) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:11 VCS ERROR V-16-2-13210 (EMMDPD03) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
2013/05/03 14:28:12 VCS INFO V-16-2-13068 (EMMDPD03) Resource(cfsmount2) - clean completed successfully.
2013/05/03 14:28:17 VCS ERROR V-16-2-13027 (EMMDPD07) Resource(Server1) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:26 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(Server2) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:31 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(fmmweb) - monitor procedure did not complete within the expected time.
2013/05/03 14:29:06 VCS ERROR V-16-2-13027 (EMMDPD07) Resource(licserv) - monitor procedure did not complete within the expected time.
2013/05/03 14:31:51 VCS ERROR V-16-2-13027 (EMMDPD05) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
2013/05/03 14:31:51 VCS ERROR V-16-2-13210 (EMMDPD05) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
2013/05/03 14:31:51 VCS INFO V-16-2-13026 (EMMDPD07) Resource(Server1) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:31:52 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
2013/05/03 14:31:52 VCS ERROR V-16-2-13210 (EMMDPD04) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
2013/05/03 14:31:52 VCS ERROR V-16-2-13027 (EMMDPD06) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
2013/05/03 14:31:52 VCS ERROR V-16-2-13210 (EMMDPD06) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
2013/05/03 14:31:52 VCS ERROR V-16-2-13027 (EMMDPD07) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
2013/05/03 14:31:52 VCS ERROR V-16-2-13210 (EMMDPD07) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
2013/05/03 14:31:52 VCS INFO V-16-2-13026 (EMMDPD04) Resource(fmmweb) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:31:52 VCS INFO V-16-2-13068 (EMMDPD05) Resource(cfsmount2) - clean completed successfully.
2013/05/03 14:31:52 VCS INFO V-16-2-13082 (EMMDPD05) Resource(cfsmount2) recovered from fault, on its own.
2013/05/03 14:31:53 VCS INFO V-16-2-13026 (EMMDPD04) Resource(Server2) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:31:53 VCS INFO V-16-2-13068 (EMMDPD04) Resource(cfsmount2) - clean completed successfully.
2013/05/03 14:31:53 VCS INFO V-16-2-13026 (EMMDPD06) Resource(Server5) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:31:53 VCS INFO V-16-2-13026 (EMMDPD06) Resource(Server4) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:31:53 VCS INFO V-16-2-13082 (EMMDPD04) Resource(cfsmount2) recovered from fault, on its own.
2013/05/03 14:31:53 VCS INFO V-16-2-13068 (EMMDPD06) Resource(cfsmount2) - clean completed successfully.
2013/05/03 14:31:53 VCS INFO V-16-2-13082 (EMMDPD06) Resource(cfsmount2) recovered from fault, on its own.
2013/05/03 14:31:53 VCS INFO V-16-2-13068 (EMMDPD07) Resource(cfsmount2) - clean completed successfully.
2013/05/03 14:31:53 VCS INFO V-16-2-13082 (EMMDPD07) Resource(cfsmount2) recovered from fault, on its own.
2013/05/03 14:31:53 VCS INFO V-16-2-13026 (EMMDPD05) Resource(Server3) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:32:13 VCS INFO V-16-2-13026 (EMMDPD03) Resource(cfsmount2) - monitor procedure finished successfully after failing to complete within the expected time for (3) consecutive times.
2013/05/03 14:32:13 VCS INFO V-16-2-13082 (EMMDPD03) Resource(cfsmount2) recovered from fault, on its own.
2013/05/03 14:34:05 VCS ERROR V-16-2-13027 (EMMDPD05) Resource(Server3) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:10 VCS ERROR V-16-2-13027 (EMMDPD06) Resource(Server5) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:10 VCS ERROR V-16-2-13027 (EMMDPD06) Resource(Server4) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:13 VCS ERROR V-16-2-13027 (EMMDPD03) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:13 VCS ERROR V-16-2-13210 (EMMDPD03) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
2013/05/03 14:34:14 VCS INFO V-16-2-13068 (EMMDPD03) Resource(cfsmount2) - clean completed successfully.
2013/05/03 14:34:16 VCS ERROR V-16-2-13027 (EMMDPD07) Resource(Server1) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:26 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(Server2) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:30 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(fmmweb) - monitor procedure did not complete within the expected time.
2013/05/03 14:34:40 VCS INFO V-16-1-10305 Resource cfsmount2 (Owner: unknown, Group: ServerGroup1_DG) is offline on EMMDPD03 (VCS initiated)
2013/05/03 14:34:40 VCS NOTICE V-16-1-10446 Group ServerGroup1_DG is offline on system EMMDPD03
2013/05/03 14:34:40 VCS INFO V-16-6-15002 (EMMDPD03) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_postoffline EMMDPD03 ServerGroup1_DG   successfully
2013/05/03 14:34:40 VCS WARNING V-16-0 (EMMDPD03) hatrigger:POSTOFFLINE: EMMDPD03 ServerGroup1_DG
2013/05/03 14:34:41 VCS INFO V-16-6-15002 (EMMDPD03) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline EMMDPD03 ServerGroup1_DG   successfully
2013/05/03 14:38:06 VCS INFO V-16-2-13026 (EMMDPD07) Resource(licserv) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2013/05/03 14:42:43 VCS NOTICE V-16-1-10166 Initiating manual online of group ServerGroup1_DG on system EMMDPD03
2013/05/03 14:42:43 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ServerGroup1_DG on node EMMDPD03
2013/05/03 14:42:43 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ServerGroup1_DG on node EMMDPD04
2013/05/03 14:42:43 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ServerGroup1_DG on node EMMDPD05
2013/05/03 14:42:43 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ServerGroup1_DG on node EMMDPD06
2013/05/03 14:42:43 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ServerGroup1_DG on node EMMDPD07
2013/05/03 14:42:43 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvmvoldg2 (Owner: unknown, Group: ServerGroup1_DG) on System EMMDPD03
2013/05/03 14:42:43 VCS WARNING V-16-20007-1074 (EMMDPD03) CVMVolDg:cvmvoldg2:online:setup_vxnotify: old vxnotify of pid 11084 will be killed. my pid is 21744
2013/05/03 14:42:45 VCS INFO V-16-20007-1046 (EMMDPD03) CVMVolDg:cvmvoldg2:online:resource cvmvoldg2 is online
2013/05/03 14:43:17 VCS INFO V-16-1-10298 Resource cvmvoldg2 (Owner: unknown, Group: ServerGroup1_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:43:17 VCS NOTICE V-16-1-10301 Initiating Online of Resource cfsmount2 (Owner: unknown, Group: ServerGroup1_DG) on System EMMDPD03
2013/05/03 14:43:40 VCS INFO V-16-20011-5506 (EMMDPD03) CFSMount:cfsmount2:online:CFSMOUNT Onlined :MountOptions : -F vxfs -o cluster,cluster           Block Device /dev/vx/dsk/bgw1dg/vol01 : MountPoint /var/opt/BGw/ServerGroup1
2013/05/03 14:43:41 VCS INFO V-16-1-10298 Resource cfsmount2 (Owner: unknown, Group: ServerGroup1_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:43:41 VCS NOTICE V-16-1-10447 Group ServerGroup1_DG is online on system EMMDPD03
2013/05/03 14:44:44 VCS NOTICE V-16-1-10166 Initiating manual online of group lic_DG on system EMMDPD03
2013/05/03 14:44:44 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group lic_DG on node EMMDPD03
2013/05/03 14:44:44 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group lic_DG on node EMMDPD04
2013/05/03 14:44:44 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group lic_DG on node EMMDPD05
2013/05/03 14:44:44 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group lic_DG on node EMMDPD06
2013/05/03 14:44:44 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group lic_DG on node EMMDPD07
2013/05/03 14:44:44 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvmvoldg5 (Owner: unknown, Group: lic_DG) on System EMMDPD03
2013/05/03 14:44:44 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvmvoldg6 (Owner: unknown, Group: lic_DG) on System EMMDPD03
2013/05/03 14:44:44 VCS WARNING V-16-20007-1074 (EMMDPD03) CVMVolDg:cvmvoldg5:online:setup_vxnotify: old vxnotify of pid 28916 will be killed. my pid is 13614
2013/05/03 14:44:44 VCS WARNING V-16-20007-1074 (EMMDPD03) CVMVolDg:cvmvoldg6:online:setup_vxnotify: old vxnotify of pid 1860 will be killed. my pid is 13654
2013/05/03 14:44:47 VCS INFO V-16-20007-1046 (EMMDPD03) CVMVolDg:cvmvoldg5:online:resource cvmvoldg5 is online
2013/05/03 14:44:47 VCS INFO V-16-20007-1046 (EMMDPD03) CVMVolDg:cvmvoldg6:online:resource cvmvoldg6 is online
2013/05/03 14:45:19 VCS INFO V-16-1-10298 Resource cvmvoldg6 (Owner: unknown, Group: lic_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:45:19 VCS NOTICE V-16-1-10301 Initiating Online of Resource cfsmount6 (Owner: unknown, Group: lic_DG) on System EMMDPD03
2013/05/03 14:45:19 VCS INFO V-16-1-10298 Resource cvmvoldg5 (Owner: unknown, Group: lic_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:45:19 VCS NOTICE V-16-1-10301 Initiating Online of Resource cfsmount5 (Owner: unknown, Group: lic_DG) on System EMMDPD03
2013/05/03 14:45:19 VCS INFO V-16-20011-5506 (EMMDPD03) CFSMount:cfsmount5:online:CFSMOUNT Onlined :MountOptions : -F vxfs -o cluster,cluster           Block Device /dev/vx/dsk/lic1dg/vol01 : MountPoint /var/opt/sentinel
2013/05/03 14:45:19 VCS INFO V-16-20011-5506 (EMMDPD03) CFSMount:cfsmount6:online:CFSMOUNT Onlined :MountOptions : -F vxfs -o cluster,cluster           Block Device /dev/vx/dsk/fmm1dg/vol01 : MountPoint /var/opt/mediation/fmmdb
2013/05/03 14:45:20 VCS INFO V-16-1-10298 Resource cfsmount6 (Owner: unknown, Group: lic_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:45:20 VCS INFO V-16-1-10298 Resource cfsmount5 (Owner: unknown, Group: lic_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:45:20 VCS NOTICE V-16-1-10447 Group lic_DG is online on system EMMDPD03
2013/05/03 14:45:48 VCS NOTICE V-16-1-10166 Initiating manual online of group ora_DG on system EMMDPD03
2013/05/03 14:45:48 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ora_DG on node EMMDPD03
2013/05/03 14:45:48 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ora_DG on node EMMDPD04
2013/05/03 14:45:48 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ora_DG on node EMMDPD05
2013/05/03 14:45:48 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ora_DG on node EMMDPD06
2013/05/03 14:45:48 VCS NOTICE V-16-1-10232 Clearing Restart attribute for group ora_DG on node EMMDPD07
2013/05/03 14:45:48 VCS NOTICE V-16-1-10301 Initiating Online of Resource cvmvoldg4 (Owner: unknown, Group: ora_DG) on System EMMDPD03
2013/05/03 14:45:48 VCS WARNING V-16-20007-1074 (EMMDPD03) CVMVolDg:cvmvoldg4:online:setup_vxnotify: old vxnotify of pid 16612 will be killed. my pid is 26115
2013/05/03 14:45:50 VCS INFO V-16-20007-1046 (EMMDPD03) CVMVolDg:cvmvoldg4:online:resource cvmvoldg4 is online
2013/05/03 14:46:22 VCS INFO V-16-1-10298 Resource cvmvoldg4 (Owner: unknown, Group: ora_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:46:22 VCS NOTICE V-16-1-10301 Initiating Online of Resource cfsmount4 (Owner: unknown, Group: ora_DG) on System EMMDPD03
2013/05/03 14:46:22 VCS INFO V-16-20011-5506 (EMMDPD03) CFSMount:cfsmount4:online:CFSMOUNT Onlined :MountOptions : -F vxfs -o cluster,cluster           Block Device /dev/vx/dsk/ora1dg/vol01 : MountPoint /var/opt/mediation/ora
2013/05/03 14:46:23 VCS INFO V-16-1-10298 Resource cfsmount4 (Owner: unknown, Group: ora_DG) is online on EMMDPD03 (VCS initiated)
2013/05/03 14:46:23 VCS NOTICE V-16-1-10447 Group ora_DG is online on system EMMDPD03
2013/05/03 15:03:18 VCS ERROR V-16-20007-1017 (EMMDPD03) CVMVolDg:cvmvoldg2:monitor:check_notify_status: can't stabilise vxstat
2013/05/03 15:25:22 VCS ERROR V-16-20007-1017 (EMMDPD03) CVMVolDg:cvmvoldg5:monitor:check_notify_status: can't stabilise vxstat
2013/05/03 15:25:22 VCS ERROR V-16-20007-1017 (EMMDPD03) CVMVolDg:cvmvoldg6:monitor:check_notify_status: can't stabilise vxstat
2013/05/03 15:26:21 VCS ERROR V-16-20007-1017 (EMMDPD03) CVMVolDg:cvmvoldg2:monitor:check_notify_status: can't stabilise vxstat
2013/05/03 15:26:22 VCS ERROR V-16-20007-1017 (EMMDPD03) CVMVolDg:cvmvoldg5:monitor:check_notify_status: can't stabilise vxstat
2013/05/03 15:27:23 VCS ERROR V-16-20007-1017 (EMMDPD03) CVMVolDg:cvmvoldg6:monitor:check_notify_status: can't stabilise vxstat

 

Can anyone tell me what is causing this? The engine log shows the resources failing but does not give a root cause.
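
One pattern the log above does show is that cvm_vxconfigd went OFFLINE unexpectedly on EMMDPD03 several times (2013/05/02 06:47, 2013/05/03 05:25 and 11:59). A small sketch to pull those events out of the log so their times can be compared with other monitoring data, assuming the engine_A.log field layout shown above; `unexpected_offlines` is a hypothetical helper name.

```shell
# Hypothetical helper: list every "became OFFLINE unexpectedly" event
# (message ID V-16-2-13073) as: DATE TIME NODE RESOURCE.
# Assumed line layout (as in engine_A.log above):
#   DATE TIME VCS ERROR V-16-2-13073 (NODE) Resource(NAME) became OFFLINE ...
# Reads the files given as arguments, or stdin if none are given.
unexpected_offlines() {
    awk '$5 == "V-16-2-13073" {
        node = $6; gsub(/[()]/, "", node)                   # (EMMDPD03) -> EMMDPD03
        res = $7; sub(/^Resource\(/, "", res); sub(/\)$/, "", res)
        print $1, $2, node, res
    }' "$@"
}
```

Usage on a cluster node: `unexpected_offlines /var/VRTSvcs/log/engine_A.log`.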

 

Accepted solution: the last reply by mikebounds below.

3 replies

mikebounds
Level 6
Partner Accredited

VCS normally has no way of knowing why a resource fails. In your case vxconfigd is dying: VCS only sees that it is not running when it checks, and does not know why it went down. Similarly, for several other resources the monitor is timing out:

 

2013/05/03 14:28:17 VCS ERROR V-16-2-13027 (EMMDPD07) Resource(Server1) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:26 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(Server2) - monitor procedure did not complete within the expected time.
2013/05/03 14:28:31 VCS ERROR V-16-2-13027 (EMMDPD04) Resource(fmmweb) - monitor procedure did not complete within the expected time.
2013/05/03 14:29:06 VCS ERROR V-16-2-13027 (EMMDPD07) Resource(licserv) - monitor procedure did not complete within the expected time.
2013/05/03 14:31:51 VCS ERROR V-16-2-13027 (EMMDPD05) Resource(cfsmount2) - monitor procedure did not complete within the expected time.
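
Tallying these V-16-2-13027 messages per node and resource makes it easier to see whether many different resource types time out on several nodes at the same time. A minimal sketch, assuming the engine_A.log field layout shown above; `count_monitor_timeouts` is a hypothetical helper name.

```shell
# Hypothetical helper: count monitor timeouts (message ID V-16-2-13027)
# per node and resource, printed as: COUNT NODE RESOURCE, sorted by node.
# Assumed line layout (as in engine_A.log):
#   DATE TIME VCS ERROR V-16-2-13027 (NODE) Resource(NAME) - monitor procedure ...
# Reads the files given as arguments, or stdin if none are given.
count_monitor_timeouts() {
    awk '$5 == "V-16-2-13027" {
        node = $6; gsub(/[()]/, "", node)                   # (EMMDPD05) -> EMMDPD05
        res = $7; sub(/^Resource\(/, "", res); sub(/\)$/, "", res)
        n[node " " res]++
    }
    END { for (k in n) print n[k], k }' "$@" | sort -k2,2 -k3,3
}
```

Usage on a cluster node: `count_monitor_timeouts /var/VRTSvcs/log/engine_A.log`.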
 
If a resource's monitor times out as many consecutive times as its FaultOnMonitorTimeouts value, the resource faults. The default is 4, so for most resources you are seeing:
 2013/05/03 14:38:06 VCS INFO V-16-2-13026 (EMMDPD07) Resource(licserv) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
 
So these resources do not fault, but I do see:
2013/05/03 14:31:52 VCS ERROR V-16-2-13210 (EMMDPD07) Agent is calling clean for resource(cfsmount2) because 1 successive invocations of the monitor procedure did not complete within the expected time.
 
This suggests that FaultOnMonitorTimeouts for the CFSMount resource type may have been changed from its default of 4.
 
However, this resource recovers on its own:
2013/05/03 14:31:53 VCS INFO V-16-2-13082 (EMMDPD04) Resource(cfsmount2) recovered from fault, on its own.
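
To illustrate the threshold logic, here is a simplified model: count consecutive monitor timeouts, reset the count when a monitor completes, and call clean (fault the resource) once the count reaches FaultOnMonitorTimeouts. This is a sketch under that assumption for illustration only, not the VCS agent framework's actual code, and `simulate_monitor` is a hypothetical name. Each argument after the threshold is one monitor cycle: T for a timeout, O for a completed monitor.

```shell
# Simplified model (an assumption, NOT the real VCS agent code) of how a
# FaultOnMonitorTimeouts threshold turns monitor timeouts into a fault.
# Usage: simulate_monitor THRESHOLD OUTCOME...   (OUTCOME: T = timeout, O = completed)
simulate_monitor() {
    threshold=$1
    shift
    count=0      # consecutive timeouts so far
    faulted=0    # 1 once clean has been "called"
    n=0          # monitor cycle number
    for result in "$@"; do
        n=$((n + 1))
        if [ "$result" = T ]; then
            count=$((count + 1))
            if [ "$count" -eq "$threshold" ]; then
                faulted=1
                echo "monitor $n: $count consecutive timeout(s) -> clean called, resource faulted"
            fi
        else
            if [ "$faulted" -eq 1 ]; then
                echo "monitor $n: monitor completed -> recovered from fault on its own"
            elif [ "$count" -gt 0 ]; then
                echo "monitor $n: completed after $count timeout(s) -> no fault"
            fi
            count=0      # a completed monitor resets the counter in this model
            faulted=0
        fi
    done
}
```

With this model, `simulate_monitor 4 T T O` reports no fault (the licserv pattern above), while `simulate_monitor 1 T O` reports a fault followed by recovery, like cfsmount2.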
 
As several different types of resources are timing out, this is probably a system resource issue: the monitors do not get enough system resources (CPU and memory) to complete the monitor entry point in time. Low system resources MAY also be causing vxconfigd to fail.
 
Mike
 

 

chuckchang23
Level 3

Hi Mikebounds,

 

Is there a way to check the FaultOnMonitorTimeouts setting? Also, how can we verify whether system resources are low? When I checked the system, process and disk utilization were low.

mikebounds
Level 6
Partner Accredited

Use:

hatype -display | grep FaultOnMonitorTimeouts

This will give FaultOnMonitorTimeouts for all types.
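
To list only the types that deviate from the default of 4, the same output can be filtered. A minimal sketch, assuming `hatype -display` prints whitespace-separated Type / Attribute / Value columns (check this on your VCS version); `nondefault_fomt` is a hypothetical name. A CFSMount value of 1 would be consistent with the "because 1 successive invocations" messages logged for cfsmount2.

```shell
# Hypothetical filter: print "TYPE VALUE" for every resource type whose
# FaultOnMonitorTimeouts is not the default of 4.
# Assumes input lines of the form: TYPE  ATTRIBUTE  VALUE
nondefault_fomt() {
    awk '$2 == "FaultOnMonitorTimeouts" && $3 != 4 { print $1, $3 }'
}
```

Usage on a cluster node: `hatype -display | nondefault_fomt`.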

You could monitor system resources from cron and, if the issue happens again, check whether it coincides with high CPU usage. Alternatively, run "fsclustadm" from cron (this is what VCS uses to monitor CFS mounts) to see whether it really is hanging.
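
A sketch of the cron idea: run a command and append a timestamped line to a log file if it takes longer than a chosen limit. Assumptions: `time_check` is a hypothetical helper, `date +%s` is available (GNU date; Solaris 10's /usr/bin/date may not support it), and the `fsclustadm -v showprimary` call in the usage line is only an example query, so check fsclustadm(1M) for what the CFSMount monitor actually runs.

```shell
# Hypothetical cron helper: run a command and, if it takes LIMIT seconds or
# more, append "timestamp SLOW <secs>s (exit <status>): <command>" to LOGFILE.
# Usage: time_check LIMIT LOGFILE COMMAND [ARGS...]
# Assumes `date +%s` (seconds since the epoch) is supported.
time_check() {
    limit=$1
    logfile=$2
    shift 2
    start=$(date +%s)
    "$@" >/dev/null 2>&1          # run the command, discard its output
    status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -ge "$limit" ]; then
        echo "$(date '+%Y/%m/%d %H:%M:%S') SLOW ${elapsed}s (exit $status): $*" >> "$logfile"
    fi
    return 0
}
```

Possible usage from a cron-driven script (the 60-second limit and log paths are arbitrary choices): `time_check 60 /var/tmp/cfs_check.log fsclustadm -v showprimary /var/opt/BGw/ServerGroup1`, alongside a separate cron entry such as `vmstat 5 12 >> /var/tmp/vmstat.log` to record CPU and memory over the same period. The timestamps can then be lined up against the V-16-2-13027 messages in engine_A.log.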

Mike