Service groups not started automatically after node reboot in VCS 5.1
Hello,
We have a 4-node VCS 5.1 cluster (EngineVersion 5.1.10.40) configured with I/O fencing. Two of the nodes run the applications and the other two run the DB.
The applications and the DB are configured to run only on their designated nodes.
We are testing a complete heartbeat failure by shutting down the switch. Fencing works as expected and three nodes are rebooted.
We have observed that two of the service groups were not started after both application nodes rebooted. Both groups were left in the PARTIAL state on lpdmc1p:
mmsoap-rg State lpdmc1p |PARTIAL|
mmsoap-rg State lpdmc2p |OFFLINE|
smppc-rg State lpdmc1p |PARTIAL|
smppc-rg State lpdmc2p |OFFLINE|
Here is their configuration:
group mmsoap-rg (
    SystemList = { lpdmc1p = 0, lpdmc2p = 1 }
    AutoStartList = { lpdmc1p, lpdmc2p }
    )

    IP mmsoap-lh-res (
        Device = bond0
        Address = "10.40.248.199"
        NetMask = "255.255.255.224"
        )

    LVMLogicalVolume opt-mmsoap-lv-res (
        VolumeGroup = mmsoap-vg
        LogicalVolume = opt-mmsoap-lv
        )

    LVMLogicalVolume var-opt-mmsoap-lv-res (
        VolumeGroup = mmsoap-vg
        LogicalVolume = var-opt-mmsoap-lv
        )

    LVMVolumeGroup mmsoap-vg-res (
        VolumeGroup = mmsoap-vg
        EnableLVMTagging = 1
        )

    Mount opt-mmsoap-mnt-res (
        MountOpt = "rw,noatime,nodiratime,nosuid,nodev"
        FsckOpt = "-y"
        BlockDevice = "/dev/mapper/mmsoap--vg-opt--mmsoap--lv"
        MountPoint = "/opt/mmsoap"
        FSType = ext3
        )

    Mount var-opt-mmsoap-mnt-res (
        MountOpt = "rw,noatime,nodiratime,nosuid,nodev"
        FsckOpt = "-y"
        BlockDevice = "/dev/mapper/mmsoap--vg-var--opt--mmsoap--lv"
        MountPoint = "/var/opt/mmsoap"
        FSType = ext3
        )

    NIC mmsoap-nic-res (
        Device = bond0
        )

    SicapApplication mmsoap-app-res (
        AppUser = mmsoap
        )

    requires group smppc-rg online global firm
    mmsoap-app-res requires mmsoap-lh-res
    mmsoap-app-res requires opt-mmsoap-mnt-res
    mmsoap-app-res requires var-opt-mmsoap-mnt-res
    mmsoap-lh-res requires mmsoap-nic-res
    opt-mmsoap-lv-res requires mmsoap-vg-res
    opt-mmsoap-mnt-res requires opt-mmsoap-lv-res
    var-opt-mmsoap-lv-res requires mmsoap-vg-res
    var-opt-mmsoap-mnt-res requires var-opt-mmsoap-lv-res

    // resource dependency tree
    //
    //  group mmsoap-rg
    //  {
    //  SicapApplication mmsoap-app-res
    //      {
    //      IP mmsoap-lh-res
    //          {
    //          NIC mmsoap-nic-res
    //          }
    //      Mount opt-mmsoap-mnt-res
    //          {
    //          LVMLogicalVolume opt-mmsoap-lv-res
    //              {
    //              LVMVolumeGroup mmsoap-vg-res
    //              }
    //          }
    //      Mount var-opt-mmsoap-mnt-res
    //          {
    //          LVMLogicalVolume var-opt-mmsoap-lv-res
    //              {
    //              LVMVolumeGroup mmsoap-vg-res
    //              }
    //          }
    //      }
    //  }
group smppc-rg (
    SystemList = { lpdmc1p = 0, lpdmc2p = 1 }
    AutoStartList = { lpdmc1p, lpdmc2p }
    )

    IP smppc-lh-res (
        Device = bond0
        Address = "10.40.248.200"
        NetMask = "255.255.255.224"
        )

    LVMLogicalVolume opt-smppc-lv-res (
        VolumeGroup = smppc-vg
        LogicalVolume = opt-smppc-lv
        )

    LVMLogicalVolume var-opt-smppc-lv-res (
        VolumeGroup = smppc-vg
        LogicalVolume = var-opt-smppc-lv
        )

    LVMVolumeGroup smppc-vg-res (
        VolumeGroup = smppc-vg
        EnableLVMTagging = 1
        )

    Mount opt-smppc-mnt-res (
        MountOpt = "rw,noatime,nodiratime,nosuid,nodev"
        FsckOpt = "-y"
        BlockDevice = "/dev/mapper/smppc--vg-opt--smppc--lv"
        MountPoint = "/opt/smppc"
        FSType = ext3
        )

    Mount var-opt-smppc-mnt-res (
        MountOpt = "rw,noatime,nodiratime,nosuid,nodev"
        FsckOpt = "-y"
        BlockDevice = "/dev/mapper/smppc--vg-var--opt--smppc--lv"
        MountPoint = "/var/opt/smppc"
        FSType = ext3
        )

    NIC smppc-nic-res (
        Device = bond0
        )

    SicapApplication smppc-app-res (
        AppUser = smppc
        )

    requires group mmg-rg online global firm
    opt-smppc-lv-res requires smppc-vg-res
    opt-smppc-mnt-res requires opt-smppc-lv-res
    smppc-app-res requires opt-smppc-mnt-res
    smppc-app-res requires smppc-lh-res
    smppc-app-res requires var-opt-smppc-mnt-res
    smppc-lh-res requires smppc-nic-res
    var-opt-smppc-lv-res requires smppc-vg-res
    var-opt-smppc-mnt-res requires var-opt-smppc-lv-res

    // resource dependency tree
    //
    //  group smppc-rg
    //  {
    //  SicapApplication smppc-app-res
    //      {
    //      Mount opt-smppc-mnt-res
    //          {
    //          LVMLogicalVolume opt-smppc-lv-res
    //              {
    //              LVMVolumeGroup smppc-vg-res
    //              }
    //          }
    //      IP smppc-lh-res
    //          {
    //          NIC smppc-nic-res
    //          }
    //      Mount var-opt-smppc-mnt-res
    //          {
    //          LVMLogicalVolume var-opt-smppc-lv-res
    //              {
    //              LVMVolumeGroup smppc-vg-res
    //              }
    //          }
    //      }
    //  }
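To double-check the online order implied by the resource-level requires lines of mmsoap-rg, here is a minimal Python sketch (my own helper, not a VCS tool; it needs Python 3.9+ for graphlib). It assumes the requires lines are copied verbatim from main.cf and deliberately ignores the group-level "requires group smppc-rg online global firm" line. "A requires B" is read as "B must be online before A", so the sorter returns children first.

```python
from graphlib import TopologicalSorter

# Resource-level dependencies of mmsoap-rg, copied from main.cf.
# "A requires B" means B must be online before A can be onlined.
REQUIRES = """\
mmsoap-app-res requires mmsoap-lh-res
mmsoap-app-res requires opt-mmsoap-mnt-res
mmsoap-app-res requires var-opt-mmsoap-mnt-res
mmsoap-lh-res requires mmsoap-nic-res
opt-mmsoap-lv-res requires mmsoap-vg-res
opt-mmsoap-mnt-res requires opt-mmsoap-lv-res
var-opt-mmsoap-lv-res requires mmsoap-vg-res
var-opt-mmsoap-mnt-res requires var-opt-mmsoap-lv-res
"""


def online_order(requires_text: str) -> list[str]:
    """Return one valid online order (children before their parents)."""
    graph: dict[str, set[str]] = {}
    for line in requires_text.splitlines():
        parent, keyword, child = line.split()
        if keyword != "requires":
            raise ValueError(f"unexpected line: {line!r}")
        # TopologicalSorter maps node -> predecessors, so the child
        # (the resource that must be online first) is a predecessor.
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in online_order(REQUIRES):
        print(name)
```

Every volume-group resource therefore has to come before the logical volumes that depend on it, and the application resource is necessarily last because it transitively depends on all the others.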
The state of their resources:
#Resource               Attribute  System   Value
mmsoap-lh-res           State      lpdmc1p  OFFLINE
mmsoap-lh-res           State      lpdmc2p  OFFLINE
opt-mmsoap-lv-res       State      lpdmc1p  ONLINE
opt-mmsoap-lv-res       State      lpdmc2p  OFFLINE
var-opt-mmsoap-lv-res   State      lpdmc1p  ONLINE
var-opt-mmsoap-lv-res   State      lpdmc2p  OFFLINE
mmsoap-vg-res           State      lpdmc1p  OFFLINE
mmsoap-vg-res           State      lpdmc2p  OFFLINE
opt-mmsoap-mnt-res      State      lpdmc1p  OFFLINE
opt-mmsoap-mnt-res      State      lpdmc2p  OFFLINE
var-opt-mmsoap-mnt-res  State      lpdmc1p  OFFLINE
var-opt-mmsoap-mnt-res  State      lpdmc2p  OFFLINE
mmsoap-nic-res          State      lpdmc1p  ONLINE
mmsoap-nic-res          State      lpdmc2p  ONLINE
mmsoap-app-res          State      lpdmc1p  OFFLINE
mmsoap-app-res          State      lpdmc2p  OFFLINE
smppc-lh-res            State      lpdmc1p  OFFLINE
smppc-lh-res            State      lpdmc2p  OFFLINE
opt-smppc-lv-res        State      lpdmc1p  ONLINE
opt-smppc-lv-res        State      lpdmc2p  OFFLINE
var-opt-smppc-lv-res    State      lpdmc1p  ONLINE
var-opt-smppc-lv-res    State      lpdmc2p  OFFLINE
smppc-vg-res            State      lpdmc1p  OFFLINE
smppc-vg-res            State      lpdmc2p  OFFLINE
opt-smppc-mnt-res       State      lpdmc1p  OFFLINE
opt-smppc-mnt-res       State      lpdmc2p  OFFLINE
var-opt-smppc-mnt-res   State      lpdmc1p  OFFLINE
var-opt-smppc-mnt-res   State      lpdmc2p  OFFLINE
smppc-nic-res           State      lpdmc1p  ONLINE
smppc-nic-res           State      lpdmc2p  ONLINE
smppc-app-res           State      lpdmc1p  OFFLINE
smppc-app-res           State      lpdmc2p  OFFLINE
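As a sanity check on the table above, here is a simplified Python model of how I understand the group state to follow from the resource states on one system. The rules are my assumption, not VCS source code: persistent resources (here the NIC, which VCS only monitors) are ignored; a group is ONLINE if all remaining resources are online, OFFLINE if none are, and PARTIAL otherwise. With the mmsoap-rg rows it reproduces PARTIAL on lpdmc1p (only the two LVs are online) and OFFLINE on lpdmc2p.

```python
# Simplified model (assumption, not VCS internals) of a group state:
#   - persistent resources such as NIC are ignored,
#   - ONLINE if all remaining resources are online, OFFLINE if none are,
#   - PARTIAL otherwise.
PERSISTENT = {"mmsoap-nic-res"}

# mmsoap-rg rows of the resource state table, copied verbatim.
STATES = """\
mmsoap-lh-res State lpdmc1p OFFLINE
mmsoap-lh-res State lpdmc2p OFFLINE
opt-mmsoap-lv-res State lpdmc1p ONLINE
opt-mmsoap-lv-res State lpdmc2p OFFLINE
var-opt-mmsoap-lv-res State lpdmc1p ONLINE
var-opt-mmsoap-lv-res State lpdmc2p OFFLINE
mmsoap-vg-res State lpdmc1p OFFLINE
mmsoap-vg-res State lpdmc2p OFFLINE
opt-mmsoap-mnt-res State lpdmc1p OFFLINE
opt-mmsoap-mnt-res State lpdmc2p OFFLINE
var-opt-mmsoap-mnt-res State lpdmc1p OFFLINE
var-opt-mmsoap-mnt-res State lpdmc2p OFFLINE
mmsoap-nic-res State lpdmc1p ONLINE
mmsoap-nic-res State lpdmc2p ONLINE
mmsoap-app-res State lpdmc1p OFFLINE
mmsoap-app-res State lpdmc2p OFFLINE
"""


def group_state(table: str, system: str, persistent: set[str]) -> str:
    """Derive a group state on `system` from per-resource State rows."""
    values = []
    for line in table.splitlines():
        resource, _attribute, sys_name, value = line.split()
        if sys_name == system and resource not in persistent:
            values.append(value)
    online = values.count("ONLINE")
    if online == 0:
        return "OFFLINE"
    if online == len(values):
        return "ONLINE"
    return "PARTIAL"


if __name__ == "__main__":
    for system in ("lpdmc1p", "lpdmc2p"):
        print(system, group_state(STATES, system, PERSISTENT))
```

Excluding the NIC matters: if it were counted, lpdmc2p would also come out as PARTIAL, which does not match the hagrp output.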
In the engine log I found the following messages for mmsoap-rg:
2016/04/05 02:27:45 VCS NOTICE V-16-1-10181 Group mmsoap-rg AutoRestart set to 1
2016/04/05 02:27:46 VCS INFO V-16-1-10304 Resource mmsoap-lh-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS INFO V-16-1-10297 Resource opt-mmsoap-lv-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group mmsoap-rg on all nodes
2016/04/05 02:27:46 VCS INFO V-16-1-10297 Resource var-opt-mmsoap-lv-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group mmsoap-rg on all nodes
2016/04/05 02:27:46 VCS INFO V-16-1-10304 Resource opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS INFO V-16-1-10304 Resource var-opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:47 VCS INFO V-16-1-10304 Resource mmsoap-app-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:48 VCS INFO V-16-1-10304 Resource mmsoap-vg-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:48 VCS NOTICE V-16-1-10438 Group mmsoap-rg has been probed on system lpdmc1p
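To summarise the first-probe results above, here is a small Python sketch (my own helper; the regular expression is an assumption based only on the message format visible in this log). It shows that the two logical volume resources probed online on lpdmc1p while the volume group resource they depend on probed offline, which is exactly the mix that leaves the group PARTIAL.

```python
import re

# Engine log lines for mmsoap-rg after the reboot, copied verbatim.
LOG = """\
2016/04/05 02:27:45 VCS NOTICE V-16-1-10181 Group mmsoap-rg AutoRestart set to 1
2016/04/05 02:27:46 VCS INFO V-16-1-10304 Resource mmsoap-lh-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS INFO V-16-1-10297 Resource opt-mmsoap-lv-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group mmsoap-rg on all nodes
2016/04/05 02:27:46 VCS INFO V-16-1-10297 Resource var-opt-mmsoap-lv-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group mmsoap-rg on all nodes
2016/04/05 02:27:46 VCS INFO V-16-1-10304 Resource opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:46 VCS INFO V-16-1-10304 Resource var-opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:47 VCS INFO V-16-1-10304 Resource mmsoap-app-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:48 VCS INFO V-16-1-10304 Resource mmsoap-vg-res (Owner: Unspecified, Group: mmsoap-rg) is offline on lpdmc1p (First probe)
2016/04/05 02:27:48 VCS NOTICE V-16-1-10438 Group mmsoap-rg has been probed on system lpdmc1p
"""

# Matches "Resource X (Owner: ..., Group: G) is online|offline on S (First probe)".
PROBE_RE = re.compile(
    r"Resource (?P<res>\S+) \(Owner: [^,]*, Group: (?P<grp>[^)]+)\) "
    r"is (?P<state>online|offline) on (?P<sys>\S+) \(First probe\)"
)


def first_probe_states(log: str, group: str, system: str) -> dict[str, str]:
    """Map each resource of `group` to its first-probe state on `system`."""
    states: dict[str, str] = {}
    for line in log.splitlines():
        m = PROBE_RE.search(line)
        if m and m["grp"] == group and m["sys"] == system:
            states[m["res"]] = m["state"]
    return states


if __name__ == "__main__":
    for res, state in first_probe_states(LOG, "mmsoap-rg", "lpdmc1p").items():
        print(f"{res:24} {state}")
```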
Bringing the groups online manually worked, but I had to online each group separately:
2016/04/05 05:58:03 VCS INFO V-16-1-50135 User root fired command: hagrp -online -any smppc-rg localclus from localhost
2016/04/05 05:58:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource smppc-lh-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource smppc-vg-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource opt-smppc-mnt-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource var-opt-smppc-mnt-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:04 VCS ERROR V-16-10031-14001 (lpdmc1p) LVMVolumeGroup:smppc-vg-res:online:Activation of volume group failed.
2016/04/05 05:58:05 VCS INFO V-16-1-10298 Resource opt-smppc-mnt-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:58:05 VCS INFO V-16-1-10298 Resource var-opt-smppc-mnt-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:58:06 VCS INFO V-16-1-10298 Resource smppc-vg-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:58:16 VCS INFO V-16-1-10298 Resource smppc-lh-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:58:16 VCS NOTICE V-16-1-10301 Initiating Online of Resource smppc-app-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:16 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Running preonline for resource smppc-app-res
2016/04/05 05:58:16 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Preonline for resource smppc-app-res finished
2016/04/05 05:58:16 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Starting resource smppc-app-res
2016/04/05 05:58:21 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Resource smppc-app-res is started
2016/04/05 05:58:34 VCS INFO V-16-1-10298 Resource smppc-app-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:58:34 VCS NOTICE V-16-1-10447 Group smppc-rg is online on system lpdmc1p
2016/04/05 05:59:52 VCS INFO V-16-1-50135 User root fired command: hagrp -online -any mmsoap-rg localclus from localhost
2016/04/05 05:59:52 VCS NOTICE V-16-1-10301 Initiating Online of Resource mmsoap-lh-res (Owner: Unspecified, Group: mmsoap-rg) on System lpdmc1p
2016/04/05 05:59:52 VCS NOTICE V-16-1-10301 Initiating Online of Resource mmsoap-vg-res (Owner: Unspecified, Group: mmsoap-rg) on System lpdmc1p
2016/04/05 05:59:52 VCS NOTICE V-16-1-10301 Initiating Online of Resource opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) on System lpdmc1p
2016/04/05 05:59:52 VCS NOTICE V-16-1-10301 Initiating Online of Resource var-opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) on System lpdmc1p
2016/04/05 05:59:53 VCS ERROR V-16-10031-14001 (lpdmc1p) LVMVolumeGroup:mmsoap-vg-res:online:Activation of volume group failed.
2016/04/05 05:59:53 VCS INFO V-16-1-10298 Resource opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:59:53 VCS INFO V-16-1-10298 Resource var-opt-mmsoap-mnt-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:59:54 VCS INFO V-16-1-10298 Resource mmsoap-vg-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 06:00:02 VCS INFO V-16-1-10298 Resource mmsoap-lh-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 06:00:02 VCS NOTICE V-16-1-10301 Initiating Online of Resource mmsoap-app-res (Owner: Unspecified, Group: mmsoap-rg) on System lpdmc1p
2016/04/05 06:00:02 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Running preonline for resource mmsoap-app-res
2016/04/05 06:00:03 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Preonline for resource mmsoap-app-res finished
2016/04/05 06:00:03 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Starting resource mmsoap-app-res
2016/04/05 06:00:09 VCS INFO V-16-1-0 (lpdmc1p) SicapApplication:???:???:Resource mmsoap-app-res is started
2016/04/05 06:00:21 VCS INFO V-16-1-10298 Resource mmsoap-app-res (Owner: Unspecified, Group: mmsoap-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 06:00:21 VCS NOTICE V-16-1-10447 Group mmsoap-rg is online on system lpdmc1p
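To line up the volume group activation error with what VCS reported afterwards, here is a small Python sketch (my own helper, not part of VCS) that turns an excerpt of the smppc-rg online log into (timestamp, resource, event) tuples. The three regular expressions are assumptions based only on the message formats visible above. On this excerpt it shows the ERROR for smppc-vg-res at 05:58:04, followed by the same resource being reported online at 05:58:06.

```python
import re

# Excerpt of the engine log for the manual online of smppc-rg, copied verbatim.
LOG = """\
2016/04/05 05:58:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource smppc-vg-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource opt-smppc-mnt-res (Owner: Unspecified, Group: smppc-rg) on System lpdmc1p
2016/04/05 05:58:04 VCS ERROR V-16-10031-14001 (lpdmc1p) LVMVolumeGroup:smppc-vg-res:online:Activation of volume group failed.
2016/04/05 05:58:05 VCS INFO V-16-1-10298 Resource opt-smppc-mnt-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
2016/04/05 05:58:06 VCS INFO V-16-1-10298 Resource smppc-vg-res (Owner: Unspecified, Group: smppc-rg) is online on lpdmc1p (VCS initiated)
"""

# (event kind, pattern); the first pattern that matches a line wins.
PATTERNS = [
    ("initiating", re.compile(r"Initiating Online of Resource (?P<res>\S+) ")),
    ("online", re.compile(r"Resource (?P<res>\S+) \(Owner: [^)]*\) is online on ")),
    ("error", re.compile(r"ERROR \S+ \(\S+\) \w+:(?P<res>[^:]+):online:(?P<msg>.*)$")),
]


def timeline(log: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, resource, event) tuples in log order."""
    events = []
    for line in log.splitlines():
        stamp = line[:19]  # "YYYY/MM/DD HH:MM:SS"
        for kind, pattern in PATTERNS:
            m = pattern.search(line)
            if m:
                detail = f"{kind}: {m['msg']}" if kind == "error" else kind
                events.append((stamp, m["res"], detail))
                break
    return events


if __name__ == "__main__":
    for stamp, res, event in timeline(LOG):
        print(stamp, f"{res:20}", event)
```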
What can I do to have these service groups start automatically when the heartbeat is lost and the nodes are rebooted by a fencing panic?
Thank you in advance,
Laszlo