Business Continuity

            Existing VCS Resource Dependency

VCS resource dependencies are a construct used to link two resources. A dependency consists of a parent resource and a child resource, and it dictates the start and stop order of the resources involved: before a parent resource can be brought online or kept online, all of its child resources must be brought online or kept online first. Currently, a parent resource can depend on multiple child resources, but all of those child resources must be online before VCS attempts to bring the parent resource online.

With the growing use of dependencies in data centers, a parent resource sometimes needs to depend on a set of child resources rather than on each child resource individually. In certain configurations, a parent resource depends on a set of ‘n’ child resources, of which at least ‘m’ must be online before the parent resource is brought online. While the child and parent resources are online, a user can take a child resource from the set offline as long as at least ‘m’ child resources from the set remain online. Similarly, if a child resource from the set stops abruptly while the child and parent resources are online, the parent resource remains unaffected as long as at least ‘m’ child resources from the set are still running. The existing dependency did not handle these scenarios. The following use cases illustrate the requirement.

[Figure: Use Cases]

SFRAC: at least 1 OCR must be online before CSSD is brought online.

CP Server: at least 1 IP from each subnet must be online before CP Server is brought online.

File Store: at least 1 mount point must be online before the virtual IP is brought online.


            ‘atleast’ VCS Resource Dependency

VCS 6.2 introduced a new dependency type called ‘atleast’. With an ‘atleast’ dependency, at least ‘m’ of the ‘n’ child resources of a parent resource must be online before the parent resource is brought online. As with any dependency, child resources are brought online before the parent resource; this also holds for the ‘atleast’ dependency. The ‘atleast’ dependency provides the flexibility of bringing online any ‘m’ (the minimum) child resources out of the total ‘n’ child resources. It is illustrated here using the ora_grp service group, which consists of a cssd resource and 5 ocr resources.
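To make the difference between the two rules concrete, here is a minimal Python sketch that models each child resource's state as a plain string and checks whether a parent may be brought online under the existing rule and under the ‘atleast’ rule. The function names and the string-based state model are hypothetical; they are not part of any VCS API.

```python
from typing import Mapping


def can_online_parent_all(child_states: Mapping[str, str]) -> bool:
    """Existing dependency: every child resource must be ONLINE."""
    return all(state == "ONLINE" for state in child_states.values())


def can_online_parent_atleast(child_states: Mapping[str, str], m: int) -> bool:
    """'atleast' dependency: at least m of the n child resources must be ONLINE."""
    online = sum(1 for state in child_states.values() if state == "ONLINE")
    return online >= m


# ora_grp: cssd depends on at least 1 of ocr1..ocr5; only ocr2 is ONLINE so far.
states = {"ocr1": "OFFLINE", "ocr2": "ONLINE", "ocr3": "OFFLINE",
          "ocr4": "OFFLINE", "ocr5": "OFFLINE"}
print(can_online_parent_all(states))         # False
print(can_online_parent_atleast(states, 1))  # True
```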


How are resources linked?

CLI:

# hares -link cssd ocr1,ocr2,ocr3,ocr4,ocr5 -min 1

main.cf:

group ora_grp (
	.
	.
        )
	.
	.
	... Resource definitions ...
	.
	.
	cssd requires atleast 1 from ocr1,ocr2,ocr3,ocr4,ocr5

The minimum criterion must be at least 1 and less than the total number of child resources in the set.
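The linking rule can be sketched as a small validation step. The hypothetical helper below assumes the constraint 1 <= m < n stated above and renders a ‘requires atleast’ line in the main.cf format shown. It only builds a string; it does not run hares or edit main.cf.

```python
from typing import Sequence


def render_atleast_link(parent: str, children: Sequence[str], m: int) -> str:
    """Validate an 'atleast' link and return the matching main.cf line.

    Hypothetical helper: assumes the rule 1 <= m < n stated above and
    only builds a string; it does not run hares or edit main.cf.
    """
    n = len(children)
    if not 1 <= m < n:
        raise ValueError(f"minimum must satisfy 1 <= m < {n}, got {m}")
    return f"{parent} requires atleast {m} from {','.join(children)}"


print(render_atleast_link("cssd", ["ocr1", "ocr2", "ocr3", "ocr4", "ocr5"], 1))
# cssd requires atleast 1 from ocr1,ocr2,ocr3,ocr4,ocr5
```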


How are dependent resources displayed?

To maintain backward compatibility, an ‘atleast’ dependency is flattened and displayed in the existing format. With the -atleast switch, it is displayed in the new format.

Without the -atleast switch:

# hares -dep
#Group       Parent      Child
ora_grp      cssd        ocr5
ora_grp      cssd        ocr4
ora_grp      cssd        ocr3
ora_grp      cssd        ocr2
ora_grp      cssd        ocr1

With the -atleast switch:

# hares -dep -atleast
#Group       Parent      Child
ora_grp      cssd        ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1
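The flattened view can be thought of as expanding one ‘atleast’ entry into one parent/child row per child, while the -atleast view keeps the set and its minimum on a single row. The sketch below reproduces both row shapes from a hypothetical in-memory representation. The column widths (13 and 12) are copied from the sample output above, and the row order VCS uses in the flattened view (ocr5 first above) is not modelled.

```python
from typing import List, Sequence


def flattened_rows(group: str, parent: str, children: Sequence[str]) -> List[str]:
    """One parent/child row per child, like 'hares -dep' without -atleast."""
    return [f"{group:<13}{parent:<12}{child}" for child in children]


def atleast_row(group: str, parent: str, children: Sequence[str], m: int) -> str:
    """One row holding the whole set and its minimum, like 'hares -dep -atleast'."""
    return f"{group:<13}{parent:<12}{', '.join(children)}. Min = {m}"


children = ["ocr1", "ocr2", "ocr3", "ocr4", "ocr5"]
for row in flattened_rows("ora_grp", "cssd", children):
    print(row)
print(atleast_row("ora_grp", "cssd", children, 1))
```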

In the VCS Java GUI and VOM, an ‘atleast’ dependency is shown as an existing (regular) dependency.

[Figure: Dependency View]


How are resources unlinked?

Partial unlinking is not allowed: all child resources must be specified when unlinking an ‘atleast’ dependency. The order of the child resources does not matter.

# hares -dep -atleast
#Group       Parent     Child
ora_grp      cssd       ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1


# hares -unlink cssd ocr1
VCS WARNING V-16-1-10279 Could not unlink cssd and ocr1. cssd and ocr1 are part of an 'atleast' dependency. Individual/partial unlinking is not allowed in 'atleast' dependency.

# hares -unlink cssd ocr1,ocr2,ocr4,ocr3,ocr5
# echo $?
0

# hares -dep -atleast
VCS WARNING V-16-1-50034 No Resource dependencies are configured
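The unlink rule amounts to a set comparison: the child resources passed to -unlink must match the configured set exactly, in any order. A minimal sketch, using a hypothetical dictionary in place of the cluster configuration; unlink() here is an illustrative function, not the hares implementation.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical model: (parent, set of children) -> minimum criterion m.
AtleastDeps = Dict[Tuple[str, FrozenSet[str]], int]


def unlink(deps: AtleastDeps, parent: str, children: Iterable[str]) -> bool:
    """Remove an 'atleast' dependency only if every one of its children is named.

    Returns False for a partial request (or when nothing matches) and leaves
    `deps` unchanged; the order of `children` is ignored.
    """
    requested = frozenset(children)
    for dep_parent, dep_children in list(deps):
        if dep_parent != parent or not (requested & dep_children):
            continue  # not the dependency being referred to
        if requested != dep_children:
            return False  # partial unlinking is not allowed
        del deps[(dep_parent, dep_children)]
        return True
    return False


deps: AtleastDeps = {("cssd", frozenset({"ocr1", "ocr2", "ocr3", "ocr4", "ocr5"})): 1}
print(unlink(deps, "cssd", ["ocr1"]))                                  # False
print(unlink(deps, "cssd", ["ocr1", "ocr2", "ocr4", "ocr3", "ocr5"]))  # True
print(deps)                                                            # {}
```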


Service group state computation with ‘atleast’ resource dependency

An OFFLINE or FAULTED state of ‘atleast’ child resources is tolerated as long as the minimum criterion of the ‘atleast’ dependency is met. The service group is still reported ONLINE.
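A simplified model of this rule is sketched below: OFFLINE or FAULTED children in the ‘atleast’ set do not count against the group as long as at least m of them are still ONLINE. The function is hypothetical; it looks only at one parent and its ‘atleast’ set and ignores intermediate group states such as PARTIAL or STARTING that VCS also reports.

```python
from typing import Mapping


def reported_online(parent_state: str, child_states: Mapping[str, str], m: int) -> bool:
    """Simplified: is the group ONLINE as far as one parent and its 'atleast' set go?

    OFFLINE or FAULTED children are tolerated while at least m stay ONLINE.
    """
    online_children = sum(1 for s in child_states.values() if s == "ONLINE")
    return parent_state == "ONLINE" and online_children >= m


# ocr1 taken offline, ocr2..ocr4 faulted, ocr5 still ONLINE (m = 1).
states = {"ocr1": "OFFLINE", "ocr2": "FAULTED", "ocr3": "FAULTED",
          "ocr4": "FAULTED", "ocr5": "ONLINE"}
print(reported_online("ONLINE", states, 1))  # True
print(reported_online("ONLINE", states, 2))  # False
```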


            Online operation of resources linked with ‘atleast’ dependency

The online operation of resources linked with an ‘atleast’ dependency can be described with the following pseudocode, evaluated as each child resource comes online:

If (child resource is part of 'atleast' dependency)
Then
	If (Number of child resources online from set < m)
	Then
		Do not initiate online of parent resource.
	Else If (Number of child resources online from set == m)
	Then
		At least m child resources from set of n child resources are ONLINE.
		Initiate online of parent resource.
	Else If (Number of child resources online from set > m)
	Then
		Online of parent resource has already been initiated.
		Do nothing.
	End If
Else
	Existing dependency.
	Initiate online of parent resource if all other child resources have already completed online.
End If
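The pseudocode above translates fairly directly into a decision function that is called when a child resource finishes coming online. This is a hedged sketch: the function name and parameters are hypothetical, and, like the pseudocode, it treats the ‘atleast’ set and the regular (existing) child resources as separate cases.

```python
from typing import AbstractSet


def should_initiate_parent_online(child: str,
                                  atleast_set: AbstractSet[str],
                                  m: int,
                                  online_children: AbstractSet[str],
                                  regular_children: AbstractSet[str]) -> bool:
    """Decide, when `child` has just come online, whether to start the parent.

    `online_children` must already include `child`.
    """
    if child in atleast_set:
        count = len(online_children & atleast_set)
        if count < m:
            return False  # minimum criterion not yet met
        if count == m:
            return True   # criterion just met: initiate online of the parent
        return False      # count > m: online of the parent already initiated
    # Existing dependency: initiate once every regular child is online.
    return regular_children <= online_children


# Replay of the ora_grp walkthrough: ocr2 comes online first, then ocr3 and ocr5.
ocrs = {"ocr1", "ocr2", "ocr3", "ocr4", "ocr5"}
online = set()
for child in ["ocr2", "ocr3", "ocr5"]:
    online.add(child)
    print(child, should_initiate_parent_online(child, ocrs, 1, online, set()))
```

Running the replay prints `ocr2 True`, `ocr3 False` and `ocr5 False`: the parent is started exactly once, as soon as the first child satisfies the minimum of 1.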

Example: bringing online the ora_grp service group, whose resources are linked with an ‘atleast’ dependency.

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |OFFLINE|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Group       Parent     Child
ora_grp      cssd       ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1

# hagrp -online ora_grp -any
VCS NOTICE V-16-1-50735 Attempting to online group on system vcslx545-vm1

All child resources are started concurrently.

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |OFFLINE|STARTING|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Resource    Attribute               System                 Value
cssd         IState                  localclus:vcslx545-vm1 waiting for children online
cssd         IState                  localclus:vcslx545-vm2 not waiting
cssd         State                   localclus:vcslx545-vm1 OFFLINE
cssd         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr1         IState                  localclus:vcslx545-vm1 waiting to go online
ocr1         IState                  localclus:vcslx545-vm2 not waiting
ocr1         State                   localclus:vcslx545-vm1 OFFLINE
ocr1         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr2         IState                  localclus:vcslx545-vm1 waiting to go online
ocr2         IState                  localclus:vcslx545-vm2 not waiting
ocr2         State                   localclus:vcslx545-vm1 OFFLINE
ocr2         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr3         IState                  localclus:vcslx545-vm1 waiting to go online
ocr3         IState                  localclus:vcslx545-vm2 not waiting
ocr3         State                   localclus:vcslx545-vm1 OFFLINE
ocr3         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr4         IState                  localclus:vcslx545-vm1 waiting to go online
ocr4         IState                  localclus:vcslx545-vm2 not waiting
ocr4         State                   localclus:vcslx545-vm1 OFFLINE
ocr4         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr5         IState                  localclus:vcslx545-vm1 waiting to go online
ocr5         IState                  localclus:vcslx545-vm2 not waiting
ocr5         State                   localclus:vcslx545-vm1 OFFLINE
ocr5         State                   localclus:vcslx545-vm2 OFFLINE

As soon as the minimum criterion is met, the parent resource is started, even if some child resources are still coming online.

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |PARTIAL|STARTING|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Resource    Attribute               System                 Value
cssd         IState                  localclus:vcslx545-vm1 waiting to go online
cssd         IState                  localclus:vcslx545-vm2 not waiting
cssd         State                   localclus:vcslx545-vm1 OFFLINE
cssd         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr1         IState                  localclus:vcslx545-vm1 waiting to go online
ocr1         IState                  localclus:vcslx545-vm2 not waiting
ocr1         State                   localclus:vcslx545-vm1 OFFLINE
ocr1         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr2         IState                  localclus:vcslx545-vm1 not waiting
ocr2         IState                  localclus:vcslx545-vm2 not waiting
ocr2         State                   localclus:vcslx545-vm1 ONLINE
ocr2         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr3         IState                  localclus:vcslx545-vm1 waiting to go online
ocr3         IState                  localclus:vcslx545-vm2 not waiting
ocr3         State                   localclus:vcslx545-vm1 OFFLINE
ocr3         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr4         IState                  localclus:vcslx545-vm1 waiting to go online
ocr4         IState                  localclus:vcslx545-vm2 not waiting
ocr4         State                   localclus:vcslx545-vm1 OFFLINE
ocr4         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr5         IState                  localclus:vcslx545-vm1 waiting to go online
ocr5         IState                  localclus:vcslx545-vm2 not waiting
ocr5         State                   localclus:vcslx545-vm1 OFFLINE
ocr5         State                   localclus:vcslx545-vm2 OFFLINE

When the parent resource finishes coming online with the minimum criterion met, the service group is reported ONLINE, even though some child resources are still coming online.

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |ONLINE|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Resource    Attribute               System                 Value
cssd         IState                  localclus:vcslx545-vm1 not waiting
cssd         IState                  localclus:vcslx545-vm2 not waiting
cssd         State                   localclus:vcslx545-vm1 ONLINE
cssd         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr1         IState                  localclus:vcslx545-vm1 waiting to go online
ocr1         IState                  localclus:vcslx545-vm2 not waiting
ocr1         State                   localclus:vcslx545-vm1 OFFLINE
ocr1         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr2         IState                  localclus:vcslx545-vm1 not waiting
ocr2         IState                  localclus:vcslx545-vm2 not waiting
ocr2         State                   localclus:vcslx545-vm1 ONLINE
ocr2         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr3         IState                  localclus:vcslx545-vm1 not waiting
ocr3         IState                  localclus:vcslx545-vm2 not waiting
ocr3         State                   localclus:vcslx545-vm1 ONLINE
ocr3         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr4         IState                  localclus:vcslx545-vm1 waiting to go online
ocr4         IState                  localclus:vcslx545-vm2 not waiting
ocr4         State                   localclus:vcslx545-vm1 OFFLINE
ocr4         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr5         IState                  localclus:vcslx545-vm1 not waiting
ocr5         IState                  localclus:vcslx545-vm2 not waiting
ocr5         State                   localclus:vcslx545-vm1 ONLINE
ocr5         State                   localclus:vcslx545-vm2 OFFLINE


            Offline operation or fault of resources linked with ‘atleast’ dependency

The offline operation or fault of resources linked with an ‘atleast’ dependency can be described with the following pseudocode.

If (parent resource is ONLINE)
Then
	If (dependency type == 'atleast')
	Then
		If (at least 'm' child resources from the set of 'n' child resources are still ONLINE)
		Then
			Parent resource can continue to be ONLINE.
			No action required.
		Else
			Parent resource cannot continue to be ONLINE.
			Take action according to *rigidity of dependency.
		End If
	Else
		Existing dependency.
		Take action according to *rigidity of dependency.
	End If
End If
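The offline/fault pseudocode can be sketched the same way. In this hypothetical translation, "rigidity" is only a placeholder for "take action according to the rigidity of the dependency"; the sketch does not model what that action is.

```python
from typing import AbstractSet


def on_child_down(parent_online: bool, is_atleast: bool,
                  atleast_set: AbstractSet[str], m: int,
                  still_online: AbstractSet[str]) -> str:
    """Translate the offline/fault pseudocode for one child going down.

    `still_online` holds the children that remain ONLINE after the event.
    Returns "no action" or "rigidity" (act according to the rigidity of
    the dependency, which this sketch does not model).
    """
    if not parent_online:
        return "no action"
    if is_atleast and len(still_online & atleast_set) >= m:
        return "no action"  # criterion still met: parent can stay ONLINE
    return "rigidity"       # parent cannot stay ONLINE


ocrs = {"ocr1", "ocr2", "ocr3", "ocr4", "ocr5"}
online = set(ocrs)
for child in ["ocr1", "ocr2", "ocr3", "ocr4", "ocr5"]:
    online.discard(child)
    print(child, on_child_down(True, True, ocrs, 1, online))
```

With a minimum of 1, losing ocr1 through ocr4 one at a time prints `no action` each time; only losing ocr5, the last ONLINE child, prints `rigidity`.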

Example: taking child resources offline while the parent resource is still online.

#Group       Parent     Child
ora_grp      cssd       ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |ONLINE|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Resource    Attribute               System                 Value
cssd         Critical                localclus              1
cssd         State                   localclus:vcslx545-vm1 ONLINE
cssd         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr1         Critical                localclus              1
ocr1         State                   localclus:vcslx545-vm1 ONLINE
ocr1         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr2         Critical                localclus              1
ocr2         State                   localclus:vcslx545-vm1 ONLINE
ocr2         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr3         Critical                localclus              1
ocr3         State                   localclus:vcslx545-vm1 ONLINE
ocr3         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr4         Critical                localclus              1
ocr4         State                   localclus:vcslx545-vm1 ONLINE
ocr4         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr5         Critical                localclus              1
ocr5         State                   localclus:vcslx545-vm1 ONLINE
ocr5         State                   localclus:vcslx545-vm2 OFFLINE

# hares -offline ocr1 -sys vcslx545-vm1

#Group       Parent     Child
ora_grp      cssd       ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |ONLINE|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Resource    Attribute               System                 Value
cssd         Critical                localclus              1
cssd         State                   localclus:vcslx545-vm1 ONLINE
cssd         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr1         Critical                localclus              1
ocr1         State                   localclus:vcslx545-vm1 OFFLINE
ocr1         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr2         Critical                localclus              1
ocr2         State                   localclus:vcslx545-vm1 ONLINE
ocr2         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr3         Critical                localclus              1
ocr3         State                   localclus:vcslx545-vm1 ONLINE
ocr3         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr4         Critical                localclus              1
ocr4         State                   localclus:vcslx545-vm1 ONLINE
ocr4         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr5         Critical                localclus              1
ocr5         State                   localclus:vcslx545-vm1 ONLINE
ocr5         State                   localclus:vcslx545-vm2 OFFLINE

As the output above shows, the offline operation was allowed for the critical resource ocr1 while its parent was online, and the service group is still reported ONLINE. Similarly, faults of critical resources are tolerated as long as the minimum criterion is met. In the following output, 3 critical resources (ocr2, ocr3, and ocr4) have faulted, but the parent remains online and the service group is still reported ONLINE.

#Group       Parent     Child
ora_grp      cssd       ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |ONLINE|
ora_grp      State                 vcslx545-vm2 |OFFLINE|

#Resource    Attribute               System                 Value
cssd         Critical                localclus              1
cssd         State                   localclus:vcslx545-vm1 ONLINE
cssd         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr1         Critical                localclus              1
ocr1         State                   localclus:vcslx545-vm1 OFFLINE
ocr1         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr2         Critical                localclus              1
ocr2         State                   localclus:vcslx545-vm1 FAULTED
ocr2         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr3         Critical                localclus              1
ocr3         State                   localclus:vcslx545-vm1 FAULTED
ocr3         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr4         Critical                localclus              1
ocr4         State                   localclus:vcslx545-vm1 FAULTED
ocr4         State                   localclus:vcslx545-vm2 OFFLINE
#
ocr5         Critical                localclus              1
ocr5         State                   localclus:vcslx545-vm1 ONLINE
ocr5         State                   localclus:vcslx545-vm2 OFFLINE

If taking a child resource offline would violate the minimum criterion, VCS rejects the command.

# hares -offline ocr5 -sys vcslx545-vm1
VCS WARNING V-16-1-10287 Online resources depend on resource ocr5. Take them offline first

# echo $?
1
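The rejection can be modelled as a pre-check before an administrative offline of an ‘atleast’ child: the request is refused when the parent is ONLINE and removing the child would leave fewer than m ONLINE children in the set. The hypothetical function below returns 0 or 1 in the spirit of the `echo $?` output above. That an offline is accepted when the parent is not online is an assumption of this sketch, and it only handles children that belong to the ‘atleast’ set.

```python
from typing import AbstractSet


def offline_atleast_child(child: str, parent_online: bool,
                          atleast_set: AbstractSet[str], m: int,
                          online_children: AbstractSet[str]) -> int:
    """Return 0 if taking `child` offline is accepted, 1 if it is rejected.

    Only models a child that belongs to an 'atleast' set.
    """
    if child not in atleast_set:
        raise ValueError(f"{child} is not in this 'atleast' set")
    if not parent_online:
        return 0  # assumption: nothing online depends on the child
    remaining = (set(online_children) - {child}) & set(atleast_set)
    return 0 if len(remaining) >= m else 1


ocrs = {"ocr1", "ocr2", "ocr3", "ocr4", "ocr5"}
print(offline_atleast_child("ocr1", True, ocrs, 1, ocrs))      # 0
print(offline_atleast_child("ocr5", True, ocrs, 1, {"ocr5"}))  # 1
```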

If the fault of a child resource violates the minimum criterion, VCS takes corrective action and fails over the service group. When ocr5, the last online child resource, faults, ora_grp fails over to the peer system.

#Group       Parent     Child
ora_grp      cssd       ocr1, ocr2, ocr3, ocr4, ocr5. Min = 1

#Group       Attribute             System       Value
ora_grp      State                 vcslx545-vm1 |OFFLINE|FAULTED|
ora_grp      State                 vcslx545-vm2 |ONLINE|

#Resource    Attribute               System                 Value
cssd         Critical                localclus              1
cssd         State                   localclus:vcslx545-vm1 OFFLINE
cssd         State                   localclus:vcslx545-vm2 ONLINE
#
ocr1         Critical                localclus              1
ocr1         State                   localclus:vcslx545-vm1 OFFLINE
ocr1         State                   localclus:vcslx545-vm2 ONLINE
#
ocr2         Critical                localclus              1
ocr2         State                   localclus:vcslx545-vm1 FAULTED
ocr2         State                   localclus:vcslx545-vm2 ONLINE
#
ocr3         Critical                localclus              1
ocr3         State                   localclus:vcslx545-vm1 FAULTED
ocr3         State                   localclus:vcslx545-vm2 ONLINE
#
ocr4         Critical                localclus              1
ocr4         State                   localclus:vcslx545-vm1 FAULTED
ocr4         State                   localclus:vcslx545-vm2 ONLINE
#
ocr5         Critical                localclus              1
ocr5         State                   localclus:vcslx545-vm1 FAULTED
ocr5         State                   localclus:vcslx545-vm2 ONLINE