08-15-2011 02:09 PM
I am trying to automate scripts that we run across multiple clusters during a Disaster Recovery scenario. We have an HTC (Hitachi True Copy) resource on the database cluster that makes the local storage a P-VOL before importing the database disk group. This includes database storage on the local cluster as well as application storage located on another CFS cluster. We currently run one script to failover the database to the DR site. Once the storage is failed over, we run another script on the CFS cluster to unfreeze (persistent) the service groups we need to run, 'vxdg -Cs import' the shared disk group, and run fsck on the shared file systems we are importing, then we start the shared mount points that have CVMVolDg / CFSMount resources.
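In outline, the CFS-side steps above look something like the following sketch. Group, diskgroup, and volume names (appsg, appdg, appvol) are placeholders I've made up, and every command goes through $RUN so that by default it only prints what it would do:

```shell
# Dry-run sketch of the second (CFS cluster) recovery script described above.
# appsg/appdg/appvol are placeholder names; set RUN= (empty) to execute for real.
cfs_recover() {
    RUN=${RUN:-echo}
    $RUN hagrp -unfreeze appsg -persistent            # undo the persistent freeze
    $RUN vxdg -Cs import appdg                        # clear-import the shared disk group
    $RUN fsck -F vxfs -y /dev/vx/rdsk/appdg/appvol    # check the CFS file system
    $RUN hagrp -online appsg -any                     # start the CVMVolDg/CFSMount groups
}
```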
I am looking for a way to tie these two scripts together, but we have security restrictions in our environment such that there is no root-to-root communication between the clusters. TCP port 14141 is enabled between the clusters.
Does anyone have any suggestions for kicking off the script on the second CFS cluster? I have been able to kick off a trigger from the database cluster to the application cluster, and was thinking about invoking a preonline trigger to import the shared disk group and run fsck. I was also thinking of invoking a postoffline trigger to deport the shared disk group, but the postoffline trigger only accepts two arguments, <system> and <group>. One issue with triggers is I need to make sure they only run on one node. I would do this using:
export VCS_HOST=cluster-vip
halogin admin
hatrigger -preonline 0 <CVM master> <group> IMPORT
The preonline script would then check the 4th argument and see that it is IMPORT and run the 'vxdg -Cs import' and 'fsck' commands.
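A minimal sketch of that argument check inside the preonline trigger might look like this (diskgroup/volume names are placeholders, and $RUN defaults to echo so nothing destructive runs by accident):

```shell
# Hypothetical preonline trigger fragment: act only when the extra 4th argument
# is IMPORT, as passed via "hatrigger -preonline 0 <system> <group> IMPORT".
# appdg/appvol are placeholder names; set RUN= (empty) to execute for real.
preonline() {
    system=$1 group=$2 whyonlining=$3 action=$4
    RUN=${RUN:-echo}
    if [ "$action" = "IMPORT" ]; then
        $RUN vxdg -Cs import appdg                        # clear-import the shared DG
        $RUN fsck -F vxfs -y /dev/vx/rdsk/appdg/appvol    # check before CFSMount onlines
    fi
}
```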
I guess another way to do this would be to set UserIntGlobal to 1 for the service group, which the preonline and postoffline triggers could check before importing/deporting the shared disk group, but then I end up trying to make sure that only one system (the CVM master) runs the import command, and that only one system runs the deport command after all other CFS service groups are offline. In the case of these clusters, the CFS mount points will not necessarily all be mounted on the CVM master, and the CVM master won't always be the last node to offline the shared mount point.
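One way to gate the import to a single node is to test for the CVM master inside the trigger itself. A sketch, under the assumption that 'vxdctl -c mode' prints a line containing MASTER only on the master node:

```shell
# Return success only on the CVM master node.
# Assumption: "vxdctl -c mode" reports something like
# "mode: enabled: cluster active - MASTER" on the master.
is_cvm_master() {
    vxdctl -c mode 2>/dev/null | grep -q 'MASTER'
}
```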
Does anyone have any suggestions?
08-15-2011 03:23 PM
This should all be handled by the HTC agent - you need to perform the following steps from the HTC agent guide:
haconf -makerw
hatype -modify CVMVolDg SupportedActions import deport vxdctlenable
haconf -dump -makero
08-16-2011 02:03 PM
The database cluster is a GCO cluster that uses the HTC agent, but the application cluster does not run the HTC agent. The application disks are replicated using the HTC resource on the database cluster. I saw the mention of vxdctlenable in the HTC agent guide, but I could not find any additional information on what it actually does, so we didn't enable it. Do you have any pointers to additional documentation about adding the vxdctlenable action to the SupportedActions attribute? Is there any risk to adding vxdctlenable to the SupportedActions attribute at the primary site?
08-16-2011 08:51 PM
Sean,
If I am understanding your post correctly, are you managing the HORCM/TC pairs for more than one node from a single host? That is, does your horcm##.conf on your database cluster contain entries for LDEVs from both the DB and the application/CFS cluster? (The DB service groups are global and the application service groups are local.)
Ideally, each service group would be configured as global and they would switch over independently of one another with separate HORCM configurations. However, if that is not possible, the coordination between the two clusters would be best served by the RemoteGroup agent. In theory you should be able to have the application service group contain a RemoteGroup resource that verifies that your database and HTC pair status are available before coming online. You would then have to create the same configuration (resource dependency) at your remote site, as there would be no concurrency-violation protection between the application service groups at either site (due to the fact that they are local and not global).
The all-global route, however, is much the better option.
Joe D
08-17-2011 02:41 AM
Sean,
If Joe's understanding is correct, then you would be better off having DB and App use different HORCM pairs, as Joe says, and if that is not possible then you may be able to use the RemoteGroup agent, also as Joe says. However, you still need to address import and deport. The way this normally works is that when you online an HTC resource in a CVM cluster, the HTC agent calls the import action to import the diskgroup.
This should work if you put the HTC agent in the app cluster. If you do this, the HTC agent should be able to tell that horcm has already done the takeover and that the CVM diskgroup is still imported - see the extract below from the HTC agent's online script:
if ($ret == 0) {
    VCSAG_LOG_MSG("N", "devices in group $groupname are all read/write enabled; no action is required", 18, "$groupname");
    $res->create_lockfile();
    $res->cvm_import();
    exit(0);
}
Mike
08-17-2011 12:03 PM
Joe & Mike, I like the ideas you are coming up with, especially the 'hares -action' idea. Here's a little more background. We are in a very secure environment, so no hardcoding of admin or operator passwords is allowed. We will probably never allow an automatic failover between data centers. Currently we have a three-tier architecture (web->app->db), but only the app and db tiers are clustered using VCS. In the future we will also cluster the web tier. We support multiple business applications which are composed of infrastructure components located in each of these tiers. For example, application A has components on the dbcluster, appcluster1 and appcluster2, while application B has components on the dbcluster, appcluster1 and appcluster3. Replication for all of application A's storage is controlled using a single HORCM consistency group. Replication needs to be handled as a single consistency group and not as multiple HORCM groups.
I looked into enabling the import, deport and vxdctlenable actions as part of the CVMVolDg agent. These actions are only defined if you have the HTC agent installed. Since we need to have a HA/DR license to use the HTC agent, it isn't cost effective for us to upgrade to SFCFS HA/DR on two dozen boxes. This would cost several hundred thousand dollars in software licensing to replace running a single script by hand. So, this isn't a viable solution for us.
Since DR failovers are controlled by hand, I envision the solution to be a VCS administrator logging into the database cluster to initiate a failover. This script would then offline service groups on appcluster1,2,3, deport shared disk groups and then failover the database to the remote data center using GCO. Next, the script would import disk groups and online service groups on appcluster1,2,3. The script would prompt for a VCS administrator password, so passwords won't be stored in a script.
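As a rough dry-run sketch of that driver script (cluster, group, and resource names like appcluster1, appsg, appdg_res, dbsg, and dr_cluster are all placeholders; $RUN defaults to echo so the script prints its plan rather than executing it):

```shell
# Hypothetical hand-run DR failover driver, outlined from the steps above.
# All names are placeholders; set RUN= (empty) to execute for real.
dr_failover() {
    RUN=${RUN:-echo}
    # 1. Offline app groups and deport shared DGs on each application cluster
    for cl in appcluster1 appcluster2 appcluster3; do
        VCS_HOST=$cl-vip; export VCS_HOST
        $RUN halogin admin                               # password prompted, never stored
        $RUN hagrp -offline appsg -any
        $RUN hares -action appdg_res deport -sys "$cl"-node1
        $RUN halogin -endsession
    done
    # 2. Fail the database over to the DR site via GCO
    VCS_HOST=dbcluster-vip; export VCS_HOST
    $RUN hagrp -online dbsg -any -clus dr_cluster
    # 3. ...then re-import disk groups and online groups on appcluster1-3
}
```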
I am going to do some research with the 'hares -action' command to see if I can make it do what we need it to do.
08-17-2011 03:12 PM
With regards to security, the passwords are encrypted in the .vcspwd file, and this file would be owned by root; you could change it as often as you want (if you have to change passwords every month), so security-wise this really is no different from encrypted passwords in main.cf. If you are really serious about security then you should use Secure Cluster, if you are not already. If you have not set up a halogin session, then you will be prompted for a password when you use ha commands in your script - if you are going to do this then I would use a halogin command, run your ha command, and then run halogin -endsession.
I wouldn't have thought you would need a DR license to use the action scripts, as these are actions on the CVMVolDg agent even though the scripts get installed with the HTC agent, so I would just copy the action scripts from the DB cluster, so that the app cluster contains just the CVMVolDg actions and not the HTC agent. It would also be useful to copy the cvm_import and cvm_deport functions, but I don't know where these are located.
Note if you use hares -action, then using actionargs can be useful too.
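For example (resource, system, and cluster names here are made up, and I'm assuming the action scripts have been copied over as described):

```shell
# Hypothetical remote import via an agent action over TCP 14141.
# appcluster1-vip, appdg_voldg, appdg, and app-node1 are placeholders.
remote_import() {
    VCS_HOST=appcluster1-vip; export VCS_HOST
    halogin admin                                                  # prompts for password
    hares -action appdg_voldg import -actionargs appdg -sys app-node1
    halogin -endsession
}
```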
Mike
08-17-2011 09:35 PM
Secure Cluster doesn't scale for us. Our largest cluster is a 13-node CFS cluster with 40 users and 160 service groups (and growing). Because Secure Cluster defines access using (node,user,group), that gives us 83,200 possible security identifiers to have to manage. In reality, we would probably need to define about 2000 security identifiers for this cluster, but even that number is unmanageable.
We currently use halogin for accessing VCS from non-root accounts.
I did some testing using 'hares -action' and it looks very promising. It's not interactive, but it solves the basic problem of running a script as root on a remote cluster. Now I have some scripting to do. Thanks Mike for the idea of using agent action scripts.
08-18-2011 02:23 PM
Sean,
Ultimately you may want to consider Virtual Business Services with VERITAS Operations Manager as a means to control all of these functions. VOM will create a centralized authentication point as well as orchestrate all the tiered actions your are considering. VOM 4.0 should support all of what you are looking to achieve.
Joe D