Service group

68 Topics

Need way to create VCS group attributes
We have long been able to create new VCS resource attributes using the hatype and haattr commands. We would like to be able to create new group attributes as well. In our particular use case, we would like to create a temporary group attribute called SleepMonitor that someone with operator privileges can update (similar to the TFrozen group attribute). We would set SleepMonitor to an integer time() value that specifies when our custom monitoring script will stop ignoring the state of a service group and resume alerting the GroupOwner when something is wrong. We want this attribute to be temporary so that it doesn't clutter up main.cf or require main.cf to be updated when a user wants to sleep monitoring. The group attribute UserIntGlobal could work if it weren't a permanent attribute that requires opening the VCS configuration with administrator privileges to modify it. From time to time we come up with other ideas that would require new group attributes. Having the ability to create new group attributes would be phenomenal. -Seann
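The decision the custom monitoring script would make with such an attribute can be sketched as follows. This is only an illustration of the idea described in the post: the function name should_alert and its parameters are hypothetical, and how the script would read SleepMonitor from VCS is deliberately left out, since that is exactly the capability being requested.

```python
import time
from typing import Optional


def should_alert(sleep_monitor: int, group_healthy: bool,
                 now: Optional[float] = None) -> bool:
    """Return True when the monitoring script should alert the GroupOwner.

    sleep_monitor is the epoch time (as returned by time()) until which
    monitoring is "asleep"; 0 means monitoring was never put to sleep.
    """
    if now is None:
        now = time.time()
    if group_healthy:
        # Nothing is wrong, so there is nothing to alert about.
        return False
    # While the current time is before SleepMonitor, problems are ignored.
    return now >= sleep_monitor
```

With this shape, an operator who wants an hour of silence would store int(time.time()) + 3600 in the attribute, and the value simply expires on its own without anyone having to reset it.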
Solved
S_Herdejurgen
9 years ago Place Cluster Server
1.4K Views
1 like
1 Comment
Trigger after failed cleanup script
Hi there, I have a system where the cleanup script can fail or time out, and I want to execute another script if this happens. I was wondering what the best way of doing this would be. In the Veritas Cluster Server Administrator's Guide for Linux I found the RESNOTOFF trigger. From the documentation it is my understanding that this trigger fires in the following cases: (1) a resource fails to go offline (offline initiated by VCS) and the clean fails; (2) a resource goes offline unexpectedly and the clean fails. I have tested this, and RESNOTOFF works in the first scenario but not in the second. To test the second scenario I kill the service, and I can see the following message in engine_A.log:
VCS ERROR V-16-2-13067 (node1) Agent is calling clean for resource(service1) because the resource became OFFLINE unexpectedly, on its own.
When the cleanup fails I would expect the resource to become UNABLE TO OFFLINE. However, the status of the resource is still ONLINE:
# hares -state service1
#Resource Attribute System Value
service1 State node1 ONLINE
service1 State node2 OFFLINE
So the resource is ONLINE and VCS keeps running the cleanup command (which keeps failing) indefinitely. I was wondering if I need to configure something else to make RESNOTOFF work in this particular scenario. Thanks,
Solved
javierrv
9 years ago Place Cluster Server
936 Views
0 likes
3 Comments
RESNOTOFF not triggered.
Hi, I am following the Veritas Cluster Server Administrator's Guide for Linux and trying to trigger the resnotoff script. From the documentation it is my understanding that if a resource faults and the clean command returns 1, resnotoff should be triggered. To begin, my service group is in an ONLINE state:
[root@node1 ~]# hastatus -sum | grep test
B Grp_CS_c1_testservice node1 Y N ONLINE
B Grp_CS_c1_testservice node2 Y N ONLINE
I have the clean retry limit set to 1 and the clean script set to /bin/false to force it to return an error exit code.
Res_App_c1_fmmed1_testapplication ArgListValues node1 User 1 root StartProgram 1 "/usr/share/litp/vcs_lsb_start vmservice 5" StopProgram 1 "/usr/share/litp/vcs_lsb_stop vmservice 5" CleanProgram 1 /bin/false MonitorProgram 1 "/usr/share/litp/vcs_lsb_status vmservice" PidFiles 0 MonitorProcesses 0 EnvFile 1 "" UseSUDash 1 0 State 1 2 IState 1 0
Res_App_c1_fmmed1_testapplication ArgListValues node2 User 1 root StartProgram 1 "/usr/share/litp/vcs_lsb_start vmservice 5" StopProgram 1 "/usr/share/litp/vcs_lsb_stop vmservice 5" CleanProgram 1 /bin/false MonitorProgram 1 "/usr/share/litp/vcs_lsb_status vmservice" PidFiles 0 MonitorProcesses 0 EnvFile 1 "" UseSUDash 1 0 State 1 2 IState 1 0
Res_App_c1_fmmed1_testapplication CleanProgram global /bin/false
Res_App_c1_fmmed1_testapplication CleanRetryLimit global 1
The resnotoff trigger is enabled for this resource:
Res_App_c1_fmmed1_testapplication TriggersEnabled global RESNOTOFF
Now I manually kill the service Grp_CS_c1_testservice on node 1 and see the following in /var/log/messages:
Jun 16 17:02:33 node1 AgentFramework[10323]: VCS ERROR V-16-2-13067 Thread(4147325808) Agent is calling clean for resource(Res_App_c1_fmmed1_testapplication) because the resource became OFFLINE unexpectedly, on its own.
Jun 16 17:02:33 node1 Had[9975]: VCS ERROR V-16-2-13067 (node1) Agent is calling clean for resource(Res_App_c1_fmmed1_testapplication) because the resource became OFFLINE unexpectedly, on its own.
Jun 16 17:02:34 node1 AgentFramework[10323]: VCS ERROR V-16-2-13069 Thread(4147325808) Resource(Res_App_c1_fmmed1_testapplication) - clean failed.
and in engine_A.log:
2015/06/16 17:02:33 VCS ERROR V-16-2-13067 (node1) Agent is calling clean for resource(Res_App_c1_fmmed1_testapplication) because the resource became OFFLINE unexpectedly, on its own.
2015/06/16 17:02:34 VCS INFO V-16-10031-504 (node1) Application:Res_App_c1_fmmed1_testapplication:clean:Executed /bin/false as user root
2015/06/16 17:02:35 VCS ERROR V-16-2-13069 (node1) Resource(Res_App_c1_fmmed1_testapplication) - clean failed.
2015/06/16 17:03:35 VCS ERROR V-16-1-50148 ADMIN_WAIT flag set for resource Res_App_c1_fmmed1_testapplication on system node1 with the reason 4
2015/06/16 17:03:35 VCS INFO V-16-10031-504 (node1) Application:Res_App_c1_fmmed1_testapplication:clean:Executed /bin/false as user root
From my understanding of the section of the VCS administrator's guide titled 'VCS behavior when an online resource faults', resnotoff should be triggered. However, it is not, and the resource goes to an ADMIN WAIT state:
group resource system message
--------------- -------------------- --------------- --------------------
Res_App_c1_fmmed1_testapplication node1 |ADMIN WAIT|
Is it possible to get resnotoff triggered for a cluster in this state, or do I need to use the resadminwait trigger (contrary to the documentation)? Thanks,
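The sequence in engine_A.log above (clean called, clean failed, ADMIN_WAIT set) can be pulled out of a log mechanically when repeating the test. The following is a minimal sketch in Python, not a VCS tool: the three message IDs and the sample lines are copied from the post, while the short labels and the function name summarize are shorthand invented for this illustration.

```python
import re

# Message IDs taken from the log excerpt in the post; the labels are
# informal shorthand, not official VCS descriptions.
MESSAGE_IDS = {
    "V-16-2-13067": "clean called after unexpected offline",
    "V-16-2-13069": "clean failed",
    "V-16-1-50148": "ADMIN_WAIT flag set",
}

# Matches the severity and message ID that follow the timestamp.
LINE_RE = re.compile(r"VCS (?:ERROR|WARNING|NOTICE|INFO) (V-\d+-\d+-\d+)")

SAMPLE_LOG = """\
2015/06/16 17:02:33 VCS ERROR V-16-2-13067 (node1) Agent is calling clean for resource(Res_App_c1_fmmed1_testapplication) because the resource became OFFLINE unexpectedly, on its own.
2015/06/16 17:02:34 VCS INFO V-16-10031-504 (node1) Application:Res_App_c1_fmmed1_testapplication:clean:Executed /bin/false as user root
2015/06/16 17:02:35 VCS ERROR V-16-2-13069 (node1) Resource(Res_App_c1_fmmed1_testapplication) - clean failed.
2015/06/16 17:03:35 VCS ERROR V-16-1-50148 ADMIN_WAIT flag set for resource Res_App_c1_fmmed1_testapplication on system node1 with the reason 4
"""


def summarize(log_text):
    """Return the labels of the known messages, in the order they appear."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(1) in MESSAGE_IDS:
            events.append(MESSAGE_IDS[match.group(1)])
    return events


if __name__ == "__main__":
    for event in summarize(SAMPLE_LOG):
        print(event)
```

Seeing "clean failed" followed by "ADMIN_WAIT flag set" in the summary is the pattern reported here, as opposed to the resnotoff invocation the poster expected.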
Solved
justinfay
9 years ago Place Cluster Server
1.2K Views
0 likes
3 Comments
Integrating SAP with VCS 6.2 (on Oracle Linux 6.5)
Hi, I was wondering if someone has some additional information regarding how to set up my cluster... I have both VCS (including Storage Foundation) and Linux knowledge. I do, however, have no background in SAP. And as SAP is a very complex product, I cannot see the forest for the trees...

Setup
2-node (active-passive) cluster of Oracle Linux 6.5 nodes.
Veritas Storage Foundation HA (= VxVM + DMP + VCS).
Oracle 11.2 as database.
SAP ECC 6.0

Apart from the Installation & Configuration guide for the SAP NetWeaver Agent, I found little information about implementing SAP in VCS. Source: "Symantec™ High Availability Agent for SAP NetWeaver Installation and Configuration Guide for Linux 6.2". Unfortunately I cannot find a how-to, guide or anything similar from Symantec, nor from the usual Google attempts. My customer, however, is not very knowledgeable about SAP either. From what I understand it is a very basic SAP setup, if not the simplest. They are using SAP ECC 6.0 and an Oracle 11.2 database, so I assume they just have a Central Instance and the Database. After some Google research, I found out that SAP ECC 6.0 is technically SAP NetWeaver 7.0. On Symantec SORT, I found 3 versions of the SAP NetWeaver agent:
SAP NetWeaver
SAP NetWeaver 7.1, 7.2, 7.3, 7.4
SAP NetWeaver 7.1, 7.3
I downloaded the first one, as the description says:
Agent: SAPNW04 5.0.16.0
Application version(s): SAP R/3 4.6, R/3 Enterprise 4.7, NW04, NW04s, ERP/ECC 5.0/6.0, SCM/APO 4.1/5.0/5.1/7.0, SRM 4.0/5.0/7.0, CRM 4.0/5.0/7.0
Source: https://sort.symantec.com/agents/detail/1077
SAP ERP 2005 = SAP NetWeaver 2004s (BASIS 7.00) = ECC 6.0
Source: http://itknowledgeexchange.techtarget.com/itanswers/difference-bet-ecc-60-sap-r3-47/
Source: http://www.fasttrackph.com/sap-ecc-6-0/
Source: http://wulibi.blogspot.be/2010/03/what-is-sap-ecc-60-in-brief.html

Currently I have this unfinished setup:
Installed & configured Storage Foundation HA on both nodes.
Installed the ACC Libraries on both nodes. See: https://sort.symantec.com/agents/detail/1183
Installed the SAP NetWeaver Agent on both nodes. See: https://sort.symantec.com/agents/detail/1077
Configured, next to the ClusterService group, 3 Service Groups:
  SG_sap: the shared storage resources (DiskGroup + Volumes + Mount) and the SAPNW Agent resource.
  SG_oracle: the shared storage resources (DiskGroup + Volumes + Mount) and the Oracle Agent resource.
  SG_nfs: still empty.

SAPNW Agent: SAP instance type
The SAPNW Agent documentation states: The agent supports the following SAP instance types: Central Services Instance, Application Server Instance, Enqueue Replication Server Instance. Source: "Symantec™ High Availability Agent for SAP NetWeaver Installation and Configuration Guide for Linux 6.2"
But I guess SAP ECC 6.0 has them all in one central instance, right? So I only need one SAPNW Agent resource.

How is SAP installed: only ABAP, only Java, or add-in (both ABAP and Java)? Source: "Symantec™ High Availability Agent for SAP NetWeaver Installation and Configuration Guide for Linux 6.2"
I have no idea. How can I find this out?

InstName Attribute
Another thing is the InstName attribute. This also does not correspond with the information I have. My SAP instance is T30. So the syntax is more or less correct, but it isn't listed below. This is also important for deciding on the value of the ProcMon attribute.
The SAPSID and InstName form a unique identifier that can identify the processes running for a particular instance. Some examples of SAP instances are given as follows:
InstName = InstType
DVEBMGS00 = SAP Application Server - ABAP (Primary)
D01 = SAP Application Server - ABAP (Additional)
ASCS02 = SAP Central Services - ABAP
J03 = SAP Application Server - Java
SCS04 = SAP Central Services - Java
ERS05 = SAP Enqueue Replication Server
SMDA97 = Solution Manager Diagnostics Agent
Source: "Symantec™ High Availability Agent for SAP NetWeaver Installation and Configuration Guide for Linux 6.2"
It is also stated in the listing of the required attributes. However, the default value is CENTRAL. I guess this is correct in my case?
InstName Attribute: An identifier that classifies and describes the SAP server instance type. Valid values are:
APPSERV: SAP Application Server
ENQUEUE: SAP Central Services
ENQREP: Enqueue Replication Server
SMDAGENT: Solution Manager Diagnostics Agent
SAPSTARTSRV: SAPSTARTSRV Process
Note: The value of this attribute is not case-sensitive.
Type and dimension: string-scalar
Default: APPSERV
Example: ENQUEUE

EnqSrvResName Attribute
A required attribute is the EnqSrvResName attribute. The documentation says this should be the resource name for the SAP Central Instance. But I am assuming I only have a SAP Central Instance. So I guess I should use the name of my SAP Agent resource from my SAP Service Group?
EnqSrvResName Attribute: The name of the VCS resource for SAP Central Services (A)SCS Instance. This attribute is used by Enqueue and Enqueue Replication Server. Using this attribute the Enqueue server queries the Enqueue Replication Server resource state while determining the fail over target and vice versa.
Type and dimension: string-scalar
Default: No default value
Example: SAP71-PI1SCS_sap
Source: "Symantec™ High Availability Agent for SAP NetWeaver Installation and Configuration Guide for Linux 6.2"

Is anyone able to help me out? Thanks in advance.
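The InstName examples quoted from the guide follow a visible pattern: a fixed prefix followed by a two-digit instance number. A small Python sketch can check a candidate name against that pattern. The table below contains only the seven examples quoted in the post; treating each as "prefix plus any two digits" is an assumption made for this illustration, and the function name classify is hypothetical.

```python
import re

# (pattern, description) pairs built from the examples quoted in the post.
# Generalising the two trailing digits to any instance number is an assumption.
INSTANCE_TYPES = [
    (r"DVEBMGS\d{2}", "SAP Application Server - ABAP (Primary)"),
    (r"D\d{2}", "SAP Application Server - ABAP (Additional)"),
    (r"ASCS\d{2}", "SAP Central Services - ABAP"),
    (r"J\d{2}", "SAP Application Server - Java"),
    (r"SCS\d{2}", "SAP Central Services - Java"),
    (r"ERS\d{2}", "SAP Enqueue Replication Server"),
    (r"SMDA\d{2}", "Solution Manager Diagnostics Agent"),
]


def classify(inst_name):
    """Return the description for an InstName, or None if no pattern matches."""
    for pattern, description in INSTANCE_TYPES:
        # fullmatch keeps "D01" from also matching longer names such as "DVEBMGS00".
        if re.fullmatch(pattern, inst_name):
            return description
    return None


if __name__ == "__main__":
    for name in ("DVEBMGS00", "ASCS02", "T30"):
        print(name, "->", classify(name))
```

Here classify("T30") returns None, which matches the poster's observation that T30 is not among the listed forms.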
Solved
sanderfiers
10 years ago Place Cluster Server
2.4K Views
2 likes
9 Comments
VCS - Resource to mount CIFS shares
Hi, I want to manage a CIFS mount within a Service Group and cannot find the appropriate resource. I just want to mount a CIFS share in my VCS cluster, acting as a client. Can anyone let me know which resource type I should use? Thanks & Regards, JL
Solved
Jose_Luis_B
10 years ago Place Cluster Server
2.6K Views
2 likes
8 Comments
VCS clustered Enterprise vault migrate index service to new node
Hi, We have a three-node cluster for Enterprise Vault in Veritas Cluster Server. The node details are as follows:
Node 1: EV server with all services
Node 2: SQL server
Node 3: Spare server for EV service failover and SQL service failover
Also, GCO and VVR are configured for the SQL server, and the EV server volumes are replicating using VVR but are not under VCS control. Now we want to remove the index service from the EV server (node 1), add it to a new server, and also add the new server to the cluster. Can anybody share the steps we should follow on the VCS side to perform the above changes?
r1_abhinav
10 years ago Place Cluster Server
1.1K Views
0 likes
5 Comments
Listener resource remain faulted
Hello, we are doing some failure tests for a customer. We have VCS 6.2 running on Solaris 10. We have an Oracle database and, of course, the listener associated with it. We are trying to simulate different kinds of failures. One of them is to kill the listener. In this situation the cluster notices that the listener has died and fails the service over to the other node. BUT the listener resource remains in the FAULTED state on the original node, and the group to which it belongs is in the OFFLINE FAULTED state. In this situation, if something goes wrong on the second node, the service will not fail back to the original one until we manually run hagrp -clear. Is there anything we can do to fix this (to have the clear done automatically)? Here are some lines from the log:
2015/03/30 17:26:10 VCS ERROR V-16-2-13067 (node2p) Agent is calling clean for resource(ora_listener-res) because the resource became OFFLINE unexpectedly, on its own.
2015/03/30 17:26:11 VCS INFO V-16-2-13068 (node2p) Resource(ora_listener-res) - clean completed successfully.
2015/03/30 17:26:11 VCS INFO V-16-1-10307 Resource ora_listener-res (Owner: Unspecified, Group: oracle_rg) is offline on node2p (Not initiated by VCS)
These lines say that clean for the resource completed successfully, but the resource is still faulted. If I run hares -clear manually, the fault goes away:
20150330-173628:root@node1p:~# hares -state ora_listener-res
#Resource Attribute System Value
ora_listener-res State node1p ONLINE
ora_listener-res State node2p FAULTED
20150330-173636:root@node1p:~# hares -clear ora_listener-res
20150330-173653:root@node1p:~# hares -state ora_listener-res
#Resource Attribute System Value
ora_listener-res State node1p ONLINE
ora_listener-res State node2p OFFLINE
20150330-173655:root@node1p:~#
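Until the clear happens automatically, a script can at least detect which systems still hold a fault by reading the hares -state output shown above. The sketch below only parses text in the four-column layout from the post; the function names are hypothetical, it assumes single-token state values as in that output, and it does not call any VCS command itself.

```python
# Sample copied from the hares -state output in the post.
SAMPLE_STATE = """\
#Resource Attribute System Value
ora_listener-res State node1p ONLINE
ora_listener-res State node2p FAULTED
"""


def parse_hares_state(output):
    """Turn `hares -state` text into {resource: {system: state}}."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the "#Resource ..." header and anything not in the 4-column layout.
        if line.startswith("#") or len(fields) != 4 or fields[1] != "State":
            continue
        resource, _attribute, system, value = fields
        states.setdefault(resource, {})[system] = value
    return states


def faulted_systems(states, resource):
    """Return the systems on which the resource is reported as FAULTED."""
    per_system = states.get(resource, {})
    return sorted(system for system, value in per_system.items() if value == "FAULTED")


if __name__ == "__main__":
    print(faulted_systems(parse_hares_state(SAMPLE_STATE), "ora_listener-res"))
```

For the sample above this reports node2p, the node where the poster then ran hares -clear by hand.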
Solved
Laszlo_Budai
10 years ago Place Cluster Server
3.4K Views
0 likes
5 Comments
Veritas Cluster Server, Resource Application failed to start
Hello, on 2 servers running Red Hat Linux 6.3, I've got VRTS 6.0. For one Application resource, when I take it offline and then bring it online on the same server (with all other resources online on this server), it starts well. But when I test a 'switch to' of the whole service group to the other node, it doesn't start properly (while all the other resources do start). The Application is linked with 3 Mount resources and 1 IP resource. We set the Critical attribute to false, and we set UseSUDash to true. The StartProgram script is supposed to start several processes. With an offline/online action all the processes are started; with a 'switch to' action, only half of them are started. There is nothing interesting in the logs on the Application side. Any suggestions for debugging would be appreciated.
cedric_tours
10 years ago Place Cluster Server
2.4K Views
0 likes
7 Comments
Unable to bring the Service Group online.
Hi All, I tried to bring an SG online on a node, but it is not coming online. Let me explain the issue. We rebooted the node aixprd001 and found that /etc/filesystems was corrupted, so the SG bosinit_SG was in a partial state because many of the cluster file systems were not mounted. We then corrected the entries and manually mounted all the file systems, but the SG still showed the status partial, so we ran the command below:
hagrp -clear bosinit_SG -all
Once that was done, the SG was in the online state. To be on the safe side we took the SG offline and tried to bring it online again, but the SG failed to come online. Below are the only messages we were able to find in the engine_A.log file:
2014/12/17 06:49:04 VCS NOTICE V-16-1-10166 Initiating manual online of group bosinit_SG on system aixprd001
2014/12/17 06:49:04 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group bosinit_SG on all nodes
Please help me with suggestions; I will provide log output if needed. Thanks, Rufus
Solved
prabindr
10 years ago Place Cluster Server
2K Views
0 likes
4 Comments
Volume is mounted on two split-brain nodes in VCS 6.0
I have built a three-node cluster using VCS 6.0 on SLES 11 SP1. Here is the configuration:

main.cf:
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster vcscluster (
    ClusterAddress = "192.168.4.10"
    SecureClus = 1
    UseFence = SCSI3
    )

system vcs1 (
    )

system vcs2 (
    )

system vcs3 (
    )

group ClusterService (
    SystemList = { vcs1 = 0, vcs2 = 1, vcs3 = 2 }
    AutoStartList = { vcs1, vcs2, vcs3 }
    OnlineRetryLimit = 3
    OnlineRetryInterval = 120
    )

    IP webip (
        Device = eth0
        Address = "192.168.4.10"
        NetMask = "255.255.255.0"
        )

    NIC csgnic (
        Device = eth0
        )

    webip requires csgnic

    // resource dependency tree
    //
    // group ClusterService
    // {
    // IP webip
    //     {
    //     NIC csgnic
    //     }
    // }

group apache (
    SystemList = { vcs1 = 0, vcs2 = 1, vcs3 = 2 }
    AutoStartList = { vcs1 }
    )

    DiskGroup share_dg (
        DiskGroup = share_dg
        )

    Mount apache_fs (
        MountPoint = "/srv/www/htdocs"
        BlockDevice = "/dev/vx/dsk/share_dg/apache"
        FSType = vxfs
        FsckOpt = "-y"
        )

    apache_fs requires share_dg

    // resource dependency tree
    //
    // group apache
    // {
    // Mount apache_fs
    //     {
    //     DiskGroup share_dg
    //     }
    // }

# lltstat -l
LLT link information:
link 0 eth1 on ether hipri
    mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
    txpkts 129728 txbytes 14155153
    rxpkts 119866 rxbytes 7909769
    latehb 0 badcksum 0 errors 0
link 1 eth2 on ether lowpri
    mtu 1500, sap 0xcafe, broadcast FF:FF:FF:FF:FF:FF, addrlen 6
    txpkts 49369 txbytes 2400217
    rxpkts 50476 rxbytes 2480391
    latehb 0 badcksum 0 errors 0

# vxfenconfig -l
I/O Fencing Configuration Information:
======================================
Single Disk Flag : 0
Count : 3
Disk List
Disk Name Major Minor Serial Number Policy
/dev/vx/rdmp/disk_2s3 201 67 7ae525da dmp
/dev/vx/rdmp/disk_1s3 201 51 27cddc71 dmp
/dev/vx/rdmp/disk_0s3 201 35 132a74e8 dmp

# vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:
    * 0 (vcs1)
    1 (vcs2)
    2 (vcs3)
RFSM State Information:
    node 0 in state 8 (running)
    node 1 in state 8 (running)
    node 2 in state 8 (running)

# hastatus -summary
-- SYSTEM STATE
-- System State Frozen
A vcs1 RUNNING 0
A vcs2 RUNNING 0
A vcs3 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService vcs1 Y N ONLINE
B ClusterService vcs2 Y N OFFLINE
B ClusterService vcs3 Y N OFFLINE
B apache vcs1 Y N OFFLINE
B apache vcs2 Y N OFFLINE
B apache vcs3 Y N ONLINE

# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 7218432 4356176 2495576 64% /
devtmpfs 995788 212 995576 1% /dev
tmpfs 995788 0 995788 0% /dev/shm
tmpfs 4 0 4 0% /dev/vx
/dev/vx/dsk/share_dg/apache 512000 3285 476928 1% /srv/www/htdocs

When I disconnect the network links eth1/eth2 on vcs3, apache is brought up on vcs1. But when I check vcs3, the mount point still exists there. After several minutes, a kernel panic occurs on vcs3. I think it is very dangerous that a volume is mounted on two split-brain nodes. How can I prevent this from happening?
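During a partition each side reports only its own view, so spotting a group that is online in two places means comparing hastatus -summary output collected from each side. The Python sketch below merges such outputs and lists groups reported ONLINE on more than one system. The first sample is the group-state section quoted in the post, treated here as the view vcs3 still holds; the second is a constructed view from vcs1 after the links were pulled, based on the statement that apache was brought up on vcs1. The function names are hypothetical, and the check is meant for failover groups only, since parallel groups are legitimately online on several systems.

```python
from collections import defaultdict

# Group-state rows quoted in the post (taken here as the view held by vcs3).
VIEW_FROM_VCS3 = """\
B ClusterService vcs1 Y N ONLINE
B ClusterService vcs2 Y N OFFLINE
B ClusterService vcs3 Y N OFFLINE
B apache vcs1 Y N OFFLINE
B apache vcs2 Y N OFFLINE
B apache vcs3 Y N ONLINE
"""

# Constructed for illustration: what vcs1 might report after taking over apache.
VIEW_FROM_VCS1 = """\
B ClusterService vcs1 Y N ONLINE
B ClusterService vcs2 Y N OFFLINE
B apache vcs1 Y N ONLINE
B apache vcs2 Y N OFFLINE
"""


def online_by_group(summaries):
    """Merge several summary texts into {group: set of systems reporting ONLINE}."""
    online = defaultdict(set)
    for text in summaries:
        for line in text.splitlines():
            fields = line.split()
            # Group rows: B <group> <system> <probed> <autodisabled> <state>
            if len(fields) == 6 and fields[0] == "B" and fields[5] == "ONLINE":
                online[fields[1]].add(fields[2])
    return online


def multiply_online(summaries):
    """Return {group: [systems]} for groups online on more than one system."""
    merged = online_by_group(summaries)
    return {group: sorted(systems) for group, systems in merged.items() if len(systems) > 1}


if __name__ == "__main__":
    print(multiply_online([VIEW_FROM_VCS3, VIEW_FROM_VCS1]))
```

With the two sample views, apache is flagged as online on both vcs1 and vcs3, which is the situation described above; ClusterService is not flagged because both views agree it runs only on vcs1.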
Jamesb_china
10 years ago Place Cluster Server
492 Views
0 likes
0 Comments