09-19-2011 10:31 AM
Hello everyone. I have a two-node cluster that serves as an Oracle database server. The Oracle binaries are installed on disks local to each node (so they are outside the control of the Cluster Manager). I have a diskgroup made up of three 1TB LUNs from my SAN, six volumes on the diskgroup (u02 through u07), six mount points (/u02 through /u07), a database listener and the actual Oracle database. I was able to bring up these individual components manually and confirmed that the database was up and running.
I then tried a "Switch To" operation to see if everything would move to the other node of the cluster. It turns out this was a bad idea. Within the Cluster Manager GUI, the diskgroup has a State of Online, an IState of "Waiting to go offline propagate" and a Flag of "Unable to offline". The volumes show as "Offline on all systems" but the mounts still show as online with "Status Unknown". When I try to take the mount points offline, I get the message "VCS ERROR V-16-1-10277 The Service Group i1025prd to which Resource Mnt_scratch belongs has failed or switch, online, and offline operations are prohibited."
Can anyone tell me how I can fix this?
Ken
09-19-2011 10:57 AM
Hi Ken,
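Can you provide:
1. main.cf
2. An extract from the engine log, starting from when you ran "hagrp -switch"
3. Output from "hastatus -sum" after the failed switch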
Mike
09-19-2011 12:37 PM
As requested:
1. main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"

cluster st31bcl01 (
    UserNames = { admin = chiAhcHeiDiiGqiChf }
    ClusterAddress = "192.168.84.216"
    Administrators = { admin }
    )

system st31bbl01 (
    )

system st31bbl02 (
    )

group ClusterService (
    SystemList = { st31bbl01 = 0, st31bbl02 = 1 }
    AutoStartList = { st31bbl01, st31bbl02 }
    OnlineRetryLimit = 3
    OnlineRetryInterval = 120
    )

    IP webip (
        Device = e1000g0
        Address = "192.168.84.216"
        NetMask = "255.255.252.0"
        )

    NIC csgnic (
        Device = e1000g0
        )

    webip requires csgnic

    // resource dependency tree
    //
    // group ClusterService
    // {
    // IP webip
    //     {
    //     NIC csgnic
    //     }
    // }

group i1025prd (
    SystemList = { st31bbl01 = 0, st31bbl02 = 1 }
    )

    DiskGroup Ora_DiskGroup_Data (
        DiskGroup = oradatadg
        )

    Mount Mnt_scratch (
        MountPoint = "/scratch"
        BlockDevice = "/dev/vx/dsk/oradatadg/scratch"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    Mount Mnt_u02 (
        MountPoint = "/u02"
        BlockDevice = "/dev/vx/dsk/oradatadg/u02"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    Mount Mnt_u03 (
        MountPoint = "/u03"
        BlockDevice = "/dev/vx/dsk/oradatadg/u03"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    Mount Mnt_u04 (
        MountPoint = "/u04"
        BlockDevice = "/dev/vx/dsk/oradatadg/u04"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    Mount Mnt_u05 (
        MountPoint = "/u05"
        BlockDevice = "/dev/vx/dsk/oradatadg/u05"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    Mount Mnt_u06 (
        MountPoint = "/u06"
        BlockDevice = "/dev/vx/dsk/oradatadg/u06"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    Mount Mnt_u07 (
        MountPoint = "/u07"
        BlockDevice = "/dev/vx/dsk/oradatadg/u07"
        FSType = vxfs
        MountOpt = largefiles
        FsckOpt = "-y"
        )

    NIC Ora_NIC (
        Enabled = 0
        )

    Netlsnr Ora_Netlsnr (
        Owner = oraprod
        Home = "/u01/app/oraprod/product/10.2.0/db_1"
        Listener = list1020_db1
        )

    Oracle i1025prd (
        Sid = i1025prd
        Owner = oraprod
        Home = "/u01/app/oraprod/product/10.2.0/db_1"
        Pfile = "spfile=/u01/app/oraprod/admin/i1025prd/pfile/spfilei1025prd.ora"
        StartUpOpt = STARTUP
        EnvFile = "/u01/app/oraprod/.profile-veritas"
        )

    Volume Vol_scratch (
        Volume = scratch
        DiskGroup = oradatadg
        )

    Volume Vol_u02 (
        Volume = u02
        DiskGroup = oradatadg
        )

    Volume Vol_u03 (
        Volume = u03
        DiskGroup = oradatadg
        )

    Volume Vol_u04 (
        Volume = u04
        DiskGroup = oradatadg
        )

    Volume Vol_u05 (
        Volume = u05
        DiskGroup = oradatadg
        )

    Volume Vol_u06 (
        Volume = u06
        DiskGroup = oradatadg
        )

    Volume Vol_u07 (
        Volume = u07
        DiskGroup = oradatadg
        )

    Mnt_u02 requires i1025prd
    Mnt_u03 requires i1025prd
    Mnt_u04 requires i1025prd
    Mnt_u05 requires i1025prd
    Mnt_u06 requires i1025prd
    Mnt_u07 requires i1025prd
    Ora_DiskGroup_Data requires Vol_scratch
    Ora_DiskGroup_Data requires Vol_u02
    Ora_DiskGroup_Data requires Vol_u03
    Ora_DiskGroup_Data requires Vol_u04
    Ora_DiskGroup_Data requires Vol_u05
    Ora_DiskGroup_Data requires Vol_u06
    Ora_DiskGroup_Data requires Vol_u07
    Vol_scratch requires Mnt_scratch
    Vol_u02 requires Mnt_u02
    Vol_u03 requires Mnt_u03
    Vol_u04 requires Mnt_u04
    Vol_u05 requires Mnt_u05
    Vol_u06 requires Mnt_u06
    Vol_u07 requires Mnt_u07

    // resource dependency tree
    //
    // group i1025prd
    // {
    // DiskGroup Ora_DiskGroup_Data
    //     {
    //     Volume Vol_u06
    //         {
    //         Mount Mnt_u06
    //             {
    //             Oracle i1025prd
    //             }
    //         }
    //     Volume Vol_u05
    //         {
    //         Mount Mnt_u05
    //             {
    //             Oracle i1025prd
    //             }
    //         }
    //     Volume Vol_u07
    //         {
    //         Mount Mnt_u07
    //             {
    //             Oracle i1025prd
    //             }
    //         }
    //     Volume Vol_u02
    //         {
    //         Mount Mnt_u02
    //             {
    //             Oracle i1025prd
    //             }
    //         }
    //     Volume Vol_u04
    //         {
    //         Mount Mnt_u04
    //             {
    //             Oracle i1025prd
    //             }
    //         }
    //     Volume Vol_scratch
    //         {
    //         Mount Mnt_scratch
    //         }
    //     Volume Vol_u03
    //         {
    //         Mount Mnt_u03
    //             {
    //             Oracle i1025prd
    //             }
    //         }
    //     }
    // NIC Ora_NIC
    // Netlsnr Ora_Netlsnr
    // }
2. extract from engine log starting from when you ran "hagrp -switch"
2011/09/19 10:59:56 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch i1025prd st31bbl01 localclus from ::ffff:192.168.187.77
2011/09/19 10:59:56 VCS NOTICE V-16-1-10208 Initiating switch of group i1025prd from system st31bbl02 to system st31bbl01
2011/09/19 10:59:56 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Ora_DiskGroup_Data (Owner: unknown, Group: i1025prd) on System st31bbl02
2011/09/19 10:59:56 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Ora_Netlsnr (Owner: unknown, Group: i1025prd) on System st31bbl02
2011/09/19 10:59:56 VCS WARNING V-16-10001-1020 (st31bbl02) DiskGroup:Ora_DiskGroup_Data:offline:The command 'vxvol -g oradatadg stopall' failed. Doing a forced stop
2011/09/19 10:59:58 VCS INFO V-16-2-13716 (st31bbl02) Resource(Ora_DiskGroup_Data): Output of the completed operation (offline)
==============================================
VxVM vxdg ERROR V-5-1-584 Disk group oradatadg: Some volumes in the disk group are in use
==============================================
2011/09/19 10:59:59 VCS ERROR V-16-2-13064 (st31bbl02) Agent is calling clean for resource(Ora_DiskGroup_Data) because the resource is up even after offline completed.
2011/09/19 11:00:00 VCS WARNING V-16-10001-1071 (st31bbl02) DiskGroup:Ora_DiskGroup_Data:clean:Diskgroup deport returned 31
2011/09/19 11:00:00 VCS INFO V-16-2-13716 (st31bbl02) Resource(Ora_DiskGroup_Data): Output of the completed operation (clean)
==============================================
VxVM vxdg ERROR V-5-1-584 Disk group oradatadg: Some volumes in the disk group are in use
==============================================
2011/09/19 11:00:00 VCS ERROR V-16-2-13069 (st31bbl02) Resource(Ora_DiskGroup_Data) - clean failed.
2011/09/19 11:00:01 VCS INFO V-16-20002-40 (st31bbl02) Netlsnr:Ora_Netlsnr:offline:lsnrctl returned the following output
+--------------------------------------------------------------------+
LD_LIBRARY_PATH - /usr/lib:
LSNRCTL for Solaris: Version 10.2.0.5.0 - Production on 19-SEP-2011 10:59:56
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=st31bcl01)(PORT=1530))
The command completed successfully
+====================================================================+
2011/09/19 11:00:01 VCS INFO V-16-1-10305 Resource Ora_Netlsnr (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (VCS initiated)
2011/09/19 11:00:04 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(i1025prd) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:05 VCS WARNING V-16-20002-23 (st31bbl02) Oracle:i1025prd:clean:Oracle database i1025prd not running
2011/09/19 11:00:05 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_scratch) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:05 VCS INFO V-16-2-13068 (st31bbl02) Resource(i1025prd) - clean completed successfully.
2011/09/19 11:00:05 VCS INFO V-16-1-10307 Resource i1025prd (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:06 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_scratch) - clean completed successfully.
2011/09/19 11:00:06 VCS INFO V-16-1-10307 Resource Vol_scratch (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:08 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_u02) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:09 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_u02) - clean completed successfully.
2011/09/19 11:00:09 VCS INFO V-16-1-10307 Resource Vol_u02 (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:11 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_u03) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:12 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_u03) - clean completed successfully.
2011/09/19 11:00:12 VCS INFO V-16-1-10307 Resource Vol_u03 (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:14 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_u04) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:15 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_u04) - clean completed successfully.
2011/09/19 11:00:15 VCS INFO V-16-1-10307 Resource Vol_u04 (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:18 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_u05) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:19 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_u05) - clean completed successfully.
2011/09/19 11:00:19 VCS INFO V-16-1-10307 Resource Vol_u05 (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:21 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_u06) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:22 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_u06) - clean completed successfully.
2011/09/19 11:00:22 VCS INFO V-16-1-10307 Resource Vol_u06 (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:00:24 VCS ERROR V-16-2-13067 (st31bbl02) Agent is calling clean for resource(Vol_u07) because the resource became OFFLINE unexpectedly, on its own.
2011/09/19 11:00:25 VCS INFO V-16-2-13068 (st31bbl02) Resource(Vol_u07) - clean completed successfully.
2011/09/19 11:00:25 VCS INFO V-16-1-10307 Resource Vol_u07 (Owner: unknown, Group: i1025prd) is offline on st31bbl02 (Not initiated by VCS)
2011/09/19 11:01:01 VCS ERROR V-16-2-13077 (st31bbl02) Agent is unable to offline resource(Ora_DiskGroup_Data). Administrative intervention may be required.
2011/09/19 11:01:02 VCS WARNING V-16-10001-1071 (st31bbl02) DiskGroup:Ora_DiskGroup_Data:clean:Diskgroup deport returned 31
2011/09/19 11:01:02 VCS INFO V-16-2-13716 (st31bbl02) Resource(Ora_DiskGroup_Data): Output of the completed operation (clean)
==============================================
VxVM vxdg ERROR V-5-1-584 Disk group oradatadg: Some volumes in the disk group are in use
==============================================
2011/09/19 11:02:03 VCS INFO V-16-1-50135 User admin fired command: hares -offline Mnt_u07 st31bbl02 from ::ffff:192.168.187.77
2011/09/19 11:02:04 VCS WARNING V-16-10001-1071 (st31bbl02) DiskGroup:Ora_DiskGroup_Data:clean:Diskgroup deport returned 31
2011/09/19 11:02:04 VCS INFO V-16-2-13716 (st31bbl02) Resource(Ora_DiskGroup_Data): Output of the completed operation (clean)
==============================================
VxVM vxdg ERROR V-5-1-584 Disk group oradatadg: Some volumes in the disk group are in use
==============================================
2011/09/19 11:03:06 VCS WARNING V-16-10001-1071 (st31bbl02) DiskGroup:Ora_DiskGroup_Data:clean:Diskgroup deport returned 31
2011/09/19 11:03:06 VCS INFO V-16-2-13716 (st31bbl02) Resource(Ora_DiskGroup_Data): Output of the completed operation (clean)
==============================================
VxVM vxdg ERROR V-5-1-584 Disk group oradatadg: Some volumes in the disk group are in use
3. Output from "hastatus -sum" after failed switch.
-- SYSTEM STATE
-- System          State          Frozen
A  st31bbl01       RUNNING        0
A  st31bbl02       RUNNING        0

-- GROUP STATE
-- Group           System       Probed   AutoDisabled   State
B  ClusterService  st31bbl01    Y        N              OFFLINE
B  ClusterService  st31bbl02    Y        N              ONLINE
B  i1025prd        st31bbl01    Y        N              OFFLINE
B  i1025prd        st31bbl02    Y        N              STOPPING|PARTIAL

-- RESOURCES NOT PROBED
-- Group           Type         Resource             System
E  i1025prd        NIC          Ora_NIC              st31bbl01
E  i1025prd        NIC          Ora_NIC              st31bbl02

-- RESOURCES OFFLINING
-- Group           Type         Resource             System       IState
G  i1025prd        DiskGroup    Ora_DiskGroup_Data   st31bbl02    W_OFFLINE_PROPAGATE
09-19-2011 12:51 PM
Ken,
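Your resource dependencies are the wrong way round. Taking /u02 from your main.cf as an example, you have:
Mnt_u02 requires i1025prd
Vol_u02 requires Mnt_u02
Ora_DiskGroup_Data requires Vol_u02
They should be the reverse way round:
i1025prd requires Mnt_u02
Mnt_u02 requires Vol_u02
Vol_u02 requires Ora_DiskGroup_Data
i.e. the Oracle DB requires the filesystems to be mounted, and the mounts require the diskgroup to be imported.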
From the GUI, the diskgroup should be at the bottom and Oracle at the top.
As the dependencies are wrong, VCS tries to take the diskgroup offline first and it can't because filesystems are still mounted.
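A rough sketch of how the links could be reversed from the command line. It only prints the hares commands so they can be reviewed before anything is run. It assumes the resource names from the main.cf above, and that "hares -link" takes the parent (the resource that "requires") first and the child second. If the old links have already been removed, the "hares -unlink" lines can be skipped.

```shell
#!/bin/sh
# Dry-run sketch: only PRINTS the commands; nothing is executed against VCS.
# Resource names are taken from the main.cf posted above.

ORA=i1025prd              # Oracle resource (same name as the service group)
DG=Ora_DiskGroup_Data     # DiskGroup resource

print_relink_commands() {
    echo "haconf -makerw"                     # open the configuration for writing
    for v in scratch u02 u03 u04 u05 u06 u07; do
        echo "hares -unlink $DG Vol_$v"       # drop: diskgroup requires volume
        echo "hares -unlink Vol_$v Mnt_$v"    # drop: volume requires mount
        echo "hares -link Mnt_$v Vol_$v"      # add: mount requires volume
        echo "hares -link Vol_$v $DG"         # add: volume requires diskgroup
    done
    for v in u02 u03 u04 u05 u06 u07; do
        echo "hares -unlink Mnt_$v $ORA"      # drop: mount requires Oracle
        echo "hares -link $ORA Mnt_$v"        # add: Oracle requires mount
    done
    echo "haconf -dump -makero"               # save and close the configuration
}

print_relink_commands
```

With the links this way round, an offline of the group should start at Oracle and finish at the diskgroup, which is the order the failed switch needed.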
Mike
09-19-2011 01:55 PM
OK, so I've unlinked everything but I still can't take the mount points offline. Suggestions?
09-19-2011 02:06 PM
Flush the service group (hagrp -flush i1025prd) and then delete the diskgroup resource. You should offline the Oracle resource first and then offline the mounts, but if you are unable to do this, stop Oracle and unmount the filesystems manually.
Once the service group is in a clean state ("hastatus -sum" does not show any resources offlining or onlining), you can recreate your diskgroup resource and link all resources correctly, if you haven't already.
After that you should be able to successfully switch the service group across systems.
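The sequence above could be sketched roughly as follows. The commands are only printed, not run. The node name st31bbl02 is taken from the hastatus output earlier in the thread, and the "-sys" argument is the one shown in the hagrp usage message. The is_clean helper is hypothetical: it assumes "hastatus -sum" prints a "RESOURCES ONLINING" or "RESOURCES OFFLINING" section only while something is still in transition.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, not executed.
GROUP=i1025prd
SYS=st31bbl02     # node where the group is stuck

print_recovery_commands() {
    echo "hagrp -flush $GROUP -sys $SYS"        # stop VCS waiting on the stuck offline
    echo "hares -offline i1025prd -sys $SYS"    # Oracle resource first
    for m in scratch u02 u03 u04 u05 u06 u07; do
        echo "hares -offline Mnt_$m -sys $SYS"  # then the mounts
    done
    echo "hastatus -sum"                        # confirm nothing is in transition
}

# Hypothetical helper: reads "hastatus -sum" text on stdin and succeeds only
# if no resource is listed as onlining or offlining.
is_clean() {
    ! grep -Eq 'RESOURCES (ONLINING|OFFLINING)'
}

print_recovery_commands
```

On a node, the check would be used as "hastatus -sum | is_clean && echo clean".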
Mike
09-19-2011 02:22 PM
"hagrp -flush i1025prd" returns an error:
[st31bbl02] / > hagrp -flush i1025prd
VCS WARNING V-16-1-10691 Must specify system name
VCS INFO V-16-1-10601 Usage:
. . .
hagrp -flush <group> [-force] -sys <system> [-clus <cluster> | -localclus]
. . .
I also tried using the name of the disk group resource "hagrp -flush Ora_DiskGroup_Data" but got the same error.
09-19-2011 04:21 PM
hagrp -flush i1025prd -sys st31bbl02
Also make sure that nothing is mounted on those mount points (df -k).
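One way to run that check, as a small sketch: it filters "df -k" output for the mount points defined in the service group (/scratch and /u02 through /u07, from the main.cf above). It assumes the mount point is the last column of each "df -k" line.

```shell
#!/bin/sh
# Reads "df -k" output on stdin and prints any of the service group's
# mount points that are still mounted. Exit status is 0 if at least one is.
still_mounted() {
    awk 'NR > 1 { print $NF }' | grep -E '^/(scratch|u0[2-7])$'
}

if df -k | still_mounted; then
    echo "The mount points listed above are still mounted"
else
    echo "None of the service group mount points are mounted"
fi
```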
09-20-2011 01:54 AM
Ken,
Sorry, I forgot the "-sys" option of the hagrp -flush command, but the error did tell you what was wrong and gave the correct syntax:
VCS WARNING V-16-1-10691 Must specify system name
VCS INFO V-16-1-10601 Usage: hagrp -flush <group> [-force] -sys <system> [-clus <cluster> | -localclus]
Anything in square brackets is optional, so the mandatory arguments are:
hagrp -flush <group> -sys <system>
You can see from main.cf which objects are groups (ClusterService and i1025prd), so you cannot use Ora_DiskGroup_Data because this, as you said in your email, is a resource, not a group. The flush stops VCS from taking further action, so that anything waiting to go offline will not continue once the diskgroup resource is deleted.
Mike
09-20-2011 03:38 AM
In addition to all the excellent advice that Mike has provided thus far, please have a look at the sample configuration in the VCS for Oracle manual:
https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/vcs_oracle_agent_51sp1_sol.pdf
Sample dependency diagram for "single Oracle instance configuration" on p. 134.
Sample main.cf starts on p.135.
09-21-2011 09:49 AM
I was wary about deleting and re-adding the diskgroup so I did a "hastop -all" followed by a "hastart" on both nodes. Unfortunately things didn't restart cleanly and I saw the following:
[st31bbl01] / > hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A st31bbl01 ADMIN_WAIT 0
A st31bbl02 LEAVING 0
I ended up opening a ticket with Symantec support and they had me run "hasys -force st31bbl01" which seems to have fixed the problem: "hastatus -sum" reports both nodes as running and I am able once again to connect with the gui.
Thanks for your help everyone.
09-21-2011 10:00 AM
So now that your dependencies are the right way round, can you switch service group across systems now?
Mike
09-21-2011 11:53 AM
Unfortunately no, it is still not working.
I had my test database configured on node 1 with a LOCAL_LISTENER initialization parameter that referenced node 1. When I tried to switch to node 2, the listener wouldn't start. I tried changing LOCAL_LISTENER to reference the name of the cluster but now the database won't start on either node. Database startup fails with the following error:
ORA-00119: invalid specification for system parameter LOCAL_LISTENER
ORA-00130: invalid listener address '(ADDRESS=(PROTOCOL=TCP)(HOST=st31bcl02.apacorp.net)(PORT=1530))'
Any suggestions?
09-21-2011 12:06 PM
Ken,
Your problem is that you have not configured an IP resource in your Oracle service group (i1025prd). The IP in your ClusterService group is for cluster management purposes and moves independently of the Oracle service group. That IP is optional, but the IP in the Oracle service group is mandatory.
From memory the only file you need to change in Oracle is tnsnames.ora so that the Host entry is the virtual IP (in the Oracle service group) or the virtual hostname if the virtual IP is in DNS or the local hosts file.
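As a hedged sketch, adding such an IP resource from the command line might look like the following. The commands are only printed. The resource name Ora_IP, the address 192.0.2.10 and the reuse of the e1000g0 device and 255.255.252.0 netmask from the ClusterService group are all assumptions for illustration; substitute the real virtual IP for the Oracle service group.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for review; nothing is executed.
# $1 = virtual IP for the Oracle service group, $2 = netmask (both placeholders).
print_ip_commands() {
    vip=$1
    netmask=$2
    echo "haconf -makerw"                         # open the configuration
    echo "hares -add Ora_IP IP i1025prd"          # new IP resource in the Oracle group
    echo "hares -modify Ora_IP Device e1000g0"    # assumed: same NIC as ClusterService
    echo "hares -modify Ora_IP Address $vip"
    echo "hares -modify Ora_IP NetMask $netmask"
    echo "hares -modify Ora_IP Enabled 1"
    echo "hares -link Ora_Netlsnr Ora_IP"         # listener requires the virtual IP
    echo "haconf -dump -makero"                   # save and close the configuration
}

print_ip_commands 192.0.2.10 255.255.252.0
```

The last link is what makes VCS bring the virtual IP up before it starts the listener.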
Mike
09-21-2011 12:13 PM
Double-check $TNS_ADMIN/listener.ora: did you remember to add the virtual hostname or IP address in the Host field?
Were your resource dependencies changed so that the listener will only be started once the virtual IP is up?
Have a look at the section called "Transparent listener failover" in the manual that I've mentioned above.
There is also an example a couple of pages on:
3 Configure the Oracle file listener.ora as per VCS requirements. The changes required in the file depends on your Oracle configuration.
In the file listener.ora located at $TNS_ADMIN, edit the "Host=" line in the ADDRESS_LIST section and add the name of the high availability address for the service group, in this case, oraprod.
LISTENER_PROD =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = oraprod)(PORT = 1521))
      )
    )
  )
09-21-2011 12:57 PM
I added the resource Ora_IP back in and gave it the IP address of the cluster. This resource had previously been deleted at the recommendation of the technical contact at the vendor where we bought our licenses (who, as you can guess, really does not understand how this system is supposed to work). At the same time, I'm embarrassed to say I noticed a typo in my tnsnames.ora file where the cluster name was misspelled. Between these two corrections, I am able to get the database and listener started on node 2 of my cluster.
If I want to switch my database to run on node 1, should it really be as easy as right-clicking on the database resource group and selecting "switch to st31bbl01"?
Ken
09-21-2011 01:03 PM
Yes, it's that easy or you can use "hagrp -switch i1025prd -to st31bbl01"
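A minimal sketch of the switch plus a follow-up check. The commands are only printed. The is_fully_online helper is hypothetical, and the exact text that "hagrp -state" prints is an assumption here; the helper only looks for the word ONLINE without any of the transitional or faulted states seen in "hastatus -sum".

```shell
#!/bin/sh
# Dry-run sketch: prints the switch command and the follow-up state query.
# $1 = service group, $2 = target system.
print_switch_commands() {
    echo "hagrp -switch $1 -to $2"
    echo "hagrp -state $1 -sys $2"
}

# Hypothetical helper: reads a group state string on stdin and succeeds only
# for a plain ONLINE state (not STOPPING|PARTIAL, OFFLINE, FAULTED, ...).
is_fully_online() {
    state=$(cat)
    case $state in
        *PARTIAL*|*STOPPING*|*STARTING*|*FAULTED*) return 1 ;;
        *ONLINE*) return 0 ;;
        *) return 1 ;;
    esac
}

print_switch_commands i1025prd st31bbl01
```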
Mike
10-26-2011 01:15 PM
Hi Ken
It's a month later...
Have you been able to successfully switch the SG?
Please mark Mike's post that pointed you in the right direction as Solution.