08-24-2011 06:44 PM
I am writing to ask for help with Storage Foundation HA 5.1 and Oracle 11g failover support. We have a two-node Sun cluster running on M3000 servers, connected to shared NetApp LUNs over FC. The database is designed to run in a Solaris container.
So far we have finished the container failover testing and the Oracle listener failover testing, and both passed. The VCS Oracle agent has also been installed.
My question is: why does the Oracle 11g failover test fail, even though the listener failover test passed? We cannot switch this resource from one node to the other, and we cannot bring the database online using Cluster Manager (Java Console). The database can be started manually in sqlplus.
Looking forward to your reply. Thanks a lot.
Leo
08-24-2011 11:45 PM
engine_A.log or Oracle_A.log (in /var/VRTSvcs/log) should tell you why Oracle fails to start. If you can't find the issue from the logs, then please post main.cf.
Mike
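A minimal sketch of the log check suggested above, for anyone following along: it filters a VCS log down to its ERROR and WARNING entries. The function name `vcs_log_problems` is hypothetical, and the paths in the comments assume the standard /var/VRTSvcs/log directory with the file names mentioned in this thread.

```shell
# Print only the ERROR and WARNING entries from a VCS log file.
# vcs_log_problems is a hypothetical helper name, not a VCS command.
# If your grep does not accept -E (the default /usr/bin/grep on
# Solaris 10 may not), /usr/xpg4/bin/grep or egrep should work instead.
vcs_log_problems() {
    grep -E 'VCS (ERROR|WARNING) ' "$1"
}

# Example, run on the node where the resource failed to come online:
#   vcs_log_problems /var/VRTSvcs/log/engine_A.log
#   vcs_log_problems /var/VRTSvcs/log/Oracle_A.log
```

The V-16-... message IDs on the matching lines are usually the most useful thing to quote when asking for help.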
08-25-2011 12:02 AM
Here is Oracle_A.log:
2011/08/24 18:48:45 VCS ERROR V-16-2-13066 Thread(3) Agent is calling clean for resource(testzone_Oracle_testdb) because the resource is not up even after online completed.
2011/08/24 18:48:46 VCS ERROR V-16-2-13069 Thread(3) Resource(testzone_Oracle_testdb) - clean failed.
Engine_A.log:
==============================================
VCS WARNING V-16-1-52529 Login Incorrect, Invalid username/password
==============================================
2011/08/24 18:52:19 VCS INFO V-16-1-10298 Resource testzone_testzonedb_listener (Owner: unknown, Group: testzone_SG) is online on node1 (VCS initiated)
2011/08/24 18:52:19 VCS NOTICE V-16-1-10447 Group testzone_SG is online on system node1
2011/08/24 18:59:18 VCS NOTICE V-16-1-10016 Agent /opt/VRTSagents/ha/bin/Oracle/OracleAgent for resource type Oracle successfully started at Wed Aug 24 18:59:18 2011
2011/08/24 18:59:23 VCS INFO V-16-1-10304 Resource testzone_oracle_testdb (Owner: unknown, Group: testzone_SG) is offline on node2 (First probe)
2011/08/24 18:59:24 VCS INFO V-16-1-10304 Resource testzone_oracle_testdb (Owner: unknown, Group: testzone_SG) is offline on node1 (First probe)
2011/08/24 18:59:33 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group testzone_SG on all nodes
2011/08/24 18:59:33 VCS NOTICE V-16-1-10301 Initiating Online of Resource testzone_oracle_testdb (Owner: unknown, Group: testzone_SG) on System node1
2011/08/24 19:01:36 VCS ERROR V-16-2-13066 (node1) Agent is calling clean for resource(testzone_oracle_testdb) because the resource is not up even after online completed.
2011/08/24 19:01:37 VCS ERROR V-16-2-13069 (node1) Resource(testzone_oracle_testdb) - clean failed.
2011/08/24 19:08:28 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group testzone_SG on all nodes
2011/08/24 19:11:04 VCS ERROR V-16-2-13079 (node1) Resource(testzone_oracle_testdb): The last 10 invocations of the clean procedure have failed.
2011/08/24 19:12:59 VCS NOTICE V-16-1-10022 Agent Oracle stopped
2011/08/24 19:12:59 VCS NOTICE V-16-1-10447 Group testzone_SG is online on system node1
2011/08/25 12:17:09 VCS NOTICE V-16-1-10208 Initiating switch of group testzone_SG from system node1 to system node2
2011/08/25 12:17:09 VCS NOTICE V-16-1-10300 Initiating Offline of Resource testzone_testzonedb_listener (Owner: unknown, Group: testzone_SG) on System node1
2011/08/25 12:17:15 VCS INFO V-16-2-13716 (node1) Resource(testzone_testzonedb_listener): Output of the completed operation (offline)
08-25-2011 01:18 AM
Hi
So at a high level the logs are saying:
Resource testzone_oracle_testdb was probed on both systems. The probes (monitors) completed and found the resource to be offline on both nodes.
Resource testzone_oracle_testdb attempted to go online on node1.
The online routine for the resource completed, but the monitor routine did not declare the resource online. *Possibly* there is a configuration issue with the resource.
Can you post the main.cf extract for the resource/service group?
Are there any other dependencies that the resource/SG requires?
08-25-2011 01:24 AM
Thanks. I will post the main.cf in another comment.
Just now I switched the resource group to node2, manually started the Oracle database, and created the dependency, but the database could not be taken offline! It printed the errors below:
2011/08/25 12:18:38 VCS NOTICE V-16-1-10447 Group testzone_SG is online on system node2
2011/08/25 15:05:32 VCS NOTICE V-16-1-10016 Agent /opt/VRTSagents/ha/bin/Oracle/OracleAgent for resource type Oracle successfully started at Thu Aug 25 15:05:32 2011
2011/08/25 15:05:37 VCS INFO V-16-1-10304 Resource testzone_oracle_testdb (Owner: unknown, Group: testzone_SG) is offline on node1 (First probe)
2011/08/25 15:05:38 VCS INFO V-16-1-10297 Resource testzone_oracle_testdb (Owner: unknown, Group: testzone_SG) is online on node2 (First probe)
2011/08/25 15:05:38 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group testzone_SG on all nodes
2011/08/25 15:06:17 VCS NOTICE V-16-1-10208 Initiating switch of group testzone_SG from system node2 to system node1
2011/08/25 15:06:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource testzone_oracle_testdb (Owner: unknown, Group: testzone_SG) on System node2
2011/08/25 15:06:19 VCS ERROR V-16-2-13064 (node2) Agent is calling clean for resource(testzone_oracle_testdb) because the resource is up even after offline completed.
2011/08/25 15:06:21 VCS ERROR V-16-2-13069 (node2) Resource(testzone_oracle_testdb) - clean failed.
2011/08/25 15:07:22 VCS ERROR V-16-2-13077 (node2) Agent is unable to offline resource(testzone_oracle_testdb). Administrative intervention may be required.
2011/08/25 15:15:39 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 10 invocations of the clean procedure have failed.
2011/08/25 15:25:59 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 20 invocations of the clean procedure have failed.
2011/08/25 15:36:19 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 30 invocations of the clean procedure have failed.
2011/08/25 15:46:46 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 40 invocations of the clean procedure have failed.
2011/08/25 15:57:17 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 50 invocations of the clean procedure have failed.
2011/08/25 16:07:38 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 60 invocations of the clean procedure have failed.
2011/08/25 16:17:58 VCS ERROR V-16-2-13079 (node2) Resource(testzone_oracle_testdb): The last 70 invocations of the clean procedure have failed.
08-25-2011 01:30 AM
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"
cluster PROD (
    UserNames = { admin = aLJnJTiULlJTiOLsJQiULoJUiULrKMiULnJTiR,
        1 = bkiHknJtlFieHmkSkrJrkIhf,
        2 = INOnNMmTNjNWmUNl,
        3 = aLKhMJlHLnMLkKJeIGhHIh,
        z_zoners_node1 = eHHeGCdKHpGChDHeHK,
        z_zoners_node2 = bopPovMioUprOroPpl }
    ClusterAddress = "X.X.X.X"
    Administrators = { admin, 1, 2, 3 }
)
system node1 (
)
system node2 (
)
group ClusterService (
    SystemList = { node1 = 0, node2 = 1 }
    AutoStartList = { node1 }
)
IP Cluster_IP_address (
    Device = aggr110001
    Address = "10.71.5.203"
)
NIC NIC (
    Device @node1 = aggr110001
    Device @node2 = aggr110001
)
NotifierMngr Notifier (
    SmtpServer = "X.X.X.X"
    SmtpReturnPath = "X.X.X.X"
    SmtpRecipients = { "X.X.X.X" = Information }
)
Cluster_IP_address requires NIC
Notifier requires NIC
// resource dependency tree
//
// group ClusterService
// {
//     IP Cluster_IP_address
//     {
//         NIC NIC
//     }
//     NotifierMngr Notifier
//     {
//         NIC NIC
//     }
// }
group testzone_SG (
    SystemList = { node1 = 0, node2 = 1 }
    ContainerInfo @node1 = { Name = testzone, Type = Zone, Enabled = 1 }
    ContainerInfo @node2 = { Name = testzone, Type = Zone, Enabled = 1 }
    AutoStartList = { node1, node2 }
    Administrators = { z_zoners_node1, z_zoners_node2 }
)
DiskGroup testzone_testzoneBARdg (
    DiskGroup = testzoneBARdg
)
DiskGroup testzone_testzoneDATAdg (
    DiskGroup = testzoneDATAdg
)
DiskGroup testzone_testzonedg (
    DiskGroup = testzonedg
)
Mount testzone_container_mount (
    MountPoint = "/export/zones/testzone"
    BlockDevice = "/dev/vx/dsk/testzonedg/testzonedg_V01"
    FSType = vxfs
    FsckOpt = "-n"
    MntPtPermission = 700
    MntPtOwner = 0
    MntPtGroup = 0
)
Mount testzone_oraarch_mount (
    MountPoint = "/export/zones/testzone/oraarch"
    BlockDevice = "/dev/vx/dsk/testzoneDATAdg/testzoneDATAdg_V02"
    FSType = vxfs
    FsckOpt = "-n"
)
Mount testzone_oracle_mount (
    MountPoint = "/export/zones/testzone/oracle"
    BlockDevice = "/dev/vx/dsk/testzoneBARdg/testzoneBARdg_V01"
    FSType = vxfs
    FsckOpt = "-n"
)
Mount testzone_oradata_mount (
    MountPoint = "/export/zones/testzone/oradata"
    BlockDevice = "/dev/vx/dsk/testzoneDATAdg/testzoneDATAdg_V01"
    FSType = vxfs
    FsckOpt = "-n"
)
Netlsnr testzone_testzonedb_listener (
    Owner = oracle
    Home = "/oracle/product/11.2.0/dbhome_1"
    Listener = LISTENER
    EnvFile = "/oracle/.profile"
)
Oracle testzone_oracle_testdb (
    Sid = testdb
    Owner = oracle
    Home = "/oracle/product/11.2.0/dbhome_1"
    Pfile = "/oracle/product/11.2.0/dbhome_1/dbs/spfiletestdb.ora"
    EnvFile = "/oracle/.profile"
)
Volume testzone_container_volume (
    Volume = testzonedg_V01
    DiskGroup = testzonedg
)
Volume testzone_oraBAR_V01 (
    Volume = testzoneBARdg_V01
    DiskGroup = testzoneBARdg
)
Volume testzone_oraDATA_V01 (
    Volume = testzoneDATAdg_V01
    DiskGroup = testzoneDATAdg
)
Volume testzone_oraDATA_V02 (
    Volume = testzoneDATAdg_V02
    DiskGroup = testzoneDATAdg
)
Zone zoners (
)
testzone_container_mount requires testzone_container_volume
testzone_container_volume requires testzone_testzonedg
testzone_oraBAR_V01 requires testzone_testzoneBARdg
testzone_oraDATA_V01 requires testzone_testzoneDATAdg
testzone_oraDATA_V02 requires testzone_testzoneDATAdg
testzone_oraarch_mount requires testzone_container_mount
testzone_oraarch_mount requires testzone_oraDATA_V02
testzone_oracle_mount requires testzone_container_mount
testzone_oracle_mount requires testzone_oraBAR_V01
testzone_oracle_testdb requires testzone_testzonedb_listener
testzone_oradata_mount requires testzone_container_mount
testzone_oradata_mount requires testzone_oraDATA_V01
testzone_testzonedb_listener requires zoners
zoners requires testzone_container_mount
zoners requires testzone_oraarch_mount
zoners requires testzone_oracle_mount
zoners requires testzone_oradata_mount
// resource dependency tree
//
// group testzone_SG
// {
//     Oracle testzone_oracle_testdb
//     {
//         Netlsnr testzone_testzonedb_listener
//         {
//             Zone zoners
//             {
//                 Mount testzone_oracle_mount
//                 {
//                     Mount testzone_container_mount
//                     {
//                         Volume testzone_container_volume
//                         {
//                             DiskGroup testzone_testzonedg
//                         }
//                     }
//                     Volume testzone_oraBAR_V01
//                     {
//                         DiskGroup testzone_testzoneBARdg
//                     }
//                 }
//                 Mount testzone_oraarch_mount
//                 {
//                     Volume testzone_oraDATA_V02
//                     {
//                         DiskGroup testzone_testzoneDATAdg
//                     }
//                     Mount testzone_container_mount
//                     {
//                         Volume testzone_container_volume
//                         {
//                             DiskGroup testzone_testzonedg
//                         }
//                     }
//                 }
//                 Mount testzone_oradata_mount
//                 {
//                     Volume testzone_oraDATA_V01
//                     {
//                         DiskGroup testzone_testzoneDATAdg
//                     }
//                     Mount testzone_container_mount
//                     {
//                         Volume testzone_container_volume
//                         {
//                             DiskGroup testzone_testzonedg
//                         }
//                     }
//                 }
//                 Mount testzone_container_mount
//                 {
//                     Volume testzone_container_volume
//                     {
//                         DiskGroup testzone_testzonedg
//                     }
//                 }
//             }
//         }
//     }
// }
08-25-2011 02:00 AM
Sorry, I forgot to mention: you need to get Oracle_A.log from the local zone ("testzone"), not the global zone. It should give information about why Oracle won't start under VCS control.
There is no point switching until you can online and offline Oracle using VCS, so you should just try these two actions. As VCS is detecting that Oracle is online, it is working to a certain extent, so I suspect something is wrong with /oracle/.profile. Do you need this profile? Are there any specific settings you need? Even if there are, I would try without setting EnvFile, just to see whether Oracle starts under VCS.
Mike
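The advice above, to online and offline the Oracle resource through VCS before attempting any group switch, could be exercised from the command line roughly as follows. This is only a sketch: `vcs_resource_action` is a hypothetical wrapper name, the resource and node names are the ones from the main.cf posted earlier, and it assumes the standard `hares` command is in the PATH of the global zone.

```shell
# Hypothetical wrapper: online or offline a single resource on one node
# via VCS, then show the state VCS reports for it.
vcs_resource_action() {
    action=$1   # "online" or "offline"
    res=$2      # e.g. testzone_oracle_testdb
    sys=$3      # e.g. node1

    case "$action" in
        online|offline)
            hares "-$action" "$res" -sys "$sys"
            ;;
        *)
            echo "usage: vcs_resource_action online|offline <resource> <system>" >&2
            return 2
            ;;
    esac

    # The request is asynchronous, so this may still show the old state;
    # rerun "hares -state" until it settles.
    hares -state "$res"
}

# Example:
#   vcs_resource_action online  testzone_oracle_testdb node1
#   vcs_resource_action offline testzone_oracle_testdb node1
```

If the resource is left faulted by an earlier attempt, it may need `hares -clear <resource> -sys <system>` before VCS will try to online it again.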
08-25-2011 02:19 AM
Hi, Mike
There is no Oracle_A.log in the local zone.
I have also tried removing EnvFile, bringing the database online manually in sqlplus, and then using the Java console to switch this resource. Still no use: it hangs while shutting down the database.
Thanks,
Leo
08-25-2011 02:28 AM
You could try removing the Pfile attribute, as you don't need it. If you do use it, you need to specify a pfile, and it looks as though you have specified an spfile.
Mike
08-25-2011 02:31 AM
The Pfile attribute has been removed, but it still fails.
08-25-2011 04:54 AM
Is there anyone who can look at this and let me know how to proceed?
Thanks a lot in advance.
08-25-2011 10:41 PM
Is there anyone who can help me with this?
08-25-2011 11:58 PM
You should open a Symantec Support case. The only other thing I can think of to check is that you have the right types file. Some agents ship with both a 5.0 and a 5.1 types file, so you should check that you have a ContainerOpts attribute and not a ContainerName attribute.
Mike
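One way to run the types-file check described above is a simple text search. The sketch below assumes the active types file is /etc/VRTSvcs/conf/config/OracleTypes.cf (the usual configuration directory; adjust if yours differs), and `check_container_attr` is a hypothetical helper name.

```shell
# Hypothetical helper: report which container attribute a types file
# declares. ContainerOpts is the one to expect on 5.1 according to the
# post above; ContainerName points to an older (5.0) types file.
check_container_attr() {
    types_file=$1
    if grep 'ContainerOpts' "$types_file" >/dev/null; then
        echo "ContainerOpts found in $types_file"
    elif grep 'ContainerName' "$types_file" >/dev/null; then
        echo "ContainerName found in $types_file (possibly a 5.0 types file)"
    else
        echo "no container attribute found in $types_file"
    fi
}

# Example:
#   check_container_attr /etc/VRTSvcs/conf/config/OracleTypes.cf
```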
08-28-2011 06:57 PM
Thank you very much. Mike.
I will open a ticket with Symantec to ask for help and will let you know when it is closed.
Thanks again and have a nice day.
Leo
08-29-2011 06:52 PM
I noticed the following entry in the engine log:
==============================================
VCS WARNING V-16-1-52529 Login Incorrect, Invalid username/password
==============================================
This can result from an improperly formed .vcspwd/halogin configuration. HAD may very well be unable to determine the status of the Oracle instance. One would think that if the listener is working, then the database should follow suit.
You may want to try running the hazonesetup command or manually try the following:
globalzone> hauser -update z_zoners_node1
set it to "password"
globalzone> hauser -update z_zoners_node2
set it to password
localzone> VCS_HOST=PROD_CLUSTER_IP (You showed X.X.X.X in your main.cf)
localzone> export VCS_HOST
localzone> halogin z_zoners_node1 password
localzone> halogin z_zoners_node2 password
localzone> more /.vcspwd
it should look something like this:
100 PROD_Cluster_IP z_zoners_node1 nkInnNmnKnoMmghu
100 PROD_Cluster_IP z_zoners_node2 fopHojOlpKppNxpJom
This is also a good reference.
http://www.symantec.com/business/support/index?page=content&id=TECH159144
Give the Database Agent another try and tail the engine_logs.
Hope this helps.
Joe D
09-01-2011 01:22 AM
Hi all, thanks for your great help. It turns out we needed to install SF 5.1 SP1 to support the Oracle 11g database, because it's Oracle 11gR2.
Thanks again for everyone's great help; I will close this ticket now.
09-01-2011 07:26 AM
It's very interesting that upgrading to 5.1 SP1 resolved the issue. 5.1 natively supports Oracle 11gR2 from a VCS perspective (assuming 64-bit Solaris 10). The SFDB tools, however, do not. Here is the latest DB support matrix.
http://www.symantec.com/business/support/index?page=content&id=DOC4039
Take a look at page 8.
Joe D
09-01-2011 12:12 PM
Please remember to mention ALL relevant info in future....
If we knew a week ago this was 11gR2, we would've checked compatibility for you....