03-02-2012 03:47 AM
When I try to bring an Oracle resource online via Cluster Manager, the resource is NOT reported online even after the instance is actually running (its processes are up). VCS logs the following messages, runs the 'clean' entry point, and SHUTS DOWN an active instance. It does not appear to be a system resource problem (CPU, disk speed, ...). Version: SFHA 5.1 SP1 RP2P2. Any ideas?
2012/02/29 19:39:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource ora_pprod10 (Owner: Unspecified, Group: pprod10) on System caspio
2012/02/29 19:44:04 VCS WARNING V-16-2-13012 (caspio) Resource(ora_pprod10): online procedure did not complete within the expected time.
2012/02/29 19:44:04 VCS ERROR V-16-2-13065 (caspio) Agent is calling clean for resource(ora_pprod10) because online did not complete within the expected time.
2012/02/29 19:44:06 VCS NOTICE V-16-20002-25 (caspio) Oracle:ora_pprod10:clean:Oracle shutdown returned the output
+--------------------------------------------------------------------+
LD_LIBRARY_PATH - /database/product/10.2.0.4/lib:/usr/lib:
ORACLE instance shut down.
+====================================================================+
2012/02/29 19:44:06 VCS INFO V-16-2-13068 (caspio) Resource(ora_pprod10) - clean completed successfully.
2012/02/29 19:44:06 VCS INFO V-16-2-13071 (caspio) Resource(ora_pprod10): reached OnlineRetryLimit(0).
2012/02/29 19:44:06 VCS ERROR V-16-1-10303 Resource ora_pprod10 (Owner: Unspecified, Group: pprod10) is FAULTED (timed out) on sys caspio
2012/02/29 19:44:37 VCS INFO V-16-1-10307 Resource ora_pprod10 (Owner: Unspecified, Group: pprod10) is offline on caspio (Not initiated by VCS)
Thanks in advance.
03-02-2012 10:07 AM
Something wrong with Oracle resource definition?
Please post ora_pprod10 resource definition in main.cf.
03-05-2012 02:50 AM
From main.cf:
Oracle ora_pprod10 (
Pfile = "/database/product/10.2.0.4/dbs/initpprod10.ora"
Owner = oracle10
User = vcs
Table = sys_alive
Home = "/database/product/10.2.0.4"
StartUpOpt = STARTUP
Sid = pprod10
Pword = HVNtKVkNInJNkTUvKTk
AgentDebug = 1
LevelTwoMonitorFreq = 1
)
From OracleTypes.cf:
type Oracle (
static boolean IntentionalOffline = 0
static str AgentDirectory = "/opt/VRTSagents/ha/bin/Oracle"
static keylist SupportedActions = { VRTS_GetInstanceName, VRTS_GetRunningServices, DBRestrict, DBUndoRestrict, DBResume, DBSuspend, DBTbspBackup, "home.vfd", "owner.vfd", getid, "pfile.vfd" }
static int RestartLimit = 2
static str ArgList[] = { Sid, Owner, Home, Pfile, StartUpOpt, ShutDownOpt, DBAUser, DBAPword, EnvFile, AutoEndBkup, DetailMonitor, User, Pword, Table, MonScript, AgentDebug, Encoding, MonitorOption }
static int ContainerOpts{} = { RunInContainer=1, PassCInfo=0 }
static str IMFRegList[] = { Home, Owner, Sid, MonitorOption }
str Pfile
str Owner
str User
str Table
str EnvFile
int DetailMonitor
str Home
str StartUpOpt = STARTUP_FORCE
int MonitorOption
str MonScript = "./bin/Oracle/SqlTest.pl"
boolean AutoEndBkup = 1
str Encoding
str Sid
str DBAPword
str Pword
str DBAUser
str ShutDownOpt = IMMEDIATE
boolean AgentDebug = 0
)
03-05-2012 03:07 AM
The problem is probably that you specified Pfile. Oracle usually uses an SPfile, and if that is the case here you should leave the Pfile attribute blank. If this is not the issue, then I would try unsetting the optional attributes:
User = vcs
Table = sys_alive
Pword = HVNtKVkNInJNkTUvKTk
LevelTwoMonitorFreq = 1
If it works without setting these, then there is a problem with the second level monitoring.
If still having problems, then also give output of "ps -ef | grep pmon"
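A plain "grep pmon" can also match unrelated lines, so it may help to test for the exact process name. The following is a minimal sketch, not part of the agent: the function name has_pmon is made up for illustration, and it assumes the PMON process appears as ora_pmon_<SID> in the last field of the ps output, as in the listings in this thread. On Solaris, nawk may be needed instead of awk for the -v option.

```shell
#!/bin/sh
# has_pmon SID (hypothetical helper): succeed if the ps listing on stdin
# contains the PMON background process for that instance, i.e. a line
# whose last field is exactly ora_pmon_<SID>.
has_pmon() {
    awk -v name="ora_pmon_$1" '$NF == name { found = 1 } END { exit !found }'
}

# Sample line copied from a ps listing for this instance.
sample='oracle10 27031 1 0 Feb 29 ? 3:21 ora_pmon_pprod10'

if printf '%s\n' "$sample" | has_pmon pprod10; then
    echo "pmon for pprod10 is running"
fi

# Against the live system:
#   ps -ef | has_pmon pprod10 && echo running
```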
Mike
03-05-2012 08:26 AM
The processes are running after the 'startup' entry point is executed, but the monitor (with DetailMonitor=1 and LevelTwoMonitorFreq=1) doesn't seem to detect this.
At the moment, the Oracle resource "ora_pprod10" works fine with DetailMonitor=0.
The "Pfile" parameter is defined the same way, with DetailMonitor=1, on the other 18 Oracle resources, and they all work fine.
###################################3
# ps -ef | grep pmon
oracle10 29874 1 0 Feb 29 ? 0:19 /database/product/10.2.0.4/bin/tnslsnr listener_pprod10 -inherit
# ps -ef | grep pprod10
oracle10 27088 1 0 Feb 29 ? 0:20 ora_qmnc_pprod10
oracle10 27041 1 0 Feb 29 ? 2:40 ora_ckpt_pprod10
oracle10 27033 1 0 Feb 29 ? 0:11 ora_psp0_pprod10
oracle10 27045 1 0 Feb 29 ? 0:00 ora_reco_pprod10
oracle10 27035 1 0 Feb 29 ? 0:13 ora_mman_pprod10
oracle10 27081 1 0 Feb 29 ? 0:06 ora_arc0_pprod10
oracle10 27037 1 0 Feb 29 ? 1:11 ora_dbw0_pprod10
oracle10 27051 1 0 Feb 29 ? 8:07 ora_mmnl_pprod10
oracle10 27043 1 0 Feb 29 ? 0:32 ora_smon_pprod10
oracle10 27371 1 0 Feb 29 ? 0:00 ora_q001_pprod10
oracle10 27039 1 0 Feb 29 ? 3:46 ora_lgwr_pprod10
oracle10 27031 1 0 Feb 29 ? 3:21 ora_pmon_pprod10
oracle10 27049 1 0 Feb 29 ? 0:52 ora_mmon_pprod10
oracle10 27047 1 0 Feb 29 ? 6:11 ora_cjq0_pprod10
oracle10 15361 1 0 11:17:42 ? 0:00 ora_q000_pprod10
oracle10 1518 1517 0 Mar 01 ? 23:47 oraclepprod10 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle10 27083 1 0 Feb 29 ? 0:11 ora_arc1_pprod10
oracle10 3680 1 0 05:06:07 ? 0:01 ora_q004_pprod10
oracle10 21923 1 0 10:50:20 ? 0:00 ora_q002_pprod10
oracle10 12072 1 0 Mar 01 ? 0:09 oraclepprod10 (LOCAL=NO)
...
# ps -ef | grep -i oracle|grep -i agent
root 8829 1 0 Feb 29 ? 31:41 /opt/VRTSagents/ha/bin/Oracle/OracleAgent -type Oracle -agdir /opt/VRTSagents/h
# ptree 8829
8829 /opt/VRTSagents/ha/bin/Oracle/OracleAgent -type Oracle -agdir /opt/VRTSagents/h
03-05-2012 09:39 AM
Additional parameters are needed for Level Two Monitoring.
See "Setting up detail monitoring for VCS agents for Oracle" in https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/vcs_oracle_agent_51sp1_sol.pdf
03-06-2012 06:56 AM
Looking at the Oracle VCS agent 5.1SP1, it's pretty confusing, as Symantec has introduced a new MonitorOption attribute, so now there is:
Primary or basic monitoring consisting of:
Process check
OR (NOT and) Healthcheck using Oracle Health Check API
This is determined by the new MonitorOption attribute
User = vcs
Table = sys_alive
Pword = HVNtKVkNInJNkTUvKTk
LevelTwoMonitorFreq = 1
It may be that you double-encrypted the password, i.e. if you add the password in the GUI you do not need to encrypt it, as the GUI encrypts it for you. If this is not the issue, then it is probably an Oracle issue, such as the Oracle user "vcs" not having the right privileges to update the table "sys_alive". You can have a look at the Perl script the agent uses, /opt/VRTSagents/ha/bin/Oracle/SqlTest.pl, and try the SQL statement manually in Oracle (I think it just updates a row in the table).
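A manual check along those lines could look like the sketch below. It is only an illustration: the exact statement in SqlTest.pl should be read from the script itself, the column name tstamp is an assumption about how the table was created, and detail_monitor_sql is a made-up helper name.

```shell
#!/bin/sh
# detail_monitor_sql USER TABLE (hypothetical helper): print a SQL*Plus
# script that attempts an update of the monitoring table.
# ASSUMPTION: the table has a DATE column named tstamp; adjust to match
# the real table definition and the statement found in SqlTest.pl.
detail_monitor_sql() {
    cat <<EOF
whenever sqlerror exit failure
update $1.$2 set tstamp = SYSDATE;
commit;
exit
EOF
}

# Print the script for the user and table configured on the resource.
detail_monitor_sql vcs sys_alive

# To run it as the Oracle owner, with ORACLE_HOME and ORACLE_SID set:
#   detail_monitor_sql vcs sys_alive > /tmp/check_sys_alive.sql
#   sqlplus -S vcs @/tmp/check_sys_alive.sql
```

If the update fails with a privilege or "table or view does not exist" error when run as "vcs", that points at the Oracle side rather than at the agent.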
Mike
03-06-2012 03:37 PM
As an update, we've forwarded this thread along to our documentation team and the group responsible for rolling upgrades, so that they can make any necessary changes. Thanks for pointing this out, Mike!
Best,
Kimberley
04-24-2012 04:08 AM
The online entry point timing out is a completely different issue from detail monitoring not working.
The error below suggests that Oracle takes a significantly long time (5+ minutes) to start this particular database instance.
"2012/02/29 19:44:04 VCS WARNING V-16-2-13012 (caspio) Resource(ora_pprod10): online procedure did not complete within the expected time."
The VCS online operation makes a blocking call to the sqlplus utility. The call returns immediately in case of any failure. The timeouts mean that the call to sqlplus did not return.
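The interval can be read straight off the log: online was initiated at 19:39:03 and the timeout was reported at 19:44:04. The sketch below just does that arithmetic. The helper names are made up for illustration and it assumes both timestamps fall on the same day. The result is consistent with a 300-second OnlineTimeout, which I believe is the default for that attribute, but check the value actually configured on the Oracle type or resource.

```shell
#!/bin/sh
# to_seconds HH:MM:SS (hypothetical helper): convert a time of day to
# seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed_seconds START END (hypothetical helper): difference in seconds
# between two timestamps on the same day.
elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Timestamps of the V-16-1-10301 and V-16-2-13012 messages in the log.
elapsed_seconds 19:39:03 19:44:04
```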
***
The issue with detail monitoring not working seems to be due to access permissions. The user needs to have read and write permissions on the table. Have you granted those permissions to the user?