Forum Discussion

Robur
14 years ago

Oracle agent: "online procedure did not complete within the expected time" but Oracle instance is UP

When I try to bring an Oracle resource online via Cluster Manager, the resource is NOT reported online even though the instance is actually running (its processes are up). VCS logs the following messages, runs the 'clean' entry point and shuts down the active instance. This does not appear to be a system resource problem (CPU, disk speed, ...). Version: SFHA 5.1 SP1 RP2P2. Any ideas?

 

2012/02/29 19:39:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource ora_pprod10 (Owner: Unspecified, Group: pprod10) on System caspio
2012/02/29 19:44:04 VCS WARNING V-16-2-13012 (caspio) Resource(ora_pprod10): online procedure did not complete within the expected time.
2012/02/29 19:44:04 VCS ERROR V-16-2-13065 (caspio) Agent is calling clean for resource(ora_pprod10) because online did not complete within the expected time.
2012/02/29 19:44:06 VCS NOTICE V-16-20002-25 (caspio) Oracle:ora_pprod10:clean:Oracle shutdown returned the output
+--------------------------------------------------------------------+
LD_LIBRARY_PATH - /database/product/10.2.0.4/lib:/usr/lib:
ORACLE instance shut down.
+====================================================================+
2012/02/29 19:44:06 VCS INFO V-16-2-13068 (caspio) Resource(ora_pprod10) - clean completed successfully.
2012/02/29 19:44:06 VCS INFO V-16-2-13071 (caspio) Resource(ora_pprod10): reached OnlineRetryLimit(0).
2012/02/29 19:44:06 VCS ERROR V-16-1-10303 Resource ora_pprod10 (Owner: Unspecified, Group: pprod10) is FAULTED (timed out) on sys caspio
2012/02/29 19:44:37 VCS INFO V-16-1-10307 Resource ora_pprod10 (Owner: Unspecified, Group: pprod10) is offline on caspio (Not initiated by VCS)

 

Thanks in advance.

8 Replies

  • Something wrong with Oracle resource definition?

    Please post ora_pprod10 resource definition in main.cf.

  • From main.cf:

            Oracle ora_pprod10 (
                    Pfile = "/database/product/10.2.0.4/dbs/initpprod10.ora"
                    Owner = oracle10
                    User = vcs
                    Table = sys_alive
                    Home = "/database/product/10.2.0.4"
                    StartUpOpt = STARTUP
                    Sid = pprod10
                    Pword = HVNtKVkNInJNkTUvKTk
                    AgentDebug = 1
                    LevelTwoMonitorFreq = 1
                    )

     

    From OracleTypes.cf:

    type Oracle (
            static boolean IntentionalOffline = 0
            static str AgentDirectory = "/opt/VRTSagents/ha/bin/Oracle"
            static keylist SupportedActions = { VRTS_GetInstanceName, VRTS_GetRunningServices, DBRestrict, DBUndoRestrict, DBResume, DBSuspend, DBTbspBackup, "home.vfd", "owner.vfd", getid, "pfile.vfd" }
            static int RestartLimit = 2
            static str ArgList[] = { Sid, Owner, Home, Pfile, StartUpOpt, ShutDownOpt, DBAUser, DBAPword, EnvFile, AutoEndBkup, DetailMonitor, User, Pword, Table, MonScript, AgentDebug, Encoding, MonitorOption }
            static int ContainerOpts{} = { RunInContainer=1, PassCInfo=0 }
            static str IMFRegList[] = { Home, Owner, Sid, MonitorOption }
            str Pfile
            str Owner
            str User
            str Table
            str EnvFile
            int DetailMonitor
            str Home
            str StartUpOpt = STARTUP_FORCE
            int MonitorOption
            str MonScript = "./bin/Oracle/SqlTest.pl"
            boolean AutoEndBkup = 1
            str Encoding
            str Sid
            str DBAPword
            str Pword
            str DBAUser
            str ShutDownOpt = IMMEDIATE
            boolean AgentDebug = 0
    )

     

  • The problem is probably that you specified Pfile: Oracle usually uses an SPfile, and if that is the case here you should leave the Pfile attribute blank. If this is not the issue, then I would try unsetting the optional attributes:

                    User = vcs
                    Table = sys_alive
                    Pword = HVNtKVkNInJNkTUvKTk

                    LevelTwoMonitorFreq = 1

    If it works without setting these, then there is a problem with the second level monitoring.

    If you are still having problems, please also post the output of "ps -ef | grep pmon"

    Mike

  • The processes are running after the 'startup' entry point is executed, but the monitor (with DetailMonitor=1 and LevelTwoMonitorFreq=1) does not seem to detect this.

    At the moment, the Oracle resource "ora_pprod10" works fine with DetailMonitor=0.

    The "Pfile" parameter is defined the same way, with DetailMonitor=1, on the other 18 Oracle resources, and they all work fine.

    ###################################3

    # ps -ef | grep pmon

    oracle10 29874     1   0   Feb 29 ?           0:19 /database/product/10.2.0.4/bin/tnslsnr listener_pprod10 -inherit
     

    # ps -ef | grep pprod10

    oracle10 27088     1   0   Feb 29 ?           0:20 ora_qmnc_pprod10
    oracle10 27041     1   0   Feb 29 ?           2:40 ora_ckpt_pprod10
    oracle10 27033     1   0   Feb 29 ?           0:11 ora_psp0_pprod10
    oracle10 27045     1   0   Feb 29 ?           0:00 ora_reco_pprod10
    oracle10 27035     1   0   Feb 29 ?           0:13 ora_mman_pprod10
    oracle10 27081     1   0   Feb 29 ?           0:06 ora_arc0_pprod10
    oracle10 27037     1   0   Feb 29 ?           1:11 ora_dbw0_pprod10
    oracle10 27051     1   0   Feb 29 ?           8:07 ora_mmnl_pprod10
    oracle10 27043     1   0   Feb 29 ?           0:32 ora_smon_pprod10
    oracle10 27371     1   0   Feb 29 ?           0:00 ora_q001_pprod10
    oracle10 27039     1   0   Feb 29 ?           3:46 ora_lgwr_pprod10
    oracle10 27031     1   0   Feb 29 ?           3:21 ora_pmon_pprod10
    oracle10 27049     1   0   Feb 29 ?           0:52 ora_mmon_pprod10
    oracle10 27047     1   0   Feb 29 ?           6:11 ora_cjq0_pprod10
    oracle10 15361     1   0 11:17:42 ?           0:00 ora_q000_pprod10
    oracle10  1518  1517   0   Mar 01 ?          23:47 oraclepprod10 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
    oracle10 27083     1   0   Feb 29 ?           0:11 ora_arc1_pprod10
    oracle10  3680     1   0 05:06:07 ?           0:01 ora_q004_pprod10
    oracle10 21923     1   0 10:50:20 ?           0:00 ora_q002_pprod10
    oracle10 12072     1   0   Mar 01 ?           0:09 oraclepprod10 (LOCAL=NO)
    ...

    # ps -ef | grep -i oracle|grep -i agent
        root  8829     1   0   Feb 29 ?          31:41 /opt/VRTSagents/ha/bin/Oracle/OracleAgent -type Oracle -agdir /opt/VRTSagents/h

    # ptree 8829
    8829  /opt/VRTSagents/ha/bin/Oracle/OracleAgent -type Oracle -agdir /opt/VRTSagents/h
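
A listing like the one above can also be checked programmatically. The following is a minimal sketch, not the agent's actual implementation: the function name `missing_core_processes` and the set of "core" background processes are assumptions made for illustration, and the sample lines are copied from the `ps -ef` output in this post.

```python
import re

# Assumed set of core background processes; the agent's real list may differ.
CORE_PROCESSES = ("pmon", "smon", "lgwr", "dbw0")


def missing_core_processes(ps_output: str, sid: str) -> list[str]:
    """Return the assumed core background processes not found for the SID."""
    found = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Oracle background processes appear as ora_<name>_<SID> in the
        # last column of `ps -ef`.
        match = re.fullmatch(r"ora_([a-z0-9]+)_" + re.escape(sid), fields[-1])
        if match:
            found.add(match.group(1))
    return [name for name in CORE_PROCESSES if name not in found]


sample = """\
oracle10 27041     1   0   Feb 29 ?           2:40 ora_ckpt_pprod10
oracle10 27037     1   0   Feb 29 ?           1:11 ora_dbw0_pprod10
oracle10 27043     1   0   Feb 29 ?           0:32 ora_smon_pprod10
oracle10 27039     1   0   Feb 29 ?           3:46 ora_lgwr_pprod10
oracle10 27031     1   0   Feb 29 ?           3:21 ora_pmon_pprod10
"""

# An empty list means every assumed core process was found for this SID.
print(missing_core_processes(sample, "pprod10"))
```

For the sample above the list is empty, which is consistent with the observation that the instance itself is up and the problem lies in how the resource is monitored.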
     


     

  • Additional parameters are needed for Level Two Monitoring.

    See "Setting up detail monitoring for VCS agents for Oracle" in https://sort.symantec.com/public/documents/sfha/5.1sp1/solaris/productguides/pdf/vcs_oracle_agent_51sp1_sol.pdf

     

  • Looking at the Oracle VCS agent in 5.1SP1, it's pretty confusing, as Symantec has introduced a new MonitorOption attribute, so now there is:

    Primary or basic monitoring consisting of:

      Process check

      OR (NOT and) Healthcheck using Oracle Health Check API

      This is determined by the new MonitorOption attribute

     
    AND you can optionally use detail (second-level) monitoring, which is enabled by the DetailMonitor attribute. This has changed since 5.1 (without SP1): previously DetailMonitor could be set greater than 1 to run the second-level monitor every nth monitor interval, but now you need to use the LevelTwoMonitorFreq attribute for this.
     
    The "DetailMonitor" attribute was not present in your initial post of main.cf, so detail monitoring was set to 0. You say in your last post that the Oracle monitor works with DetailMonitor=0 but not when it is set to 1, yet you seem to have the right additional parameters set:
     

                    User = vcs
                    Table = sys_alive
                    Pword = HVNtKVkNInJNkTUvKTk

                    LevelTwoMonitorFreq = 1

    It may be that you double-encrypted the password: if you add the password in the GUI, you do not need to encrypt it, as the GUI encrypts it for you. If this is not the issue, then it is probably an Oracle issue, such as the Oracle user "vcs" not having the right privileges to update the table "sys_alive". You can have a look at the Perl script the agent uses, /opt/VRTSagents/ha/bin/Oracle/SqlTest.pl, and try the SQL statement manually in Oracle (I think it just updates a row in the table).

    Mike
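
To illustrate the "every nth monitor interval" behaviour described in this reply, here is a small sketch. It only models the scheduling idea; the function name `runs_second_level`, the choice to count cycles from 1, and the treatment of 0 as "disabled" are assumptions for illustration, and the real agent framework may align the cycles differently.

```python
def runs_second_level(cycle: int, level_two_monitor_freq: int) -> bool:
    """Return True if second-level monitoring would run on this monitor cycle.

    `cycle` counts monitor cycles starting at 1. A frequency of 0 (or less)
    is treated as "second-level monitoring disabled" (an assumption).
    """
    if level_two_monitor_freq <= 0:
        return False
    return cycle % level_two_monitor_freq == 0


for freq in (1, 3):
    cycles = [c for c in range(1, 7) if runs_second_level(c, freq)]
    print(f"LevelTwoMonitorFreq={freq}: second-level check on cycles {cycles}")
```

With LevelTwoMonitorFreq = 1, as in the posted main.cf, the second-level check runs on every monitor cycle in this model, so a failing SQL test would affect every monitor result rather than only an occasional one.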

     
  • As an update, we've forwarded this thread along to our documentation team and the group responsible for rolling upgrades, so that they can make any necessary changes. Thanks for pointing this out, Mike!

    Best,

    Kimberley

  • The online entry point timing out is a completely different issue from detail monitoring not working.

    The error below suggests that Oracle is taking a very long time (5+ minutes) to start this particular database instance.

    "2012/02/29 19:44:04 VCS WARNING V-16-2-13012 (caspio) Resource(ora_pprod10): online procedure did not complete within the expected time."


    The VCS online operation makes a blocking call to the sqlplus utility. The call returns immediately in case of any failure. The timeout means that the call to sqlplus did not return.
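
The "5+ minutes" figure can be read directly from the engine log timestamps in the original post. The sketch below parses the two relevant lines and prints the elapsed time; the helper name `log_time` is hypothetical, and the remark about a 300-second OnlineTimeout is an assumption that should be verified against the actual cluster configuration.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def log_time(line: str) -> datetime:
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' timestamp of an engine log line."""
    return datetime.strptime(line[:19], LOG_TIME_FORMAT)


online_started = (
    "2012/02/29 19:39:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource "
    "ora_pprod10 (Owner: Unspecified, Group: pprod10) on System caspio"
)
online_timed_out = (
    "2012/02/29 19:44:04 VCS WARNING V-16-2-13012 (caspio) Resource(ora_pprod10): "
    "online procedure did not complete within the expected time."
)

elapsed = (log_time(online_timed_out) - log_time(online_started)).total_seconds()
# Assumption: an OnlineTimeout of 300 seconds would explain a timeout
# logged roughly 300 seconds after the online was initiated.
print(f"online ran for {elapsed:.0f} seconds before the timeout was logged")
```

The two timestamps are 301 seconds apart, which fits an online timeout of about 300 seconds expiring while the sqlplus call was still blocked.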

    ***

    The issue with detail monitoring not working seems to be due to access permissions. The user needs read and write permissions on the table. Have you granted those permissions to the user?