Monitor Application w/in Solaris Zone - but not failover zone

Suki
Level 3

Hello everyone, I actually have two questions. The first centers on installing VCS in the Solaris 10 u8 global zone but only monitoring applications within a non-global zone (e.g. Apache). The zone root is installed on local storage, and the application is also installed on local storage. We only want the Apache application shut down on one node and started up on the other. How would we do this? Every resource dependency example I see shows DG, Mount, etc.

The second question: can VCS be installed within a sparse non-global zone? And if so, how do we accomplish the same scenario?

Thanks

Julie  julie.lamothe@sita.aero

10 REPLIES

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

Please check the attached diagrams; your configuration should fit one of the two. If not, please let me know how the file systems used by the zones are mounted and where they are located (on a disk group, or simply on the O/S disks). If they are on the O/S disks and always available, you should be able to simply remove the DiskGroup and Mount resources.

Suki
Level 3

 

Riaan -

I actually have the non-global sparse zone configured using zpools, so that takes the place of the Mount and DiskGroup resources. I have the application ZFS filesystems set as datasets within the zone, with the ZFS mountpoints set to /www and /app within the zone.

We decided to make it shared storage and fail the zone (with Apache) over from one node to the other. However, when I now try to bring up the Apache resource, it states that it cannot find the ConfigFile /zones/z2/root/www/current/conf/httpd.conf.

I have the ContainerOpts attribute set to RunInContainer 1, PassCInfo 0.

Here is the zone config:

# zonecfg -z z2 info
zonename: z2
zonepath: /zones/z2
brand: native
autoboot: true
bootargs:
pool: UASLQYY001Q_zpool_vcs
limitpriv:
scheduling-class:
ip-type: shared
hostid:
[cpu-shares: 1000]
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
fs:
        dir: /export/home
        special: /export/home/z2
        raw not specified
        type: lofs
        options: [rw]
net:
        address: 10.128.16.141/22
        physical: bge0
        defrouter not specified
dataset:
        name: UASLQYY001Q_apppool_vcs/app
dataset:
        name: UASLQYY001Q_apppool_vcs/www
rctl:
        name: zone.cpu-shares
        value: (priv=privileged,limit=1000,action=none)

 

And here is the Apache agent resource config:

#Resource    Attribute          System          Value
sliq_apache  Group              global          sliq_sg
sliq_apache  Type               global          Apache
sliq_apache  AutoStart          global          1
sliq_apache  Critical           global          0
sliq_apache  Enabled            global          1
sliq_apache  LastOnline         global
sliq_apache  MonitorOnly        global          0
sliq_apache  ResourceOwner      global          unknown
sliq_apache  TriggerEvent       global          0
sliq_apache  ArgListValues      UASLQYY001Q-GLZ ResLogLevel 1 INFO State 1 1 IState 1 0 httpdDir 1 /zones/z2/root/www/current/conf/ SharedObjDir 1 "" EnvFile 1 "" PidFile 1 /zones/z2/root/www/logs/httpd.pid HostName 1 "" Port 1 80 User 1 www SecondLevelMonitor 1 0 SecondLevelTimeout 1 30 ConfigFile 1 /zones/z2/root/www/current/conf/httpd.conf EnableSSL 1 0 DirectiveAfter 0 DirectiveBefore 0
sliq_apache  ArgListValues      UASLQYY002Q-GLZ ResLogLevel 1 INFO State 1 0 IState 1 0 httpdDir 1 /zones/z2/root/www/current/conf/ SharedObjDir 1 "" EnvFile 1 "" PidFile 1 /zones/z2/root/www/logs/httpd.pid HostName 1 "" Port 1 80 User 1 www SecondLevelMonitor 1 0 SecondLevelTimeout 1 30 ConfigFile 1 /zones/z2/root/www/current/conf/httpd.conf EnableSSL 1 0 DirectiveAfter 0 DirectiveBefore 0
sliq_apache  ConfidenceLevel    UASLQYY001Q-GLZ 0
sliq_apache  ConfidenceLevel    UASLQYY002Q-GLZ 0
sliq_apache  ConfidenceMsg      UASLQYY001Q-GLZ
sliq_apache  ConfidenceMsg      UASLQYY002Q-GLZ
sliq_apache  Flags              UASLQYY001Q-GLZ
sliq_apache  Flags              UASLQYY002Q-GLZ
sliq_apache  IState             UASLQYY001Q-GLZ not waiting
sliq_apache  IState             UASLQYY002Q-GLZ not waiting
sliq_apache  MonitorMethod      UASLQYY001Q-GLZ Traditional
sliq_apache  MonitorMethod      UASLQYY002Q-GLZ Traditional
sliq_apache  Probed             UASLQYY001Q-GLZ 1
sliq_apache  Probed             UASLQYY002Q-GLZ 1
sliq_apache  Start              UASLQYY001Q-GLZ 1
sliq_apache  Start              UASLQYY002Q-GLZ 0
sliq_apache  State              UASLQYY001Q-GLZ FAULTED
sliq_apache  State              UASLQYY002Q-GLZ OFFLINE
sliq_apache  ComputeStats       global          0
sliq_apache  ConfigFile         global          /zones/z2/root/www/current/conf/httpd.conf
sliq_apache  ContainerInfo      global          Type    Zone    Name    z2      Enabled 1
sliq_apache  ContainerOpts      global          RunInContainer  1       PassCInfo       0
sliq_apache  DirectiveAfter     global
sliq_apache  DirectiveBefore    global
sliq_apache  EnableSSL          global          0
sliq_apache  EnvFile            global
sliq_apache  HostName           global
sliq_apache  PidFile            global          /zones/z2/root/www/logs/httpd.pid
sliq_apache  Port               global          80
sliq_apache  ResLogLevel        global          INFO
sliq_apache  ResourceInfo       global          State   Stale   Msg             TS
sliq_apache  SecondLevelMonitor global          0
sliq_apache  SecondLevelTimeout global          30
sliq_apache  SharedObjDir       global
sliq_apache  User               global          www
sliq_apache  httpdDir           global          /zones/z2/root/www/current/conf/
sliq_apache  MonitorTimeStats   UASLQYY001Q-GLZ Avg     0       TS
sliq_apache  MonitorTimeStats   UASLQYY002Q-GLZ Avg     0       TS

 

Any help would be greatly appreciated.

Thanks,

Julie

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi Julie,

 

OK, so I'm not an expert on zones, but you need to look at each component.

The first thing I want to know about is the storage. You said you've decided to make it shared storage, and you're using ZFS. So my question is: how do you intend to fail over/move the data from one node to the other? From what I know, we can't do that with VCS. Usually a web application can be configured with local disk, which the cluster would mount/unmount as required.

So, assuming you've got some method to do this, the next thing I want to know is whether the cluster can start your zone (resource), and whether all the mounts required to start Apache are present.

Can you also post your main.cf, please?

g_lee
Level 6

Julie,

What version (and patch level) of VCS are you using? (eg: VCS 5.0 MP3? VCS 5.1?)

Although most dependency examples do tend to show DiskGroup and Mount dependencies rather than Zpool, provided your VCS version has the bundled Zpool agent, you should be able to configure dependencies using the same logic. That is, it sounds like you need to configure a Zpool (and possibly also a Mount) resource and set up its dependencies with the Apache resource accordingly.

eg: VCS 5.0 MP3 Bundled Agents Reference Guide (Solaris) -> Storage Agents -> Zpool Agent
http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/html/vcs_bundled_agents/ch_sol_storage_agents47.html

Note the following limitations for Zpool agent:
----------
Limitations
The agent does not support the use of logical volumes in ZFS. If ZFS logical volumes are in use in the pool, the pool cannot be exported, even with the -f option. Sun does not recommend the use of logical volumes in ZFS due to performance and reliability issues.
----------

Dependencies example:
http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/html/vcs_bundled_agents/ch_sol_storage_agents50.html

From same document: Mount agent notes -> ZFS file system and pool creation example
http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/html/vcs_bundled_agents/ch_sol_storage_agents45.html#1184901
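
As a rough sketch of the Zpool suggestion above, the commands below add a Zpool resource to the service group and make the Apache resource depend on it. The resource name app_zpool is hypothetical, the pool name is taken from the dataset entries in the zone config earlier in the thread, and the whole thing assumes the Zpool agent is installed. If a Zone resource sits between the two, the links would go through it instead. By default the script only prints the commands so they can be reviewed; set DRY_RUN=0 to actually run them.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# add_zpool_resource <resource> <group> <pool> <parent-resource>
# <resource> is a hypothetical name chosen for this sketch.
add_zpool_resource() {
    res=$1; group=$2; pool=$3; parent=$4
    run haconf -makerw                         # open the cluster config for writing
    run hares -add "$res" Zpool "$group"       # new resource of type Zpool
    run hares -modify "$res" PoolName "$pool"  # pool to import/export on failover
    run hares -modify "$res" Enabled 1
    run hares -link "$parent" "$res"           # parent requires the Zpool resource
    run haconf -dump -makero                   # save and close the config
}

add_zpool_resource app_zpool sliq_sg UASLQYY001Q_apppool_vcs sliq_apache
```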

Hope that helps,
Grace

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Sheesh,

 

I seem to be out of date with the world. Can you let us know whether you have the Zpool agent available in your version? If you do, you'd need to add it as per the dependency diagram listed, and possibly add a Mount resource as well, depending on who you want to be responsible for mounting /app and /www (VCS or the zone).

Suki
Level 3

Hi Riaan -

 

I do have the zpool configured, and everything now works as far as failing over the zone, zpool, and Apache (within the zone).

Since I am using datasets for the separate zpool/ZFS filesystems configured within the zone, I had to point Apache to the filesystem as it resides within the zone (/www/current/etc) as opposed to how it is physically mounted in the global zone as /zones/z2/root/www/etc....
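
To illustrate the path change described above: a small sketch that turns a global-zone path into the path as seen from inside the zone, by stripping the "<zonepath>/root" prefix. The function name is made up for this example, and it assumes (as observed in this thread) that the agent resolves the Apache paths inside the zone when RunInContainer is 1.

```shell
# zone_relative_path <zonepath> <path-as-seen-from-global-zone>
# Hypothetical helper: prints the path as it appears inside the zone.
zone_relative_path() {
    zonepath=$1
    global_path=$2
    prefix="${zonepath%/}/root"      # e.g. /zones/z2/root
    case "$global_path" in
        "$prefix"/*) printf '%s\n' "${global_path#"$prefix"}" ;;  # strip the zone root
        *)           printf '%s\n' "$global_path" ;;              # not under the zone root: unchanged
    esac
}

zone_relative_path /zones/z2 /zones/z2/root/www/current/conf/httpd.conf
# prints /www/current/conf/httpd.conf
```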

The only problem I seem to have now is that the bundled Apache agent is not able to effectively monitor or restart the Apache processes if they die (killed forcefully by me as a test). It believes it is cleaning, but never actually restarts the processes.

 

Any ideas?

Thanks,

Julie

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hi,

Well, that's good news so far.

Does it fail over if you kill them? Have you set the restart attribute to something other than 0?

Can you post your engine_A and Apache logs?

Suki
Level 3

I turned the debug level up to:

Apache        LogDbg                  DBG_1     DBG_2   DBG_3   DBG_4   DBG_AGINFO      DBG_AGDEBUG

Here is the Apache_A.log:

2010/12/14 18:39:12 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Adding timer for Agent with tmo 522366
        VCSAgTimer.C:_add[724]
2010/12/14 18:39:12 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Timer id is 1
        VCSAgTimer.C:_add[740]
2010/12/14 18:39:55 VCS DBG_AGDEBUG V-16-50-0 Thread(2) name(Agent) op(1606)
        VCSAgTimer.C:check_timers[298]
2010/12/14 18:39:55 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Sending I am alive message
        VCSAgTimer.C:check_timers[479]
2010/12/14 18:39:55 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Sending IAA at 522366 seconds
        VCSAgNotifier.C:send_iamalive[910]
2010/12/14 18:39:55 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Resetting periodic timer for resource Agent op 1606 to expire at 522409
        VCSAgTimer.C:_reset_periodic_timer[1000]
2010/12/14 18:39:55 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Adding timer for Agent with tmo 522409
        VCSAgTimer.C:_add[724]
2010/12/14 18:39:55 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Timer id is 1
        VCSAgTimer.C:_add[740]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(2) name(sliq_apache) op(1619)
        VCSAgTimer.C:check_timers[298]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Resetting periodic timer for resource sliq_apache op 1619 to expire at 522677
        VCSAgTimer.C:_reset_periodic_timer[1000]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Adding timer for sliq_apache with tmo 522677
        VCSAgTimer.C:_add[724]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Timer id is 16
        VCSAgTimer.C:_add[740]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Appending command minor code 1607 for resource sliq_apache
        VCSAgRes.C:append_cmd[466]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(2) Scheduled resource sliq_apache
        VCSAgSched.C:put_req[173]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) Picked Res(sliq_apache) from Scheduler
        VCSAgSched.C:_dequeue[64]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) Resource (sliq_apache) received cmd minor code (MSG_AGI_MONITOR_TIMER)
        VCSAgRes.C:process_cmd[5594]

2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) Res(sliq_apache) - incremented _monitor_count to (1)
        VCSAgRes.C:inc_and_compare_monitor_count[10404]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) Res(sliq_apache) - _leveltwo_monitor_freq : 1
        VCSAgRes.C:inc_and_compare_monitor_count[10436]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) Res(sliq_apache) - _monitor_levels : 3
        VCSAgRes.C:inc_and_compare_monitor_count[10449]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) _monitor_count reached _monitor_count_overflow; re-wound _monitor_count to 0
        VCSAgRes.C:inc_and_compare_monitor_count[10456]
2010/12/14 18:40:06 VCS DBG_AGDEBUG V-16-50-0 Thread(4) Res(sliq_apache) - _monitor_count is (0), _monitor_count_overflow is (1)
        VCSAgRes.C:inc_and_compare_monitor_count[10460]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) Resource sliq_apache transitioning from Offline to Monitoring
        VCSAgRes.C:internal_state[4901]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) The values of Container Object attributes are given below
        VCSAgRes.C:get_container_object[2308]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) The RIC for the Resource is 1 and PassCInfo is 0
        VCSAgRes.C:get_container_object[2361]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) Container created with Name z2, Type Zone and Enabled 1
        VCSAgRes.C:get_container_object[2376]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) Creating object of Zone container
        VCSAgContainer.C:create_container[222]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) The values of ArgList attributes are given below
        VCSAgRes.C:call_entry_point[1129]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) arg[0] is (ResLogLevel)
        VCSAgRes.C:call_entry_point[1140]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) arg[1] is (1)
        VCSAgRes.C:call_entry_point[1140]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) arg[2] is (TRACE)
        VCSAgRes.C:call_entry_point[1140]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) arg[3] is (State)
        VCSAgRes.C:call_entry_point[1140]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) arg[4] is (1)
        VCSAgRes.C:call_entry_point[1140]
2010/12/14 18:40:06 VCS DBG_AGINFO V-16-50-0 Thread(4) arg[5] is (1)
        VCSAgRes.C:call_entry_point[1140]

Here is output from engine_A.log:

2010/12/14 18:29:54 VCS ERROR V-16-2-13067 (UASLQYY001Q-GLZ) Agent is calling clean for resource(sliq_apache) because the resource became OFFLINE unexpectedly, on its own.
2010/12/14 18:29:54 VCS NOTICE V-16-10061-20143 (UASLQYY001Q-GLZ) Apache:sliq_apache:clean:VCSagentFW:SetupLogging:[clean] Entered by resource instance [sliq_apache] with clean reason [4][Unexpected Offline]
2010/12/14 18:30:05 VCS INFO V-16-2-13068 (UASLQYY001Q-GLZ) Resource(sliq_apache) - clean completed successfully.
2010/12/14 18:30:06 VCS INFO V-16-1-10307 Resource sliq_apache (Owner: unknown, Group: sliq_sg) is offline on UASLQYY001Q-GLZ (Not initiated by VCS)

I have just set the RestartLimit to 2.
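
For anyone repeating this test, a small sketch for counting how often the agent called clean after an unexpected offline, based on the V-16-2-13067 message shown in the engine log above. The function name is made up for this example, and the default log path assumes the standard VCS log location.

```shell
# count_unexpected_cleans <resource> [logfile]
# Hypothetical helper: counts V-16-2-13067 entries ("Agent is calling clean ...
# because the resource became OFFLINE unexpectedly") for one resource.
count_unexpected_cleans() {
    resource=$1
    logfile=${2:-/var/VRTSvcs/log/engine_A.log}   # assumed default log location
    grep -F 'V-16-2-13067' "$logfile" | grep -cF "resource($resource)"
}

# Usage (run on a cluster node):
#   count_unexpected_cleans sliq_apache
```

If the count keeps rising while the resource just ends up offline, the agent is cleaning the resource but not restarting it, which matches the behaviour described here.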

Suki
Level 3

That was it! I changed the RestartLimit to 2, restarted the Apache process, then killed the two httpd processes from the global zone, and it restarted!

Thanks for all your help!
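
For reference, a sketch of how the RestartLimit change above could be made from the command line. RestartLimit is a type-level attribute, so this version changes it for every resource of type Apache. By default the script only prints the commands so they can be reviewed; set DRY_RUN=0 to actually run them.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# set_restart_limit <resource-type> <limit>
set_restart_limit() {
    run haconf -makerw                          # open the cluster config for writing
    run hatype -modify "$1" RestartLimit "$2"   # restart attempts before the resource faults
    run haconf -dump -makero                    # save and close the config
}

set_restart_limit Apache 2
```

To change the value for a single resource rather than the whole type, the attribute can first be overridden with hares -override and then set with hares -modify on that resource.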

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Glad it's working :)