
Problem with Application Agent in VCS

Koven1
Level 4

Hi

I'd like to understand why I'm having problems adding Ops Center (the Oracle product, not NetBackup OpsCenter) to my cluster.

This is the configuration:

group opscenter-sg (
        SystemList = { bck01a = 0, bck01b = 1 }
        Enabled @bck01b = 0
        AutoStartList = { bck01b }
        OnlineRetryLimit = 2
        OnlineRetryInterval = 180
        )

        Application opscenter_app (
                StartProgram = "/opt/sun/xvmoc/bin/ecadm start"
                StopProgram = "/opt/sun/xvmoc/bin/ecadm stop -w"
                MonitorProcesses = { OCLISTENER }
                )

        DiskReservation reservation_dev_sdd (
                Enabled = 0
                Disks = { "/dev/opscenter" }
                )

        IP ops-app-ip (
                Device = "bond0.620"
                Address = "10.10.102.104"
                NetMask = "255.255.255.128"
                )

        IP ops-dmz-ip (
                Device = "bond0.622"
                Address = "10.10.102.244"
                NetMask = "255.255.255.192"
                )

        IP ops-ges-ip (
                Device = "bond0.610"
                Address = "10.10.101.104"
                NetMask = "255.255.255.0"
                )

        IP ops-int-ip (
                Device = "bond0.621"
                Address = "10.10.102.184"
                NetMask = "255.255.255.192"
                )

        Mount opscenter-mount (
                MountPoint = "/var/opt/sun"
                BlockDevice = "/dev/opscenter"
                FSType = ext4
                MountOpt = rw
                FsckOpt = "-y"
                )

        Proxy proxy-bck-app-nic (
                TargetResName = bck-app-nic
                )

        Proxy proxy-bck-dmz-nic (
                TargetResName = bck-dmz-nic
                )

        Proxy proxy-bck-ges-nic (
                TargetResName = bck-ges-nic
                )

        Proxy proxy-bck-int-nic (
                TargetResName = bck-int-nic
                )

        ops-app-ip requires proxy-bck-app-nic
        ops-dmz-ip requires proxy-bck-dmz-nic
        ops-ges-ip requires proxy-bck-ges-nic
        ops-int-ip requires proxy-bck-int-nic
        opscenter-mount requires ops-app-ip
        opscenter-mount requires ops-dmz-ip
        opscenter-mount requires ops-ges-ip
        opscenter-mount requires ops-int-ip
        opscenter_app requires opscenter-mount

This is the error I get every time. I've tried setting OnlineRetryLimit and OnlineRetryInterval with no luck; I always get the same error. The only resource that doesn't come online is opscenter_app; the IP and Mount resources are working fine.

 

Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:10:32 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:10:34 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a
Aug 20 12:12:53 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13066 Thread(4146064240) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:53 bck01a Had[5441]: VCS ERROR V-16-2-13066 (bck01a) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:12:56 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:12:58 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a

Thanks for your help solving this issue.

 

 

ACCEPTED SOLUTION

Gaurav_S
Moderator
   VIP    Certified

Hi,

What exit codes does the monitor script for the opscenter_app resource return? If it returns 0 (successful) and 1 (unsuccessful), that is incorrect; VCS doesn't understand those exit codes.

The monitor script should exit with 110 (successful, resource online) and 100 (unsuccessful, resource offline).
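
As a minimal sketch of what such a monitor script could look like (not the script used in this thread): the function below prints 110 when the process recorded in a pid file is alive and 100 otherwise. The function name and the pid file path are hypothetical, and the sketch assumes the script would be referenced from the Application resource's MonitorProgram attribute.

```shell
#!/bin/sh
# Sketch of a monitor script for a VCS Application resource.
# Exit codes follow the convention stated above: 110 = online, 100 = offline.
VCS_ONLINE=110
VCS_OFFLINE=100

# monitor_code PIDFILE
# Prints 110 if PIDFILE holds the PID of a live process, otherwise 100.
monitor_code() {
    pidfile=$1
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
        # kill -0 sends no signal; it only checks that the process exists
        # (and that the caller is allowed to signal it).
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "$VCS_ONLINE"
            return 0
        fi
    fi
    echo "$VCS_OFFLINE"
}

# In a real monitor script the last line would be something like:
#   exit "$(monitor_code /path/to/app.pid)"    # hypothetical path
```

Keeping the decision in a function that prints the code makes it easy to try by hand before handing the script to the agent.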

 

G


REPLIES

Koven1
Level 4

Hi Gaurav

I see that the ecadm start/stop script contains a lot of exit codes. So you're telling me I should use 100 instead of 1 and 110 instead of 0?

Is there any way to change the exit codes that VCS expects? That would be easier for me, because there are a lot of lines to change in the ecadm script.

 

Thank you!

 

Gaurav_S
Moderator
   VIP    Certified

Hello,

To my knowledge the exit codes are hard-coded and can't be changed, so I'm afraid you will need to change your script. Regarding the codes: 110 indicates successful (usually exit 0), and 100 indicates unsuccessful (usually exit 1).

G
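
One alternative to editing every exit statement, offered as a sketch and not something proposed in this thread, is a small wrapper that runs an existing check and translates its conventional status (0 or non-zero) into 110 or 100. The function name and the health-check path are hypothetical, and it assumes the application provides some command that exits 0 only when the service is up.

```shell
#!/bin/sh
# Sketch: translate a conventional exit status (0 = success, non-zero = failure)
# into the codes named above (110 = online, 100 = offline), leaving the
# wrapped script untouched.

# vcs_code_for CMD [ARGS...]
# Runs CMD quietly and prints 110 if it exited 0, otherwise 100.
vcs_code_for() {
    if "$@" >/dev/null 2>&1; then
        echo 110
    else
        echo 100
    fi
}

# Hypothetical use as the last line of a monitor wrapper:
#   exit "$(vcs_code_for /path/to/health_check)"
```

This only makes sense for the monitor step; it does not change how the start and stop programs behave.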

Koven1
Level 4

OMG Gaurav, that's a lot of work. Let me try some different configurations in that script, but I'm not sure I can make the Application agent work with Ops Center.

One last question: how can I be 100% sure that the exit codes are my problem? Does Symantec have any troubleshooting steps I can try?

Thank you

Gaurav_S
Moderator
   VIP    Certified

Hi,

I don't really believe it's too much work; all you need to do is find and replace the exit codes. If it's a shell or Perl script, you can grep for the exit statements and use the substitute command in sed to replace the exit codes. Alternatively, you can open the file in a good text editor such as Notepad++ and use its find-and-replace function. Once done, copy the script back to the server, rename the existing script, and use the modified script as the monitor script.
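
The grep-and-sed approach described above could be sketched as follows. This is an illustration only: the function name and file names are placeholders, it handles only literal `exit 0` and `exit 1` statements (not forms such as `exit $rc`), and it will also rewrite matches inside comments or strings, so the result should be reviewed with diff before use.

```shell
#!/bin/sh
# remap_exit_codes: read a script on stdin and write it to stdout with
# "exit 0" rewritten to "exit 110" and "exit 1" rewritten to "exit 100".
# The surrounding characters are checked so that "exit 10" or "my_exit 0"
# are left alone.
remap_exit_codes() {
    sed -E \
        -e 's/(^|[^[:alnum:]_])exit[[:space:]]+0([^0-9]|$)/\1exit 110\2/g' \
        -e 's/(^|[^[:alnum:]_])exit[[:space:]]+1([^0-9]|$)/\1exit 100\2/g'
}

# Suggested workflow (file names are placeholders):
#   grep -n 'exit' monitor.sh                  # review every exit statement
#   cp -p monitor.sh monitor.sh.orig           # keep the original
#   remap_exit_codes < monitor.sh.orig > monitor.sh
#   diff monitor.sh.orig monitor.sh            # check what changed
```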

To answer your second point: for VCS to declare a resource online, at least one monitor cycle has to run successfully. So when you online the application resource, the online script executes and, after it completes, a monitor script executes, which declares the resource online. In this case the online script is completing, but monitoring fails to declare the resource online because the exit codes are unknown to VCS.

Hope that answers your question.
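
The sequence described above is visible in the syslog lines from the original post: V-16-2-13066 (clean is called because the resource is not up after online completed), V-16-2-13068 (clean completed), V-16-2-13071 (retry limit reached) and V-16-1-54031 (resource FAULTED). A small sketch for pulling that timeline for one resource out of a log is shown here; the function name is hypothetical and it assumes the same syslog line layout as in the post, with the time in the third field.

```shell
#!/bin/sh
# resource_events RESOURCE
# Reads syslog lines on stdin and prints "time message-id" for every line
# that mentions RESOURCE and carries a VCS message ID (V-n-n-n).
resource_events() {
    awk -v res="$1" '
        index($0, res) && match($0, /V-[0-9]+-[0-9]+-[0-9]+/) {
            # $3 is the HH:MM:SS field of a standard syslog line
            print $3, substr($0, RSTART, RLENGTH)
        }'
}

# Example (log path is an assumption; use the file the lines came from):
#   resource_events opscenter_app < /var/log/messages
```

Seeing 13066 shortly after each online attempt is consistent with the explanation above: the start program returns, but the monitor never reports the resource as online.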

 

G

Koven1
Level 4

Hi Gaurav

I'm attaching the script I modified. I replaced the exit codes (0 --> 110 and 1 --> 100) with no luck. I have a monitor process called "launch", so this is the process that VCS should be monitoring. As you can see below there is a cacao process, but I don't know how to debug what VCS is doing.

root     13840     1  0 10:14 ?        00:00:00 /opt/sun/cacao2/private/bin/x64/launch -w /var/opt/sun/cacao2/instances/oem-ec -r /var/run/opt/sun/cacao2/instances/oem-ec/run/retries -R /var/run/opt/sun/cacao2/instances/oem-ec/run/cacao_v2.pid -s 1 -U root -G root -L 16384 -A /opt/sun/cacao2/private/bin/proc_analysis -W /var/opt/sun/cacao2/instances/oem-ec -T 300 -P /var/run/opt/sun/cacao2/instances/oem-ec/run/hb.pipe -i /etc/opt/sun/cacao2/instances/oem-ec/security/password -DPATH=/usr/java/x86_64/jdk1.7.0_45/bin:/bin:/usr/bin -DLD_LIBRARY_PATH=/opt/sun/cacao2/share/lib/shared -- /usr/java/x86_64/jdk1.7.0_45/bin/java -Xms200M -Xmx8192M -server -XX:StringTableSize=27001 -XX:PermSize=128m -XX:MaxPermSize=384m -Xss384k -XX:+UseParallelOldGC -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:-UseCompressedOops -Dsun.security.pkcs11.enable-solaris=false -Djava.endorsed.dirs=/opt/sun/cacao2/share/lib/endorsed -Dxvmserver=false -classpath /opt/sun/jdmk/5.1/lib/jdmkrt.jar:/opt/sun/jdmk/5.1/lib/jmxremote_optional.jar:/opt/sun/cacao2/share/lib/cacao_cacao.jar:/opt/sun/cacao2/share/lib/cacao_j5core.jar:/opt/sun/cacao2/private/lib/bcprov-jdk14.jar -Djavax.management.builder.initial=com.sun.jdmk.JdmkMBeanServerBuilder -Dcacao.config.dir=/etc/opt/sun/cacao2/instances/oem-ec com.sun.cacao.container.impl.ContainerPrivate

I really don't know how to cluster this application because nothing is working yet.

Thanks again for your time.

 

Koven1
Level 4

Hi G,

 

Finally solved. It's a real challenge when you have to deal with so many exit codes, but you gave me the key in your first post: my problem was the exit codes, without a doubt, and fixing it was a matter of trial and error.

 

Thank you very much Gaurav

Gaurav_S
Moderator
   VIP    Certified

Glad to have helped :)

 

G