Forum Discussion

Koven1
11 years ago

Problem with Application Agent in VCS

Hi

I just want to know why I'm having problems when I try to add Ops Center (Oracle's, not NetBackup's) to my cluster.

This is the configuration:

group opscenter-sg (
        SystemList = { bck01a = 0, bck01b = 1 }
        Enabled @bck01b = 0
        AutoStartList = { bck01b }
        OnlineRetryLimit = 2
        OnlineRetryInterval = 180
        )

        Application opscenter_app (
                StartProgram = "/opt/sun/xvmoc/bin/ecadm start"
                StopProgram = "/opt/sun/xvmoc/bin/ecadm stop -w"
                MonitorProcesses = { OCLISTENER }
                )

        DiskReservation reservation_dev_sdd (
                Enabled = 0
                Disks = { "/dev/opscenter" }
                )

        IP ops-app-ip (
                Device = "bond0.620"
                Address = "10.10.102.104"
                NetMask = "255.255.255.128"
                )

        IP ops-dmz-ip (
                Device = "bond0.622"
                Address = "10.10.102.244"
                NetMask = "255.255.255.192"
                )

        IP ops-ges-ip (
                Device = "bond0.610"
                Address = "10.10.101.104"
                NetMask = "255.255.255.0"
                )

        IP ops-int-ip (
                Device = "bond0.621"
                Address = "10.10.102.184"
                NetMask = "255.255.255.192"
                )

        Mount opscenter-mount (
                MountPoint = "/var/opt/sun"
                BlockDevice = "/dev/opscenter"
                FSType = ext4
                MountOpt = rw
                FsckOpt = "-y"
                )

        Proxy proxy-bck-app-nic (
                TargetResName = bck-app-nic
                )

        Proxy proxy-bck-dmz-nic (
                TargetResName = bck-dmz-nic
                )

        Proxy proxy-bck-ges-nic (
                TargetResName = bck-ges-nic
                )

        Proxy proxy-bck-int-nic (
                TargetResName = bck-int-nic
                )

        ops-app-ip requires proxy-bck-app-nic
        ops-dmz-ip requires proxy-bck-dmz-nic
        ops-ges-ip requires proxy-bck-ges-nic
        ops-int-ip requires proxy-bck-int-nic
        opscenter-mount requires ops-app-ip
        opscenter-mount requires ops-dmz-ip
        opscenter-mount requires ops-ges-ip
        opscenter-mount requires ops-int-ip
        opscenter_app requires opscenter-mount
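
For reference, VCS brings the resources that a resource `requires` online before the resource itself, so `opscenter_app` is the last thing started in this group. The minimal Python sketch below (standard library only; `graphlib` needs Python 3.9+) topologically sorts the `requires` lines above to show one valid online order. It models only these dependency links, not how VCS actually schedules resources or runs independent branches in parallel, and resources with no links (such as `reservation_dev_sdd`) are left out.

```python
from graphlib import TopologicalSorter

# "parent requires child" pairs copied from the main.cf above.
REQUIRES = [
    ("ops-app-ip", "proxy-bck-app-nic"),
    ("ops-dmz-ip", "proxy-bck-dmz-nic"),
    ("ops-ges-ip", "proxy-bck-ges-nic"),
    ("ops-int-ip", "proxy-bck-int-nic"),
    ("opscenter-mount", "ops-app-ip"),
    ("opscenter-mount", "ops-dmz-ip"),
    ("opscenter-mount", "ops-ges-ip"),
    ("opscenter-mount", "ops-int-ip"),
    ("opscenter_app", "opscenter-mount"),
]


def online_order(requires):
    """Return one valid online order in which every child precedes its parents."""
    graph = {}
    for parent, child in requires:
        # graphlib expects node -> set of predecessors (nodes that must come first).
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in online_order(REQUIRES):
        print(name)
```

Because every other resource in the list is (directly or indirectly) a child of `opscenter_app`, it always comes out last, which matches the log: the IP and Mount resources come up, and only the final Application resource fails.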

This is the error I get every time. I've tried setting OnlineRetryLimit and OnlineRetryInterval, with no luck; I always get the same error. The only resource that doesn't come online is opscenter_app; the IP and Mount resources are working fine.

Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:10:32 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:10:34 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a
Aug 20 12:12:53 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13066 Thread(4146064240) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:53 bck01a Had[5441]: VCS ERROR V-16-2-13066 (bck01a) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:12:56 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:12:58 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a

Thanks for your help solving this issue.

  • Hi,

    What exit codes does the monitor script for the opscenter_app resource use? If they are 0 (successful) and 1 (unsuccessful), that is incorrect: VCS doesn't understand those exit codes.

    You should use 110 (successful) and 100 (unsuccessful) in the monitor script.

    G
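
To illustrate Gaurav's point, here is a small Python sketch of how a VCS monitor exit code is read, based on my understanding of the agent convention: 100 means offline, 110 means online, and 101 to 109 mean online with a lower confidence level. It is a model for illustration, not VCS code; the point is that shell-style 0 and 1 fall outside the recognized range.

```python
def monitor_state(exit_code: int) -> str:
    """Interpret a monitor exit code the way the VCS convention describes it (sketch)."""
    if exit_code == 100:
        return "OFFLINE"
    if 101 <= exit_code <= 110:
        # 110 is fully online; 101-109 report online with confidence 10-90.
        return f"ONLINE (confidence {(exit_code - 100) * 10})"
    # Anything else, including the usual shell 0/1, is not a recognized state.
    return "UNKNOWN"


if __name__ == "__main__":
    for rc in (0, 1, 100, 110):
        print(rc, monitor_state(rc))
```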

  • Hi Gaurav

    I see there are a lot of exit codes in the ecadm start/stop script, so you're telling me I should use 100 instead of 1 and 110 instead of 0?

    Is there any way to change the exit codes VCS expects? That would be easier for me, because there are a lot of lines to change in the ecadm script.

    Thank you!

  • Hello,

    To my knowledge, the exit codes are hard-coded and can't be changed, so I'm afraid you will need to change your script. Regarding the codes: 110 indicates success (what would usually be exit 0), and 100 indicates failure (usually exit 1).

    G

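
If editing every exit statement is impractical, one alternative (a suggestion, not something confirmed in this thread) is a small wrapper used as the resource's monitor program: it runs an existing check that follows the usual 0/1 convention and translates the result into 110/100. The status command below (`/opt/sun/xvmoc/bin/ecadm status`) is an assumption; confirm that your Ops Center version provides it and that it exits 0 only when the service is really up.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Hypothetical check command: verify it exists and exits 0 only when Ops Center is up.
STATUS_CMD = ["/opt/sun/xvmoc/bin/ecadm", "status"]

VCS_ONLINE = 110
VCS_OFFLINE = 100


def to_vcs_code(shell_rc: int) -> int:
    """Map a conventional shell exit status (0 = success) to a VCS monitor code."""
    return VCS_ONLINE if shell_rc == 0 else VCS_OFFLINE


def main() -> int:
    try:
        # Timeout is an arbitrary value meant to stay below the agent's monitor timeout.
        rc = subprocess.run(STATUS_CMD, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, timeout=30).returncode
    except (OSError, subprocess.TimeoutExpired):
        # Command missing or hung: report offline instead of an unrecognized code.
        return VCS_OFFLINE
    return to_vcs_code(rc)


if __name__ == "__main__":
    sys.exit(main())
```

To use it, make the file executable and point the resource at it, for example with `haconf -makerw` followed by `hares -modify opscenter_app MonitorProgram /path/to/opscenter_monitor.py` (the path is hypothetical).
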
  • OMG Gaurav, that's a lot of work. Let me try some different configurations in that script, but I'm not sure I can get Ops Center working under the Application agent.

    Last question: how can I be 100% sure that the exit codes are my problem? Does Symantec have some troubleshooting steps I can try?

    Thank you

  • Hi,

    I don't really think it's that much work; all you need to do is find and replace the exit codes. If it's a shell or Perl script, you can simply grep for the exit statements and use sed's substitute command to replace the exit codes. Alternatively, you can copy the file into a good text editor such as Notepad++ and use its find-and-replace function to change the exit codes. Once done, copy the script back to the server, rename the existing script, and use the modified script as the monitor script.
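
The sed-style substitution Gaurav describes can also be done in Python. A minimal sketch, assuming the script uses plain `exit 0` / `exit 1` statements: it rewrites only those (codes such as `exit 10` or `exit $?` are left alone), saves the untouched original as `<script>.orig` (an arbitrary naming choice), and writes the modified version in place. Review the result afterwards, because a regular expression will also rewrite matches inside comments or strings.

```python
import re
import shutil
import sys

# "exit" followed by spaces/tabs and a lone 0 or 1 (not 10, 11, $?, ...).
EXIT_RE = re.compile(r"\b(exit[ \t]+)([01])\b")
NEW_CODE = {"0": "110", "1": "100"}


def rewrite_exit_codes(text: str) -> str:
    """Replace 'exit 0' with 'exit 110' and 'exit 1' with 'exit 100'."""
    return EXIT_RE.sub(lambda m: m.group(1) + NEW_CODE[m.group(2)], text)


def main(path: str) -> None:
    with open(path) as f:
        original = f.read()
    # Keep a copy of the original script before overwriting it.
    shutil.copy2(path, path + ".orig")
    with open(path, "w") as f:
        f.write(rewrite_exit_codes(original))


if __name__ == "__main__":
    main(sys.argv[1])
```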

    To answer your second point: for VCS to declare any resource online, at least one monitor cycle has to complete successfully. So when you bring the application resource online, the online script executes, and once it finishes, the monitor script executes and declares the resource online. In your case, the online script completes, but the monitor fails to declare the resource online because VCS doesn't recognize the exit codes.

    Hope that answers your question.

    G
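
The sequence Gaurav describes, which also shows up in the log above (clean, then `reached OnlineRetryLimit(0)`, then FAULTED), can be modeled roughly as follows. This is a simplified Python sketch, not the agent framework: it ignores OnlineWaitLimit (the extra monitor cycles allowed after online), timeouts and group-level failover, and `online_retry_limit` is a hypothetical parameter standing for the resource-level value the agent prints in the log. Note that the log reports 0 even though the group sets `OnlineRetryLimit = 2`, which suggests the group attribute is not the one the agent is counting.

```python
from typing import Callable, List


def bring_online(monitor: Callable[[], int], online_retry_limit: int) -> List[str]:
    """Simplified model: run online, then monitor; clean and retry while not online."""
    events: List[str] = []
    retries = 0
    while True:
        events.append("online")
        rc = monitor()
        events.append(f"monitor -> {rc}")
        if 101 <= rc <= 110:
            events.append("ONLINE")
            return events
        # Resource is not up even after online completed: clean it up.
        events.append("clean")
        if retries >= online_retry_limit:
            events.append("FAULTED")
            return events
        retries += 1


if __name__ == "__main__":
    # A monitor that returns the shell-style 0 never satisfies VCS.
    print(bring_online(lambda: 0, 0))
```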

  • Hi Gaurav

    I'm attaching the script I modified. I replaced the exit codes (0 -> 110 and 1 -> 100) with no luck. The process to monitor is called "launch", so that is the process VCS should be monitoring. As you can see below, the cacao process is running, but I don't know how to debug what VCS is doing.

    root     13840     1  0 10:14 ?        00:00:00 /opt/sun/cacao2/private/bin/x64/launch -w /var/opt/sun/cacao2/instances/oem-ec -r /var/run/opt/sun/cacao2/instances/oem-ec/run/retries -R /var/run/opt/sun/cacao2/instances/oem-ec/run/cacao_v2.pid -s 1 -U root -G root -L 16384 -A /opt/sun/cacao2/private/bin/proc_analysis -W /var/opt/sun/cacao2/instances/oem-ec -T 300 -P /var/run/opt/sun/cacao2/instances/oem-ec/run/hb.pipe -i /etc/opt/sun/cacao2/instances/oem-ec/security/password -DPATH=/usr/java/x86_64/jdk1.7.0_45/bin:/bin:/usr/bin -DLD_LIBRARY_PATH=/opt/sun/cacao2/share/lib/shared -- /usr/java/x86_64/jdk1.7.0_45/bin/java -Xms200M -Xmx8192M -server -XX:StringTableSize=27001 -XX:PermSize=128m -XX:MaxPermSize=384m -Xss384k -XX:+UseParallelOldGC -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:-UseCompressedOops -Dsun.security.pkcs11.enable-solaris=false -Djava.endorsed.dirs=/opt/sun/cacao2/share/lib/endorsed -Dxvmserver=false -classpath /opt/sun/jdmk/5.1/lib/jdmkrt.jar:/opt/sun/jdmk/5.1/lib/jmxremote_optional.jar:/opt/sun/cacao2/share/lib/cacao_cacao.jar:/opt/sun/cacao2/share/lib/cacao_j5core.jar:/opt/sun/cacao2/private/lib/bcprov-jdk14.jar -Djavax.management.builder.initial=com.sun.jdmk.JdmkMBeanServerBuilder -Dcacao.config.dir=/etc/opt/sun/cacao2/instances/oem-ec com.sun.cacao.container.impl.ContainerPrivate

    I really don't know how to cluster this application because nothing is working yet.

    Thanks again for your time.
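
If the goal is to watch the cacao `launch` process shown in the ps output above, a monitor program can look for it directly. A minimal Linux-only Python sketch, assuming that the launcher path from that output (`/opt/sun/cacao2/private/bin/x64/launch`) is a reliable sign that Ops Center is up; both that assumption and the match string should be verified on your system. It reads each process's command line from `/proc` and exits with 110 if a match is found, otherwise 100.

```python
import os
import sys

# Hypothetical match string: the cacao launcher path from the ps output in this thread.
PATTERN = "/opt/sun/cacao2/private/bin/x64/launch"


def matches(cmdline: bytes, pattern: str) -> bool:
    """Check a raw /proc/<pid>/cmdline (NUL-separated args) for the pattern."""
    args = cmdline.rstrip(b"\0").split(b"\0")
    return pattern in " ".join(a.decode(errors="replace") for a in args)


def is_running(pattern: str, proc: str = "/proc") -> bool:
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                if matches(f.read(), pattern):
                    return True
        except OSError:
            # The process exited meanwhile or is not readable: skip it.
            continue
    return False


if __name__ == "__main__":
    sys.exit(110 if is_running(PATTERN) else 100)
```

Any process whose arguments contain the same path would also match, so a more specific string may be needed in practice.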

  • Hi G,

    Finally solved! It's a really big job when you have to deal with so many exit codes, but you gave me the key in your very first post: my problem was the exit codes, no doubt about it, and the rest was a matter of trial and error.

    Thank you very much Gaurav