08-20-2014 10:29 AM
Hi
I'd like to know why I'm running into problems when I try to add Ops Center (Oracle's, not NetBackup's) to my cluster.
This is the configuration:
group opscenter-sg (
    SystemList = { bck01a = 0, bck01b = 1 }
    Enabled @bck01b = 0
    AutoStartList = { bck01b }
    OnlineRetryLimit = 2
    OnlineRetryInterval = 180
    )

    Application opscenter_app (
        StartProgram = "/opt/sun/xvmoc/bin/ecadm start"
        StopProgram = "/opt/sun/xvmoc/bin/ecadm stop -w"
        MonitorProcesses = { OCLISTENER }
        )

    DiskReservation reservation_dev_sdd (
        Enabled = 0
        Disks = { "/dev/opscenter" }
        )

    IP ops-app-ip (
        Device = "bond0.620"
        Address = "10.10.102.104"
        NetMask = "255.255.255.128"
        )

    IP ops-dmz-ip (
        Device = "bond0.622"
        Address = "10.10.102.244"
        NetMask = "255.255.255.192"
        )

    IP ops-ges-ip (
        Device = "bond0.610"
        Address = "10.10.101.104"
        NetMask = "255.255.255.0"
        )

    IP ops-int-ip (
        Device = "bond0.621"
        Address = "10.10.102.184"
        NetMask = "255.255.255.192"
        )

    Mount opscenter-mount (
        MountPoint = "/var/opt/sun"
        BlockDevice = "/dev/opscenter"
        FSType = ext4
        MountOpt = rw
        FsckOpt = "-y"
        )

    Proxy proxy-bck-app-nic (
        TargetResName = bck-app-nic
        )

    Proxy proxy-bck-dmz-nic (
        TargetResName = bck-dmz-nic
        )

    Proxy proxy-bck-ges-nic (
        TargetResName = bck-ges-nic
        )

    Proxy proxy-bck-int-nic (
        TargetResName = bck-int-nic
        )

    ops-app-ip requires proxy-bck-app-nic
    ops-dmz-ip requires proxy-bck-dmz-nic
    ops-ges-ip requires proxy-bck-ges-nic
    ops-int-ip requires proxy-bck-int-nic
    opscenter-mount requires ops-app-ip
    opscenter-mount requires ops-dmz-ip
    opscenter-mount requires ops-ges-ip
    opscenter-mount requires ops-int-ip
    opscenter_app requires opscenter-mount
This is the error I get every time. I've tried setting OnlineRetryLimit and OnlineRetryInterval with no luck; I always get the same error. The only resource that doesn't come online is opscenter_app; the IP and Mount resources are working fine.
Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:10:32 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:10:34 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a
Aug 20 12:12:53 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13066 Thread(4146064240) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:53 bck01a Had[5441]: VCS ERROR V-16-2-13066 (bck01a) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:12:56 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:12:58 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a
Thanks for your help solving this issue.
08-20-2014 08:33 PM
Hi,
What exit codes does the monitor script for the opscenter_app resource return? If it returns 0 (successful) and 1 (unsuccessful), that is incorrect; VCS doesn't understand those exit codes.
The monitor script should exit with 110 (successful, resource online) and 100 (unsuccessful, resource offline).
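As a rough illustration of that convention, here is a minimal shell sketch of a monitor wrapper that translates an ordinary 0/non-zero exit status into the 110/100 codes. The function name vcs_monitor_status is hypothetical, and the commented usage line assumes that ecadm has a "status" subcommand that exits 0 when Ops Center is up, which is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run a health-check command and translate its
# conventional exit status (0 = OK, non-zero = not OK) into the monitor
# codes discussed above (110 = online, 100 = offline).
vcs_monitor_status() {
    if "$@" >/dev/null 2>&1; then
        return 110
    fi
    return 100
}

# A monitor script could then end with something like this
# (the check command is an assumption, adjust it to your setup):
#   vcs_monitor_status /opt/sun/xvmoc/bin/ecadm status
#   exit $?
```

With a wrapper like this, the application's own scripts can stay untouched and only the wrapper needs to follow the 100/110 convention.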
G
08-21-2014 05:46 AM
Hi Gaurav
I see that the ecadm start/stop script has a lot of exit codes, so you're telling me I should use 100 instead of 1 and 110 instead of 0.
Is there any way to change the exit codes VCS expects? That would be easier for me because there are a lot of lines to change in the ecadm script.
Thank you!
08-21-2014 08:03 AM
08-21-2014 12:42 PM
OMG Gaurav, that's a lot of work. Let me try some different configurations in that script, but I'm not sure whether I can use that application agent with Ops Center.
Last question: how can I be 100% sure that exit codes are my problem? Does Symantec have any troubleshooting steps I can try?
Thank you
08-21-2014 08:11 PM
Hi,
I don't really believe it's too much work; all you need to do is find and replace the exit codes. If it's a shell/Perl script, you can simply grep for the exit statements and use the substitute command in sed to replace the exit codes. Or else you can open the file in a good text editor like Notepad++ and use its find-and-replace function to change the exit codes. Once done, copy the script back to the server, rename the existing script, and use the modified script as the monitor script.
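A minimal sketch of the sed approach, assuming the script only uses plain "exit 0" and "exit 1" statements at the end of a line. The function name rewrite_exit_codes and the file names in the comments are placeholders, not anything from the actual ecadm script.

```shell
#!/bin/sh
# Hypothetical filter: reads a script on stdin and writes a modified copy
# to stdout. It only rewrites "exit 0" and "exit 1" at the end of a line;
# forms such as "exit 1;", "exit $rc" or "exit 10" are left alone and
# need a manual review.
rewrite_exit_codes() {
    sed -e 's/exit[[:space:]]\{1,\}0[[:space:]]*$/exit 110/' \
        -e 's/exit[[:space:]]\{1,\}1[[:space:]]*$/exit 100/'
}

# Example run (file names are placeholders):
#   rewrite_exit_codes < monitor.sh > monitor.sh.new
#   grep -n 'exit' monitor.sh.new    # review every remaining exit statement
```

Writing to a new file rather than editing in place keeps the original script available if the substitution catches something it shouldn't.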
To answer your second point: for VCS to declare any resource online, at least one monitor cycle has to run successfully. That means that when you online the application resource, the online script executes and, once it completes, a monitor script executes and declares the resource online. In this case the online script is completing, but monitoring is failing to declare the resource online because the exit codes are unknown to VCS.
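One way to check this by hand is to run the monitor command yourself and look at its exit code. The sketch below is only an illustration: check_monitor_exit is a hypothetical helper, and it treats 101 to 110 as online (110 being full confidence) and 100 as offline. Accepted values can differ between agents and releases, so verify against the agent guide for your version.

```shell
#!/bin/sh
# Hypothetical diagnostic: run a monitor command manually and report how
# its exit code reads under the 100/110 convention discussed in this thread.
check_monitor_exit() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 100 ]; then
        echo "exit $rc: offline"
    elif [ "$rc" -ge 101 ] && [ "$rc" -le 110 ]; then
        echo "exit $rc: online"
    else
        echo "exit $rc: not a 100-110 code"
    fi
}

# Example (the script path is a placeholder):
#   check_monitor_exit /path/to/monitor_script
```

If the last message appears while the application is clearly running, the exit codes are the likely reason the resource never reports online.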
Hope that answers
G
08-22-2014 08:30 AM
Hi Gaurav
I'm attaching the script I modified. I replaced the exit codes (0 --> 110 and 1 --> 100) with no luck. I have a monitor process called "launch", so this is the process that VCS should be monitoring, and as you can see there's a cacao process running, but I don't know how to debug what VCS is doing.
root 13840 1 0 10:14 ? 00:00:00 /opt/sun/cacao2/private/bin/x64/launch -w /var/opt/sun/cacao2/instances/oem-ec -r /var/run/opt/sun/cacao2/instances/oem-ec/run/retries -R /var/run/opt/sun/cacao2/instances/oem-ec/run/cacao_v2.pid -s 1 -U root -G root -L 16384 -A /opt/sun/cacao2/private/bin/proc_analysis -W /var/opt/sun/cacao2/instances/oem-ec -T 300 -P /var/run/opt/sun/cacao2/instances/oem-ec/run/hb.pipe -i /etc/opt/sun/cacao2/instances/oem-ec/security/password -DPATH=/usr/java/x86_64/jdk1.7.0_45/bin:/bin:/usr/bin -DLD_LIBRARY_PATH=/opt/sun/cacao2/share/lib/shared -- /usr/java/x86_64/jdk1.7.0_45/bin/java -Xms200M -Xmx8192M -server -XX:StringTableSize=27001 -XX:PermSize=128m -XX:MaxPermSize=384m -Xss384k -XX:+UseParallelOldGC -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:-UseCompressedOops -Dsun.security.pkcs11.enable-solaris=false -Djava.endorsed.dirs=/opt/sun/cacao2/share/lib/endorsed -Dxvmserver=false -classpath /opt/sun/jdmk/5.1/lib/jdmkrt.jar:/opt/sun/jdmk/5.1/lib/jmxremote_optional.jar:/opt/sun/cacao2/share/lib/cacao_cacao.jar:/opt/sun/cacao2/share/lib/cacao_j5core.jar:/opt/sun/cacao2/private/lib/bcprov-jdk14.jar -Djavax.management.builder.initial=com.sun.jdmk.JdmkMBeanServerBuilder -Dcacao.config.dir=/etc/opt/sun/cacao2/instances/oem-ec com.sun.cacao.container.impl.ContainerPrivate
I really don't know how to cluster this application because nothing is working yet.
Thanks again for your time.
08-22-2014 03:32 PM
Hi G,
Finally solved. It's a really big task when you're dealing with so many exit codes, but you gave me the key in your very first post: my problem was the exit codes, without a doubt, and fixing it was a matter of trial and error.
Thank you very much Gaurav
08-23-2014 10:15 AM
Glad to have helped :)
G