GeorgeC
Level 4
15 years ago

Inactive node is reporting that my resources are failed.

System Details
Sun T5140 running Solaris 10, s10s_u7wos_08 SPARC
I'm running Veritas Cluster File System HA 5.1

I'm having a problem with my two-node failover cluster.
I have a service group and its resources running on node1; however, node2 (where they are not running) is reporting that the resources have failed. I'm a bit confused as to why this is happening. It seems that my monitor program is running on node2 when it shouldn't be. I have a few other questions as well.
Hopefully, this is a configuration/settings issue.

Here is a snippet from the /var/adm/messages file on node2. This output is being generated by my monitoring program, /usr/local/bin/slstatus, which is being called by the cluster. The same program is being run on node1, where the service group is running, and is working normally. If I fail the service group over to node2, then node1 starts reporting that the resources have failed.
root@net-log-02.ns.pitt.edu # tail -f /var/adm/messages
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22850]: [ID 702911 user.crit] Syslog process for eh-core-2 has failed
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22854]: [ID 702911 user.crit] Syslog process for fq-core-2 has failed
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22876]: [ID 702911 user.crit] Syslog process for fr-core-1 has failed
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22877]: [ID 702911 user.crit] Syslog process for bw-core-1 has failed
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22878]: [ID 702911 user.crit] Syslog process for gbg-core-1 has failed
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22879]: [ID 702911 user.crit] Syslog process for cl-core-1 has failed
May 27 16:51:29 net-log-02.ns.pitt.edu SYSLOG-NG[22880]: [ID 702911 user.crit] Syslog process for jhn-core-2 has failed
May 27 16:51:49 net-log-02.ns.pitt.edu xntpd[23386]: [ID 854739 daemon.info] synchronized to 136.142.5.75, stratum=2
May 27 16:51:47 net-log-02.ns.pitt.edu xntpd[23386]: [ID 774427 daemon.notice] time reset (step) -1.263488 s
May 27 16:51:47 net-log-02.ns.pitt.edu xntpd[23386]: [ID 204180 daemon.info] synchronisation lost
May 27 16:53:34 net-log-02.ns.pitt.edu SYSLOG-NG[23043]: [ID 702911 user.crit] Syslog process for rd-dev-core-514 has failed
May 27 16:55:52 net-log-02.ns.pitt.edu SYSLOG-NG[23221]: [ID 702911 user.crit] Syslog process for cl-core-2 has failed
May 27 16:56:25 net-log-02.ns.pitt.edu SYSLOG-NG[23284]: [ID 702911 user.crit] Syslog process for rd-wan3 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23330]: [ID 702911 user.crit] Syslog process for all-ios has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23331]: [ID 702911 user.crit] Syslog process for rd-dev-core-1 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23332]: [ID 702911 user.crit] Syslog process for all-asa has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23333]: [ID 702911 user.crit] Syslog process for ps-core-1 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23334]: [ID 702911 user.crit] Syslog process for rd-core-1 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23335]: [ID 702911 user.crit] Syslog process for sc-core-1 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23336]: [ID 702911 user.crit] Syslog process for bs795-core-1 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23337]: [ID 702911 user.crit] Syslog process for mc-core-1 has failed
May 27 16:56:26 net-log-02.ns.pitt.edu SYSLOG-NG[23338]: [ID 702911 user.crit] Syslog process for sc-core-2 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23345]: [ID 702911 user.crit] Syslog process for jhn-core-2 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23380]: [ID 702911 user.crit] Syslog process for fr-core-1 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23381]: [ID 702911 user.crit] Syslog process for gbg-core-1 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23382]: [ID 702911 user.crit] Syslog process for cl-core-1 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23383]: [ID 702911 user.crit] Syslog process for fq-core-2 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23385]: [ID 702911 user.crit] Syslog process for eh-core-2 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23384]: [ID 702911 user.crit] Syslog process for bw-core-1 has failed
May 27 16:56:27 net-log-02.ns.pitt.edu SYSLOG-NG[23387]: [ID 702911 user.crit] Syslog process for brd-core-1 has failed
May 27 16:57:10 net-log-02.ns.pitt.edu xntpd[23386]: [ID 854739 daemon.info] synchronized to 136.142.5.76, stratum=2
May 27 16:57:09 net-log-02.ns.pitt.edu xntpd[23386]: [ID 774427 daemon.notice] time reset (step) -1.053500 s
May 27 16:57:09 net-log-02.ns.pitt.edu xntpd[23386]: [ID 204180 daemon.info] synchronisation lost

 

  • It was explained to me that monitoring under VCS takes place on all nodes of the cluster. This is to check for and guard against concurrency violations, among other things. My script, which does its own logging via the syslog facility, was reporting that my resources were offline on the inactive node (which is correct, by the way, since it is a failover resource that was running on the other node). The simple solution would be to disable logging from within my script and let VCS handle the alerts to /var/adm/messages.

    Thank you one and all for your help and replies.
    George
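
For anyone who lands here with the same symptom, the fix described above might look roughly like the sketch below. It is not the real /usr/local/bin/slstatus, whose contents were never posted. The function names and the pid-file check are assumptions made for illustration. The exit values follow my understanding of the VCS Application agent's MonitorProgram convention (110 for online, 100 for offline); please confirm them against the Bundled Agents guide for your release. The point is that the script only reports state and never writes to syslog, so the expected "offline" result on the inactive node stays quiet.

```shell
#!/bin/sh
# Hypothetical monitor sketch; NOT the original slstatus script.
# Assumption: each monitored process writes its pid to a pid file.
# Backticks are used so the legacy Solaris 10 /bin/sh can run it.

VCS_ONLINE=110    # assumed MonitorProgram exit value for "online"
VCS_OFFLINE=100   # assumed MonitorProgram exit value for "offline"

# is_running PIDFILE
# Succeeds only if PIDFILE is readable and the pid in it is alive.
is_running() {
    [ -r "$1" ] && kill -0 "`cat "$1"`" 2>/dev/null
}

# monitor_status PIDFILE
# Prints the exit value VCS should receive. It deliberately logs
# nothing: offline is the normal state on the inactive node, and
# VCS does its own alerting for real faults.
monitor_status() {
    if is_running "$1"; then
        echo "$VCS_ONLINE"
    else
        echo "$VCS_OFFLINE"
    fi
}

# A real monitor program would finish with something like:
#   exit `monitor_status /path/to/your.pid`
```

Because VCS calls the monitor on every node in the system list, any logger call left inside the script fires on the standby node at every monitor interval, which is exactly what filled /var/adm/messages in the original post.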