Forum Discussion

Zahid_Haseeb
Moderator
10 years ago
Solved

Echo message (Request/Response) Monitor

Environment

OS = Linux 6.x/7.x

SFHA = assume the latest, i.e. 6.x

Query:

We have a two-node local HA cluster. A custom application runs on it without problems. The application sends a TCP echo response (800) in reply to a TCP echo request (810) from a machine on the LAN. I want to monitor this echo cycle under HA.


9 Replies

  • Create a script, say called tcp-echo.sh, with something like:

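    #!/bin/bash
    # Usage: tcp-echo.sh {start|stop|monitor} resource_name
    ARG=$2
    LOCKFILE=/tmp/.${2}.DO_NOT_REMOVE
    case "$1" in
    start)
        # START APP
        touch "$LOCKFILE"
        ;;
    stop)
        # Stop App
        rm -f "$LOCKFILE"
        ;;
    monitor)
        if [ -e "$LOCKFILE" ]
        then
            # DO TCP echo test here; its exit status is checked below
            if [ $? -eq 0 ]
            then
                exit 110    # online
            else
                rm -f "$LOCKFILE"
                exit 100    # offline
            fi
        else
            exit 100        # no lock file: resource is not online on this node
        fi ;;
    *)
        echo "Usage: $0 {start|stop|monitor} arg"
        exit 1
        ;;
    esac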

     

    and then use Application resource:

    Application tcp-echo1 (
      StartProgram  = "/opt/VRTSvcs/bin/tcp-echo.sh start tcp-echo1"
      StopProgram  = "/opt/VRTSvcs/bin/tcp-echo.sh stop tcp-echo1"
      MonitorProgram = "/opt/VRTSvcs/bin/tcp-echo.sh monitor tcp-echo1"
    )

     

    The lock file prevents a concurrency violation: if you only monitored the TCP echo, every node in your cluster would report the resource as online.

    Mike
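
The "DO TCP echo test" placeholder in the script above could be filled in many ways. The following is only a minimal sketch, assuming that a plain TCP connect to the application's listener is an acceptable stand-in; it does not exchange the 810/800 messages, whose format is not given in this thread. The function names and the default timeout are hypothetical, and it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being available.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within wait_secs seconds. It only connects; no 810/800 messages
# are exchanged.
tcp_port_open() {
    local host=$1 port=$2 wait_secs=${3:-5}
    # bash's /dev/tcp pseudo-device opens a TCP socket;
    # timeout(1) bounds how long the attempt may take.
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Map the probe result to the exit codes used by the monitor branch:
# 110 for online, 100 for offline.
probe_exit_code() {
    if tcp_port_open "$1" "$2" "${3:-5}"; then
        echo 110
    else
        echo 100
    fi
}
```

In the monitor branch, a call such as `tcp_port_open "$APP_HOST" "$APP_PORT"` (both variables being placeholders for your own values) would go where the comment sits, so that the following `$?` check sees its result.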

  • Thanks, Mike. A little change in plan.

    The application will write the completion time of the last echo cycle (800/810) to a log file. I want to write a cluster application script that reads the time from the log and compares it with the current time. If the difference exceeds 5 minutes, I want the cluster to fault the application script resource.

  • This is more of a scripting question than a VCS one, but if you post an example of the last few lines of the log file, I will take a look when I get time. If things other than the time are written to the log, please give examples, and if the time entry is not always the last line, say how many "non-time" lines there may be at the end of the file.

    Mike

     

  • Try the following:

    #!/bin/bash
    # Usage: script {start|stop|monitor} resource_name
    MAX_TIME=300 # 5 mins
    LOGFILE=/tmp/testlog
    ARG=$2
    LOCKFILE=/tmp/.$2.DO_NOT_REMOVE
    case "$1" in
    start)
        # START APP
        touch "$LOCKFILE"
        ;;
    stop)
        # Stop App
        rm -f "$LOCKFILE"
        ;;
    monitor)
        if [ -e "$LOCKFILE" ]
        then
            # Check date in LOGFILE is less than MAX_TIME ago.
            # awk reorders the first field (split on "/") so the year comes first for date -d
            logdate_str=`/usr/bin/tail -1 "$LOGFILE" | /bin/awk '
              {split($1,slash,"/");print slash[3]"/"slash[2]"/"slash[1]" "$2}'`
            logdate_epoch=`/bin/date -d "$logdate_str" +%s`
            today_epoch=`/bin/date +%s`

            (( sec_since_log=$today_epoch - $logdate_epoch ))

            if [[ $sec_since_log -gt $MAX_TIME ]]
            then
                # More than max_time so return offline
                rm -f "$LOCKFILE"
                exit 100
            else
                exit 110    # online
            fi
        else
            exit 100        # no lock file: not online on this node
        fi ;;
    *)
        echo "Usage: $0 {start|stop|monitor} resource_name"
        exit 1
        ;;
    esac
    
    

     

    Mike
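
To make the date handling in the script above easier to follow, here is a small sketch of its two core steps in isolation. It assumes the last log line starts with a `dd/mm/yyyy HH:MM:SS` timestamp, which is what the `split` on "/" implies; the thread does not show the contents of /tmp/testlog. The function names `to_date_string` and `is_stale` are made up for illustration.

```shell
#!/bin/bash
# Assumed log line shape (not shown in the thread): "dd/mm/yyyy HH:MM:SS ..."
# Reorder the date to yyyy/mm/dd so the year comes first, as in the script.
to_date_string() {
    awk '{split($1, slash, "/"); print slash[3]"/"slash[2]"/"slash[1]" "$2}'
}

# Succeeds (returns 0) when more than max_time seconds lie between the
# current epoch and the epoch of the last log entry.
is_stale() {
    local now_epoch=$1 log_epoch=$2 max_time=$3
    (( now_epoch - log_epoch > max_time ))
}

echo "20/06/2015 16:36:11 heartbeat" | to_date_string
# prints: 2015/06/20 16:36:11

if is_stale 1000 600 300; then
    echo "stale: the monitor would exit 100"
fi
```

With the sample numbers, 1000 - 600 = 400 seconds, which is above the 300-second limit, so the resource would be reported offline.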

  • Thanks, Mike. I tested it and it runs very well, but I need a bit more help. The log file in which the timestamp would be updated every 5 minutes still has to be created, and that requires a change in the application.

     

    I discussed this with the application dev team, and they told me that the timestamp is already written to an existing log file, so there is no need to create a new one.

    Details of existing log file name:

    A new log file is created daily with a name such as "Z35xlo.o0712". "Z35xlo.o" is fixed and "0712" is mmdd (month and day).

    Details of existing log file data:

    Z35172.16.24.6.o0620:15/06/20 16:36:11:688 |DBG | TCPCommunication::initializeHeartBeat    Heart Beat Msg Const [ARE_YOU_ALIVE]
    Z35172.16.24.6.o0620:15/06/20 16:42:06:213 |DBG | TCPCommunication::initializeHeartBeat    Heart Beat Msg Const [ARE_YOU_ALIVE]

    ----x---------------x-----------------------x-----------------x------------------x---------------x--------------x------------x

    Query:

    So we need to take the timestamp from the last line of that log file (Z35xlo.ommdd) and compare it with the current time.

    Z35xlo.ommdd file size = 50MB

  • Try this:

    #!/bin/bash
    # Usage: script {start|stop|monitor} resource_name
    MAX_TIME=300 # 5 mins
    ARG=$2
    # Adjust these paths to your system (tail may live in /usr/bin)
    AWK=/bin/awk
    TAIL=/bin/tail
    DATE=/bin/date
    LOCKFILE=/tmp/.$2.DO_NOT_REMOVE
    case "$1" in
    start)
        # START APP
        touch "$LOCKFILE"
        ;;
    stop)
        # Stop App
        rm -f "$LOCKFILE"
        ;;
    monitor)
        if [ -e "$LOCKFILE" ]
        then
            # Today's log file: fixed prefix plus mmdd
            LOGFILE=/tmp/Z35xlo.o`$DATE +%m%d`
            # Check date in LOGFILE is less than MAX_TIME ago.
            # Fields are split on ":"; "20" is prepended to the two-digit year
            logdate_str=`$TAIL -1 "$LOGFILE" | $AWK -F: '{print "20"$2":"$3":"$4}'`
            logdate_epoch=`$DATE -d "$logdate_str" +%s`
            today_epoch=`$DATE +%s`

            (( sec_since_log=$today_epoch - $logdate_epoch ))
            # echo "DEBUG: $today_epoch, $logdate_str, $logdate_epoch"

            if [[ $sec_since_log -gt $MAX_TIME ]]
            then
                # More than max_time so return offline
                rm -f "$LOCKFILE"
                exit 100
            else
                exit 110    # online
            fi
        else
            exit 100        # no lock file: not online on this node
        fi ;;
    *)
        echo "Usage: $0 {start|stop|monitor} resource_name"
        exit 1
        ;;
    esac

     

    Mike
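
As a quick check of the `awk -F:` step in the script above, the sketch below runs the same awk program against one of the sample lines posted earlier in the thread. The function name `parse_heartbeat_time` is only illustrative.

```shell
#!/bin/bash
# Sample line copied from the thread. The text before the first colon
# looks like a file-name prefix.
line='Z35172.16.24.6.o0620:15/06/20 16:36:11:688 |DBG | TCPCommunication::initializeHeartBeat    Heart Beat Msg Const [ARE_YOU_ALIVE]'

# Same awk program as in the script: fields are split on ":" and "20" is
# prepended to the two-digit year. Field 2 is "15/06/20 16", field 3 is
# the minutes and field 4 the seconds.
parse_heartbeat_time() {
    awk -F: '{print "20"$2":"$3":"$4}'
}

logdate_str=$(printf '%s\n' "$line" | parse_heartbeat_time)
echo "$logdate_str"
# prints: 2015/06/20 16:36:11
```

Note that the field numbers depend on the prefix before the first colon. If the lines in the file itself start directly with the date, each field shifts by one and the awk program would need `$1`, `$2` and `$3` instead.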

  • Nice effort, Mike. I am really thankful to you.

     

    In my lab, /usr/bin/tail had to be used instead of /bin/tail.