Forum Discussion

symsonu
10 years ago

False reporting by VCS Host Monitor

 

 

May 25 03:54:10 node-02 Had[8671]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 CPU usage on node-02 is 91%
May 25 03:55:09 node-02 Had[8671]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 CPU usage on node-02 is 95%
May 25 04:52:40 node-02 Had[8671]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 CPU usage on node-02 is 97%



The SAR report for the same day and time period shows the CPU was largely idle; it never dropped below 10% idle.

SunOS node-02 5.10 Generic_150401-10 i86pc    05/25/2015

00:01:00    %usr    %sys    %wio   %idle
03:51:00      24       9       0      67
03:56:00      58      19       0      23
04:01:00      38      14       0      47
04:06:00      39      13       0      48
04:11:00      45       9       0      46
04:16:00      43      13       0      44
04:21:00      25       7       0      68
04:26:00      26      11       0      63
04:31:00      25       8       0      67
04:36:00      18      10       0      72
04:41:00      30       7       0      63
04:46:00      26      11       0      63
04:51:00      14       5       0      80
04:56:00      58      18       0      25

 

Can you please let us know if there is a known bug in the VCS Host Monitor?

 

52 fffffffff79ba000  574b8 264   1  vxdmp (VxVM 5.1SP1RP3 DMP Driver)
 98 fffffffff7dbb000 319a08 265   1  vxio (VxVM 5.1SP1RP3 I/O driver)
 99 fffffffff80ac000   c500 268   1  fdd (VxQIO 5.1SP1RP3HF2 Quick I/O dr)
100 fffffffff80b9000 22b278  21   1  vxfs (VxFS 5.1SP1RP3HF2 SunOS 5.10)
102 fffffffff82e2000   1350 266   1  vxspec (VxVM 5.1SP1RP3 control/status d)
267 fffffffff7db0418    d10 267   1  vxportal (VxFS 5.1SP1RP3HF2 portal driver)
270 fffffffff9397000  64ac0 271   1  vxfen (VRTS Fence 5.1SP1RP3)
271 fffffffff7a09000  27f40 272   1  vxglm (VxGLM 5.1_SP1RP2P1 SunOS 5.10)
272 fffffffff897c000   4550 273   1  vxgms (VxGMS 5.1.0.0,REV=13Sep2009 (So)

  • How CPU is calculated is discussed here:

    https://www-secure.symantec.com/connect/forums/vcs-601-linux-how-cpu-usage-calculated

    This mentions that CPU is monitored on every monitor cycle of the VCShm resource, which is 30 seconds by default, but on my 6.1 system it is 120:

    # hares -value VCShm MonitorInterval
    120

    And looking at how often the .vcs_host_stats.data file gets updated in /var/VRTSvcs/stats, this looks to be correct as I can see the file is updated every 2 minutes.

    So VCS is looking at the CPU average over 2 minutes (this may be 30 seconds for older versions of VCS), so it will differ from your sar output, which is sampled every 5 minutes.

    You are supposed to be able to dump the stats database using /opt/VRTSvcs/bin/vcsstatlog, but I can't get this to work by following the instructions in the VCS admin guide section "Verifying the metered or forecasted values for CPU, Mem, and Swap". If you can get this to generate any meaningful data, let me know how you did it.

    Mike

  • Ok - I have worked out how vcsstatlog works:

    The first bit is easy:

    Make a copy of the stats database, as it looks like you can't dump stats from a live database, and then dump the copy to a file:

    cd /var/VRTSvcs/stats
    mkdir db
    cp .vcs_host_stats.* db
    /opt/VRTSvcs/bin/vcsstatlog --dump /var/VRTSvcs/stats/db/.vcs_host_stats >  /var/VRTSvcs/stats/db/stats.csv

     

    I had figured this bit out in my last post, but couldn't interpret the data for CPU. For memory, the figure (in the last column) is the free memory in MB, so that is straightforward. For CPU, the figure is the number of free clock cycles.

    So for example we have a T3 where "prtdiag -v" shows 128 CPUs, each with a clock speed of 1649 MHz, and therefore the total number of CPU cycles (in MHz) is 1649 x 128 = 211072.

    The time in the output is epoch time (seconds since 1970), so you need to translate it. You can use "date" on a Linux system for this (the Solaris 10 date command does not support converting to and from epoch time).

    So for example, in the output from "vcsstatlog --dump" I have the following line:

    1430834550,ForecastCPU,208961

    So the date is:

    Linuxsys # date -d @1430834550
    Tue May  5 15:02:30 BST 2015

    (to convert back the other way use: date -d "2015/05/05 15:02:30" +%s)

    And, since the figure is free clock cycles, the fraction of CPU that is free is:

    208961/211072 = 0.99

    And this agrees with the 99% idle shown in sar

    Mike
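
The conversion described above can be scripted on a host whose date command lacks the -d option. What follows is a minimal sketch, not a VCS tool: the function names epoch_to_utc and cpu_free_pct are hypothetical, the sample line and the 1649 x 128 capacity are the ones quoted in the post, and the date arithmetic is a generic days-to-calendar calculation that assumes a non-negative epoch value and a shell with $(( )) arithmetic (ksh or bash rather than the Solaris 10 /bin/sh).

```shell
# Sketch only: helper names are hypothetical, not part of VCS.
# Assumes a shell with $(( )) arithmetic (ksh or bash).

# Convert non-negative epoch seconds to a UTC timestamp using integer arithmetic.
epoch_to_utc() {
    secs=$(( $1 % 86400 ))
    z=$(( $1 / 86400 + 719468 ))      # days since 0000-03-01
    era=$(( z / 146097 ))             # 400-year cycles
    doe=$(( z - era * 146097 ))       # day within the cycle
    yoe=$(( (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365 ))
    doy=$(( doe - (365 * yoe + yoe / 4 - yoe / 100) ))
    mp=$(( (5 * doy + 2) / 153 ))     # month index, counting March as 0
    d=$(( doy - (153 * mp + 2) / 5 + 1 ))
    if [ "$mp" -lt 10 ]; then m=$(( mp + 3 )); else m=$(( mp - 9 )); fi
    y=$(( yoe + era * 400 ))
    if [ "$m" -le 2 ]; then y=$(( y + 1 )); fi
    printf '%04d-%02d-%02d %02d:%02d:%02d UTC\n' \
        "$y" "$m" "$d" $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Percentage of total CPU capacity reported as free.
# $1 = free figure from the dump, $2 = total capacity (clock MHz x CPU count)
# Assumption: an awk that accepts -v (on Solaris 10 this may need nawk).
cpu_free_pct() {
    awk -v free="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * free / total }'
}

total=$(( 1649 * 128 ))                 # 211072, from the prtdiag example
line="1430834550,ForecastCPU,208961"    # sample line quoted in the post
stamp=${line%%,*}                       # first field: epoch seconds
free=${line##*,}                        # last field: free CPU figure
echo "$(epoch_to_utc "$stamp")  free=$(cpu_free_pct "$free" "$total")%"
```

The timestamp is printed in UTC, so 14:02:30 UTC here corresponds to the 15:02:30 BST shown by the Linux date command.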

     

  • Thank you Mike for the detailed description

    There is no such directory on either of the two nodes:

    # cd /var/VRTSvcs/stats
    /var/VRTSvcs/stats: does not exist

     

    Regards

    S.

  • Sorry, I should have seen that your systems are 5.1 and vcsstatlog is only available from 6.1 onwards, but the theory is still the same: VCS looks at stats more frequently than every 5 minutes (I think every 30 seconds in 5.1), so VCS will spot short peaks in CPU usage that are averaged out over the 5-minute intervals you see in sar.

    So if this is happening a lot, run sar at 30-second intervals so you can verify that it agrees with VCS (note that, as it will be hard to get the same starting point as VCS, the values may differ slightly, but they should be roughly the same).

    Mike
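
To illustrate the averaging effect described above, here is a small sketch. The ten sample values are invented for illustration and are not taken from node-02, and summarize is a hypothetical helper name. It shows how three 30-second readings above 90% can sit inside a 5-minute window whose mean is well under 50%.

```shell
# Hypothetical 30-second CPU busy percentages covering one 5-minute sar interval.
# Real samples could be collected with: sar -u 30 10
samples="95 97 91 20 15 10 12 18 14 16"

# Print the peak 30-second value and the mean over the whole window.
summarize() {
    echo "$1" | tr ' ' '\n' | awk '
        { sum += $1; if ($1 > max) max = $1 }
        END { printf "peak=%d%% mean=%.1f%%\n", max, sum / NR }'
}

summarize "$samples"
```

With these made-up numbers the output is peak=97% mean=38.8%, which is the same pattern as VCS reporting over 90% while the 5-minute sar line still shows the CPU mostly idle.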