perl monitoring script times out and builds up if run via crontab

unkn0wnn
Level 4

Hi there,

I need your help because the usual call-home monitoring script no longer works. My checks show that it hangs while loading the ipmi_si module, as I have no IPMI device currently enabled or installed:

sweetalabama:/home/maintenance # ps -ef|egrep perl
root 27614 1 8 12:09 pts/2 00:00:00 /opt/NBUAppliance/bin/perl -I /opt/NBUAppliance/scripts /opt/NBUAppliance/scripts/hwmon/scripts/disk_perf_check.pl
root 27618 1 0 12:09 pts/2 00:00:00 sh -c /opt/VRTSperl/bin/perl /opt/NBUAppliance/scripts/hwmon/monitoring.pl sweetalabama --cron --dontdisplay --outputXml --epoch 1657710577 --item all --object all --xmlFile /tmp/an8eooa28D.xml --skipXhost < /tmp/infile_YJm7DX > /tmp/outfile_qbvwul 2> /tmp/errfile_AoE6CH
root 27620 27618 7 12:09 pts/2 00:00:00 /opt/VRTSperl/bin/perl /opt/NBUAppliance/scripts/hwmon/monitoring.pl sweetalabama --cron --dontdisplay --outputXml --epoch 1657710577 --item all --object all --xmlFile /tmp/an8eooa28D.xml --skipXhost
root 27779 21119 0 12:09 pts/2 00:00:00 /bin/grep -E perl
stuck on:
root 32487 32407 0 12:30 pts/2 00:00:00 sh -c /sbin/modprobe ipmi_si; /sbin/modprobe ipmi_devintf; /sbin
root 32489 32487 0 12:30 pts/2 00:00:00 /sbin/modprobe ipmi_si

The kernel then reports the following, because it cannot find an IPMI device that simply does not exist:

Jan 12 05:17:10 sweetalabama kernel: ipmi_si: Trying "smic" at I/O port 0xca9
Jan 12 05:17:10 sweetalabama kernel: ipmi_si: Trying "bt" at I/O port 0xe4
Jan 12 05:17:10 sweetalabama kernel: ipmi_si: Unable to find any System Interface(s)
Jan 12 05:17:10 sweetalabama kernel: IPMI System Interface driver.
Jan 12 05:17:10 sweetalabama kernel: ipmi_si: Trying "kcs" at I/O port 0xca2

Is there any chance you could guide me so that monitoring.pl, or any other script, works without checking for IPMI?

I have found something in monitoring.pl script here:

# This needs to be run before initialize as we need to get the config object
checkOptions();
# This is a special option which will not need ipmi to be initialized
# and will not be allowed to pass with any other option

5 REPLIES

davidmoline
Level 6
Employee

Hi @unkn0wnn 

Your best option is to log a support call with Veritas to investigate. I would not recommend modifying the scripts yourself. 

David

I would rather write my own script as my support has expired.

There must be a way to just ignore any ipmi modules. hmmm
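
One generic Linux way to make an explicit `modprobe ipmi_si` return at once is an `install` override in /etc/modprobe.d, which tells modprobe to run a command instead of loading the module. The sketch below only shows the idea. The file name `ipmi-disable.conf` and the helper `write_ipmi_override` are made up for illustration, it covers only the two module names visible in the ps output above, and it is untested on an appliance, where the monitoring script may still fail later when it tries to talk to IPMI.

```sh
#!/bin/sh
# Sketch: make "modprobe ipmi_si" return immediately instead of probing hardware.
# The file name and the use of /bin/true are assumptions, not vendor guidance.

write_ipmi_override() {
    # $1: directory to write into; modprobe reads *.conf files from /etc/modprobe.d
    conf_dir="${1:-/etc/modprobe.d}"
    conf_file="$conf_dir/ipmi-disable.conf"   # hypothetical file name
    {
        echo "# Run /bin/true instead of loading the module"
        echo "install ipmi_si /bin/true"
        echo "install ipmi_devintf /bin/true"
    } > "$conf_file"
    # Print the path so the caller can inspect or remove the file later
    echo "$conf_file"
}
```

Calling `write_ipmi_override /tmp` first lets you inspect the generated file before placing it in /etc/modprobe.d as root. A plain `blacklist` line would probably not be enough here, because it only stops alias-based autoloading and not an explicit `modprobe ipmi_si` call.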

Hi @unkn0wnn 

Okay I see. A question about the IPMI - does it still function and have you tried to reset it?

The reset can be done from the appliance CLISH (Main Menu -> Support -> IPMI Reset) or possibly from the IPMI WebUI itself. The reset will not affect the running appliance. If you are able to, you could also try powering off the appliance and the IPMI (for the latter you would have to remove all power from the appliance, not just switch it off).

If all this fails, then just follow the script logic to see where it does the IPMI check and comment this out (I don't have an appliance handy to look, nor the inclination, as this is not something we would recommend).

Good luck
David

unkn0wnn
Level 4

Thanks. I can't find the option to reset it in the Main Menu, so I tried reloading all 3 modules with modprobe, and the problem remains because no device exists in /proc/devs etc. It gets stuck on the hardware check. I wish I could just comment it out, but the scripts are very complicated, not user friendly, and have to be run with caution.

I worry that the development team did not introduce locks or similar safeguards. The provided monitoring scripts should end their instances automatically after some time; instead, perl processes build up and leave my appliance nearly unresponsive. I checked the load and it was around 70; after terminating all perl instances it went back to normal. The processes also start when I log in via the WebGUI and go to hardware monitoring.
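
For a script of my own run from cron, the pile-up could be avoided with the standard `flock` (util-linux) and `timeout` (coreutils) tools. This is a minimal sketch; the function name `run_guarded`, the lock file path and the time limit are placeholders. It does not change how the vendor scripts are started, and a process blocked inside the kernel (for example a hung modprobe) may not react to the signals until it returns.

```sh
#!/bin/sh
# Sketch: run a command at most once at a time, and stop it if it runs too long.

run_guarded() {
    # $1: lock file, $2: time limit in seconds, remaining arguments: the command
    lock_file="$1"; max_seconds="$2"; shift 2
    # flock -n: fail immediately (exit status 1) if another instance holds the lock
    # timeout:  send TERM after max_seconds (exit status 124), then KILL 10 s later
    flock -n "$lock_file" timeout -k 10 "$max_seconds" "$@"
}
```

A crontab entry would then call a small wrapper script that runs something like `run_guarded /var/lock/hwcheck.lock 300 /path/to/my_check.sh`, with both paths being hypothetical. A run that overlaps a still-running one exits at once instead of queueing up behind it.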

Further to the above, I think all I need is a script to monitor the hardware so I know when a disk is going to misbehave, plus high temperature, battery, PCI stuff. I do not expect you to write it for me, but it would be useful to have some handy commands that I can run to query hardware, like iostat etc.
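
As a starting point, something along these lines could collect a few of those readings with common Linux tools. It is only a sketch: `run_if_present` and `hw_report` are made-up helper names, /dev/sda is an assumed device, and I do not know which of these tools are installed on the appliance. Disks behind a hardware RAID controller and the controller battery usually need the controller vendor's own utility, which is not covered here.

```sh
#!/bin/sh
# Sketch: run a few common hardware queries, skipping tools that are not installed.

run_if_present() {
    # $1 is the tool name; all arguments together form the command to run
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== $tool: not installed, skipping"
    fi
}

hw_report() {
    disk="${1:-/dev/sda}"                 # assumed device name
    run_if_present smartctl -H "$disk"    # overall SMART health verdict (smartmontools)
    run_if_present iostat -x 1 2          # extended per-device I/O statistics, two samples
    run_if_present sensors                # temperatures, fans, voltages (lm_sensors)
    run_if_present lspci                  # PCI device inventory (pciutils)
}
```

Running `hw_report /dev/sdb` as root prints one section per tool, and a missing tool produces a "not installed" line instead of an error, so the same script can be reused on different machines.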

Hi @unkn0wnn 

I think you are on your own. The provided monitoring scripts work fine for everyone else. All the commands are available to view from within the monitoring scripts. I acknowledge that it is not the simplest process to decipher, but it is not something Veritas expects customers to review.

If the IPMI is indeed broken, then what else is about to fail? I would suggest you talk to your management and think about updating the hardware or putting the appliance back under support (if this is possible).

David