Deb_Techie
12 years ago, Level 4
NetBackup Environment Monitoring Script
I have 10 master servers installed on Linux, Solaris and Windows, and I have to monitor all of them every 8 hours. To be honest, it is very difficult to check every parameter on every master by logging in to each one. So I am seeking your help to automate this, so that a script can do the job on my behalf. These are the parameters I monitor:
- Master servers NBU services status (Admin GUI)
- Master server backup jobs queue status (is it normal / specify jobs count)
- Master servers long-running jobs status
- Bulk backup job failures with error codes (more than 5 failures with the same error code)
- All masters Java console functionality (Policies, Media Manager, Volume Pools and Activity Monitor tabs)
- Scratch tape counts (in the scratch pool of every master)
- Tape drives status (check in Media Manager GUI / tape drive alert mails)
- Dedup master disk pools status (from GUI and disk pool status alert mails)
- Media server status (manual check in Device Manager GUI)
- All physical library status (library web console)
- Critical alert mails: acknowledged (yes/no) and action taken
- Enterprise Vault: disk space (correct actions taken in accordance with disk size), service checks on media servers (actions taken, if any), drive status (actions taken, if any)
- Backup checks, backup count in the last xxx hrs
- Network issues
- Consecutive backup failures for 3 days or more
This really is way out of scope for this forum; you are looking at consulting-level work here, really. Some of the things you ask for are a fairly major task on their own, e.g. long-running jobs.
Below are the commands I would start with for each task. Most of the tasks would require some fairly major scripting to get the format/output you want.
- Master servers NBU services status (Admin GUI): bpps -x command
- Master server backup jobs queue status (is it normal / specify jobs count): bpdbjobs command
- Master servers long-running jobs status: search the forum, this has been asked before and scripts are available, I think
- Bulk backup job failures with error codes (more than 5 failures with the same error code): bperror command / /usr/openv/netbackup/db/jobs/trylogs logs / bpdbjobs command
- Scratch tape counts (in the scratch pool of every master): available_media script (netbackup/bin/goodies dir, I think)
- Tape drives status (check in Media Manager GUI / tape drive alert mails): vmoprcmd command
- Dedup master disk pools status (from GUI and disk pool status alert mails): nbdevquery command
- Media server status (manual check in Device Manager GUI): nbemmcmd -listhosts -verbose command (shows if available for disk and/or tape)
- All physical library status (library web console): you could run robtest from the RCH and check for a response, e.g. echo 's s' | tldtest -r <path to robot>
- Backup checks, backup count in the last xxx hrs: bpdbjobs command, I would imagine
- Consecutive backup failures for 3 days or more: bpdbjobs command
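To give an idea of the scripting involved, here is a rough Python sketch covering three of the bpdbjobs-based checks: queued job count, more than 5 failures with the same status code, and long-running jobs. Treat it as a starting point only. It assumes the comma-delimited output of `bpdbjobs -report -most_columns`, and the column positions, the state codes and the choice to treat status 0 and 1 as "not failed" are my assumptions, so check them against the Commands Reference Guide for your NetBackup version. The function names and the 8-hour threshold are made up for illustration.

```python
import csv
import subprocess
from collections import Counter

# ASSUMPTION: column positions in `bpdbjobs -report -most_columns` output.
# Verify against the documentation for your NetBackup version.
COL_JOBID, COL_STATE, COL_STATUS, COL_CLIENT, COL_ELAPSED = 0, 2, 3, 6, 9
# ASSUMPTION: job state codes (queued, active, done).
STATE_QUEUED, STATE_ACTIVE, STATE_DONE = "0", "1", "3"


def parse_jobs(text):
    """Turn comma-delimited job records into a list of dicts.

    Only the leading columns are read, so oddities in later fields
    (paths, try details) do not matter here.
    """
    jobs = []
    for row in csv.reader(text.splitlines()):
        if len(row) <= COL_ELAPSED:
            continue  # skip blank or truncated lines
        elapsed = row[COL_ELAPSED]
        jobs.append({
            "jobid": row[COL_JOBID],
            "state": row[COL_STATE],
            "status": row[COL_STATUS],
            "client": row[COL_CLIENT],
            "elapsed": int(elapsed) if elapsed.isdigit() else 0,  # seconds
        })
    return jobs


def queued_count(jobs):
    """Number of jobs currently in the queued state."""
    return sum(1 for j in jobs if j["state"] == STATE_QUEUED)


def bulk_failures(jobs, threshold=5):
    """Status codes seen on more than `threshold` finished jobs.

    ASSUMPTION: status 0 (success) and 1 (partial) are not failures.
    """
    counts = Counter(
        j["status"] for j in jobs
        if j["state"] == STATE_DONE and j["status"] not in ("0", "1")
    )
    return {code: n for code, n in counts.items() if n > threshold}


def long_running(jobs, max_seconds=8 * 3600):
    """Job IDs of active jobs that have run longer than `max_seconds`."""
    return [j["jobid"] for j in jobs
            if j["state"] == STATE_ACTIVE and j["elapsed"] > max_seconds]


def fetch_jobs(bpdbjobs="/usr/openv/netbackup/bin/admincmd/bpdbjobs"):
    """Run bpdbjobs on a UNIX master server and parse its output."""
    result = subprocess.run([bpdbjobs, "-report", "-most_columns"],
                            capture_output=True, text=True, check=True)
    return parse_jobs(result.stdout)
```

On a master server you would call `fetch_jobs()` once and pass the result to the three check functions, then mail or log whatever they return. Keeping the parsing separate from the command call means you can test the checks against a saved copy of the bpdbjobs output before running them on a live master.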