NBU Monitoring

J__M__MEYER · 06-29-2005

Hi everyone,

Currently I'm using the "Activity Monitor" to monitor all my backups, but I don't find it complete enough. I'd like to see more information, such as an overview of all the drives and the backups using them, and so on...
I've tried writing my own scripts too, but it isn't easy!

I'm just interested in knowing which Veritas monitoring tools you use. The "Activity Monitor" or something else? Do you like it?
What do experienced people think about NBU monitoring?
What would you advise?

Regards,
jmm

Stumpr2 · 06-29-2005

The "Activity Monitor" works for me. And it's free!
I also run bpdbjobs to email failures.
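A minimal sketch of the "email failures" idea. It assumes the NetBackup 5.x `bpdbjobs -report` column order (JobID, Type, State, Status, Policy, ...), which is consistent with Jerome's output later in this thread, so that field 3 is the job state and field 4 the exit status of a finished job. The function name `failed_jobs` and the mail address are hypothetical; check the columns on your own master server before relying on it.

```shell
#!/bin/sh
# failed_jobs: read bpdbjobs -report text on stdin and print every
# finished job whose exit status is not 0.
# ASSUMPTION: field 3 is the State column and field 4 the Status column.
failed_jobs() {
  # NR > 1 skips the header line of the report.
  awk 'NR > 1 && $3 == "Done" && $4 != "0" { print }'
}

# Example wiring on a master server (hypothetical address):
# /usr/openv/netbackup/bin/admincmd/bpdbjobs -report | failed_jobs > /tmp/failed.txt
# [ -s /tmp/failed.txt ] && /usr/ucb/mail -s "NetBackup failures" admin@example.com < /tmp/failed.txt
```

Status 1 (partially successful) counts as a failure here; add `&& $4 != "1"` to the awk condition if you'd rather not be told about those.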

Richard_Bannist · 06-29-2005

Hi Jerome. I use various scripts that I've written myself over the years. I'm certainly no scripting genius, and some would probably say my scripts are simplistic, as I do most scripting by manipulating the output of commands/files rather than using variables etc. But hey, I'm no developer, so my scripts are good enough for me!

Here's an example of a script I have running via cron a few times each morning. If all backups are complete, we receive a short email saying so; otherwise we get an email briefly detailing which backups are still running. This is my personal preference: all successful-backup emails from NetBackup are routed to people's waste baskets and all failure emails are allowed thru, so the odd 'backups are complete' email is what tells us things have worked, as opposed to getting zero emails. You still need to know things have worked: if you get zero notifications, who's to say email hasn't simply stopped working?

script itself - nothing really clever here -

#!/bin/ksh
#
# - first check if NetBackup is running or not.
#
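# ASSUMPTION: NetBackup is treated as "running" if the bprd daemon is present; adjust to suit your site
if [ -z "`ps -ef | grep bprd | grep -v grep`" ] ; then
echo "This email was generated by script that checks for overrunning Nightly NetBackups - '/home/richardb/NetBackups-overrun-or-not.sh'" | /usr/ucb/mail -s "NetBackup doesn't appear to be running - pls investigate ASAP - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere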

else

# - if NetBackup is running, ONLY THEN produce report.
#
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report |grep "CL"|grep -v Done|grep -v JobID|grep -v "DB Backup" > /tmp/blazing
clear
# - if any backups are still running (/tmp/blazing is non-empty), report them
if [ -s /tmp/blazing ] ; then

echo "" > /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "\t\t\t`/usr/bin/date`" >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "\t\t\tOverrunning Overnight UNIX NetBackups" >> /tmp/over1run.txt
echo "\t\t\t=====================================" >> /tmp/over1run.txt
cat /tmp/blazing|awk '{printf("%-6s %-6s %-30s %-25s %-1s %-1s %-1s\n", $1,$2,$3,$4,$6,$7,$8) ;}' >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt

rm -rf /tmp/blazing1

cat /tmp/blazing|awk '{print $1}' | while read line ; do /usr/openv/netbackup/bin/admincmd/bperror -jobid $line | grep L1 | awk '{print $6,$23,$25,$26,$27}' | tail -1 ; done >> /tmp/blazing1

awk '{print $1," ",$2,$3,$4,$5}' /tmp/blazing1 > /tmp/blazing1x

echo "" >> /tmp/blazing2

/usr/openv/volmgr/bin/vmoprcmd|grep -v STATUS|grep -v Comment|grep -v Drive > /tmp/blazing2

echo "** The JobID's above equate to the following tapes/drives --> **" >> /tmp/blazing2

cat /tmp/blazing2 /tmp/blazing1x >> /tmp/over1run.txt

cat /tmp/over1run.txt | /usr/ucb/mail -s "There are over-running Nightly NetBackups - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

rm -rf /tmp/blazing /tmp/blazing1 /tmp/blazing1x /tmp/blazing2 /tmp/over0run.txt /tmp/over1run.txt

else
clear
echo "" > /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "\t\t`/usr/bin/date`" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "\t\tAll Overnight UNIX NetBackups are complete" >> /tmp/over0run.txt
echo "\t\t==========================================" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
cat /tmp/over0run.txt | /usr/ucb/mail -s "All Overnight UNIX NetBackups are complete - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

rm -rf /tmp/blazing /tmp/blazing1 /tmp/blazing1x /tmp/blazing2 /tmp/over0run.txt /tmp/over1run.txt

fi
fi
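The decision at the heart of the script above (list the still-running jobs if there are any, otherwise send a short "complete" message) can be pulled out into a function and tested without NetBackup or mail. A minimal sketch, assuming the input file holds the already-filtered `bpdbjobs` lines the script writes to /tmp/blazing; the name `backup_status_report` is hypothetical, and the awk columns simply copy the script's own choice (field 5 skipped).

```shell
#!/bin/sh
# backup_status_report FILE
# FILE holds the filtered, still-running bpdbjobs lines (like /tmp/blazing).
# Prints an "overrunning" report if FILE is non-empty, else a "complete" line.
backup_status_report() {
  if [ -s "$1" ]; then
    # At least one backup is still running.
    printf '\t\t\tOverrunning Overnight UNIX NetBackups\n'
    printf '\t\t\t=====================================\n'
    # Same fields as the original script: field 5 is left out.
    awk '{ printf("%-6s %-6s %-30s %-25s %-1s %-1s %-1s\n", $1, $2, $3, $4, $6, $7, $8) }' "$1"
  else
    # Empty or missing file: nothing left running.
    printf '\t\tAll Overnight UNIX NetBackups are complete\n'
  fi
}
```

Keeping the mail call outside the function (e.g. `backup_status_report /tmp/blazing | /usr/ucb/mail -s ...`) means the report text can be checked by hand before anything is sent.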

Sample output from script when backups are overrunning our backup window -

Wednesday June 29 11:40:52 BST 2005

Overrunning Overnight UNIX NetBackups
=====================================
193884 Active CL_Adhoc_u02_sunbackup Quarterly 06/29/05 11:37:37 000:03:15

PENDING REQUESTS

Drv Type  Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0   hcart TLD     -    No    -      -
1   hcart TLD     -    No    -      -
2   hcart TLD     -    No    -      -
3   hcart TLD     root Yes   5815L1 5815L1 Yes   Yes      0
4   hcart TLD     -    No    -      -
5   hcart TLD     -    No    -      -

** The JobID's above equate to the following tapes/drives --> **
193884 5815L1 drive index 3

I have written all my various scripts around standard NetBackup commands: full-tape reporting, offsiting, downed tape drives, even a script that monitors for ALL drives being in use. I also wrote one to monitor restores, i.e. when a restore is kicked off and a tape isn't present in the robot, we get to know about it.
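For the "ALL drives in use" check, a minimal sketch: count the rows of the `vmoprcmd` drive table and how many have a user assigned. It assumes the layout in the sample output above, where the table starts at the `Drv Type Control ...` header, each drive row starts with the drive index, and field 4 (User) is `-` while the drive is free. It stops at the "ADDITIONAL" section that the scripts in this thread filter out with `grep -v ADDIT`. `all_drives_busy` is a hypothetical name.

```shell
#!/bin/sh
# all_drives_busy: read vmoprcmd output on stdin, print "busy/total"
# and exit 0 only if every drive row has a user assigned.
# ASSUMPTION: field 4 of a drive row is the User column ("-" when free).
all_drives_busy() {
  awk '
    /ADDITIONAL/ { exit }                 # ignore the extra drive section
    $1 == "Drv" { in_table = 1; next }    # start of the drive table
    in_table && $1 ~ /^[0-9]+$/ {
      total++
      if ($4 != "-") busy++
    }
    END {
      printf("%d/%d\n", busy, total)
      if (total > 0 && busy == total) exit 0
      exit 1
    }'
}
```

From cron this could be wired as `/usr/openv/volmgr/bin/vmoprcmd | all_drives_busy > /tmp/busy && /usr/ucb/mail ... < /tmp/busy`, so a mail only goes out when every drive is taken.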

Stumpr2 · 06-29-2005

Richard,
Don't sell yourself short concerning your scripting. Some forum members are new to NetBackup or have crippled themselves by only using the GUIs. Your scripts are valuable to this forum, as is your experience with NetBackup. Your posts are informative and helpful. Thank you for sharing.
Stumpr

Richard_Bannist · 06-29-2005

Thanks for the kind words, Bob. I guess I do sell myself short a lot. I suppose it's because I'm self-taught in the Unix world and am often limited by staying in a fairly small firm where new technology/software is made to last forever. However, we now have a new parent company, so things are finally a'changing.

I work with people who eat Perl scripts etc. for breakfast, but my philosophy is always to keep it simple.

J__M__MEYER · 06-30-2005

Hi Richard

Thanks a lot, I appreciate your point of view about monitoring.
I didn't know the vmoprcmd command, which shows the state of the drives. Nice command!
Now I'm torn between three options:
1. Use some scripts that I've written myself.
2. Develop a program or a big script that gives me a proper monitoring tool (time cost).
3. Buy a tool (expensive).
What I'm looking for, and haven't found yet, is an overview of drive occupancy: exactly which backup used which tape on which drive, and for how long.
Am I the only one who wants such complicated things?
Jerome

Richard_Bannist · 06-30-2005

Jerome, you're in luck: I have exactly the script for monitoring drives. It runs all evening and thru the night, from 5pm thru 08:45, and is similar to the script above. I'm sure I've posted it before, so I'll do a search...

Each morning the results are gzipped and saved to a uniquely named file.

Richard_Bannist · 06-30-2005

Took me ages to find it, as the tek-tips forum search doesn't appear to be the best in the world, but I found it eventually -

http://www.tek-tips.com/viewthread.cfm?qid=949271

I'll paste it here too, but I'm not sure the formatting will come across correctly.

Tape drive usage script (this runs 5pm thru 8am) -

# rrb 270104 - script to check tape drive usage overnight
# - 15min intervals

echo >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
hostname >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
date >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
echo >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
echo " ** Checking tape drives --> **" >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
echo >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
/usr/openv/volmgr/bin/vmoprcmd|grep -v " Drive"|grep -v ADDIT|tail -9 >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
echo " ** Checking for any running backups --> **" >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
echo >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check

rm -rf /tmp/nfnfek /tmp/nfnfek1 /tmp/nfnfek2 /tmp/nfnfek1x

/usr/openv/netbackup/bin/admincmd/bpdbjobs -report|grep "CL"|grep -v Done|grep -v JobID|grep -v "DB Backup" |awk '{printf("%-6s %-6s %-25s %-12s %-1s %-1s %-1s\n", $1,$2,$3,$4,$6,$7,$8) ;}' >> /tmp/nfnfek

cat /tmp/nfnfek >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check

cat /tmp/nfnfek|awk '{print $1}' | while read line ; do /usr/openv/netbackup/bin/admincmd/bperror -jobid $line | grep L1 | awk '{print $6,$23,$25,$26,$27}' | tail -1 ; done >> /tmp/nfnfek1

awk '{print $1," ",$2,$3,$4,$5}' /tmp/nfnfek1 > /tmp/nfnfek1x

echo "" >> /tmp/nfnfek2
echo "" >> /tmp/nfnfek2
echo " ** The JobID's above equate to the following tapes/drives --> **" >> /tmp/nfnfek2
echo "" >> /tmp/nfnfek2

cat /tmp/nfnfek2 /tmp/nfnfek1x >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check

echo >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check
echo "------------------------------------------" >> /home/richardb/script-output/vmoprcmd-checks/vmoprcmd-check

rm -rf /tmp/nfnfek /tmp/nfnfek1 /tmp/nfnfek2 /tmp/nfnfek1x

Sample output from the script (every 15 minutes, each run appends another block like this to the file) -

nthsunbackup
Tue Apr 19 17:15:00 BST 2005

** Checking tape drives --> **

Drv Type  Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0   hcart TLD     root Yes   5951L1 5951L1 Yes   Yes      1
1   hcart TLD     root Yes   5946L1 5946L1 Yes   Yes      2
2   hcart TLD     root Yes   5754L1 5754L1 Yes   Yes      3
3   hcart TLD     root Yes   5804L1 5804L1 Yes   Yes      4
4   hcart TLD     root Yes   5919L1 5919L1 Yes   Yes      5
5   hcart TLD     root Yes   5807L1 5807L1 Yes   Yes      0

** Checking for any running backups --> **

177705 Active CL_ATSI_genprod DailyFull 04/19/05 17:07:12 000:08:04
177707 Active CL_GENEVA_genprod DailyFull 04/19/05 17:07:12 000:08:04
177711 Active CL_ORACLEDB_dim_dimnsns DailyFull 04/19/05 17:07:12 000:08:04
177712 Active CL_ORACLEDB_dms DailyFull 04/19/05 17:07:12 000:08:04
177713 Active CL_UNIX_aros DailyFull 04/19/05 17:07:12 000:08:04
177714 Active CL_UNIX_crmprod DailyFull 04/19/05 17:07:12 000:08:04
177715 Active CL_UNIX_crmweb1 DailyFull 04/19/05 17:07:12 000:08:04
177716 Active CL_UNIX_crmweb2 DailyFull 04/19/05 17:07:12 000:08:04
177717 Active CL_UNIX_dim DailyFull 04/19/05 17:07:12 000:08:04
177718 Active CL_UNIX_dms DailyFull 04/19/05 17:07:12 000:08:04
177719 Active CL_UNIX_empow2 DailyFull 04/19/05 17:07:12 000:08:04
177720 Active CL_UNIX_emprod DailyFull 04/19/05 17:07:12 000:08:04
177721 Active CL_UNIX_genprod DailyFull 04/19/05 17:07:12 000:08:04
177722 Active CL_UNIX_swprod DailyFull 04/19/05 17:07:13 000:08:03
177725 Active CL_ORACLEDB_dms SCH_ORACLEDB 04/19/05 17:12:59 000:02:17
177726 Active CL_ORACLEDB_dms SCH_ORACLEDB 04/19/05 17:13:00 000:02:16
177727 Active CL_ORACLEDB_dms SCH_ORACLEDB 04/19/05 17:13:04 000:02:12
177728 Active CL_ORACLEDB_dim_dimnsns SCH_ORACLEDB 04/19/05 17:14:19 000:00:57

** The JobID's above equate to the following tapes/drives --> **

177705 5807L1 drive index 5
177707 5951L1 drive index 0
177713 5754L1 drive index 2
177714 5754L1 drive index 2
177715 5754L1 drive index 2
177716 5754L1 drive index 2
177717 5754L1 drive index 2
177718 5754L1 drive index 2
177719 5754L1 drive index 2
177720 5754L1 drive index 2
177721 5754L1 drive index 2
177722 5804L1 drive index 3
177725 5919L1 drive index 4
177726 5919L1 drive index 4
177727 5919L1 drive index 4
177728 5919L1 drive index 4

------------------------------------------
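Because every run appends a "JobID's above equate to the following tapes/drives" block, the log can later answer the question Jerome raised earlier about which backup used which tape on which drive. A minimal sketch, assuming the mapping lines keep exactly the five-field shape shown above (`JobID Label drive index N`). `jobs_per_drive` is a hypothetical helper, and it only knows what was captured at each 15-minute check, so a job that starts and finishes between checks won't appear.

```shell
#!/bin/sh
# jobs_per_drive LOGFILE
# Summarise the "JobID Label drive index N" lines in the history log:
# one output line per drive, listing each job (and its tape) once.
# ASSUMPTION: mapping lines have exactly that five-field shape.
jobs_per_drive() {
  awk '
    NF == 5 && $3 == "drive" && $4 == "index" {
      key = $5 SUBSEP $1
      if (!(key in seen)) {          # the same job reappears at later checks
        seen[key] = 1
        jobs[$5] = jobs[$5] " " $1 "(" $2 ")"
      }
    }
    END { for (d in jobs) print "drive " d ":" jobs[d] }' "$1" | sort
}
```

For an older night, decompress the dated file first (e.g. with `gzip -dc`) and point the function at the result.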

2nd script, which runs the next morning at 9, i.e. after the first script has finished looping thru the night. It gzips all the older output files, then renames the current output file to a uniquely dated file -

#!/bin/ksh

export mydir=/home/richardb/script-output/vmoprcmd-checks

#gzip all except the newest logfile.

gzip $mydir/vmoprcmd-check_*

mv $mydir/vmoprcmd-check $mydir/vmoprcmd-check_`date '+%d%m%Y'`

rm -rf /tmp/nfnfek /tmp/nfnfek1 /tmp/nfnfek1x /tmp/nfnfek2

ls -l $mydir/vmoprcmd-check_`date '+%d%m%Y'` | /usr/ucb/mail -s "NetBackup - Previous night's tape drive stats complete - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere
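One side effect of `gzip $mydir/vmoprcmd-check_*` is that on later mornings the glob also matches the files that are already gzipped, and gzip complains that they already have a .gz suffix (harmless, but noisy). A minimal variant of the rotation step that skips compressed files, assuming the same directory layout and file names as above; `rotate_drive_log` is a hypothetical name, and the mail step is left out.

```shell
#!/bin/sh
# rotate_drive_log DIR
# 1. gzip dated logs from earlier nights that aren't compressed yet
# 2. rename the current log to vmoprcmd-check_DDMMYYYY
rotate_drive_log() {
  dir=$1
  for f in "$dir"/vmoprcmd-check_*; do
    [ -f "$f" ] || continue          # the glob matched nothing
    case $f in
      *.gz) ;;                        # already compressed, leave it alone
      *) gzip "$f" ;;
    esac
  done
  if [ -f "$dir/vmoprcmd-check" ]; then
    mv "$dir/vmoprcmd-check" "$dir/vmoprcmd-check_$(date '+%d%m%Y')"
  fi
}
```

Called as `rotate_drive_log /home/richardb/script-output/vmoprcmd-checks`, it leaves last night's file uncompressed for easy reading, just like the original.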

Cron entries to drive the 2 scripts (timing the rename of the output file was a pain, so it was easiest to just cron a simple script to do it at the very end) -

0,15,30,45 0,1,2,3,4,5,6,7,8,17,18,19,20,21,22,23 * * * /home/richardb/vmoprcmd-check.sh > /dev/null 2>&1
0 9 * * * /home/richardb/vmoprcmd-rename-logs.sh > /dev/null 2>&1

The above was the quickest way I could automate tracking drive usage and keeping a history. I tend to knock up this kind of (simplistic) script and then find it's still working adequately, with hardly any change, years later.
I also have separate scripts that run all the time, checking for down drives, no spare drives, drives in an odd status etc., so I'm alerted straight away; the scripts above are just for a history of drive usage that I refer to as and when needed.

Rich
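For the downed-drive alerts Rich mentions, a minimal sketch along the same lines: print every drive row whose Control column mentions DOWN. ASSUMPTION: `vmoprcmd` marks a downed drive by changing the Control field (the third column in the samples in this thread) to a value containing DOWN, such as DOWN-TLD; verify that against your own vmoprcmd output first. `down_drives` is a hypothetical name.

```shell
#!/bin/sh
# down_drives: read vmoprcmd output on stdin and print the index and
# Control value of each drive row whose Control column mentions DOWN.
# Exits 0 if at least one drive is down, so it can trigger an alert.
# ASSUMPTION: drive rows start with a numeric index; field 3 is Control.
down_drives() {
  awk '
    $1 ~ /^[0-9]+$/ && $3 ~ /DOWN/ { print "drive " $1 " is " $3; found = 1 }
    END { if (found) exit 0; exit 1 }'
}
```

A cron entry could then run `/usr/openv/volmgr/bin/vmoprcmd | down_drives > /tmp/down && /usr/ucb/mail -s "Drive down" ... < /tmp/down`.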

J__M__MEYER · 06-30-2005

It's great! It's exactly what I've been looking for!
I'll implement and test it.

Thanks a lot
Jerome

Richard_Bannist · 06-30-2005

Great, I'm pleased that's what you wanted. I guess you'd only need to tweak it if you have different tape drives to monitor.

Rich

Richard_Bannist · 07-12-2005

Hi Jerome, how did you get on with the script?

J__M__MEYER · 07-14-2005

Hi Richard,
Thanks for your message!
The script works for some backups, but for the majority I don't get the tape information (I've changed the device names in the script). The most common message from bperror is "no entity was found".
The result looks like this:
Thu Jul 14 17:14:49 DFT 2005

** Checking tape drives --> **

0 rmt1 No -
1 rmt2 No netbackup
2 rmt3 No netbackup
3 rmt4 No netbackup
4 rmt6 No netbackup
5 rmt5 No netbackup
6 rmt7 No -
7 rmt8 No netbackup
** Checking for any running backups --> **

109946 Backup Active unix_test test 07/14/05 17:12:38
109947 Backup Active unix_test test 07/14/05 17:12:38
109948 Backup Active unix_test test 07/14/05 17:12:38

** The JobID's above equate to the following tapes/drives --> ** echo
<---- nothing here ---->
------------------------------------------

OK! Perhaps this problem occurs because I'm using multiple copies in my policies, i.e. NetBackup makes a backup and at the same time makes a copy.
Does the script work for all of your backups?
I'll keep checking the script and look for some other command besides bperror that gives the tape name.
Either way, I'll give you feedback...

Regards,
jerome

NB: NBU was updated to 5.1MP3A a few days ago.

Richard_Bannist · 07-14-2005

You're almost there, as there are tape drives listed and 3 job IDs listed...

I'd check the following 4 lines, as they most likely need tweaking for your environment -

/usr/openv/volmgr/bin/vmoprcmd|grep -v " Drive"|grep -v ADDIT|tail -9 >>

/usr/openv/netbackup/bin/admincmd/bpdbjobs -report|grep "CL"|grep -v Done|grep -v JobID|grep -v "DB Backup" |awk '{printf("%-6s %-6s %-25s %-12s %-1s %-1s %-1s\n", $1,$2,$3,$4,$6,$7,$8) ;}' >> /tmp/nfnfek

- this line is the crux of it, I think: ALL my policies are named CL_something-or-other

cat /tmp/nfnfek|awk '{print $1}' | while read line ; do /usr/openv/netbackup/bin/admincmd/bperror -jobid $line | grep L1 | awk '{print $6,$23,$25,$26,$27}' | tail -1 ; done >> /tmp/nfnfek1

awk '{print $1," ",$2,$3,$4,$5}' /tmp/nfnfek1 > /tmp/nfnfek1x

- the awk columns are probably different too. Hope that helps...
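Since the filter hard-codes Richard's policy naming (`grep "CL"`) and his column positions, a version with both as parameters may be easier to adapt. A minimal sketch: `running_jobs` (hypothetical) takes a policy-name pattern plus the field numbers of the job ID, state and policy, and prints "JobID Policy" for every job that isn't Done. The field numbers in the usage note are read off the two outputs in this thread; in Jerome's 5.1 output the Status column is empty for active jobs, which is why the policy lands in field 4 there.

```shell
#!/bin/sh
# running_jobs PATTERN ID_FIELD STATE_FIELD POLICY_FIELD
# Reads bpdbjobs -report text on stdin and prints "JobID Policy" for each
# job that is not Done and whose policy name matches PATTERN (an awk ERE).
# Field numbers are parameters because the report layout differs
# between NetBackup versions.
running_jobs() {
  awk -v pat="$1" -v id="$2" -v st="$3" -v pol="$4" '
    # a numeric job ID skips the header line; then drop finished jobs
    # and keep only policies that match the pattern
    $id ~ /^[0-9]+$/ && $st != "Done" && $pol ~ pat { print $id, $pol }'
}
```

For Jerome's layout that would be `bpdbjobs -report | running_jobs '^unix_' 1 3 4`; for Richard's, `running_jobs '^CL_' 1 2 3`. The `grep L1` in the bperror step similarly assumes barcodes ending in L1, like the tape labels in Richard's samples.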

VOX
