
bperror - final backup status

DPeaco
Moderator
   VIP   

Greetings,

NBU 9.1.0.1 running on Redhat Linux 8.6

I'm running a script to dump the backup failures for the previous 24 hours. I don't want any "retry" error codes in the output and only want the real final backup status. I'm creating a ticket for failed backups and if a backup "retries" and is successful, I don't want it included in the output.

bperror -backstat is what I'm using, with specific switches to look from the previous day at 07:00 to the current day at 07:00.
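A 07:00-to-07:00 window like the one described could be built along these lines (a minimal sketch assuming GNU date on the RHEL master and the bperror -d/-e start/end switches; verify the exact date format against your NBU version's command reference):

```shell
# Build "previous day 07:00" to "today 07:00" timestamps (GNU date assumed).
START=$(date --date="yesterday" +"%m/%d/%Y 07:00:00")
END=$(date +"%m/%d/%Y 07:00:00")
# Hypothetical invocation; -d/-e bound the query window. Echoed here rather
# than executed, since bperror only exists on a NetBackup host.
echo bperror -backstat -d "$START" -e "$END"
```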

Thoughts or ideas?

Thanks,
Dennis
7 REPLIES

DPeaco
Moderator
   VIP   

I see I'm getting several "lookers" but nobody offering any suggestions to my questions/queries. ;) 

What I'm thinking is:

Dump the bperror output for the past 24 hours and then parse that info. If a backup attempt has retries but the end result of the job is successful, the job id is the same in the bperror -backstat output, just with a later epoch timestamp on the job stream. Without getting too far into the weeds: I could parse the output from bperror -backstat and filter out everything that is successful along with everything that has a return code of 1. That should leave only lines with return codes greater than 1 in the remaining scrub file. Then, for every listed backup id, check the final return code via bpdbjobs; if that return code is also greater than 1, dump the info for that line/job, flag it as a true failure, and cut a ticket on the failed job.
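The scrub step described above can be sketched like this (the input lines and field layout are made up for illustration; real bperror -backstat output differs, so adjust the field number):

```shell
# Hypothetical status lines: client, policy, schedule, exit status.
cat <<'EOF' > /tmp/backstat.txt
clientA policy1 sched1 0
clientB policy2 sched1 58
clientC policy3 sched1 1
EOF
# Drop status 0 and status 1, keeping only candidate failures.
awk '$NF > 1' /tmp/backstat.txt
# Next step (not runnable here): confirm each survivor's final status with
# bpdbjobs before cutting a ticket.
```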

Am I thinking correctly here?

Thanks,
Dennis

Hi @DPeaco 

I think you are on the right track.

I think a simpler way to parse the output would be to record the status of each job in turn (the output is in time-sequential order). As a retry job comes through, the status for that job gets updated to the new exit code. At the end, ignore all the status 0 (and 1) jobs and you have your list of tickets to raise.

Potentially even simpler, just record the jobs that meet the fail criteria. If the job comes up again in the output, update the status or throw away if it "succeeds".
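The "last status wins" idea above can be sketched in a few lines of awk (the sample input is hypothetical: job id, then exit status, in time order):

```shell
# Job 101 fails (58), retries, and ends 0; job 103 ends 71.
cat <<'EOF' > /tmp/jobs.txt
101 58
102 0
101 0
103 71
EOF
# Keep the last status seen per job id, then report anything above 1.
awk '{ last[$1] = $2 }
     END { for (id in last) if (last[id] > 1) print id, last[id] }' /tmp/jobs.txt
```

Only job 103 survives; job 101's successful retry drops it from the ticket list.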

Aren't there canned OpsCenter reports that would do this for you though?

David

StefanosM
Level 6
Partner    VIP    Accredited Certified

Unfortunately there is no OpsCenter report that will give you only the real failed jobs.
I also don't think bperror gives you enough info to build the report. You may have to deal with a policy with multiple data streams, where a job can retry under a different backup id.

What I have done is take the output of bpdbjobs -all_columns for the last 24 hours and extract the status code, policy, schedule, client, and file selection.
Then, for every combination of policy, schedule, client, and file selection, I check the latest status code.
If the status code is 0, the backup is OK.
If the status code is greater than 0, the last attempt failed and I add the line to the report.

It is a very primitive way to get the failed jobs and the process is slow, but it works.

PS. I've added status code 1 to the failed jobs, since a database backup with status 1 is a failed backup (at least one database is missing from the backup).
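The per-combination check can be sketched as follows (the columns are a stand-in for the real bpdbjobs fields, and the input is assumed newest-first, as bpdbjobs prints it; status 1 counts as failed, per the PS):

```shell
# Hypothetical rows, newest first: status,policy,schedule,client,files.
cat <<'EOF' > /tmp/all.txt
0,pol1,full,cliA,/data
58,pol1,full,cliA,/data
71,pol2,incr,cliB,/home
EOF
# First (newest) status wins per combination; report anything above 0.
awk -F',' '{ key = $2 FS $3 FS $4 FS $5 }
           !(key in seen) { seen[key] = $1 }
           END { for (k in seen) if (seen[k] > 0) print k, seen[k] }' /tmp/all.txt
```

Here pol1's newest attempt succeeded, so only the pol2 combination is reported.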

DPeaco
Moderator
   VIP   

@StefanosM 

I'd like to see your script code to use as a guide for the logic flow. If you need my email address, please let me know.

Thanks,
Dennis

StefanosM
Level 6
Partner    VIP    Accredited Certified

As I said, it's somewhat primitive. It could be more efficient, but it works and I don't have time to improve it.
I wrote it for a Windows master server and run it under portable Git, which is why the paths are Linux-style:
https://github.com/git-for-windows/git/releases/download/v2.37.1.windows.1/PortableGit-2.37.1-64-bit...

You can adapt it for a Linux master easily.
To run it from a Windows command line or the scheduler, the command is:
C:\admin\scripts\PortableGit-1.7.2.3\bin\bash.exe --login /C/admin/scripts/failed_jobs.sh
(assuming Git and the script are stored in C:\admin\scripts\ and the script is named failed_jobs.sh)

 


#set -x
days=1

# Start the results file empty (echo leaves one leading blank line, as before).
echo > /c/temp/bpdbjobs_check.txt

# All jobs from the last $days days, minus SLP jobs.
ddate=$(date --date="$days days ago" +"%m/%d/%y %H:%M")
bpdbjobs -all_columns -t "$ddate" | grep -v SLP_ > /c/temp/bpdbjobs_no_slp.txt

# Keep status, policy, schedule, client and file selection (fields 4-7 and 33).
awk -F"," '{print $4","$5","$6","$7","$33}' /c/temp/bpdbjobs_no_slp.txt > /c/temp/bpdbjobs_working.txt

# For each unique policy/schedule/client/files combination, keep the most
# recent entry unless it succeeded (status 0) or the status field is empty.
for i in $(awk -F"," '{print $2","$3","$4","$5}' /c/temp/bpdbjobs_working.txt | sort -u)
do
echo -ne .
grep "$i" /c/temp/bpdbjobs_working.txt | head -1 | grep -v "^0," | grep -v "^," >> /c/temp/bpdbjobs_check.txt
done

echo
# Build the report.
date > /c/temp/error.txt
echo "Failed jobs for the last ${days} days" >> /c/temp/error.txt
awk -F"," '{print $1,$2,$3,$4,$5}' /c/temp/bpdbjobs_check.txt >> /c/temp/error.txt
echo >> /c/temp/error.txt
echo ---------------------------- >> /c/temp/error.txt
echo "Jobs running for more than 10 hours" >> /c/temp/error.txt
echo >> /c/temp/error.txt
# Active jobs (state field 3 == 1): convert elapsed seconds (field 10) to
# hours, minutes, seconds and report anything at 10 hours or more.
awk -F"," '{if ($3 == "1") print $1","$5","$6","$7","int($10/60/60)","int($10%(60*60)/60)","$10%60}' /c/temp/bpdbjobs_no_slp.txt | awk -F"," '{if ($5 >= 10) print $1","$2","$3","$4","$5":"$6":"$7}' >> /c/temp/error.txt
echo >> /c/temp/error.txt
echo ---------------------------- >> /c/temp/error.txt
cat /c/temp/error.txt

# CSV copy of the failed-job list.
awk -F"," '{print $1","$2","$3","$4","$5}' /c/temp/bpdbjobs_check.txt > /c/temp/error.csv

echo
echo

I don't check SLP jobs; you can easily change that.
PS. The last part of the script flags backup jobs that have been running for more than 10 hours. I find it useful.

DPeaco
Moderator
   VIP   

Thank you! I'll check it out.

Thanks,
Dennis

StefanosM
Level 6
Partner    VIP    Accredited Certified

I think the for loop can be improved. Right now I check all unique backup jobs; it would be more time-efficient to run it against failed jobs only.

If you improve it, please share it.
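One way to sketch that improvement (with made-up rows in the same status,policy,schedule,client,files layout as the script's working file): build the unique-key list only from rows whose status is non-zero, so the loop never visits combinations that have only ever succeeded.

```shell
# Hypothetical working-file rows: status,policy,schedule,client,files.
cat <<'EOF' > /tmp/working.txt
0,pol1,full,cliA,/data
58,pol1,full,cliA,/data
71,pol2,incr,cliB,/home
EOF
# Unique combinations taken from failed rows only.
awk -F',' '$1 != 0 { print $2","$3","$4","$5 }' /tmp/working.txt | sort -u
```

Combinations with any failed attempt still enter the loop (pol1 here, whose newest attempt succeeded and would be discarded by the existing head -1 / grep -v "^0," step), but all-success combinations are skipped entirely, which is where most of the loop time goes.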