I'm running a script to dump the backup failures for the previous 24 hours. I don't want any "retry" error codes in the output and only want the real final backup status. I'm creating a ticket for failed backups and if a backup "retries" and is successful, I don't want it included in the output.
bperror -backstat is what I'm using with specific switches to look from the previous day at 07:00 to the current day at 07:00.
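For anyone wanting to reproduce the 07:00-to-07:00 window, here is a minimal sketch of the date arithmetic only. The function name `reporting_window` and the `mm/dd/yyyy` print format are my own assumptions for illustration; check which date format your master server's locale expects before passing these values to bperror.

```python
from datetime import datetime, timedelta

def reporting_window(now):
    """Return (start, end): 07:00 on the previous day to 07:00 on the day of `now`."""
    end = now.replace(hour=7, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)
    return start, end

# Example: a script started at 09:30 on 15 March 2024.
start, end = reporting_window(datetime(2024, 3, 15, 9, 30))
fmt = "%m/%d/%Y %H:%M:%S"  # assumed format, locale dependent
print(start.strftime(fmt), "->", end.strftime(fmt))
```

Note that if the script runs before 07:00, `end` lands later the same day, so this assumes the job is scheduled at or after 07:00.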
I see I'm getting several "lookers" but nobody offering any suggestions to my questions/queries. ;)
What I'm thinking is:
Dump the bperror output for the past 24 hours and then parse that info. If a backup attempt has "retries" but the end result of the job is successful, the job id is the same in the bperror -backstat output but the job stream epoch timestamp has a greater value. Without getting too far into the "weeds" here, I guess I could just parse the output from bperror -backstat and filter out everything that is successful (return code 0) and everything with a return code of 1. That should leave me with a listing of return codes greater than 1 and no successful jobs in the remaining scrubbed output file. Then I would check the final return code for every listed backup id via bpdbjobs. If the return code from bpdbjobs is greater than 1, I can dump the info for that line/job, flag it as a true failure, and cut a ticket on the failed job.
I think a simpler way to parse the output would be to record the status of each job in turn (the output is in time sequential order). As a retry job comes through the status for that job will be updated to the new exit code. At the end, ignore all the status 0 (& 1) and you have your list of tickets to raise.
Potentially even simpler, just record the jobs that meet the fail criteria. If the job comes up again in the output, update the status or throw away if it "succeeds".
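Both variants above amount to "the last status seen for a job wins". A rough sketch of that idea, assuming the bperror -backstat lines have already been parsed into (job id, status) pairs in time order. The parsing itself is left out because the column layout is not shown in this thread, and the name `final_failures` is just illustrative.

```python
def final_failures(records, ok_statuses=(0, 1)):
    """Keep the most recent status per job, then drop the ones that ended OK.

    `records` is an iterable of (job_id, status) pairs in time order.
    """
    latest = {}
    for job_id, status in records:
        latest[job_id] = status  # a later retry overwrites the earlier attempt
    return {job: st for job, st in latest.items() if st not in ok_statuses}

# Made-up sample: job 102 fails with 58 and then succeeds on retry,
# job 103 fails twice, job 104 ends with a partial success (1).
records = [(101, 0), (102, 58), (103, 13), (102, 0), (104, 1), (103, 13)]
print(final_failures(records))  # {103: 13}
```

If status 1 should also raise a ticket, pass `ok_statuses=(0,)`.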
Aren't there canned OpsCenter reports that would do this for you though?
Unfortunately there is no OpsCenter report that will give you only the real failed jobs. Also, I do not think that bperror will give you enough info to create the report. You may have to deal with a policy with multiple data streams, and the job may retry with a different backup ID.
What I have done is take the output of bpdbjobs -all_columns for the last 24 hours and extract the status code, policy, schedule, client and file selection. Then, for every combination of policy, schedule, client and file selection, I check the latest status code. If the status code is 0, the backup is OK. If the status code is greater than 0, the last attempt failed and I add the line to the report.
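To make the grouping concrete, here is a small sketch of that logic. It is not the script itself: it assumes the bpdbjobs output has already been parsed into dicts, and the field names (`policy`, `schedule`, `client`, `selection`, `status`, `ended`) are hypothetical labels rather than actual bpdbjobs column names.

```python
def failed_combinations(jobs):
    """Return the latest job per (policy, schedule, client, selection) if it failed."""
    latest = {}
    for job in jobs:
        key = (job["policy"], job["schedule"], job["client"], job["selection"])
        # Keep whichever attempt finished last for this combination.
        if key not in latest or job["ended"] >= latest[key]["ended"]:
            latest[key] = job
    # Status 1 counts as failed here, as it does in the report described above.
    return [job for job in latest.values() if job["status"] > 0]

# Made-up sample data; `ended` is an epoch-style timestamp.
jobs = [
    {"policy": "FS_Prod", "schedule": "Daily", "client": "app01", "selection": "/data", "status": 58, "ended": 1000},
    {"policy": "DB_Prod", "schedule": "Daily", "client": "db01", "selection": "/oracle", "status": 0, "ended": 1500},
    {"policy": "FS_Prod", "schedule": "Daily", "client": "app02", "selection": "/home", "status": 96, "ended": 1800},
    {"policy": "FS_Prod", "schedule": "Daily", "client": "app01", "selection": "/data", "status": 0, "ended": 2000},
    {"policy": "DB_Prod", "schedule": "Daily", "client": "db01", "selection": "/oracle", "status": 1, "ended": 2500},
]
for job in failed_combinations(jobs):
    print(job["client"], job["status"])
```

Keying on the combination rather than the job ID is what lets a retry with a new ID still cancel out the earlier failure.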
It is a very primitive way to get the failed jobs and the process is slow, but it is working.
PS. I've added status code 1 to the failed jobs, as a database backup with status 1 is a failed backup (at least one database is missing from the backup).
You can change it for a Linux master easily. If someone wants to run it from a Windows command line or the scheduler, the command must be C:\admin\scripts\PortableGit-1.7.2.3\bin\bash.exe --login /C/admin/scripts/failed_jobs.sh (assuming that Git and the script are stored at C:\admin\scripts\ and the script is named failed_jobs.sh).
I do not check for SLP jobs, but you can easily change that. PS. The last part of the script checks whether there are backup jobs that have run for more than 10 hours. I find it useful.
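The long-running check can be sketched roughly like this. Again, this is an illustration under assumptions and not the script's actual code: the `started` and `ended` fields are hypothetical epoch values, and a missing or zero `ended` is treated as "still running".

```python
def long_running(jobs, now, limit_hours=10):
    """Return jobs whose elapsed time exceeds `limit_hours`."""
    limit = limit_hours * 3600
    result = []
    for job in jobs:
        end = job.get("ended") or now  # no end time (None or 0): measure up to now
        if end - job["started"] > limit:
            result.append(job)
    return result

now = 1_000_000  # made-up "current" epoch time
jobs = [
    {"id": 1, "started": now - 11 * 3600, "ended": 0},              # running 11 h
    {"id": 2, "started": now - 2 * 3600, "ended": 0},               # running 2 h
    {"id": 3, "started": now - 20 * 3600, "ended": now - 8 * 3600}, # took 12 h
]
print([job["id"] for job in long_running(jobs, now)])  # [1, 3]
```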