Solved: Verbose output truncated in bpdbjobs -all_columns?

Darren_Dunham · 03-07-2011

I have some tools I wrote that trawl through "bpdbjobs -all_columns -jobid <jobid>" to pull out the writing speed over the course of the job, which lets me graph it and see how things are going. I have a lot of NetApps that, when they do NDMP incrementals, do nothing for a while (often hours) and then write at full speed, so an "average speed" isn't very useful.
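The tools themselves aren't shown in the thread. Purely as an illustration, here is a minimal Python sketch of the parsing step. It assumes the sample lines look like the ones quoted in this thread ("MM/DD/YY HH:MM:SS - <n> KB written - <rate> KB/sec") and that fields are comma-separated, which is the same assumption the perl -F, one-liner later in the thread makes. The function name parse_bpdbjobs_samples is hypothetical.

```python
import re
from datetime import datetime

# Assumed sample format, taken from the excerpts in this thread:
#   "02/18/11 16:04:42 - 40005 KB written - 41158.504 KB/sec"
SAMPLE_RE = re.compile(
    r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) - (\d+) KB written - ([\d.]+) KB/sec$"
)


def parse_bpdbjobs_samples(text):
    """Return (timestamp, kb_written, kb_per_sec) tuples found in
    'bpdbjobs -all_columns' output.

    Splits on commas and newlines only (like perl -F,), so escaped
    commas inside fields are not handled.
    """
    samples = []
    for field in re.split(r"[,\n]", text):
        m = SAMPLE_RE.match(field.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            samples.append((ts, int(m.group(2)), float(m.group(3))))
    return samples
```

Feeding it the text of "bpdbjobs -all_columns -jobid <jobid>" yields a list of (time, KB, KB/sec) samples ready for graphing; any samples that bpdbjobs has dropped simply won't appear.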

I mostly did this back under 6.0. I've noticed that I have a lot of 6.5 jobs where the information in bpdbjobs just stops. It might be something to do with 6.5, or it might be the jobs themselves have changed somehow.

As an example, this job began at 2/17/2011 21:53 and completed at 2/20/2011 08:58 (just over 59 hours; a ~7TB job). If I look at the end of the bpdbjobs -all_columns output for it, I get this:

[snip]
02/18/11 16:04:39 - 40005 KB written - 41157.113 KB/sec
02/18/11 16:04:39 - 40005 KB written - 41157.348 KB/sec
02/18/11 16:04:40 - 40005 KB written - 41157.586 KB/sec
02/18/11 16:04:41 - 40005 KB written - 41157.816 KB/sec
02/18/11 16:04:41 - 40005 KB written - 41158.043 KB/sec
02/18/11 16:04:42 - 40005 KB written - 41158.277 KB/sec
02/18/11 16:04:42 - 40005 KB written - 41158.504 KB/sec
...
6967081863
46377009
292129
33078 
[snip]

In other words, the last performance/timing entry it logged was less than 24 hours into the job. There are no performance entries after that point.
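One way to quantify the gap is to compare the last sample time with the job's start and end. A small Python sketch using the times quoted above (the posted start and end are only given to the minute); the helper name sample_coverage is hypothetical:

```python
from datetime import datetime


def sample_coverage(job_start, job_end, last_sample):
    """Return (hours covered by samples, total job hours, covered fraction)."""
    covered = (last_sample - job_start).total_seconds()
    total = (job_end - job_start).total_seconds()
    return covered / 3600, total / 3600, covered / total


# Times from the example job above.
covered_h, total_h, frac = sample_coverage(
    datetime(2011, 2, 17, 21, 53),      # job start
    datetime(2011, 2, 20, 8, 58),       # job end
    datetime(2011, 2, 18, 16, 4, 42),   # last "KB written" sample shown
)
print(f"{covered_h:.1f} h of {total_h:.1f} h ({frac:.0%})")
```

With these figures the last sample lands about 18.2 hours into a job of roughly 59.1 hours, so only about 31% of the run is covered.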

Has anyone seen this, or does anyone know what limits there might be on how much data bpdbjobs will return? The system is currently Linux, a single master/media server with lots of NDMP hosts, running 6.5.5, but I've run these tools on 6.0 systems as well.

Darren

Darren_Dunham · 03-08-2011

Well, lo and behold, the "bpdbjobs" output is truncated, but the <job>.t trylog file is not.

So I need to adapt my program to be able to pull a trylog file directly. Slightly more restrictive, but much better than not having the data at all.

Last bits from bpdbjobs:

03/06/11 12:14:27 - 40005 KB written - 47183.555 KB/sec
03/06/11 12:14:28 - 40005 KB written - 47183.691 KB/sec
...
12147569595

End of the same job trylog:

KBW 1299625965 40005 50522.602
KBW 1299625968 40005 50522.117
KBW 1299625969 40005 50522.062

And 1299625969 is March 8, 15:12 in my timezone, so bpdbjobs is simply truncating the output.
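The KBW lines carry a Unix epoch timestamp rather than a formatted date. A minimal Python sketch of parsing one and converting the time follows. The reading of the three fields (epoch seconds, KB written, KB/sec) is inferred from the matching bpdbjobs lines above, not from documentation, and the fixed UTC-8 offset is only an assumption that happens to reproduce the 15:12 figure.

```python
from datetime import datetime, timedelta, timezone


def parse_kbw(line):
    """Parse a trylog line such as 'KBW 1299625969 40005 50522.062'.

    Assumed fields: epoch seconds, KB written, KB/sec.
    Returns None for anything that is not a 4-field KBW line.
    """
    parts = line.split()
    if len(parts) != 4 or parts[0] != "KBW":
        return None
    return int(parts[1]), int(parts[2]), float(parts[3])


# Assumption: US Pacific standard time (UTC-8); substitute your own zone.
PST = timezone(timedelta(hours=-8))

epoch, kb, rate = parse_kbw("KBW 1299625969 40005 50522.062")
print(datetime.fromtimestamp(epoch, tz=timezone.utc))  # 2011-03-08 23:12:49+00:00
print(datetime.fromtimestamp(epoch, tz=PST))           # 2011-03-08 15:12:49-08:00
```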

GlenG · 03-08-2011

Darren,

I have weekend backups that run for more than 24 hours. I think it's too late for this week, but I could check next Monday.

Can you share your code that extracts the data from bpdbjobs? Just running it from the command line, the output is a little hard to read.

GlenG

NBU master 7.0.1 on Sun X4500, Solaris 10

Darren_Dunham · 03-08-2011

The code is pretty complex, but while talking with someone else I just learned that it's all in the trylogs, where it's much easier to see:

Under ${install}/netbackup/db/jobs/trylogs/<job>.t, each of the performance lines begins with a "KBW ". Look there at the end of the file.
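Since the interesting part is the end of the trylog, a helper that keeps only the last few KBW lines could look like the sketch below. The path layout is the one used later in this thread (/usr/openv/netbackup/db/jobs/trylogs/<jobid>.t); the function name last_kbw_lines is hypothetical.

```python
from collections import deque


def last_kbw_lines(path, n=3):
    """Return the last n lines of a trylog that begin with 'KBW '."""
    tail = deque(maxlen=n)  # older matches fall off automatically
    with open(path, errors="replace") as fh:
        for line in fh:
            if line.startswith("KBW "):
                tail.append(line.rstrip("\n"))
    return list(tail)
```

For example, last_kbw_lines("/usr/openv/netbackup/db/jobs/trylogs/294895.t") would return the final three samples. The bounded deque keeps memory flat even for trylogs with hundreds of thousands of KBW lines, at the cost of reading the whole file once.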

Now most of my jobs are good and the performance data reaches the end of the job. But some of them don't. I'm going to see if I can find something about the ones that don't (number of lines, size of data, etc.).
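To compare the jobs whose data stops early with the ones that don't, the kinds of numbers mentioned here (line counts, data size) can be gathered per trylog. A hypothetical sketch, assuming the KBW layout quoted earlier in the thread (epoch seconds in the second field); the name trylog_stats is illustrative only:

```python
import os


def trylog_stats(path):
    """Collect simple per-trylog numbers for comparing good and bad jobs:
    file size, total lines, KBW lines, and the epoch of the last KBW sample."""
    total = kbw = 0
    last_epoch = None
    with open(path, errors="replace") as fh:
        for line in fh:
            total += 1
            if line.startswith("KBW "):
                kbw += 1
                fields = line.split()
                if len(fields) > 1 and fields[1].isdigit():
                    last_epoch = int(fields[1])
    return {
        "bytes": os.path.getsize(path),
        "lines": total,
        "kbw_lines": kbw,
        "last_kbw_epoch": last_epoch,
    }
```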

Darren

CRZ · 03-09-2011

I think we limited it to... 50 entries? (Count 'em and see.) That was due to concerns about overflow, memory issues, core dumping, or something equally scary. Or maybe it was 1000 entries? Or I may have this configuration completely mixed up with something else. Come to think of it, this post isn't useful at all. Good thing your question is already answered! ;)

Darren_Dunham · 03-09-2011

It's a lot more than that. That's why I wasn't certain what was going on. It's more than 65K.

# bpdbjobs -all_columns -jobid 294895 | perl -F, -lane 'foreach (@F) {print;}' | grep -c "KB written -"
68422

But:

# grep -c '^KBW' /usr/openv/netbackup/db/jobs/trylogs/294895.t
397304
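The same comparison as the two grep commands above, as a small Python sketch that takes both outputs as text. The name count_truncation is hypothetical; the bpdbjobs side splits on commas exactly like the perl one-liner, and the trylog side counts lines starting with KBW like grep '^KBW'.

```python
def count_truncation(bpdbjobs_text, trylog_text):
    """Count throughput samples in each source and the share bpdbjobs kept."""
    # Same idea as: perl -F, ... | grep -c "KB written -"
    shown = sum(
        1
        for field in bpdbjobs_text.replace("\n", ",").split(",")
        if "KB written -" in field
    )
    # Same idea as: grep -c '^KBW' <trylog>
    stored = sum(1 for line in trylog_text.splitlines() if line.startswith("KBW"))
    return shown, stored, (shown / stored if stored else 0.0)
```

With the counts posted for job 294895, bpdbjobs returned 68422 of 397304 samples, roughly 17%.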

I can well believe truncating it helps a lot of things. Woe unto those who run 'bpdbjobs -all_columns' on all jobs: I have to do that to collect some information from remote servers, and it takes more than 5 minutes on some of the busy ones.

Thanks, though.