Forum Discussion

Netbackup_learn
10 years ago

bpbkar logs structure

Hello, I'm having trouble reading the log against a backup policy. The situation is that I have a client that runs multiple policies, but one of them is failing. When I want to read the logs and ...
  • mph999
    10 years ago

    The other thing with logs is to learn how to follow a job from one log to the next. For a backup, going between bpbrm and bptm for example, it's fairly easy: you could go on the time if not much is running, but the more technical way is to know that bpbrm starts bptm, and when it does so it helpfully tells you the PID.

    E.g. a backup job ...

    In the bpbrm log, INITIATING is a good word to search for. The logparams line that follows it, with the same PID, contains lots of goodies that let you find the right job, including the job ID (-jobid), in this case 386.

    00:45:41.032 [18552] <2> bpbrm main: INITIATING (VERBOSE = 5): version NetBackup 7.6 2013092421
    00:45:41.033 [18552] <2> logparams: -backup -S womble -c womble -ct 0 -ru root -cl womble_tape -sched full -bt 1409874338 -dt 0 -st 0 -b womble_1409874338 -mediasvr womble -jobid 386 -jobgrpid 386 -masterversion 760000 -maxfrag 1048575 -bpstart_time 1409874639 -reqid -1409825373 -mt 2 -to 0 -stunit tape_standalone -rl 0 -rp 604800 -eari 0 -cj 1 -D 6 -rt 0 -rn -1 -pool NetBackup -use_ofb -use_otm -jm -secure 1 -kl 28 -rg root -fso -keyword \300\274mseo\300\276KeyType\300\275aes128\300\273\300\240compress\300\275none\300\273\300\274/mseo\300\276 -connect_options 16974338
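
    If you would rather script that first step than search by eye, here is a minimal Python sketch. It assumes the log file holds one record per line in the "time [PID] <severity> text" shape shown above; the function name and the shortened sample line are only for illustration.

```python
import re

# Legacy log record as seen in the excerpts: "HH:MM:SS.mmm [PID] <severity> text".
# The optional ".digits" allows for the Windows "[pid.number]" form.
LINE_RE = re.compile(r"^\s*\d{2}:\d{2}:\d{2}\.\d{3} \[(\d+)(?:\.\d+)?\] <\d+> (.*)$")

def find_bpbrm_pids(lines, job_id):
    """Return the PIDs whose logparams line carries '-jobid <job_id>'."""
    wanted = re.compile(r"-jobid %d\b" % job_id)
    pids = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a record in the expected shape
        pid, text = m.group(1), m.group(2)
        if text.startswith("logparams:") and wanted.search(text) and pid not in pids:
            pids.append(pid)
    return pids

# Sample lines shortened from the bpbrm excerpt above.
sample = [
    "00:45:41.032 [18552] <2> bpbrm main: INITIATING (VERBOSE = 5): version NetBackup 7.6 2013092421",
    "00:45:41.033 [18552] <2> logparams: -backup -S womble -c womble -cl womble_tape -sched full -jobid 386 -jobgrpid 386",
]
print(find_bpbrm_pids(sample, 386))
```

    For a real log you would pass the open file object instead of the sample list. The \b after the job ID stops job 38 from matching job 386.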

    Unfortunately, this comes with experience, but I happen to know that when bpbrm starts bptm, it mentions 'bptm' in the log line. So if I search down for lines containing 'bptm' with the same PID [18552], I will find ...


    00:45:41.836 [18552] <2> bpbrm spawn_child: /usr/openv/netbackup/bin/bptm bptm -w -c womble -den 6 -rt 0 -rn -1 -stunit tape_standalone -cl womble_tape -bt 1409874338 -b womble_1409874338 -st 0 -cj 1 -p NetBackup -reqid -1409825373 -jm -brm -hostname womble -ru root -rclnt womble -rclnthostname womble -rl 0 -rp 604800 -sl full -ct 0 -maxfrag 1048575 -eari 0 -v -mediasvr womble -nonrsvdports -connect_options 0x01020001 -jobid 386 -jobgrpid 386 -masterversion 760000 -bpbrm_shm_id 67108984 -blks_per_buffer 128 -shm
    00:45:41.846 [18552] <2> set_job_details: Tfile (386): LOG 1409874341 4 bpbrm 18552 bptm pid: 18557
    00:45:41.846 [18552] <2> send_job_file: job ID 386, ftype = 3 msg len = 45, msg = LOG 1409874341 4 bpbrm 18552 bptm pid: 18557

    From these lines I see that the bptm PID is 18557.
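
    The same hop can be scripted. This is only a sketch: it relies on the "bptm pid: NNNNN" wording seen in the two lines above, which may differ between NetBackup versions, and the function name is made up for the example.

```python
import re

def find_bptm_pid(lines, bpbrm_pid):
    """Return the bptm PID announced by the given bpbrm PID, or None."""
    tag = "[%s]" % bpbrm_pid  # only look at lines written by this bpbrm
    for line in lines:
        if tag in line:
            m = re.search(r"bptm pid: (\d+)", line)
            if m:
                return m.group(1)
    return None

# The two bpbrm lines quoted above.
sample = [
    "00:45:41.846 [18552] <2> set_job_details: Tfile (386): LOG 1409874341 4 bpbrm 18552 bptm pid: 18557",
    "00:45:41.846 [18552] <2> send_job_file: job ID 386, ftype = 3 msg len = 45, msg = LOG 1409874341 4 bpbrm 18552 bptm pid: 18557",
]
print(find_bptm_pid(sample, 18552))
```

    The returned value is the number to look for in square brackets in the bptm log.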

    So now I can search the bptm log for all the lines containing [18557], and I find ...

    00:45:43.169 [18557] <2> SetMaxDataLimit: maximum data size: current=-3 max=-3
    00:45:43.169 [18557] <2> initialize: fd values STDOUTSOCK=4 STDERRSOCK=5
    00:45:43.181 [18557] <2> bptm: INITIATING (VERBOSE = 5): -w -c womble -den 6 -rt 0 -rn -1 -stunit tape_standalone -cl womble_tape -bt 1409874338 -b womble_1409874338 -st 0 -cj 1 -p NetBackup -reqid -1409825373 -jm -brm -hostname womble -ru root -rclnt womble -rclnthostname womble -rl 0 -rp 604800 -sl full -ct 0 -maxfrag 1048575 -eari 0 -v -mediasvr womble -nonrsvdports -connect_options 0x01020001 -jobid 386 -jobgrpid 386 -masterversion 760000 -bpbrm_shm_id 67108984 -blks_per_buffer 128 -shm
    00:45:43.182 [18557] <2> main: bptm.c.1591: maximum fragment size is 1048575000 Kbytes
    00:45:43.182 [18557] <2> bptm: PORT_STATUS = 0x01020001
    <snip>

    With legacy logs, it's as simple as that ...

    Unfortunately you don't always get a job ID mentioned, so it can be a case of going on the time to find the approximate area of the log, and then just reading through.
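
    Going on time can be narrowed down with a small filter like the sketch below. It assumes every record starts with an HH:MM:SS.mmm stamp as in the excerpts, compares time of day only (so it does not handle a window that crosses midnight), and the function name is invented for the example.

```python
import re
from datetime import datetime, timedelta

STAMP_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\.\d{3} ")

def lines_near(lines, around, seconds=60):
    """Return lines stamped within +/- 'seconds' of 'around' (HH:MM:SS)."""
    centre = datetime.strptime(around, "%H:%M:%S")
    window = timedelta(seconds=seconds)
    kept = []
    for line in lines:
        m = STAMP_RE.match(line)
        if not m:
            continue  # skip anything without a leading timestamp
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        if abs(stamp - centre) <= window:
            kept.append(line)
    return kept

# Lines taken from the excerpts above.
sample = [
    "00:45:41.032 [18552] <2> bpbrm main: INITIATING (VERBOSE = 5): version NetBackup 7.6 2013092421",
    "00:45:43.182 [18557] <2> bptm: PORT_STATUS = 0x01020001",
]
for line in lines_near(sample, "00:45:41", seconds=1):
    print(line)
```

    Take the start or failure time from the job details, then widen the window until the interesting lines show up.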

    The other thing to learn is which process talks to which. For example, for a backup ...

    nbjm on the master starts bpbrm on the media server
    bpbrm starts bptm on the media server and also bpbkar on the client
    bptm connects to nbjm to get resources, and nbjm sends the resources back on a separate connection
    bpbkar sends the data to back up to bptm and the file metadata to bpbrm
    bpbrm sends the file metadata to bpdbm on the master, which writes it to the image database / catalog

    So, knowing whereabouts the problem appears in, say, Activity Monitor, you can usually zone in on the right process log. A bit of knowledge (or Google, or even the troubleshooting guide) will turn up a process flow, so you can also get the logs for the processes close by. In the example above, if the issue is something in bpbrm, I would probably get bpbrm, nbjm and bpdbm, as these are the processes that are talking to each other at the time.
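
    As a rough illustration of "get the logs for the processes close by", the backup flow listed above can be written down as pairs and queried. This sketch encodes only the connections named in this post; the names BACKUP_FLOW and logs_to_collect are made up for the example.

```python
# Who talks to whom during a backup, as listed in the flow above.
BACKUP_FLOW = [
    ("nbjm", "bpbrm"),    # nbjm starts bpbrm
    ("bpbrm", "bptm"),    # bpbrm starts bptm
    ("bpbrm", "bpbkar"),  # bpbrm starts bpbkar
    ("bptm", "nbjm"),     # bptm asks nbjm for resources
    ("bpbkar", "bptm"),   # backup data
    ("bpbkar", "bpbrm"),  # file metadata
    ("bpbrm", "bpdbm"),   # file metadata on to the catalog
]

def logs_to_collect(process):
    """Return the process itself followed by every process it talks to."""
    peers = set()
    for a, b in BACKUP_FLOW:
        if a == process:
            peers.add(b)
        elif b == process:
            peers.add(a)
    return [process] + sorted(peers)

print(logs_to_collect("bpbrm"))
```

    This returns every direct neighbour, so for bpbrm it also lists bptm and bpbkar; which of those you actually need depends on where in the job the failure shows up.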

    Sometimes the error isn't where you think. For example, if bptm starts but never gets the resources from nbjm, the problem looks to be in bptm, but it could be the comms to nbjm, or nbjm could have a problem talking to nbrb, or nbrb talking to nbemm, so you can end up chasing from log to log until you narrow it down.

    I won't mention vx logs in detail; they are a bit harder as they are multi-threaded, so one line with PID xxx could be doing something for job 123 but the next line with PID xxx could actually be related to job 127 ...  The trick is to know what the process does in what order and the keywords to look for, but this is way easier said than done.  Often I just read through them looking for things that could be related to the job in question.

  • watsons
    10 years ago

    In addition to Marianne's great reply, a few more simple points here:

    1) First check the bpbkar PID in the job details. If PID = 0, stop looking, because bpbkar was never even started.
    2) Look in the bpbkar logs for the PID in square brackets. You will see one of:

    [8123.1122]  <== this is a Windows system; you can basically ignore the second number, "1122".
    [8123]  <== this is a non-Windows (Unix, Linux etc.) system

    3) In command prompt, you can do something like:

    C:\Program Files\Veritas\Netbackup\logs\bpbkar>  findstr "[8123"  log.090314 >  output1.txt

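    (On Unix/Linux, use grep in place of findstr. Because "[" starts a bracket expression in a grep pattern, use grep -F "[8123" so the string is matched literally.)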

    This will give you the progress of bpbkar PID 8123; scroll through it to look for errors. But as pointed out by Martin, some errors are not obvious and some are negligible, so it takes time to learn to pinpoint the actual one. Look for lines with the string "<16>", which usually indicates a major error.
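
    A platform-neutral version of the same filter, as a Python sketch. It accepts both the "[8123]" and "[8123.1122]" forms and then picks out the "<16>" lines. The function names are invented, and the sample lines are hypothetical placeholders, not real bpbkar output.

```python
import re

def lines_for_pid(lines, pid):
    """Keep lines tagged [pid] (Unix/Linux) or [pid.number] (Windows)."""
    tag = re.compile(r"\[%s(?:\.\d+)?\]" % re.escape(str(pid)))
    return [line for line in lines if tag.search(line)]

def errors_only(lines):
    """Keep only the lines logged with severity <16>."""
    return [line for line in lines if "<16>" in line]

# Hypothetical sample lines, only to exercise the filter.
sample = [
    "10:00:01.000 [8123.1122] <2> example: hypothetical message",
    "10:00:02.000 [8123.1122] <16> example: hypothetical error",
    "10:00:03.000 [81230.1] <2> example: line from a different PID",
]
for line in errors_only(lines_for_pid(sample, 8123)):
    print(line)
```

    Matching the closing bracket or the dot after the PID keeps PID 81230 out of the results for PID 8123, which a plain search for "[8123" would let through.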