
Troubles understanding Bpbkar log

RISSOULY_Abdelh
Level 4
Hi everyone,

Can someone please explain what the lines below mean?

23:59:53.919 [5052.6036] <2> fwrite_and_log: fail to write for track journal, backup id:, file:, err_num:<2>, to write:<1>, wrote:<0>
23:59:53.919 [5052.6036] <4> dos_backup::fscp_add_tj_entry(): ct_cat_add_entry() failed, error:<14>.
23:59:53.919 [5052.6036] <16> tar_backup_tfi::fscp_finishfile_state: ERR - fscp_add_tj_entry() failed, error (14).

This is the only error I found in the bpbkar log of a job that failed with status 14.

NetBackup client version: 7.5.0.6
Master server version: 7.6.0.2
Operating system: Windows Server 2008 R2 Standard, 64-bit

Best regards,

RISSOULY_Abdelh
Level 4

Here is the output of the commands:

 

  •  dir /s :   

Total Files Listed:
      96125 File(s) 33 028 760 357 bytes
      67144 Dir(s)  25 208 684 544 bytes free

  •  dir /s /ah :

Total Files Listed:
       1090 File(s)  4 560 275 719 bytes
       1093 Dir(s)  25 208 684 544 bytes free

  • dir /s /as 

Total Files Listed:
       2215 File(s)  4 578 978 748 bytes
       1238 Dir(s)  25 208 619 008 bytes free

  •  dir /s /ahs

Total Files Listed:
        662 File(s)  4 541 087 941 bytes
        827 Dir(s)  25 208 619 008 bytes free 

sdo
Moderator
Partner    VIP    Certified

Ok - I asked for the file counts, as I recently had a backup of a Hyper-V guest (using plain client inside the guest) that was failing with status 14/13 - because there were 1.8 million small files in a user's internet explorer cache folder.   You don't have this problem.

1) Any suspicious messages in the Windows Application Event log - between the start and end times of the backup job?  (you need to check the whole time range - and not just the time that the backup fails).

2) Anything in System Event log too, re IO or disks?

3) How big are the VxCJ*.dat files at the root of C: ?

4) Have you excluded the track journal folder tree from backups ?

5) Have you excluded the track journal folder tree from A/V scanning ?

6) Have you excluded the C:\VxCJ*.dat files from backups ?

7) Is 'Use Change Journal' enabled in client properties?

sdo
Moderator
Partner    VIP    Certified

Looking in your bpbkar log:

23:59:53.919 [5052.6036] <2> fwrite_and_log: fail to write <fix size entry block> for track journal, backup id:<NULL>, file:<desktop.ini>, err_num:<2>, to write:<1>, wrote:<0>
23:59:53.919 [5052.6036] <4> dos_backup::fscp_add_tj_entry(): ct_cat_add_entry() failed, error:<14>.
23:59:53.919 [5052.6036] <16> tar_backup_tfi::fscp_finishfile_state: ERR - fscp_add_tj_entry() failed, error (14)

...we see the NetBackup error 14, indicating a file write failed, but what's more interesting is the field 'err_num:<2>' just before it... and if we look up Windows 'error code 2' here:

https://msdn.microsoft.com/en-gb/library/cc231199.aspx

...we see that Win32 status code '2' means 'File not found'.

I'm wondering if something is removing, or restricting access to the change journal folder/files.
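For anyone who wants to pull these fields out of a larger bpbkar log, here is a minimal Python sketch. It assumes the `name:<value>` layout seen in the line quoted above; the regular expression and the `parse_fields` helper are illustrative only and are not part of NetBackup. The log does not say whether `err_num` is a C runtime errno or a Win32 code, but the value 2 means "file not found" in both (ENOENT and ERROR_FILE_NOT_FOUND), so the sketch simply uses Python's errno table.

```python
import errno
import os
import re

# Sample line copied from the bpbkar log quoted above.
LINE = ("23:59:53.919 [5052.6036] <2> fwrite_and_log: fail to write "
        "<fix size entry block> for track journal, backup id:<NULL>, "
        "file:<desktop.ini>, err_num:<2>, to write:<1>, wrote:<0>")

# Matches "name:<value>" pairs such as "err_num:<2>" or "to write:<1>".
# This pattern is an assumption based on the single line shown in the thread.
FIELD_RE = re.compile(r"([A-Za-z_ ]+):<([^>]*)>")


def parse_fields(line):
    """Return the name:<value> pairs of a log line as a dict (hypothetical helper)."""
    return {name.strip(): value for name, value in FIELD_RE.findall(line)}


fields = parse_fields(LINE)
code = int(fields["err_num"])

# Print the file name, the numeric code, its symbolic name and its description.
print(fields["file"], code, errno.errorcode.get(code), os.strerror(code))
```

The same helper can be run over every `fwrite_and_log` line in the log to see whether the failures always hit the same file or many different ones.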

Question 8)  Is the NetBackup client running as 'Local system account' or some other username?  i.e. what does this command show:    sc qc "NetBackup Client Service"

Question 9)  Can you show the output from this command:    set | find /i "proc"

 

RISSOULY_Abdelh
Level 4

hi,

Thanks sdo, I know why you asked for these commands' results :)

The Application and System event logs are the two things I always check after the NetBackup log: not a single error during the backup job.

Even after excluding the track journal folder tree and C:\VxCJ*.dat from the backup, the job still fails with status 14 on the first run and 12 on the second run. More details are in the attachments if you want to take a look.

3) : About the size of C:\VxCJ*.dat :

 Volume in drive C is SYSTEM
 Volume Serial Number is 8CDD-BF06

 Directory of C:\

28/02/2015  16:52         1 576 960 VxCJDelete.dat
28/02/2015  18:00             1 126 VxCJInfo.dat
28/02/2015  16:52           589 824 VxCJMon.dat
               3 File(s)      2 167 910 bytes
               0 Dir(s)  25 101 058 048 bytes free

 

5) No, but I will as soon as I get the password from the AV team :)

7) Yes, change journal is enabled.

I noticed the error that you mentioned above, and it's not exclusive to this client: many clients fail with status 14/13 with the same error. But when I re-run the job in the morning ==> no errors.

8) The NetBackup client is running as 'Local system account' :

C:\>sc query "NetBackup Client Service"

SERVICE_NAME: NetBackup Client Service
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

9) C:\>set | find /i "proc"


NUMBER_OF_PROCESSORS=2
PROCESSOR_ARCHITECTURE=AMD64
PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 15 Stepping 1, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=0f01

RISSOULY_Abdelh
Level 4

bpbkar_status_14

sdo
Moderator
Partner    VIP    Certified

Do you have Shadow_Copy_Components: and System_State: specified in the same backup policy selection list?

If so, then there really is no need to specify System_State: again, because System_State: is included within Shadow_Copy_Components: -   in which case take System_State: out of the backup policy selection list, and try again?

Marianne
Moderator
Partner    VIP    Accredited Certified

It seems this backup has been failing for more than a month now and we are unable to pinpoint the problem.

Please log a Support call with Symantec.
You will need to work with backup admins to collect info/logs from NBU master and/or media server if needed.

sdo
Moderator
Partner    VIP    Certified

Ah - so there are several backup clients that are failing with status 14/13 - which helps me begin to understand that this isn't really an individual client side issue.  I agree with Marianne, re getting a proper support case opened.  It looks like you have a deeper/wider problem at the networking layer, with comms to the media and/or master servers.

When you do find root cause - I'm sure several of us would like to know what was causing all this.

RISSOULY_Abdelh
Level 4

Hello, and thank you guys :)

Actually the situation is a little bit complicated, as it's not our environment. I'm just managing the NetBackup client part; the master and media part is managed by another team, and they are the ones who have the right to contact Symantec support. They do, but via email, as a colleague explained to me.

I'll send an email to the master team to open a call if the job fails one more time, but I'm not sure they will say "yes". Anyway, the job didn't fail the past two days and I'm keeping an eye on it.

All your recommendations are done, sdo :) except removing System_State: from the backup policy selection list ==> I don't have permissions, lol.

I also increased the debug level to 5, just in case the job fails again.

sdo
Moderator
Partner    VIP    Certified

Ok - only remove System_State:    IF     Shadow_Copy_Components:   (or ALL_LOCAL_DRIVES) is already present.

Remember:

ALL_LOCAL_DRIVES already includes Shadow_Copy_Components:.

And Shadow_Copy_Components: already includes System_State:.
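To make the containment rule concrete, here is a small Python sketch that flags redundant entries in a selection list. The `INCLUDES` table encodes only the two rules stated in this post (it is not taken from NetBackup documentation), and `covered_by` and `redundant_entries` are hypothetical helper names.

```python
# Containment rules as stated in this thread (an assumption, not vendor data):
# each directive maps to the directives it already covers.
INCLUDES = {
    "ALL_LOCAL_DRIVES": {"Shadow_Copy_Components:"},
    "Shadow_Copy_Components:": {"System_State:"},
}


def covered_by(directive):
    """Return everything a directive covers, following the rules transitively."""
    covered = set()
    stack = [directive]
    while stack:
        for child in INCLUDES.get(stack.pop(), ()):
            if child not in covered:
                covered.add(child)
                stack.append(child)
    return covered


def redundant_entries(selection_list):
    """Return entries already covered by another entry in the same list."""
    redundant = []
    for entry in selection_list:
        others = [e for e in selection_list if e != entry]
        if any(entry in covered_by(other) for other in others):
            redundant.append(entry)
    return redundant


print(redundant_entries(["Shadow_Copy_Components:", "System_State:", "C:\\"]))
```

With the list above, only `System_State:` is reported, which matches the advice to remove it when `Shadow_Copy_Components:` is already present.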

 

Looking back - I didn't mention to exclude the VxCJ*.dat files from A/V scanning.   The VxCJ*.dat files are the NTFS Change Journal log/tracking files for NetBackup (which should not be confused with Accelerator track files - which are something else, and stored elsewhere).  These VxCJ*.dat files need to be excluded from A/V scanning.

HTH.

RISSOULY_Abdelh
Level 4

The problem is finally solved, and here is how:

In the past few weeks a major incident occurred on the master server: several jobs were failing with status 83 and they didn't know exactly what the problem was, so they opened a Symantec case ==> 08290678.

The Symantec engineer suggested changes to some parameters (I have no idea which ones) and statuses 13/14/50 disappeared. After all, the issue was on the master server side....

mph999
Level 6
Employee Accredited

It seems the values set were for the kernel's internal ARP cache size:

> net.ipv4.neigh.default.gc_thresh3 = 4096
> net.ipv4.neigh.default.gc_thresh2 = 2048
> net.ipv4.neigh.default.gc_thresh1 = 1024
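These three keys are the Linux neighbour-table garbage-collection thresholds, with gc_thresh3 acting as the hard upper limit on entries. As a rough way to compare a host against the values quoted above, here is a Python sketch. It assumes a Linux host exposing the settings under /proc/sys; `parse_sysctl` and `current_value` are hypothetical helper names, and the script only reads values, it does not change them.

```python
from pathlib import Path

# Values quoted in the post above.
RECOMMENDED = """\
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict of ints, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def current_value(key):
    """Read the live value from /proc/sys on Linux; return None if unavailable."""
    path = Path("/proc/sys") / key.replace(".", "/")
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


for key, wanted in sorted(parse_sysctl(RECOMMENDED).items()):
    print(f"{key}: recommended={wanted} current={current_value(key)}")
```

On a machine without /proc/sys (for example a Windows client) the current value is simply reported as None.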

 

Marianne
Moderator
Partner    VIP    Accredited Certified

After battling with an issue for almost 3 months and only trying to troubleshoot from the client side, we can conclude that troubleshooting can never be one-sided.

The entire process flow must be investigated. 
Backup admins and server owners need to co-operate. No way around this.

sdo
Moderator
Partner    VIP    Certified

Thanks mph999 - for noting the technical fix.

Can I probe as to 'why' those values fixed the problem?   Those values must have been identified as being the fix, so how did the support engineer arrive at that conclusion?

mph999
Level 6
Employee Accredited

I've sent the TSE a note to ask, as I was wondering the same thing, it will be in the case notes, but as the case is in French ...

That said, the following error may have been a clue ...

NOTE: I've changed the machine name and removed the IP address

 /var/log/messages on node machine_x shows hundreds (maybe thousands) of errors like the following, while the other nodes have no errors in messages logs
Mar 25 19:12:52 machine_x kernel: Neighbour table overflow.
Mar 25 19:12:52 machine_x nmbd[5018]: Packet send failed to <some IP address> (138) ERRNO=No buffer space available
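To check whether other nodes show the same symptom, a short Python sketch like the one below can count the overflow messages per host and hour. It assumes the classic syslog line shape quoted above; the exact kernel wording may differ between kernel versions, so the pattern may need adjusting. `overflow_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Syslog lines in the shape quoted above (host name already anonymised there).
SAMPLE = """\
Mar 25 19:12:52 machine_x kernel: Neighbour table overflow.
Mar 25 19:12:52 machine_x nmbd[5018]: Packet send failed to <some IP address> (138) ERRNO=No buffer space available
"""

# "Mon DD HH:MM:SS host kernel: ... Neighbour table overflow"
# Group 1 is the "Mon DD HH" bucket, group 2 is the host name.
OVERFLOW_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}):\d{2}:\d{2} (\S+) kernel: .*Neighbour table overflow"
)


def overflow_counts(lines):
    """Count overflow messages per (host, 'Mon DD HH') bucket."""
    counts = Counter()
    for line in lines:
        match = OVERFLOW_RE.match(line)
        if match:
            hour, host = match.groups()
            counts[(host, hour)] += 1
    return counts


print(overflow_counts(SAMPLE.splitlines()))
```

To scan a real file, pass an open file object instead of `SAMPLE.splitlines()`, for example `overflow_counts(open("/var/log/messages"))`.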

 

RISSOULY_Abdelh
Level 4

Well, I can do the translation.

RISSOULY_Abdelh
Level 4

@Marianne : yes i agree, but some people just wanna watch the world burn.

mph999
Level 6
Employee Accredited

Think the TSE is away at the moment, I'll follow up ASAP

mph999
Level 6
Employee Accredited

Finally spoke to the TSE today ...

He found a previous issue by searching for the error from /var/log/messages, which led him to the recommended values.

 

sdo
Moderator
Partner    VIP    Certified

Thank you Martin.