02-12-2016 06:39 AM
Hi,
Two or three times a week, the backup of one client hangs. This morning, for example, it had been running for 11 hours without backing anything up; it looks stalled. The only thing I can do is cancel the backup, stop the services and even kill the processes, then restart the backup.
My question is, which logs can I look at to know what's going on? I enabled all the logs, but have no idea where to start looking.
Thanks!
Phil
02-12-2016 06:48 AM
Quite often you will find a bpfis or bpbkar32 process left running on the client, and this stops future backups from doing anything.
Kill them off and, if possible, reboot the client, and you should be OK in the future.
Then check for things like antivirus, which could be grabbing hold of the backup or snapshot process and causing this sort of issue.
02-12-2016 07:15 AM
There's only one bpbkar32 process running, no bpfis. Yes, if I kill the process, restart the services and then rerun the backup, it works. But it's happening several times a week.
I'll check the antivirus like you suggested. Thanks
02-12-2016 07:26 AM
OK. It is usually AV, a VSS issue, firewall keepalive timeouts or something similar: anything that breaks the communication and leaves a backup process behind, which then causes the follow-on error. It will usually follow an initial backup error, so pounce on it the next time you get a failure, as that first one will hold the clue.
02-12-2016 07:45 AM
How much data are you trying to back up for this client?
Is it possible to use FlashBackup-Windows for this client?
02-12-2016 07:48 AM
The client and media server are at the same site with no firewall between them.
I'll concentrate the investigation on the AV and VSS.
Thanks
02-12-2016 07:54 AM
The backup has been complete for at least 10 minutes, but the bpbkar32.exe process is still running and using CPU.
Is this normal behavior?
02-12-2016 07:58 AM
I'm seeing messages like this in the bpbkar log folder.
10:54:34.554 [6360.9044] <4> dos_backup::tfs_scannext: parent of <E:\Somefolder\Somefile.txt> has been forced, forcing backup
Is the process still working on the backup even though it's finished in the Admin Console? I don't get why it's still doing stuff.
02-12-2016 08:07 AM
When it finishes in the admin console it should finish on the client ... unless you have a bpbrm process on the media server that is firing it off
Check Task Manager on the media server and show the "Command line" column so that you can see any processes relating to your client.
You could do the same on the client to see whether the bpbkar32 process is the one that relates to the job you just ran or to something else.
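If you would rather script that check than eyeball Task Manager, something along these lines might do as a starting point. It is only a rough Python sketch: `related_processes`, the row layout (`pid`, `name`, `cmdline`) and the sample rows are all made up for illustration, and the sample command lines are placeholders, not real NetBackup arguments. You would feed it rows exported from whatever process listing you have.

```python
# Process names mentioned in this thread; extend as needed.
NETBACKUP_PROCESSES = {"bpbrm", "bpbkar32", "bpfis"}


def related_processes(rows, client_name=None):
    """Return rows for NetBackup processes, optionally only those whose
    command line mentions client_name (case-insensitive substring match)."""
    wanted = client_name.lower() if client_name else None
    hits = []
    for row in rows:
        base = row["name"].lower()
        if base.endswith(".exe"):
            base = base[:-4]
        if base not in NETBACKUP_PROCESSES:
            continue
        if wanted is not None and wanted not in row["cmdline"].lower():
            continue
        hits.append(row)
    return hits


# Hypothetical rows: the command lines are placeholders for illustration only.
SAMPLE_ROWS = [
    {"pid": 101, "name": "bpbrm.exe", "cmdline": "bpbrm.exe (arguments naming client01)"},
    {"pid": 102, "name": "bpbrm.exe", "cmdline": "bpbrm.exe (arguments naming client02)"},
    {"pid": 103, "name": "notepad.exe", "cmdline": "notepad.exe client01.txt"},
    {"pid": 104, "name": "bpbkar32.exe", "cmdline": "bpbkar32.exe (arguments not shown)"},
]

if __name__ == "__main__":
    for row in related_processes(SAMPLE_ROWS, "client01"):
        print(row["pid"], row["name"], row["cmdline"])
```

On the media server you would pass the client name; on the client itself, leave it out so every leftover bpbkar32 or bpfis shows up regardless of what its command line contains.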
02-12-2016 10:37 AM
I'll check next time. I went to lunch and the process was done when I came back.
I got some errors in the bpbkar log, though.
11:17:37.269 [6360.9044] <4> dos_backup::tfs_scannext: INF - no more path list entries
11:17:37.269 [6360.9044] <4> dos_backup::V_PostCleanup: INF - clear archive bits: processing secondary items
11:17:37.269 [6360.9044] <4> dos_backup::V_PostCleanup: INF - clear archive bits: stop
11:17:37.269 [6360.9044] <4> dos_backup::V_PostCleanup: INF - ================================================================================
11:17:38.299 [6360.9044] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
11:17:38.299 [6360.9044] <4> OVStopCmd: INF - EXIT - status = 0
11:17:38.299 [6360.9044] <16> dtcp_shutdown: TCP - failure: shutdown socket (476) (TCP 10054: Connection reset by peer)
11:17:38.299 [6360.9044] <2> tar_base::V_Close: closing...
11:17:38.299 [6360.9044] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
11:17:38.377 [6360.9044] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): enter - InitFlags:0x00000101
11:17:38.377 [6360.9044] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): ubs specifics: 0x001d0000
11:17:38.657 [6360.9044] <16> dtcp_read: TCP - failure: recv socket (476) (TCP 10053: Software caused connection abort)
11:17:38.657 [6360.9044] <4> OVShutdown: INF - Finished process
11:17:38.657 [6360.9044] <4> WinMain: INF - Exiting C:\Program Files\VERITAS\NetBackup\bin\bpbkar32.exe
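For anyone wading through a large bpbkar log, a small script can narrow down where to look. The Python sketch below is an assumption-laden starting point, not a NetBackup tool: it assumes every line follows the `time [number.number] <level> message` layout seen above, that the bracketed pair is process and thread ID, and that a higher number in angle brackets means a more severe entry (the `<16>` lines here are the failures). The function names and the 5-minute gap threshold are arbitrary choices. It lists the high-severity entries and any long silence between consecutive lines, which is where a stall would show up.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "HH:MM:SS.mmm [pid.tid] <level> message"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<sev>\d+)> "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("time"), "%H:%M:%S.%f"),
        "pid": int(m.group("pid")),
        "tid": int(m.group("tid")),
        "sev": int(m.group("sev")),
        "msg": m.group("msg"),
    }


def scan(lines, min_severity=16, max_gap=timedelta(minutes=5)):
    """Return (errors, gaps): entries at or above min_severity, and pairs of
    consecutive entries separated by more than max_gap.
    Timestamps carry no date, so a log crossing midnight is not handled."""
    errors, gaps = [], []
    previous = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry["sev"] >= min_severity:
            errors.append(entry)
        if previous is not None and entry["time"] - previous["time"] > max_gap:
            gaps.append((previous, entry))
        previous = entry
    return errors, gaps


# Lines copied from the excerpts in this thread. They were not adjacent in the
# real log, so the gap reported for them is illustrative only.
SAMPLE = [
    r"10:54:34.554 [6360.9044] <4> dos_backup::tfs_scannext: parent of <E:\Somefolder\Somefile.txt> has been forced, forcing backup",
    "11:17:37.269 [6360.9044] <4> dos_backup::tfs_scannext: INF - no more path list entries",
    "11:17:38.299 [6360.9044] <16> dtcp_shutdown: TCP - failure: shutdown socket (476) (TCP 10054: Connection reset by peer)",
    "11:17:38.657 [6360.9044] <16> dtcp_read: TCP - failure: recv socket (476) (TCP 10053: Software caused connection abort)",
]

if __name__ == "__main__":
    errors, gaps = scan(SAMPLE)
    for e in errors:
        print("ERROR", e["time"].time(), e["msg"])
    for before, after in gaps:
        print("GAP", before["time"].time(), "->", after["time"].time())
```

To run it against a real file, pass the open file object to `scan` in place of `SAMPLE`.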
02-12-2016 03:05 PM
Is the problem client an old Windows 2003 server with lots and lots of files?
02-14-2016 10:41 AM
Have you tried multistreaming the job and checking whether it always fails for a particular stream (drive)? Checking the AV is a good option as well. Also, does this always happen around the same time?
02-15-2016 06:29 AM
It's a Windows 2008 R2 server.
It's already multistreamed and, from what I can remember, it's always the same drive that hangs.