Backup hanging in Active state, won't go past "begin writing"

bbot
Level 4

Out of our 100+ clients, one keeps getting stuck: its backup job stays in the "Active" state and never gets past "begin writing". The current job has been running for over 36 hours; in the past, this client's backups completed successfully in around 12-18 hours. The server does have a large data store, about 10 TB.

About 6 days earlier, the same client failed with exit status 41 (network timeout). Every job since then has hung.

Here are the job details:

1/20/2016 11:00:00 PM - Info nbjm(pid=4184) starting backup job (jobid=37269) for client lasfs01.corp.tlcinternal.us, policy LASFS01_SC, schedule Wednesday  
1/20/2016 11:00:00 PM - Info nbjm(pid=4184) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=37269, request id:{4BE50878-5D68-4336-A93E-8D8B41177185})  
1/20/2016 11:00:00 PM - requesting resource LAS
1/20/2016 11:00:00 PM - requesting resource .NBU_CLIENT.MAXJOBS.server#####
1/20/2016 11:00:00 PM - requesting resource .NBU_POLICY.MAXJOBS.LASFS01_SC
1/20/2016 11:00:00 PM - granted resource .NBU_CLIENT.MAXJOBS.server####
1/20/2016 11:00:00 PM - granted resource .NBU_POLICY.MAXJOBS.LASFS01_SC
1/20/2016 11:00:00 PM - granted resource MediaID=@aaaab;DiskVolume=lasbackup;DiskPool=lasbackup;Path=lasbackup;StorageServer=10.64.128.40;MediaServer=masterserver#####
1/20/2016 11:00:00 PM - granted resource LAS
1/20/2016 11:00:01 PM - estimated 17604715 Kbytes needed
1/20/2016 11:00:01 PM - Info nbjm(pid=4184) started backup (backupid=lasfs01.corp.tlcinternal.us_1453359601) job for client server######, policy LASFS01_SC, schedule Wednesday on storage unit LAS
1/20/2016 11:00:02 PM - Info bpbrm(pid=6892) server###### is the host to backup data from     
1/20/2016 11:00:02 PM - Info bpbrm(pid=6892) reading file list for client        
1/20/2016 11:00:02 PM - started process bpbrm (6892)
1/20/2016 11:00:02 PM - connecting
1/20/2016 11:00:03 PM - Info bpbrm(pid=6892) starting bpbkar32 on client         
1/20/2016 11:00:03 PM - connected; connect time: 0:00:01
1/20/2016 11:00:05 PM - Info bpbkar32(pid=4940) Backup started           
1/20/2016 11:00:05 PM - Info bpbkar32(pid=4940) change time comparison:<disabled>          
1/20/2016 11:00:05 PM - Info bpbkar32(pid=4940) archive bit processing:<enabled>          
1/20/2016 11:00:06 PM - Info bptm(pid=9452) start            
1/20/2016 11:00:06 PM - Info bptm(pid=9452) using 262144 data buffer size        
1/20/2016 11:00:06 PM - Info bptm(pid=9452) setting receive network buffer to 1049600 bytes      
1/20/2016 11:00:06 PM - Info bptm(pid=9452) using 30 data buffers         
1/20/2016 11:00:09 PM - Info bptm(pid=9452) start backup           
1/20/2016 11:00:09 PM - Info bptm(pid=9452) backup child process is pid 6296.6936       
1/20/2016 11:00:09 PM - Info bptm(pid=6296) start            
1/20/2016 11:00:09 PM - begin writing
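
To put a number on how long a job has sat at its last event, the timestamps in a job-details listing like the one above can be parsed. This is only a minimal Python sketch, assuming the `M/D/YYYY H:MM:SS AM/PM - message` layout shown here; `last_event` and `stalled_for` are hypothetical helper names, not NetBackup tools.

```python
from datetime import datetime

# Timestamp layout assumed from the job details pasted above.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def last_event(job_details):
    """Return (timestamp, message) for the last timestamped line, or None."""
    latest = None
    for line in job_details.splitlines():
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # no " - " separator, so not a job-details line
        try:
            when = datetime.strptime(stamp.strip(), STAMP_FORMAT)
        except ValueError:
            continue  # the text before " - " is not a timestamp
        latest = (when, message.strip())
    return latest


def stalled_for(job_details, now):
    """Return the time elapsed between the last logged event and `now`."""
    event = last_event(job_details)
    return None if event is None else now - event[0]


sample = """1/20/2016 11:00:09 PM - Info bptm(pid=6296) start
1/20/2016 11:00:09 PM - begin writing"""

when, message = last_event(sample)
print(message, "| stalled for", stalled_for(sample, datetime(2016, 1, 22, 11, 0, 9)))
```

The `now` value in the sample is chosen to match the "over 36 hours" mentioned above; in practice it would be `datetime.now()` on a machine in the same time zone as the master server.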

 

On the client, I pulled the bpbkar log. It shows about 22 hours of the following, repeated over and over:

09:28:17.002 [4940.832] <2> dtcp_read: TCP - success: recv socket (580), 4 of 4 bytes
09:28:17.002 [4940.832] <4> bpio::read_string: INF - read non-blocking message of length 1
09:28:17.002 [4940.832] <2> dtcp_read: TCP - success: recv socket (580), 1 of 1 bytes
09:28:17.002 [4940.832] <4> tar_backup::readServerMessage: INF - keepalive message received
09:28:17.002 [4940.832] <4> tar_base::keepaliveThread: INF - sending keepalive
09:28:17.002 [4940.832] <2> dtcp_write: TCP - success: send socket (492), 1 of 1 bytes
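
A quick way to confirm that a long log really contains nothing but this keepalive chatter is to count lines per logging function and list anything else. The following is a rough Python sketch, assuming every line follows the `time [pid.tid] <level> function: message` layout of the excerpt above. Treating these five functions as "keepalive only" is an assumption made for illustration, and `summarize` is a hypothetical helper.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: time [pid.tid] <level> function: message
LINE = re.compile(r"^(\S+) \[(\d+)\.(\d+)\] <(\d+)> (\S+): (.*)$")

# Functions that appear in the repeating excerpt above. Treating them as
# keepalive-only traffic is an assumption of this sketch.
KEEPALIVE_FUNCTIONS = {
    "dtcp_read",
    "dtcp_write",
    "bpio::read_string",
    "tar_backup::readServerMessage",
    "tar_base::keepaliveThread",
}


def summarize(log_text):
    """Count lines per function and collect lines from any other function."""
    counts = Counter()
    other = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINE.match(line)
        if match is None:
            continue  # blank or differently formatted line
        function = match.group(5)
        counts[function] += 1
        if function not in KEEPALIVE_FUNCTIONS:
            other.append(line)
    return counts, other


sample = """09:28:17.002 [4940.832] <2> dtcp_read: TCP - success: recv socket (580), 4 of 4 bytes
09:28:17.002 [4940.832] <4> bpio::read_string: INF - read non-blocking message of length 1
09:28:17.002 [4940.832] <2> dtcp_read: TCP - success: recv socket (580), 1 of 1 bytes
09:28:17.002 [4940.832] <4> tar_backup::readServerMessage: INF - keepalive message received
09:28:17.002 [4940.832] <4> tar_base::keepaliveThread: INF - sending keepalive
09:28:17.002 [4940.832] <2> dtcp_write: TCP - success: send socket (492), 1 of 1 bytes"""

counts, other = summarize(sample)
print(dict(counts))
print("lines outside the keepalive pattern:", len(other))
```

If `other` stays empty across the whole log, the client is probably exchanging keepalives with the server without sending any backup data.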

 

1 ACCEPTED SOLUTION

areznik
Level 5

It's gonna work fine after the reboot. Then, as soon as you forget about it, VSS will quietly break again, you'll discover it after a few days of missed backups, and the whole cycle will begin again. Welcome to the wonderful world of backing up Microsoft products :)

9 REPLIES 9

sdo
Moderator
Partner    VIP    Certified

Check VSS first.  Do these all work?

vssadmin list providers

vssadmin list volumes

vssadmin list shadowstorage

vssadmin list shadows

vssadmin list writers

vssadmin list writers | find /i "last"

vssadmin list writers | find /i "state"
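
When `vssadmin list writers` does return, its output can be scanned for writers that are not healthy instead of being read by eye. This is a small Python sketch, assuming the usual `Writer name:` / `State:` / `Last error:` layout of that output. The sample text is illustrative rather than taken from this client, and `parse_writers` and `unhealthy` are hypothetical helper names.

```python
def parse_writers(text):
    """Parse `vssadmin list writers` output into a list of dicts."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            current = {"name": name}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy(writers):
    """Return writers whose state is not Stable or whose last error is set."""
    return [
        w for w in writers
        if not w.get("state", "").endswith("Stable")
        or w.get("last_error") != "No error"
    ]


# Illustrative sample only; real output also carries writer and instance IDs.
sample = """Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Registry Writer'
   State: [7] Failed
   Last error: Timed out
"""

for writer in unhealthy(parse_writers(sample)):
    print(writer["name"], "|", writer["state"], "|", writer["last_error"])
```

The text to parse could come from redirecting the command to a file on the client, for example `vssadmin list writers > writers.txt` in an elevated prompt.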

bbot
Level 4

@sdo All commands run fine on the master server.



On the client, every command works up to "vssadmin list writers". That one reports that responses may be delayed if a shadow copy is being prepared. I currently have 2 active backup jobs on this server, which could be causing the delay. (I wanted to see what would happen if I left them running after restarting all the services.)

"shadowstorage" and "shadows" both come back with "No items found that satisfy the query."

sdo
Moderator
Partner    VIP    Certified

Sorry - commands were meant for the client.  Did one of the vssadmin commands hang/fail and not complete?
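
To tell a hung command apart from one that is merely slow, each check can be run with a time limit. Below is a minimal Python sketch of that idea, assuming Python is available on the client and the script runs from an elevated prompt. The 120-second limit is an arbitrary assumption, and `run_with_timeout` and `check_all` are hypothetical helper names.

```python
import subprocess

# The checks suggested earlier in this thread.
VSS_COMMANDS = [
    ["vssadmin", "list", "providers"],
    ["vssadmin", "list", "volumes"],
    ["vssadmin", "list", "shadowstorage"],
    ["vssadmin", "list", "shadows"],
    ["vssadmin", "list", "writers"],
]


def run_with_timeout(command, timeout_seconds):
    """Return 'ok', 'failed', 'hung' or 'missing' for one command."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "missing"  # the executable is not on this machine
    return "ok" if result.returncode == 0 else "failed"


def check_all(commands=VSS_COMMANDS, timeout_seconds=120):
    """Run every command in turn and map its command line to a status."""
    return {" ".join(c): run_with_timeout(c, timeout_seconds) for c in commands}
```

Calling `check_all()` on the client returns a dictionary such as `{"vssadmin list writers": "hung", ...}`, which answers the question above without leaving a console window blocked.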

Michael_G_Ander
Level 6
Certified

I have experienced a similar issue. In that case the problem was a bpbkar32 process hanging while waiting on I/O, but after your description of how vssadmin list writers behaves, I would guess that a VSS snapshot never completes.

I would ask for a reboot of the client, and create the bpfis folder under ../NetBackup/logs for future troubleshooting of the snapshots.

If a reboot is not possible, stop the NetBackup services and kill any lingering bpbkar32 or bpfis processes in Task Manager.

Wait at least 5 minutes, then start the NetBackup Client Service, which should start all the needed services.
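
For the no-reboot route, the lingering processes can also be found from the command line rather than in Task Manager. The following is a hedged Python sketch that parses the output of `tasklist /FO CSV /NH` and builds matching `taskkill` commands without running them. The sample rows are illustrative, and `lingering_pids` and `kill_commands` are hypothetical helper names.

```python
import csv
import io

# Process image names to look for, per the advice above.
LINGERING = {"bpbkar32.exe", "bpfis.exe"}


def lingering_pids(tasklist_csv):
    """Return (image name, pid) pairs for lingering NetBackup processes."""
    found = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Expected columns: image name, PID, session name, session#, memory
        if len(row) >= 2 and row[0].lower() in LINGERING:
            found.append((row[0], int(row[1])))
    return found


def kill_commands(pids):
    """Build (but do not run) one forced taskkill command per process."""
    return [["taskkill", "/PID", str(pid), "/F"] for _, pid in pids]


# Illustrative sample of `tasklist /FO CSV /NH` output; values are made up
# except for the bpbkar32 PID, which matches the job details in this thread.
sample = '''"svchost.exe","812","Services","0","10,000 K"
"bpbkar32.exe","4940","Services","0","50,000 K"
"explorer.exe","3000","Console","1","80,000 K"
'''

for command in kill_commands(lingering_pids(sample)):
    print(" ".join(command))
```

Printing the commands first leaves room to review them before anything is actually killed, which seems safer on a production file server.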

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

nbutech
Level 6
Accredited Certified

 

Is it possible to back up this client in a separate policy using the FlashBackup-Windows policy type?

bbot
Level 4

None of these commands worked, even with all active jobs stopped from the master server. Each one hangs at "Waiting for responses. These may be delayed if a shadow copy is being prepared".

vssadmin list writers

vssadmin list writers | find /i "last"

vssadmin list writers | find /i "state"

Stopping the services and killing the processes didn't seem to help.

 

We're going to reboot this client tonight and try again.

nbusers
Not applicable

Try running the backup in multiple streams and look for a particular drive or folder where the backup gets stuck.

bbot
Level 4

Reboot fixed this. Thanks.