03-10-2012 06:40 AM
Hi guys
I'm facing a problem in my environment. One client server (OS Windows 2003) has 2 disks, C:\ and E:\. The backup of C:\ goes fine, but NBU backs up about 100000 KB (approx. 34000 files) of E:\ and then stays in a hung state for 2-3 days. New scheduled backups do not start. We have to cancel the hung backup job and then start a new one, but the same problem comes up again with E:\.
A few of my observations:
3:50:18.390 AM: [1100.2888] <4> tar_backup_tfi::backup_send_
4:30:13.437 AM: [1100.2888] <16> tar_tfi::processException:
An Exception of type [SocketWriteException] has occured at:
Module: @(#) $Source: src/ncf/tfi/lib/
Local Address: [0.0.0.0]:0
Remote Address: [0.0.0.0]:0
OS Error: 10053 (An established connection was aborted by the software in your host machine.) Expected bytes: 32768
4:30:13.437 AM: [1100.2888] <2> tar_base::V_vTarMsgW: FTL - socket write failed
4:30:13.437 AM: [1100.2888] <4> ov_log::OVLoop: INF - Cycling log file
4:30:13.437 AM: [1100.2888] <4> ov_log::OVClose: INF - Closing log file: C:\Program Files\VERITAS\NetBackup\logs\
12:58:53.672 AM: [836.4168] <4> tar_backup_tfi::backup_send_
1:04:10.605 AM: [836.4168] <4> tar_backup_tfi::backup_send_
1:43:16.653 AM: [836.4168] <16> tar_tfi::processException:
An Exception of type [SocketWriteException] has occured at:
Module: @(#) $Source: src/ncf/tfi/lib/
Local Address: [0.0.0.0]:0
Remote Address: [0.0.0.0]:0
OS Error: 10053 (An established connection was aborted by the software in your host machine.) Expected bytes: 32768
1:43:16.653 AM: [836.4168] <2> tar_base::V_vTarMsgW: FTL - socket write failed
1:43:16.653 AM: [836.4168] <4> ov_log::OVLoop: INF - Cycling log file
1:43:16.653 AM: [836.4168] <4> ov_log::OVClose: INF - Closing log file: C:\Program Files\VERITAS\NetBackup\logs\
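When comparing several excerpts like these, it can help to measure how long the client was silent between the last backup_send_ entry and the socket write failure, and to hold that figure against the configured timeouts. Below is a minimal Python sketch for that. It is not a NetBackup tool: the function names are made up for illustration, and the regular expression simply assumes the line layout seen in the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout, taken from the excerpt above:
#   "3:50:18.390 AM: [1100.2888] <4> message"
LINE_RE = re.compile(
    r"^(\d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M): \[(\d+)\.(\d+)\] <(\d+)> (.*)$"
)

def parse_line(line):
    """Return (time, pid, tid, severity, message), or None for continuation lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%I:%M:%S.%f %p")
    return stamp, int(m.group(2)), int(m.group(3)), int(m.group(4)), m.group(5)

def gaps_before_socket_failure(lines):
    """For each 'socket write failed' entry, return (pid, seconds since the
    last backup_send_ entry logged by the same process/thread)."""
    last_send = {}
    gaps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, pid, tid, _severity, message = parsed
        key = (pid, tid)
        if "backup_send_" in message:
            last_send[key] = stamp
        elif "socket write failed" in message and key in last_send:
            delta = stamp - last_send[key]
            if delta < timedelta(0):
                # The log lines carry no date; assume a single midnight crossing.
                delta += timedelta(days=1)
            gaps.append((pid, delta.total_seconds()))
    return gaps

# Lines copied from the excerpt above (truncated exactly as posted).
SAMPLE = [
    "3:50:18.390 AM: [1100.2888] <4> tar_backup_tfi::backup_send_",
    "4:30:13.437 AM: [1100.2888] <2> tar_base::V_vTarMsgW: FTL - socket write failed",
    "12:58:53.672 AM: [836.4168] <4> tar_backup_tfi::backup_send_",
    "1:04:10.605 AM: [836.4168] <4> tar_backup_tfi::backup_send_",
    "1:43:16.653 AM: [836.4168] <2> tar_base::V_vTarMsgW: FTL - socket write failed",
]

for pid, seconds in gaps_before_socket_failure(SAMPLE):
    print(f"pid {pid}: {seconds:.0f} s between last send and socket failure")
```

For the two excerpts above this reports gaps of roughly 2395 s and 2346 s, i.e. about 39-40 minutes of silence before each failure.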
03-10-2012 09:56 AM
Can you paste the bpbkar logs for this backup job? Or better, attach the bpbkar logs here.
03-14-2012 07:03 AM
In my experience, 10053 is often related to timeouts; creating or increasing the registry keys CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT might help.
I would also run chkdsk on E: to see whether there is any indication of problems with the file system.
Try backing up different areas of E: to see whether the problem is related to a specific folder or file.
Regards
Michael
03-14-2012 07:59 AM
Also, you could try letting bpbkar run through the E: drive without sending the data over to the media server to see whether it also aborts. If it does, it's not a CLIENT_READ_TIMEOUT issue or the network.
"C:\program files\veritas\netbackup\bin\bpbkar32" -nocont E:\ > NUL 2> c:\temp.txt
Ensure you have netbackup\logs\bpbkar\ logging enabled and logging on the client increased to maximum.
03-22-2012 08:35 AM
Hi Rookie,
Check whether you have enough disk space on the drive where the log files are configured.
If not, please free up space on the drive or add some disk space.
If the backup is unable to update the log file, the backup goes into a hung state.
We faced the same issue in our environment.
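A quick way to rule this out on many clients is a scripted free-space check. The following is only a sketch in Python: the function name and the 1 GB threshold are illustrative assumptions, and the path to check is whatever drive holds your NetBackup logs directory.

```python
import shutil

def has_free_space(path, min_free_bytes):
    """Return True if the volume holding `path` has at least `min_free_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_bytes

# Example call (threshold is an arbitrary illustration, not a NetBackup requirement):
# has_free_space(r"C:\Program Files\VERITAS\NetBackup\logs", 1 * 1024**3)
```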
04-16-2012 06:57 AM
Hi guys
I tried "C:\program files\veritas\netbackup\bin\bpbkar32" -nocont E:\ > NUL 2> c:\temp.txt on the client, and it works completely fine.
Disk space and timeout options have already been checked.
This problem is present not just on 1 client but on around 15 clients in my IT environment.
Please suggest what other options I can check.
04-16-2012 07:04 AM
Rookie,
If it is affecting 15 clients, then this is definitely either a network or a timeout issue.
I would first check that the network is good.
Also, on the media server, check what CLIENT_READ_TIMEOUT is set to.
04-16-2012 10:02 AM
CLIENT_READ_TIMEOUT is set to 500 on all 4 media servers.
To check the network: is there specific software that Symantec recommends, or any NetBackup command, Data Domain command (it is my storage unit), or OS command (the media servers are Windows servers)?
04-16-2012 12:16 PM
Rookie, there are some tools we can use for checking the network, but you will need to raise a case for this.
CLIENT_READ_TIMEOUT is on the low side. I would increase that to 1200.
Not sure if it will help, as you say the clients stay hung for 2-3 days. I think setting up bpbkar trace logging may help. Increase logging to maximum on the client and media server. Ensure the bpbkar and bpfis log directories are in place on the client under netbackup\logs\
Create an empty file (with no extension) in the parent netbackup directory called bpbkar_path_tr
On the media server, ensure logging is enabled for bpbrm and bptm.
Run a backup, and when it starts hanging, take a look at the bpbkar and bpfis logs.
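With around 15 clients involved, the directory and touch-file part of these steps can be scripted. Here is a rough Python sketch; the function name is hypothetical and the install path is passed in as a parameter because it differs per client. It only creates the bpbkar and bpfis log directories and the empty bpbkar_path_tr file named above; it does not raise the logging level, which still has to be done in the client properties.

```python
from pathlib import Path

def prepare_client_logging(netbackup_dir, processes=("bpbkar", "bpfis")):
    """Create logs\\<process> directories and the empty bpbkar_path_tr file
    under the given NetBackup directory. Returns (log_dirs, touch_file)."""
    root = Path(netbackup_dir)
    log_dirs = []
    for name in processes:
        log_dir = root / "logs" / name
        log_dir.mkdir(parents=True, exist_ok=True)  # no error if already present
        log_dirs.append(log_dir)
    touch_file = root / "bpbkar_path_tr"  # empty file, no extension
    touch_file.touch(exist_ok=True)
    return log_dirs, touch_file

# Example call (path is an assumption; use the client's actual install path):
# prepare_client_logging(r"C:\Program Files\VERITAS\NetBackup")
```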
04-17-2012 12:37 AM
Ok, let's summarize:
Your opening post made it look like ONE client has a problem with the E: drive only.
Now we see that about 15 clients have this problem.
What is the common factor? One media server? All media servers?
Is the same NIC on the media server(s) used to receive data from clients as well as send data to DD?
Have you checked/verified that latest drivers and firmware settings have been applied to the NIC?
Have you obtained network/NIC settings for DD to ensure optimum performance?
Have you tried to monitor incoming data on DD itself while backup is running?
Some time ago I saw that a specific Broadcom NIC model had a problem with high I/O. I found the information by Googling the NIC model number. The latest NIC drivers solved the problem.
04-18-2012 05:22 AM
Not sure this is the correct place to post my problem.
Recently I installed Backup Exec 2012 on a Windows 2008 R2 64-bit server, and I have a Symantec Vault server 8.02. When I schedule a full backup, it works only once. The next time the same job runs, it gets stuck after the EVmonitir backup (approximately 8 MB); after that the backup stops responding, even if I keep it running for 24 hours. If I reboot my Vault server, the backup works, but the next backup has the same problem. No error is reported.
I have logged a call with Symantec tech support, but no luck, and the response from them has been very poor.
I have changed the storage from tape to disk, but no luck. Any ideas? I checked sgmonitor but don't know where to look.
04-25-2012 10:40 PM
Hi guys
On some clients I have set CLIENT_CONNECT_TIMEOUT = 3600
04-25-2012 11:32 PM
Info bptm(pid=3300) waited for full buffer 9408 times, delayed 28351 times
This shows that the bptm process was delayed about 28000 times waiting to be sent data from the client. It is only an indication and, without knowing what the similar line in bpbkar shows, is virtually useless (if bpbkar is delayed more times, then that would be the more important value; if fewer, then the bptm value is more important).
Also, 28351 looks like a big value, but that depends on how big the backup is. If the backup is small, then yes, this is a big value; if the backup is large, it is less relevant.
So, it indicates that there is a delay, where the clients are not sending data to the media server when they should be. How much of a factor this is depends.
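To compare these counters across jobs, or against the matching bpbkar figures, the two numbers can be pulled out of the line with a small script. This is just a sketch in Python; the function name is made up, and the pattern assumes the wording of the bptm line quoted above.

```python
import re

# Assumed wording, taken from the bptm line quoted above.
BUFFER_RE = re.compile(
    r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times"
)

def parse_buffer_waits(line):
    """Return a dict with the buffer kind, wait count, delay count and
    delays per wait, or None if the line does not match."""
    m = BUFFER_RE.search(line)
    if m is None:
        return None
    waited, delayed = int(m.group(2)), int(m.group(3))
    return {
        "buffer": m.group(1),
        "waited": waited,
        "delayed": delayed,
        "delays_per_wait": delayed / waited if waited else 0.0,
    }

line = "Info bptm(pid=3300) waited for full buffer 9408 times, delayed 28351 times"
print(parse_buffer_waits(line))
```

For the line above this works out to about three delays per wait. As noted, that ratio only means something next to the size of the backup and the corresponding bpbkar values.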
Regarding this:
To check the network: is there specific software that Symantec recommends, or any NetBackup command, Data Domain command (it is my storage unit), or OS command (the media servers are Windows servers)?
No, not really. The network is not the responsibility of Symantec (sorry). It (the network) is on the same level as the operating system in terms of 'support'. However, there is a tool called 'Camel' (no idea why) that can give some performance figures, and AppCritical, which can be very useful. You will need to log a call with Symantec to use these.
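While waiting on a support case, a plain TCP connect test from the media server to each affected client is a cheap first check. The Python sketch below makes some assumptions: the function names and the hostname in the comment are hypothetical, and the ports listed (1556 for PBX, 13724 for vnetd, 13782 for bpcd) are the commonly documented NetBackup defaults, so verify them against your own configuration. A successful connect only shows the port is reachable; it says nothing about throughput or about connections being dropped mid-backup.

```python
import socket

# Assumed default NetBackup ports; confirm against your environment.
DEFAULT_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_ports(host, ports=DEFAULT_PORTS, timeout=5.0):
    """Return {service_name: reachable} for each port in `ports`."""
    return {name: can_connect(host, port, timeout) for name, port in ports.items()}

# Example call (hostname is hypothetical):
# print(check_ports("client01.example.com"))
```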
Martin
04-26-2012 12:54 AM
bptm tells us that the backup was not really hung, just d-o-g slow.
Have you disabled TCP Chimney on all the W2003 clients?
I have seen how disabling it dramatically increased backup performance.
http://www.symantec.com/docs/TECH60844
Network connectivity tuning to avoid network read/write failures and increase performance
04-26-2012 01:11 AM
Have a look at the TN above.
TCP Chimney causes various 'horror' problems on W2003 - slow throughput, network errors, etc...
04-26-2012 01:43 AM
You forgot the technote, Marianne.
04-26-2012 02:05 AM
I did not - it's in 2 posts ago: https://www-secure.symantec.com/connect/forums/backup-hung-state#comment-7045481
http://www.symantec.com/docs/TECH60844