01-13-2014 01:34 AM
Hi,
Having managed to get the Flashbackup-Windows policy to validate (problem with the firewall) now I am getting an Error 13 each time the backup runs.
The snapshot part of the policy completes without error and the backup starts OK, but it gets to about 400128 KB and 325500 files and then terminates with the following in the Activity Monitor:
03-03-2014 05:54 AM
Looks like we fixed this problem ourselves. We found that the application was creating recursive directories for whatever reason, and one of these was also causing Storage Essentials to fail on a discovery. We scanned the area and it came back OK, so we deleted the directory and re-ran the SE discovery, which worked (we still have a few other recursive directories, but these seem OK). I then re-ran the FlashBackup policy and it is now working.
Not sure what was wrong with the directory, but that was definitely the issue.
Kev
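For anyone hitting the same thing, here is a rough Python sketch for spotting directories like this before a backup runs. It assumes "recursive" means a folder name nested inside itself several times in a row, which may not match what the application actually created here. The function name, the repeat threshold, and the demo folder names are all made up for illustration, and it does not try to detect junction or symlink loops.

```python
import os
import tempfile

def find_repeated_dirs(root, max_repeat=3):
    """Return directories whose path ends in the same name max_repeat times in a row."""
    suspects = []
    for dirpath, dirnames, _files in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        tail = parts[-max_repeat:]
        if len(tail) == max_repeat and len(set(tail)) == 1:
            suspects.append(dirpath)
            dirnames[:] = []  # no need to descend further into a flagged tree
    return suspects

# Demo on a throwaway tree (hypothetical folder names).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "data", "data", "data", "data"))
os.makedirs(os.path.join(demo_root, "logs", "2014"))
print(find_repeated_dirs(demo_root))
```

Pointing it at the real volume instead of demo_root gives a list of candidate paths to inspect by hand; it only reports and never deletes anything.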
01-13-2014 03:23 AM
Although it may be hitting a corrupt file, the most likely cause is a timeout
Increase the Client Read timeout on your Media Server to 3600 and see if that resolves it for you
If the job fails after an hour next time then you will need to increase it further
Hope this helps
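On a UNIX/Linux media server this setting ends up as a CLIENT_READ_TIMEOUT entry in bp.conf. A minimal Python sketch for checking which value a bp.conf currently carries follows. The sample host name is hypothetical, and the 300 second fallback is an assumption about the default rather than something taken from this environment.

```python
def client_read_timeout(bp_conf_text, default=300):
    """Return CLIENT_READ_TIMEOUT from bp.conf-style text, or the default if absent."""
    for line in bp_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CLIENT_READ_TIMEOUT":
            return int(value.strip())
    return default

# Sample content; "master1" is a made-up host name.
sample = """SERVER = master1
CLIENT_READ_TIMEOUT = 3600
"""
print(client_read_timeout(sample))
```

In practice the text would come from reading the media server's bp.conf; the sketch only reads the value and changes nothing.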
01-13-2014 04:16 AM
Hi,
I have increased the client read timeout to 3600 but it is failing at the same point. If this is a corrupt file, is there any way to determine which one it is, and what log should I be looking at?
Thanks
Kev
01-13-2014 04:24 AM
You would want the bpfis (in case it crashes during the snapshot process) and the bpbkar logs - both with the logging level increased
These could become very large for this type of job but if it fails quickly then you may be OK
bpcd would also be useful in case it is a communications issue
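These legacy logs are only written when a folder with the process name exists under the NetBackup logs directory. A small Python sketch of creating the three folders mentioned above follows. The logs root is passed in because the install path on this client is not stated in the thread, and the demo uses a temporary directory as a stand-in.

```python
import os
import tempfile

def ensure_log_dirs(logs_root, names=("bpfis", "bpbkar", "bpcd")):
    """Create one log folder per process name; return the ones that were missing."""
    created = []
    for name in names:
        path = os.path.join(logs_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created

# Demo against a temporary directory, standing in for the real logs folder.
demo_logs_root = tempfile.mkdtemp()
print(ensure_log_dirs(demo_logs_root))
```

Running it a second time returns an empty list, so it is safe to repeat. The logging level itself still has to be raised separately.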
01-13-2014 04:29 AM
Thanks Mark,
the bpfis is clean and the snapshot works without any problems, I will take a look at the other two logs and get back to the forum
Kev
01-13-2014 04:40 AM
If the bpbkar log doesn't give enough there is a touch file you can add to make it write every file it backs up into the bpbkar log - hopefully you won't need that but let's see what you get first
01-13-2014 04:54 AM
Backup fails within 7 minutes - I doubt that this is a timeout:
01/13/2014 09:18:11 - begin writing
01/13/2014 09:25:52 - Error bpbrm (pid=21133) socket read failed: errno = 104 - Connection reset by peer
Please check bpbrm and bptm logs on media server in addition to client's bpbkar log.
You may need level 5 logs to trace all info/comms.
Please ask firewall admins once again to monitor network traffic between the media server and client when backup is kicked off. The problem may still be with the firewall, terminating the connection.
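As a quick check on the "7 minutes" figure, the gap between the two job-detail timestamps quoted above can be worked out with a few lines of Python. The format string is an assumption based on how the timestamps appear in this post.

```python
from datetime import datetime

# Timestamp layout as seen in the job details above (assumed MM/DD/YYYY).
FMT = "%m/%d/%Y %H:%M:%S"

def elapsed_seconds(start, end):
    """Seconds between two job-detail timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

secs = elapsed_seconds("01/13/2014 09:18:11", "01/13/2014 09:25:52")
print(divmod(secs, 60))  # (7, 41)
```

That is 7 minutes 41 seconds, far short of a 3600 second client read timeout.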
01-13-2014 05:30 AM
It certainly looks like something is dropping, I just cannot fathom out at what point and why. I have attached the bpbrm and bptm logs from the media server that is being used for the backup, and I will engage our F/W team to take a look as well.
Thanks in advance for all the assistance with this
Kev
01-13-2014 05:40 AM
from the log:
bpbrm Exit: unexpected termination of client lonbfbelvis1.corp.ad.timeinc.com
and a few in the other log
Sounds like a crash or an Anti Virus interruption - you don't use McAfee do you?
Anything in the event logs?
Is the client OK for free disk space?
#edit#
Oddly though when looking at the logs the job kicks in at 13:11 and bombs out at 13:21 - that is 10 minutes - are you sure your client read timeouts are all set?
01-13-2014 05:51 AM
Hi Mark,
The disk is 10 TB and holds only 3 TB of data, so the snap area is OK (> 15%)
Just looked at the application logs on the server and got this at 13:21
01-13-2014 05:54 AM
Just had a look on the Symantec tech notes and found this http://www.symantec.com/business/support/index?page=content&id=TECH202598
Looks like a reinstall of the client is required...
01-13-2014 06:00 AM
There are a lot of fixes in 7.5.0.6 / 7
It may just be worth trying 7.5.0.7 on that client first to see if that resolves it
But if you do have any Anti Virus / Access protection running on the client make sure all NBU processes are excluded
#edit#
That server doesn't have DFSR running on it does it? If so that causes the crash as well so 7.5.0.7 would help
01-13-2014 06:05 AM
Can I use NBU 7.5.0.7 on the client with a lower revision on the media server? I didn't think that was OK. I am looking to upgrade the Master/Media servers to 7.5.0.7 in the next 4 weeks or so.
I could try putting the 7.5.0.6 patch on the client and see what happens, if this requires a reboot (like most things on Windows) then this will have to be change controlled as it is a production server.
Kev
01-13-2014 06:10 AM
Double dot releases are OK - you could have a 7.5.0.1 Master with 7.5.0.6 client
As long as the Master and Media are 7.5.0.x then the client can be 7.5.0.6 or 7.5.0.7
#edit#
Doesn't usually need a reboot on Windows - not since we got rid of VSP in 6.x
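The version rule above can be written down as a small Python check. This only encodes the rule of thumb as stated in this post (master, media, and client all on the same 7.5.0.x line); it is not the official compatibility list, and the function names are made up for illustration.

```python
def parse_version(version):
    """Turn '7.5.0.7' into (7, 5, 0, 7)."""
    return tuple(int(part) for part in version.split("."))

def client_ok(master, media, client):
    """True when all three share the same first three version components."""
    base = parse_version(master)[:3]
    return parse_version(media)[:3] == base and parse_version(client)[:3] == base

print(client_ok("7.5.0.1", "7.5.0.1", "7.5.0.7"))  # True
print(client_ok("7.5.0.1", "7.5.0.1", "7.6.0.1"))  # False
```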
01-13-2014 06:18 AM
In that case I will place the 7.5.0.7 patch on the client. I have raised a change for Wednesday evening so I will let you know if this works.
Thanks for all your help
Kev
01-15-2014 12:56 PM
Performed an upgrade to 7.5.0.7 and ran the backup. Again this failed after 10 minutes with the dll error in the Windows application log, so I uninstalled NetBackup, removed the directories and then did a fresh client install of 7.5 followed by the 7.5.0.7 patch.
Ran the backup, this created a larger shadow copy on the disk than previously, it ran for approx 19 mins and then bombed out again with the same error.
01-15-2014 02:20 PM
Could you be experiencing the issue in this TechNote?
Check your bpbkar log. You might need to open a case and get an escalation if the listed workaround (which this UNIX guy does not understand) doesn't help.
(I found this document by doing a SymWISE search on "ntdll.dll" [no quotes] with NetBackup Enterprise Server specified as the product.)
01-16-2014 01:19 AM
Just in case it is struggling to cope could you try this...
Open the media server's host properties but from the Clients section (you may have to add it to a test policy if you don't see it there)
Go to the Windows section and then Client Settings section
At the very bottom is the Raw partition read buffer size setting
It defaults to 32 KB - change it to 1024 and try it again.
I am fairly sure it is the media server that needs this but it won't hurt to change it on the client too.
It may need a service re-start to register this change
See if that helps at all
01-16-2014 02:33 AM
Hi,
Many thanks for that link CRZ, I will give that a go and let you know.
Mark,
All my NBU estate is on RHEL, so there is no option on the Media Server to change that parameter. I have modified it on the client and will run a test to see what happens.
01-16-2014 02:45 AM
Hi Mark,
After making that change on the client this has still bombed out with a file read failure (13) after approx 7 minutes of running, I will try the suggestion by CRZ but this will have to be a change request as it means rebooting the client a couple of times.
Kev