07-30-2010 06:21 AM
07-30-2010 09:49 AM
08-09-2010 03:11 AM
08-09-2010 03:21 AM
11-10-2010 09:44 AM
Edwork, Rob, and indeed everyone else :D
I raised support cases with both Symantec and Microsoft. For the last 6 months I have been trying to resolve this with both companies without success, but I think we have got closer to what is causing the problem.
First, let's recap the problem:
We have EV 9 FSA set up to migrate older files from disk archive onto tape using the NetBackup migrator. When a user tries to recall a file that has been migrated to tape, sometimes the following error is shown:
Network error
There is a problem accessing \\server\share\file.name
Make sure you are connected to the network and try again
Or sometimes they get this one:
Copy file
An unexpected error is keeping you from copying the file. If you continue to receive this error, you can use the error code to search for help with this problem
Error 0x80070057: The parameter is incorrect.
Try again Skip Cancel
Here is what we have found:
From 6 months of working with both Symantec and Microsoft on this issue, we have found this problem occurs under the following circumstances:
1. The file server containing the archived files is running Windows 2008 or Windows 2008 R2
2. The user who is recalling the files is on a workstation running Windows Vista, 7, or any version of Windows 2008
3. The recall from archive takes longer than 2 minutes
Point 3 is particularly important, as we have found that this problem occurs not just with archived files migrated to tape, but also with archived files on disk when the recall from disk takes longer than 2 minutes. This can happen if the archived file is particularly large: a 20GB video file, for example, would take EV some time to recall from a disk archive.
In addition, we have also been able to repeat the problem using Windows XP and 2003, but it is much less likely to occur on these platforms.
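To check whether a given recall crosses the 2-minute mark from point 3, one option is to time how long the first read of the file takes. This is only a rough Python sketch, not something supplied by Symantec or Microsoft. The function names and the 120-second constant are my own assumptions based on the figure above, and it assumes that opening an archived placeholder file is what triggers the recall.

```python
import sys
import time

# Threshold taken from point 3 above (2 minutes). This is an observed figure,
# not a documented limit, so treat it as an assumption.
RECALL_THRESHOLD_SECONDS = 120.0


def time_first_read(path, chunk_size=1):
    """Return the seconds taken to open the file and read its first bytes.

    For an archived placeholder, the open/read is assumed to block until the
    recall has completed (or the client gives up with an error).
    """
    start = time.monotonic()
    with open(path, "rb") as handle:
        handle.read(chunk_size)
    return time.monotonic() - start


def exceeds_threshold(elapsed_seconds, threshold=RECALL_THRESHOLD_SECONDS):
    """True if the measured recall time is longer than the threshold."""
    return elapsed_seconds > threshold


if __name__ == "__main__" and len(sys.argv) > 1:
    began = time.monotonic()
    try:
        elapsed = time_first_read(sys.argv[1])
        print(f"First read took {elapsed:.1f} s; "
              f"over threshold: {exceeds_threshold(elapsed)}")
    except OSError as error:
        # This is where the "Network error" / 0x80070057 case would surface.
        print(f"Read failed after {time.monotonic() - began:.1f} s: {error}")
```

Run it with the UNC path of an archived file as the only argument. A failure shortly after the 2-minute point would be consistent with the behaviour described above.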
Cause of the problem:
Windows XP and Server 2003 use the SMB 1.0 protocol. This has a timeout value which controls how long it will wait for an offline file. The registry key that controls this is:
HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\parameters\OfflineFileTimeOutIntervalInSeconds
The default for this key is 15 minutes. If you lower this value to, say, 1 second, you will see the same error as I have put at the start of this email.
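To see what a workstation currently has configured, the value can be read with Python's standard `winreg` module. This is a minimal sketch and Windows-only. The helper names are mine, and it assumes the value lives under the key path quoted above and is stored as a number of seconds.

```python
import sys

# Key path and value name as quoted in the post above.
SUBKEY = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUE_NAME = "OfflineFileTimeOutIntervalInSeconds"


def read_offline_timeout():
    """Return the configured timeout in seconds, or None if it is not set."""
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SUBKEY) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except FileNotFoundError:
        # Key or value missing: Windows falls back to its built-in default.
        return None


def describe_timeout(seconds):
    """Format the timeout for display (hypothetical helper)."""
    if seconds is None:
        return "not set (Windows default applies)"
    return f"{seconds} s ({seconds / 60:.1f} min)"


if __name__ == "__main__" and sys.platform == "win32":
    print(describe_timeout(read_offline_timeout()))
```

A missing value is reported as "not set" rather than as an error, since the value is optional.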
Starting with Windows Vista, Microsoft included a new version of the SMB protocol - SMB 2.0. Microsoft have stated that there has been a design change in SMB 2.0 over how it handles offline files and this registry key no longer works. In our particular set up, the workstation and server will default to using the new SMB 2.0 protocol - the server is Windows 2008 and the workstation Windows 7.
I suspect that this design change is what is causing the problem. We have found that if we disable the SMB 2.0 protocol and force the workstation/server to use SMB 1.0 (the same as what XP and 2003 use), we do not see the problem and the recall from archive works perfectly (subject to the reg key above).
This is a workaround rather than a fix, and there are performance implications in doing it. Here is how to disable SMB 2.0:
HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\DependOnService
Remove the entry MRxSmb20 from this key
HKLM\SYSTEM\CurrentControlSet\Services\mrxsmb20\Start
Change this from 3 to 4 to disable it
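For anyone who would rather script the two registry changes than make them by hand, they could be sketched in Python roughly as follows. This is an untested illustration using the standard `winreg` module, and the function names are my own. It assumes DependOnService is a multi-string value, that the script is run as an administrator, and that a restart is likely needed before the change takes effect. Export both keys first so the change can be reversed.

```python
# Key paths as given in the steps above.
WORKSTATION_SUBKEY = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation"
SMB2_SUBKEY = r"SYSTEM\CurrentControlSet\Services\mrxsmb20"


def without_smb2(dependencies):
    """Return the dependency list with the MRxSmb20 entry removed."""
    return [name for name in dependencies if name.lower() != "mrxsmb20"]


def disable_smb2_client():
    """Apply both registry changes. Windows-only; needs admin rights."""
    import winreg  # Windows-only standard library module

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE

    # Step 1: remove MRxSmb20 from LanmanWorkstation\DependOnService.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WORKSTATION_SUBKEY,
                        0, access) as key:
        current, value_type = winreg.QueryValueEx(key, "DependOnService")
        winreg.SetValueEx(key, "DependOnService", 0, value_type,
                          without_smb2(current))

    # Step 2: set mrxsmb20\Start to 4 (disabled); 3 is the original value.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SMB2_SUBKEY,
                        0, access) as key:
        winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, 4)
```

Nothing runs on import. Call `disable_smb2_client()` deliberately, and only on a machine where the performance trade-off mentioned above is acceptable. Setting Start back to 3 and restoring the MRxSmb20 entry should undo it.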
I am a little surprised that Symantec has not come across this problem with other customers.
Richard.
11-10-2010 11:04 AM
I'm not here to give you any input on your issue but to give you props for your detailed investigation.
The world needs more dedicated and detailed IT people like you.
11-10-2010 11:19 AM
I agree with Scanner... Great investigative job and great info. We all benefit from each other's experience in cases like this.
11-10-2010 12:50 PM
Well Richard, I echo what other people are saying... Thank you for not giving up.
Please send me a private message, or email through the forums with the case reference in it, and I will follow up with the Support person.
The issue has so far not gotten as far as the Customer Focus Team, and therefore has not been "shown" to Engineering.... and so any potential fix/workaround/reproduction has not yet been performed - at least not by us.