EV FSA and Windows 7

Richard_Senda3
Level 4
Hello,

We are seeing a problem with Enterprise Vault File System Archiving and Windows 7 when archived files have been migrated to tape using the NetBackup migrator.

The problem
We have EV FSA configured to archive a file server to disk first and then to move the older archived files to tape using the EV Collector and NetBackup.  We have noticed a problem when recalling archived files from tape on Windows 7.  When a user on a Windows 7 workstation tries to recall an archived file that has been moved to tape, the following error message is sometimes shown:

Network error
There is a problem accessing \\server\share\file.name
Make sure you are connected to the network and try again


EV FSA does correctly recall the file - it is returned to the file server, and if the user presses the “Try Again” button, the recall continues with the other files the user has requested.

Diagnostics
From extensive testing, we have found that this behaviour occurs under the following circumstances:
  1. The workstation accessing the file share containing the archived files is running Windows 7
  2. The requested archived file is on tape, and the recall from tape takes longer than 2 minutes 30 seconds to complete
This problem does not occur on Windows XP, Windows 2003 and Windows 2008.  We only see the problem in Windows 7.  Unfortunately, we have not been able to test Windows Vista.

To see what was going wrong, we used Wireshark to capture what the SMB protocol was doing on the network interface.  We saw the following:

1. Windows 7 SMB tries to query the file to obtain its properties.  As the file is “offline”, it is unable to do so, and the request times out
2. Windows 7 SMB then waits 60 seconds before retrying
3. Windows 7 SMB makes a second attempt to obtain the file properties.  Again, it is unable to do so, and the request times out
4. Windows 7 SMB then waits another 60 seconds
5. Windows 7 SMB tries a third time to obtain the file properties.  By this time, the recall from tape has still not completed, so Windows 7 is still unable to obtain the file properties
6. At this point, the above error is shown to the user.  This process has taken roughly 2 minutes 30 seconds to get to this point.

It seems that if Windows 7 does not receive the file property details after 3 attempts, then this error is shown to the user.
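The timeline above can be sketched as a small timing model. This is only an illustration of the observed behaviour, not of how the SMB client is implemented: the 60-second waits and the three attempts come from the capture, while the 10-second per-attempt timeout is an assumption chosen so that the total matches the roughly 2 minutes 30 seconds observed.

```python
# Rough timing model of the retry pattern seen in the network capture.
RETRY_WAIT_S = 60        # observed: wait between attempts
ATTEMPTS = 3             # observed: number of property queries before the error
ATTEMPT_TIMEOUT_S = 10   # ASSUMPTION: (150 - 2 * 60) / 3, not measured directly


def error_deadline_s(attempts=ATTEMPTS, wait=RETRY_WAIT_S,
                     timeout=ATTEMPT_TIMEOUT_S):
    """Seconds from the first query until the last attempt has timed out."""
    return attempts * timeout + (attempts - 1) * wait


def user_sees_error(recall_seconds):
    """True if the recall is still running when the last attempt times out."""
    return recall_seconds > error_deadline_s()


if __name__ == "__main__":
    for recall in (30, 120, 180):
        print(f"recall takes {recall:>3} s -> error shown: {user_sees_error(recall)}")
```

Under these assumptions, any recall that finishes inside the 150-second window succeeds on one of the retries, and anything slower produces the error.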

How to reproduce the problem
The problem can easily be repeated by doing the following:
1. Use EV to archive some files from a file server and leave placeholder stubs behind
2. Migrate the archived files to tape using the EV collector and NetBackup
3. Once the migration has completed, open up the NetBackup Administration Console
4. Navigate to Media and Device Management -> Device Monitor.  A list of the tape drives should be shown on the screen
5. Select all of the tape drives and select Actions -> Down.  This will disable the tape drives
6. Using a Windows 7 workstation, attempt a recall of the archived files by copying them to the desktop or C drive
7. After a few seconds, a restore job should appear in NetBackup.  If you examine the restore job, it should state:
  awaiting resource TB0634 A pending request has been generated for this resource request.
  Operator action may be required. Pending Action: All drives down.,
8. At this point, wait at least 3 minutes
9. After 3 minutes, in NetBackup, navigate to Media and Device Management -> Device Monitor.  A list of the tape drives should be shown on the screen in red.
10. Select all of the tape drives and select Actions -> Up.  This will enable the tape drives and the restore should now complete
11. After the restore is complete and the CAB/DVS file has been decompressed, the network error message should be shown on the client in Windows 7
The key thing is that we only see the problem after 2 and a half minutes.  By downing the tape drives, we can force NetBackup to take longer than 2 and a half minutes to restore the files from tape thereby reliably reproducing the fault.
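For step 6, it can help to time the recall from the client rather than watching the copy dialog. The following is a minimal sketch, assuming Python is available on the workstation; the function name and both paths are hypothetical placeholders, and a script may not fail in exactly the same way as an Explorer copy.

```python
import shutil
import time


def timed_copy(src, dst):
    """Copy src to dst and return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        shutil.copyfile(src, dst)
        error = None
    except OSError as exc:
        # Network and "parameter is incorrect" failures surface as OSError.
        error = exc
    return time.monotonic() - start, error


if __name__ == "__main__":
    # Hypothetical paths: use a real placeholder file and a local target.
    elapsed, error = timed_copy(r"\\server\share\file.name", r"C:\Temp\file.name")
    print(f"elapsed: {elapsed:.1f} s, error: {error!r}")
```

If the elapsed time at failure is consistently close to 150 seconds, that supports the retry pattern described in the diagnostics.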

Has anyone else seen this issue?  Any ideas how to fix it or work around it?

Thanks.

Rob_Wilcox1
Level 6
Partner
It sounds like you have done a fairly thorough investigation, and I would advise you to contact Support and work through it with them for reproduction and escalation purposes.
Working for cloudficient.com

Edwork
Level 4
Is there any update from Symantec Support on your case?  The PCs I support will also be upgraded to Windows 7, so I would like to know what fix we should plan to use.

Rob_Wilcox1
Level 6
Partner

To Edwork...

Did Richard create a support call? 

If not, then you should, and you can reference the thorough work which Richard did.

Richard_Senda3
Level 4

Edwork, Rob, and indeed everyone else :D

I raised support cases with both Symantec and Microsoft.  For the last 6 months I have been trying to resolve this with both companies without success, but I think we have got closer to what is causing the problem.

First, let's recap the problem:

We have EV 9 FSA set up to migrate older files from disk archive onto tape using the NetBackup migrator.  When a user tries to recall a file that has been migrated to tape, sometimes the following error is shown:

Network error
There is a problem accessing \\server\share\file.name
Make sure you are connected to the network and try again

Or sometimes they get this one:

Copy file
An unexpected error is keeping you from copying the file.  If you continue to receive this error, you can use the error code to search for help with this problem
Error 0x80070057: The parameter is incorrect.
Try again Skip Cancel

Here is what we have found:

After 6 months of working with both Symantec and Microsoft on this issue, we have found that the problem occurs under the following circumstances:
1. The file server containing the archived files is running Windows 2008 or Windows 2008 R2
2. The user who is recalling the files is on a workstation running Windows Vista, 7, or any version of Windows 2008
3. The recall from archive takes longer than 2 minutes

Point 3 is particularly important.  We have found that this problem occurs not just with archived files migrated to tape, but also with archived files on disk when the recall takes longer than 2 minutes.  This can happen if the archived file is particularly large: a 20GB video file, for example, would take EV some time to recall from a disk archive.

In addition, we have also been able to repeat the problem using Windows XP and 2003, but it is much less likely to occur on these platforms.

Cause of the problem:
Windows XP and Server 2003 use the SMB 1.0 protocol.  This has a timeout value which controls how long it will wait for an offline file.  The registry key that controls this is:
HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\parameters\OfflineFileTimeOutIntervalInSeconds
The default for this key is 15 minutes.  If you lower this value to, say, 1 second, you will see the same error as I described at the start of this post.
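To check or change this value on an SMB 1.0 client for testing, the standard reg.exe tool can be used. The sketch below only builds and prints the commands so they can be reviewed first. The helper names are hypothetical, and the REG_DWORD type is an assumption, since the value type is not stated above.

```python
# Build reg.exe commands for the SMB 1.0 offline-file timeout value.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUE = "OfflineFileTimeOutIntervalInSeconds"


def query_command():
    """Command that shows the current value, if it is set."""
    return ["reg", "query", KEY, "/v", VALUE]


def set_command(seconds):
    """Command that sets the timeout; REG_DWORD is an ASSUMED value type."""
    if seconds < 0:
        raise ValueError("timeout must not be negative")
    return ["reg", "add", KEY, "/v", VALUE,
            "/t", "REG_DWORD", "/d", str(seconds), "/f"]


if __name__ == "__main__":
    # Print only; run the output from an elevated prompt after reviewing it.
    print(" ".join(query_command()))
    print(" ".join(set_command(1)))
```

Setting the value to 1 is only useful for reproducing the error on XP or 2003; it should be restored afterwards.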


Starting with Windows Vista, Microsoft included a new version of the SMB protocol, SMB 2.0.  Microsoft have stated that there has been a design change in how SMB 2.0 handles offline files and that this registry key no longer works.  In our particular setup, the workstation and server default to the new SMB 2.0 protocol because the server is Windows 2008 and the workstation is Windows 7.

I suspect that this design change is what is causing the problem.  We have found that if we disable the SMB 2.0 protocol and force the workstation/server to use SMB 1.0 (the same as what XP and 2003 use), we do not see the problem and the recall from archive works perfectly (subject to the reg key above).

This is not a fix but a workaround, and there are performance implications in using it.  Here is how to disable SMB 2.0:
HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\DependOnService
Remove the entry MRxSmb20 from this key

HKLM\SYSTEM\CurrentControlSet\Services\mrxsmb20\Start
Change this from 3 to 4 to disable it
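The same two changes can also be made with sc.exe instead of editing the registry by hand: `sc config ... depend=` rewrites DependOnService, and `start= disabled` sets Start to 4. The sketch below builds those commands from whatever dependency list the workstation currently has. The helper names are hypothetical and the sample list is only an example; check the real DependOnService value on your machine first.

```python
def without_smb20(depends):
    """Return the dependency list with the MRxSmb20 entry removed."""
    return [d for d in depends if d.lower() != "mrxsmb20"]


def disable_commands(depends):
    """sc.exe commands equivalent to the two registry edits above."""
    return [
        ["sc", "config", "lanmanworkstation",
         "depend=", "/".join(without_smb20(depends))],
        ["sc", "config", "mrxsmb20", "start=", "disabled"],
    ]


if __name__ == "__main__":
    # EXAMPLE list only; read the actual DependOnService value before use.
    current = ["Bowser", "MRxSmb10", "MRxSmb20", "NSI"]
    for cmd in disable_commands(current):
        # Print only; run from an elevated prompt, then restart the workstation.
        print(" ".join(cmd))
```

To undo the workaround, put MRxSmb20 back in the dependency list and set the mrxsmb20 start type back to its original value (3, which sc.exe calls `demand`).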

I am a little surprised that Symantec has not come across this problem with other customers.

Richard.

Liam_Finn1
Level 6
Employee Accredited Certified

I'm not here to give you any input on your issue but to give you props for your detailed investigation. 

 

The world needs more dedicated and detailed IT people like you. 

WiTSend
Level 6
Partner

I agree with Scanner...  Great investigative job and great info.  We all benefit from each other's experience in cases like this.

Rob_Wilcox1
Level 6
Partner

Well Richard, I echo what other people are saying...  Thank you for not giving up.

Please send me a private message, or email through the forums with the case reference in it, and I will follow up with the Support person.

The issue has so far not gotten as far as the Customer Focus Team, and therefore has not been "shown" to Engineering.... and so any potential fix/workaround/reproduction has not yet been performed - at least not by us.
