03-01-2012 02:16 AM
Hi All,
Recently I've been experiencing issues with a SAN client. Let me explain how it is all set up:
- SAN Client: Windows 2008 R2 EE, on QLE2462.
- FT Media Server: RHEL 5.5 x64, target HBA on QLE2462
- Zoning: the WWPN of one port in the SAN Client is zoned to the WWPNs of both target HBA ports.
- Master Server: Windows 2008 R2 EE
Initially, 4 ARCHIVE Python devices showed up in the client's Device Manager and backups ran fine. Now the ARCHIVE Python devices disappear most of the time. Oddly, sometimes I see 4 ARCHIVE Python devices, then they drop to 3, then 2, and so on; I also see SCSI devices (screenshot attached). You may suspect the zoning; I had the same thought and checked, and there were no zoning changes.
I checked the FT target device state: the targets are active and the FT services are running.
[root@FTSERVER admincmd]# ./nbftconfig -listtargets
m FTSERVER
d 0 0 w # active FABRIC QLE246x Series FC Hba Qlogic
d 0 1 w # active FABRIC QLE246x Series FC Hba Qlogic
d 1 0 w # active FABRIC QLE246x Series FC Hba Qlogic
d 1 1 w # active FABRIC QLE246x Series FC Hba Qlogic
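If the target listing needs to be checked repeatedly while the devices come and go, a small script can flag any target that is not reported as active. This is only a rough sketch: it assumes each target line starts with `d` and that the sixth whitespace-separated field is the state, as in the listing above. The function name `inactive_targets` is made up for illustration and is not part of NetBackup.

```python
def inactive_targets(listing: str) -> list[str]:
    """Return target lines whose state field is not 'active'."""
    problems = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: target lines start with 'd' and the sixth field is the state.
        if len(fields) >= 6 and fields[0] == "d":
            if fields[5] != "active":
                problems.append(line.strip())
    return problems


# Sample text copied from the listing in this post.
sample = """\
m FTSERVER
d 0 0 w # active FABRIC QLE246x Series FC Hba Qlogic
d 0 1 w # active FABRIC QLE246x Series FC Hba Qlogic
d 1 0 w # active FABRIC QLE246x Series FC Hba Qlogic
d 1 1 w # active FABRIC QLE246x Series FC Hba Qlogic
"""

if __name__ == "__main__":
    print(inactive_targets(sample))
```

An empty list means every target line reported `active`, which matches what I saw on the FT media server.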
03-01-2012 02:41 AM
Windows 2008 doesn't have the Removable Storage Service installed by default, but check that it is not present; if it is, stop and disable it, as it will take the drives away from NetBackup.
Next, make sure that Windows is not interfering with the tape drives via its drivers by disabling the Windows Test Unit Ready (TUR) commands; see this tech note:
http://support.microsoft.com/kb/842411
This applies in the same way to all Windows versions.
03-01-2012 08:00 PM
Mark,
There is no Removable Storage Service. I disabled TUR as per the MS tech note and rebooted the client, but the issue persists.
03-02-2012 01:51 AM
Could a firewall or antivirus be blocking something? You don't use McAfee, do you?
03-02-2012 02:28 AM
Any errors in the Event Viewer System and/or Application logs?
The Windows Storport driver is known for 'misbehaving'.
03-02-2012 03:47 AM
Hi Marianne,
The Event Viewer System log was reporting warnings from source QL2300:
Reset to device, \Device\RaidPort1, was issued.
Event messages in the Application log from source QLManagementAgentJava:
Error: RetrieveTargetDataForTargets: Unable to get target data (0xaa) (The requested resource is in use. )
Warning: RetrieveLunDataForTargets: Unable to get lun data (0xaa) (The requested resource is in use. )
I applied OS updates and upgraded the firmware and drivers on the SAN client, but the target LUNs still disappear after a while. Is the HBA faulty?
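To get a feel for how often the resets happen, the warnings quoted above can be tallied per device from exported log text. This is a minimal sketch, assuming the events have been exported as plain text lines; the helper name `count_resets` is hypothetical, and the pattern is based only on the message wording shown in this post.

```python
import re
from collections import Counter

# Pattern taken from the QL2300 warning text quoted in this post.
RESET_RE = re.compile(r"Reset to device, (\S+), was issued")


def count_resets(lines):
    """Count 'Reset to device' warnings per device path."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines built from the messages above (the reset is repeated for illustration).
sample = [
    r"Reset to device, \Device\RaidPort1, was issued.",
    "Error: RetrieveTargetDataForTargets: Unable to get target data (0xaa) "
    "(The requested resource is in use. )",
    r"Reset to device, \Device\RaidPort1, was issued.",
]

if __name__ == "__main__":
    for device, n in count_resets(sample).items():
        print(device, n)
```

A count that keeps climbing for one RaidPort would suggest the resets are tied to a single path rather than to the whole client.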
03-02-2012 03:50 AM
Mark,
Yes, I have McAfee running on the SAN client.
03-02-2012 03:53 AM
You have either a faulty HBA, faulty cables, a bad switch or a bad library, but something is sending resets to the device and then it shows as being in use.
You need to see if you can get logs from everything to pin it down.
There are settings on some switches that can cause a reset, as well as on the HBA itself.
It is also worth checking your media servers for orphaned bptm processes, which could be causing issues in view of the "resource is in use" comment. This is what you get when the Removable Storage Service or TUR has not been dealt with, but also when bptm / bpbrm processes have been orphaned.
Are you sure nothing else is zoned to it that shouldn't be?
03-02-2012 04:15 AM
Nothing else is zoned. I also tried zoning both ways, but the result was the same:
Zone a: Port 1 of target HBA + Client Port 1
Zone b: Port 2 of target HBA + Client Port 1
I'm sure the SAN switch can't be faulty, as other zoned devices are working without issues.
I shut down the NBU services on the FT media server; there were no orphaned processes.
I will try changing the FC cable on Monday and update you.
It's evening in India, and I have to knock off. Happy weekend!
03-02-2012 04:30 AM
Just noticed that you said you do have McAfee. It is worth disabling it for a while if possible, and also renaming the \system32\drivers\MFETDIK.sys file and then rebooting; this is its own firewall driver and can cause all sorts of issues:
03-05-2012 05:03 AM
Uninstalled McAfee and used a different FC cable; still the same. So this is forcing me to replace the target HBA. If that doesn't work, blame Microsoft?
03-09-2012 03:16 AM
Mark,
As you guessed, the switch could be faulty: on checking the SAN switch logs for the client-connected port, it appeared that the port was going offline many times.
The fabric log reports the switch port going offline many times:
Time Stamp    Input and *Action                S, P    Sn,Pn   Port  Xid
========================================================================
13:32:56.206  SCN LR_PORT (0);g=0x65b          D2,P0   D2,P0   6     NA
13:32:56.209  SCN Port Online;g=0x65b          D2,P0   D2,P1   6     NA
13:33:00.223  SCN Port Offline;g=0x65d         D2,P1   D2,P0   6     NA
13:33:00.243  *Removing all nodes from port    D2,P0   D2,P0   6     NA
13:33:02.270  SCN LR_PORT (0);g=0x65d          D2,P0   D2,P0   6     NA
13:33:02.276  SCN Port Online;g=0x65d          D2,P0   D2,P1   6     NA
Connected the client to a different FC port and, BINGO, the issue was resolved. I'm marking your post as the solution.
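For anyone chasing a similar problem, the offline events in a fabric log like the one above can be counted per switch port with a few lines of Python. This is only a sketch under assumptions: the line shape in the regular expression is inferred from the excerpt in this post, not from the switch documentation, and `offline_counts` is a made-up helper name.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpt in this post:
# <time> SCN <event>;g=<hex> <S,P> <Sn,Pn> <port> <xid>
SCN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+SCN\s+(?P<event>[^;]+);g=\S+"
    r"\s+\S+\s+\S+\s+(?P<port>\d+)\s+\S+\s*$"
)


def offline_counts(log_text):
    """Count 'Port Offline' state changes per switch port number."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SCN_RE.match(line.strip())
        if match and match.group("event").strip() == "Port Offline":
            counts[int(match.group("port"))] += 1
    return counts


# Sample text copied from the fabric log excerpt in this post.
sample = """\
13:32:56.206 SCN LR_PORT (0);g=0x65b D2,P0 D2,P0 6 NA
13:32:56.209 SCN Port Online;g=0x65b D2,P0 D2,P1 6 NA
13:33:00.223 SCN Port Offline;g=0x65d D2,P1 D2,P0 6 NA
13:33:00.243 *Removing all nodes from port D2,P0 D2,P0 6 NA
13:33:02.270 SCN LR_PORT (0);g=0x65d D2,P0 D2,P0 6 NA
13:33:02.276 SCN Port Online;g=0x65d D2,P0 D2,P1 6 NA
"""

if __name__ == "__main__":
    for port, n in sorted(offline_counts(sample).items()):
        print(f"port {port}: {n} offline event(s)")
```

Run against a longer capture, a port that stands out with many offline events is a good candidate for moving the cable to another port, which is what fixed it here.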
03-09-2012 03:27 AM
Great news! - glad to have helped