Solved: Hi Folks, Just thought I'd - Page 2

Systems_Team · ‎01-13-2014

Hi Folks,

Hoping someone can help shed some light on this one:

Master Server: NetBackup 7.5.0.6 (Windows 2008 R2 SP1 64bit)
Media Server: NetBackup 5230 Appliance Version 2.5.3

Exchange: Exchange 2007 SP3 (Windows 2008 R2 SP1 64bit)

I have a VMware policy configured using application protection for Exchange. The backup completes successfully, and when trying to do a GRT restore I can expand the Information Store and Storage Group with no problems. Attempting to expand the Database, however, eventually times out and gives the dreaded "ERROR: database system error".

I have been over & over the requirements for Exchange GRT and am sure I have everything correct. To prove it, I created a standard MS-Exchange policy with the "Perform snapshot backups" and "Enable granular recovery" options turned on. This backs up fine (although via LAN - I want to be going via SAN and VMware), and I am able to expand all items and go down to individual messages. This seems to indicate I have everything set correctly for GRT restores.

Does anyone have any ideas as to why this might be? I've been banging my head on this one for a little while now. I'd much rather use the VMware policy type with Exchange protection so I get the SAN performance, but also want granular recovery.

Many thanks in advance,

Steve

rawilde · ‎01-24-2014

Steve

I took a look at your problem log and it appears the client is having trouble opening and mounting the VMDK files. From the client logs, I can't make much more out of it than that. The server side (maybe the appliance side) may need to be investigated more. The client (VDDK library) thinks the file(s) cannot be opened, almost like they are write locked, which they should not be. What is the case number you have opened?

Here is a snippet from your logs where we create a hard link to the vmdk file in the virtual view. This is a virtual directory that should be read/write so that VDDK can then access the files. Later on, you see filesystem errors.

1/22/2014 21:42:12.383 V-309-28 [_nbfs_folder_populate()] INF - file Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc doesn't exists, will link

1/22/2014 21:42:12.383 V-309-28 [_nbfs_folder_populate()] INF - CreateHardLink() succeeded

..

1/22/2014 21:45:24.009 [[fsys\shared] ] <from Producer> VDDK-Warn: FILE: CreateMemberFile FileRename of 'Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc.lck\E18561.lck' to 'Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc.lck\M18561.lck' failed: Access is denied! (../BEDSContext.cpp:164)

1/22/2014 21:45:24.056 [[fsys\shared] ] <from Producer> VDDK-Warn: FILE: FileIO_Lock on 'Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc' failed: Access is denied! (../BEDSContext.cpp:164)

1/22/2014 21:45:24.056 [[fsys\shared] ] <from Producer> VDDK-Log: DISKLIB-PARALLELSSPARSE: AIOMgr_Queue at offset=0, size=64 failed with error 393225.! (../BEDSContext.cpp:164)

1/22/2014 21:45:24.056 [[fsys\shared] ] <from Producer> VDDK-Log: DISKLIB-LINK : "Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc" : failed to open (The handle is invalid). ! (../BEDSContext.cpp:164)

1/22/2014 21:45:24.056 [[fsys\shared] ] <from Producer> VDDK-Log: DISKLIB-CHAIN : "Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc" : failed to open (The handle is invalid).! (../BEDSContext.cpp:164)

1/22/2014 21:45:24.056 [[fsys\shared] ] <from Producer> VDDK-Log: DISKLIB-LIB : Failed to open 'Z:\backups\lwsubmail01\allusers\full\1390385094\_vv_lwsubmail01_13903850940\lwsubmail01.vmdk.desc' with flags 0x1e The handle is invalid (393225).! (../BEDSContext.cpp:164)
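
For anyone wanting to pull the same kind of failures out of a larger log, here is a minimal Python sketch that filters the VDDK lines shown above and counts them per component. The function names and the list of failure keywords are assumptions made for illustration, not part of NetBackup or VDDK; the sketch relies only on the line layout visible in the snippet.

```python
import re
from collections import Counter

# Matches the VDDK lines quoted above, for example:
#   ... <from Producer> VDDK-Warn: FILE: FileIO_Lock on '...' failed: Access is denied! (...)
VDDK_LINE = re.compile(r"VDDK-(?P<level>Warn|Log): (?P<message>.*)$")

# Hypothetical keyword list: phrases that appear in the failing lines of the snippet.
FAILURE_HINTS = ("failed", "access is denied", "handle is invalid")


def vddk_failures(lines):
    """Yield (level, message) for VDDK lines that look like failures."""
    for line in lines:
        match = VDDK_LINE.search(line)
        if match is None:
            continue
        message = match["message"].strip()
        if any(hint in message.lower() for hint in FAILURE_HINTS):
            yield match["level"], message


def summarize(lines):
    """Count failures per VDDK component (the token before the first colon)."""
    counts = Counter()
    for _level, message in vddk_failures(lines):
        component = message.split(":", 1)[0].strip()
        counts[component] += 1
    return counts
```

Run over the six VDDK lines quoted above, `summarize` would count two FILE failures and one each for the four DISKLIB components, which matches the picture of a lock/rename failure followed by the disk open failing.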

Systems_Team · ‎01-25-2014

Hi Rawilde,

Many thanks for having a look at that for me. At least I know where to focus the most effort now.

My case number is 05896781. I opened it 3 days ago and have had no contact from Symantec other than the automated email reply acknowledging it so I'll be escalating it as soon as I'm back to work (Sunday here today & tomorrow is a public holiday).

Thanks for your help,

Steve

Bart_S_ · ‎01-28-2014

I've finally solved the issue. In the nbfsd log on the media server (vm8r2mbxcok), which is the active node of the Exchange DAG, I found these entries:

11:34:22.276 [15020.9508] <2> logparams: C:\Program Files\Veritas\NetBackup\bin\nbfs mount -server vm8r2mbxsok.nbpdom.win -port 7394 -timeout 60 -retry 11 -cred 8C417A3037223B4BA003E695D58C39B8D46C7ED0211433591911B3C68B4F3A107C348D08D98CCC8CDB79BF02483871602BE1F7ABB87A85F47AFACCB7CD1E0DA9 *

11:34:22.323 [15020.9508] <2> rpc_connect: connecting to vm8r2mbxsok.nbpdom.win

11:34:43.371 [15020.9508] <16> rpc_connect: can't create TCP connection to vm8r2mbxsok.nbpdom.win (12 10060), will retry...

11:34:48.371 [15020.9508] <2> rpc_connect: connecting to vm8r2mbxsok.nbpdom.win

11:35:09.419 [15020.9508] <16> rpc_connect: can't create TCP connection to vm8r2mbxsok.nbpdom.win (12 10060), will retry...

11:35:14.419 [15020.9508] <2> rpc_connect: connecting to vm8r2mbxsok.nbpdom.win

11:35:35.530 [15020.9508] <16> rpc_connect: can't create TCP connection to vm8r2mbxsok.nbpdom.win (12 10060), giving up

It turned out that port 7394 was blocked by the Windows firewall on the affected media server (vm8r2mbxsok), which is the passive node of the Exchange DAG.

So databases (emails) from the active node could be browsed because the image was mounted locally, whereas browsing databases from the passive node required communication between the two media servers (the Exchange nodes). The port was blocked, so there was an error.
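
A quick way to test for this kind of block from the other node is a plain TCP connection attempt. The following is a minimal Python sketch, not a NetBackup tool; the function name `can_connect` is made up for illustration, and the host name and port in the usage note are simply the ones from the nbfsd log above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any OSError (timeout, connection refused, name resolution failure)
    is treated as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("vm8r2mbxsok.nbpdom.win", 7394)` from the active node would be expected to return False while the firewall is dropping the traffic (the 10060 in the log is the Winsock "connection timed out" error) and True once the port is opened. A successful connect only shows that the port is reachable, not that the service behind it is healthy.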

What's more, during the weekend GRT duplication from disk to tape failed with error 191 from this affected media server. This led me to another article with port requirements for GRT. After opening port 7394 duplication finished successfully and now I can browse emails.

Thanks for your help :)

Bart

rawilde · ‎01-28-2014

Bart

Glad to see the issue resolved. I wish the product made blocked-port issues easier to diagnose.

Systems_Team · ‎02-26-2014

Hi Folks,

Just thought I'd put in an update about what my particular issue was in this case - very interesting. Was working it through with support when we found it. Moderators - you can lock this thread afterwards if you like :) Thx, Steve

We found the answer to this one by accident. One of my network engineers came to me about a week ago and said he had some suspicious traffic between two devices (the appliance and Exchange server for this case). He asked if I would check it out for him.

The data for this NetBackup environment traverses a Fortigate firewall/security appliance with IPS/IDS. I had already checked all possibilities while setting up and trying to troubleshoot the GRT browse issue, so I was aware this firewall was in place, and we had already confirmed (multiple times) that all required ports were open. What I wasn't aware of was that the IDS/IPS was scanning all traffic for vulnerabilities and, where one was detected, engaging an active response (i.e. dropping the traffic). The Fortigate was not set up to send automated alerts when something was detected, so it was only when our network engineers logged in to the device for a check that this was discovered.

The vulnerability as detected by the Fortigate device was: http://www.fortiguard.com/encyclopedia/vulnerability/#id=34667

The Microsoft write-up of this issue from the Windows side is: http://technet.microsoft.com/en-us/security/bulletin/ms13-014

Once we were aware of this, a rule was put in place to ignore traffic between the appliance and Exchange server - GRT browsing then started working. I had some additional issues in getting the restore to actually complete, but that was a minor issue in which the client was attempting to contact the appliance using a short DNS name. A minor tweak and having it use the FQDN resolved that issue and it now works end to end :) We had a similar issue with our production site (although this was throwing a File I/O error rather than Database error) - the same ignore rule was applied at the production site and GRT browsing worked there as well.
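
For the short name versus FQDN part, a quick resolution check on the client can show whether both forms of the appliance name resolve, and to the same addresses. This is only a sketch using Python's standard `socket` module; the function names are made up for illustration, and the names you pass in would be your own appliance's short name and FQDN.

```python
import socket


def resolve(name):
    """Return the sorted addresses a name resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def compare_names(short_name, fqdn):
    """Report whether the short name and the FQDN resolve to the same addresses."""
    short_addrs = resolve(short_name)
    fqdn_addrs = resolve(fqdn)
    return {
        "short": short_addrs,
        "fqdn": fqdn_addrs,
        "consistent": bool(short_addrs) and short_addrs == fqdn_addrs,
    }
```

If the short name returns an empty list or a different address than the FQDN, that would point at the same kind of name problem described above. It does not tell you which name NetBackup itself is configured to use.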

I reviewed the Microsoft KB article, and although I could not see the specific patch installed in either Exchange server, judging by the release date I would suspect that this fix has already been applied in a later security rollup update. I reviewed CVE vulnerabilities for NetBackup software and appliances and could find nothing relating to this. Also of interest is that during the initial part of this issue I did have Symantec Endpoint Security installed on the Exchange server, and this had reported nothing.

The previous engineer on this case before yourself pointed out in the NCFLBC logs that we were seeing issues with renaming files. I believe this is where the DoS attack that the Fortigate detected was coming from: the detail of the Microsoft KB talks about an attempt to rename a file on a read-only file system causing the issue described.

A couple of other points of interest: During an Active Directory GRT browse, and also during a traditional Exchange Snapshot (non-VMware) GRT browse, this issue was not evident. It was only via a VMware Exchange GRT browse that an active response was triggered on the Fortigate appliance. I'm not sure if you want to pass this info on to any of your other teams, as I guess it is quite possible any IDS/IPS system scanning and responding to this vulnerability could create the same scenario.

Exchange GRT restore issue - NetBackup 7.5.0.6