MSMQ Outgoing Queues referencing failed EV server - using Building Blocks for High Availability
Hi Everyone
We use Building Blocks to provide high availability in our EV environment. We have come across a peculiar problem and would like to know whether anyone here has seen something similar.
Scenario:
The archiving schedule is running and there are (say) 5 EV Mailbox Archiving Servers archiving 5 Exchange Servers. Archives are cross-referenced across EV and Exchange Servers, i.e. the Storage Service for an archive can be on a different EV server from the one running the Archiving Task.
Problem:
If an EV Mailbox Archiving Server fails, we use Building Blocks to quickly bring up its services on a standby server by running Update Service Locations. However, we notice that items remain in the archive pending state in user mailboxes that are archived by the other EV servers. New items also go into the archive pending state across all user mailboxes.
Analysis:
We have already flushed the DNS cache on all EV servers (ipconfig /flushdns, or Clear-DnsClientCache in PowerShell). In the MSMQ Outgoing Queues we still see queues that point to the old, failed server and sit in the Failed to Connect or Waiting to Connect state. This appears to be because MSMQ addresses the queues by EV server hostname rather than by EV alias.
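To illustrate the check we do by eye in the Outgoing Queues view, here is a minimal Python sketch that picks out the queues still addressed to a failed server. It assumes the outgoing queue names have already been exported as MSMQ direct format names (DIRECT=OS:host\queue or DIRECT=TCP:address\queue). The function name, server names, and queue names are hypothetical examples, not output from our environment.

```python
import re

# MSMQ direct format names look like:
#   DIRECT=OS:EVSRV02\private$\some queue
#   DIRECT=TCP:10.0.0.12\private$\some queue
DIRECT_RE = re.compile(r"^DIRECT=(OS|TCP):([^\\]+)\\(.+)$", re.IGNORECASE)


def queues_pinned_to(format_names, failed_hosts):
    """Return (host, queue_path) pairs whose target is one of failed_hosts."""
    failed = {h.lower() for h in failed_hosts}
    hits = []
    for name in format_names:
        match = DIRECT_RE.match(name.strip())
        if not match:
            continue  # not a direct format name, ignore it
        protocol, host, queue_path = match.groups()
        candidates = {host.lower()}
        if protocol.upper() == "OS":
            # Also compare on the short name, so an FQDN matches "EVSRV03".
            candidates.add(host.split(".")[0].lower())
        if candidates & failed:
            hits.append((host, queue_path))
    return hits


if __name__ == "__main__":
    # Hypothetical export of outgoing queue names from one EV server.
    sample = [
        r"DIRECT=OS:EVSRV02\private$\enterprise vault storage archive",
        r"DIRECT=OS:evsrv03.example.local\private$\enterprise vault storage archive",
    ]
    for host, queue_path in queues_pinned_to(sample, ["EVSRV03"]):
        print(f"{host}: {queue_path}")
```

Any queue this reports is addressed to the failed server's own hostname, which is why repointing the EV alias alone does not move it.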
Solution:
To work around the issue, we currently have to add a hosts file entry that resolves the hostname of the failed server to the IP address of the now-active server, and then restart the Storage/Task Controller service.
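For anyone wanting to script this workaround, the hosts file change could be sketched roughly as below. This is only an illustration in Python: the function name, hostname, and IP addresses are hypothetical, and the function only transforms text. Writing the result back to %SystemRoot%\System32\drivers\etc\hosts and restarting the services are left out on purpose.

```python
import ipaddress


def with_hosts_override(hosts_text, failed_hostname, standby_ip):
    """Return hosts-file text in which failed_hostname maps to standby_ip."""
    ip = str(ipaddress.ip_address(standby_ip))  # raises ValueError on a bad address
    target = failed_hostname.lower()
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments; the first field is the address,
        # the remaining fields are the names mapped to it.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and target in (name.lower() for name in fields[1:]):
            # Drop any older mapping for this hostname.
            # Assumption: one hostname per line, so the whole line can go.
            continue
        kept.append(line)
    kept.append(f"{ip}\t{failed_hostname}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical values: EVSRV03 has failed, the standby server is 10.0.0.9.
    before = "127.0.0.1 localhost\n"
    print(with_hosts_override(before, "EVSRV03", "10.0.0.9"), end="")
```

Running it twice with the same arguments leaves a single entry. The entry has to be removed again once the original server is back in service.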
Has anyone come across a similar situation? We would like to avoid hosts file entries and make the failover process seamless.