07-26-2021 03:34 PM
Longtime listener, first-time poster. We're currently running EV 12.1 on 4 servers (2x storage/task and 2x indexing). In the past 2-3 weeks, I've been getting an intermittent issue on one of the indexing servers.
Our EV stores go into backup mode at 4:45am and come out of it at 6:15am. On one of the servers, I sometimes find that the indexing service has failed to restart an hour later, or that event ID 7318/41334 has not been generated.
For the failed indexing service, I generally find that rebooting the server fixes it.
When the event ID is missing, the index location is usually stuck in backup mode, so I have to run Clear-IndexLocationBackupMode to clear it manually.
Anyone else had a similar experience?
07-26-2021 10:40 PM
When you say the indexing service has failed, are you restarting the service as part of clearing backup mode, and it then fails to come up properly? Do you see any events logged at that point?
There could be many reasons for this. You mentioned event 7318, which usually means there are still 32-bit index volumes. Are you able to upgrade them to 64-bit?
Do you see local resources (CPU, memory) peaking when setting or clearing backup mode?
There might also be a problem with a specific index volume that triggers it. In the past, checking the system-reporting logs within the index metadata (...\indexmetadata\reporting\system-reporting) has helped identify problematic volumes. The files there are SQLite databases, which can be viewed with a tool such as DB Browser for SQLite. You can use the following query to check whether a specific volume comes up many times across these daily database files, and then try rebuilding that volume.
select c.value, count(*) from error e join collection c on e.ID = c.ID
group by c.value
order by count(*) desc
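With one database file per day, running the query in a GUI file by file gets tedious. The Python sketch below runs the same query against each file using the standard-library sqlite3 module, totals the error count per volume, and counts on how many daily files each volume appears. It assumes the system-reporting files are ordinary SQLite databases and that the error and collection tables look the way the query above implies; neither has been checked against a real EV installation, and the function name rank_volumes is made up for this example. Run it against copies of the files, not the live ones.

```python
import sqlite3
from collections import Counter
from pathlib import Path

# The query quoted above, used verbatim. The table and column names
# (error, collection, ID, value) come from that query and are NOT verified here.
QUERY = """
select c.value, count(*) from error e join collection c on e.ID = c.ID
group by c.value
order by count(*) desc
"""


def rank_volumes(db_paths):
    """Run QUERY on each daily database file and combine the results.

    Returns (ranking, skipped): ranking is a list of
    (volume, total_errors, days_with_errors), most frequent first;
    skipped lists files that could not be queried as SQLite.
    """
    totals = Counter()  # total error rows per volume across all files
    days = Counter()    # number of daily files in which the volume appears
    skipped = []
    for path in db_paths:
        # Open read-only via a SQLite URI so the files are never modified.
        uri = Path(path).resolve().as_uri() + "?mode=ro"
        con = sqlite3.connect(uri, uri=True)
        try:
            rows = con.execute(QUERY).fetchall()
        except sqlite3.DatabaseError:
            rows = None  # not a SQLite file, or the expected tables are missing
        finally:
            con.close()
        if rows is None:
            skipped.append(str(path))
            continue
        for volume, count in rows:
            totals[volume] += count
            days[volume] += 1
    # Volumes that show up on the most days (then with the most errors) first.
    ranking = sorted(
        ((v, totals[v], days[v]) for v in totals),
        key=lambda item: (item[2], item[1]),
        reverse=True,
    )
    return ranking, skipped
```

For example, passing sorted(Path(folder).iterdir()) for a folder holding copies of the daily files lists the volumes that recur most often at the top, which are the first candidates for a rebuild. Files that cannot be read are reported in skipped instead of stopping the whole run.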
07-26-2021 11:48 PM
No, restarting the indexing service is independent of clearing backup mode. If I see the indexing service fail (usually early in the morning, around 7:30am), I need to restart the server to get things going again, as iisreset doesn't help. If I find in the morning that the index location is still in backup mode, I run the Clear-IndexLocationBackupMode command. That said, I did have an incident yesterday where Clear-IndexLocationBackupMode appeared to get stuck: the shell window stayed open for quite some time, and eventually this caused my Storage Queues to get stuck and their count to grow, so I ended up rebooting the index server to get everything working again.
Sorry, yes, we do still have a lot of 32-bit indexes (we've had EV in our environment since 2004/05), so we've just been upgrading indexes to 64-bit as needed.
Yes, when I clear backup mode, I see the server's CPU climb and peak for about 30 minutes to an hour, then it settles down for the rest of the day.
Regarding the specific index volume you mentioned, is that the same thing as event ID 41352, which appears in the EV logs on the index server?
Thanks for your input.
07-27-2021 01:05 AM - edited 07-27-2021 01:08 AM
I assume you are using an Index Server Group (as you mention 2 storage and 2 index servers).
Is it the same index failing every time? If so, there might be an issue with an index file, or with the storage (a flipped bit or something similar). If possible, run a full verification to see whether it reports errors, then try rebuilding the index, or the index of the archive it belongs to (which I hope is not a Journal Archive).
Besides upgrading the indexes to 64-bit (which makes indexing use memory much more efficiently), it might be worth reinstalling the indexing engine on that one server.
See THIS KB article on how to do it.
It might also be worth upgrading to 12.5.3 (or 14.1), but that is your choice.
As a final remark, I have experienced the same issue in the past, also on one particular storage server (clearing backup mode on Vault Stores). It was only resolved after several reboots and a reinstall of the EV binaries. I sometimes see this happen after Microsoft patches have been applied (especially .NET ones), but most of the time it disappears after a few days.
07-27-2021 03:36 AM
Correct, we have an index server group set up. It's not any particular index, and nothing corresponds directly to the time of the failure; it's the EV indexing service itself that trips up. I've corrected the subject title (we don't use journaling).
We currently have 26,000 indexes that are 32-bit and 50,000 that are 64-bit.
I'm currently doing some doco reading and plan to upgrade to 12.5.3 in the next month, since down the track we need to get off the Win2012 servers we're currently on.
Unfortunately or fortunately, depending on how you view it, we have a pretty aggressive Windows server team that performs monthly Windows patching and throws in the occasional McAfee update here and there.
The environment has been pretty stable for the past 12 months, but the last 3-4 weeks have been a bit of a headache.
Thanks for your feedback.
07-27-2021 04:50 AM
That is quite a lot of indexes to host on 2 servers. Could they be a bit overloaded?
I would start by upgrading those 32-bit ones. If I recall correctly, you can select which ones to do. I would go from small to large, to make sure you cover as many as possible. As you say it started 3-4 weeks ago, perhaps you can find out from the server team which patches were applied. You never know...
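Going from small to large is a greedy choice: if each maintenance window only has room for a fixed amount of work, taking the smallest volumes first maximizes how many get upgraded in that window. A minimal Python sketch of that ordering follows, assuming you already have an exported list of index volumes with their bitness and size. The IndexVolume record, its size unit and the upgrade_order function are all hypothetical; EV itself performs the actual upgrade.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexVolume:
    # Hypothetical record; fill it from whatever inventory or report you have.
    name: str
    is_64bit: bool
    size: int  # any consistent unit, e.g. MB on disk or item count


def upgrade_order(volumes, budget: Optional[int] = None):
    """Return the 32-bit volumes, smallest first.

    If budget is given, return only the leading run that fits within it;
    taking the smallest first maximizes how many volumes fit in that budget.
    """
    pending = sorted((v for v in volumes if not v.is_64bit), key=lambda v: v.size)
    if budget is None:
        return pending
    batch, used = [], 0
    for v in pending:
        if used + v.size > budget:
            break  # every remaining volume is at least this large, so none fit
        batch.append(v)
        used += v.size
    return batch
```

Calling upgrade_order once per window with that window's budget, and marking the returned volumes as done, works through the backlog in the small-to-large order suggested above.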
07-28-2021 02:11 AM
I should clarify that we currently have only about 10,000 active mailboxes, so most of those indexes belong to old, inactive mailboxes. Does this still affect performance?
On the patching side, I thought it might be related, but one of the storage/task servers was patched a week earlier with no issues or symptoms. The remaining storage/task server and the 2x indexing servers were patched the following weekend. Two to three days after that, we had a different issue where all the Storage Queues were getting stuck, so no one could archive. We spent 3 days with tier 1 and tier 2 support trying to resolve it (increasing drive space, moving the queues to another drive, tweaking archive settings, disabling AV, restarting the storage/task servers multiple times, re-registering the EV DLL files), and nothing fixed it until I decided to restart one of the indexing servers, and everything worked again! Two days after that, this intermittent issue with the EV indexing service and ev-clearbackupmode started to crop up.
Never a dull moment in IT!
09-13-2021 06:14 PM
Just a quick update for anyone who finds this down the track: for the past few weeks I've been upgrading the 32-bit indexes, and so far I've done 16,000 of the 26,000. In the past 2 weeks the issue hasn't cropped up again.