
1m items waiting to be indexed

Rem_y
Level 4

Morning all, can anyone suggest a direction I can take re troubleshooting Journal and mail indexing?

I have a 41021 ("Archived items waiting to be indexed") in my VAC for both the Journal and Mail archives.

There's no build-up in either Outlook or the MSMQs, and I've checked that the indexes are not in backup mode (I took them in and out anyway - no change).

Version is 9.0.1.1073

It's not an area I've had much to do with so far, so I'm reading all the Symantec technotes I can find online on troubleshooting. I'm just not very adept at applying what I'm reading, and wondered if anyone had any suggestions to help narrow the scope of my research.

Best regards


21 REPLIES

Arjun_Shelke
Level 6
Employee Accredited

How many IndexServer.exe processes do you see in Task Manager on the EV server? Check whether the MaxIndexServers registry key is set on the EV server:

HKLM\Software\KVS\Enterprise Vault\Indexing
DWORD: MaxIndexServers
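If the key is missing and you want to set it, a .reg sketch would look like the below. Note the value 8 here is purely an illustration, not a recommended setting - pick a limit appropriate to your server:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\KVS\Enterprise Vault\Indexing]
"MaxIndexServers"=dword:00000008
```

Restart the Indexing service after importing for the change to take effect.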

Pradeep-Papnai
Level 6
Employee Accredited Certified

I would suggest starting troubleshooting with the information supplied in TN http://www.symantec.com/docs/TECH50268

If the IndexCommited = 0 query returns a large number of items, run a second SQL query to determine which specific indexes are falling behind. The following query lists all of the indexes in a vault store that require updating, and shows how far behind these archives are.

SELECT ArchivePoint.ArchivePointId, COUNT(*) AS NumberofItemsNotIndexed
FROM JournalArchive
INNER JOIN ArchivePoint ON JournalArchive.ArchivePointIdentity = ArchivePoint.ArchivePointIdentity
WHERE JournalArchive.IndexCommited = 0
GROUP BY ArchivePoint.ArchivePointId, JournalArchive.ArchivePointIdentity
 
Look in the EV event log for errors relating to these indexes, as index updates to some of the indexes listed may have failed. If the indexes have failed, refer to TECH54272 in the Related Articles section for information on how to update or rebuild an index.

Rem_y
Level 4

Hi Advisor, I see 1 IndexBroker.exe and 22 IndexServer.exe

When I browse to HKLM\Software\KVS\Enterprise Vault, there is no Indexing key.

Would you recommend creating that location and adding the value, then, to limit IndexServer.exe?

 

GabeV
Level 6
Employee Accredited

By default, when this reg key is not present, its value is effectively 24. Do you see any indexing errors in the event log? Could you try restarting the Indexing service and monitoring the event logs? Also, the query provided by EV-Counselor might help you determine which archive has items waiting to be indexed:

SELECT ArchivePoint.ArchivePointId, COUNT(*) AS NumberofItemsNotIndexed
FROM JournalArchive
INNER JOIN ArchivePoint ON JournalArchive.ArchivePointIdentity = ArchivePoint.ArchivePointIdentity
WHERE JournalArchive.IndexCommited = 0
GROUP BY ArchivePoint.ArchivePointId, JournalArchive.ArchivePointIdentity

You can take a look at the index volumes to determine whether they are marked as failed. You can also go to the VAC, right-click 'Archives' and click 'Index Volumes...' for more details.

I hope this helps.

JesusWept3
Level 6
Partner Accredited Certified

I wrote an article on this a while back, for what it's worth:
http://www.symantec.com/connect/articles/how-long-will-my-index-take-rebuild

Typically with indexing falling that far behind, as has been said before, check the JournalArchive table to see whether that number is truly correct; sometimes the SQL statistics may be reporting incorrect or stale information.

One query you could run would be the following (assuming your Directory database and vault store database are on the same SQL Server):

SELECT A.ArchiveName "Archive Name",
       CE.ComputerName "Index Server",
       (IRP.IndexRootPath + '\'+ IV.FolderName) "Index Folder", 
       COUNT(JA.SavesetID) "Items Awaiting Index",
       IV.FirstItemSequenceNumber "FirstISN", 
       IV.HighestItemSequenceNumber "Last ISN", 
       IV.IndexedItems "Indexed Items", 
       CASE IV.Rebuilding
        WHEN 0 THEN 'Normal'
        WHEN 1 THEN 'Rebuilding'
       END "Rebuilding Status", 
       CASE IV.Failed
        WHEN 0 THEN 'Normal'
        WHEN 1 THEN 'Failed'
       END "Failed Status",
       CASE IV.Offline 
        WHEN 0 THEN 'Online'
        WHEN 1 THEN 'Offline'
       END "Offline Status", 
       IV.FailedItems "# Failed Items"
FROM   EnterpriseVaultDirectory.dbo.ComputerEntry CE,
       EnterpriseVaultDirectory.dbo.IndexingServiceEntry ISE,
       EnterpriseVaultDirectory.dbo.IndexRootPathEntry IRP,
       EnterpriseVaultDirectory.dbo.IndexVolume IV,
       EnterpriseVaultDirectory.dbo.Archive A,
       EnterpriseVaultdirectory.dbo.Root R,
       EVVSYourVaultStore_1.dbo.ArchivePoint AP,
       EVVSYourVaultStore_1.dbo.JournalArchive JA
WHERE  A.RootIdentity = R.Rootidentity
  AND  R.RootIdentity = IV.RootIdentity
  AND  IV.IndexRootPathEntryId = IRP.IndexRootPathEntryId
  AND  IRP.IndexServiceEntryID = ISE.ServiceEntryId
  AND  ISE.ComputerEntryId = CE.ComputerEntryId
  AND  R.VaultEntryId = AP.ArchivePointId
  AND  AP.ArchivePointIdentity = JA.ArchivePointIdentity
  AND  JA.IndexCommited = 0
GROUP BY A.ArchiveName, CE.ComputerName, 
         IRP.IndexRootPath, IV.FolderName,
         IV.FirstItemSequenceNumber, IV.HighestItemSequenceNumber,
         IV.IndexedItems, IV.Rebuilding, IV.Failed, IV.Offline, IV.FailedItems
ORDER BY "Items Awaiting Index" DESC
https://www.linkedin.com/in/alex-allen-turl-07370146

JesusWept3
Level 6
Partner Accredited Certified

Oh, and also to note: you may want to check against JournalUpdate and JournalDelete as well.
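A quick sketch of that check, assuming the JournalUpdate and JournalDelete tables carry the same IndexCommited flag as JournalArchive (verify the column names against your vault store schema first):

```sql
-- Hypothetical sketch: items in the update/delete journal tables
-- not yet committed to the index.
SELECT COUNT(*) AS UpdatesNotIndexed FROM JournalUpdate WHERE IndexCommited = 0
SELECT COUNT(*) AS DeletesNotIndexed FROM JournalDelete WHERE IndexCommited = 0
```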


Rem_y
Level 4

Hi Advisor, no, it's a single EV server - no clustering.

 

Rem_y
Level 4

Brilliant, thanks EV Counselor

Arjun_Shelke
Level 6
Employee Accredited

So do you see any failed indexes on the EV Server?

Rem_y
Level 4

Update: it looks like we are close to a solution, so for anyone else happening on this thread, I hope you find the following useful.

I had a Symantec analyst look at it with me, and it was suggested we rebuild the whole thing - looking at about 33 days to complete.

Here is a précis of what Symantec had to say:

The Journal Archive had over 3,100 index volumes associated with it. None of those were in a failed state, but it appeared that EV simply had an issue with one of the volumes and was creating new volumes without reason because it couldn't add to the existing ones.

 

The fastest and easiest way to resolve this would have been to restore the index folders for that archive from backup and then carry out a repair against the index volumes. Unfortunately no backups were available, so we went for a rebuild of the entire archive index. We kicked this off and confirmed that items were being added to the new index and that the ~3,100 index folders previously present were no longer there or active.

 

We appeared to be adding items at an acceptable rate. Keep in mind that this is reliant on other outside forces/demands on the EV server and can fluctuate, but at a good rate I would expect to see around 50,000 items added to the index each hour.

 

We ran the query below against the SQL DBs to find out how many items were in that archive:

select count(*) from saveset where archivepointidentity = '8'  (8 being the ArchivePointIdentity that relates to the Journal Archive)

 

This gave us a count of ~40,000,000, which gives a rough rebuild time of around 35 days (dependent on server load, restarts, storage retrieval speeds, etc.).
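As a sanity check, the arithmetic behind those estimates, using the ~50,000 items/hour rate quoted above, works out like this:

```python
# Back-of-envelope rebuild estimate from the figures quoted in this thread:
# ~40,000,000 items in the journal archive, indexed at roughly 50,000/hour.
items_total = 40_000_000
items_per_hour = 50_000   # observed best-case rate; fluctuates with load

hours = items_total / items_per_hour   # 800 hours
days = hours / 24                      # ~33.3 days
print(f"{hours:.0f} hours, about {days:.0f} days")  # → 800 hours, about 33 days
```

That matches the ~33-day figure quoted earlier; the ~35-day estimate simply allows headroom for restarts and storage retrieval speed.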

 

We then added the registry values below to raise the failed-item limits and give the rebuild a better chance of completing successfully:

HKLM\Software\KVS\Enterprise Vault\PoisonPillCount = DWORD (1) - this sets how many times EV retries a failed item before marking it as poison pilled and no longer trying to index it (the default is 3 attempts).

HKLM\Software\KVS\Enterprise Vault\MaxConsecutivePoisonPillItems = DWORD (1000) - this sets how many items can be marked as poison pilled before the index volume is marked as failed and the rebuild stops.
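For reference, a .reg sketch of the two values described above (matching the DWORD data quoted: 1, and 1000 = hex 3E8):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\KVS\Enterprise Vault]
"PoisonPillCount"=dword:00000001
"MaxConsecutivePoisonPillItems"=dword:000003e8
```

Restart the Indexing service after importing for the changes to take effect.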

 

 

If the rebuild fails due to too many failed items, simply change the MaxConsecutivePoisonPillItems value above, restart the Indexing service, and start an UPDATE or REPAIR on the index volume in question; there is no need to start the rebuild from scratch.

Pradeep-Papnai
Level 6
Employee Accredited Certified

Thanks for the update, Casini.

JesusWept3
Level 6
Partner Accredited Certified
Did they give any reason as to why the update was failing? Because it sounds like sloppy troubleshooting to me.

Rem_y
Level 4

Hi JesusWept, it's a good point.

My understanding of indexing so far leads me to believe that, in general, indexes fail because of one of the following:

  • A corrupt item
  • A massive item
  • Number of corrupt items hitting threshold
  • Resource issues requiring the NumWordsToWrite reg key

Also:

  • Watch for opportunistic locking
  • Issues relating to TCP offload / Chimney

Does anyone agree/disagree?

In terms of troubleshooting, I can't DTrace the root cause as it takes too long before the problem arises. Aside from the event log, would IndexMissing.log be the other appropriate option for establishing root cause?

I guess I need to shoulder a fair weight of responsibility for what's lacking in the troubleshooting info, so I hope discussions like this mean that next time I can either solve it unaided, or at least provide the full information required to state exactly what the problem is.

JesusWept3
Level 6
Partner Accredited Certified

By sloppy troubleshooting I meant from Support, not from yourself...

So there are a couple of things that concern me...
1. The poison pill items: this implies you expect you already have data loss, and it hasn't been cleaned up.
Indexes can be marked as failed for a number of reasons; for instance, if storage goes down, indexing keeps attempting to retrieve the items, can't get them, and after a few consecutive failures it fails the index.

It can also be because of physically corrupt or missing index items.
But here's the thing: you have a massive, massive number of index volumes, 3,100 of them.
That's a lot.

Typically the troubleshooting would be to enable AVS logging via the registry, stop the EV services, begin a DTrace of StorageCrawler and IndexServer, and then start the Directory, Indexing and Storage services.
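From memory, an EV DTrace session for those two processes looks roughly like the below; treat the exact command syntax as an assumption and check it against the Symantec DTrace technote before relying on it:

```
C:\Program Files\Enterprise Vault>Dtrace.exe
DT>set StorageCrawler verbose
DT>set IndexServer verbose
DT>log c:\temp\index-trace.log
DT>mon
   (then start the Directory, Indexing and Storage services and watch the output)
```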

You monitor the DTrace to see what it's doing with regard to the particular index that has failed.
Is it looping attempting to retrieve items? Is SQL not responding? Is something on the storage side not responding correctly?

If nothing happens, you can kick off an update to the latest index volume and see what it does.
Plus, as far as rebuilds are concerned, you could have rebuilt just the last index volume as opposed to the entire 3,100, because there is absolutely zero reason to believe the other 3,099 index volumes are also corrupt.

Logic dictates that if you have that many volumes and writing suddenly stops, the issue lies with the last volume being written to.

The other issue I have is that there's nothing to suggest that in two months' time, when you get to the items where indexing stopped, you won't run into the exact same issue.


Rem_y
Level 4

That's gold, JesusWept, thank you. And I didn't take what you said to mean my troubleshooting was sloppy, but I figure if I'm going to ask questions here, I'd like to know I'm putting in some time and effort first.

It's all a bit of a mess really, and I've only just started working on this, so I have a limited set of resources to work from, backups being one.

So the back story to this is that the LUN where these indexes are located fell over, and when it came back up it was presented to another server. That server didn't actually accept it, but the knock-on effect was that when the LUN was failed back to EV and EV restarted, I immediately got a CHKDSK - not good.

Straight after, in the Manage Indexes GUI, I had about 50 users with failed indexes, which also led to the 41021 in the VAC for the mail stores. I renamed the index location folders to .old, added the NumWordsToWrite reg key, and rebuilt them all; the 41021 number is now decrementing.

Then I started to concentrate on the Journal backlog, leading to my previous post.

The significance of the backups is that if I definitely knew I had a good set, I'd restore from there to mitigate the current and potential future problems; but then life is never that easy, is it :(

Pradeep-Papnai
Level 6
Employee Accredited Certified

Hi Casini,

Please also have a look at the best-practice guides below to keep your indexes healthy.

Enterprise Vault Best Practices for large installations
http://www.symantec.com/docs/TECH160065

Enterprise Vault Indexing best practice and troubleshooting.
http://www.symantec.com/connect/articles/enterprise-vault-7-and-enterprise-vault-2007-indexing

(It's written for EV 7, but most of it still applies up to EV 9.0.x.)

Indexes should be on a fast local drive and should not be scanned by antivirus or any other scanning application.

Regards
EV-C

Ben_Watts
Level 6
Employee Accredited

Just to shed a bit more light on the issues and resolution seen here, as well as answer some of the questions raised.

 

Items can be poison pilled because EV is unable to index them, not simply because of data loss; the reasons EV cannot index an item are many and varied.

The PoisonPill count was increased as a precaution, to allow the rebuild to complete. None of the index volumes (all 3,182 of them) were in a failed state, and neither did they have an excessive number of failed items present; in total there were fewer than 1,000 failed items across all of the needed index volumes.

Most of these 3,182 (around 3,174 of them, to be precise) were empty and not needed, hence they had to be cleared out of SQL and EV made aware they were not to be tried again.

For some reason EV had an issue adding items to the index volumes for that Journal Archive, due to the LUN failure, and would therefore create a new index volume folder each time a new scan and update was started. This had been going on for a while, hence the 3,180-plus index volumes.

For a Journal Archive with around 40,000,000 items in it, a failure of 1,000 items is nothing that cannot be investigated and corrected after the index has been rebuilt. The main issue is to get the index rebuilt and fully functional so that any needed searches can be carried out on it and the backlog of items awaiting indexing is cleared down.

 

A DTrace was captured and checked: there were no looping items, large or corrupt items, etc. This was not a normal index volume failure in the common sense of the phrase. Yes, we (Symantec) could have delved in MUCH deeper and even asked other manufacturers' support to get involved to find the exact root cause, but I believe that is covered above with regard to the LUN presentation and the CHKDSK run against the index locations (which affected not only the Journal index but also a lot of the normal mailbox indexes).

The end result would still have been the same: an entire restore, or a rebuild if a good backup is not available, of the index volumes needed to be carried out to clear all the present index volume folders and create new ones. The only other way to correct this would have been to delve into SQL and make all the changes manually; either way it would have been doing the exact same thing, except one had a higher chance of something being missed and going wrong in the process.

With regard to running into the exact same issue further down the line, I do not believe that will happen, simply because of the reason for the failure and the investigation carried out so far on the situation.

 

p.s. no prizes are given out for guessing who worked the case from the Symantec Support side of things...

JesusWept3
Level 6
Partner Accredited Certified
Well, that definitely makes more sense. But couldn't you have gone back to one of the earlier index volumes, run an index check to verify the file integrity, and then rebuilt from the next index volume onwards? That would have deleted the subsequent index volumes as it started to reindex, so you might at least have saved some time.