Index volume failed - ? re: synchronization task

NJBird1427 · ‎08-20-2013

We have a failed index volume for the EV journal mailbox. I opened a ticket with Symantec support and the technician reset the 64-bit indexing engine (per this technote http://www.symantec.com/business/support/index?page=content&id=HOWTO59060) and started a synchronization on the failed volume. The index administration task indicates that it is processing but the indexing task is queued. When I open the report log I see this:

20/08/2013 11:21:26 The synchronize has been asked to stop and is stopping.

20/08/2013 11:21:36 The synchronize has resumed its processing.

20/08/2013 11:21:36 The synchronize has been asked to stop and is stopping.

20/08/2013 11:21:46 The synchronize has resumed its processing.

Should I be concerned about this?

How do I know if the synchronization is actually working? I currently have 682634 items waiting to be indexed and that number is growing.

Thanks

GabeV · ‎08-20-2013

Hi,

If you already have a ticket open with support, you can call the engineer and let him/her know about these messages in the index report log. If you see these messages in the log but you do not see any progress in the index synchronization, the engineer might need to run a dtrace to determine if the synch process is stuck for some reason or if it's waiting for another resource (such as IIS or storage). It could be something else, but the dtrace should provide more info on that.

I hope this helps.

Ben_Watts · ‎08-20-2013

Can I ask if the Technician provided any reason for resetting the Indexing Engine?

Was something seen in the Dtrace that suggested this needed doing?

As much as that can fix some problems, it isn't a fix-all and should really only be done either as a result of seeing something in the logs that suggests it needs doing, or after a lot of other options have been investigated.

I wouldn't worry about those messages above. They are normal and could be caused by any number of things. Bottom line is that this is expected: the operation was asked to stop for a particular reason but then picked up again. It could be backup mode, the indexing service stopping, the Index Administration Task coming out of its scheduled time of operation, etc.

Have you looked at the event logs around the same time as the messages above to see if they coincide with anything?

There are so many reasons for the number of items awaiting indexing to grow rather than go down that it is almost impossible to give you possible reasons on here without seeing any evidence.
If you have collected DVS files, recalls can be slower; if CAB files have been migrated to secondary storage, recalls can be even slower; the network may be slow between the Storage Server and the Indexing server/Index location; if you are missing savesets, a sync/rebuild can take a VERY long time. The list goes on.

Are you seeing any errors in the event logs at all?

As Gabe said really, the Tech should be able to help you out by looking into a Dtrace and finding out what is happening with regards to the sync.

NJBird1427 · ‎08-20-2013

I believe he reset the indexing engine based on the reason for the failed volume - indexing engine exception (as seen in the event log). He did not run dtrace. I don't see any errors in the event log but neither do I see anything in the indexing task report that indicates any progress. That's what has me concerned. I did let the technician know what is happening (or not happening as the case may be). I'm just trying to figure out what is normal and pick up a little knowledge in the process.

Ben_Watts · ‎08-20-2013

Ah OK, that makes sense then and is completely understandable.
Here's a little technote that helps explain EV10 indexing a bit more - http://www.symantec.com/docs/HOWTO56273

You can run a Dtrace of the EVIndexAdminService and, more importantly for Synchronizations, the EVIndexVolumesProcessor to find out what is happening during the sync, even if it is just to ensure we are moving forwards.
Running Dtrace - http://www.symantec.com/docs/TECH38122

Run the dtrace for 10 mins, opening the dtrace with a larger buffer (2000000 is a good starting point), then post a link to the dtrace here and we can give you some pointers/advice as to what is happening (http://www.symantec.com/docs/TECH127876)

Here's some info on EV10 Indexing (get a drink before you start reading) - http://www.symantec.com/docs/HOWTO58947
Look at the Index Troubleshooting and Advanced Index Troubleshooting for docs that will help you a bit more.

Like I said above, those messages 'look' perfectly normal without other surrounding facts: the process is being asked to stop and is stopping/pausing, then picking up again once whatever asked it to stop has finished.
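If you want a quick way to see how often the report log is cycling, here is a rough Python sketch (not an EV tool). It only assumes the report log lines look like the ones posted above: a dd/mm/yyyy hh:mm:ss timestamp followed by the message. The function names and the idea of counting a "quick re-stop" (a stop logged within a second of a resume) are my own illustration, and a high count is only a hint that no work is happening between cycles, not proof.

```python
from datetime import datetime

# Message text copied from the report log lines quoted in this thread.
STOP = "The synchronize has been asked to stop and is stopping."
RESUME = "The synchronize has resumed its processing."


def parse_report_lines(lines):
    """Yield (timestamp, message) for lines like '20/08/2013 11:21:26 <message>'."""
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        try:
            stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%d/%m/%Y %H:%M:%S")
        except ValueError:
            continue  # line does not start with a timestamp
        yield stamp, parts[2]


def summarize(lines, quick_seconds=1):
    """Count stops, resumes, and resumes that are followed almost at once by a stop."""
    events = list(parse_report_lines(lines))
    stops = sum(1 for _, msg in events if msg == STOP)
    resumes = sum(1 for _, msg in events if msg == RESUME)
    quick_restops = 0
    for (t1, m1), (t2, m2) in zip(events, events[1:]):
        if m1 == RESUME and m2 == STOP and (t2 - t1).total_seconds() <= quick_seconds:
            quick_restops += 1
    return {"stops": stops, "resumes": resumes, "quick_restops": quick_restops}


# The four lines from the original post.
sample = [
    "20/08/2013 11:21:26 The synchronize has been asked to stop and is stopping.",
    "20/08/2013 11:21:36 The synchronize has resumed its processing.",
    "20/08/2013 11:21:36 The synchronize has been asked to stop and is stopping.",
    "20/08/2013 11:21:46 The synchronize has resumed its processing.",
]
summary = summarize(sample)
print(summary)
```

An occasional stop/resume pair is nothing to worry about, but a long run of resumes that are stopped again within the same second, with no other messages in between, would be a good reason to gather a dtrace.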

Run a Dtrace to confirm that we are moving through the ISNs (index sequence numbers) whilst synchronizing the volume/s.

I hope the above has helped a bit?
If you want to know anything else just ask.

NJBird1427 · ‎08-20-2013

Ben - thank you for the information. It's very helpful. I have a feeling I am going to learn more about this process than I really want to know. lol :)

I ran two dtraces. One for the EVIndexAdminService and one for the EVIndexVolumesProcessor. I ran them both for about 10 minutes. I have attached the log files.

Ben_Watts · ‎08-20-2013

OK, having looked through those, you do have a problem. The call to pause or stop the operation is legitimate, but the reason behind the call is that something isn't right.

If you look through the EVIndexVolumesProcessor dtrace that you have gathered you will see us looping through actions on one particular ArchiveID, 11 times in the Dtrace if I have counted correctly, which I presume is the ArchiveID of the Journal Archive that you are Synchronising.

We see the loop start processing the job:-

(EVIndexVolumesProcessor) <Agent Thread for 1B02486879AD74E73A478D977EF3172DB1013b00evault_16CC9F815D3DF7C438B67A869E495FC1B1013a00evault_19E5EDDAAED0B4F4EA0DB9EE16BC109D21110000evault_3772:7772> EV-L {SubTaskRepairWorkItem} Stopped event restart as the processing is starting. WorkItem '1B02486879AD74E73A478D977EF3172DB1013b00evault_16CC9F815D3DF7C438B67A869E495FC1B1013a00evault_19E5EDDAAED0B4F4EA0DB9EE16BC109D21110000evault_3772'

Then at the end of the loop, just before a request to pause/stop the sync you can see the below lines:-

'{EVContentSourceAccessor} Index volume [19E5EDDAAED0B4F4EA0DB9EE16BC109D21110000evault_3772] is not valid for processing. Failed=[True]'

And

'Cannot start processing as the Content Source is not ready. The work item will be thrown away'

Then

{SubTaskRepairWorkItem} Stop is requested - Reason: 'Stopped'. WorkItem '1B02486879AD74E73A478D977EF3172DB1013b00evault_16CC9F815D3DF7C438B67A869E495FC1B1013a00evault_19E5EDDAAED0B4F4EA0DB9EE16BC109D21110000evault_3772'

To me that means that we are unable to access the Index Volume itself because it is marked as failed.
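If you would rather not count the loops by eye, a small Python sketch like the one below can do it. It is only an illustration: the two patterns are built from the dtrace lines quoted above, the function name is made up, and the IDs in the sample are short placeholders rather than real archive or volume IDs. A real dtrace may word things differently between EV versions, so treat the patterns as assumptions.

```python
import re
from collections import Counter

# Patterns based on the dtrace lines quoted in this thread (assumed wording).
INVALID_VOLUME = re.compile(
    r"Index volume \[([^\]]+)\] is not valid for processing\. Failed=\[True\]"
)
STOP_REQUEST = re.compile(
    r"Stop is requested - Reason: '([^']*)'\. WorkItem '([^']+)'"
)


def scan_trace(lines):
    """Count 'not valid' hits per index volume and stop requests per work item."""
    failed_volumes = Counter()
    stop_requests = Counter()
    for line in lines:
        match = INVALID_VOLUME.search(line)
        if match:
            failed_volumes[match.group(1)] += 1
        match = STOP_REQUEST.search(line)
        if match:
            stop_requests[match.group(2)] += 1
    return failed_volumes, stop_requests


# Placeholder IDs (VOLUME_1, ITEM_1) stand in for the long real identifiers.
sample_trace = [
    "{SubTaskRepairWorkItem} Stopped event restart as the processing is starting. WorkItem 'ITEM_1'",
    "{EVContentSourceAccessor} Index volume [VOLUME_1] is not valid for processing. Failed=[True]",
    "Cannot start processing as the Content Source is not ready. The work item will be thrown away",
    "{SubTaskRepairWorkItem} Stop is requested - Reason: 'Stopped'. WorkItem 'ITEM_1'",
] * 2  # two passes through the same loop

failed_volumes, stop_requests = scan_trace(sample_trace)
for volume, hits in failed_volumes.items():
    print(f"{volume}: rejected as failed {hits} time(s)")
for item, hits in stop_requests.items():
    print(f"{item}: stop requested {hits} time(s)")
```

If the same volume and work item keep coming up with matching counts, that is the loop described above: the work item starts, the volume is rejected as failed, and the item is stopped again.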

Now a Synchronize should clear that 'failed' status and then carry out the work of checking for missing items etc. and making sure the Index Volume is fully synchronized.

Only thing I can think of, without seeing the environment, would be that maybe something is holding the file/s open or we are trying to access a file that is now corrupt and we are unable to access it to recall information that we need to actually start the synchronise operation.

One thing I would try is to restore that folder from Backup media, IF you have a good backup of it before all this started, even if we are missing items from that backup the synchronize will update it accordingly. But if we have a corrupt file in there, possibly due to Antivirus or something similar, the Backup of the Index Volume may well restore a good version of it allowing us to carry on with the Synchronize.

I would most definitely ensure that you have ALL of the index locations excluded from AV scanning, and all other EV resources according to this technote http://www.symantec.com/docs/TECH48856 and then carry out the above restore if you can.

If you don't have a backup of the Volume then it may be a case of a rebuild, but on a Journal Archive index this is a Last Resort due to the time it can take, so everything else would need to be ruled out before rebuilding that Volume.

You are right, if you start to delve into the ins and outs of Indexing in EV10 you will most definitely learn more than you want...

Also, just for next time, run both processes (or all of them, if more than two are being traced) from inside the same Dtrace operation. The output can be separated afterwards if need be, but it is sometimes easier to read what is happening when they are together, as they both play a part. That isn't a criticism, just a piece of advice for the future if you will be looking through them yourself.

NJBird1427 · ‎08-22-2013

Hi. Just wanted to provide an update. The synchronize task was going nowhere and we had no backup data available from before the failure. Therefore we did a rebuild of the failed index volume and that was successful. Thanks for the help and the education. I learned a great deal from this post.

Ben_Watts · ‎08-22-2013

No problem, glad to have helped in some way.

Remember to mark a post as the closest possible solution, if there were no solutions just mark your own post as the solution.
