CA Random Sampling (Last Sample) Not Progressing

Next012
Level 3

Hello all,

 

This issue started about a week ago.  We've tried a number of the resolutions available through Symantec's knowledge base, which generally work after a restart of the database (Windows Update gets turned back on and restarts the system at 3 AM, during processing, or otherwise), but none has kickstarted the Last Sample metric again.  In the database, tblConfig appears to be stuck with Sampling : Status - Started, and the date for Sampling : Last Sample remains the same.  The odd part is that the Sampling : Last First Pass Sample metric is consistently updated.

We've set the thread count lower, changed the cache timeouts, adjusted the Last Sample date back one day, restarted all connected systems but still, the random sampling never seems to start back up again appropriately.

 

The Indexing service on the CA server appears to not be started and has an error when attempting to start - does this tie into the Random Sampling system?

 

Is there any other way to kickstart the Sampling : Last Sample or Random Sampling in general via script or otherwise?

 

Enterprise Vault 9.0.1

Compliance Accelerator 9

 

Thank you!

1 ACCEPTED SOLUTION

Kenneth_Adams
Level 6
Employee Accredited Certified

I'm looking at the dtrace and see it was run on machine remora.  I presume this is the EV server as there are no lines with AcceleratorService threads.

Please run the dtrace from the CA server's Command Prompt as follows:

  1. Log onto the CA server as the Vault Service Account (VSA) if you've logged on as some other account.
  2. Open a Command Prompt.
  3. Change the location within the Command Prompt to the EV installation folder (default location on 64-bit OS is 'C:\Program Files (x86)\Enterprise Vault' or on 32-bit OS is 'C:\Program Files\Enterprise Vault').
  4. Type dtrace at the Command Prompt and press the Enter key.
  5. At the 'DT>' prompt, type the letter v and press the Enter key.
  6. Scroll up through the listing of processes as needed to locate the AcceleratorService process and note the number associated with it (typically, this would be the number 2).  Note that you will have to manually type the process name at the next prompt if you don't use the 'v' (for View) option to assign a shortcut process number to each process.
  7. At the 'DT>' prompt, type 'set 2 v', without the single quotes and replacing the number 2 with the number associated with the AcceleratorService process if that number is other than 2.
  8. Press the Enter key to complete the process logging selection.
  9. At the 'DT>' prompt, type 'log C:\Logs\AcceleratorServiceSamplingIssue.log', without the single quotes and using the drive letter, folder path and file name of your choice (we just ask that you have the extension as '.log').
  10. Press the Enter key to complete the log file specification and start the actual logging.
  11. At the 'DT>' prompt, type 'Mon' and press the Enter key to monitor the process log entries.
  12. Let the dtrace run for about 10 minutes, monitoring the log through use of the 'Mon' command.
  13. Look carefully as the information may scroll by quickly at times and slowly at others, but you'll be looking for Step 6, then Step 7.  If the processing gets to Step 7, it should complete soon (within a couple of hours or so, depending on the amount of data it has yet to process).
  14. If you see Step 7, you can stop the dtrace and delete it as we know the processing will complete.
  15. If you see any out of memory errors in Step 6, you'll need a Support ticket for us to help you clear the tables and reset Sampling.
  16. To stop the monitoring, just hold the Ctrl key and press the C key.  This combination may have to be tried a few times if the system is busy scrolling lines across the screen.
  17. To end the logging, just type 'log' at the 'DT>' prompt and press the Enter key twice.

I'm hoping you'll report back that the Step 7 has appeared, but I'm ready to review the dtrace log if needed.
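
If watching the 'Mon' output scroll by is impractical, the saved log file can also be scanned afterwards.  The following is only a minimal Python sketch, not a Symantec tool.  It assumes the log is plain text, that each line carries the process name in parentheses and a 'Step N' marker as AcceleratorService dtrace lines do, and that a memory failure contains the words 'out of memory' in some form.  The function name summarize_sampling_log and the exact error wording are assumptions made for illustration.

```python
import re

# "Step N" marker as it appears in AcceleratorService dtrace lines.
STEP_RE = re.compile(r"\bStep (\d+)\b")
# Assumption: the error text contains "out of memory" in some spelling,
# for example "Out of memory" or "OutOfMemoryException".
OOM_RE = re.compile(r"out\s*of\s*memory", re.IGNORECASE)


def summarize_sampling_log(lines):
    """Report the highest sampling step seen and any out-of-memory lines."""
    highest_step = 0
    oom_lines = []
    for line in lines:
        # Only lines logged by the AcceleratorService process are relevant.
        if "(AcceleratorService)" not in line:
            continue
        match = STEP_RE.search(line)
        if match:
            highest_step = max(highest_step, int(match.group(1)))
        if OOM_RE.search(line):
            oom_lines.append(line.rstrip())
    return {
        "highest_step": highest_step,
        "reached_step_7": highest_step >= 7,
        "oom_lines": oom_lines,
    }


def summarize_sampling_file(path):
    """Convenience wrapper; errors='replace' tolerates unexpected encodings."""
    with open(path, errors="replace") as handle:
        return summarize_sampling_log(handle)
```

Calling summarize_sampling_file with the path given to the 'log' command returns a small dictionary.  If reached_step_7 is true and oom_lines is empty, the run is past the point where the memory errors would occur; any entry in oom_lines is the cue to open a Support ticket.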

 

27 REPLIES

EV_Ajay
Level 6
Employee Accredited

Hi,

1. Could you please let me know whether the CA server and EV server are on the same box?

2. What's the version of EV Server including Service Pack ?

3. What's the Version of CA Server including Service Pack ?

4. The EV Binaries installed on CA Server should match the EV Binaries installed on EV Server.

5. The CA Version should be equal or greater than EV Version ( It is recommended)

6. All EV server services should be in a running state because when CA performs a search or sampling it communicates with the EV server services (Indexing & Storage) to collect the data.

 

Next012
Level 3

Hi Ajay,

Thank you for your prompt response:

 

1. Could you please let me know whether the CA server and EV server are on the same box?

CA is on a different box than the EV host (where Administration Console resides).  I do see an installation of EV on the CA server but it is not the primary host?

 

2. What's the version of EV Server including Service Pack ?

These values are straight from the console:

EV 9.0 SP1

File version: 9.0.1.1073

 

3. What's the Version of CA Server including Service Pack ?

These are from Add/Remove programs, Support Info, any other location where these details might be present?:

Build 9.0.1.1039

Version: 9.0.5135

 

4. The EV Binaries installed on CA Server should match the EV Binaries installed on EV Server.

The installation of EV on the CA server appears to be 8.0.3.1845.  As stated, this may need to be updated?

 

5. The CA Version should be equal or greater than EV Version ( It is recommended)

Per the CA Version above, this is accurate.

 

6. All EV server services should be in a running state because when CA performs a search or sampling it communicates with the EV server services (Indexing & Storage) to collect the data.

The Indexing service on the CA server may tie into the older EV instance that is present.  If this is the case, that might explain the lack of a starting service.

 

 

Some back history on the installation as well.  This environment has been up and running for many years now.  The issue only recently started after restarts of the database server cluster due to updates at the worst possible time (during collection).  The systems were functioning without issue prior to the restart date, though nobody had checked whether all of the services were running on the CA server, as uncovered now.

 

If EV needs to be updated on the CA server, that should be easily accomplished.  If EV needs to be removed from the CA server, then that would be a next step.

 

 

Thank you!

Kenneth_Adams
Level 6
Employee Accredited Certified

Hello, Next012;

In addition to the information requested by EV_Ajay, please also record a 10 minute long dtrace of the AcceleratorService on the CA server.  We need to see if you are getting any Out of Memory Errors during Step 6 of the Random Sampling processing.  If you are, then your solution will require a Support case to be opened as we will need to manipulate some information in the CA Customer database (set the sampling status and date, clear the sampling tables) and also run one or more catch-up searches to close the gap in the review set from when Random Sampling last completed successfully and now.

One issue we've found with Random Sampling not finishing is having too many items to be parsed into the different Department review sets.  When this happens, we get an out of memory error in the AcceleratorService dtrace during Step 6 of the Random Sampling processing.

Now, if you get this cleared and it happens again, you may need to break your Departments into smaller Departments (i.e., move some Monitored Employees from their current Department into one or more new Departments).  We've found that having a large number of Monitored Employees in a Department can cause these out of memory errors, as there are just too many messages to hold in the 2 GB of RAM that Windows allocates to the AcceleratorService process.

Next012
Level 3

Hi Kenneth,

 

I'll run a dTrace here once the servers are brought back up - service issues this morning :)

 

As near as I can tell, the count for items waiting to be sampled is over 300k.  The issue was not caught immediately due to common fluctuations in CA review queue counts.  Thus it was caught after the listed 30k mark.  We journal roughly 50k emails per day so it has grown fast and will continue to grow.

 

Thanks!

 

Next012
Level 3

I've successfully updated the EV instance that is on the CA server to 9.0.1.  Confirmed it is the same instance version as what is installed on the storage host.

Kenneth_Adams
Level 6
Employee Accredited Certified

Hello again, Next012;

Your update seems to have occurred while I was writing my post above.  Thank you for answering EV_Ajay's questions.  We now see what is likely wrong.

You noted that the EV binaries on the CA server are version 8.0.3.1845 (EV 8.0 SP3), while the EV binaries on the EV server are 9.0.1.1073 (EV 9.0 SP1).  This binary version mismatch is likely causing the issue, as Step 1 in Random Sampling is to translate the information collected by the Journal Collector into the KVSSavesetID for each message.  With the EV binaries mismatched, that translation is not occurring properly.

The EV binaries or EV Runtime API is required on the CA server in order to interface with the appropriate EV services on the EV server.  With the mismatch, searches will not always run, export will not always complete, and reviews may fail as CA can't communicate properly with the EV server.

Please stop the Enterprise Vault Accelerator Manager Service (EVAMS) on the CA server, upgrade the EV binaries to match your EV server (direct upgrade of EV from 8.0 SP3 to 9.0 SP1 is supported), set the Startup for any EV services other than EVAMS to Manual or Disabled, then reboot the CA server.  Note that you should only have the Enterprise Vault Admin Service and EVAMS installed on the CA server unless there are other EV functions (such as mailbox archiving or Vault Stores) configured on that server.  If you DO have other EV services installed on the CA server, you'll need to leave all EV services startup set to Automatic until such time as you can confirm none are needed on the CA server and uninstall them.

After you've upgraded EV and rebooted the server, monitor the sampling processing by running a dtrace of the AcceleratorService process and looking for Step 6 instances.  There should be multiple instances for each Department.  If you don't see any progress with the sampling processing, you'll need to open a Support ticket so we can reset the Sampling status and date to allow it to progress.

 

Kenneth_Adams
Level 6
Employee Accredited Certified

Wow!  We're just crossing each other's messages.  I just saw where you've already upgraded the EV binaries on the CA server to match those of the EV server.  That's good.  Sampling should pick up and continue, but the dtrace will tell for sure.

My concern is the KVSSavesetID translation may not have finished and we'll have to reset the Sampling status and time to get it to work.

We look forward to your next update.

Next012
Level 3

I must've missed this response - now I understand your "crossing messages" comment :)

 

Judging by the content of the dTrace, it is not collecting the data appropriately so I'm assuming it's running against the wrong category?  I'm also running the dTrace from our primary EV system via Administration Console.  I can run it from the CA server via command line if preferred.

Next012
Level 3

I've restarted the dTrace - please disregard the previous!

 

Via command line dTrace, I've set against AcceleratorService and will upload the log once a good collection is taken.  It is cycling through and scrolling fast so I can see some parts are working appropriately.

 

A quick question - if I set the Sampling start time and it has already started, if the time is adjusted again (assuming to restart the sampling), could that cause issues?

 

 

Thanks!

Next012
Level 3

Thankfully it's a slow day for our Compliance staff, so the system being offline for a little bit has been a blessing in disguise for them.

 

I've started the dTrace but only against the Compliance Accelerator category.  I've also adjusted the start time for the sampling but could not start the dTrace prior to that start time.  I will upload the log once it's finished the 10-minute run; in fact, it has just finished.

 

Edit: I've unlisted the attachment.

Kenneth_Adams
Level 6
Employee Accredited Certified

I'm hoping this response won't cross another from you, so you can get the answers you requested.  I just noticed your response with the new dtrace and will look at it shortly.  I also noticed your new posting about resetting Sampling to get a dtrace before and during the Sampling processing.  Until I get the dtrace review done, please do not respond again so we can get our responses synchronized.

To answer your question about adjusting the sampling time, let me provide some background information first.  Random Sampling is designed to look at all items in the tblVaultSamples table that have been added since the last time Sampling ran.  Normally, that last sampling run time would have been 'yesterday', but can be more than just one day in the past (as you currently have).  When Random Sampling completes, it MUST wait at least 24 hours before it can run again.  This requirement is to allow new items to be added to the table so we have something to process.

We can fake that 24 hour period by setting the 'Sampling : Last Sample' date back 1 day when we also set the sampling time back to its original value.  We set the last sample date, set the sampling time, then restart EVAMS so both changes will take effect immediately.

For example, Sampling is set to run at its default time of 1:00 AM, but something happens to prevent it from running - say, the SQL Server is down for maintenance.  We can set the Sampling time to 10:00 AM and restart EVAMS to enforce that change, then let Sampling run its course starting at 10:00.  When Sampling has completed, we want to set the Sampling time back to 1:00 AM, so we make that change and restart EVAMS again.  If that's all we do, Sampling will not run tomorrow at 1:00 AM, it will run the next day at 1:00 AM and include items collected today as well as tomorrow.

Now, to make the above example process items tomorrow at 1:00 AM, we change the date on the last sampled date/time entry so that Sampling ran yesterday at 10:00 AM instead of today at 10:00 AM.  This allows Sampling to run tomorrow at 1:00 AM as normal and process messages captured today.
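
The arithmetic in that example can be checked with a short Python sketch.  This is only a model of the 24 hour rule as described above, not the product's actual scheduling code; the function name next_sampling_run and the dates are made up for illustration.

```python
from datetime import datetime, time, timedelta

# Rule as described above: at least 24 hours must separate two sampling runs.
MIN_GAP = timedelta(hours=24)


def next_sampling_run(last_sample, scheduled, now):
    """Return the first scheduled slot after 'now' that respects MIN_GAP."""
    candidate = datetime.combine(now.date(), scheduled)
    if candidate <= now:
        # Today's slot has already passed, so start from tomorrow's.
        candidate += timedelta(days=1)
    while candidate - last_sample < MIN_GAP:
        # Too soon after the last run, so skip to the following day.
        candidate += timedelta(days=1)
    return candidate


schedule = time(1, 0)                 # sampling time set back to 1:00 AM
now = datetime(2013, 10, 29, 12, 0)   # "today", after the 10:00 AM catch-up run

# Last sample left at today 10:00 AM: tomorrow's 1:00 AM slot is only 15 hours later.
print(next_sampling_run(datetime(2013, 10, 29, 10, 0), schedule, now))  # 2013-10-31 01:00:00

# Last sample moved back one day: tomorrow's 1:00 AM slot is 39 hours later.
print(next_sampling_run(datetime(2013, 10, 28, 10, 0), schedule, now))  # 2013-10-30 01:00:00
```

The only difference between the two calls is the last sample date, which is why moving it back one day is enough to bring the next run forward to tomorrow.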

To answer your question about resetting Sampling to obtain a dtrace before and during the Sampling run, please hold off on that until I review the dtrace log.  The results of that review will determine if such action is recommended or needed.

One more thing: we won't process any items whose TransactionID we can't translate into a KVSSavesetID.  We'll try to translate those items for a certain number of days, but will eventually remove them - if all is working properly.

Next012
Level 3

Hi Ken,

 

Attached is the dTrace against the AcceleratorService tag.  The collection time was just over 10 minutes but I could not see any Out of Memory errors.  It appears the service is sampling just fine (into the tblVaultSamples) but until an actual Sample run, I don't think it will process those records.

 

 

Thanks

 

Edit: Unlisted the attachment - it's already present on a response below.

Next012
Level 3

Excellent description!  Thank you for that.  I've had my hands in the EV systems for over a year now and every bit of information helps.  Issues with SQL, Windows Updates or otherwise seem to always cause a hiccup in our environment when it comes to sampling which is one other reason I'm inquiring so much.

 

I look forward to your response.

 

Thanks again Ken!

Kenneth_Adams
Level 6
Employee Accredited Certified

OK.  That was a quick review. Your Sampling is not finding anything to do, as evidenced by the following snippet from the log:

1187 10:57:52.292  [3560] (AcceleratorService) <Guaranteed Sampling Thread:4692> EV-L {GuaranteedSampling} {C2} {JC} Step 1: Processing captured items...
1188 10:57:52.292  [3560] (AcceleratorService) <Guaranteed Sampling Thread:4692> EV-L {GuaranteedSampling} {C2} {JC}     ...there are no captured items to process
1189 10:57:52.292  [3560] (AcceleratorService) <Guaranteed Sampling Thread:4692> EV-L {GuaranteedSampling} {C2} {JC} Step 1: Processed 0 captured items
 

This log only reports Step 1, which is the step we use to translate KVSSavesetID entries.  Please check the tblConfig table contents to see if the sampling status is still showing as Started.  If it is, we'll have to manually reset things.

Also, please run the following SQL queries against your CA Customer database so we can get an idea of where things are.

-- Query 1: Count total rows in tblVaultSamples table
SELECT COUNT(*) FROM tblVaultSamples

-- Query 2: Count total rows with no KVSSavesetID entries
SELECT COUNT(*) FROM tblVaultSamples WHERE KVSSavesetID IS NULL

  • If the counts match, then we're still having a problem translating the KVSSavesetID entries.  If the counts don't match, then we're processing those properly, but we have nothing new captured to process yet.

-- Query 3: Determine the current Sampling status and date/time
SELECT * FROM tblConfig

That should do it for now.  I look forward to seeing the results of these queries.
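
For reference, the comparison of the first two counts can be written out as a small Python sketch.  The function name interpret_sample_counts and the returned wording are illustrative only, and the handling of an empty table is an added assumption that is not covered by the rule above.

```python
def interpret_sample_counts(total_rows, null_saveset_rows):
    """Interpret the results of Query 1 (total) and Query 2 (NULL KVSSavesetID)."""
    if total_rows < 0 or not 0 <= null_saveset_rows <= total_rows:
        raise ValueError("counts must satisfy 0 <= null_saveset_rows <= total_rows")
    if total_rows == 0:
        # Assumption: an empty table means nothing is waiting to be sampled.
        return "no rows waiting in tblVaultSamples"
    if null_saveset_rows == total_rows:
        # Counts match: no row has a KVSSavesetID yet.
        return "KVSSavesetID translation is not working"
    # Counts differ: at least some rows have been translated.
    return "KVSSavesetID translation is working"


# Hypothetical counts: 1000 rows, none of them missing a KVSSavesetID.
print(interpret_sample_counts(1000, 0))
```

A table where every row still lacks a KVSSavesetID points at the translation step, while any other non-empty result points at the later sampling steps.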

 

Next012
Level 3

Hi Ken,

 

I can confirm the tblConfig Sampling : Status remains as Started and has consistently remained Started unless changed.  If changed and the sampling service is run, it will change back to Started but never show as Finished.

 

SELECT COUNT(*) FROM tblVaultSamples totals at 364855 (yikes.)

SELECT COUNT(*) FROM tblVaultSamples WHERE KVSSavesetID IS NULL totals at 0

 

Sampling : Last First Pass Sample - 2013-10-29T10:40:02.7229267-05:00

Sampling : Last Sample - 2013-10-13T00:00:00.000

I show the first item to be processed in the sample vault as 2013-10-14 00:11:37.320

 

 

Thanks!

Kenneth_Adams
Level 6
Employee Accredited Certified

364,855 may well be too many items to process.  Let's get a new dtrace of the AcceleratorService process, but let's do it a bit differently.

  1. Start the dtrace and get it logging to a file of your path and name choice.
  2. Restart EVAMS.
  3. Let the dtrace run for another 10 to 15 minutes.
  4. Stop the dtrace and send it in so I can review it.

We may just need a restart of EVAMS to kick start the processing.  If that does not work, we'll have to work through a Support ticket to manipulate the database to clear the 4 Sampling related tables and reset the sampling status.  We can leave the last sampled date alone, though.

I have to go to a meeting, so I won't be able to respond for a while. I'll get back to you as soon as I can after reviewing the dtrace log.

 

Next012
Level 3

We're on the same page :)  Already posted a trace in the response above.  I've attached it here as well.

 

I'd like to run the trace prior to the start of the sampling and during the sampling processing.  Is it possible to restart the processing job or would that cause underlying issues that you're aware of?

Kenneth_Adams
Level 6
Employee Accredited Certified

I just looked at the new dtrace.  It shows Step 6 processing through the Departments.  We now know that the Random Sampling processing is trying to complete.

Please keep an eye on the Random Sampling processing.  You can do this by monitoring a dtrace of the AcceleratorService, as you've done before, and looking for Step 7.  If you see Step 7, you can relax, as any out of memory errors would occur in Step 6.  Getting to Step 7 means the Step 6 processing has completed.

If you see out of memory errors in Step 6, we'll have to clear the 4 tables and reset the last sampling status. Not seeing any out of memory errors just means we're still processing through the items that should complete.  Due to the high number of items you've got to process through, I suspect the Random Sampling processing will take a few hours to complete.