Forum Discussion

kdrone
Level 3
10 years ago

Data Insight - backlogged files

We are running Data Insight 4.5, with multiple indexers/collectors doing file scans and FPolicy across nearly two dozen NetApp filers.

 

Is there a 'best method' to address the lag between an event occurring (such as a file being deleted from a filer) and when the event shows up in Data Insight? We are running our indexers/preprocessors at 30-minute intervals, but our file backlog for most things remains over an hour. We are not hitting much utilization on our collectors (averaging 3% CPU utilization and 40% memory on a 64-bit, 12 GB RAM, 8-core Server 2008 R2 machine).

In the performance section, the backlog size is generally 30-100 files, 10-20 MB.

  • kdrone, you seem very familiar with the application, given that you are running a healthy-sized implementation.

    The process of getting device information from the filer to the indices differs between the two scenarios (scanning and auditing), but let's group them for this discussion.

    Scanning is the hierarchical crawl of the data structure and runs on a schedule or on demand. Auditing is the capture of I/O access to the data structure and runs 24x7. In both cases the data comes from the device to the collector and is then transferred to the indexer for further processing.

    I'll assume all is well and that you want to move the files from the inbox on the indexer into the indices for reporting or display purposes. I am describing the option to start the job manually on demand; there is also the scheduled option, where you could shorten the time between runs. Until the accumulated files transferred from the collector are indexed successfully, the information is not accessible.

    That occurs at the conclusion of the IndexWriterJob run against the files, on a share-by-share basis. I will not go into it further, but anyone in our support organization, technical product management, or our Sales Engineers could give you a detailed view of the process. If you would like access to a document detailing each job, please private message me and I'll direct you to the content.

    To see the data accumulated on the indexer sooner, you can force the job to run on demand rather than wait for the schedule. You can start it manually in the Symantec Data Insight version you have installed and in all later versions.

    Navigate to the settings tab and locate the Data Insight servers page, then select the indexer holding the data you wish to see sooner. From that page, navigate to the jobs tab and use the action button to start the job. The pictures below show the location of these items.

    console_0.jpg

    start.jpg

    I hope that answers your question. I'll check back after the weekend break (note: this weekend is a bank holiday in the USA) and see if you have any follow-up questions.
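
For anyone trying to reason about why a 30-minute schedule can still leave data more than an hour behind, here is a rough back-of-the-envelope sketch. It is not Data Insight's actual scheduler logic. It assumes each stage between the filer and the indices is an independent batch job on a fixed interval and that events arrive at random times; the function name `staged_lag_minutes` and the two-stage example are hypothetical.

```python
def staged_lag_minutes(intervals, run_minutes=0.0):
    """Estimate (average, worst_case) lag in minutes for an event that must
    pass through several scheduled batch stages in sequence.

    intervals   -- schedule interval of each stage, in minutes (assumed values)
    run_minutes -- assumed processing time of each stage run, in minutes
    """
    processing = run_minutes * len(intervals)
    # On average an event waits half an interval for the next run of a stage.
    average = sum(i / 2 for i in intervals) + processing
    # In the worst case it just missed a run and waits a full interval.
    worst_case = sum(intervals) + processing
    return average, worst_case


# One stage on a 30-minute schedule.
print(staged_lag_minutes([30]))
# Two chained 30-minute stages (hypothetical: transfer, then index write).
print(staged_lag_minutes([30, 30]))
```

Under those assumptions, two chained 30-minute stages already allow a worst-case lag of an hour before any processing time is counted, which is why shortening the interval or starting the job on demand helps.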

     

     

  • This answers it exactly; looks like the missing piece was the job. Thanks again!
