TTree2007
Level 4
7 years ago

Can the files in the \Data\Classification\content folder be deleted manually?

Hi

I am classifying files from a 3 TB share. In hindsight I probably should have selected a smaller set of data, as it does not seem to purge the processed files before trying to process more.

The drive has 400 GB of space but has reached the maximum threshold set (10 GB), so processing has stopped. Most of the data is here…

D:\DataInsight\Data\Classification\Content\14… 

Can I just delete the files in this folder manually or is there a better course of action?  A setting I have missed maybe?  I cannot realistically keep expanding the drive size!

Not sure it's relevant but I am not using the Windows Filer agent

Thanks

  • TTree:

    Deleting the files will cause the request to fail. Once files have been classified, they should be deleted and removed from the temporary location. The network transfer speed very probably exceeds the speed at which files can be opened and traversed while being compared against all active policies, so fetched content accumulates faster than it is cleared. You must have sufficient space to hold the classification request's dataset.
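As a rough illustration of why the temporary space fills up, here is a minimal sketch that assumes constant fetch and classification rates. The function name and the rates in the usage note are hypothetical; they are not measured Data Insight figures, and real throughput will vary with file types and policies.

```python
def peak_backlog_mb(total_mb: float, fetch_mb_s: float, classify_mb_s: float) -> float:
    """Estimate peak temporary space under a simple constant-rate model.

    Assumption: content arrives at fetch_mb_s and is classified (then
    deleted) at classify_mb_s for the whole duration of the fetch.
    """
    if total_mb < 0 or fetch_mb_s <= 0 or classify_mb_s <= 0:
        raise ValueError("total must be non-negative and rates must be positive")
    if fetch_mb_s <= classify_mb_s:
        # Classification keeps up with the transfer, so no backlog builds.
        return 0.0
    fetch_seconds = total_mb / fetch_mb_s
    # The backlog grows at the difference of the two rates until fetching ends.
    return (fetch_mb_s - classify_mb_s) * fetch_seconds
```

For example, `peak_backlog_mb(3_000_000, 100, 40)` models a 3 TB request fetched at a hypothetical 100 MB/s and classified at 40 MB/s; under those assumptions the backlog peaks at 1,800,000 MB, which is why a fast network can outrun the available disk.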

    While continually adding disk space is not desirable, there is an alternative.

    By default, file content is stored in C:\DataInsight\data while it is being classified. To change the location, set the following custom configuration property on the Settings > Inventory > Data Insight Servers > Classification Server > Advanced Settings page > Set custom properties:

    - Property name: classify.fetch.content_dir
    - Property value: The directory name, such as F:\DIcontent
    Note: To protect the content, Veritas recommends encrypting the folder.
    Note: Service restart is required for the change to the directory to take effect.
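Before pointing classify.fetch.content_dir at a new directory, it may help to confirm how much free space that volume actually has. The following is a minimal sketch using only the Python standard library; the helper names and the required-space figure are illustrative assumptions and are not part of Data Insight.

```python
import shutil
from pathlib import Path


def free_space_mb(directory: str) -> float:
    """Return the free space, in MB, on the volume that holds `directory`."""
    usage = shutil.disk_usage(Path(directory))
    return usage.free / (1024 * 1024)


def has_room(directory: str, required_mb: float) -> bool:
    """True if the volume has at least `required_mb` MB free."""
    return free_space_mb(directory) >= required_mb


# Hypothetical usage on the Classification Server:
#   has_room(r"F:\DIcontent", 500_000)
```

This only reports the space available at the moment it runs; it does not change any Data Insight setting.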

    • TTree2007
      Level 4

      Hi Rod

      Thanks for that information.  It does seem that I perhaps selected too much data to classify in one job.  Do you have any guidelines on roughly how much temp space it's going to need based on how much is selected to classify?  For example, I selected 3TB, would this need 1TB of temp space or the whole 3TB?  Can't the process do it in chunks?

      I've asked the customer to raise the disk temporarily to 900GB but it sounds like it might be best for me to cancel the current task and do it in smaller jobs if it still can't complete. 

      The data directory we've selected is on a separate drive to the DI install (d:\ rather than c:\) but this obviously has lots of other items in there too.  It probably won't make much difference to my current scenario as it will still run out, but is it still good practice to move the 'classify.fetch.content_dir' as you mention?

      Thanks

      • Rod_p1
        Level 6

        TTree, one can also pause the network transfer and allow the classification engine to catch up. Basically, you set a threshold at which transfers stop, and they pick up again once free disk space reaches a reset threshold. This means you typically operate within that band for the duration of the request, but it will not force you to cancel or reduce the request.

         

        You can achieve this with the classification settings on the Settings > Classification / Configuration page.

        You set the point at which network transfers stop. Once classification has deleted files and free space reaches the reset goal, specified in MB or as a percentage of free space, the transfers begin again.

         

        You might also want to consider that option instead of stopping the request, reducing its size, and starting a new request.


        Rod

         

        Note:

        Select the check box to monitor the disk usage on the Classification Server node, and to prevent it from running out of disk space by implementing safeguards.

        Specify threshold/reset values in MB / Specify threshold/reset values in percentage

        You can specify the threshold for disk utilization in terms of size and percentage. The DataInsightWatchdog service initiates the safeguard mode for the Classification Server node if the free disk space falls under the configured thresholds.

        The DataInsightWatchdog service automatically resets the safeguard mode when the free disk space is more than the configured thresholds.

        You can edit the threshold limits as required. If you specify values in terms of both percentage and size, then the condition that is fulfilled first is applied to initiate the safeguard mode.
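The threshold/reset behaviour described in the note can be sketched as a small state machine. This is only an illustration of the logic, not the DataInsightWatchdog implementation: the class and field names are hypothetical, and it assumes that safeguard mode ends only when free space is above both reset values.

```python
from dataclasses import dataclass


@dataclass
class Safeguard:
    threshold_mb: float   # enter safeguard mode when free space falls below this
    reset_mb: float       # reset value in MB
    threshold_pct: float  # enter safeguard mode when free % falls below this
    reset_pct: float      # reset value in percent
    active: bool = False

    def update(self, free_mb: float, total_mb: float) -> bool:
        """Record a new free-space reading; return True while transfers are paused."""
        if total_mb <= 0:
            raise ValueError("total_mb must be positive")
        free_pct = 100.0 * free_mb / total_mb
        if not self.active:
            # Whichever condition is fulfilled first initiates safeguard mode.
            if free_mb < self.threshold_mb or free_pct < self.threshold_pct:
                self.active = True
        else:
            # Assumption: both reset values must be exceeded before resuming.
            if free_mb > self.reset_mb and free_pct > self.reset_pct:
                self.active = False
        return self.active
```

With a 10 GB threshold on a 400 GB drive and hypothetical reset values, for example `Safeguard(threshold_mb=10_000, reset_mb=50_000, threshold_pct=5, reset_pct=10)`, a reading of 9,000 MB free pauses transfers, 30,000 MB free keeps them paused, and 60,000 MB free resumes them. The gap between the threshold and reset values is the band the request operates in.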