Safe Guard DataInsight Application settings

Pix_R
Level 5

Hello, hopefully the forum is still monitored. I have a question.

Since Safeguard is a global attribute, it is a one-size-fits-all setting, which means we can only be as effective as our smallest or most demanding winnas host allows. Setting it too low could be just as fatal to a mission-critical application that needs, say, 25% of its current disk space for temporary storage, while that same setting could be extremely wasteful on a large file share server set up to host intra-domain files.

If the issue becomes one of balance, and balance becomes a trade-off, then we need another solution from the application.


Consider a review of the current status of company assets, and the desire to utilize resources budgeted for IT to their fullest extent.

The node has:

82,376,040,448 bytes free of 127GB on C:

245,829,828,608 bytes free of 1.9TB on E:

 

Our Safeguard space parameter is set as a percentage, because a fixed size value is too precise to apply across the entire enterprise. When the threshold is reached, the Safeguard tool shuts down the DI processes to prevent a full-disk condition. We move our data to a shared file system, put our applications on their own drive, and keep the system drive for the operating system and production tools.

The parameter has become a mean calculation over the different uses of the Windows servers across the enterprise:
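To put numbers on the percentage-versus-size problem, here is a rough sketch. It assumes the drive sizes quoted above are decimal units (the real capacities will differ slightly) and uses the 80% threshold from our settings. `reserved_bytes` is only an illustrative helper, not anything from DataInsight.

```python
def reserved_bytes(capacity_bytes: int, threshold_percent: int) -> int:
    """Bytes that must stay free so utilisation stays at or under the threshold."""
    return capacity_bytes * (100 - threshold_percent) // 100


GB = 10**9  # assumption: decimal gigabytes, used only for illustration

# Approximate capacities taken from the drive listing in this post.
drives = {"C:": 127 * GB, "E:": 1900 * GB}

for name, capacity in drives.items():
    # The same 80% threshold reserves very different absolute amounts.
    print(name, reserved_bytes(capacity, 80), "bytes held back")
```

Under those assumptions the same percentage holds back roughly 25 GB on the small drive and roughly 380 GB on the large one, which is the waste I am describing.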

2020-10-25 15:12:13 INFO:    #{13} [SystemMonitorJob.canResetSafeguard] Unit for safeguard: PERCENT

2020-10-25 15:12:15 INFO:    #{13} [SystemMonitorJob.canResetSafeguard] Current Disk Utilisation (%): 79

2020-10-25 15:12:15 INFO:    #{13} [SystemMonitorJob.canResetSafeguard] Reset (%): 75

2020-10-25 15:13:11 INFO:    #{12} [SystemMonitorJob.enforceSafeguard] Current node state is SAFEGUARD

As space is increasingly used by other processes running on the server, we continue to shut down processes until:

2020-10-30 09:33:12 INFO:    #{12} [ServiceUtils.isNTServiceRunning] DataInsightWinnas service is stopped

2020-10-30 09:33:12 INFO:    #{12} [SystemMonitorJob.enforceSafeguard] Current node state is WINNAS_STOPPED

2020-10-30 09:33:12 INFO:    #{12} [SystemMonitorJob.deleteAtticFiles] Deleting files in Attic Folder

2020-10-30 09:33:12 INFO:    #{12} [ServiceUtils.isNTServiceRunning] DataInsightWinnas service is stopped

2020-10-30 09:33:12 INFO:    #{12} [SystemMonitorJob.canResetSafeguard] Unit for safeguard: PERCENT

2020-10-30 09:33:14 INFO:    #{12} [SystemMonitorJob.canResetSafeguard] Current Disk Utilisation (%): 81

2020-10-30 09:33:14 INFO:    #{12} [SystemMonitorJob.canResetSafeguard] Reset (%): 75

 

The service cannot start until disk utilisation on the drive falls below 75%, at which point Safeguard resets and re-enables the application.

The loss of audits affects the reputation of the DI application in the organization, as there are regulatory and business needs for the legal backing of the audit trail.

Over time we have utilized the machine to the point of exceeding the threshold, but we cannot realistically set the value per object.

A reboot did not clear it, indicating that it is not temporary files but data occupying the space:
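The behaviour I am inferring from the logs is a simple hysteresis: enter Safeguard at the threshold, leave only once utilisation drops under the reset value. The sketch below is my reading of the log lines, not the actual DataInsight implementation, and the exact boundary comparisons (`>=` versus `>`) are assumptions.

```python
def in_safeguard_next(in_safeguard: bool, utilisation: int,
                      threshold: int = 80, reset: int = 75) -> bool:
    """Return whether the node should be in Safeguard after this check."""
    if in_safeguard:
        # Already tripped: stay tripped until utilisation falls below reset.
        return utilisation >= reset
    # Not tripped: trip once utilisation reaches the threshold.
    return utilisation >= threshold


# Utilisation values seen in the logs above (79, then 81), starting in Safeguard.
state = True
for used in (79, 81):
    state = in_safeguard_next(state, used)
    print(used, "SAFEGUARD" if state else "NORMAL")
```

This is why the node at 79% is still held in Safeguard even though it is under the 80% threshold: it has to get under 75% first.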

System Boot Time:          10/25/2020, 4:12:29 AM

 

Not only are we raising NOCC alerts, but our DI data cannot be relied upon for nodes that remain in Safeguard status indefinitely after the drive fills beyond the setting.

Lowering the setting on this global attribute could put at risk applications running on the server that require (X) temporary space.

My question is whether there is a hidden, undocumented object attribute or method I could use to override the global attribute per device, allowing a more precise configuration across the multiple business-need scenarios in a typical large enterprise.

I would need to isolate 

watchdog.safeguard.enabled=true
watchdog.safeguard.unit=PERCENT
watchdog.safeguard.threshold.percentage=80
watchdog.safeguard.reset.percentage=75

for each desired node, and otherwise use the global setting.
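To illustrate the lookup I am hoping for, here is a minimal sketch. This is not an existing DataInsight feature as far as I know: the `NODE_OVERRIDES` table, the node name `fileshare-01`, and the override values are all hypothetical. Only the four property names and the global values come from our configuration.

```python
# Global values as currently configured.
GLOBAL_SETTINGS = {
    "watchdog.safeguard.enabled": "true",
    "watchdog.safeguard.unit": "PERCENT",
    "watchdog.safeguard.threshold.percentage": "80",
    "watchdog.safeguard.reset.percentage": "75",
}

# Hypothetical per-node overrides; the node name and values are made up.
NODE_OVERRIDES = {
    "fileshare-01": {
        "watchdog.safeguard.threshold.percentage": "95",
        "watchdog.safeguard.reset.percentage": "90",
    },
}


def effective_settings(node: str) -> dict:
    """Node-specific values win; anything not overridden falls back to global."""
    return {**GLOBAL_SETTINGS, **NODE_OVERRIDES.get(node, {})}


print(effective_settings("fileshare-01"))
print(effective_settings("some-other-node"))
```

A node with no entry simply gets the global values, so nothing changes for servers that are fine with the current setting.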

Thank you

    Pix
