05-04-2020 08:20 PM - edited 06-20-2020 04:51 PM
Our internal Veritas Digital Compliance 'Braintrust' collaborated on a set of Veritas Information Classifier (VIC) policies that may be useful for tagging COVID-19 related content with Enterprise Vault, Enterprise Vault.cloud, eDiscovery Platform, Data Insight & Information Studio:
In the description of the ICD 10 CM policies there are links to some key CDC documents showing the guidance they provided over time.
Tip: Download these files and use them for a quick test of the policies in your VIC environment.
Tip: "Batch Link Downloader" is a cool extension you can use to bulk download docs for testing via Chrome.
Below is a screenshot of a test we performed using these CDC documents as well as a number of public documents from the NY State Office of Mental Health site. Note the classification tags in the 'Tags' column in EV Search below. You can use these tags in facet filters in our compliance and discovery solutions. See my other post in this group on how to import / export these VIC policy packages (the COVID-19 policy package is attached to this post).
Please provide feedback if you test these policies, and if you make any additions or modifications, please consider sharing your updated policies here by attaching them to a reply.
If you don't have a test environment and want a quick peek, just send me a note and we can do a quick screen share.
Disclaimer:
These policies are provided as an example (something to help get you started so you are not having to start from a 'blank page'). Please review and test them thoroughly and modify them to suit your requirements before putting them into production.
Note: These sample rules were exported from an EV 12.5.0 / VIC 2.2.1 environment. If you are running an older version of VIC you will see a warning saying "Newer package version". This is expected.
06/20/2020 - updated the policies with optimizations to reduce false positives and renamed some items to reflect the US-centric ICD-10-CM diagnosis codes. ICD-10 is the international list of diagnosis codes and ICD-10-CM (Clinical Modification) is the version that the US NCHS has modified for use in the US. https://www.cdc.gov/nchs/icd/icd10cm.htm
05-14-2020 07:57 AM - edited 05-14-2020 08:04 AM
Rick,
Great work! I downloaded and imported the policies into my EV lab yesterday morning. I have an active SMTP journal that contains email accounts from Office 365 and Gmail. I logged in to EVS this morning and I have already had five tags.
05-14-2020 08:19 AM
Nice work Evan
05-14-2020 08:33 AM
Do we want a way to exclude news articles / newsletters as they might not be relevant? Seems like a false positive to me
05-14-2020 08:42 AM
Thanks Aayush, I was actually considering that but feel that the classification policy is doing its job and accurately tagging the item. We have other capabilities in our discovery solutions to help filter out things like newsletters, such as sender/domain exclusion lists and Intelligent Review.
Also it would be possible to do something here in VIC as well / instead - i.e. create another policy with a pattern to detect things like known newsletters and set the action to 'discard'.
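To make the newsletter idea concrete, here is a minimal sketch in Python of the kind of check such a pattern would perform. It is not VIC policy syntax and does not use any Veritas API; the domain list and the function name looks_like_newsletter are hypothetical placeholders, and the List-Unsubscribe check is just one common heuristic for bulk mail.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical exclusion list; placeholder domains, not real senders.
NEWSLETTER_DOMAINS = {"news.example.com", "updates.example.org"}


def looks_like_newsletter(raw_message: str) -> bool:
    """Return True if the message appears to be bulk/newsletter mail."""
    msg = message_from_string(raw_message)

    # Extract the bare address from a header like 'Daily <digest@news.example.com>'.
    _, address = parseaddr(msg.get("From", ""))
    domain = address.rpartition("@")[2].lower()
    if domain in NEWSLETTER_DOMAINS:
        return True

    # Mailing-list software commonly adds a List-Unsubscribe header (RFC 2369).
    return msg.get("List-Unsubscribe") is not None
```

An item flagged this way would be a candidate for the 'discard' action or for a separate tag, so reviewers can filter it out rather than lose it.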
05-21-2020 07:46 PM - edited 05-21-2020 08:00 PM
The policies have been updated to reflect two changes based on testing and feedback we have received:
06-20-2020 05:33 PM
We updated the Covid policies (see main post above) with optimizations to reduce false positives and renamed some items to reflect the US-centric ICD-10-CM diagnosis codes. ICD-10 is the international list of diagnosis codes and ICD-10-CM (Clinical Modification) is the version that the US NCHS has modified for use in the US. https://www.cdc.gov/nchs/icd/icd10cm.htm
As an added bonus we performed testing with a sample Zoom 1:1 doctor-patient telehealth interaction. The 8 min .mp4 file was run through an ASR engine and had other advanced processing performed (more on this in a later post) and the artifact was archived and classified with VIC of course. The outcome was amazing - the main Covid policy was triggered as well as the built-in ICD-10-CM policy which was triggered via 5 'unique' hits of 'textual names' of diagnosis indexes.
…but there were 5 unique items so it was just enough to trigger the ICD-10-CM policy, as expected!!!
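For anyone curious how a "unique hits" condition behaves, the rough sketch below counts distinct terms rather than total occurrences. It only illustrates the idea and is not how VIC is implemented; the term list is a hypothetical stand-in for the diagnosis index names, and the threshold of 5 is taken from the test result described above.

```python
import re

# Hypothetical sample terms; the real policy matches ICD-10-CM diagnosis index names.
DIAGNOSIS_TERMS = [
    "pneumonia", "bronchitis", "fever", "cough",
    "shortness of breath", "anxiety",
]
MIN_UNIQUE_HITS = 5  # assumed threshold, based on the "5 unique hits" observed above


def unique_hits(text, terms=DIAGNOSIS_TERMS):
    """Return the set of distinct terms found in the text (case-insensitive)."""
    lowered = text.lower()
    return {
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def policy_triggered(text, threshold=MIN_UNIQUE_HITS):
    """Repeated mentions of one term count once; only distinct terms matter."""
    return len(unique_hits(text)) >= threshold
```

With this logic a transcript that mentions "cough" ten times still contributes only one hit, which is why a short conversation needs several different diagnosis terms before the policy fires.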