COVID-19 policies

Our internal Veritas Digital Compliance 'Braintrust' collaborated on a set of Veritas Information Classifier (VIC) policies that may be useful for tagging COVID-19 related content with Enterprise Vault, Enterprise Vault.cloud, eDiscovery Platform, Data Insight & Information Studio:

  • US-COVID-19 - Consists of several patterns for different types of keywords (common terms, people, drugs, and the ICD 10 CM Diagnosis Index patterns below)
  • US-COVID-19 ICD 10 CM Diagnosis Indexes (CDC Feb-Apr-2020) - Consolidated list of CDC diagnosis indexes across several periods where the CDC provided guidance (coding guidance changed over time due to the speed of the pandemic)
  • US-COVID-19 ICD 10 CM Exposure Diagnosis Index (CDC Feb-Apr-2020) - Diagnosis index for confirmed exposure to COVID-19
  • US-COVID-19 ICD 10 CM Ruled Out Diagnosis Index (CDC Feb-Apr-2020) - Diagnosis index for exposure to COVID-19 being ruled out
  • US-COVID-19 ICD 10 CM Signs and Symptoms Diagnosis Indexes (CDC Feb-Apr-2020) - Diagnosis index for signs and symptoms related to COVID-19
  • US-COVID-19 ICD 10 CM Diagnosis Indexes possibly leading to COVID-19 (CDC-Feb-Apr-2020) - Other diagnosis codes potentially related to COVID-19

In the description of the ICD 10 CM policies there are links to some key CDC documents showing the guidance they provided over time. 
Tip: Download these files and use them for a quick test of the policies in your VIC environment.
Tip: "Batch Link Downloader" is a cool extension you can use to bulk download docs for testing via Chrome.

Below is a screenshot of a test we performed using these CDC documents as well as a number of public documents from the NY State Office of Mental Health site. Note the classification tags in the 'Tags' column in EV Search below. You can use these tags in facet filters in our compliance and discovery solutions. See my other post in this group on how to import / export these VIC policy packages (the COVID-19 policy package is attached to this post).

Please provide feedback if you test these policies, and if you make any additions or modifications, please consider sharing your updated policies here by attaching them to a reply.

If you don't have a test environment and want to have a quick peek just send me a note and we can do a quick screen share.

Disclaimer:
These policies are provided as an example (something to help get you started so you are not having to start from a 'blank page').  Please review and test them thoroughly and modify them to suit your requirements before putting them into production.

COVID-19 VIC Policies.png

Note: These sample rules were exported from an EV 12.5.0 / VIC 2.2.1 environment. If you are running an older version of VIC you will see a warning saying "Newer package version"; this is expected.

06/20/2020 - Updated the policies with optimizations to reduce false positives and renamed some items to reflect US-centric ICD-10-CM diagnosis codes. ICD-10 is the international list of diagnosis codes and ICD-10-CM (Clinical Modification) is the version that the US NCHS has modified for use in the US. https://www.cdc.gov/nchs/icd/icd10cm.htm

6 Replies

Re: COVID-19 policies

Rick,

Great work! I downloaded and imported the policies into my EV lab yesterday morning. I have an active SMTP journal that contains email accounts from Office 365 and Gmail. I logged in to EVS this morning and I already have five tags.

ev-covid.png

Re: COVID-19 policies

Nice work Evan

Re: COVID-19 policies

Do we want a way to exclude news articles / newsletters as they might not be relevant? Seems like a false positive to me

Re: COVID-19 policies

Thanks Aayush, I was actually considering that but feel that the classification policy is doing its job and accurately tagging the item. We have other capabilities in our discovery solutions to help filter out things like newsletters, such as sender/domain exclusion lists and Intelligent Review.

Also it would be possible to do something here in VIC as well / instead - i.e. create another policy with a pattern to detect things like known newsletters and set the action to 'discard'.

Re: COVID-19 policies

The policies have been updated to reflect two changes based on testing and feedback we have received:

  1. The policies were imported into VIC on an EV 12.5.0.1221 (GA release) version of EV and exported again. Several folks noticed that they would get a warning upon import because the EV / VIC version we were testing with was newer. You will see this warning pop up when you import these policies if you are running any version of EV below 12.5.0.1221... this is ok. You should see 6 policies, 11 patterns and 6 tags import successfully after saying yes to the following warning message:
    VIC Warning on Import.png
  2. We noticed what appeared to be some false positives on some NDRs in our live mail stream. After looking into this more we could see that at least one of the ICD 10 CM diagnosis codes was really short (like 3 characters), such as 'R06', which is one of the symptom codes related to COVID, and we had the 'String Match' option checked in the pattern. This means that VIC was finding the 'R06' string in host names in email headers and even inside long random hashes! Unchecking the 'String Match' option should limit this, as matches will have to be on 'R06' as a separate word (with spaces around it).
    String Match.png
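
To illustrate why the 'String Match' option caused those false positives, here is a minimal Python sketch of the difference between substring matching and whole-word matching. It is not how VIC is implemented; the function names, the sample strings and the word-boundary rule (no letter or digit directly before or after the code) are assumptions made for this example only.

```python
import re

CODE = "R06"  # short ICD-10-CM code mentioned above

def substring_match(text: str, term: str = CODE) -> bool:
    """Roughly what a 'String Match' style search does: hit anywhere in the text."""
    return term in text

def whole_word_match(text: str, term: str = CODE) -> bool:
    """Hit only when the term is not glued to other letters or digits."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None

# Made-up sample strings, for illustration only.
samples = {
    "header": "Received: from MAILR06.example.net by mx.example.org",
    "hash": "X-Message-Id: 9F3AR06C71E2",
    "clinical": "Diagnosis code R06 noted during the visit",
}

for name, text in samples.items():
    print(name, substring_match(text), whole_word_match(text))
```

In this sketch the host name and the hash only hit under substring matching, while the clinical sentence hits under both, which mirrors the behaviour we saw on the NDRs.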

Re: COVID-19 policies

We updated the Covid policies (see main post above) with optimizations to reduce false positives and renamed some items to reflect US-centric ICD-10-CM diagnosis codes. ICD-10 is the international list of diagnosis codes and ICD-10-CM (Clinical Modification) is the version that the US NCHS has modified for use in the US. https://www.cdc.gov/nchs/icd/icd10cm.htm

As an added bonus we performed testing with a sample Zoom 1:1 doctor-patient telehealth interaction.  The 8 min .mp4 file was run through an ASR engine and had other advanced processing performed (more on this in a later post) and the artifact was archived and classified with VIC of course.  The outcome was amazing - the main Covid policy was triggered as well as the built-in ICD-10-CM policy which was triggered via 5 'unique' hits of 'textual names' of diagnosis indexes.  

ICD-10-CM Policy.jpg
ICD-10-CM Policy-Test.png
ICD-10-CM Policy-Test-Table.jpg

…but there were 5 unique items so it was just enough to trigger the ICD-10-CM policy, as expected!!!
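
For anyone wondering what "just enough unique hits" means in practice, here is a rough Python sketch of a policy condition that fires only when a minimum number of distinct terms is found. The threshold of 5 is inferred from the result above rather than taken from the policy definition, and the function name and the sample hit list are hypothetical; this is not VIC's actual logic.

```python
def policy_triggered(hits: list[str], min_unique: int = 5) -> bool:
    """Return True when the number of distinct hits reaches the threshold.

    min_unique=5 is an assumption based on the test result described above.
    """
    unique_hits = {hit.strip().lower() for hit in hits}  # repeats count once
    return len(unique_hits) >= min_unique

# Hypothetical hits from a transcript; "Fever" and "fever" count as one.
hits = ["Cough", "fever", "Fever", "pneumonia", "headache", "shortness of breath"]

print(policy_triggered(hits))      # five distinct terms
print(policy_triggered(hits[:4]))  # only three distinct terms
```

Counting distinct terms rather than total occurrences means a single term repeated many times in a transcript would not trigger the policy on its own.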

COVID-ASR-POST-1.jpg