cancel
Showing results for 
Search instead for 
Did you mean: 

Seeking SQL query returning per-user count of (non-legal hold) items to be expired from EV

Dan_L
Level 3
Partner Accredited

Hi all,

Long-time listener, first-time caller.

Our customer is looking to turn on expiry in their environment for the first time. They use EV in a non-traditional manner, so of 32MM+ items, upwards of 2/3 (maybe even 9/10) are on legal hold via DA. Effectively the only items not on legal hold are from cases that were active but have since been closed, removing the holds. Many of the holds overlap, as well.

 

We have successfully tested expiry in a near-duplicate QA EV environment and removed about 2MM items without incident. After presenting that result, the customer's IT team has requested that we add an additional step to present to legal before advancing into Production. They'd like as granular a report as possible of what will be removed so they can confirm the data to be expired should be expired. Criteria are below:

  • A simple count of items to be removed per user would be the bare minimum.
  • A better solution would include retention categories
  • The perfect query would provide:
    • Granular item-by-item reporting (for at least some users)
    • A means by which at least a subset of physical savesets could be located and presented 

After digging around the forums for quite some time, I've tested versions of @JesusWept3's query (which looks like it should do exactly what I need) but:

  • In my lab (which does not have DA) I need to remove any references to the holdsaveset table or else it returns no results 
    • I'm considering adding DA to my lab to test it, but I have little doubt that @JesusWept3 knows his stuff. I'm confident that it will work.
  • In the QA environment, looking at just one of three vault stores, I stopped it after about 36 hours of execution, made some modifications and was unable to make it work more efficiently. I started it again and it has still not completed running after another hours. Even if it does conclude after, say, 48 hours, this is not likely to be an acceptable option to management.

Please let me know if you have any tweaks, experience or advice to offer

10 REPLIES 10

ChrisLangevin
Level 6
Employee

I know you're looking for a SQL query, but have you considered using the index for this?

The ndte property represents the number of days until expiry for an item. You can run a search for all items that currently qualify for expiry by searching

ndte:<0

One advantage here is that it is easy to see the complete item-by-item details and previews using EV Search, as opposed to trying to scrounge what item properties are available in SQL (not many).

The other advantage is that if your item is subject to a Retention Plan, then the Retention Category listed on its Saveset record may not be the actual Retention Category against which its eligibility for expiry is calculated. The ndte index property takes this Retention Plan possibility into account and is therefore more accurate (provided your indexes are current, of course).

You can get environment-wide results by running the same search in Discovery Accelerator.

If you prefer to center the search around a specific date instead of number-of-days-until, you can also use the edat property to do that kind of search.

--Chris

Dan_L
Level 3
Partner Accredited

@ChrisLangevin wrote:

I know you're looking for a SQL query, but have you considered using the index for this?

Hi Chris! First, thanks so much for your reply! I hadn't considered either of the properties you mentioned (SQL seemed a better fit off the top of my head) but am of course not opposed to anything that provides the end results meet the customer's needs. I should mention that prior to the review of these results, the customer has asked that all items with non-default retention categories be changed (via SQL update statement) to the default retention category, after which the indexes are being rebuilt.

The ndte property represents the number of days until expiry for an item. You can run a search for all items that currently qualify for expiry by searching

ndte:<0

One advantage here is that it is easy to see the complete item-by-item details and previews using EV Search, as opposed to trying to scrounge what item properties are available in SQL (not many).

The other advantage is that if your item is subject to a Retention Plan, then the Retention Category listed on its Saveset record may not be the actual Retention Category against which its eligibility for expiry is calculated. The ndte index property takes this Retention Plan possibility into account and is therefore more accurate (provided your indexes are current, of course).

Since this is EV11, not EV12, Retention Plans are not going to be a factor. I'm wondering, though, if this language about retention plans would apply to Legal Holds. If a given item was subject to expiry but also blocked from expiry by DA Legal Holds, would the user's item(s) show as expirable using either ndte or edat?

You can get environment-wide results by running the same search in Discovery Accelerator.

If you prefer to center the search around a specific date instead of number-of-days-until, you can also use the edat property to do that kind of search.

DA seems like the better option, except for the fact that the transition from Prod DA into QA DA has left some searches unable to conclude the way one would hope (for example, one case has 18MM items and two weeks later, no legal holds have applied).

Again, many thanks for the quick reply!

The index properties ndte and edat just translate the item's current Retention Category or Plan into a concrete date. They will not account for expiry "override" options such as:

  • DA legal holds
  • The "Delete expired items from this archive automatically" option on archive properties
  • The "Prevent automatic deletion of expired items with this category" option on Retention Category properties

--Chris

Dan_L
Level 3
Partner Accredited

@ChrisLangevin wrote:

The index properties ndte and edat just translate the item's current Retention Category or Plan into a concrete date. They will not account for expiry "override" options such as:

  • DA legal holds
  • The "Delete expired items from this archive automatically" option on archive properties
  • The "Prevent automatic deletion of expired items with this category" option on Retention Category properties

--Chris


Chris,

Thanks. That, unfortunately, is precisely my problem. The customer wants a proactive window into exactly what will (at least by per-user count) be expired out of the environment upon initiation of storage expiry. A list of all items that are eligible for expiry would be close to 20MM. A list of all items eligible for expiry that are NOT on legal hold would be literally an order of magnitude lower at less than 2MM. They want to review the 2MM.

Any thoughts based on that narrowed definition of "scope?"

Thanks again,

Dan

Dan,

Based on this further information, then it does seem we will need to use SQL to get this information. I ran it by our DA support staff and they suggested you open a DA support case and they will assist you with it. Best of luck.

 

--Chris

JesusWept3
Level 6
Partner Accredited Certified
Just as a matter of interest, which query is it because I think I've written eight iterations of it over the years
https://www.linkedin.com/in/alex-allen-turl-07370146

Dan_L
Level 3
Partner Accredited

Hi @JesusWept3, didn't realize the link was broken. I used this link (query inline as well):

 - Makes sure the user is enabled for expiry (enterpriseVaultDirectory.dbo.DeleteExpiredItems = 1)
 - Discounts items in the SavesetHold table (if they're on hold, they would not be deleted)
 - Links to the Archive table and not the ExchangeMailboxEntry table (so that it encompasses archives without users, shared archives etc)


SELECT A.ArchiveName, COUNT(DISTINCT(S.IdTransaction)) AS ItemsToExpire, RCE.RetentionCategoryName

FROM EnterpriseVaultDirectory.dbo.Archive A,
            EnterpriseVaultDirectory.dbo.Root R,
            EnterpriseVaultDirectory.dbo.RetentionCategoryEntry RCE,
            EVVaultStore.dbo.Saveset S,
            EVVaultStore.dbo.ArchivePoint AP,
            EVVaultstore.dbo.HoldSaveset HS

WHERE S.ArchivePointIdentity = AP.ArchivePointIdentity
  AND AP.ArchivePointId = R.VaultEntryId
  AND R.RootIdentity = A.RootIdentity
  AND S.RetentionCategoryIdentity = RCE.RetentionCategoryIdentity
  AND S.SavesetIdentity != HS.SavesetIdentity
  AND S.ArchivePointIdentity != HS.ArchivePointIdentity
  AND S.IdDateTime < DateAdd(year, -1, GetDate())
  AND RCE.RetentionCategoryName = 'Business'
  AND A.DeleteExpiredItems = 1

GROUP BY A.ArchiveName, RCE.RetentionCategoryName
ORDER BY A.ArchiveName

AndrewB
Moderator
Moderator
Partner    VIP    Accredited

I didn't see this pointed out specifically so I figured it's worth chiming in. @JesusWept3's script works in the lab where we have 1000 archived items and 150 on hold. However, in the QA environment with the customer's data, we have 30 million items with 24 million on hold (and multiple holds with lots of overlap) the query has been running for 48 hours and still hasn't completed. The QA has no other activity going on.

Dan_L
Level 3
Partner Accredited

@JesusWept3:

Edited a bit for brevity and then refined to target a single test user with just 1100 items in their archive - all currently on hold - (using "where r.rootidentity = <user>") , the original query took a full 3.5 hours to complete.

I was able to get the same results, interestingly enough immediately by running the same query but editing: 

and s.savesetidentity = hs.savesetidentity

to

and s.savesetidentity != hs.savesetidentity

Does that make sense to you?

Here was the query that returned immediate results:

SELECT A.ArchiveName, COUNT(DISTINCT(S.IdTransaction)) AS ItemsToExpire, RCE.RetentionCategoryName

FROM EnterpriseVaultDirectory.dbo.Archive A,
            EnterpriseVaultDirectory.dbo.Root R,
            EnterpriseVaultDirectory.dbo.RetentionCategoryEntry RCE,
            EVVaultStore.dbo.Saveset S,
            EVVaultStore.dbo.ArchivePoint AP,
            EVVaultstore.dbo.HoldSaveset HS

WHERE R.RootIdentity = 5
  AND S.ArchivePointIdentity = AP.ArchivePointIdentity
  AND AP.ArchivePointId = R.VaultEntryId
  AND R.RootIdentity = A.RootIdentity
  AND S.RetentionCategoryIdentity = RCE.RetentionCategoryIdentity
  AND S.SavesetIdentity = HS.SavesetIdentity
  AND S.ArchivePointIdentity != HS.ArchivePointIdentity
--  AND S.IdDateTime < DateAdd(year, -1, GetDate())
--  AND RCE.RetentionCategoryName = 'Business'
--  AND A.DeleteExpiredItems = 1

GROUP BY A.ArchiveName, RCE.RetentionCategoryName
ORDER BY A.ArchiveName

Dan_L
Level 3
Partner Accredited

Thanks @ChrisLangevin. Unfortunately, after opening a case and talking about it at some length, DA Support declined to pursue this in depth on advice from management.