Deduplication of items found in Enterprise Vault (EV) Compliance Accelerator (CA) or Discovery Accelerator (DA) review sets for export / Production.

Article: 100021703
Last Published: 2020-08-05
Product(s): Enterprise Vault

Problem

Deduplication of items found in Enterprise Vault (EV) Compliance Accelerator (CA) or Discovery Accelerator (DA) review sets for export / Production.

Cause

When Enterprise Vault (EV) archives messages through journal archiving or through user or shared mailbox archiving, each unique message receives a unique TransactionID assigned by the archiving process. Multiple copies of the same message can each be treated as unique under certain conditions. Those conditions include, but are not limited to:

  1. Multiple Exchange mailbox servers archive to one journal mailbox, and the message was sent to recipients on multiple Exchange servers. Each Exchange server 'tags' the message as being from an Exchange server other than the server hosting the journal mailbox, thereby making each copy unique.
  2. Multiple Exchange mailbox servers each have their own journal mailbox, and the message was sent to recipients on multiple Exchange servers. Each Exchange server places its own unique identifier on the message, making each copy unique to that server.
  3. An archiving problem causes messages to be archived multiple times. For example, if a message is set to 'Pending Archive' status and is actually archived, but the 'Pending Archive' status is reset before post-processing, the message will be archived again as a unique message.
  4. User mailbox archiving of the same message sent to multiple recipients. Each user mailbox is considered a unique identity under which the message is archived.
  5. Each journal or user mailbox archive in which a copy of a message is stored is in a different Vault Store.

Within EV, deduplication of messages occurs when those messages are stored. Single instance storage (SIS) is part of the process EV uses to store only one instance of the archived copy of a message, in Digital Vault Store (DVS) or Digital Vault File (DVF) format. When the same message is archived multiple times, for example from different user mailboxes or different journal mailboxes, the recipient information is added as a sharer to the DVS file or to its record in the database. Only one copy of the DVS file exists, but multiple instances of it can be returned in a search.
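
To illustrate why one stored copy can still produce several search hits, the following is a minimal Python sketch of a toy single-instance store. It is a conceptual model only, not a description of how EV implements SIS, DVS files or its databases: the class `ToySisStore`, keying content by a SHA-256 hash, the archive name standing in for the sharer (recipient) information, and the simple counter used as a transaction id are all hypothetical. It also assumes the copies being archived are byte-identical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToySisStore:
    """Hypothetical single-instance store; a model of the idea, not of EV internals."""

    # content hash -> the one stored instance of that content
    blobs: dict = field(default_factory=dict)
    # content hash -> archives recorded as sharers of that instance
    sharers: dict = field(default_factory=dict)
    # hypothetical transaction id -> (archive, content hash); one entry per archived copy
    index: dict = field(default_factory=dict)
    next_id: int = 1

    def archive(self, archive_name, content):
        """Archive one copy: store the content only once, but always record a new entry."""
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)                    # stored once
        self.sharers.setdefault(digest, []).append(archive_name)  # one sharer per copy
        txn_id = self.next_id
        self.next_id += 1
        self.index[txn_id] = (archive_name, digest)
        return txn_id

    def search(self, needle):
        """Return one hit per archived copy (index entry) whose content contains needle."""
        return [
            (txn_id, archive_name)
            for txn_id, (archive_name, digest) in self.index.items()
            if needle in self.blobs[digest]
        ]


store = ToySisStore()
message = b"Subject: Q3 forecast\r\n\r\nNumbers attached.\r\n"
for mailbox in ("alice", "bob", "carol"):
    store.archive(mailbox, message)

print(len(store.blobs))              # → 1
print(store.search(b"Q3 forecast"))  # → [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```

The storage layer holds a single instance with three sharers, while the search layer sees three archived copies, which is the situation CA and DA searches encounter.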

When Compliance Accelerator (CA) or Discovery Accelerator (DA) runs a search, the search returns hits based on the search criteria. If only journal archives are searched, duplicates of a message are returned whenever EV has assigned multiple unique TransactionIDs to that message. If user archives are searched, each recipient's copy of the message (and possibly the sender's copy) is returned in the search hits. If both journal and user archives are searched, a copy is returned for each archived recipient and for each unique copy in the journal archive.

CA and DA can export or Produce the search results that have been accepted into the review sets for CA Departments or DA Cases. CA and DA export all items within a Department or Case, within a specific Search, or matching the other criteria of the export or Production run. If duplicates of messages exist in the Department / Case / Search, they will be exported or Produced. There are options to remove or reduce the duplicates, either before or after export / Production.

Note that the CA Journal Connector (JC) provides deduplicated messages captured through Random Sampling (Random Capture) processing on a per-Journal Connector basis. For example, if a message is sent to multiple recipients on multiple Exchange servers, and those Exchange servers are all serviced by the same Journal Task, the JC could provide one instance of the message for Random Sampling. If, however, a message is sent to multiple recipients on multiple Exchange servers, and those Exchange servers are serviced by different EV journaling servers, each with its own JC, then each JC could provide an instance of the message for Random Sampling.
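
The per-connector behavior can be pictured with the following minimal Python sketch. It is a conceptual model only: the function `sample_candidates`, the connector names, and the use of the Message-ID as the duplicate key are hypothetical, and the random selection itself is not modeled. Each connector removes duplicates only within its own feed, so a message seen by two connectors yields two candidates.

```python
def sample_candidates(connector_feeds):
    """Hypothetical model: deduplicate within each Journal Connector, never across them.

    connector_feeds maps a connector name to the Message-IDs it receives, in order.
    Returns (connector, message_id) pairs eligible for random sampling.
    """
    candidates = []
    for connector, message_ids in connector_feeds.items():
        seen = set()  # reset per connector: no cross-connector deduplication
        for message_id in message_ids:
            if message_id not in seen:
                seen.add(message_id)
                candidates.append((connector, message_id))
    return candidates


# One Journal Task received the message twice (two Exchange servers): one candidate.
print(sample_candidates({"JC-1": ["<m1@example.com>", "<m1@example.com>"]}))
# → [('JC-1', '<m1@example.com>')]

# Two journaling servers, each with its own JC: one candidate per JC.
print(sample_candidates({"JC-1": ["<m1@example.com>"], "JC-2": ["<m1@example.com>"]}))
# → [('JC-1', '<m1@example.com>'), ('JC-2', '<m1@example.com>')]
```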


Solution

For all of the scenarios presented above, two possible solutions currently exist for obtaining only one instance of items that are unique enough that the deduplication processing within CA searches or DA review sets cannot identify them as duplicates.

1) Apply a specific Mark (DA only), comment (CA or DA), or review status (CA or DA) to only one instance of each set of duplicate messages, and apply the same Mark, comment, or review status to all other items that you want to export or Produce. Then export or Produce only the items that carry that Mark, comment, or review status. This method de-duplicates items from within CA or DA.
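
The selection step in option 1 amounts to grouping duplicates and keeping one item per group. The following minimal Python sketch shows that logic under stated assumptions: `ReviewItem`, `dedup_key` and `items_to_mark` are hypothetical names, the item fields do not reflect any CA or DA schema, and it assumes that copies of one message share the same Internet Message-ID header, falling back to sender, subject and sent time when that header is empty. It does not interact with CA or DA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewItem:
    """Hypothetical review-set item; these fields are illustrative, not a CA/DA schema."""
    item_id: int
    archive: str
    message_id: str  # Internet Message-ID header, assumed identical across copies
    sender: str
    subject: str
    sent: str        # sent date/time as text


def dedup_key(item):
    """Prefer the Message-ID; fall back to sender, subject and sent time if it is empty."""
    if item.message_id.strip():
        return ("mid", item.message_id.strip())
    return ("meta", item.sender, item.subject.strip(), item.sent)


def items_to_mark(items):
    """Pick one item per duplicate group (the lowest item_id) to receive the Mark."""
    chosen = {}
    for item in sorted(items, key=lambda i: i.item_id):
        chosen.setdefault(dedup_key(item), item.item_id)
    return sorted(chosen.values())


items = [
    ReviewItem(101, "Journal-EX1", "<abc@example.com>", "ann@example.com", "Q3", "2020-08-05 10:15"),
    ReviewItem(102, "Journal-EX2", "<abc@example.com>", "ann@example.com", "Q3", "2020-08-05 10:15"),
    ReviewItem(103, "bob", "<abc@example.com>", "ann@example.com", "Q3", "2020-08-05 10:15"),
    ReviewItem(104, "bob", "<xyz@example.com>", "ann@example.com", "Lunch", "2020-08-05 11:00"),
]
print(items_to_mark(items))  # → [101, 104]
```

The Mark, comment, or review status is still applied in the CA or DA client; the sketch only shows one way to decide which instance of each duplicate set to keep.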

2) Use third-party utilities that scan PST files and remove duplicate copies of messages. These utilities can be used on PST files before the files are ingested into EV and searched in CA or DA, or on the PST files that CA or DA create during an export or Production run. Veritas does not maintain a list of such utilities.
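
The core check such a utility performs can be sketched in Python with the standard library `email` package. This is a minimal sketch, not a description of any particular utility: it works on raw RFC 5322 message bytes rather than PST files (reading PST files would need a separate library), and it assumes that copies of one sent message share the same Message-ID header even when servers add different headers. The header `X-Example-Origin-Server` in the demo is a hypothetical stand-in for server-specific stamping. It shows why copies that differ byte-for-byte can still be collapsed by key.

```python
import hashlib
from email import policy
from email.parser import BytesParser


def dedup_key(raw):
    """Key a raw message by its Message-ID header.

    Assumes copies of one sent message share the Message-ID even when servers add
    different headers. Without a Message-ID, fall back to a hash of the raw bytes,
    which only matches byte-identical copies.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)
    message_id = msg.get("Message-ID")
    if message_id:
        return "mid:" + str(message_id).strip()
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def deduplicate(messages):
    """Keep the first copy of each message, preserving the original order."""
    seen = set()
    unique = []
    for raw in messages:
        key = dedup_key(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique


copy_a = (
    b"Message-ID: <20200805.1234@example.com>\r\n"
    b"X-Example-Origin-Server: EX1\r\n"  # hypothetical server-specific stamp
    b"Subject: Q3 forecast\r\n\r\n"
    b"Numbers attached.\r\n"
)
copy_b = copy_a.replace(b"EX1", b"EX2")  # same message, different server stamp
other = b"Message-ID: <20200805.9999@example.com>\r\nSubject: Lunch\r\n\r\nNoon?\r\n"

# A byte-level hash sees three distinct messages ...
print(len({hashlib.sha256(m).hexdigest() for m in (copy_a, copy_b, other)}))  # → 3
# ... while keying on Message-ID keeps two.
print(len(deduplicate([copy_a, copy_b, other])))  # → 2
```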
