Duplicate items are appearing in the Review Set and in Exports from Symantec Enterprise Vault (EV) Compliance Accelerator (CA) and Discovery Accelerator (DA).
Beginning with Compliance Accelerator (CA) and Discovery Accelerator (DA) 9.0, built-in deduplication features reduce the number of duplicate items captured by CA searches or shown in CA and DA Review and Export Sets.
For Compliance Accelerator:
Items captured by Random Sampling are not deduplicated. CA searches perform deduplication at the individual search level. When you select the Review tab in the CA Client and choose a Department to review, the items displayed are determined by the facets selected in the filter pane. When you configure an export on the Department's Export tab, the export selection facets determine which items are exported.
Duplicate items can appear in the Department Review Set or be selected for export when either of the following conditions is met, regardless of whether the 'Include items already in review' option was used in the search or searches that captured them:
1) No facets are selected to show or export only the items captured by specific searches. In this case, Random Sampling items can also appear in the Review Set or Export alongside search-captured items.
2) At least two searches have been run with the same criteria against archives that contain duplicates of a message, and no Searches facet is selected to limit the Review Set or Export to the items captured by a specific search.
For Discovery Accelerator:
Items captured by DA searches are deduplicated during Review Set processing according to the Stack option: None, Similar, or Duplicate. The Stack option is available in the list view (middle) pane on the Review tab in the DA Client. During Export Set processing, deduplication is determined by the export selection facets.
For DA Review Sets:
- When the Stack option is set to 'None', all duplicate items captured in the Case are shown in the Review Set, whether they were captured by a single search or by multiple searches.
- When the Stack option is set to 'Similar', duplicate items are determined based on metadata such as, but not limited to, ConversationID, Author, Recipient(s), and Subject.
- When the Stack option is set to 'Duplicate', which is only available if Analytics has been enabled on the DA Case, duplicate items are determined based on the same metadata used by the 'Similar' option and the item content.
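The difference between the three Stack options can be sketched as grouping items under progressively stricter keys. This is a minimal illustration, not DA's implementation: the Item fields, the similar_key and duplicate_key helpers, and the use of a SHA-256 digest of the body to stand in for "item content" are all hypothetical, and the sketch ignores the requirement that Analytics be enabled for 'Duplicate'.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical item model; field names are illustrative only.
    item_id: str
    conversation_id: str
    author: str
    recipients: tuple
    subject: str
    body: str


def similar_key(item: Item) -> tuple:
    """Metadata-only key, standing in for the 'Similar' comparison."""
    return (item.conversation_id, item.author,
            tuple(sorted(item.recipients)), item.subject)


def duplicate_key(item: Item) -> tuple:
    """Metadata plus a content digest, standing in for 'Duplicate'."""
    digest = hashlib.sha256(item.body.encode("utf-8")).hexdigest()
    return similar_key(item) + (digest,)


def stack(items: list, mode: str) -> list:
    """Group items into stacks for mode 'none', 'similar' or 'duplicate'."""
    if mode == "none":
        return [[item] for item in items]  # every item is shown on its own
    if mode == "similar":
        key = similar_key
    elif mode == "duplicate":
        key = duplicate_key
    else:
        raise ValueError(f"unknown stack mode: {mode}")
    stacks = defaultdict(list)  # preserves first-seen order of keys
    for item in items:
        stacks[key(item)].append(item)
    return list(stacks.values())


a = Item("a", "conv-1", "alice", ("bob",), "Q3 report", "See attached.")
b = Item("b", "conv-1", "alice", ("bob",), "Q3 report", "See attached.")
c = Item("c", "conv-1", "alice", ("bob",), "Q3 report", "Updated numbers.")
for mode in ("none", "similar", "duplicate"):
    print(mode, [[i.item_id for i in s] for s in stack([a, b, c], mode)])
```

With these three items, 'None' yields three separate stacks, 'Similar' groups all three because their metadata matches, and 'Duplicate' yields two stacks (a with b, and c alone) because c's content differs.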
For DA Export (or Production) Sets:
Duplicates in exports are controlled by the 'Exclude duplicate items' facet check box. When it is checked, duplicate items are excluded from the export.
For both CA and DA:
When search processing finds duplicate items, it designates one of them as the 'master' item. In CA search processing, only the 'master' item is kept for the Review Set; all other duplicates are removed from the search and from the CA Customer database. A later CA search that captures the same items may designate a different 'master' item, which is how duplicate copies of the same message can become available for review and export.
DA search processing also identifies a 'master' item, even though all duplicate items are added to the DA Case. As in CA, multiple searches that capture the same duplicate items can each identify a different 'master' item. Both the selection of different 'master' items and the presence of items in archives hosted in different Enterprise Vault Vault Stores can cause duplicates to appear in Export and Review Sets regardless of the export and review facets selected.
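The CA behavior described above can be modeled in a few lines: each search keeps one 'master' per duplicate group, but two searches may pick different masters, so the union of their results contains two copies of the same message unless the Searches facet narrows the view to one search. This is a hypothetical sketch: the dedup keys are assumed to be precomputed, and the use of min and max as master-selection rules only simulates two searches choosing differently; the real selection logic is not described here.

```python
from collections import defaultdict


def run_search(captured, choose_master):
    """Hypothetical per-search dedup: captured is a list of (item_id, dedup_key)
    pairs; only one 'master' item_id per dedup_key is kept."""
    groups = defaultdict(list)
    for item_id, key in captured:
        groups[key].append(item_id)
    return {choose_master(ids): key for key, ids in groups.items()}


def review_set(results_by_search, selected_searches=None):
    """Union of kept items; selected_searches models the Searches facet
    (None means no Searches facet is selected)."""
    merged = {}
    for name, kept in results_by_search.items():
        if selected_searches is None or name in selected_searches:
            merged.update(kept)
    return merged


def duplicate_keys(items):
    """Return the dedup keys that appear under more than one item_id."""
    counts = defaultdict(int)
    for key in items.values():
        counts[key] += 1
    return {k for k, n in counts.items() if n > 1}


archive = [("copy-A", "msg-1"), ("copy-B", "msg-1"), ("other", "msg-2")]
results = {
    "Search 1": run_search(archive, min),  # happens to keep copy-A as master
    "Search 2": run_search(archive, max),  # happens to keep copy-B as master
}
print(duplicate_keys(review_set(results)))                # {'msg-1'}
print(duplicate_keys(review_set(results, {"Search 1"})))  # set()
```

Without a Searches facet, both copy-A and copy-B of msg-1 end up in the Review Set; selecting only "Search 1" leaves a single instance, which mirrors the CA workaround of reviewing or exporting one search at a time.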
According to 'White paper: Deduplication in Compliance Accelerator and Discovery Accelerator' (see DOC3621 in the Related Articles section below), both CA and DA use certain metadata to calculate a hash value that determines message uniqueness. Messages with the same metadata produce the same hash value and are considered duplicates. Messages with differing metadata produce different hash values and are considered unique instances of an item.
One common cause of duplicate items in CA and DA searches is a differing ConversationID across the 'duplicate' instances of a message or Calendar appointment. The ConversationID is normally created by the application that originates the item. Certain e-mail applications neither honor an existing ConversationID nor create one for new items. These include, but are not limited to, Outlook Web Access and some built-in cellular phone e-mail clients. Instead of preserving and extending an existing ConversationID as other e-mail applications do, they remove the ConversationID from replies to and forwards of existing messages. The receiving e-mail server then creates a new ConversationID for each instance it encounters, so otherwise identical copies of a message produce different hash values and are treated as unique items.
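The effect of a server-assigned ConversationID on the uniqueness hash can be illustrated with a minimal sketch. It assumes the hash covers the metadata fields named in this article (ConversationID, Author, Recipient(s), Subject); the actual field list and hash algorithm are defined in the white paper, so HASH_FIELDS, the SHA-256 choice, and the normalization shown here are hypothetical.

```python
import hashlib

# Hypothetical field list: the fields actually hashed are defined in the
# white paper; these are the examples named in this article.
HASH_FIELDS = ("conversation_id", "author", "recipients", "subject")


def uniqueness_hash(meta: dict) -> str:
    """Hash selected metadata; SHA-256 is an assumption, not the product's algorithm."""
    parts = []
    for field in HASH_FIELDS:
        value = meta.get(field, "")
        if isinstance(value, (list, tuple)):
            value = ";".join(sorted(value))  # in this sketch, recipient order is ignored
        parts.append(f"{field}={value}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


original = {
    "conversation_id": "AQHR-0001",
    "author": "alice@example.com",
    "recipients": ["bob@example.com", "carol@example.com"],
    "subject": "Quarterly results",
}
# Same message archived twice with identical metadata.
archived_copy = dict(original, recipients=["carol@example.com", "bob@example.com"])
# Same message after a client dropped the ConversationID and the
# receiving server assigned a new one.
server_copy = dict(original, conversation_id="AQHR-9999")

print(uniqueness_hash(original) == uniqueness_hash(archived_copy))  # True
print(uniqueness_hash(original) == uniqueness_hash(server_copy))    # False
```

Only the ConversationID differs between original and server_copy, yet the two hashes differ, so deduplication treats them as unique items.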
A workaround exists for Compliance Accelerator (CA) only:
Because CA deduplication is scoped to the individual search, use the Searches facet to select a single search, then export or review the items captured by that search, including its instance of any duplicated message. When the export or review for that search is complete, select the next search in the facets and export or review its items. Repeat these steps until all appropriate items have been exported or reviewed. Note that duplicates can still appear if the metadata used to calculate the uniqueness hash differs between copies, for example in the ConversationID of a message.
There are currently no plans to address this issue with a patch or hotfix for the current or previous versions of the software. It may be resolved in a future major release, but it is not currently scheduled for any release.
If you feel this issue has a direct business impact for you and your continued use of the product, please contact your Symantec Sales representative or the Symantec Sales group to discuss these concerns.
For information on how to contact Symantec Sales, please see http://www.symantec.com
- Enterprise Vault 9.0 or greater
- Compliance Accelerator or Discovery Accelerator 9.0 or greater