Machine Learning in eDiscovery Upstream: Sentiment Analysis & Classification

Veritas Perspectives June 21, 2023

In an era where the global datasphere is expected to reach 175 zettabytes by 2025, it has never been more important for organizations to understand their data as early as possible. That means not only applying traditional “right-side” EDRM phases like Analysis and Review further upstream, but also fully leveraging technology to tame “eDiscovery in the wild”.

Two machine learning tools to leverage in eDiscovery further upstream are sentiment analysis and automated (auto) classification. Let’s look at these two technical approaches and how they facilitate conducting eDiscovery further upstream to support today’s Big Data challenges.

What is Sentiment Analysis?

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing (NLP) to identify and extract subjective information from source materials. It involves determining the attitude, sentiment, or emotion of a subject based on their spoken or written content. Sentiment analysis can be applied to a wide variety of data, and it can provide an organization with a wealth of information about its customers, products, brand, and even employees.

The simplest form of sentiment analysis is binary classification (positive or negative), but it can also involve more nuanced classifications such as “neutral”, “happy” or “sad”. After the sentiment is determined, the results are interpreted to generate insights. This might involve identifying trends over time, comparing sentiment between different demographic groups and so forth.
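
To make the idea concrete, here is a minimal Python sketch of three-way sentiment labeling and a trend-over-time summary. It is an illustration only: the tiny word lists and the function names (`score`, `label`, `monthly_trend`) are assumptions made for this example, and production systems typically rely on trained NLP models rather than a hand-built lexicon.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"great", "happy", "thanks", "pleased", "excellent"}
NEGATIVE = {"angry", "unacceptable", "terrible", "complaint", "furious"}


def score(text: str) -> int:
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the raw score to positive, negative, or neutral."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def monthly_trend(messages):
    """Average score per (year, month) for an iterable of (date, text) pairs."""
    buckets = defaultdict(list)
    for sent_on, text in messages:
        buckets[(sent_on.year, sent_on.month)].append(score(text))
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}
```

Calling `label("This delay is unacceptable and terrible")` returns `"negative"`, while a message with no lexicon words is labeled `"neutral"`. The monthly averages are one simple way to surface the kind of trend described above.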

What is Auto Classification?

Automated data classification utilizes advanced algorithms and machine learning models to effortlessly organize information into predefined classes or categories without the need for manual intervention. This streamlined process uncovers patterns and extracts valuable insights from vast quantities of data rapidly and effectively. This approach commonly employs innovative technologies such as NLP and image recognition, as well as other AI and machine learning methodologies.
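
As a rough illustration of how a model can sort text into predefined categories, the sketch below implements a small multinomial naive Bayes classifier with add-one smoothing in plain Python. The technique is a stand-in chosen for brevity, not a description of any particular product, and the class name `NaiveBayes`, the category names, and the training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, lab in zip(docs, labels):
            self.word_counts[lab].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for lab, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            denom = sum(self.word_counts[lab].values()) + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    logp += math.log((self.word_counts[lab][w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = lab, logp
        return best_label


# Hypothetical training examples for two predefined categories.
model = NaiveBayes().fit(
    [
        "invoice payment due amount",
        "quarterly budget invoice total",
        "employee leave policy vacation",
        "new hire onboarding employee benefits",
    ],
    ["finance", "finance", "hr", "hr"],
)
```

With this toy training set, `model.predict("please pay the invoice amount")` returns `"finance"`. A real deployment would train on far more labeled data and usually on richer features than word counts.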

Applying Sentiment Analysis Upstream

Conducting sentiment analysis upstream not only enables organizations to conduct eDiscovery more efficiently for use cases such as litigation; done early enough, it could help avoid litigation altogether! Here are ways in which sentiment analysis can support the discovery process:

  • Prioritizing Documents: Utilizing sentiment analysis can help to prioritize documents that are more likely to contain relevant information. Documents exhibiting a notably negative sentiment may contain evidence of misconduct, making them excellent candidates for prioritization.
  • Identifying Patterns: Sentiment analysis can identify patterns in communication that may be relevant to a case. This could include escalating negative sentiment between two parties or sudden shifts in sentiment that correspond to key events.
  • Highlighting Potential Issues: Sentiment analysis proves helpful in identifying documents that need additional review within large document collections. For example, it could flag emails with a strong negative sentiment for further review.
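
The prioritization and pattern-finding ideas in the list above can be sketched in a few lines of Python. This example assumes each document already carries a sentiment score between -1.0 (most negative) and 1.0 (most positive) produced by some upstream model; the `Doc` fields, the `-0.5` threshold, and the rule "strictly decreasing over the last three messages" are illustrative assumptions rather than established standards.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    sender: str
    recipient: str
    sent_order: int   # position in chronological order
    sentiment: float  # assumed range: -1.0 (negative) to 1.0 (positive)


def review_queue(docs, threshold=-0.5):
    """Flag strongly negative documents and order them most negative first."""
    flagged = [d for d in docs if d.sentiment <= threshold]
    return sorted(flagged, key=lambda d: d.sentiment)


def escalating_pairs(docs, min_msgs=3):
    """Return pairs of parties whose last min_msgs messages grow steadily more negative."""
    threads = {}
    for d in sorted(docs, key=lambda d: d.sent_order):
        pair = frozenset((d.sender, d.recipient))  # direction does not matter
        threads.setdefault(pair, []).append(d.sentiment)
    result = []
    for pair, scores in threads.items():
        recent = scores[-min_msgs:]
        steadily_worse = all(a > b for a, b in zip(recent, recent[1:]))
        if len(recent) == min_msgs and steadily_worse:
            result.append(tuple(sorted(pair)))
    return sorted(result)
```

The first function supports prioritization and flagging for further review, while the second looks for escalating negative sentiment between two parties. Detecting sudden shifts tied to key events would need timestamps and event dates, which this sketch leaves out.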

Sentiment analysis can also help organizations proactively avoid litigation by regularly analyzing internal communications and flagging instances of negative sentiment. As a result, organizations can identify potential issues before they escalate into legal disputes. This could include workplace harassment, discrimination, or other forms of misconduct.

Not only that, but sentiment analysis can also be used to monitor compliance with internal policies or identify potential sources of customer dissatisfaction that could lead to litigation. The ability to apply sentiment analytics to a greater portion of your organization’s corpus than ever is moving this analysis further upstream in the eDiscovery life cycle, to the Identification phase and even Information Governance!


Applying Auto Classification Upstream

The creation of customizable auto classification rules can also have a significant benefit when applied upstream in eDiscovery or even during Information Governance. For example:

  • Data Privacy and Compliance: Auto classification can identify sensitive data, such as personally identifiable information (PII) or protected health information (PHI). By flagging this data, organizations can ensure it is handled properly, reducing the risk of data breaches that involve this sensitive information or non-compliance with regulatory requirements, such as those of FINRA, SEC, MiFID II, GDPR, CPRA, HIPAA, PCI, ITAR, SOX, and more. For discovery purposes, it can also lead to early identification of ESI that may need to be redacted before being produced, enabling that ESI to be routed through custom workflows.
  • Reduction of ROT Data: A significant percentage of data in organizations is Redundant, Obsolete, or Trivial (ROT) data that provides no value to the organization. Remediating ROT data before discovery begins saves considerable discovery costs, and auto-classification can help identify that ROT data earlier in the life cycle.
  • Sentiment Analysis and Language Detection: Auto classification rules can even be created to support sentiment analysis and language detection, so that custom workflows can streamline discovery for documents that meet these rules.
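
As a simplified illustration of rule-based classification for the first two use cases above, the Python sketch below tags text that matches PII-like patterns and marks files as redundant, obsolete, or trivial. Everything here is an assumption made for the example: the two regular expressions are deliberately loose, the seven-year age limit and the "zero-byte file is trivial" rule are arbitrary, and real classification policies are considerably more thorough.

```python
import hashlib
import re
from datetime import date

# Hypothetical, simplified patterns; real PII rules need validation and context.
PII_RULES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def pii_tags(text):
    """Return the names of all PII rules that match the text."""
    return sorted(name for name, pattern in PII_RULES.items() if pattern.search(text))


def rot_tags(files, today, max_age_days=365 * 7):
    """Tag each (path, content_bytes, last_modified) entry with ROT labels."""
    seen_digests = set()
    tags = {}
    for path, content, modified in files:
        labels = []
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen_digests:
            labels.append("redundant")  # exact duplicate of an earlier file
        else:
            seen_digests.add(digest)
        if (today - modified).days > max_age_days:
            labels.append("obsolete")   # older than the assumed age limit
        if len(content) == 0:
            labels.append("trivial")    # assumed rule: empty files carry no value
        tags[path] = labels
    return tags
```

Because the rules live in a plain dictionary and in function parameters, adding a new pattern or changing the age limit does not require touching the matching logic, which is one simple way to keep rules customizable as requirements change.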

Organizational needs change continually, whether to identify different types of documents or to comply with ever-changing laws and regulations, so the ability to customize auto-classification to support those needs is paramount.


Getting started with eDiscovery when the case or project begins is too late! So, beginning analysis on the data after collection is beyond late! With Big Data, continually changing regulations and other challenges, organizations must not only move eDiscovery upstream in the EDRM life cycle, but they must also leverage state-of-the-art technologies to conduct eDiscovery upstream as efficiently and cost-effectively as possible. Leveraging machine learning technologies such as sentiment analysis and auto-classification maximizes the benefit of moving eDiscovery upstream!

For more regarding how Veritas Alta™ Classification pairs with the Veritas Data Compliance & Governance Solutions Portfolio to support your storage optimization, eDiscovery, regulatory compliance and data security needs, click here.

Irfan Shuttari
Director of eDiscovery Strategy, Product Management