Description
This document provides design and deployment considerations for implementing eDiscovery Platform on VMware.
Author: Kevin Graves
Scope of Document:
This document aims to provide guidance on designing and deploying eDiscovery Platform on the VMware vSphere platform.
This document should be used in conjunction with other performance and best practice guides as outlined in the “Related Documents” section of this document.
Intended Audience:
This document is aimed at system administrators, solutions architects, and consultants.
It is assumed that the reader has a thorough understanding of the architecture and operational aspects of eDiscovery Platform.
It is also assumed that the reader has experience and understanding of VMware vSphere.
Choosing the Right Platform for Your Environment
Virtualization technology has helped many customers reduce costs through lower data center power consumption and cooling requirements. Virtualization typically also simplifies the data center landscape through server consolidation, requiring less hardware to provide the same service to end users, with the added benefit of application-independent high availability.
Application architectures are rapidly evolving towards highly distributed, loosely coupled applications. The conventional x86 computing model, in which applications are tightly coupled to physical servers, is too static and restrictive to efficiently support most modern applications. With a virtual deployment, the architecture can be as modular as is appropriate, without expanding the hardware footprint. The dynamic nature of virtual machines means that the design can grow and adapt as required, without the need for an initial “perfect” design.
Virtual deployments typically take minutes, can share currently deployed hardware, and can be adjusted “on the fly” when more resources are required. Certain server applications, however, are less suitable for virtualization, especially those requiring heavy use of physical server resources such as CPU and memory.
Traditionally, customers have been reluctant to place applications with high service level agreements, such as Microsoft Exchange Server and SQL Server, on a virtual platform. This was not only because the application’s demand on resources meant that only one or two virtual machines could co-exist on a single server, but also because the application could not achieve the same performance it would have on a physical server.
A number of factors should be considered before deploying eDiscovery Platform in a VMware environment:
- eDiscovery Platform is heavily dependent on CPU and memory resources. In a typical physical server configuration, it is not unusual for the CPU to run at 90% or higher utilization while data is being ingested, an OCR job is running, or an export is being performed.
- Generally, the more powerful the processor, the better the ingestion and retrieval rates.
- For a stand-alone eDiscovery Platform server running Collections, Legal Holds and Cluster Master (with no cases), the minimum recommended configuration is 32 CPU cores and 128GB RAM.
- If the eDiscovery Platform server will be used as a Worker Node for Pre-processing, Processing, Analysis and Review, the minimum recommended configuration is 24 CPU cores and 96GB RAM.
- CPU and memory resources should be dedicated (Reserved and Locked) to the eDiscovery Platform server, and not shared with other virtual machines on the host.
- Other system components, such as network and storage, need to be sized accordingly to prevent them from becoming a bottleneck.
If the above considerations are acceptable and supported by the customer environment, then it is likely that virtualizing the eDiscovery Platform environment will be a good fit for the organization.
Sizing eDiscovery Platform for VMware
One of the most important considerations when sizing eDiscovery Platform is a thorough understanding of the expected workload on each of the eDiscovery Platform servers. The main factor is the customer's requirements for collecting, processing, reviewing and exporting.
It is outside the scope of this document to provide a design and sizing introduction to eDiscovery Platform, but in general terms, once the customer requirements are understood, a close look at the function of each eDiscovery Platform server will help determine what minimal server resources will be required.
The most common mistake when designing eDiscovery Platform is to size for capacity, as opposed to sizing for performance. The following sections in this guide provide detail on how to design the various components for optimal configuration.
** MINIMUM REQUIREMENTS **
FUNCTION | CPU cores | RAM |
---|---|---|
Legal Hold Confirmation Server | 16 | 32GB |
Legal Hold Server | 16 | 32GB |
Collections Server | 32 | 64GB |
Collection, Legal Hold and Confirmation Server | 32 | 64GB |
Pre-Processing, Analysis, Review and Export Server | 24 | 96GB* |
All features combined eDP Server | 32 | 128GB* |

FUNCTION | CPU cores | RAM |
---|---|---|
Cluster Master with MySQL on a separate server | 32 | 128GB* |
Cluster Master with MySQL | 48 | 128GB* |
Worker Nodes | 24 | 96GB* |
Utility Nodes | 4 | 8GB |
* Indicates RAM is Reserved and Locked
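The minimums in the tables above can be captured as a small lookup for checking a planned virtual machine before it is deployed. The following Python sketch is illustrative only: the role keys, the `VmSpec` type, and the `check_minimums` function are hypothetical names chosen for this example and are not part of eDiscovery Platform or any VMware tooling.

```python
from dataclasses import dataclass
from typing import List

# Minimums transcribed from the tables above:
# (CPU cores, RAM in GB, whether RAM must be Reserved and Locked).
# The keys are hypothetical shorthand for the FUNCTION column.
MINIMUMS = {
    "legal_hold_confirmation": (16, 32, False),
    "legal_hold": (16, 32, False),
    "collections": (32, 64, False),
    "collection_legal_hold_confirmation": (32, 64, False),
    "preprocessing_analysis_review_export": (24, 96, True),
    "all_features_combined": (32, 128, True),
    "cluster_master_separate_mysql": (32, 128, True),
    "cluster_master_with_mysql": (48, 128, True),
    "worker_node": (24, 96, True),
    "utility_node": (4, 8, False),
}


@dataclass
class VmSpec:
    """Planned virtual machine (hypothetical structure for this sketch)."""
    role: str
    vcpus: int
    ram_gb: int
    ram_reserved_and_locked: bool


def check_minimums(vm: VmSpec) -> List[str]:
    """Return a list of shortfalls; an empty list means the minimums are met."""
    min_cores, min_ram_gb, needs_reservation = MINIMUMS[vm.role]
    problems = []
    if vm.vcpus < min_cores:
        problems.append(f"{vm.role}: {vm.vcpus} vCPUs is below the minimum of {min_cores}")
    if vm.ram_gb < min_ram_gb:
        problems.append(f"{vm.role}: {vm.ram_gb}GB RAM is below the minimum of {min_ram_gb}GB")
    if needs_reservation and not vm.ram_reserved_and_locked:
        problems.append(f"{vm.role}: RAM should be Reserved and Locked")
    return problems
```

For example, `check_minimums(VmSpec("worker_node", 16, 64, False))` reports three shortfalls (CPU, RAM, and the missing reservation), while a Worker Node planned with 24 vCPUs and 96GB of Reserved and Locked RAM returns an empty list.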
Special Considerations:
- Do not combine servers that use Reserved and Locked memory with servers that use non-Reserved/Locked resources.
- Ensure that the total number of vCPUs assigned to the virtual machines is equal to or less than the total number of cores on the ESX host.
- Do not enable Hyperthreading. In most cases it provides little or no benefit to multi-CPU virtual machines, and internal testing has shown that Hyperthreading provides no performance benefit.
- For all other hardware and software recommendations, follow the Veritas Installation Guide for the correct version of the product.
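The first two considerations above lend themselves to a simple placement check for a single ESX host. The sketch below is a minimal illustration, not a VMware or Veritas tool: `PlannedVm` and `check_host_plan` are hypothetical names, and it assumes that "combine" in the first consideration means placing the virtual machines on the same host.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlannedVm:
    """Virtual machine planned for a host (hypothetical structure)."""
    name: str
    vcpus: int
    memory_reserved_and_locked: bool


def check_host_plan(physical_cores: int, vms: List[PlannedVm]) -> List[str]:
    """Return warnings for a planned host layout; an empty list means no issues found."""
    warnings = []

    # Total vCPUs should be equal to or less than the physical core count.
    # Physical cores only: Hyperthreading is assumed to be disabled.
    total_vcpus = sum(vm.vcpus for vm in vms)
    if total_vcpus > physical_cores:
        warnings.append(
            f"{total_vcpus} vCPUs assigned but only {physical_cores} cores available"
        )

    # Assumption: Reserved/Locked and non-Reserved/Locked servers
    # should not share the same host.
    reservation_modes = {vm.memory_reserved_and_locked for vm in vms}
    if len(reservation_modes) > 1:
        warnings.append("host mixes Reserved/Locked and non-Reserved/Locked servers")

    return warnings
```

As a usage example, a 64-core host planned with a 32-vCPU Cluster Master and a 24-vCPU Worker Node, both with Reserved and Locked memory, produces no warnings. The same two virtual machines on a 48-core host would trigger the vCPU warning.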