Enterprise Vault™ Classification using the Microsoft File Classification Infrastructure

Last Published: 2018-03-29

Product(s): Enterprise Vault (12.3)

Limits on the size of classification files

By default, the File Classification Infrastructure can classify files that are up to 25 MB in size. When a text file exceeds this limit, Enterprise Vault automatically splits it into files that are approximately 25 MB in size, and classification then proceeds across the set of files. To determine where to split the files, Enterprise Vault operates as follows:

If any single line in a text file causes the file to exceed the limit, Enterprise Vault places the line in a new text file. For example, the cont property line holds the content of an item and is usually the lengthiest line in the text file. In cases where this line and its predecessors exceed the limit, Enterprise Vault splits the file immediately before the line and creates a new file for the cont property.
If the contents of a single line still exceed the limit, Enterprise Vault searches back from the limit until it finds a space character, and then splits the contents there. If Enterprise Vault cannot find a space character within 300 characters of the limit, it splits the contents precisely at the limit.
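
The two splitting rules above can be sketched in Python. This is only an illustration, not Enterprise Vault's implementation: the function names split_long_line and split_text are hypothetical, sizes are counted in characters rather than bytes, line terminators are ignored, and the space at a split point is assumed to stay with the following piece.

```python
def split_long_line(line: str, limit: int, lookback: int = 300) -> list[str]:
    """Split one over-long line into pieces of at most `limit` characters."""
    pieces = []
    while len(line) > limit:
        # Search back from the limit for a space, at most `lookback` characters.
        window_start = max(limit - lookback, 0)
        cut = line.rfind(" ", window_start, limit)
        if cut <= 0:
            # No usable space in the window: split exactly at the limit.
            cut = limit
        pieces.append(line[:cut])
        line = line[cut:]
    pieces.append(line)
    return pieces


def split_text(lines: list[str], limit: int, lookback: int = 300) -> list[list[str]]:
    """Group lines into files so that no file holds more than `limit` characters."""
    files: list[list[str]] = []
    current: list[str] = []
    size = 0
    for line in lines:
        # Rule 1: a line that would push the file past the limit starts a new file.
        if current and size + len(line) > limit:
            files.append(current)
            current, size = [], 0
        # Rule 2: a line that is too long on its own is split further.
        if len(line) > limit:
            *full_pieces, line = split_long_line(line, limit, lookback)
            files.extend([piece] for piece in full_pieces)
        current.append(line)
        size += len(line)
    if current:
        files.append(current)
    return files
```

With a deliberately tiny limit, for example split_text(["ab", "cd", "efghijklmn"], limit=4, lookback=3), the first two lines share a file and the long third line is cut at the limit because it contains no spaces.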

You can change the 25-MB limit by setting a registry entry, MaxTextFilterBytes. The following article on the Microsoft website describes this registry entry:

https://msdn.microsoft.com/library/ms692103.aspx

You may want to increase the limit if you have a complex rule that fails to match items because different parts of the rule match different files in the set. For example, this issue can arise if you have a rule that searches for both of the words "fraud" and "corruption", when the first word is in one text file and the second word is in another.
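
The following Python sketch illustrates why such a rule can fail. It assumes that each file in the set is evaluated on its own. The rule_matches function and the sample text are hypothetical and stand in for the real classification rule engine.

```python
import re


def rule_matches(text: str, words: list[str]) -> bool:
    """Return True only if every word occurs somewhere in this one text."""
    return all(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in words
    )


words = ["fraud", "corruption"]

# Hypothetical content that ended up in two different files after a split.
part_1 = "The audit found evidence of fraud in the accounts."
part_2 = "A later review also uncovered corruption."

# Evaluated per file, neither part contains both words.
split_results = [rule_matches(part, words) for part in (part_1, part_2)]

# Evaluated as one unsplit text, the rule matches.
whole_result = rule_matches(part_1 + " " + part_2, words)
```

Here split_results contains only False values, while whole_result is True. Raising the limit so that the item fits in a single file has the same effect as the unsplit case.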