Enterprise Vault™ Classification using the Veritas Information Classifier

Product(s): Enterprise Vault (12.3)

About policy conditions

A condition specifies the criteria that an item must meet for the Veritas Information Classifier to consider it a match. Your policies can contain any number of conditions.

Basic components of a condition

All conditions have this basic form:

property operator value

For example, in the following condition, "Content" is the property, "contains text" is the operator, and "Stocks" is the value:

[Figure: Example of a Veritas Information Classifier condition]

The property specifies the part or characteristic of an item that you want to evaluate: its content, title, modified date, file size, and so on. When you choose a property from the list, the options in the two other fields change to suit it. For example, if you choose the "Modified date" property, the other fields provide options with which you can set one or more dates. For properties such as "Content", "Title", and "Author", the available operators are as follows:

  • contains text

  • matches regex

  • matches pattern

  • language is

At the right of each condition, you can specify the minimum number of times that an item must meet the criteria for the Veritas Information Classifier to consider it a match.
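
The property, operator, value form and the minimum match count can be pictured with a small model. This is only an illustrative sketch: the `Condition` class, its field names, and the dictionary used to represent an item are hypothetical and are not part of the Veritas Information Classifier. It handles the "contains text" operator only, and it counts substring occurrences without regard to case.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """Hypothetical model of one condition: property, operator, value."""
    prop: str           # for example "Content", "Title", or "Author"
    operator: str       # this sketch handles "contains text" only
    value: str          # the text to look for
    min_count: int = 1  # minimum number of occurrences required

    def matches(self, item: dict) -> bool:
        if self.operator != "contains text":
            raise NotImplementedError(self.operator)
        text = item.get(self.prop, "")
        # Case-insensitive count of non-overlapping substring occurrences.
        hits = text.lower().count(self.value.lower())
        return hits >= self.min_count


item = {
    "Title": "Q3 report",
    "Content": "Stocks rose. Analysts expect stocks to fall.",
}
once = Condition("Content", "contains text", "Stocks")
thrice = Condition("Content", "contains text", "Stocks", min_count=3)
print(once.matches(item), thrice.matches(item))
```

The sample content mentions the value twice, so the condition with the default minimum count is met and the one that requires three occurrences is not.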

Custom fields

Various applications that you use in your organization may add custom property information to the items that you want to classify. For example, when Enterprise Vault processes an item, it populates a number of the item's metadata properties with information and stores this information with the archived item: the date on which Enterprise Vault archived the item, the number of attachments that it has, and so on.

If you know the name of a property that particularly interests you, you can enter it as a custom field in your policy conditions.

[Figure: Custom fields in policy conditions]

See About the Enterprise Vault properties.

Text matches

Observe the following guidelines when you set up a condition to look for specific words or phrases in the items that you submit for classification:

  • The condition can look for multiple words or phrases, if you place each one on a line of its own. An item needs to contain just one word or phrase in the list to meet the condition.

  • Select Match Case to find only exact matches for the uppercase and lowercase characters in the specified words or phrases.

  • Select String Match to find instances where the specified words or phrases are contained within other ones. For example, if you select this option, the word enter matches enters, entertainment and carpenter. If you clear the option, enter matches only enter.

    Similarly, if you select String Match, the phrase call me matches call media and recall meeting, but not surgically mend.

  • You can place the proximity operators NEAR and BEFORE between two words in the same line. For example, tax NEAR/10 reform matches instances where there are no more than ten words between tax and reform. Similarly, sales BEFORE/5 report matches instances where sales precedes report and there are no more than five words between them. The number is mandatory in both cases.

    Note:

    These proximity operators may not work as expected when evaluating formatted data, such as tables and spreadsheets. The conversion process that this data undergoes before it is classified can swap the order of the table cells. For example, suppose that a spreadsheet contains the word sales in one cell and report in the cell immediately to the right. This should match the operator sales BEFORE/5 report but may not do so after the spreadsheet has been converted, because the conversion process has transposed the two words.

  • Words and phrases can include the asterisk (*) and question mark (?) wildcard characters. As part of a word, an asterisk matches zero or more characters. On its own, the asterisk matches exactly one word. A question mark matches exactly one character. For example:

    • stock* matches stock, stocks, and stockings.

    • *ock matches stock and clock.

    • *ock* matches stock and clocks.

    • ??ock matches stock and clock, but not dock.

    • sell * stock matches sell the stock and sell some stock, but not sell stock.

    You can use wildcards in combination with the NEAR and BEFORE operators. For example:

    • s?l? BEFORE/1 stock* matches sold the stock, sell stocks, and sale of stockings.
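
The wildcard and proximity rules in the guidelines above can be approximated in a few lines of Python. This is a rough sketch and not the product's matching engine. The function names are hypothetical, matching is whole-word and case-insensitive (as if Match Case and String Match were both cleared), words are taken to be runs of letters, digits, and underscores, and NEAR is assumed to ignore word order.

```python
import re


def word_pattern(term: str) -> str:
    """Translate a wildcard word into a regex fragment.

    Inside a word, '*' becomes zero or more word characters and
    '?' becomes exactly one word character.
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def word_matches(term: str, word: str) -> bool:
    """Whole-word, case-insensitive match of a wildcard term."""
    return re.fullmatch(word_pattern(term), word, re.IGNORECASE) is not None


def phrase_matches(phrase: str, text: str) -> bool:
    """Match a phrase word by word; a standalone '*' stands for one word."""
    terms = phrase.split()
    words = re.findall(r"\w+", text)
    for i in range(len(words) - len(terms) + 1):
        if all(t == "*" or word_matches(t, w) for t, w in zip(terms, words[i:])):
            return True
    return False


def before(first: str, second: str, max_gap: int, text: str) -> bool:
    """first BEFORE/max_gap second: at most max_gap words in between."""
    words = re.findall(r"\w+", text)
    for i, word in enumerate(words):
        if not word_matches(first, word):
            continue
        # 'second' may sit at positions i+1 through i+1+max_gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(words))):
            if word_matches(second, words[j]):
                return True
    return False


def near(a: str, b: str, max_gap: int, text: str) -> bool:
    """a NEAR/max_gap b, assumed here to accept either word order."""
    return before(a, b, max_gap, text) or before(b, a, max_gap, text)


print(word_matches("??ock", "clock"), word_matches("??ock", "dock"))
print(before("s?l?", "stock*", 1, "sale of stockings"))
```

Running the examples from the list through these helpers is a quick way to check your understanding of a search term before you add it to a policy.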

Regular expression matches

A regular expression, or regex for short, is a pattern of text that consists of ordinary characters (for example, letters a through z) and special characters, called metacharacters. The pattern describes one or more strings to match when searching text. For example, the following regular expression matches the sequence of digits in all Visa card numbers:

\b4[0-9]{12}(?:[0-9]{3})?\b

Your regular expressions must conform to the Perl regular expression syntax.

See the online Help for the Veritas Information Classifier for extensive information on this syntax.

You may find it helpful to build and test your regular expressions using the free online tool at https://regex101.com. This tool displays an explanation of your regular expression as you type it, and also lists all matches between the regular expression and a test string of your choice. The default regular expression flavor, pcre (php), is compatible with the Veritas Information Classifier.
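
If you prefer to test locally, a short script can exercise the Visa expression shown above. Note the assumption: Python's `re` module is not a Perl or PCRE engine, but this particular expression uses only constructs (word boundaries, character classes, counted repetition, and a non-capturing group) that behave the same way in both. The sample strings are made up for the test.

```python
import re

# The Visa expression from this section, unchanged.
VISA = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b")

sixteen = "4" + "1" * 15    # 16 digits, starts with 4
thirteen = "4" + "2" * 12   # 13 digits, starts with 4
seventeen = "4" + "1" * 16  # 17 digits: too long to be a match
other = "5" + "5" * 15      # 16 digits, but does not start with 4

samples = {
    "sixteen": f"Card {sixteen} on file",
    "thirteen": f"Legacy card {thirteen}",
    "seventeen": f"Too long: {seventeen}",
    "other": f"Different issuer: {other}",
}

# Map each sample name to the list of matches found in it.
results = {name: VISA.findall(text) for name, text in samples.items()}
for name, matches in results.items():
    print(name, matches)
```

The 17-digit string produces no match because the trailing `\b` cannot be satisfied in the middle of a run of digits.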

Note:

Looking for regular expression matches is considerably slower than looking for matches for specific words or phrases. You can greatly improve performance and accuracy by looking for instances where both types of matches occur in proximity to each other. To do this, set up an All of condition group that contains both a regular expression condition and a contains text condition for finding specific words and phrases, and specify the required distance within which matches must occur. The Veritas Information Classifier first evaluates the contains text condition and only then looks for a regular expression match.
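
The idea behind this note, a cheap text check that gates a costlier regular expression, can be sketched as follows. This illustrates the general technique only and is not how the Veritas Information Classifier is implemented. The keyword list, the 50-character distance, and the function names are assumptions chosen for the example.

```python
import re

CARD = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b")
KEYWORDS = ("visa", "card number")  # illustrative keyword list


def find_all(haystack: str, needle: str):
    """Yield every offset at which needle occurs in haystack."""
    pos = haystack.find(needle)
    while pos != -1:
        yield pos
        pos = haystack.find(needle, pos + 1)


def keyword_then_regex(text: str, max_chars: int = 50) -> bool:
    lowered = text.lower()
    # Cheap step: plain substring search for the keywords.
    keyword_offsets = [p for kw in KEYWORDS for p in find_all(lowered, kw)]
    if not keyword_offsets:
        return False  # the regular expression is never evaluated
    # Costly step: run the regex only on items that passed the text check,
    # and require a regex match close to one of the keyword hits.
    return any(
        abs(m.start() - off) <= max_chars
        for m in CARD.finditer(text)
        for off in keyword_offsets
    )


number = "4" + "1" * 15
print(keyword_then_regex("Visa: " + number))
print(keyword_then_regex("Order ref " + number))
```

Besides saving time on items that contain no keyword, the proximity requirement also rejects digit sequences that merely look like card numbers, which is where the accuracy gain comes from.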

Pattern matches

A pattern match evaluates the selected item property against an existing Veritas Information Classifier pattern. Depending on the selected pattern, you may be able to set the confidence levels that you are willing to accept. A high confidence level is likely to produce fewer but more relevant matches.

Note the following if you do not get the expected results when you test a policy that makes use of a built-in pattern:

  • It is important to check that your test item meets the pattern confidence levels. For example, by default, the Credit Card Policy looks for content that matches the pattern "Credit/Debit Card Number" with medium to very high confidence. To meet the requirements of the medium confidence level, an item must contain either of the following:

    • A delimited credit card number (one that contains spaces or dashes between the numbers).

    • Both a non-delimited credit card number and one or more credit card keywords, such as "AMEX" or "Visa".

    So, an item does not meet these requirements if it contains a non-delimited credit card number but it does not also contain credit card keywords.

  • After you click Show details to view the results of a test, the Test classification results window may fail to highlight some or all of the matches. This is a known issue with certain patterns only. A future version of the Veritas Information Classifier will correct the issue.
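
The medium confidence requirements described in the first point above amount to a simple rule: a delimited number on its own, or a non-delimited number plus a keyword. The sketch below encodes that rule so that you can reason about why a test item did or did not match. The expressions are simplified assumptions (16-digit numbers in a 4-4-4-4 layout only) and the keyword list contains just the two keywords named above. The built-in pattern's actual definitions are not reproduced here.

```python
import re

# Assumed, simplified shapes for illustration only.
DELIMITED = re.compile(r"\b\d{4}([ -])\d{4}\1\d{4}\1\d{4}\b")
NON_DELIMITED = re.compile(r"\b\d{16}\b")
KEYWORDS = ("amex", "visa")


def meets_medium_confidence(text: str) -> bool:
    """Delimited number alone, or non-delimited number plus a keyword."""
    if DELIMITED.search(text):
        return True
    has_keyword = any(keyword in text.lower() for keyword in KEYWORDS)
    return bool(NON_DELIMITED.search(text)) and has_keyword


print(meets_medium_confidence("4111 1111 1111 1111"))
print(meets_medium_confidence("4111111111111111"))
print(meets_medium_confidence("Visa 4111111111111111"))
```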

Language matches

You can set up a condition to restrict policy matching to items in a particular language. For example, set up a condition like the one below to find items whose content is primarily in French:

[Figure: Language search in Veritas Information Classifier]

One of the options in the language list is Multiple languages detected. This option matches items that contain at least two languages.

To safeguard against the Veritas Information Classifier ignoring items because it cannot determine their primary language, select Or Primary Language Unknown. The most common reason why the Veritas Information Classifier may be unable to determine an item's primary language is that the item has a very small amount of content.

Condition groups

You can group a set of conditions and nest grouped conditions within other grouped conditions. The group operator that you choose determines whether an item must meet all, some, or none of the conditions in the group to be considered a match. The following group operators are available:

  • All of. An item must meet all the specified conditions.

  • Any of. An item must meet at least one of the specified conditions.

  • None of. An item must not meet any of the specified conditions.

    Note:

    You can nest a None of group within an All of group to look for certain condition matches while also excluding others. For example, to achieve the effect of "(condition X AND condition Y) BUT NOT condition Z", you would include the X and Y conditions in an All of group and the Z condition in a nested None of group.

  • n or more of. An item must meet at least the specified number of conditions.
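
The way that group operators combine, including the nested None of arrangement described in the note, can be modeled with a small recursive evaluator. This is an illustrative sketch only. The `Group` class, the operator strings, and the `contains` helper are hypothetical names, and each condition is reduced to a function that takes the item text and returns true or false.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Predicate = Callable[[str], bool]


@dataclass
class Group:
    """Hypothetical condition group; members may be conditions or groups."""
    operator: str  # "all", "any", "none", or "n_or_more"
    members: List[Union["Group", Predicate]]
    n: int = 0     # used only by "n_or_more"

    def matches(self, text: str) -> bool:
        # Count how many members (conditions or nested groups) are met.
        hits = sum(
            bool(m.matches(text) if isinstance(m, Group) else m(text))
            for m in self.members
        )
        if self.operator == "all":
            return hits == len(self.members)
        if self.operator == "any":
            return hits >= 1
        if self.operator == "none":
            return hits == 0
        if self.operator == "n_or_more":
            return hits >= self.n
        raise ValueError(f"unknown operator: {self.operator}")


def contains(phrase: str) -> Predicate:
    """Build a case-insensitive 'contains text' condition."""
    return lambda text: phrase in text.lower()


# (X AND Y) BUT NOT Z, built as a None of group nested in an All of group.
rule = Group("all", [
    contains("merger"),
    contains("confidential"),
    Group("none", [contains("press release")]),
])

print(rule.matches("Confidential: merger terms"))
print(rule.matches("Confidential merger press release draft"))
```

The first sample meets both conditions and avoids the excluded phrase, so it matches. The second contains the excluded phrase, so the nested None of group fails and the whole rule fails with it.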

For an All of group only, you can choose to look for instances where the conditions occur within a specified number of characters of each other. For example, the following condition group looks for instances where the word Goodbye appears within 20 characters of the word Hello:

[Figure: Proximity search in Veritas Information Classifier]

The text string "You say Goodbye and I say Hello" matches these conditions because there are fewer than 20 characters between the first character of Hello and the first character of Goodbye. Similarly, the string "You say Hello and I say Goodbye" also matches because there are fewer than 20 characters between the ends of the two words. In each case, the spaces count as characters.
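
You can check the character counts in these two examples with a few lines of Python. The helper below is hypothetical. It reports the distance between the starts of the two words and the distance between their ends, without claiming which of the two measurements the product applies in a given case.

```python
from typing import Tuple


def gaps(text: str, a: str, b: str) -> Tuple[int, int]:
    """Return (distance between starts, distance between ends).

    Only the first occurrence of each word is considered.
    """
    a_start, b_start = text.index(a), text.index(b)
    a_end, b_end = a_start + len(a), b_start + len(b)
    return abs(a_start - b_start), abs(a_end - b_end)


first = "You say Goodbye and I say Hello"
second = "You say Hello and I say Goodbye"

print(gaps(first, "Goodbye", "Hello"))   # (18, 16)
print(gaps(second, "Goodbye", "Hello"))  # (16, 18)
```

In both strings, every measured distance is under 20 characters, which agrees with the statement that both strings match.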

Note:

When you conduct "within nn characters" proximity searches, take care not to duplicate the same search terms across multiple conditions. For example, suppose that you define one condition to look for the names Fred, Sue, and Bob, and a second to look for Joe, Bob, and Sarah. An item that contains a single instance of Bob would match these conditions, because that one instance meets both of them.

Rather than choose the "from the first condition" option, you can choose "in a sliding window". This option looks for instances where all the conditions occur within a single run of characters of the specified length. For example, a condition group that looks for instances where the word Goodbye appears within a 20-character sliding window of the word Hello does not match "You say Goodbye and I say Hello". There are 23 characters between the start of the word Goodbye and the end of the word Hello, so the two words do not fit inside a 20-character window.

[Figure: Sliding window example]
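
The sliding window count can be reproduced as follows. This sketch assumes that a window matches when the whole of both words fits inside it, and it looks only at the first occurrence of each word. The function names are hypothetical.

```python
def window_span(text: str, a: str, b: str) -> int:
    """Number of characters from the start of the earlier word
    to the end of the later word (first occurrences only)."""
    a_start, b_start = text.index(a), text.index(b)
    start = min(a_start, b_start)
    end = max(a_start + len(a), b_start + len(b))
    return end - start


def in_sliding_window(text: str, a: str, b: str, size: int) -> bool:
    """True if both words fit inside a window of the given size."""
    return window_span(text, a, b) <= size


text = "You say Goodbye and I say Hello"
print(window_span(text, "Goodbye", "Hello"))            # 23
print(in_sliding_window(text, "Goodbye", "Hello", 20))  # False
```

Under these assumptions, the window would need to be at least 23 characters wide for this string to match.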