
Arctera™ Insight Classification Help

Product(s): Arctera Insight Classification (Version Not Specified)

About policy conditions

A condition specifies the criteria that an item must meet for the Arctera Insight Classification to consider it a match. Your policies can contain any number of conditions.

Basic components of a condition

All conditions have this basic form:

property operator value

For example, in the following condition, "Content" is the property, "contains text" is the operator, and "Stocks" is the value:

The property specifies the part or characteristic of an item that you want to evaluate: its content, title, modified date, file size, and so on. When you choose a property from the list, the options in the two other fields change to suit it. For example, if you choose the "Modified date" property, the other fields provide options with which you can set one or more dates. For properties such as "Content" the available operators are as follows:

  • contains text

  • matches regex

  • department based policy conditions

  • matches pattern

  • is similar to

  • contains exact data match in

  • language is

  • contains entity

  • sentiment score

At the right of each condition, you can specify the minimum number of times that an item must meet the criteria for the Arctera Insight Classification to consider it a match.

Custom fields

Various applications that you use in your organization may add custom property information to the items that you want to classify. For example, when Enterprise Vault processes an item, it populates a number of the item's metadata properties with information and stores this information with the archived item: the date on which Enterprise Vault archived the item, the number of attachments that it has, and so on.

If you know the name of a property that particularly interests you, you can enter it as a custom field in your policy conditions.

Create new property by using custom property fields

While creating a policy, if a required property is not available in the property list, you can create a new property by using custom property fields.

To create a new property, use custom property fields while creating or editing a policy as follows:

  1. Set the other fields as described in the topic Creating policies.
  2. Under the Conditions section, from the Property drop-down list, select the required custom property field: Custom date field, Custom number field, or Custom string field.
  3. Specify the name for the new custom property.

    Note:

    The custom property name must be the same as the metadata property name identified by the text extraction engine, for example Apache Tika. In the case of Arctera Enterprise Vault, the custom property name must match one of the indexing properties.

  4. Complete the rest of the steps to create a policy.

    The new policy is created with a new custom property.

Note:

You can add up to 10 group condition levels while creating a new policy.

Create a new property by using YAML configuration file

Use the Arctera Insight Classification's YAML file to add a custom property under the property list on the UI.

The metadataDefinitions section of the YAML file lists all the existing properties in the property list as follows:

The following table shows the data structure for an existing property:

| Property Item | Description |
| --- | --- |
| name | Specifies the metadata property recognized by the text extractor engine like Apache TIKA. In case of Arctera Enterprise Vault, specify the indexing properties captured. |
| displayName | Name of the property as displayed in the property list on the UI, for example "Title". |
| type | Associated property type, for example String, Datetime, or Number. |
| aliases | Specifies the additional metadata properties to be mapped to displayName. |

To make this property available in the UI on the policy condition page

  1. Add the new property details, as shown in the previous table, to the metadataDefinitions section in the YAML file.
  2. Restart the Arctera Insight Classification service of the respective application.
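As a rough illustration of how the four fields relate, the following Python sketch represents one definition as it might look once the YAML file has been parsed, and resolves an extracted metadata key to its display name. The exact YAML layout is not shown here, so the sample entry, its metadata keys, and the helper display_name_for are hypothetical; only the field names name, displayName, type, and aliases come from the table above.

```python
# Hypothetical parsed entry; the key values are illustrative only.
DEFINITIONS = [
    {
        "name": "dc:title",                # metadata property from the text extractor
        "displayName": "Title",            # label shown in the property list
        "type": "String",                  # String, Datetime, or Number
        "aliases": ["title", "pdf:docinfo:title"],  # extra keys mapped to displayName
    },
]

def display_name_for(metadata_key, definitions):
    """Return the displayName whose name or aliases contain metadata_key."""
    for definition in definitions:
        if metadata_key == definition["name"]:
            return definition["displayName"]
        if metadata_key in definition.get("aliases", []):
            return definition["displayName"]
    return None  # the key is not mapped to any listed property

print(display_name_for("title", DEFINITIONS))
```

The sketch treats key comparison as case-sensitive, which is an assumption.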

Note:

To update tenant-level settings, contact the Arctera ops team.

Text matches

Observe the following guidelines when you set up a condition to look for specific words or phrases in the items that you submit for classification:

  • The condition can look for multiple words or phrases, if you place each one on a line of its own. An item needs to contain just one word or phrase in the list to meet the condition.

  • Select Match Case to find only exact matches for the uppercase and lowercase characters in the specified words or phrases.

  • Select String Match to find instances where the specified words or phrases are contained within other ones. For example, if you select this option, the word enter matches enters, entertainment and carpenter. If you clear the option, enter matches only enter.

    Similarly, if you select String Match, the phrase call me matches call media and recall meeting, but not surgically mend.

    Leading and trailing spaces are now considered part of string match conditions in both policies and patterns. If there is a space before the first character or after the last character of the policy condition, it is treated as part of the evaluation criteria.

    For example, suppose that you want to match the term "so sorry". Good hits are "we are so sorry about this" or "I am so sorry this happened to you". Without leading and trailing spaces, however, the term can produce a false positive such as "alfonso sorry about that - no biggie!". Adding a space character before and after the term avoids such false positives, so that only "so sorry" is classified.

  • You can place the proximity operators NEAR and BEFORE between two words in the same line. For example, tax NEAR/10 reform matches instances where there are no more than ten words between tax and reform. sales BEFORE/5 report matches instances where sales precedes report and there are no more than five words between them. The number is mandatory in both cases.

    Note:

    These proximity operators may not work as expected when evaluating formatted data, such as tables and spreadsheets. The conversion process that this data undergoes before it is classified can swap the order of the table cells. For example, suppose that a spreadsheet contains the word sales in one cell and report in the cell immediately to the right. This should match the operator sales BEFORE/5 report but may not do so after the spreadsheet has been converted, because the conversion process has transposed the two words.

  • Words and phrases can include the asterisk (*) and question mark (?) wildcard characters. As part of a word, an asterisk matches zero or more characters. On its own, the asterisk matches exactly one word. A question mark matches exactly one character. For example:

    • stock* matches stock, stocks, and stockings.

    • *ock matches stock and clock.

    • *ock* matches stock and clocks.

    • ??ock matches stock and clock, but not dock.

    • sell * stock matches sell the stock and sell some stock, but not sell stock.

    You can use wildcards in combination with the NEAR and BEFORE operators. For example:

    • s?l? BEFORE/1 stock* matches sold the stock, sell stocks, and sale of stockings.

  • Select Exclude Match if you want to exclude specific words or phrases while evaluating the policy condition criteria.

    When you select this option, along with the inclusion terms, you can also define the terms that you want to exclude from the matching criteria.

    For example, assume that a document contains the sample text "Admin: There is a spoofing activity detected. Bob: Can you help me locate a spoofed account for spoofed email account?". You want to hit on the terms "spoof, spoofed, spoofing" only, and you want to exclude the terms "spoofed email account, an email spoof". In this scenario, provide the keywords "spoof, spoofed, spoofing" in the inclusion terms field and the terms "spoofed email account, an email spoof" in the exclusion terms field.
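The String Match, wildcard, and proximity guidelines in this list can be approximated with ordinary regular expressions. The following Python sketch is only an illustration, not the product's matching engine: the function names are hypothetical, NEAR is assumed to be order-independent, and the handling of leading and trailing spaces is not modeled.

```python
import re

def token_pattern(token):
    """Translate one word of a term into a regex fragment."""
    if token == "*":
        return r"\w+"  # a standalone asterisk stands for exactly one word
    return "".join(
        r"\w*" if ch == "*" else r"\w" if ch == "?" else re.escape(ch)
        for ch in token
    )

def term_to_regex(term, string_match=False, match_case=False):
    """Compile a word or phrase; whole-word matching unless string_match is set."""
    body = r"\s+".join(token_pattern(tok) for tok in term.split())
    if not string_match:
        body = r"\b" + body + r"\b"
    return re.compile(body, 0 if match_case else re.IGNORECASE)

def proximity_match(condition, text):
    """Evaluate '<word> NEAR/<n> <word>' or '<word> BEFORE/<n> <word>'."""
    parsed = re.fullmatch(r"(\S+) (NEAR|BEFORE)/(\d+) (\S+)", condition)
    if parsed is None:
        raise ValueError("the operator and its number are both required")
    left = re.compile(token_pattern(parsed[1]), re.IGNORECASE)
    right = re.compile(token_pattern(parsed[4]), re.IGNORECASE)
    operator, max_gap = parsed[2], int(parsed[3])
    words = re.findall(r"\w+", text)
    for i, first in enumerate(words):
        if not left.fullmatch(first):
            continue
        for j, second in enumerate(words):
            if i == j or not right.fullmatch(second):
                continue
            gap = abs(j - i) - 1  # number of words between the two terms
            if gap <= max_gap and (operator == "NEAR" or i < j):
                return True
    return False

print(bool(term_to_regex("enter").search("the carpenter")))
print(bool(term_to_regex("enter", string_match=True).search("the carpenter")))
print(proximity_match("s?l? BEFORE/1 stock*", "sale of stockings"))
```

Translating each term into a regular expression keeps the wildcard rules and the whole-word rule in one place, so the proximity check can reuse the same translation for each of its two words.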

    Keyword explorer

    If the list of keywords is long, you can manage it by clicking Expand at the bottom of the list.

    A pop-up with the list of all keywords appears on the screen.

    You can search and sort this list by using the search box or clicking Sort. You can also export the list in CSV format by clicking Export.

    If you want to add or import a list of keywords, click Edit.

    • For built-in policy, click Edit to

      • enable/disable status

      • change tags, risk weight, confidence levels

      • adjust minimum match counts

    • For custom policy, click Edit to

      • enable/disable status

      • change policy name and description, tag, risk, and weight

      • update policy conditions

    • Click Copy to create a fully editable version of the existing built-in or custom policy.

    In the right pane, click Edit Conditions.

    Click Expand in the Conditions section.

    On the Keyword Condition pop-up, you can perform the following actions:

    • Add a new keyword.

    • Search for a keyword in the existing list.

    • Import keywords in CSV format.

    • Export the keyword list in CSV format.

    Note:

    A keyword condition can contain a maximum of 11,000 keywords. For example, if the condition already has 2,000 keywords, you can add only 9,000 more.

    Using Special Standalone Characters in Keyword-Based Policies

    When creating keyword-based conditions in policies, it is important to understand how the application handles special standalone characters like @, #, or $.

    • Special standalone characters are not allowed by default in keyword conditions.

    • If such characters are entered, the system will display a warning message.

    • The policy can still be saved, but any standalone special characters will be automatically removed in the back-end, unless the String Match option is explicitly selected.

    To Retain Special Characters

    If you want to retain standalone special characters in your keyword conditions, you must select the String Match option. This setting ensures that the characters are preserved exactly as entered and bypasses the automatic cleanup process.

    Scenarios and Behavior

    | Scenarios | Behavior |
    | --- | --- |
    | Policy with keyword condition containing special characters | Warning message is shown. Special characters will be removed unless String Match is selected. |
    | Policy with keyword condition and String Match selected | Warning message is not shown. Special characters are retained. |

    Limitations of the exclusion policy condition

    • This field allows only keyword-based exclusion. The input field for exclusion terms accepts only keywords, not regular expressions, patterns, and so on.

    • Keyword-based exclusion works only for the scenario where every inclusion term is completely contained within an exclusion term.

    • The group-level condition proximity option will not be available for a group if any of the underlying conditions has exclusions.

    To use the exclusion policy condition, see Using a keywords-based exclusion policy condition.
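The Exclude Match behavior described in this section can be pictured as follows: an inclusion hit is dropped when it lies completely inside an occurrence of an exclusion term. The Python sketch below is a simplified illustration under that assumption; find_hits is a hypothetical helper, it matches whole words only, and it ignores case.

```python
import re

def find_hits(text, include_terms, exclude_terms):
    """Return inclusion hits that are not contained in any exclusion hit."""
    lowered = text.lower()
    # Character spans covered by exclusion terms.
    excluded_spans = [
        match.span()
        for term in exclude_terms
        for match in re.finditer(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    ]
    hits = []
    for term in include_terms:
        for match in re.finditer(r"\b" + re.escape(term.lower()) + r"\b", lowered):
            inside = any(
                start <= match.start() and match.end() <= end
                for start, end in excluded_spans
            )
            if not inside:
                hits.append((match.start(), match.group()))
    return sorted(hits)  # ordered by position in the text

text = ("Admin: There is a spoofing activity detected. "
        "Bob: Can you help me locate a spoofed account for spoofed email account?")
hits = find_hits(
    text,
    include_terms=["spoof", "spoofed", "spoofing"],
    exclude_terms=["spoofed email account", "an email spoof"],
)
print([word for _, word in hits])
```

With the sample text from this section, the second occurrence of spoofed falls inside spoofed email account and is therefore left out, while spoofing and the first spoofed remain as hits.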

Emoji Support

Classification of emoji-based policy conditions is supported. You can create a policy with a condition that contains emojis and classify content against it.

Variable support

Variable Support allows you to insert dynamic values into text conditions across policies and patterns using simple placeholders such as {{variable_name}}. Instead of repeatedly typing the same keywords, you can define them once and reuse them anywhere in your text conditions. A variable is a type of pattern that must be created first and then used in text conditions of policies or patterns. Predefined variables can be inserted directly in the text-based conditions by typing {{ in the text box.

See Creating or editing patterns.

Limitations
  • A variable can have up to 100 newline-separated values.

  • In a text-type condition, a single phrase can include up to two variables, including duplicates.

  • In a policy's text-based condition, if variables are used, the exclude match feature cannot be applied.

  • Variables are not permitted in the exclude match text box.

  • If a variable is misspelled, not defined, or not used with the correct syntax (for example, {{variable_name}}), it is treated as plain text during classification and matched accordingly.
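To make the placeholder behavior concrete, the following Python sketch expands a phrase that contains {{variable_name}} placeholders into the plain phrases it stands for. It is an illustration only: expand_phrase and the sample variables are hypothetical, variable names are assumed to consist of letters, digits, and underscores, and a variable that appears twice in a phrase is assumed to take the same value in both places.

```python
import itertools
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def expand_phrase(phrase, variables):
    """Expand defined {{name}} placeholders; leave unknown ones as plain text."""
    used = [name for name in PLACEHOLDER.findall(phrase) if name in variables]
    if len(used) > 2:  # duplicates count toward the limit
        raise ValueError("a phrase can include up to two variables")
    distinct = list(dict.fromkeys(used))  # keep order, drop duplicates
    expanded = []
    for combo in itertools.product(*(variables[name] for name in distinct)):
        values = dict(zip(distinct, combo))
        expanded.append(
            PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), phrase)
        )
    return expanded

# Hypothetical variable with newline-separated values already split into a list.
variables = {"fruit": ["apple", "pear"]}
print(expand_phrase("buy {{fruit}} now", variables))
print(expand_phrase("buy {{veg}} now", variables))
```

An undefined name such as {{veg}} is returned unchanged, which mirrors the limitation that such text is treated as plain text during classification.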

Regular expression matches

A regular expression, or regex for short, is a pattern of text that consists of ordinary characters (for example, letters a through z) and special characters, called metacharacters. The pattern describes one or more strings to match when searching text. For example, the following regular expression matches the sequence of digits in all Visa card numbers:

\b4[0-9]{12}(?:[0-9]{3})?\b

Your regular expressions must conform to the Perl regular expression syntax.

You may find it helpful to build and test your regular expressions using the free online tool at https://regex101.com. This tool displays an explanation of your regular expression as you type it, and also lists all matches between the regular expression and a test string of your choice. The default regular expression flavor, pcre (php), is compatible with the Arctera Insight Classification.
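If you prefer to test an expression locally, a short script can serve the same purpose. The following Python sketch runs the Visa expression shown above against a few made-up strings. Python's re module is not a Perl engine, but this particular expression uses only constructs that behave the same way in both; the sample strings are invented for illustration.

```python
import re

# The Visa expression from this section: 13 digits, or 16 digits, starting with 4.
VISA = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b")

samples = [
    "Card 4111111111111111 on file",   # 16 digits starting with 4
    "Old card 4222222222222 expired",  # 13 digits starting with 4
    "Number 41111111111111 is wrong",  # 14 digits: neither 13 nor 16
    "Card 5111111111111111 on file",   # does not start with 4
]

for sample in samples:
    print(sample, "->", VISA.findall(sample))
```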

Note:

Looking for regular expression matches is considerably slower than looking for matches for specific words or phrases. You can greatly improve performance and accuracy by looking for instances where both types of matches occur in proximity to each other. To do this, set up an All of condition group that contains both a regular expression condition and a contains text condition for finding specific words and phrases, and specify the required distance within which matches must occur. The Arctera Insight Classification first evaluates the contains text condition and only then looks for a regular expression match.
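The idea behind this note can be sketched in a few lines of Python: check for the cheap keyword first, and keep a regular expression match only if it lies close to a keyword. This is an illustration of the approach, not the product's evaluation logic; keyword_then_regex is a hypothetical helper, and measuring the distance in characters is an assumption.

```python
import re

def keyword_then_regex(text, keywords, pattern, distance):
    """Return regex matches that lie within `distance` characters of a keyword."""
    lowered = text.lower()
    keyword_spans = [
        match.span()
        for keyword in keywords
        for match in re.finditer(re.escape(keyword.lower()), lowered)
    ]
    if not keyword_spans:
        return []  # no keyword found, so the slower regex search is skipped
    hits = []
    for match in pattern.finditer(text):
        near = any(
            match.start() - end <= distance and start - match.end() <= distance
            for start, end in keyword_spans
        )
        if near:
            hits.append(match.group())
    return hits

visa = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b")
text = "Visa: 4111111111111111. " + "x" * 200 + " 4222222222222"
print(keyword_then_regex(text, ["visa"], visa, distance=20))
```

In the sample text, only the number next to the word Visa is returned; the second number is more than 20 characters away from any keyword.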

Author and Recipient department based policy condition

With this feature, you can create policy conditions based on the departments configured in Arctera Advanced Supervision. This enables organizations to create policies that apply only to content sent or received by monitored employees from specific departments. As departments are updated over time, with people leaving or joining, the policy adjusts automatically. This capability avoids the need to create multiple policies and apply them broadly. With this level of granularity, you can configure multiple similar policies that differ slightly for each department, such as a Market Abuse Policy for departments in AMS and a Market Abuse Policy for departments in EMEA.

You can enable the Author and Recipient department based policy condition by using the departmentApiEnabled parameter in the YAML file.

To use Author and Recipient department based policy condition,

  1. Navigate to Policies and click New at the bottom of the page.
  2. In the condition field, select Author Department or Recipient Department.
  3. The is any of option is selected by default.
  4. The condition value field next to it displays all the selected departments. Note that this is a read-only field.
  5. Click Select Departments. A pop-up with the list of departments appears on the screen. Check or uncheck the box next to a department to add or remove it.
  6. You can also search for the departments by using the search box at the top.

    Note:

    The searched departments are listed along with their parents, even if the parent is not a match for the search. Selection of deleted and closed departments is not permitted. If a policy condition contains departments that are now deleted or closed, those can only be deselected during policy edit.

  7. If the Use inheritance toggle on the same screen is turned on, all the child departments under the department will be selected automatically. By default, this toggle will be Off.
  8. The Select all departments checkbox can be used to select or deselect all the departments with a single click. This selection does not impact closed and deleted departments, which are marked in red.
  9. If you want to automatically include all newly added child departments under a specific department (for example, Department A) in a policy condition, you must first select that department and then enable the Auto-update toggle on the right for it. To enable auto-update for all departments at once, you can use the Auto-update all checkbox; however, this option is available only when all departments are selected. Additionally, if you want to automatically include any new departments that are added at the root level, you need to turn on the Auto-update root department toggle.

    Note:

    • The auto-update department syncing activity runs once every 24 hours at 1 AM.

    • During this process, all newly added child departments of a parent department (for which auto-update has been enabled) are automatically included in the policy condition. However, existing child departments that were not selected before enabling auto-update are not included.

    • If a complete hierarchy is added as a child department, only the root-level department is included in the policy condition, and not the entire nested hierarchy.

    • Auto-update for a department can only be enabled when that department is selected.

    • The Auto-update root department option is not available if the departments are filtered using the search function.

  10. All selected departments can be viewed or searched by clicking the number at the bottom. All selected departments can also be deleted by clicking X next to the number.

Currently, only 100 parent departments are displayed on one page, irrespective of the number of their child departments. If there are more than 100 parent departments, you can navigate across multiple pages one by one.

Limitations of the department based policy conditions
  • After removing multiple items from the pinned department panel rapidly and closing it, reopening that panel might take a while.

Pattern matches

A pattern match evaluates the selected item property against an existing Arctera Insight Classification pattern. Depending on the selected pattern, you may be able to set the confidence levels that you are willing to accept. A high confidence level is likely to produce fewer but more relevant matches.

Note the following if you do not get the expected results when you test a policy that makes use of a built-in pattern:

  • It is important to check that your test item meets the pattern confidence levels. For example, by default, the Credit Card Policy looks for content that matches the pattern "Credit/Debit Card Number" with medium to very high confidence. To meet the requirements of the medium confidence level, an item must contain either of the following:

    • A delimited credit card number (one that contains spaces or dashes between the numbers).

    • Both a non-delimited credit card number and one or more credit card keywords, such as "AMEX" or "Visa".

    So, an item does not meet these requirements if it contains a non-delimited credit card number but it does not also contain credit card keywords.

  • After you click Show details to view the results of a test, the Test classification results window may fail to highlight some or all of the matches. This is a known issue with certain patterns only. A future version of the Arctera Insight Classification will correct the issue.

Exact Data Matches

Unlike most classification techniques that rely on pattern matching to identify sensitive data, Exact Data Match (EDM) triggers a classification response when the actual data that needs to be protected is detected. Matching on the exact data reduces the rate of false positives and allows for much higher levels of accuracy in automatic classification. EDM uses a fingerprint method whereby an extract of a database or table is provided as a source file in either CSV or TXT format. The table is ingested, and rules are created that indicate a match when one or more columns of a single row are detected in proximity. EDM is ideal when the identification of discrete customer data, employee data, and any other sensitive data repository maintained within a table is required.

To classify information using Exact Data Match

  • Create an EDM pattern by setting the configuration options and providing the source document (typically containing the desired fields exported from a data store, such as a database). See “To create an Exact Data Match based pattern”.

  • Use the resulting EDM pattern in any policy to be used for EDM based classification.

Exact Data Match can be enabled or disabled using YAML.

The Exact Data Match feature allows you to detect specific data sets from a database, for example, employee records. You can match one or more fields and optional fields as per the configured proximity value. It supports large data sets (like database records) and text in all languages, and it provides data protection by hashing the stored data. The main benefit of using Exact Data Match is to reduce false positives by matching data exactly (unlike pattern-based matching).

For example, if you have the following content in the document to classify:

Name: Teresa M. Brown

Employee ID: 624828

and you are trying to match against the following EDM source document,

Then this will trigger a match.

Exact Data Match provides the following benefits:

  • Provides the ability to detect specific data sets from a database, for example, employee records.

  • Supports matching of combinations of data, for example, matching one or more fields and optional fields as per the configured proximity value.

  • Supports large data sets like database records.

  • Provides data protection by hashing of stored data.

  • Automatically synchronizes the exact data match rule pack (which is required for classification) on the remote classification servers. Manual intervention is not required.

  • Supports file encryption for the exact data match rule pack files with the mechanism similar to tenant-specific patterns, policies, and tags.

  • Supports the Min/Max disposition for exact data match type policy conditions while configuring policies.

  • Supports text in all languages.

To create a policy using an Exact Data Match pattern

  1. Follow the initial steps for creating or editing a policy as described earlier.
  2. In the operator list box, select contains exact data match in and then select the required exact data match pattern from the value list box next to it.

    Note:

    Under Conditions, Min/Max disposition is supported for exact data match type policy conditions. You can specify an exact count or a higher count for matches. When you select a value greater than 1, the Exclude repeats check box appears. If you select this check box, only matches that are different from each other are counted.

    For example, a credit card condition with a minimum count of two requires two different credit cards in a single document.

  3. Click Save.

When you test a document against an EDM-based policy, Arctera Insight Classification shows the result. Also, the first column of the matching row is highlighted.

Example 1:

If source document content is as follows,

with Exact Data Matching Options as follows:

| Name | Value |
| --- | --- |
| First row contains column headers | Yes |
| Column delimiter | , |
| Perform hashing to secure data fields | No |
| Use case-sensitive matching | No |
| Proximity for matches | 200 |
| Minimum columns to match | 2 |
| All columns | Not selected |

And if test document content is as follows:

The classification result will show a match for two records, Stuart and James.

Example 2:

For same source document and test document as stated in earlier example, if Minimum Columns value is set to 3 as follows:

| Name | Value |
| --- | --- |
| First row contains column headers | Yes |
| Column delimiter | , |
| Perform hashing to secure data fields | No |
| Use case-sensitive matching | No |
| Proximity for matches | 200 |
| Minimum columns to match | 3 |
| All columns | Not selected |

The classification result will show a match for a single record, Stuart, because all three fields from the first record are present in the test document.

Example 3:

For same source document and test document as stated in first example, if proximity value is set to 50 as follows:

| Name | Value |
| --- | --- |
| First row contains column headers | Yes |
| Column delimiter | , |
| Perform hashing to secure data fields | No |
| Use case-sensitive matching | No |
| Proximity for matches | 50 |
| Minimum columns to match | 3 |
| All columns | Not selected |

In this case, the required words are not within a proximity of 50 characters, so the result shows no match.
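The interplay between the Minimum columns to match and Proximity for matches options in the three examples can be reproduced with a small Python sketch. It is a simplified model, not the product's algorithm: the source rows and test text below are made up, edm_matches is a hypothetical helper, only the first occurrence of each value is considered, proximity is measured between the starting offsets of the values, and hashing is not modeled.

```python
import csv
import io

def edm_matches(source_csv, text, min_columns, proximity, case_sensitive=False):
    """Return the first column of each source row that matches the text."""
    rows = list(csv.reader(io.StringIO(source_csv)))
    records = rows[1:]  # first row contains column headers
    haystack = text if case_sensitive else text.lower()
    matched = []
    for record in records:
        positions = []
        for value in record:
            needle = value if case_sensitive else value.lower()
            index = haystack.find(needle)
            if index != -1:
                positions.append(index)
        positions.sort()
        # Look for min_columns values whose offsets span at most `proximity`.
        window_found = any(
            positions[i + min_columns - 1] - positions[i] <= proximity
            for i in range(len(positions) - min_columns + 1)
        )
        if window_found:
            matched.append(record[0])
    return matched

# Hypothetical source document and test document.
SOURCE = "Name,EmployeeID,City\nStuart,1001,Leeds\nJames,1002,York\n"
TEXT = ("Stuart has ID 1001. " + "-" * 80 +
        " He is based in Leeds. James has ID 1002.")

print(edm_matches(SOURCE, TEXT, min_columns=2, proximity=200))
print(edm_matches(SOURCE, TEXT, min_columns=3, proximity=200))
print(edm_matches(SOURCE, TEXT, min_columns=3, proximity=50))
```

With this made-up data, two columns within 200 characters match both rows, three columns within 200 characters match only the first row, and three columns within 50 characters match nothing, which follows the same pattern as Examples 1 to 3.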

Classification performance for an Exact Data Match based policy depends on the following factors:

  • Number of records to be matched

  • Number of fields and field size

  • Data being classified

  • Number of matches

  • Proximity and column matches found

  • Compute hardware and available resources

Language matches

You can set up a condition to restrict policy matching to items in a particular language. For example, set the condition like the one below to find items whose content is primarily in French:

One of the options in the language list is Multiple languages detected. This option matches items that contain at least two languages.

To safeguard against the Arctera Insight Classification ignoring items because it cannot determine their primary language, select Or Primary Language Unknown. The most common reason why the Arctera Insight Classification may be unable to determine an item's primary language is that the item has a very small amount of content.

Language Proportion Detection

Arctera Classification supports detailed language analysis by identifying and reporting the proportion of each language found within a document. When a document contains content written in multiple languages, the system calculates and displays the percentage of text in each language. For example, a document may be identified as containing 63 percent English and 33 percent German. This capability is helpful for organizations working with multilingual content.

Choosing a Content Language Detection Engine

You have the flexibility to select between two content language detection engines, depending on your specific use case.

  • Apache Tika Engine: The default option, which has comparatively lower speed and accuracy than FastText.

  • FastText Engine: A newly introduced engine that offers faster and more accurate detection for both single-language and multilingual text.

To configure your preferred engine, navigate to the system settings and select the engine that best aligns with your performance or compatibility requirements.

Entity matches

You can set up a condition to restrict policy matching to content that includes a person name or location.

Note:

The "contains entity" condition will only be available if nlp-service.jar is used while running the Arctera Insight Classification application. Also, Named Entity Recognition (NER) is available only for English.

For example, set the condition like the one below to find content that includes a person name.

Note:

Named Entity Recognition (NER) consumes more time and resources compared to normal classification. NER is not suitable for large documents, especially documents bigger than 10 MB.

Sentiment score

You can set up a condition to perform sentiment analysis at item level and determine whether the sentiment associated with the item is positive, negative, or neutral based on the sentiment analysis score:

  • score = 0 or score < 50 is negative

  • score = 50 is neutral

  • score = 100 or score > 50 is positive

Depending on how you want to interpret data, you can define and tailor policies with sentiment conditions to meet your sentiment analysis needs. Users can input items of their choice and gauge the underlying sentiment based on the sentiment analysis score.

Sentiment analysis processing depends on the conditions set for sentiment score. The policy-related tag is returned only if the set condition is met. The following conditions are available for sentiment score:

  • is - Sentiment analysis is performed, and the policy tag will be returned only if the sentiment score matches this number

  • is at most - Sentiment analysis is performed, and the policy tag will be returned only if the sentiment score is less than or equal to this number

  • is at least - Sentiment analysis is performed, and the policy tag will be returned only if the sentiment score is equal to or higher than this number

  • is in range - Sentiment analysis is performed, and the policy tag will be returned only if the sentiment score falls in this range

For example, if you set the condition to sentiment score is at most 70, sentiment analysis is performed and the policy tag is returned only if the score is 70 or less.
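The score bands and the four operators described above can be summarized in a short Python sketch. The function names are hypothetical, and treating both ends of is in range as inclusive is an assumption.

```python
def sentiment_label(score):
    """Map a sentiment score to negative, neutral, or positive."""
    if score < 50:
        return "negative"
    if score == 50:
        return "neutral"
    return "positive"

def sentiment_condition_met(score, operator, value):
    """Return True if the policy tag would be returned for this score."""
    if operator == "is":
        return score == value
    if operator == "is at most":
        return score <= value
    if operator == "is at least":
        return score >= value
    if operator == "is in range":
        low, high = value  # assumed inclusive at both ends
        return low <= score <= high
    raise ValueError(f"unknown operator: {operator}")

print(sentiment_label(35))
print(sentiment_condition_met(70, "is at most", 70))
print(sentiment_condition_met(71, "is at most", 70))
```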

Note:

If sentiment analysis fails, the classification process will continue without evaluating the sentiment analysis conditions within the policies. As a result, hits and matches based on the sentiment condition will be affected.

Format Type

You can set up a condition using format type.

To create a policy using the Format Type condition:

  1. Click Policy in the left pane.
  2. Click New and enter the mandatory details.
  3. From the content drop-down, select Format Type.
  4. Select the options from the available list.
  5. Click Save.

Limitations of the Format Type policy condition

  • Export is not allowed with format type policies.

  • If multiple policies are selected for export and any policy is of format type, then all policies except the format type policy will be exported, depending on their types.

  • After a format type policy is created, if the feature is disabled, classification for format type policies will not be allowed.

Risk score and risk level information on classification

The risk score and risk level for each classified item are sent to the consuming applications. Consuming applications can analyze this information and support features such as sort, filter, search, and report on items by risk score and/or risk level. By understanding the level of risk, you can optimize efforts on data management, review, and control. You can prioritize activities and resources on items of highest risk.

The risk score and risk level are based on the number of pattern or policy condition hits. Items with more hits are categorized as high risk. Items with fewer hits are categorized as low risk.

Configuring risk level settings through YAML file

In the YAML file, the previously used lowerRiskRuleNameParts parameter is deprecated, and the three new parameters - lowRiskUpperLimit, mediumRiskUpperLimit, and highRiskUpperLimit - are added. These parameters provide control over different risk level definitions based on the risk score value. This configuration defines the upper limit of the risk score range for the low, medium, and high-risk levels.

  • lowRiskUpperLimit - It can be zero or greater than zero. By default, it is set to 2.

  • mediumRiskUpperLimit - It can be any non-zero positive integer, but it must be greater than the lowRiskUpperLimit value. By default, it is set to 5.

  • highRiskUpperLimit - It can be any non-zero positive integer, but it must be greater than the mediumRiskUpperLimit value. By default, it is set to 10.

The following table shows how the risk score maps to a risk level under these settings.

Condition                                                Risk level
Risk score > highRiskUpperLimit                          Very high
highRiskUpperLimit >= Risk score > mediumRiskUpperLimit  High
mediumRiskUpperLimit >= Risk score > lowRiskUpperLimit   Medium
lowRiskUpperLimit >= Risk score >= 1                     Low
Risk score = 0                                           No risk
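The mapping from risk score to risk level can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the RiskLimits class and the risk_level function are hypothetical names, and only the three parameter names, their defaults, and their constraints come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical holder for the three YAML parameters (documented defaults)."""
    low_risk_upper_limit: int = 2      # lowRiskUpperLimit
    medium_risk_upper_limit: int = 5   # mediumRiskUpperLimit
    high_risk_upper_limit: int = 10    # highRiskUpperLimit

    def __post_init__(self):
        # Constraints as stated for each parameter.
        if self.low_risk_upper_limit < 0:
            raise ValueError("lowRiskUpperLimit must be zero or greater")
        if self.medium_risk_upper_limit <= self.low_risk_upper_limit:
            raise ValueError("mediumRiskUpperLimit must exceed lowRiskUpperLimit")
        if self.high_risk_upper_limit <= self.medium_risk_upper_limit:
            raise ValueError("highRiskUpperLimit must exceed mediumRiskUpperLimit")


def risk_level(score: int, limits: RiskLimits = RiskLimits()) -> str:
    """Return the risk level for a risk score, checking the highest band first."""
    if score > limits.high_risk_upper_limit:
        return "Very high"
    if score > limits.medium_risk_upper_limit:
        return "High"
    if score > limits.low_risk_upper_limit:
        return "Medium"
    if score >= 1:
        return "Low"
    return "No risk"


for score in (0, 2, 5, 10, 11):
    print(score, risk_level(score))
```

With the default limits, scores of 1 and 2 are Low, 3 to 5 are Medium, 6 to 10 are High, and anything above 10 is Very high.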

Risk score and risk level information in classify API response

The risk information is sent as part of the classify response only if the following conditions are met:

  • The matchDetailLevel in the classify request is set to LOW, MEDIUM, or HIGH.

  • The item has some risk, based on its risk score and the risk level limit settings in the YAML file.

Risk score calculation

The risk computation of potentially sensitive content is based on the degree of hits against patterns or policy conditions and policy risk weight.

Note:

By default, the risk weight of all custom policies and most built-in policies is set to 1. For the Subscription policy and all the Language detection policies, the risk weight is set to 0 by default.

Consider the following example. A document has the following classification result.

Policies

Classification result

Policy name  Pattern name (match count)    Risk weight
Policy-1     Pattern-A (2), Pattern-B (3)  2
Policy-2     Pattern-B (2), Pattern-C (1)  0
Policy-3     Pattern-C (3), Pattern-D (5)  1
Policy-4     Pattern-C (1), Pattern-E (1)  5

Unique policy match table

The following table describes a sample scenario for risk score calculation.

Pattern name  Match count  From policy  Risk weight
Pattern-A     2            Policy-1     2
Pattern-B     3            Policy-1     2
Pattern-C     3            Policy-3     1
Pattern-D     5            Policy-3     1
Pattern-E     1            Policy-4     5

When a pattern that hits on an item is present in multiple policies, the policy with the highest risk weight is considered for the risk score calculation, as shown in the unique policy match table.

The risk score is the sum of the products of match count and policy risk weight. The following steps explain the risk score calculation.

Step 1: Multiply the match count by the risk weight.

Step 2: Repeat step 1 for all the rows in the unique policy match table.

Step 3: Add the results of step 2.

Risk Score = 2*2 + 3*2 + 3*1 + 5*1 + 1*5

= 4 + 6 + 3 + 5 + 5

= 23
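The same arithmetic can be reproduced with a short Python sketch. It takes the rows of the unique policy match table as given and does not attempt to derive that table from the per-policy results. The variable and function names are illustrative only.

```python
# Rows of the unique policy match table:
# (pattern name, match count, from policy, risk weight)
unique_policy_matches = [
    ("Pattern-A", 2, "Policy-1", 2),
    ("Pattern-B", 3, "Policy-1", 2),
    ("Pattern-C", 3, "Policy-3", 1),
    ("Pattern-D", 5, "Policy-3", 1),
    ("Pattern-E", 1, "Policy-4", 5),
]


def risk_score(rows):
    """Sum of (match count * risk weight) over all rows."""
    return sum(match_count * risk_weight
               for _pattern, match_count, _policy, risk_weight in rows)


print(risk_score(unique_policy_matches))  # 23
```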

Risk levels

Risk is categorized into risk levels according to the risk score, as described in the About policy conditions section.

In the above example, the risk score is 23. With the default risk level limits, the risk level is therefore Very high.

Facts and limitations about risk score:

  • Sentiment score and Named Entity based policy condition hits do not contribute to the risk score.

  • The Subscription policy and the language detection policies contribute zero to the total item risk score, because their risk weight is 0 by default.

  • Due to language detection policy hits, some discrepancies may be observed in the Most Common Sensitive Data results from the analyzer overview page. The result of the analyzer overview page is accurate.

  • The following policy conditions contribute to the risk score:

    • Content

    • Title

    • Author

    • Content Type

    • Recipient

    • Modified Date

    • Creation Date

    • Sensitivity

    • Category

    • Size (Bytes)

    • Custom date field

    • Custom number field

    • Custom string field

    • Sender and Recipient department based policy condition

Condition groups

You can group a set of conditions and nest grouped conditions within other grouped conditions. The group operator that you choose determines whether an item must meet all, some, or none of the conditions in the group to be considered a match. The following group operators are available:

  • All of. An item must meet all the specified conditions.

  • Any of. An item must meet at least one of the specified conditions.

  • None of. An item must not meet any of the specified conditions.

    Note:

    You can nest a None of group within an All of group to look for certain condition matches while also excluding others. For example, to achieve the effect of "(condition X AND condition Y) BUT NOT condition Z", you would include the X and Y conditions in an All of group and the Z condition in a nested None of group.

  • n or more of. An item must meet at least the specified number of conditions.
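The group operators behave like composable boolean tests. The following Python sketch illustrates the logic only. All function names are hypothetical, and contains is a plain case-sensitive substring check that stands in for a real condition rather than reproducing the product's "contains text" operator.

```python
from typing import Callable

Condition = Callable[[str], bool]


def contains(text: str) -> Condition:
    """Stand-in condition: true if the item contains the given substring."""
    return lambda item: text in item


def all_of(*conditions: Condition) -> Condition:
    return lambda item: all(c(item) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    return lambda item: any(c(item) for c in conditions)


def none_of(*conditions: Condition) -> Condition:
    return lambda item: not any(c(item) for c in conditions)


def n_or_more_of(n: int, *conditions: Condition) -> Condition:
    return lambda item: sum(1 for c in conditions if c(item)) >= n


# "(condition X AND condition Y) BUT NOT condition Z":
# a None of group nested inside an All of group.
rule = all_of(contains("X"), contains("Y"), none_of(contains("Z")))

print(rule("X and Y"))        # True
print(rule("X, Y, and Z"))    # False
```

Because each group is itself a condition, groups can be nested to any depth in the same way.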

For an All of group only, you can choose to look for instances where the conditions occur within any proximity or within a specified number of characters from the first condition. For example, the following condition group looks for instances where the word Goodbye appears within 20 characters of the word Hello:

The text string "You say Goodbye and I say Hello" matches these conditions because there are fewer than 20 characters between the first character of Hello and the first character of Goodbye. Similarly, the string "You say Hello and I say Goodbye" also matches because there are fewer than 20 characters between the ends of the two words. In each case, the spaces count as characters.

Note:

When you conduct within nn characters proximity searches, take care not to duplicate the same search terms across multiple conditions. For example, suppose that you define one condition to look for the names Fred, Sue, and Bob, and a second to look for Joe, Bob, and Sarah. An item that contains a single instance of Bob would match these conditions.

Rather than choose the "from the first condition" option, you can choose "in a sliding window". This option looks for instances where all the conditions occur within any contiguous sequence of the specified number of characters. For example, a condition group that looks for instances where the word Goodbye appears within a 20-character sliding window of the word Hello does not match "You say Goodbye and I say Hello", because there are 23 characters between the start of the word Goodbye and the end of the word Hello.
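The difference between the two proximity options can be sketched in Python. This is an interpretation under stated assumptions, not the product's matching algorithm: the "from the first condition" check is assumed to compare the start positions of the two terms, and the sliding window check is assumed to require that both terms fit entirely inside the window. All function names are hypothetical.

```python
def occurrences(text: str, term: str):
    """Return (start, end) index pairs for every occurrence of term in text."""
    spans = []
    start = text.find(term)
    while start != -1:
        spans.append((start, start + len(term)))
        start = text.find(term, start + 1)
    return spans


def within_from_first(text: str, first: str, second: str, limit: int) -> bool:
    """Assumption: fewer than `limit` characters separate the two start positions."""
    return any(abs(s2 - s1) < limit
               for s1, _ in occurrences(text, first)
               for s2, _ in occurrences(text, second))


def within_sliding_window(text: str, first: str, second: str, window: int) -> bool:
    """Assumption: both terms lie entirely inside some run of `window` characters."""
    return any(max(e1, e2) - min(s1, s2) <= window
               for s1, e1 in occurrences(text, first)
               for s2, e2 in occurrences(text, second))


text = "You say Goodbye and I say Hello"
print(within_from_first(text, "Hello", "Goodbye", 20))      # True
print(within_sliding_window(text, "Hello", "Goodbye", 20))  # False
print(within_sliding_window(text, "Hello", "Goodbye", 23))  # True
```

In this string the two words start 18 characters apart, but the span from the start of Goodbye to the end of Hello is 23 characters, so only the first option matches at a setting of 20.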

Sliding window example