Important Update: Cohesity Products Documentation
All Cohesity product documentation is now managed via the Cohesity Docs Portal: https://docs.cohesity.com/HomePage/Content/home.htm. Some documentation available here may not reflect the latest information or may no longer be accessible.
Arctera Insight Information Governance User's Guide
- Section I. Introduction
- Information Governance Dashboard
- Information Governance Workspace
- Section II. Information Governance reports
- Using Information Governance reports
- About Information Governance reports
- How Information Governance reporting works
- Creating a report
- About Information Governance security reports
- Activity Details report
- Permissions reports
- Inactive Users
- Path Permissions
- Permissions Search report
- About Permissions Query templates
- Creating a Permissions Query Template
- Creating custom rules
- Permissions Query Template actions
- Using Permissions Search report output to remediate permissions
- Entitlement Review
- User/Group Permissions
- Group Change Impact Analysis
- Ownership Reports
- Create/Edit security report options
- About Information Governance storage reports
- Create/Edit storage report options
- About Information Governance custom reports
- Considerations for importing paths using a CSV file
- Managing reports
- About managing Information Governance reports
- Viewing reports
- Filtering a report
- Editing a report
- About sharing reports
- Copying a report
- Running a report
- Viewing the progress of a report
- Customizing a report output
- Configuring a report to generate a truncated output
- Sending a report by email
- How does number of records field work differently from the truncate output records field?
- Automatically archiving reports
- Canceling a report run
- Deleting a report
- Considerations for viewing reports
- Organizing reports using labels
- Section III. Remediation
- Configuring remediation workflows
- About remediation workflows
- Prerequisites for configuring remediation workflows
- Configuring Self-Service Portal settings
- About workflow templates
- Managing workflow templates
- Creating a workflow using a template
- Managing workflows
- Auditing workflow paths
- Monitoring the progress of a workflow
- Remediating workflow paths
- Using the Self-Service Portal
- About the Self-Service Portal
- Logging in to the Self-Service Portal
- Using the Self-Service Portal to review user entitlements
- Using the Self-Service Portal to manage Data Loss Prevention (DLP) incidents
- Using the Self-Service Portal to confirm ownership of resources
- Using the Self-Service Portal to classify sensitive data
- Managing data
- About managing data using Arctera Enterprise Vault and custom scripts
- Managing data from the Shares list view
- Managing inactive data from the Folder Activity tab
- Managing inactive data by using a report
- Archiving workflow paths using Arctera Enterprise Vault
- Using custom scripts to manage data
- Pushing classification tags while archiving files into Arctera Enterprise Vault
- About adding tags to devices, files, folders, and shares
- Managing permissions
- Appendix A. Command Line Reference
- Index
About the Information Governance Dashboard
The Dashboard page provides a snapshot of your entire environment. Use this page to verify that all components are functioning as expected or to identify areas that require administrative attention. When you log in to Information Governance, the Dashboard appears by default. The Dashboard consists of seven tiles categorized as follows:
The Managed Storage tile provides a high-level overview of your organization's monitored data. It serves as a central summary of the following metrics:
- The cumulative volume of data across all configured storage sources.
- The number of distinct platforms or systems currently being monitored.
- The count of specific storage areas (such as shares or site collections) under supervision.
- The aggregate count of all objects scanned and indexed by the system.
When a device is configured, collectors perform scans based on a defined schedule. The system processes this data through the following stages to populate the dashboard:
Scanning and Indexing: During initial and incremental scans, file metadata is captured and stored in the system index.
Data Ingestion: The system updates the reporting database for each share immediately following the completion of a scan job.
Daily Summary Overview: A scheduled nightly job aggregates the latest information to refresh the dashboard tiles and historical charts.
Note:
The dashboard summary reflects the results of the most recent nightly summarization, which captures a daily snapshot for each device to ensure accurate historical trend reporting.
The system uses the historical data captured during the nightly summarization to visualize growth over time through the following mechanisms:
- By recording a new data point for every device each night, the system plots the total managed storage volume across days, weeks, or months.
- This allows administrators to monitor the velocity of storage consumption and identify specific periods of high activity.
- These trends are used to calculate the Storage Capacity Forecast, estimating when current storage limits may be reached based on average growth rates.
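The trend mechanism described above can be sketched in miniature: nightly per-device data points are rolled up into one total per day, giving the series that the growth charts plot. This is a hypothetical illustration; the function name `build_growth_series` and the tuple layout are assumptions, not the product's actual schema.

```python
from collections import defaultdict

def build_growth_series(snapshots):
    """Aggregate nightly (date, device, bytes) snapshots into a
    total-managed-storage series keyed by date."""
    totals = defaultdict(int)
    for date, _device, size_bytes in snapshots:
        totals[date] += size_bytes
    # Sort chronologically so the series can be plotted directly.
    return dict(sorted(totals.items()))

snapshots = [
    ("2024-06-01", "filer-01", 500), ("2024-06-01", "filer-02", 300),
    ("2024-06-02", "filer-01", 520), ("2024-06-02", "filer-02", 310),
]
series = build_growth_series(snapshots)
# series["2024-06-02"] is the total across both devices for that night: 830
```

Because each device contributes exactly one point per night, day-over-day deltas in this series directly reflect storage growth velocity.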
The Storage usage by file group tile provides a sophisticated, multi-dimensional perspective on your organization's storage footprint. By transforming raw file extensions into meaningful functional categories, this tile allows administrators to distinguish between business-critical documentation and high-volume multimedia or system files that may require different lifecycle management or storage policies.
Dynamic Bubble Chart Visualization - The tile employs a high-impact bubble chart to illustrate data distribution. The relative diameter of each bubble serves as an immediate visual indicator of storage density; larger bubbles, representing categories such as Video or Media Files, signal at a glance where significant capacity is being consumed.
Automated Top 5 Consumption Ranking - To ensure administrative focus remains on the most impactful data, the system automatically identifies the five file groups with the highest cumulative storage footprint. These groups - such as Office Files, Document, or Text Files - are ranked in descending order, with the largest storage drivers prioritized at the top of the interface.
Granular Resource Metrics - For every identified group, the dashboard provides three essential data points to help diagnose storage trends: the Group Name, the Total Aggregate Size (automatically scaled to MB, GB, or TB for readability), and the Total File Count. This combination allows you to determine if a storage spike is caused by a small number of massive files or a proliferation of millions of tiny objects.
Extensible Functional Mapping - The system utilizes an intelligent mapping logic to consolidate related file extensions into cohesive groups. For example, the Office Files group aggregates metadata from diverse spreadsheet, presentation, and word processing formats, providing a unified view of productivity data across the entire enterprise.
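The mapping and ranking behavior described above can be approximated as follows. The extension-to-group table and the function name `top_groups` are illustrative assumptions; in the product the groups are configurable, not hard-coded.

```python
# Hypothetical extension-to-group mapping; real file groups are
# administrator-configurable.
FILE_GROUPS = {
    ".docx": "Office Files", ".xlsx": "Office Files", ".pptx": "Office Files",
    ".mp4": "Video", ".txt": "Text Files",
}

def top_groups(files, n=5):
    """Roll up (extension, size) records into functional groups and
    rank the top n by total aggregate size."""
    totals, counts = {}, {}
    for ext, size in files:
        group = FILE_GROUPS.get(ext.lower(), "Others")
        totals[group] = totals.get(group, 0) + size
        counts[group] = counts.get(group, 0) + 1
    ranked = sorted(totals, key=totals.get, reverse=True)[:n]
    # Each entry pairs the group with its total size and file count,
    # mirroring the three metrics the tile displays.
    return [(g, totals[g], counts[g]) for g in ranked]

files = [(".docx", 10), (".mp4", 500), (".txt", 1), (".xlsx", 20)]
result = top_groups(files)
# "Video" ranks first (500), then "Office Files" (30 across 2 files)
```

Returning both total size and file count is what lets an administrator tell a few massive files apart from millions of tiny ones, as the Granular Resource Metrics paragraph notes.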
The lifecycle of the data within this tile follows a structured processing path to ensure consistent accuracy across the Storage Dashboard:
Primary Data Ingestion - As collectors complete scheduled or incremental scans of your repositories, a background process captures the raw file metadata. This information is immediately mapped to extensions and ingested into the reporting database to ensure the foundational data remains current.
Daily Analytic Summarization - A nightly scheduled job performs a comprehensive consolidation of the day's scan results. This job extracts the latest available snapshots for each device and generates the analytics required to render the bubble charts and summary tables, ensuring the dashboard reflects the most recent environment-wide state.
Configuration Sensitivity and On-Demand Refresh - The system is designed to be responsive to administrative changes. If a user modifies a file group - such as adding a new extension to the Document group or deleting an obsolete category - the system triggers an on-demand summarization to reflect these changes in the dashboard metrics in real time.
The Storage Capacity Forecast tile visualizes historical storage growth and provides a predictive analysis for the next six months. This data allows your organization to plan for future infrastructure requirements based on actual usage trends.
- This metric calculates the cumulative increase in data volume across all monitored sources over the previous half-year period. It provides a baseline for understanding your environment's typical growth rate.
- Using advanced predictive modeling, the system projects your expected storage requirements for the next 180 days. This forecast helps in budgeting and procurement cycles by estimating future data footprints.
- The tile provides a granular breakdown of both historical and forecasted growth categorized by device type (such as NetApp, Windows File Servers, or SharePoint). This helps administrators identify which specific platforms are driving the most significant storage consumption.
Note:
The system requires a minimum of 30 days of historical data to generate an accurate forecast. If the environment has been monitored for less than 30 days, the tile displays the message: "Insufficient data to forecast storage capacity growth."
The forecasting logic is dynamic and adjusts based on the volume of historical data available. While the tile always provides a fixed six-month outlook, it considers between 30 and 180 days of existing history to calculate the most accurate growth trajectory possible.
To maintain the integrity of these forecasts, the system manages data through a structured background process:
Data Capture and Ingestion: As collectors perform scheduled and incremental scans, metadata is moved from the system index into a primary reporting database. This ensures that every file change is reflected in the raw data set.
Daily Snapshot Summarization: A nightly scheduled job processes the latest available information for each device. To ensure historical accuracy without inflating data, the system records exactly one entry per device for each calendar day.
Data Consolidation Logic: If a summarization job is triggered manually on a day where a record already exists, the system intelligently updates the existing data point with the most current information rather than creating a duplicate entry. This prevents the skewing of trend lines and forecast models.
This consolidated data set serves as the single source of truth for several dashboard components, including the Storage Summary, the Top 5 Data Sources by Storage, and the Capacity Forecast tiles, ensuring that all metrics remain synchronized across the interface.
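The one-entry-per-device-per-day consolidation rule described above amounts to an upsert keyed on device and calendar day. A minimal sketch, assuming a hypothetical in-memory store and function name `record_snapshot`:

```python
def record_snapshot(store, device, day, metrics):
    """Upsert the daily snapshot: exactly one entry per device per
    calendar day. A manual re-run on the same day overwrites the
    existing data point instead of creating a duplicate."""
    store[(device, day)] = metrics
    return store

store = {}
record_snapshot(store, "filer-01", "2024-06-01", {"bytes": 800})
# A manually triggered summarization later the same day:
record_snapshot(store, "filer-01", "2024-06-01", {"bytes": 820})
# Still one record for that device/day, holding the newest metrics.
```

Because every dashboard tile reads from this same keyed store, trend lines cannot be skewed by duplicate same-day entries.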
The Potential Cost Savings tile serves as a high-impact financial forecasting tool, translating complex storage metadata into actionable economic insights.
By identifying stagnant data that no longer provides active business value but continues to consume premium primary storage, the system empowers administrators to quantify the fiscal benefits of implementing data migration or archival strategies.
Access-Driven Financial Modeling - The system leverages the "Last Accessed" metadata captured during the data age distribution process as the primary engine for its calculations. By focusing on when a file was last utilized by a user or application - rather than simply its creation date - the system ensures that cost savings estimates are grounded in actual data inactivity, targeting files that are truly idle.
Context-Aware Device Cost Analysis - Recognizing that the price of storage varies drastically across different infrastructure tiers, the system performs specialized financial computations based on the specific device types currently active in your environment. Whether you are managing high-performance flash arrays, traditional file servers, or cloud-based object storage, the cost models are inherently context-aware. The dashboard dynamically filters cost configurations, displaying only the parameters relevant to the platforms you are monitoring.
Defined Optimization Thresholds: Inactive vs. Stale - To provide a structured framework for storage reclamation, the system segments aging data into two critical tiers for cost modeling. Data that has remained untouched for between 6 and 36 months is classified as Inactive, representing prime candidates for mid-tier archival. Data exceeding the 36-month threshold is categorized as Stale, identifying legacy information that likely qualifies for long-term cold storage or secure deletion.
Administrative Sovereignty and Financial Tuning - Administrators maintain full control over the financial variables used to generate these reports. You can tailor the cost-per-gigabyte estimates and the specific bucket ranges for inactive data to mirror your organization's unique procurement costs and internal data retention mandates. For simplified management, every configuration interface includes a Reset to Default option, allowing you to instantly align your projections with industry-standard baseline settings.
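The tiering and savings math described above can be sketched directly from the stated thresholds (6 to 36 months idle is Inactive, beyond 36 months is Stale). The function names and the flat cost-per-GB model are assumptions for illustration; real configurations use the administrator-tuned values.

```python
def classify_age(months_since_access):
    """Tier data by inactivity: 6-36 months = Inactive, >36 = Stale,
    per the documented default thresholds."""
    if months_since_access > 36:
        return "Stale"
    if months_since_access >= 6:
        return "Inactive"
    return "Active"

def potential_savings(files, cost_per_gb):
    """Estimate savings as the per-GB cost of all idle (Inactive or
    Stale) data; files are (months_idle, size_gb) pairs."""
    idle_gb = sum(gb for months, gb in files
                  if classify_age(months) != "Active")
    return idle_gb * cost_per_gb

files = [(2, 100.0), (12, 50.0), (48, 200.0)]
savings = potential_savings(files, 0.10)
# (50 + 200) idle GB at $0.10/GB -> 25.0
```

Tuning `cost_per_gb` or the month thresholds is the code-level analogue of the Change Cost and Change Storage Config options described below.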
The integrity of these financial projections is maintained through a rigorous background data life-cycle to ensure that the dashboard remains a reliable source of truth for executive reporting:
Automated Analytic Summarization - The system relies on a consolidated data architecture to populate the savings tile. Every night, a scheduled job aggregates the most recent access timestamps across all active repositories. This daily synchronization ensures that as files transition into older age brackets, their contribution to your potential savings is recalculated and updated every 24 hours.
On-Demand Policy Recalculation - The dashboard is designed to be highly responsive to administrative input. Any modification to your global cost settings or the addition of a new device type triggers an immediate, on-demand summarization. This real-time sensitivity allows you to perform "what-if" scenarios, instantly visualizing how changes in storage pricing or aging policies would impact your organization's bottom line.
To customize your financial projections, click the Options Menu (represented by the three vertical dots) located in the upper right-hand corner of the Potential Cost Savings tile. This menu provides the following administrative controls:
Refresh - Select this to manually trigger an update of the tile's data. This ensures your financial estimates reflect the most recent scan results and provides an up-to-the-minute view of your storage environment.
Change Storage Config - Use this setting to define the specific time-based thresholds that categorize data as Inactive or Stale. This allows you to align the system's optimization logic with your organization's unique data retention policies.
Change Cost - This option allows you to input your organization's specific cost-per-gigabyte for both primary and archival storage. By tailoring these variables, you ensure the generated reports provide an accurate and realistic economic outlook for executive review.
The Top 5 Storage Usage tiles provide a strategic rollup of your organization's storage footprint, transforming raw file data into actionable governance insights. By distilling information across three distinct dimensions - Business Units, Owners, and Devices - the system identifies the primary drivers of storage consumption and establishes a framework for departmental accountability.
- This tile identifies which departments or organizational branches are responsible for the highest storage volumes. By aggregating individual user data into functional groups, it provides the visibility necessary for departmental chargebacks and budgeting, highlighting which areas of the business are expanding their digital footprint most rapidly.
- This tile focuses on individual accountability by ranking the users who possess the largest volume of data across the enterprise. This visibility is essential for data cleanup initiatives, allowing administrators to collaborate directly with high-volume owners to manage legacy files or migrate data to more appropriate storage tiers.
- This tile monitors the physical and virtual infrastructure, listing the primary storage hardware or platforms (such as specific file servers or cloud instances) that hold the largest share of data. This helps IT teams identify which systems are nearing capacity limits and require proactive hardware management or data rebalancing.
The intelligence fueling these tiles is driven by a sophisticated background correlation engine that merges low-level metadata with high-level organizational mapping through the following lifecycle:
Metadata Collection and Incremental Updates - Upon device configuration, Information Governance collectors perform deep scans of repositories according to their defined schedules. This process captures granular file metadata, which is stored and continually updated during incremental scans to ensure the data reflects the current state of the storage environment.
Directory Service Integration - The system goes beyond file metadata by actively scanning directory services (such as Active Directory) to synchronize user identities and departmental attributes. This ensures that every file tracked is linked to a verified identity within the organization.
Business Unit Mapping and Data Alignment - To ensure accurate departmental reporting, administrators can upload a specialized mapping file. This file bridges the gap between individual user accounts and their respective Business Units, allowing the system to accurately roll up individual file ownership into a cohesive departmental view.
Policy-Based Ownership Calculation - Data ownership is dynamically computed based on your organization's configured data owner policies. Whether ownership is determined by file creation, the last person to modify a document, or specific folder permissions, the system applies these rules to ensure the "Top 5 Owners" tile aligns with your specific governance standards.
Automated Data Ingestion and Update - A background process continuously manages the flow of information, ingesting and updating the reporting database for every individual owner. This automated workflow ensures that as file permissions change or new data is created, the system's primary reporting tables remain a reliable "single source of truth" for the dashboard metrics.
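Policy-based ownership, as described in the lifecycle above, reduces to selecting which metadata attribute counts as the owner. A minimal sketch, assuming hypothetical policy names (`creator`, `last_modifier`) and metadata keys; the product's actual policy options may differ.

```python
def compute_owner(file_meta, policy):
    """Resolve the data owner for a file according to the configured
    ownership policy. Policy names here are illustrative."""
    if policy == "creator":
        return file_meta["created_by"]
    if policy == "last_modifier":
        return file_meta["modified_by"]
    raise ValueError(f"Unknown ownership policy: {policy}")

meta = {"created_by": "alice", "modified_by": "bob"}
# Under a last-modifier policy, "bob" is credited in the Top 5 Owners
# rollup; under a creator policy, "alice" is.
```

Applying one policy uniformly across all files is what keeps the Top 5 Owners tile consistent with the organization's governance standard.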
The Data age distribution tile is a critical life-cycle management tool that provides a chronological audit of your storage environment. By visualizing data based on the time elapsed since it was last accessed or modified, administrators can identify deep-seated cold data that is ripe for archival or deletion, thereby reclaiming expensive primary storage capacity.
Comparative Activity Analysis - The dashboard allows users to pivot between Last Accessed and Last Modified data views. This distinction is critical for defining data value; while a file might not have been modified for years, frequent access indicates it remains vital for reference or compliance. Conversely, data that has reached high thresholds in both categories is the primary target for defensible deletion or archival migration.
Granular Chronological Windowing - The system employs a standardized aging logic that segments data into six-month buckets based on its most recent activity. This progression tracks data from its initial state (0 - 6 months) through mid-term aging (6 months to 3 years) and into long-term legacy status (up to 7 years). By providing these specific intervals based on Last Accessed or Last Modified time-stamps, the system helps identify exactly when data activity begins to plateau across the enterprise.
Extended Retention Tracking - Any data that has remained untouched for more than 84 months is consolidated into a specialized 7Y+ category. This provides a clear visibility layer for legacy records that may have exceeded standard legal or corporate retention mandates, facilitating easier compliance auditing and risk management.
Visual Volume Indicators - The tile utilizes a horizontal bar chart where the length of each bar is proportional to the total storage volume (MB, GB, or TB) within that specific time window. This visual hierarchy allows administrators to instantly recognize storage bloat in older tiers, making it easy to build a business case for storage tiering or cloud offloading.
Data Integrity Filtering - To ensure that the dashboard provides a single source of truth, the distribution is calculated exclusively for enabled shares. The system intentionally excludes disabled or offline repositories because their access and modification information may no longer reflect the live environment, which could otherwise lead to inaccurate reporting and flawed optimization strategies.
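The six-month windowing described above can be sketched as a simple bucketing function. The exact boundary handling (which edge of each window a borderline month falls into) is an assumption here; the documented rule is only that anything beyond 84 months lands in 7Y+.

```python
def age_bucket(months):
    """Place a file into a six-month aging bucket based on months since
    last access or modification; anything past 84 months is '7Y+'."""
    if months > 84:
        return "7Y+"
    # Clamp so that month 84 falls in the final regular bucket (78-84).
    start = min((months // 6) * 6, 78)
    return f"{start}-{start + 6} months"

# age_bucket(3)  -> "0-6 months"   (initial state)
# age_bucket(40) -> "36-42 months" (mid-term aging)
# age_bucket(90) -> "7Y+"          (extended retention tier)
```

Summing file sizes per bucket over enabled shares only would then yield the horizontal bar lengths the tile renders.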
The system ensures a reliable and up-to-date record of your data's life-cycle through a structured update process:
Flexible Reporting Standards - The aging windows are designed to be extensible, allowing the organization to adapt its reporting views in the future without compromising the continuity of its historical storage trends.
Daily Analytic Refresh - The system performs a comprehensive data consolidation every night for every monitored device. This creates a clean, daily snapshot of the aging distribution. If a manual refresh is triggered, the system intelligently updates the existing daily record to prevent duplicate entries, ensuring that your long-term trend lines remain precise and reliable for executive reporting.
The Potential Redundant Data tile provides visibility into specific categories of information that may no longer be required for business operations. By identifying these clusters of data, administrators can implement targeted cleanup policies to reclaim valuable storage space and reduce backup overhead.
To customize which data categories are monitored, click the Options Menu (represented by the three vertical dots) in the upper right-hand corner of the tile. This menu provides the following administrative controls:
Select File Groups - Use this option to choose specific File Groups - such as temporary system files, legacy compressed archives, or redundant media formats - that the system should track as potential candidates for reclamation.
Refresh - Select this to manually synchronize the tile with the latest scan results. This ensures that your redundancy metrics reflect the most current state of your storage environment after remediation actions or new data discoveries.
User-Defined Monitoring - Rather than using a fixed set of rules, the system allows you to tailor the dashboard to focus on the file types most relevant to your organization's storage profile. This ensures that the redundancy metrics are actionable and aligned with your specific data management goals.
Automated Conflict Detection - Once your preferred file groups are selected, the system continuously analyzes your repositories to flag matching content. This provides a clear, up-to-date visualization of how much storage is being consumed by non-essential data across the enterprise.
The Dashboard serves as your high-level control center, offering the immediate visibility required to maintain an optimized storage environment. By monitoring these key indicators regularly, administrators can proactively manage data growth, mitigate security risks, and implement cost-saving measures before they impact organizational efficiency.
For a more granular analysis of the specific servers, folders, or users contributing to these dashboard metrics, navigate to the Workspace tab. This area allows you to transition from high-level executive summaries to detailed file-level insights for targeted remediation.
When a device is configured, collectors perform scans based on a defined schedule. To ensure the dashboard remains a reliable reference point for your storage environment, the system processes and refreshes data through several coordinated stages:
Automated Metadata Discovery - As scans and incremental updates complete, the system captures the latest file attributes, including size, type, and ownership, ensuring the foundational data remains current.
Organizational Data Alignment - The system correlates this raw metadata with your directory services and any uploaded Business Unit mapping files to provide essential context to the storage usage metrics.
Daily Analytic Refresh - Every night, a scheduled process performs a comprehensive data consolidation. This creates a clean, daily snapshot for every monitored device, ensuring that your long-term trend lines and aging reports are precise and dependable for executive reporting.