How Data Insight works, and the ports that it uses for communication with devices

Article: 100039166
Last Published: 2023-10-11
Product(s): Data Insight

Description


Introduction

This document helps customers locate and review the ports that must be opened bidirectionally in their environment for Data Insight to function properly.  It provides a review of the network configuration of Data Insight for most environments, and assumes that the reader has a basic understanding of the main use cases of Data Insight and of the hardware platforms that Data Insight supports.
 

Data Insight Basics               

Data Insight is a tool for scanning and analyzing data in unstructured data environments.  Scanning and auditing should be viewed as two separate processes with two separate purposes.  Scanning is a single traversal of the file system that reads metadata from folder and file objects, either as a directed scan of only the objects that have been modified or as a complete traversal of every object.  Auditing, by contrast, establishes an audit trail by connecting to the device's application programming interface (API) and continually monitoring all client I/O to the device; the captured I/O is compared against the monitored shares after receipt.

Unstructured data usually refers to loose user data, which is typically managed by individual users.  In most environments, infrastructure administrators and IT decision makers have little understanding of the data on these systems or of how it is used.  Data Insight provides answers to many of the typical questions needed to effectively manage storage:
  • Data Ownership
  • Data Access Patterns / Forensics
  • Data Protection / Permissions
  • Data Access Governance
 
Data Insight focuses primarily on the data ownership, entitlement reporting, and data access patterns required by administrators for effective unstructured data charge-back, storage tier optimization, and ILM reporting.  It also enables security use cases involving data forensics, entitlement management, and risk analysis.

 

Architecture Basics

Data Insight is a self-contained product that installs on the Windows platform (Indexer nodes can also run on Linux). The product is tested and validated on virtual machines, which allow a great deal of flexibility when requesting resources from a customer for the install.  The only consideration for a virtual machine is that the host virtualization server should meet the same minimum requirements as a physical server.  It is also advisable to locate this server as close to the target file servers as possible.
              
Data Insight is made up of three main component tiers. These components allow for flexible architecture decisions when planning a worldwide production roll-out.  A single-tier installation houses the Management Server (MS), Collector, and Indexer on a single machine.  A two-tier installation separates the MS from a combined single-node Collector and Indexer.  A three-tier installation separates the Collector and Indexer roles onto their own worker nodes.  For the Self-Service Portal or classification, a separate server is needed in any tier.
 
  • Management Console: This is the web engine that provides the GUI.  This server also acts as a query engine for other products that integrate with Data Insight, such as DLP.  This server also performs the Active Directory scanning.  The main user database used to resolve SIDs is also located on this server.
  • Indexer: This is the main Data Insight data store.  It’s a highly parallel, independent data store that can be scaled out horizontally as needed to grow to meet the size of the largest environments.  The static metadata is maintained in small SQLite databases on a share level.  The audit data is maintained in a proprietary data store developed internally by Veritas and implemented as virtual tables in each share’s SQLite database.  This is all included in the main Data Insight install and requires no software pre-loads, or customer supplied licenses.
  • Collector: The collector tier is the set of components that communicate directly with the hardware that is being monitored.  The collector is a set of API explorers and scanning engines that collect the user access data and file metadata. 
              
For the purposes of device access, the most important consideration is likely the placement of the collector.  The collector generates the most network traffic, as it is responsible for collecting audit data as well as metadata scans.  It is highly recommended to install the collector as close to the filer/device being monitored as possible, to reduce the likelihood of latency issues created by crossing multiple IP network segments.  When connecting to the devices, it is most important that this node has open access across firewalls to the required ports on the devices.
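
Before or during deployment, a quick connectivity check from the intended collector node to each device can confirm that the required ports are reachable through any firewalls and give a rough sense of connection latency.  The following Python sketch illustrates such a check; the host names and port list are placeholders only and should be replaced with the actual devices and the port requirements from the Software Compatibility List for your environment.

# Minimal sketch: verify TCP reachability (and rough connect latency) from a
# collector node to the devices it will monitor. Hosts and ports below are
# placeholders; substitute the devices and port numbers required for your
# environment (see the Software Compatibility List).
import socket
import time

TARGETS = [
    ("filer1.example.com", 445, "SMB share scanning"),      # hypothetical filer
    ("isilon1.example.com", 8080, "OneFS platform API"),    # hypothetical Isilon node
    ("ceehost1.example.com", 12228, "CEE/CEPA events"),     # hypothetical CEE host
]

def check_port(host, port, timeout=5.0):
    """Return (reachable, elapsed_seconds) for a single TCP connect attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start

if __name__ == "__main__":
    for host, port, purpose in TARGETS:
        ok, elapsed = check_port(host, port)
        status = "open" if ok else "unreachable"
        print(f"{host}:{port}  {status}  {elapsed * 1000:.1f} ms  ({purpose})")
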
Self-Service Portal – The self-service portal server provides an interface for data custodians to perform a workflow action or remediation. There are 4 types of workflow tasks supported today:
  1. Ownership confirmation for custodians assigned within Data Insight
  2. Entitlement reviews / Access certifications
  3. DLP Incident remediation
  4. Records Classification
 

Pre-Planning

 
The installation and configuration work detailed in the manuals for the devices compatible with Data Insight must be completed to ensure that the customer environment is ready for the install. This gives the customer insight into the types of interfaces Data Insight uses to collect data, and identifies the logins or software required for API access prior to install.  Review the administration and installation manuals (6.1.6 versions linked) to ensure that valid logins with the correct privileges for each device will be available at install time, and that any change control actions needed to open ports have been taken.

It’s a good idea to review the Release Notes prior to the install date.

 

 

Device Requirements

Review the Software Compatibility list

The Software Compatibility List (SCL) contains the network port requirements.

There are mechanisms for altering the ports, such as application settings or configuration files, if you need to move off the defaults.

All requirements and directions are listed in the Installation and Administration guides; below is a summary by device.
 

NetApp Filer Targets

  • ONTAP 7.3.5 or higher (ONTAP 8.0.x OR 8.1.x must be configured in ONTAP 8 “7 mode”. ONTAP 8.2.x can be configured in cluster-mode or 7-mode.)
  • Domain User account that can scan shares (member of Backup Operators group locally on filer recommended)
  • Domain User account that can discover shares, configure FPolicy, and connect to FPolicy (either a member of the Administrators local group, or in a group with a role that has login-* and api-* rights is recommended)
  • Click on the DataInsightFpolicyCMod service to configure it.
    • Provide the logon credential for FPolicy. By default, the FPolicy name is matpol and the FPolicy port is 8787.
    • Specify the IP address of the collector that can be accessed from the filer.
    • In cluster-mode, no credential needs to be specified. Make sure the correct IP is selected in the FPolicy Collector IP field; in some cases where multiple NICs are present, private IPs may populate the field, causing TCP communication to fail. Then, press Configure.
  • Once successfully configured, the FPolicy cluster-mode Collector service will be up and running on the server.
  • Ports configuration and review: see the documentation topic "Configuring your storage system in a cluster environment".
 
Note: If your version is not listed, check the corresponding documentation for your version.

EMC Celerra / VNX

  • Common Event Enabler framework version 8.2 or higher
    • Referred to as CEE
  • Domain User account with sufficient privileges to scan shares
  • User account that can connect via VNX / Celerra Event Publishing Agent (CEPA) for user access collection.



 

EMC Isilon

 
  • Version 7.1
  • Common Event Enabler framework version 8.2 or higher
    • Requires .NET 3.5
    • The default port 12228 is required to reach the CEE host from the cluster
    • Port 135 is used for the initial call to the CEE/CEPA server; the response arrives within the dynamic port range between 49152 and 65535
  • Domain User account with sufficient privileges to scan shares
  • User account for share discovery
  • New APIs for discovery (see the isilonutil share discovery APIs listed later in this article)
  • Access to the web interface for connection of the device
    • IPv4 - https://<yourNodeIPaddress>:8080
    • IPv6 - https://[<yourNodeIPaddress>]:8080
 

Windows File Servers


With agent install (the agent includes a filter driver that enables auditing and monitoring)
  • Ports list
    • SMB: 445 (TCP) // File Sharing
    • RDP: 3389 (TCP) // Remote Desktop for Administration
    • If NetBT is enabled, NetBT uses ports:
      • 137 (UDP)
      • 138 (UDP)
      • 139 (TCP)
 

Without agent (No Auditing available)

  • Windows 2003 or 2003 R2, 32 or 64 bit
  • Windows 2008 or 2008 R2, 64bit
  • Windows 2012 or 2012 R2, 64bit
  • Other targets such as Windows 2000 have not been officially tested but have been observed to work in the field, although there is no known extended support from Microsoft.
 

Veritas File System (VxFS) Server

  • 6.0.1 or higher
  • Configured standalone or using Veritas Cluster Server (Clustered File System (CFS) is not supported)
 

Microsoft SharePoint Servers

  • Microsoft SharePoint 2019
  • Microsoft SharePoint 2016 (under latest version with some caveats and patches) (ports list)
  • Microsoft SharePoint 2013 (ports list)
  • Microsoft SharePoint 2010
  • A Site Collection administrator account in the SharePoint server domain with full control permissions on the added web applications


Incremental Scan-Related Jobs

CollectorJob runs once every 2 hours (configurable) and performs the following steps (a sketch of this flow follows the list):

  1. The CollectorJob moves all raw audit log files (e.g. fpolicy*.sqlite, winnas*.sqlite) from the collector folder to a folder called collector/staging.
  2. For each filer for which this node is a collector, it determines the scan extract ID for that filer.
  3. It then executes collector.exe -e ext_ids -d <staging> -s <staging> -p <collector/changelog>
  4. Once collector.exe returns, it has created a per-share adt file in the staging folder and has populated logs into a per-filer changelog file (<devid>_changelog.pathdb) in the collector/changelog folder.
  5. All adt files are then moved to the outbox.
  6. For filers where incremental scanning is disabled, it purges all records from the changelog of the device to avoid buildup. Incremental scanning is disabled when IScannerJob is disabled for the effective collector OR scanning is disabled for the filer.
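
The following is a minimal sketch of the CollectorJob flow described above, for illustration only; it is not the actual job implementation. The data directory, file layout, and extract IDs are placeholders, and only the collector.exe command line is taken from the documented steps.

# Minimal sketch of the documented CollectorJob flow (not the actual job):
# move raw audit logs to collector/staging, then invoke collector.exe with
# the documented options. DATADIR and the extract IDs are placeholders.
import glob
import os
import shutil
import subprocess

DATADIR = r"C:\DataInsight\data"            # hypothetical data directory
COLLECTOR = os.path.join(DATADIR, "collector")
STAGING = os.path.join(COLLECTOR, "staging")
CHANGELOG = os.path.join(COLLECTOR, "changelog")

# 1. Move raw audit log files (fpolicy*.sqlite, winnas*.sqlite) into staging.
os.makedirs(STAGING, exist_ok=True)
for pattern in ("fpolicy*.sqlite", "winnas*.sqlite"):
    for path in glob.glob(os.path.join(COLLECTOR, pattern)):
        shutil.move(path, STAGING)

# 2. Hypothetical scan extract IDs for the filers this node collects for.
ext_ids = "101,102"

# 3. Run collector.exe with the documented arguments; it writes per-share adt
#    files into staging and per-filer changelog files into collector/changelog.
subprocess.run(
    ["collector.exe", "-e", ext_ids, "-d", STAGING, "-s", STAGING, "-p", CHANGELOG],
    check=True,
)
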
IScannerJob runs once every day at 7 p.m. (configurable):
 

On Collector

  1. This job will first create a list of shares that need to be scanned. This includes all shares of filers for which this node is the collector, except shares of filers that are monitored using a remote winnas agent.
  2. For all such shares, it will issue mergedb -x -m msu_ids -s <collector/changelog> -d collector/changelog/staging
  3. Once mergedb returns, it has created per-msu pathdb files for all shares that had activity since the last incremental scan.
  4. All per-msu pathdb files are moved to the inbox.
  5. Then, the inbox is scanned to figure out which msus have pathdbs in the inbox.
  6. If multiple pathdb files exist for the same msu, they are first merged and the originals removed.
  7. Then, based on which msus have pathdbs available, those shares are added to the incremental scan queue.
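
As a rough illustration of steps 5 through 7, the sketch below scans a hypothetical inbox folder, groups pathdb files by msu ID (assuming the <msuid>_<timestamp>.pathdb naming shown in the incremental scan job table later in this article), flags msus whose multiple pathdb files would need merging, and builds the list of shares to queue for an incremental scan. It is illustrative only and not the actual IScannerJob code.

# Rough illustration of steps 5-7 above (not the actual IScannerJob code):
# group pathdb files in the inbox by msu id and decide which shares to queue
# for an incremental scan. Assumes files are named <msuid>_<timestamp>.pathdb.
import os
from collections import defaultdict

INBOX = r"C:\DataInsight\data\inbox"        # hypothetical inbox location

def build_incremental_scan_queue(inbox_dir):
    pathdbs_by_msu = defaultdict(list)
    for name in os.listdir(inbox_dir):
        if not name.endswith(".pathdb"):
            continue
        msu_id = name.split("_", 1)[0]      # msu id precedes the timestamp
        pathdbs_by_msu[msu_id].append(name)

    scan_queue = []
    for msu_id, files in sorted(pathdbs_by_msu.items()):
        if len(files) > 1:
            # In the real flow these would first be merged into one pathdb
            # and the originals removed.
            print(f"msu {msu_id}: {len(files)} pathdb files need merging")
        scan_queue.append(msu_id)
    return scan_queue

if __name__ == "__main__":
    print("Shares queued for incremental scan:", build_incremental_scan_queue(INBOX))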
 

On Winnas agent (clustered and non-clustered)

  1. This job will first create a list of shares that need to be scanned. This includes all shares of the winnas filer for which this node is the agent such that the share is available on the local node. It checks this by accessing the physical path of the share. That way, if the share is currently imported on a different cluster node, it does not result in a failed scan.
  2. For all such shares, it will make a request to the Collector node to send it pathdb files.
  3. When the collector receives this request, it will issue the following command: mergedb -x -m msu_ids -s <collector/changelog> -d <collector/changelog/staging>.
  4. When mergedb returns, it would have created a per msu pathdb file in the staging folder.
  5. Collector will zip all these files and send them to the agent.
  6. The agent will unzip these files into inbox folder.
  7. Then, inbox is scanned to figure out which msus have pathdbs in the inbox.
  8. If multiple pathdb files exist for the same msu, they are first merged and the originals removed.

 

Sequence of jobs for scan/audit in DI


Sequence of Jobs for Scan for Shares

  • ScannerJob (Collector): Invokes scanner.exe for the shares and site collections monitored by the collector node. scanner.exe creates scan_cifs*.sqlite or scan_nfs*.sqlite files in <datadir>\outbox. Default schedule: 7 p.m. on the last Friday of every month.
  • FileTransferJob (Collector): Transfers scan*.sqlite files from <datadir>\outbox to <datadir>\inbox on the Indexer node. Default schedule: Every minute.
  • IndexWriterJob (Indexer): Invokes idxwriter.exe, which consumes scan data and updates the index db. Default schedule: Every 4 hours.
 
Sequence of Jobs for Audit for CIFS and NFS Shares and Site Collections

  • CollectorJob (Collector): Invokes collector.exe to process raw audit files present in the <datadir>\collector folder and generates audit*.sqlite files in the <datadir>\outbox folder. Default schedule: Every 2 hours.
  • FileTransferJob (Collector): Transfers audit*.sqlite files from <datadir>\outbox to <datadir>\inbox on the Indexer node. Default schedule: Every minute.
  • IndexWriterJob (Indexer): Invokes idxwriter.exe, which consumes audit data and updates the index db. Default schedule: Every 4 hours.
 
Sequence of Jobs for Audit for Site Collections

  • SPAuditJob (Collector): Fetches audit data for site collections configured in Data Insight from the SQL Server database and stores it in audit_sharepoint*.sqlite in <datadir>\collector. (It used to move the audit_sharepoint*.sqlite files directly to the outbox until DI 5.1, but after 5.1 you must run CollectorJob to move the files to the <datadir>\outbox folder. This change was made to support real-time alerts.) Default schedule: Every 2 hours.
  • CollectorJob (Collector): Invokes collector.exe to process raw audit files present in the <datadir>\collector folder and generates audit*.sqlite files in the <datadir>\outbox folder. Default schedule: Every 2 hours.
  • FileTransferJob (Collector): Transfers audit*.sqlite files from <datadir>\outbox to <datadir>\inbox on the Indexer node. Default schedule: Every minute.
  • IndexWriterJob (Indexer): Invokes idxwriter.exe, which consumes audit data and updates the index db. Default schedule: Every 4 hours.
 
Sequence of Jobs for Incremental Scans

Incremental scanning takes place for a share when there are CREATE/WRITE/SECURITY events on the share.

  • CollectorJob (Collector): Moves all raw audit log files (e.g. fpolicy*.sqlite, winnas*.sqlite) from the collector folder to a folder called collector/staging. In the case of multiple collector threads, the folder is called collector/staging/<threadid>, and each thread works on separate filers. Then, for each filer for which it is a collector, it determines the scan extract ID for that filer. It then executes "collector.exe -e ext_ids -d <staging> -s <staging> -p <collector/changelog>", which (a) generates per-share audit*.sqlite files in the <datadir>\outbox folder and (b) populates changelogs into an intermediate per-filer changelog file (<device>_<timestamp>_changelog.pathdb) in the <datadir>\collector\changelog\inbox folder. For filers where incremental scanning is disabled, it purges all records from the changelog of the device to avoid buildup. Incremental scanning is disabled when IScannerJob is disabled for the effective collector OR scanning is disabled for the filer. Default schedule: Every 2 hours.
  • ChangeLogJob (Collector): Merges changelog files from <datadir>\collector\changelog\inbox into a main device-specific changelog file (collector\changelog\<device_id>_changelog.db), which is used by IScannerJob. Default schedule: Every hour.
  • IScannerJob (Collector / Windows File Server agent): Runs "mergedb.exe -x -m msu_ids -s <collector/changelog> -d collector/changelog/staging", which creates per-msu pathdb files for all shares that had activity since the last incremental scan. All per-msu pathdb files are moved to the inbox, and based on which msus have pathdbs available, those shares are added to the incremental scan queue. scanner.exe is then executed as follows:
    scanner.exe --extractid <extractid> --msuid <msuid> --pathdb C:\DataInsight\data\inbox\<msuid>_<timestamp>.pathdb --aclsfor only_dir --ownrfor dir_and_file --max_err 5000
    This generates ISQLITE files in C:\DataInsight\data\outbox containing all the changes that have been made on the share. In the case of NFS monitoring, isqlite files are generated under the <datadir>/unix/scanner folder and are transferred to the outbox after NFSUserMappingJob runs. Default schedule: 7 p.m. daily.
  • FileTransferJob (Collector): Transfers *.isqlite files from <datadir>\outbox to <datadir>\inbox on the Indexer node. Default schedule: Every minute.
  • IndexWriterJob (Indexer): Invokes idxwriter.exe, which consumes incremental scan data and updates the index db. Default schedule: Every 4 hours.
 
APIs used by isilonutil for share discovery:
Get all zone names: https://10.10.10.10:8080/platform/1/zones
Get all shares in a zone: https://10.10.10.10:8080/platform/1/protocols/smb/shares?zone=System
 
1. Discover the OneFS API version:
  a. https://10.10.10.10:8080/platform/latest
  b. SmartConnect APIs are supported for API version 3 and above; older API versions do not support this API.
  c. If this API returns a version of 3 or higher, Data Insight can look for SmartConnect; otherwise it falls back to the old flow.
 
2. Discover all access zones:
  a. https://10.10.10.10:8080/platform/1/zones
 
3. Discover all shares for each access zone:
  a. https://10.10.10.10:8080/platform/1/protocols/smb/shares?zone=System
  b. Returns the SmartConnect name along with the share information.
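
The same discovery sequence can be exercised manually to verify connectivity, credentials, and API rights before adding the filer. The sketch below is a minimal example, assuming an Isilon node reachable at the placeholder address 10.10.10.10:8080, a discovery user with the rights described above, and a self-signed certificate; it is not part of Data Insight itself.

# Minimal sketch of the share-discovery sequence above, using the documented
# OneFS platform API endpoints. The node address and credentials are
# placeholders; this is a connectivity/permissions check, not Data Insight code.
import requests
from requests.auth import HTTPBasicAuth

BASE = "https://10.10.10.10:8080"                      # placeholder node address
AUTH = HTTPBasicAuth("discovery_user", "password")     # placeholder credentials
VERIFY = False   # many clusters use self-signed certificates; adjust as needed

# 1. Discover the OneFS API version; SmartConnect lookups need version 3+.
latest = requests.get(f"{BASE}/platform/latest", auth=AUTH, verify=VERIFY).json()
print("Latest platform API version:", latest)

# 2. Discover all access zones.
zones = requests.get(f"{BASE}/platform/1/zones", auth=AUTH, verify=VERIFY).json()
zone_names = [z["name"] for z in zones.get("zones", [])]
print("Access zones:", zone_names)

# 3. Discover all shares in each access zone.
for zone in zone_names:
    resp = requests.get(
        f"{BASE}/platform/1/protocols/smb/shares",
        params={"zone": zone},
        auth=AUTH,
        verify=VERIFY,
    ).json()
    for share in resp.get("shares", []):
        print(f"{zone}: {share.get('name')} -> {share.get('path')}")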



Example of node table with ports

(L)ocal/(R)emote  ID  Name               IP                 Queryd_port  Commd_port  isConsole  isIndexer  isCollector  Ctr
L                 1   DIServer1.local    DIServer1.local    8282         8383        1          1          1            0
R                 2   DIPortal1.local    DIPortal1.local    8282         8383        0          0          0            0
R                 3   Indexer1.local     Indexer1.local     8282         8383        0          1          1            0
R                 4   Collector1.local   Collector1.local   8282         8383        0          0          1            0
R                 5   FileServer1.local  FileServer1.local  8282         8383        0          0          1            0
R                 6   FileServer2.local  FileServer2.local  8282         8383        0          0          1            0

 
Reference

List of Jobs Available in Data Insight (veritas.com)

 

 
