List of Jobs Available in Data Insight

Article: 100045147
Last Published: 2023-10-11
Product(s): Data Insight

Problem

Data Insight (DI) runs its tasks as scheduled jobs that move and process data between the worker nodes in the DI environment.

Cause

Jobs can be listed using the command: <INSTALLDIR>\Program Files\DataInsight\bin\configcli.exe list_jobs
 

A job can be run via the CLI using the following command: <INSTALLDIR>\Program Files\DataInsight\bin\configcli.exe execute_job <JobName>

Note: JobName is case-sensitive

Example:
"C:\Program Files\DataInsight\bin\configcli" execute_job IndexWriterJob

Result expected:

Job started
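
The execute_job call above can also be assembled from a script. The following Python sketch only builds the argument list and does not run anything. It assumes that configcli.exe is installed under C:\Program Files\DataInsight\bin, as in the example; the function name build_execute_job_command and the KNOWN_JOBS sample set are hypothetical and not part of Data Insight. Because job names are case-sensitive, the sketch rejects a name whose case differs instead of silently correcting it.

```python
from pathlib import PureWindowsPath

# Assumed install location, taken from the example above; adjust for your environment.
CONFIGCLI = PureWindowsPath(r"C:\Program Files\DataInsight\bin\configcli.exe")

# Hypothetical sample; in practice, fill this from the output of `configcli.exe list_jobs`.
KNOWN_JOBS = {"IndexWriterJob", "CollectorJob", "FileTransferJob"}


def build_execute_job_command(job_name, known_jobs=KNOWN_JOBS):
    """Return the argument list for `configcli.exe execute_job <JobName>`."""
    if job_name in known_jobs:
        return [str(CONFIGCLI), "execute_job", job_name]
    # Job names are case-sensitive, so report a near miss instead of guessing.
    for known in known_jobs:
        if known.lower() == job_name.lower():
            raise ValueError(f"Job names are case-sensitive: did you mean {known!r}?")
    raise ValueError(f"Unknown job name: {job_name!r}")


if __name__ == "__main__":
    print(build_execute_job_command("IndexWriterJob"))
```

On a Data Insight node, the returned list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids having to quote the space in "Program Files".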

Solution

DataInsightComm Service Jobs

 
Job Description Default Schedule Node Files Processed\Updated Comments
ActivityIndexJob Refreshes activity index for each share 7:00:00 AM - Daily Indexer Updates activity.idx.<timestamp>, dir-activity.idx.<timestamp> for each MSU  
ADScanJob Scans the directory services (AD, LDAP, NIS). Updates users.db 3:00:00 AM - Daily MS Processes users.db and _local_users.zip files  
CollectorJob Initiates collector.exe to process raw event files and generates audit files Every 2 hours Collector Input: fpolicy, cee, winnas*.sqlite files
File Format: fpolicy_<timestamp>_<gen>_<deviceId>.sqlite

Output: audit_sqlite files
File Format: audit_cifs_<timestamp>_<shareId>
 
CollectorJob_size Initiates a collector.exe run for a device if the input size for that device exceeds the batch_size limit Every 10 minutes Collector Input: fpolicy, cee, winnas*.sqlite files
File Format: fpolicy_<timestamp>_<gen>_<deviceId>.sqlite

Output: audit_sqlite files
File Format: audit_cifs_<timestamp>_<shareId>
Runs more frequently than the regular CollectorJob, so that files for an active device are processed as soon as enough input is available
ChangeLogJob Merges changelog files generated by collector.exe from collector\changelog\inbox into a main device-specific changelog file. This file is later used by incremental scanner Every 1 hour Collector Processes <nodeid>_<timestamp>_changelog.db files  
ControlPointJob Triggers controlpoint.exe for each share on the indexer and generates tag files 12:00:00 PM Saturday Indexer Generates tags file in inbox with the format <msuId>_<timestamp>_tags.sqlite
Generated tags files are consumed by the next IndexWriterJob to update MSU DB
 
ScannerJob Initiates the scanner process to scan the shares/site collections added to Data Insight. Creates the scan database for each share that it scanned in the outbox folder 7:00:00 PM Last Friday of the Month Collector, WinNAS w/agent Output: scan_cifs_<shareId>_<timestamp>_snapshot.sqlite or scan_sharepoint_<siteCollId>_<timestamp>.sqlite There may be additional ScannerJob_<num> instances if custom scan schedules are configured
IScannerJob Initiates the incremental scan process for shares/site collections for paths that have changed on those devices since the last full scan 7:00:00 PM - Daily Collector, WinNAS w/agent Consumes msu<msuId>_<timestamp>_path.db files and generates scan_cifs_<msuid>_<timestamp>.sqlite and scan_sharepoint_<msuid>_<timestamp>.sqlite files  
CreateWorkflowDBJob Creates the database containing the data for DLP Incident Management, Entitlement Review, and Ownership Confirmation workflows based on the input provided by users Every 1 minute MS    
DlpSensitiveFilesJob Pulls classification information from the configured DLP server and generates tags.db file for each MSU 12:00:00 AM - Daily MS Generates tags file in outbox for each msu with the format msu<msuId>_<timestamp>_dlp-tags.sqlite Generated tags files will be transferred to respective indexers and will be consumed by next IndexWriterJob
FileTransferJob Transfers the files from the <datadir>\outbox folder from a Collector node to the <datadir>\inbox folder of the appropriate Indexer or Management Server node Every 1 minute All    
FileTransferJob_content Routes content files and classification sqlite files to the assigned classification server Every 10 seconds WinNAS w/agent    
FileTransferJob_Evt Transfers event files (internal events of the product) to MS Every 1 minute All Processes <timestamp>_<nodeId>_<version>_events.sqlite files Ensures that events make it to the management server asap without waiting in queue for other files to get transferred
FileTransferJob_WF Transfers workflow DB from MS to appropriate Portal node Every 1 minute MS, Portal Processes <datadir>\workflow\outbox\workflow.db  
FileTransferJob_classify Distributes the classification events between nodes Every 1 minute All    
IndexWriterJob Initiates the idxwriter process to update the Indexer database with scan (incremental and full) and audit data Every 4 hours Indexer Consumes audit_sqlite, scan_sqlite and _tags.sqlite files  
ActivityIndexJob Updates activity index when a share or site collection index.db is updated 7:00:00 AM - Daily Indexer Updates dir-activity.idx.### and file-activity.idx.#### for each MSU Allows faster computation of data ownership
IndexCheckJob Verifies the integrity of the index databases 6:00:00 PM Monday Indexer    
PingHeartBeatJob Sends the heartbeat from the worker node to the Management Server Every 1 minute All, except MS    
PingMonitorJob Monitors heartbeats from the worker nodes. Sends notifications in case it does not get a heartbeat from the worker node Every 1 minute MS    
SystemMonitorJob Monitors status of Watchdog service Every 1 minute All    
DiscoverSharesJob Discovers shares on filers and web applications on site collections 2:00:00 AM & 2:00:00 PM - Daily All Creates <nodeid>_<timestamp>_discovered_shares.zip for the node The zip file is sent to the MS via the FileTransferJob
DiscoverSharesJob_mrg Merges all of the zip files produced by DiscoverSharesJob 3:00:00 AM & 3:00:00 PM - Daily MS Consumes _discovered_shares.zip files for all nodes Runs an hour later than the regular DiscoverSharesJob so as to let the share discovery complete
ScanPauseResumeJob Checks the changes to the pause and resume settings on the Data Insight servers, and accordingly pauses or resumes scans Every 1 minute Collector, WinNAS w/agent    
DataRetentionJob Enforces the data retention policies which include archiving old index segments, deleting old segments, indexes for deleted objects, old system events, and old alerts 12:00:00 AM 1st and 15th of the Month MS, Indexer    
IndexVoldbJob Indexes all volume usage information received from all collectors and Windows File Server agents Every 10 minutes MS Consumes voldb files <deviceid>_<timestamp>_voldb.sqlite  
SendNodeInfoJob Sends the node information, such as the operating system, Data Insight version running on the worker node to the MS Every 1 minute All   Information is sent only if it differs from information in the current DB
EmailAlertsJob Sends email alerts that are created when any Data Insight policy is violated Every 15 minutes MS Reads <datadir>\alerts\alerts.db SMTP settings need to be configured successfully
EmailEventsJob Sends product event email notifications as configured in Data Insight.  Every 15 minutes MS Reads <datadir>\events\events_email.db SMTP settings need to be configured successfully
LocalUserScanJob Scans the local users and groups on the configured devices 12:30:00 AM - Daily Collector, WinNAS w/agent Generates <nodeid>_<timestamp>_local_users.zip files The local_users.zip files are transferred to MS via FileTransferJob to be consumed by ADScanJob
UpdateCustodiansJob Downloads custodian updates from each indexer Every 1 minute MS    
CompactJob Compresses the attic and err folders in <datadir>\collector, <datadir>\scanner, and <datadir>\indexer 5:00:00 AM - Daily All   Uses the Windows compression feature to set the "compression" attribute for the folders
CompactJob_Report Compresses completed reports under <datadir>\console\reports\reportruns folder Every 1 hour MS    
MergeStatsJob Aggregates the published statistics into a primary <datadir>\stats\statistics.db, where they are rolled up into hourly, daily, and weekly buckets Every 15 minutes Collector, WinNAS w/agent For WinNAS nodes, this job creates a statistics file to be sent over to its collector node.

 The collector node consolidates stats from all its WinNAS nodes
 
StatsJob_Index_Size Publishes statistics related to index size 6:00:00 AM - Daily Indexer Updates <datadir>\stats\lstats.db file The information is used to display the filer statistics on the Data Insight Management Console
StatsJob_Latency Records filer latency statistics for NetApp filers Every 15 minutes Collector    
SyncScansJob Collector: Gathers all scan progress information from WinNAS nodes

MS: Gathers all scan progress information from Collectors
Every 1 minute MS, Collector   The scan status is displayed on the Settings > Scanning Dashboard > In-progress Scans tab of the Management Console
RTWBJob Consumes copies of the audit files generated from the CollectorJob to use in Real Time Policies Every 1 minute Indexer Consumes wb_audit_<namespace>_<msuId>_<timestamp>.sqlite files generated by the CollectorJob for Real Time Policies

get_rt_alerts.exe writes alerts_<nodeId>_<timestamp>_wbrt.sqlite files to the alerts\outbox folder
Generated alert files are transferred to the MS via FileTransferJob and processed on the MS by the PolicyJob_alerts job to create the alerts
EVGetCategoriesJob Pulls the retention categories of all the configured EV servers Every 1 hour MS Updates <datadir>\conf\workflow\steps\ev\EV.db  
SFAuditJob Collects audit logs from the InfoScale file system servers Every 10 minutes Collector Generates <datadir>\unix\vxfs_<gen>_<timestamp>_<filerId>.sqlite files Enabled only on collector nodes that have VxFS filers configured
SFUpdateJob Sends Data Insight configuration changes to the InfoScale file system server Every 10 minutes Collector   Updates VxFS server with share configuration changes
SPEnableAuditJob Enables auditing for site collections monitored by Data Insight Every 10 minutes Collector    
SPAuditJob Collects the audit logs from the SQL Server database for SharePoint web applications and generates SharePoint audit databases Every 2 hours Collector Generates audit_sharepoint_<siteCollId>_<timestamp>.sqlite Enabled only if a SharePoint web application is configured on the collector
SPScannerJob Scans the site collections at the scheduled time 11:00:00 PM - Daily Collector Generates scan_sharepoint_ files Enabled only if a SharePoint web application is configured on the collector
NFSUserMappingJob Maps every UID in raw audit files for NFS and VxFS to an ID generated for use in Data Insight (either a SID of local user or NIS/LDAP user) Every 1 hour Collector Processes fpolicynfs_<timestamp>_<gen>_<deviceId>.sqlite and vxfs_<gen>_<timestamp>_<deviceId>.sqlite Enabled only on collector nodes that have VxFS or NFS filers configured
MsuAuditJob Computes the storage statistics for each MSU folder on the indexer. The statistics computed per MSU are activity index files size, hash files size, segment files size, index db size, and total size 6:00:00 PM Saturday Indexer Stores statistics in <datadir>\stats\msu_stats.db file This information is used to show MSU-level statistics in the GUI. Also useful for monitoring historical growth
MsuMigrationJob This job starts/resumes any filer migration initiated by user 7:00:00 AM - Daily & immediately after migration is executed Indexer Copies all MSU folders from source Indexer to destination Indexer  
ProcessEventsJob Processes all internal Data Insight events received from worker nodes and adds them to the events.db files on the Management Server Every 1 minute MS Processes _events.sqlite files in inbox

Adds events to the date-wise file on the MS, stored in <data>\events\yyyy-mm-dd_events.db
 
ProcessEventsJob_SE Consumes the scanerror files and inserts the information about failed paths into a per-share file on MS Every 1 minute MS Input: <datadir>\inbox\scanerror_<msuId>_<timestamp>_snap.sqlite

Output: <data>/console/stat/scan_errors/msu<id>.db
 
SpoolEventsJob Spools events on worker nodes to be sent to Management Server Every 1 minute All Generates _events.sqlite files  
WFStatusMergeJob Merges the workflow and action status updates for remediation workflows (DLP Incident Remediation, Entitlement Reviews, Ownership Confirmation), Enterprise Vault archiving, and custom actions. Updates the master workflow database with the details so that users can monitor the progress of workflows and actions from the Management Console Every 5 minutes MS    
UpdateConfigJob Updates configuration changes to the local configuration database on the worker node Every 1 minute All Updates <datadir>\conf\config.db.<v> to the latest version  
UpdateConfigJob_lic Updates the portal license in the configuration database Every 24 hours MS    
DeviceAuditJob Fetches audit records from Hitachi HNAS EVS configured in Data Insight Every 1 second Collector    
HNasEnableAuditJob Enables SACLs for the shares when a Hitachi HNAS filer is configured Every 10 minutes Collector    
VolInfoJob Collects volume utilization information from WinNAS filers Every 24 hours WinNAS w/agent Generates <deviceid>_<timestamp>_voldb.sqlite files The voldb files are sent over to MS and consumed by IndexVoldbJob
WorkflowActionExecutionJob Consumes the request file from the MS when a Records Classification workflow is submitted from the Portal. The request file contains the paths on which an Enterprise Vault action is submitted. When the action on the paths is complete, the job updates the request file with the status of the action Every 1 hour MS, Portal    
UserRiskJob Updates hashes used to compute the user risk score 2:00:00 AM - Daily Indexer    
UpdateWFCentralAuditDBJob Updates workflow audit information Every 1 minute MS Updates <datadir>\workflow\workflow_audit.db  
TagsConsumerJob Consumes the CSV file containing tags for paths. Imports the attributes into Data Insight and generates a tags.db file for each file system object 11:00:00 PM - Daily MS    
KeyRotationJob Changes encryption keys used by the nodes On Demand MS    
RiskDossierJob Computes the number of files accessible and number of sensitive files accessible to each user on each MSU 11:00:00 PM - Daily Indexer    
ClassifyInputJob Processes the classification requests from the Data Insight Management Console and from reports for the consumption by the bookkeeping DB Every 10 seconds MS    
ClassifyBatchJob Splits classification batch input databases for consumption by the scanner on the collector Every 1 minute Indexer    
ClassifyIndexJob Updates index database for each MSU with classification tags and updates the bookkeeping DB Every 1 minute Indexer    
ClassifyMergeStatusJob Pulls classification update status files from indexer and updates the global bookkeeping DB which is used to display high level classification status in the console Every 1 minute MS    
CancelClassifyRequestJob Fetches the list of classification requests that are canceled and distributes this request between Data Insight nodes Every 20 seconds Classification   Before classifying files, all the classification jobs consult this list to identify the requests that are marked for cancellation. If they observe any canceled request in the new request that is submitted for classification, then that request is deleted
CloudDeviceAuditJob Collects audit data from configured Box account Every 70 seconds Collector    
CloudDeviceAuditJob_sponline Collects the SharePoint Online site collection audit data Every 70 seconds Collector    
CloudDeviceAuditJob_onedrive Collects the audit records for OneDrive accounts Every 70 seconds Collector    
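
The File Format entries in the table above are underscore-separated patterns. As a minimal sketch, the following Python code splits a collector input name of the form fpolicy_<timestamp>_<gen>_<deviceId>.sqlite into its fields. It assumes that no field contains an underscore, and it keeps every value as text because the table does not specify the timestamp or ID formats. The names RawEventFile and parse_raw_event_name and the sample file name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class RawEventFile(NamedTuple):
    prefix: str      # for example "fpolicy"
    timestamp: str   # kept as text; the table does not specify the timestamp format
    gen: str
    device_id: str


# Pattern for <prefix>_<timestamp>_<gen>_<deviceId>.sqlite.
# Assumption: no field contains an underscore.
_RAW_EVENT_RE = re.compile(
    r"^(?P<prefix>[^_]+)_(?P<timestamp>[^_]+)_(?P<gen>[^_]+)_(?P<device_id>[^_]+)\.sqlite$"
)


def parse_raw_event_name(file_name: str) -> Optional[RawEventFile]:
    """Split a raw event file name into its fields, or return None if it does not match."""
    match = _RAW_EVENT_RE.match(file_name)
    if match is None:
        return None
    return RawEventFile(**match.groupdict())


if __name__ == "__main__":
    # Made-up sample name, for illustration only.
    print(parse_raw_event_name("fpolicy_1696999999_2_17.sqlite"))
```

Returning None for a non-matching name lets a caller skip unrelated files in the same folder without handling exceptions.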

 

DataInsightWeb Service Jobs

 
Job Description Default Schedule Node Files Processed\Updated Comments
CustodianSummaryReportJob Periodically runs the custodian summary report, which is used to determine the custodians assigned in Data Insight for various resources. The output produced by this report is used in DLP Incident Remediation, Entitlement Review, and Ownership Confirmation workflows Daily MS    
HealthAuditReportJob Generates Health Audit Report 5:00:00 AM - Daily MS Generates latest output at <installdir>\log\health_audit  
PolicyJob_alerts Merges all of the Real Time alerts files generated on the Indexers by RTWBJob and updates the alerts DB Every 10 seconds MS Consumes alerts_<nodeId>_<timestamp>_wbrt.sqlite files and generates alerts_wb_<nodeId>_<timestamp>_rt.sqlite files which are used to update the alerts_master.db The EmailAlertsJob then reads the data from the alerts_master.db and generates the email alerts
PolicyJob Evaluates the policies configured and raises alerts 12:00:00 AM - Daily MS    
PurgeReportsJob Deletes report outputs older than threshold configured 12:00:00 AM - Daily MS    
UpdateConfigJob Updates the configuration DB on the worker nodes based on changes made on the MS Every 1 minute All    
UserIndexJob_merge Consolidates user activity and permission map from all indexers Every 2 hours MS    
UserIndexJob_split Requests each Indexer for user activity and permission map via UserIndexJob_idx 12:00:00 AM - Daily Indexer    
UserRiskMergeJob Combines data from all MSUs into a single risk score value for each user and generates the User Risk report 6:00:00 AM - Daily MS Generates the  <datadir>\conf\userrisk_dashboard.db   

 

DataInsightWorkflow Service Jobs

 
Job Description Default Schedule Node Files Processed\Updated Comments
WFStepExecuterJob Processes actions for Enterprise Vault archiving, requests for permission remediation, and custom actions Every 5 minutes MS, Portal    
WFStepExecuterJob_im Processes Entitlement Review, DLP Incident Remediation, and Ownership Confirmation workflows.  Sends email reminders containing links to the remediation portal to the custodians at a specified interval Every 1 hour MS, Portal    
UpdateConfigJob Updates the workflow schedules based on configuration changes from the MS Every 1 minute All    
WFSpoolStatusJob Monitors workflow data and generates status DB with any new updates Every 5 minutes Portal    
FileTransferJob_WF Transfers workflow status DBs from Portal to MS Every 1 minute Portal    

 

DataInsightWatchdog Service Jobs

 
Job Description Default Schedule Node Files Processed\Updated Comments
SyncPerformanceJob Fetches performance related statistics from all nodes Every 30 minutes MS    
SystemMonitorJob Collects resources (disk usage, CPU, memory usage) from the node Every 1 minute All    
SystemMonitorJob_backlog Gathers statistics for unprocessed backlog files Every 1 hour All    
SystemMonitorJob_monitor_coredump Monitors <installdir>\dumps folder for core dumps from process crashes Every 1 hour All    
UpdateConfigJob Reconfigures watchdog jobs based on configuration updates from MS Every 1 minute All    

 

DataInsight Classification Service Jobs

 

 
Job Description Default Schedule Node Files Processed\Updated Comments
ClassifyFetchJob Processes classification input files and adds to the priority queues and tracks a list of files that could not be fetched Every 1 minute Classification Input: <datadir>\classification\inbox\<PRIORITY>_<CRID>_<BATCHID>_<NODEID>_<MSUID>_<TIMESTAMP>_snap<N>.csqlite files The input file contains the location where the actual file has been kept in the <datadir>\classification\content folder
ClassifyFetchPauseJob Refreshes the pause or resume status of fetch jobs as per the duration configured for content fetching Every 1 minute Classification    
CancelClassifyRequestJob Fetches the list of classification requests that are canceled and distributes this request between Data Insight nodes Every 20 seconds Classification   Before classifying files, all the classification jobs consult this list to identify the requests that are marked for cancellation. If they observe any canceled request in the new request that is submitted for classification, then that request is deleted
ClassifyJob Checks the <datadir>\classification\inbox folder for input files submitted for classification and adds them to three separate priority queues. It picks a file from the highest-priority queue in FIFO order, and starts classifying content using Arctera Information Classifier. All files in that input file are submitted for classification. Once all paths in the file have been classified, the result of the classification and any resulting errors are written to a database in the <datadir>\classification\outbox folder Every 1 minute Classification    
UpdateVICPolicyMapJob Ensures that the Data Insight configuration database is in sync with the Classification Policy Manager Every 10 seconds MS    
UpdateConfigJob Reconfigures classification jobs based on the configuration changes made on the Management Server Every 1 minute Classification    
CreateFeaturesJob Checks if sufficient classified data is available for the supervised learning algorithm to create predictions (training sets) 12:01:00 AM Sunday Indexer   The job has a multi-threaded execution framework which executes actions in parallel. The default thread count is 2. You can set the value using the matrix.classification.sl.features.threads property at global or node level.

NOTE: The node level property always takes precedence over the global level property
PredictJob Copies the prediction files from the temp output directory to a classification outbox 5:00:00 AM Sunday Indexer    
SLCreateBatchesJob Creates batches of files for the consumption of Arctera Information Classifier. These files are classified with high priority Every 2 hours Indexer    
ClassifyManageWorkloadJob Checks the classification or workload folder on primary Classification Server and counts batches based on their priority. If the workload needs to be distributed, the job fetches a list of servers in its pool and fetches the number of batches based on their priority in the classification or inbox folder. If the number of batches on any slave that have priority less than 10, then the job distributes the batches across that slave and copies them to the inbox on the secondary Classification Server Every 1 minute Classification (Master)    
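
The ClassifyJob row above describes three priority queues that are drained from the highest priority first, in FIFO order within each queue. The following Python sketch illustrates that selection rule only; it is not the product's implementation. The priority labels "high", "medium", and "low" and the input file names are hypothetical, since the article does not name them.

```python
from collections import deque


class PriorityFifo:
    """Several FIFO queues, always drained from the highest priority first."""

    def __init__(self, priorities):
        # `priorities` must be ordered from highest to lowest.
        # dict preserves insertion order, so iteration follows that order.
        self._queues = {priority: deque() for priority in priorities}

    def add(self, priority, input_file):
        """Append an input file to the queue for its priority."""
        self._queues[priority].append(input_file)

    def next_file(self):
        """Return the oldest file from the highest non-empty queue, or None."""
        for queue in self._queues.values():
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    # Hypothetical labels and file names, for illustration only.
    queues = PriorityFifo(["high", "medium", "low"])
    queues.add("low", "batch-a")
    queues.add("high", "batch-b")
    queues.add("high", "batch-c")
    print(queues.next_file())
```

Keeping one deque per priority makes both adding and picking a file cheap, and arrival order within a priority is preserved without storing timestamps.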

Reference

How Data Insight works, and the ports that it uses for communication with devices

 
