Scheduled Data Insight jobs

Each Data Insight service performs several actions on a scheduled basis. These scheduled actions are called jobs. This section explains the function of the important jobs that run in the various services. The schedule for some jobs can be changed from the Advanced Settings tab of the Server details page.

The following processes run in the Data Insight Communication service.

Table: Communication service jobs

Job

Description

ADScanJob

Initiates the adcli process on the Management Server to scan the directory servers. Ensure the following:

  • The directory servers are added to the Data Insight configuration.

  • The credentials specified when adding the directory server have permissions to scan the directory server.

CollectorJob

Initiates the collector process to pre-process raw audit events received from storage devices. The job applies exclude rules and heuristics to generate audit files to be sent to the Indexers. It also generates change-logs that are used for incremental scanning.
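As a conceptual illustration only (not the actual Data Insight implementation), exclude rules of this kind can be modeled as glob patterns that raw audit events are filtered against before the audit files are generated; the rule patterns, event layout, and function name below are hypothetical:

```python
import fnmatch

# Hypothetical exclude rules: glob patterns for paths whose events are dropped.
EXCLUDE_RULES = ["*\\temp\\*", "*.tmp", "*\\~$*"]

def apply_exclude_rules(events, rules=EXCLUDE_RULES):
    """Keep only the audit events whose path matches none of the exclude patterns."""
    kept = []
    for event in events:
        path = event["path"]
        if not any(fnmatch.fnmatch(path, pattern) for pattern in rules):
            kept.append(event)
    return kept

events = [
    {"path": "\\\\filer\\share\\docs\\plan.docx", "op": "read"},
    {"path": "\\\\filer\\share\\temp\\scratch.tmp", "op": "write"},  # excluded
]
print(apply_exclude_rules(events))  # only the plan.docx event survives
```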

ChangeLogJob

The CollectorJob generates changelog files containing a list of changed paths, one per device, in the changelog folder. There can be multiple files with different timestamps for each device. The ChangeLogJob merges all changelog files for a device.
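A merge of this kind can be sketched as follows; this is a conceptual illustration under assumed data shapes (a dictionary of timestamped path lists per device), not the actual Data Insight implementation:

```python
# Conceptual sketch: merge the timestamped changelog files for one device
# into a single de-duplicated list of changed paths for incremental scanning.

def merge_changelogs(changelog_files):
    """changelog_files: {timestamp: [changed paths]} for a single device."""
    merged, seen = [], set()
    # Process files oldest-first so the merged order reflects change order.
    for timestamp in sorted(changelog_files):
        for path in changelog_files[timestamp]:
            if path not in seen:  # a path changed twice is listed once
                seen.add(path)
                merged.append(path)
    return merged

files = {
    "20240101-0100": ["/share/a.txt", "/share/b.txt"],
    "20240101-0200": ["/share/b.txt", "/share/c.txt"],
}
print(merge_changelogs(files))  # ['/share/a.txt', '/share/b.txt', '/share/c.txt']
```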

ScannerJob

Initiates the scanner process to scan the shares and site collections added to Data Insight.

Creates the scan database for each share that it scanned in the data\outbox folder.

IScannerJob

Initiates the incremental scan process for shares or site collections, scanning only the paths that have changed on those devices since the last scan.

CreateWorkflowDBJob

Runs only on the Management Server. It creates the database containing the data for DLP Incident Management, Entitlement Review, and Ownership Confirmation workflows based on the input provided by users.

DlpSensitiveFilesJob

Retrieves policies and sensitive file information from Data Loss Prevention (DLP).

FileTransferJob

Transfers files from the data\outbox folder on a node to the inbox folder of the appropriate node.

FileTransferJob_content

Runs every 10 seconds on the Windows File Server.

Routes content files and CSQLite files to the assigned Classification Server.

FileTransferJob_Evt

Sends Data Insight events database from the worker node to the Management Server.

FileTransferJob_WF

Transfers workflow files from Management Server to the Portal service.

FileTransferJob_classify

Runs on all Data Insight nodes once every minute.

It distributes the classification events between Data Insight nodes.

IndexWriterJob

Runs on the Indexer node; initiates the idxwriter process to update the Indexer database with scan (incremental and full), tags, and audit data.

After this process runs, you can view newly added or deleted folders and recent access events on shares on the Management Console.

ActivityIndexJob

Runs on the Indexer node; it updates the activity index every time the index for a share or site collection is updated.

The Activity index is used to speed up the computation of ownership of data.

IndexCheckJob

Verifies the integrity of the index databases on an Indexer node.

PingHeartBeatJob

Sends the heartbeat every minute from the worker node to the Data Insight Management Server.

PingMonitorJob

Runs on the Management Server. It monitors the heartbeat from the worker nodes and sends notifications if it does not receive a heartbeat from a worker node.

SystemMonitorJob

Runs on the worker nodes and on the Management Server. Monitors the CPU, memory, and disk space utilization at a scheduled interval. The process sends notifications to the user when the utilization exceeds a certain threshold value.

DiscoverSharesJob

Discovers shares or site collections on the devices for which you have selected the Automatically discover and monitor shares on this filer check box when configuring the device in Data Insight.

ScanPauseResumeJob

Checks the changes to the pause and resume settings on the Data Insight servers, and accordingly pauses or resumes scans.

DataRetentionJob

Enforces the data retention policies, which include archiving old index segments and deleting old segments, indexes for deleted objects, old system events, and old alerts.

IndexVoldbJob

Runs on the Management Server and executes the command voldb.exe --index, which consumes the device volume utilization information received from the various Collector nodes.

SendNodeInfoJob

Sends the node information, such as the operating system and the Data Insight version running on the node, to the Management Server. You can view this information on the Data Insight Server > Overview page of the Management Console.

EmailAlertsJob

Runs on the Management Server and sends email notifications as configured in Data Insight. The email notifications pertain to events happening in the product, for example, a directory scan failure. You can view them on the Settings > System Overview page of the Management Console.

LocalUsersScanJob

Runs on the Collector node that monitors configured file servers and SharePoint servers. In the case of a Windows File Server that uses an agent to monitor access events, it runs on the node on which the agent is installed.

It scans the local users and groups on the storage devices.

UpdateCustodiansJob

Runs on the Indexer node and updates the custodian information in the Data Insight configuration.

CompactJob

Compresses the attic and err folders under the <datadir>\collector, <datadir>\scanner, and <datadir>\indexer folders. The process uses the Windows compression feature to set the "compression" attribute on the folders.

The job also deletes stale data that is no longer used.

Compact_Job_Report

Compresses the folders that store report output.

StatsJob

On the Indexer node, it records index size statistics to lstats.db. The information is used to display the filer statistics on the Data Insight Management Console.

MergeStatsJob

Rolls up the published statistics into hourly, daily, and weekly periods. On the Collector nodes for Windows File Servers, the job consolidates statistics from the filer nodes.

StatsJob_Index_Size

Publishes statistics related to the size of the index.

StatsJob_Latency

On the Collector node, it records the filer latency statistics for NetApp filers.

SyncScansJob

Gets current scan status from all Collector nodes. The scan status is displayed on the Settings > Scanning Dashboard > In-progress Scans tab of the Management Console.

SPEnableAuditJob

Enables auditing for site collections (within the web application) that have been added to Data Insight for monitoring.

By default, the job runs every 10 minutes.

SPAuditJob

Collects the audit logs from the SQL Server database for a SharePoint web application and generates SharePoint audit databases in Data Insight.

SPScannerJob

Scans the site collections at the scheduled time and fetches data about the document and picture libraries within a site collection and within the sites in the site collection.

NFSUserMappingJob

Maps each user ID (UID) and group ID in the raw audit files received from NFS and VxFS to an ID generated for use in Data Insight.
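The mapping behavior described above can be sketched as a simple lookup table that hands out a new internal ID the first time a UNIX ID is seen; the class and field names here are hypothetical and only illustrate the idea:

```python
# Illustrative sketch only, not the Data Insight implementation: assign a
# stable internal ID to each NFS/VxFS UID or GID seen in the audit stream.

class IdMapper:
    def __init__(self):
        self._map = {}
        self._next_id = 1

    def internal_id(self, unix_id):
        """Return the generated ID for a UID/GID, creating one on first sight."""
        if unix_id not in self._map:
            self._map[unix_id] = self._next_id
            self._next_id += 1
        return self._map[unix_id]

mapper = IdMapper()
print(mapper.internal_id(1000))  # 1
print(mapper.internal_id(1001))  # 2
print(mapper.internal_id(1000))  # 1 -- the same UID always maps to the same ID
```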

MsuAuditJob

Collects statistics information for all indexes on the Indexer node.

MsuMigrationJob

Checks whether a filer migration is in progress and carries it out.

ProcessEventsJob

Processes all the Data Insight events received from worker nodes and adds them to the yyyy-mm-dd_events.db file on the Management Server.

ProcessEventsJob_SE

Processes scan error files.

SpoolEventsJob

Spools events on worker nodes to be sent to the Management Server.

WFStatusMergeJob

Merges the workflow and action status updates for remediation workflows (DLP Incident Remediation, Entitlement Reviews, and Ownership Confirmation), Enterprise Vault archiving, and custom actions, and updates the master workflow database with the details so that users can monitor the progress of workflows and actions from the Management Console.

UpdateConfigJob

Reconfigures jobs based on the configuration changes made on the Management Server.

DeviceAuditJob

Fetches the audit records from the Hitachi NAS EVS that are configured with Data Insight.

By default, this job runs every 5 seconds.

HNasEnableAuditJob

Enables the Security Access Control Lists (SACLs) for the shares when a Hitachi NAS filer is added.

By default, this job runs every 10 minutes.

WorkflowActionExecutionJob

This job reads the request file created on the Management Server when a Records Classification workflow is submitted from the Portal. The request file contains the paths on which an Enterprise Vault action is submitted. When the action on the paths is complete, the job updates the request file with the status of the action.

By default, this job runs every hour.

UserRiskJob

Runs on each Indexer. The job updates hashes used to compute the user risk score.

By default, the job runs at 2:00 A.M. every day.

UpdateWFCentralAuditDBJob

Runs only on the Management Server. It is used to update the workflow audit information in <DATA_DIR>/workflow/workflow_audit.db.

By default, this job runs every minute.

TagsConsumerJob

Parses the CSV file containing tags for paths. Imports the attributes into Data Insight and creates a Tags database for each filesystem object.

By default, this job runs once every day.

KeyRotationJob

Run the job on demand to change the encryption keys; it is not an automatically scheduled job.

It is recommended to run this job after all the Data Insight servers, including the Windows File Server agent nodes, are upgraded to 5.2.

If you want to run the KeyRotationJob without upgrading all the servers, then after the job is executed and the configuration database is replicated to the servers that have not been upgraded, restart all services on those servers.

RiskDossierJob

Runs on each Indexer and computes the number of files and the number of sensitive files accessible to each user on each share.

This job runs every day at 11:00 P.M. by default.

ClassifyInputJob

Runs every 10 seconds on the Management Server.

The job processes the classification requests from the Data Insight console and from reports, for consumption by the bookkeeping database.

ClassifyBatchJob

Runs every minute on the Indexer.

The job splits the classification batch input databases for the scanner's consumption, which are later pushed to the Collector.

ClassifyIndexJob

Runs once every minute on the Indexer node.

Updates the index with classification tags and also updates the status in the bookkeeping database.

ClassifyMergeStatusJob

Runs once every minute on the Management Server.

The job processes the files with the classification update status that are received from each Indexer. These files are automatically created on the Indexer whenever updates are available. It also updates the global bookkeeping database that is used to show the high-level classification status on the Console.

The following processes run in the Data Insight WatchDog service.

Table: WatchDog service jobs

Job

Description

SyncPerformanceStatsJob

Runs only on the Management Server. Fetches performance-related statistics from all other servers.

SystemMonitorJob

Gathers statistics such as disk usage, CPU usage, and memory usage.

SystemMonitorJob_backlog

Gathers statistics for unprocessed backlog files.

UpdateConfigJob

Reconfigures its own jobs based on configuration updates from the Management Server.

The following processes run in the Data Insight Workflow service.

Table: Workflow service jobs

Job

Description

WFStepExecutorJob

Processes actions for Enterprise Vault archiving, requests for permission remediation, and custom actions configured in Data Insight.

WFStepExecutorJob_im

Processes workflows of type Entitlement Reviews, DLP Incident Remediation, and Ownership Confirmation. It also sends email reminders containing links to the remediation portal to the custodians at a specified interval.

UpdateConfigJob

Updates its schedules based on the configuration changes made on the Management Server.

WFSpoolStatusJob

Reads the workflow data every minute and, if there are any new updates in the last minute, creates a status database with the new updates.

FileTransferJob_WF

Transfers workflow status databases from the Self-Service Portal nodes to the Management Server.

The following processes run in the Data Insight Webserver service.

Table: Webserver service jobs

Job

Description

CustodianSummaryReportJob

Periodically runs the custodian summary report, which is used to determine the custodians assigned in Data Insight for various resources. The output produced by this report is used in DLP Incident Remediation, Entitlement Review, and Ownership Confirmation workflows.

HealthAuditReportJob

Periodically creates a report summarizing the health of the entire deployment, and stores it in the log/health_audit folder on the Management Server. The report helps Veritas Support troubleshoot issues with your setup.

PolicyJob

Evaluates configured policies in the system and raises alerts.

PurgeReportsJob

Deletes older report outputs.

UpdateConfigJob

Updates configuration database on the worker nodes based on the configuration changes made on the Management Server.

UserIndexJob_merge

Consolidates user activity and permission map from all indexers.

UserIndexJob_split

Requests each Indexer for user activity and permission map.

UserRiskMergeJob

This job runs on the Management Server. Its default schedule is 6:00 A.M. every day. The job combines data from all MSUs into a single risk score value for each user. This job creates the userrisk_dashboard.db in the DATA_DIR\conf folder.

The following processes run in the Data Insight Classification service.

Table: Classification service jobs

Job

Description

ClassifyFetchJob

Runs every minute on the server that is assigned the role of a Classification Server.

It searches the classification/inbox folder for the input files and adds them to the priority queues. One input file can result in multiple snapshots with the name <PRIORITY>_<CRID>_<BATCHID>_<NODEID>_<MSUID>_<TIMESTAMP>_snap<N>.csqlite. The input file contains the location where the actual file has been kept in the classification/content folder. The job also keeps a list of files that could not be fetched.

Note:

Error logs are created in the <Install directory>/log/fetch folder.
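The snapshot naming pattern quoted above can be broken into its fields with a short parser; the pattern is taken from the description, but the parsing code and the sample field values are purely illustrative:

```python
import re

# Fields follow <PRIORITY>_<CRID>_<BATCHID>_<NODEID>_<MSUID>_<TIMESTAMP>_snap<N>.csqlite
SNAP_RE = re.compile(
    r"^(?P<priority>[^_]+)_(?P<crid>[^_]+)_(?P<batchid>[^_]+)_"
    r"(?P<nodeid>[^_]+)_(?P<msuid>[^_]+)_(?P<timestamp>[^_]+)_snap(?P<n>\d+)\.csqlite$"
)

def parse_snapshot_name(name):
    """Return the snapshot's fields as a dict, or None if the name doesn't match."""
    m = SNAP_RE.match(name)
    return m.groupdict() if m else None

# Sample values are made up for illustration.
print(parse_snapshot_name("1_42_7_3_12_20240101120000_snap2.csqlite"))
```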

ClassifyFetchPauseJob

Runs once every minute on any node that acts as the Classification Server.

Refreshes the pause or resume status of fetch jobs as per the duration configured for content fetching.

CancelClassifyRequestJob

Runs every 20 seconds in Communication Service and Classification Service.

Fetches the list of classification requests that are canceled and distributes this list among the Data Insight nodes.

Before classifying files, all the classification jobs consult this list to identify the requests that are marked for cancellation. If a newly submitted classification request appears on the cancellation list, that request is deleted.

ClassifyJob

Runs once every minute on any node that acts as a Classification Server.

Checks the classification/inbox folder for input files submitted for classification and adds them to three separate priority queues. It picks a file from the highest-priority queue in FIFO order and starts classifying its content using the Veritas Information Classifier. All files in that input file are submitted for classification. Once all paths in the file have been classified, the results of the classification and any resulting errors are written to a database in the classification/outbox folder.
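The queueing behavior described above can be sketched as three FIFO queues consulted in priority order; the queue labels and file names below are hypothetical, and this is only a conceptual model of the scheduling rule, not the product's code:

```python
from collections import deque

# Three priority queues; 1 is the highest priority.
queues = {1: deque(), 2: deque(), 3: deque()}

def submit(priority, input_file):
    queues[priority].append(input_file)

def next_input_file():
    """Pick from the highest-priority non-empty queue, FIFO within a queue."""
    for priority in sorted(queues):
        if queues[priority]:
            return queues[priority].popleft()
    return None  # nothing pending

submit(2, "batch_a.csqlite")
submit(1, "batch_b.csqlite")
submit(2, "batch_c.csqlite")
print(next_input_file())  # batch_b.csqlite -- priority 1 wins
print(next_input_file())  # batch_a.csqlite -- FIFO within priority 2
```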

UpdateVICPolicyMapJob

Runs every ten seconds on the Management Server.

It ensures that the Data Insight configuration database is in sync with the Classification Policy Manager.

UpdateConfigJob

Reconfigures jobs based on the configuration changes made on the Management Server.

CreateFeaturesJob

Runs once every week on Sunday at 12:01 A.M. on the Indexer.

Checks if sufficient classified data is available for the supervised learning algorithm to create predictions (training sets).

The job has a multi-threaded execution framework that executes actions in parallel. The default thread count is 2. You can set the value using the matrix.classification.sl.features.threads property at the global or node level.

Note:

The node level property always takes precedence over the global level property.
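The precedence rule in the note can be expressed as a two-level lookup, where a node-level value is used if present and the global value otherwise; the configuration layout and helper function here are hypothetical, with only the property name and its default of 2 taken from the text above:

```python
# Sketch of node-over-global precedence for the thread-count property.
THREADS_KEY = "matrix.classification.sl.features.threads"

GLOBAL_CONF = {THREADS_KEY: 2}  # product default thread count is 2
NODE_CONF = {
    "indexer01": {THREADS_KEY: 4},  # hypothetical node-level override
}

def effective_threads(node, key=THREADS_KEY):
    """Node-level property always takes precedence over the global property."""
    return NODE_CONF.get(node, {}).get(key, GLOBAL_CONF[key])

print(effective_threads("indexer01"))  # 4 -- node-level override wins
print(effective_threads("indexer02"))  # 2 -- falls back to the global default
```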

PredictJob

Runs once every week on Sunday at 5:00 A.M. on the Indexer.

Copies the prediction files from the temp output directory to a classification outbox.

SLCreateBatchesJob

Runs every 2 hours on the Indexer.

The job creates batches of files for the consumption of Veritas Information Classifier. These files are classified with high priority.

See Monitoring Data Insight jobs.