About scanning and event monitoring

Data Insight scans the file system hierarchy to collect information related to permissions and file system metadata from the monitored storage devices.

Event monitoring is an operation that keeps track of the access events happening on a file system. If Data Insight detects an event such as a create, a write, or a change to file system ACL-level permissions during event monitoring, it uses this information to perform incremental scans on the paths for which the events are reported.

Data Insight uses asynchronous APIs, such as FPolicy for NetApp filers, the CEE framework for EMC filers, and a filter driver for Windows File Servers, to collect access events.

By default, Data Insight initiates event monitoring every 2 hours. You can disable event monitoring for individual storage devices. To turn off event monitoring, navigate to Settings > Filers. On the edit page for the filer, clear the Enable file system event monitoring option.

Note:

Data Insight scans only share-level permission changes when event monitoring is turned off.

To fetch file system metadata, Data Insight performs the following types of scans:

Full scan

During a full scan, Data Insight scans the entire file system hierarchy. A full scan is typically run after a storage device is first added to the Data Insight configuration. Full scans can run for several hours, depending on the size of the shares. After the first full scan, you can perform full scans less frequently based on your preference. Ordinarily, you need to run a full scan only to scan those paths that might have been modified while file system auditing was not running for any reason.

In case of large shares, a full scan can take a long time to complete. To reduce the time it takes to scan certain large shares, you can configure parallel scanning, which uses multiple threads to scan the share. By default, the single-thread scanner runs on the shares regardless of their size. To use the parallel scanner feature, you must configure it for each share.

Note:

Parallel scanner supports only full scans on CIFS.
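The difference between a single-thread scan and a parallel scan can be illustrated with a short Python sketch. This is not Data Insight's scanner; the function names scan_directory and parallel_scan and the threads parameter are assumptions made for illustration. The sketch walks a directory tree level by level and scans the directories of each level concurrently.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_directory(path):
    """Return (file_records, subdirectories) for a single directory."""
    records, subdirs = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
                records.append((entry.path, size))
    return records, subdirs


def parallel_scan(root, threads=4):
    """Collect (path, size) for every file under root using a thread pool.

    Hypothetical illustration: threads=1 behaves like a single-thread scan.
    """
    results = []
    pending = [root]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        while pending:
            next_level = []
            # Scan all directories of the current level concurrently.
            for records, subdirs in pool.map(scan_directory, pending):
                results.extend(records)
                next_level.extend(subdirs)
            pending = next_level
    return results
```

With threads=1 the function degenerates to a sequential walk, so the two modes return the same set of files and differ only in how the work is distributed.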

You can configure parallel scanning for a Collector or a filer from the Data Insight Servers > Advanced Settings tab.

See Configuring advanced settings.

To configure multiple threads to scan a share, see Add New Share/Edit Share options.

By default, each Collector node initiates a full scan at 7:00 P.M. on the last Friday of each month.

For SharePoint, the default scan schedule is 11:00 P.M. each night.
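To see which date the default Collector schedule (7:00 P.M. on the last Friday of each month) falls on, the date arithmetic can be sketched as follows. This Python snippet is only an illustration and is not part of Data Insight; the function name last_friday_7pm is an assumption.

```python
import calendar
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday() numbers Monday as 0, so Friday is 4


def last_friday_7pm(year, month):
    """Return 7:00 P.M. on the last Friday of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    last_date = datetime(year, month, last_day, 19, 0)
    # Step back from the last day of the month to the nearest Friday.
    days_back = (last_date.weekday() - FRIDAY) % 7
    return last_date - timedelta(days=days_back)
```

For example, last_friday_7pm(2024, 1) returns January 26, 2024 at 19:00, and last_friday_7pm(2024, 5) returns May 31, 2024 at 19:00, because that month ends on a Friday.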

Figure: Scanner - Single thread and parallel threads

Incremental scan

During an incremental scan, Data Insight re-scans only those paths of a share that have been modified since the last full scan. It does so by monitoring incoming access events to see which paths have had a create, write, or security event since the last scan. Incremental scans are much faster than full scans.
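The path selection described above can be sketched in Python. The AccessEvent type, the event kind labels, and the paths_to_rescan function are hypothetical names chosen for illustration; they do not reflect Data Insight's internal data model.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed labels for the event types that trigger a re-scan.
TRIGGER_EVENTS = {"create", "write", "security"}


@dataclass(frozen=True)
class AccessEvent:
    path: str
    kind: str
    timestamp: datetime


def paths_to_rescan(events, last_scan):
    """Return the sorted, de-duplicated paths that had a trigger event after last_scan."""
    return sorted({
        event.path
        for event in events
        if event.kind in TRIGGER_EVENTS and event.timestamp > last_scan
    })
```

Read events and events older than the last scan are ignored, so the resulting list is usually a small fraction of the share, which is why an incremental scan finishes faster than a full scan.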

Note:

In Data Insight versions earlier than 5.0, incremental scans were triggered only when Data Insight detected events during event monitoring.

Incremental scans are not available for SharePoint web applications and for the cloud-based storage from Box.

By default, an incremental scan is scheduled once every night at 7:00 P.M. You can initiate an on-demand incremental scan manually by using the command line utility scancli.exe. It is recommended that you run the IScannerJob before you execute the utility.

See Scheduled Data Insight jobs.

Path re-confirmation scan

After Data Insight completes indexing the full scan data, it computes the paths that no longer seem to be present on the file system. A re-confirmation scan confirms whether a path that is present in the indexes, but appears to be no longer present on the file system, is indeed deleted. A re-confirmation scan is automatically triggered when Data Insight detects potentially missing paths on the file system during a full scan.

You can turn off the re-confirmation scan for any Indexer by using the Advanced Setting for that Indexer. When the re-confirmation scan is turned off, Data Insight removes the missing paths from the indexes immediately, without carrying out a re-confirmation.

See Configuring advanced settings.
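The re-confirmation logic can be summarized with a small Python sketch. It is a simplified model, not the product's implementation: reconcile_index, the reconfirm flag, and the path_exists callback are hypothetical names, and real indexes hold far more than a set of paths.

```python
def reconcile_index(indexed_paths, full_scan_paths, path_exists, reconfirm=True):
    """Return (updated_index, removed_paths) after a full scan.

    indexed_paths: paths currently held in the index.
    full_scan_paths: paths seen by the latest full scan.
    path_exists: callable that checks a single path on the file system.
    reconfirm: when False, missing paths are removed without a second check.
    """
    indexed = set(indexed_paths)
    # Paths that are indexed but were not seen by the full scan.
    potentially_missing = indexed - set(full_scan_paths)
    if reconfirm:
        # Remove only the paths that are confirmed to be gone.
        removed = {p for p in potentially_missing if not path_exists(p)}
    else:
        removed = potentially_missing
    return indexed - removed, removed
```

With reconfirm=True, a path that the full scan missed for a transient reason stays in the index as long as the second check finds it; with reconfirm=False, it is dropped straight away.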

At a global level, full scans are scheduled for individual Collectors or Windows File Server agents. Table: Entities having configurable scan schedules lists all the entities for which you can schedule a full scan.

Table: Entities having configurable scan schedules

Entity: Collector or Windows File Server agents

Scan schedule settings location: Settings > Data Insight Servers > Advanced Setting > File System Scanner settings.

Scope: Applies to all the storage devices associated with the Collector, for which a schedule is defined.

Details: See Configuring advanced settings.

Entity: Filers, web applications, and cloud sources

Scan schedule settings location: In case of a filer, Settings > Filers > Add New Filer. In case of a SharePoint web application, Settings > SharePoint Web Applications > Add SharePoint Web Application. In case of a cloud storage account, Settings > Cloud Sources > Add New Cloud Source. Note: You can also configure scanning at the time of editing filers, web applications, and cloud sources.

Scope: Applies to filers, SharePoint web applications, or cloud sources for which a schedule is defined. This setting overrides the scan schedule defined for the Collector associated with the filer, web application, or cloud source.

Details: See Adding filers. See Adding web applications. See Configuring Box monitoring in Data Insight.

Entity: Shares and site collections

Scan schedule settings location: Settings > Filers > Monitored Shares > Add New Share. Settings > SharePoint Web Applications > Monitored Site Collections > Add Site Collection. Note: You can also configure scanning at the time of editing shares and site collections.

Scope: Applies to the entire share or site collection for which a schedule is defined. Overrides the scan schedules defined for the filer or the web application associated with the share or the site collection.

Details: See Adding shares. See Adding site collections.
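The override order described in the table (a share or site collection schedule takes precedence over the filer or web application schedule, which in turn takes precedence over the Collector schedule) can be modeled as a simple lookup. The following Python sketch is illustrative only; effective_schedule and its parameters are hypothetical, and schedules are treated as opaque strings.

```python
from typing import Optional


def effective_schedule(
    collector: Optional[str],
    filer: Optional[str] = None,
    share: Optional[str] = None,
) -> Optional[str]:
    """Return the most specific schedule that is defined, or None if there is none."""
    # Check the most specific level first: share, then filer, then Collector.
    for schedule in (share, filer, collector):
        if schedule is not None:
            return schedule
    return None
```

For example, effective_schedule("monthly", filer="weekly") returns "weekly", and adding share="daily" makes the share-level schedule win.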

You can override all the full scan schedules and initiate an on-demand full scan for configured shares or site collections. See Managing shares.

Sometimes for maintenance and diagnostic purposes, you may need to disable all the scans. You can disable all scans:

If you disable scanning for any device, you will not be able to view any permissions data for that device. However, you may still see some stale metadata, such as size and permissions, that was collected before scanning was disabled. If you run a report on the paths for which scanning is disabled, you may get a report with stale data.

You can specify pause schedules for both full and incremental scans to indicate when scanning should not be allowed to run. You can configure a pause schedule from the Settings > Data Insight Servers > Advanced Settings page. To know more about configuring a pause schedule, see Configuring advanced settings.
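As a rough illustration of how a pause window behaves, the following Python sketch checks whether a given time of day falls inside a window. The function is_paused and the start/end representation are assumptions for this example only; the actual pause schedule options are described in Configuring advanced settings.

```python
from datetime import time


def is_paused(now, start, end):
    """Return True if the time of day `now` falls inside the pause window [start, end)."""
    if start <= end:
        # Window within a single day, for example 09:00 to 17:00.
        return start <= now < end
    # Window that crosses midnight, for example 22:00 to 06:00.
    return now >= start or now < end
```

For a hypothetical window from 22:00 to 06:00, is_paused(time(23, 30), time(22, 0), time(6, 0)) is True and is_paused(time(12, 0), time(22, 0), time(6, 0)) is False.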

You can view the details of the current and historical scan status for your entire environment from the scanning dashboard. To access the scanning dashboard, from the Data Insight Management Console, navigate to Settings > Scanning > Overview. To know more about the scanning dashboard, see Viewing the scanning overview.