How to verify the integrity of indexes with the IndexCheck utility when using Enterprise Vault (EV).

Article: 100017351
Last Published: 2015-07-14
Product(s): Enterprise Vault

Problem

How to verify the integrity of indexes with the IndexCheck utility when using Enterprise Vault (EV).

Solution

Since Enterprise Vault (EV) 6.0 Service Pack 1 (SP1), EV has included a utility that helps verify the integrity of the 32-bit (AltaVista) indexes used by EV.

The tool is called INDEXCHECK.EXE and is located in the EV installation folder. By default this is C:\Program Files\Enterprise Vault on a 32-bit OS and C:\Program Files (x86)\Enterprise Vault on a 64-bit OS.

This utility can verify a single specified index or all the indexes within a specified index location. It has several checking modes, and the most useful is the 'exist' check. This check verifies index integrity at the file level: it confirms that all the files that should exist are present. This should be enough to detect most types of index corruption.

The utility must be run from a Command Prompt as follows:

INDEXCHECK -c exist -f <index location> -ignorewarnings > d:\evlogs\indexcheck.log

Use INDEXCHECK ? to get more information on the parameters you can use.

If you specify an index location (for example, E:\Indexes) rather than a particular index, all the indexes within that location are checked. This check typically takes about 2 to 3 seconds per index. Because it is only a file-level check, it does not open or search the index.
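Before scheduling a check of a whole index location, you can get a rough duration estimate from the 2 to 3 seconds per index figure above. The following is a minimal sketch. It assumes that every immediate subfolder of the index location is one index volume folder. The helper names count_index_volumes and estimate_exist_minutes are hypothetical.

```python
from pathlib import Path

# From the article: an 'exist' check typically takes about 2 to 3 seconds per index.
SECONDS_PER_INDEX = (2, 3)


def count_index_volumes(index_location: str) -> int:
    """Count the immediate subfolders of an index location.

    Assumption: every immediate subfolder is one index volume folder.
    """
    return sum(1 for entry in Path(index_location).iterdir() if entry.is_dir())


def estimate_exist_minutes(volume_count: int) -> tuple[float, float]:
    """Return a rough (low, high) duration in minutes for an 'exist' check."""
    low, high = SECONDS_PER_INDEX
    return volume_count * low / 60, volume_count * high / 60
```

For example, a location holding 600 index volumes gives estimate_exist_minutes(600) == (20.0, 30.0), which is roughly 20 to 30 minutes.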

If you use any check mode other than 'exist', IndexCheck opens each index volume that is specified, or that is in the specified index location, while it runs. In that case, make sure that no other process tries to access an index volume while IndexCheck has it open. Alternatively, run the tool against copies of the indexes rather than the 'live' versions.
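To take the copy-based approach, you can copy an index volume folder to a scratch area and point IndexCheck at the copy. The following is a minimal sketch. The helper name copy_volume_for_checking is hypothetical. The code does not stop anything from writing to the live volume during the copy, so you must ensure that separately.

```python
import shutil
from pathlib import Path


def copy_volume_for_checking(volume_path: str, scratch_root: str) -> Path:
    """Hypothetical helper: copy one index volume folder under scratch_root.

    The copy keeps the original folder name. Assumption (not enforced here):
    nothing is writing to the live volume while it is being copied.
    """
    source = Path(volume_path)
    target = Path(scratch_root) / source.name
    # copytree creates scratch_root if needed and refuses to overwrite an
    # existing target, so a stale copy is never silently mixed with a new one.
    shutil.copytree(source, target)
    return target
```

You would then run the non-'exist' check with -f set to the returned path instead of the live index volume.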

By default, IndexCheck displays its output on the screen. When the output is needed for later review, redirect it to a file. The command above does this with > d:\evlogs\indexcheck.log. When you write the output to a file, include the -ignorewarnings parameter so that no additional prompts are displayed.
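If you drive IndexCheck from a script rather than typing the command, you can sketch the same invocation in Python. The following is a minimal sketch. It assumes the default 64-bit installation folder given above. The helper names build_exist_args and run_to_log are hypothetical. Passing the arguments as a list lets subprocess quote paths that contain spaces, and the log file takes the place of the > redirection.

```python
import subprocess

# Assumed location: default EV installation folder on a 64-bit OS; adjust as needed.
INDEXCHECK = r"C:\Program Files (x86)\Enterprise Vault\IndexCheck.exe"


def build_exist_args(index_path: str, exe: str = INDEXCHECK) -> list[str]:
    """Hypothetical helper: arguments for 'INDEXCHECK -c exist -f <path> -ignorewarnings'."""
    # -ignorewarnings suppresses extra prompts, which nobody would see
    # because the output goes to a file.
    return [exe, "-c", "exist", "-f", index_path, "-ignorewarnings"]


def run_to_log(args: list[str], log_path: str) -> int:
    """Hypothetical helper: run a command with output sent to log_path.

    Comparable to '> log 2>&1' on the command line; returns the exit code.
    """
    with open(log_path, "wb") as log:
        completed = subprocess.run(args, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

Usage: code = run_to_log(build_exist_args(r"E:\Indexes"), r"d:\evlogs\indexcheck.log")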

Below is example output from a check of a single index.


C:\Program Files\Enterprise Vault>indexcheck -c exist -f D:\EVMBX1\MBX_INDEX\LOCATION2\12BB36C7C2F1A164087C7167e7638d4011110000evlab -ignorewarnings
Start time: 13/12/2005 06:10:28
Running with ignorewarnings set
Processing index folder D:\EVMBX1\MBX_INDEX\LOCATION2\12BB36C7C2F1A164087C7167e7638d4011110000evlab
Checking file existence...
Completed OK. 95 lines
Finished. Checked 1 index(es), 0 with errors, 0 with warnings
End time: 13/12/2005 06:10:28
Duration: 0 minute(s) 0 second(s)
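When the output has been saved to a log file, a monitoring script can read the final summary line, as in the example output above. The following is a minimal sketch. It assumes the summary line always has exactly the wording shown in that example. The helper name parse_summary is hypothetical.

```python
import re
from typing import Optional

# Summary line as shown in the example output, e.g.
# "Finished. Checked 1 index(es), 0 with errors, 0 with warnings"
SUMMARY = re.compile(
    r"Finished\. Checked (\d+) index\(es\), (\d+) with errors, (\d+) with warnings"
)


def parse_summary(log_text: str) -> Optional[dict[str, int]]:
    """Return the counts from the last summary line in log_text, or None if absent."""
    matches = SUMMARY.findall(log_text)
    if not matches:
        return None
    checked, errors, warnings = (int(n) for n in matches[-1])
    return {"checked": checked, "errors": errors, "warnings": warnings}
```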

 

Here is an example batch file that runs IndexCheck:

@ECHO OFF
REM Check the index location and write the output to a log file.
IndexCheck.exe -f d:\evdata\indexlocation1\ -ignorewarnings > d:\evlogs\indexcheck.log

REM IF ERRORLEVEL n is true when the exit code is n or higher,
REM so the codes must be tested from highest to lowest.
IF ERRORLEVEL 3 GOTO label3
IF ERRORLEVEL 2 GOTO label2
IF ERRORLEVEL 1 GOTO label1
IF ERRORLEVEL 0 GOTO label4
GOTO End

:label1
ECHO indexcheck has returned warnings
GOTO End

:label2
ECHO indexcheck has returned errors
GOTO End

:label3
ECHO indexcheck has returned errors and warnings
GOTO End

:label4
ECHO indexcheck has not found any problems
GOTO End

:End

Running a batch file like this every night, against all index volumes or against selected ones, can serve as a best practice for monitoring index volume health.
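If you schedule the nightly check from a Python script instead of a batch file, you can reproduce the batch file's branching as follows. This is a minimal sketch. The exit-code meanings (1 = warnings, 2 = errors, 3 = errors and warnings) are taken from the batch file above. The helper name describe_exit_code is hypothetical.

```python
from typing import Optional

# Messages used by the batch file, keyed by IndexCheck's exit code.
MESSAGES = {
    0: "indexcheck has not found any problems",
    1: "indexcheck has returned warnings",
    2: "indexcheck has returned errors",
    3: "indexcheck has returned errors and warnings",
}


def describe_exit_code(code: int) -> Optional[str]:
    """Mirror the batch file's IF ERRORLEVEL tests.

    IF ERRORLEVEL n is true for n or higher, so any code of 3 or more takes
    the 'errors and warnings' branch, and a negative code matches no branch.
    """
    if code < 0:
        return None
    return MESSAGES[min(code, 3)]
```

For example, pass the return code of subprocess.run() for your IndexCheck command line to describe_exit_code and log or e-mail the result.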

EV 7.0 SP1 added several major enhancements to this utility. They are described in detail below:
 

1. Generating and validating index file checksums 
 

The Indexing service can now generate a checksum for an index volume, and use the checksum to perform index validation before opening the index volume. The checksum is stored in a file named Checksum.dat in the index volume folder.  
 

The following new IndexCheck options create and validate index volume checksums:

  • -c ChecksumCreate - This option generates or updates checksums for index volumes.
  • -c ChecksumValidate - This option validates index volumes against existing checksums.

For each option, specify the path to the target index folder with the -f parameter. If you specify an index location as displayed in the Enterprise Vault Administration Console (on the Index Locations page of the Indexing service properties), IndexCheck attempts to create or validate the checksum for each index volume in that location. If you specify the path of a particular index volume, IndexCheck attempts to create or validate the checksum for that index volume only.
 

To generate or update a checksum for an index volume, use the ChecksumCreate option. Stop the Indexing service before you use this option:
 
  IndexCheck.exe -f <index_folder_path> -c ChecksumCreate 
 
To validate index files using the checksum in an existing Checksum.dat file, use the ChecksumValidate option:
 
  IndexCheck.exe -f <index_folder_path> -c ChecksumValidate 
 
ChecksumValidate requires a Checksum.dat file in the index volume folder. If the file does not exist, the index files are not validated. If the path contains spaces, enclose the entire path in quotes.
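Before you run ChecksumValidate against a whole index location, you can list the volumes that have no Checksum.dat file. ChecksumValidate would not validate those volumes, so they are candidates for ChecksumCreate. The following is a minimal sketch. It assumes that every immediate subfolder of the index location is an index volume folder. The helper name volumes_without_checksum is hypothetical.

```python
from pathlib import Path


def volumes_without_checksum(index_location: str) -> list[str]:
    """List index volume folders under index_location that lack Checksum.dat.

    Assumption: each immediate subfolder of the location is an index volume.
    """
    missing = []
    for volume in sorted(Path(index_location).iterdir()):
        if volume.is_dir() and not (volume / "Checksum.dat").is_file():
            missing.append(volume.name)
    return missing
```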
 
2. Logging items missing from indexes  
 
The Indexing service can now log details about archived items that were not indexed, and archived items that were only partially indexed because some content was missing or could not be converted into HTML.  The information is written to the IndexMissing.log file in the index volume folder.
 
By default, the Indexing service logs items (savesets) that are missing from an index. You can use the LogMissingItems registry setting to configure the required level of logging for the Indexing service, or to disable logging. For more information about this registry setting, see article HOWTO32402.
 
IndexCheck can also generate the IndexMissing.log file or report on its contents. IndexCheck does not need the LogMissingItems registry setting to log missing items. The following options provide this functionality:
 
  • -c MissingDocs - Write the list of items (savesets) missing from the index to IndexMissing.log.
  • -c MissingContent - Write the list of items with missing content to IndexMissing.log.
  • -c MissingItemsLogFile - Report on the contents of IndexMissing.log, if it exists.
To search for and report on items that are missing from the index, the command format is:
 
 IndexCheck -c MissingDocs -f <index_folder_path> -db <Directory_database_server>
 
To search for and report on items with missing content, the command format is:
 
 IndexCheck -c MissingContent -f <index_folder_path> -db <Directory_database_server>
 
To report on the contents of the missing items log file in the index volume folder, the command format is:
 
 IndexCheck -c MissingItemsLogFile -f <index_folder_path> -db <Directory_database_server>
 
where
 
  • -f <index_folder_path> is the path to the index folder. If you specify the index location that is displayed in the Administration Console, on the Index Locations page of the Indexing service properties, IndexCheck processes each index volume in that location. If you specify the path of a particular index volume, IndexCheck processes that index volume only. If the path contains spaces, enclose it in quotes.
  • -db <Directory_database_server> is the name of the SQL server that manages the EnterpriseVaultDirectory database.
The MissingDocs and MissingContent options update the IndexMissing.log file, or create it if it does not exist.
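If you run these checks from a script, you can build the three command formats above from one helper. The following is a minimal sketch. The helper name build_missing_items_args is hypothetical, and the code assumes IndexCheck.exe is found on the PATH unless you pass a full path. Passing the list to subprocess.run() quotes a path that contains spaces for you.

```python
# The three missing-item check modes described above.
MISSING_ITEM_MODES = {"MissingDocs", "MissingContent", "MissingItemsLogFile"}


def build_missing_items_args(mode: str, index_path: str, directory_sql_server: str,
                             exe: str = "IndexCheck.exe") -> list[str]:
    """Hypothetical helper: IndexCheck -c <mode> -f <index_folder_path> -db <server>."""
    if mode not in MISSING_ITEM_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return [exe, "-c", mode, "-f", index_path, "-db", directory_sql_server]
```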
 
When generated by IndexCheck, the format of the contents of the IndexMissing.log file is slightly different from that generated by the Indexing service.  Items are grouped under missing items or missing content. This difference is because IndexCheck performs the checks sequentially.
 
3. When using the Item Granularity index schema
 
There are two index schemas available in EV:
  • The default schema (SchemaType 0)
  • The Item Granularity schema (SchemaType 1)
If you are using the Item Granularity schema, the SchemaType registry setting has a value of "1". This setting, if configured, is in the following location:

HKEY_LOCAL_MACHINE\SOFTWARE\KVS\Enterprise Vault\Indexing
 
The default value is "0" for EV 6 through 2007 SP6.  Starting with EV 8.0, the default value is "1".
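A script that needs to know which schema applies can combine the registry value with the version defaults above. The following is a minimal sketch, and the helper names are hypothetical. It assumes that EV is a 32-bit application, which is consistent with it installing under Program Files (x86) on a 64-bit OS, so it reads the 32-bit registry view. winreg is available only on Windows, so it is imported inside the reading function.

```python
from typing import Optional


def effective_schema_type(configured: Optional[int], ev_major_version: int) -> int:
    """SchemaType in effect: the registry value if set, otherwise the version default.

    Defaults as stated above: 0 before EV 8.0 (EV 6 through EV 2007), 1 from EV 8.0.
    """
    if configured is not None:
        return int(configured)
    return 1 if ev_major_version >= 8 else 0


def read_schema_type() -> Optional[int]:
    r"""Read SchemaType from HKLM\SOFTWARE\KVS\Enterprise Vault\Indexing (Windows only).

    Assumption: EV is 32-bit, so the 32-bit registry view is requested.
    Returns None if the key or the value does not exist.
    """
    import winreg  # Windows-only standard module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            r"SOFTWARE\KVS\Enterprise Vault\Indexing",
                            0, winreg.KEY_READ | winreg.KEY_WOW64_32KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "SchemaType")
    except FileNotFoundError:
        return None
    return int(value)
```

On an EV server, effective_schema_type(read_schema_type(), 8) returns 1 unless SchemaType is explicitly set to 0.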
With SchemaType 0, if an item has missing content in more than one part (the top-level item, its attachments, or both), the item and the reason for each piece of missing content are logged together on one line.
 
With SchemaType 1, the same information is presented if the Indexing service is used to produce the log. However, if IndexCheck is used to produce the log, then "0" (zero) is shown instead of the reasons for missing content. This indicates that there is missing content, but the reason is not known.
 
4. Verifying the number of indexed items in index volumes
 
Options have been added to the IndexCheck tool to compare the number of items in an archive with the number of items indexed for that archive. This is done by performing the following comparisons: 
  • Compare the highest Index Sequence Number in the vault store database with the highest Item Sequence Number in the index volume.
  • Compare the number of top-level items reported in the vault store database with the top-level Item Count in the index volume.
IndexCheck reports the statistics and gives a warning if the difference exceeds the configured tolerance.
 
The following IndexCheck options are used to perform this validation:
 
  • -c stats - For the specified index volumes, compare the information in the index volume with the information in the vault store database.
  • -f <index_folder_path> - The index volumes to validate. If you specify the index location that is displayed in the Administration Console, on the Index Locations page of the Indexing service properties, IndexCheck validates all index volumes in that location. If you specify the path of a particular index volume, IndexCheck validates only that index volume.
  • -db <Directory_database_server> - The name of the SQL server that manages the EnterpriseVaultDirectory database. IndexCheck needs this to identify the relevant vault store databases.
  • -diff <integer> - The permitted tolerance. A warning is reported for an index volume if the difference between the index volume and database information is greater than <integer>. The appropriate tolerance depends on individual company requirements. The number of items in an open index can fluctuate greatly, depending on the amount of activity on the index. On a very busy system, the vault store database information may also take some time to update. The default value is 1.
  • -csv <file_name> - Write the results of the validation to a CSV file. Specify a file name and path, or just a file name. If no path is given, the file is created in the folder from which IndexCheck is run. This option can only be used with -c stats.
A warning is reported if either of the following is true:
  • The difference between the highest Index Sequence Number in the vault store database and the highest Item Sequence Number in the index volume is greater than the number specified in the -diff parameter.
  • The difference between the number of top-level items reported in the vault store database and the top-level Item Count in the index volume is greater than the number specified in the -diff parameter.
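You can express the two warning rules above as a small function, for example to re-evaluate saved statistics with a different tolerance. The following is a minimal sketch. It assumes that "difference" means the absolute difference, because the sample output below flags an index that holds more items than the database reports. The code restates the documented rules and is not IndexCheck's own logic. The helper name stats_warnings is hypothetical.

```python
def stats_warnings(index_highest_isn: int, index_count: int,
                   db_highest_isn: int, db_count: int, diff: int = 1) -> list[str]:
    """Apply the two documented warning rules; diff mirrors -diff (default 1).

    Returns the names of the comparisons whose absolute difference exceeds diff.
    """
    warnings = []
    if abs(db_highest_isn - index_highest_isn) > diff:
        warnings.append("highest sequence number")
    if abs(db_count - index_count) > diff:
        warnings.append("top-level item count")
    return warnings
```

For example, stats_warnings(1002, 1002, 990, 970, diff=20) returns ["top-level item count"]. The sequence numbers differ by 12, which is within the tolerance. The top-level counts differ by 32, which exceeds it.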
The following example shows an IndexCheck command line and the report information returned:
 
IndexCheck.exe -c stats -f "C:\Program Files\Enterprise Vault\Indexing\1773A46CFC34..." -db SQLserver2 -diff 20 -csv C:\IndexCheck\ValidationFeb2007.csv
Processing index folder C:\Program Files\Enterprise Vault\Indexing\1773A46CFC34...
Counting the number of top level documents...
Stats from the index
  Highest Index Sequence Number : 1002
  Top Level Document Count : 1002
Stats from the database [Index Volume Identity = 14]
  Highest Index Sequence Number : 970
  Top Level Document Count : 970
WARNING! Mismatch on top level document count
Finished. Checked 1 index(es), 0 with errors, 0 with warnings
End time: 12/02/2007 13:01:18
Duration: 0 minute(s) 3 second(s) 
 
In the example above, the -csv parameter also writes the statistics to the file C:\IndexCheck\ValidationFeb2007.csv.
 
The IndexCheck utility is described in detail in the Utilities document, which is in the Documentation folder of the EV installation folder.





 

 

