How do Sparse Collections and the SparseCollectionPercentage setting work?

Article: 100023091
Last Published: 2021-09-29
Product(s): Enterprise Vault

Problem

Note: Sparse Collections apply only to NTFS and CIFS file systems.

Solution

How does the compaction process function?

Before modifying the SparseCollectionPercentage setting, it is important to understand how Enterprise Vault (EV) handles the deletion of items from collection (CAB) files.

As archived items are deleted from Enterprise Vault, the Enterprise Vault Collector does not delete the related saveset files from the collection files. Instead, it maintains a count of the number of 'active' saveset files left within the collection file.

By default, when the number of items that still exist in a collection falls to 30% or less of the total number of items in the collection, the collector restores the active saveset files from the collection file to their original location.
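The threshold check described above can be sketched as follows. This is an illustrative Python sketch only, not Enterprise Vault code: the function name `is_sparse` and its parameters are hypothetical, and it assumes the threshold is inclusive (30% or fewer), as stated later in this article.

```python
def is_sparse(active_count: int, total_count: int, sparse_pct: int = 30) -> bool:
    """Return True when the share of still-active savesets is at or below sparse_pct."""
    if total_count <= 0:
        # Assumption for this sketch: an empty collection is not evaluated.
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return active_count * 100 <= total_count * sparse_pct


print(is_sparse(3, 10))  # True: 30% of the items are still active
print(is_sparse(4, 10))  # False: 40% of the items are still active
```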

This allows the saveset files to be collected into a new collection file if the conditions for collection within the relevant directory are met. The MinimumFilesInCollection registry value defaults to 15; by default, therefore, if the directory contains fewer than 15 items, a new collection file is not created, and the old collection entry and the exported saveset files remain. If absolutely necessary, this registry value can be lowered to a minimum of 1. However, leaving it at 1 is not recommended; revert it to the default of 15 once the required small number of files has been collected into new collection files.

Note: In EV 12.x and above, the number of files to collect has been reduced to 1.
 
As soon as the restored saveset files are added to a new collection file, the RefCount for the original collection file is set to 0 and the original collection file is deleted.
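The re-collection step can be modelled roughly as below. This is a simplified Python sketch under stated assumptions, not the collector's real logic: the `Collection` class, the `recollect` function, and the single file-count condition are hypothetical, and the default of 15 mirrors the MinimumFilesInCollection default described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Collection:
    total_count: int  # savesets originally placed in the collection
    ref_count: int    # savesets that are still active


def recollect(original: Collection, files_in_directory: int,
              minimum_files: int = 15) -> Tuple[Optional[Collection], bool]:
    """Model one collection run over a directory holding restored savesets.

    Returns (new_collection, original_deleted).
    """
    if files_in_directory < minimum_files:
        # Too few files: no new collection is created, so the original
        # collection file and the restored savesets both stay on disk.
        return None, False
    new_collection = Collection(total_count=files_in_directory,
                                ref_count=files_in_directory)
    # The restored savesets now live in the new collection, so the original
    # is no longer referenced and can be deleted.
    original.ref_count = 0
    return new_collection, True


old = Collection(total_count=100, ref_count=20)
new, deleted = recollect(old, files_in_directory=20)
print(deleted, old.ref_count)  # True 0
```

With only 10 restored files and the default minimum of 15, the same call returns `(None, False)` and the original collection is left untouched, which is why extra disk space is needed until re-collection succeeds.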

If sharing is enabled for the partition and shared saveset files have been added to a collection, a shared saveset file is not removed from the collection file until the final sharer on the saveset file deletes their copy of the item from their archive or it is expired.
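The sharing rule behaves like a simple reference count. The Python sketch below is illustrative only; the `SharedSaveset` class and the archive names are hypothetical and do not correspond to Enterprise Vault objects.

```python
class SharedSaveset:
    """A saveset in a collection that is referenced by one or more archives."""

    def __init__(self, sharers):
        self.sharers = set(sharers)

    def release(self, archive: str) -> bool:
        """Drop one sharer (item deleted or expired).

        Returns True while at least one sharer still references the saveset.
        """
        self.sharers.discard(archive)
        return bool(self.sharers)


saveset = SharedSaveset(["ArchiveA", "ArchiveB"])
print(saveset.release("ArchiveA"))  # True: ArchiveB still shares the saveset
print(saveset.release("ArchiveB"))  # False: the final sharer has gone
```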

This process remains the same after a partition is closed.

Modifying the SparseCollectionPercentage setting:

The point at which collections are eligible for compaction can be modified by changing the SparseCollectionPercentage value, which is set on a per partition basis in the PartitionEntry table in the EnterpriseVaultDirectory Database.

This is a value from 1 to 99 that represents the percentage of all files in a collection that still exist (have not expired or been deleted).

By default, when 30% (or fewer) of a collection's files are still active, the collection will be compacted.

Modifying this value can alter this behavior to cause a collection to be compacted earlier (by increasing the value) or later (by decreasing the value).
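To see why a higher value triggers compaction earlier, the sketch below applies three different thresholds to the same set of collections. It is a hypothetical Python illustration: the `eligible` function and the sample data are invented for this example, and the inclusive comparison follows the "30% (or fewer)" wording above.

```python
def eligible(collections, sparse_pct):
    """Return the names of collections whose active share is at or below sparse_pct."""
    return [name for name, (active, total) in collections.items()
            if total > 0 and active * 100 <= total * sparse_pct]


# name -> (active items, original items); made-up sample values
sample = {
    "A.cab": (10, 100),
    "B.cab": (30, 100),
    "C.cab": (45, 100),
    "D.cab": (80, 100),
}

for pct in (10, 30, 50):
    print(pct, eligible(sample, pct))
# 10 ['A.cab']
# 30 ['A.cab', 'B.cab']
# 50 ['A.cab', 'B.cab', 'C.cab']
```

Raising the value from 30 to 50 makes C.cab eligible while 45% of its items are still active; lowering it to 10 defers compaction of B.cab until more of its items have been deleted.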

The setting for newly created partitions is controlled by the SparseCollectionPercentageForNewPartitions DWORD registry value, in the following location:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\KVS\Enterprise Vault\Storage

This DWORD is a decimal value from 1 to 99 and controls the SparseCollectionPercentage value that is written to the PartitionEntry table when a new partition is created.
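One possible way to set this value is with the standard Windows `reg add` command. The Python sketch below only validates the range and builds the command string; it does not touch the registry. The helper name `reg_add_command` is hypothetical, and the approach is an assumption rather than a documented Enterprise Vault procedure.

```python
KEY = r"HKLM\SOFTWARE\Wow6432Node\KVS\Enterprise Vault\Storage"
VALUE_NAME = "SparseCollectionPercentageForNewPartitions"


def reg_add_command(percentage: int) -> str:
    """Build a `reg add` command line for the given percentage (1 to 99)."""
    if not 1 <= percentage <= 99:
        raise ValueError("percentage must be between 1 and 99")
    # /v = value name, /t = type, /d = data, /f = overwrite without prompting
    return f'reg add "{KEY}" /v {VALUE_NAME} /t REG_DWORD /d {percentage} /f'


print(reg_add_command(35))
```

The printed command would then be run from an elevated command prompt on the Enterprise Vault server that hosts the Storage service.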

Related SQL Queries:

Before changing the SparseCollectionPercentage value, it is important to analyze the impact the change will have on each Vault Store / Partition in the site.

Significantly altering this value may result in a large performance hit on the Enterprise Vault server(s) running the storage service as collections are compacted to comply with the new setting.

Because the collection files are not deleted until the active savesets are recollected, it is necessary to confirm there is enough free disk space on the drive that the collection resides on BEFORE this value is modified.

An approximate value for the amount of free disk space required is generated by the last of the SQL queries below.

It is possible to use the following SQL queries to understand the current state of the collections on a per vault store basis, and how many collections would be eligible for compacting by altering the value.

--SQL queries to find out more info about sparse file collections
--Replace <VSDBNAME> with the SQL DB Name - For example: EVVaultStore1
--Replace <VSVACNAME> with the Vault Store Name from the VAC - For Example: Vault Store 1
--Replace <NUMREGKEYPER> with the % value you would set as the SparseCollectionPercentage value - For example: 35
--Replace <CABSIZE> with the CAB file size, in MB, specified on the properties of the vault store partition you are investigating


First, it is necessary to obtain a list of all CAB files for the vault store specified and more information about these CAB files.

SELECT
a.PartitionName as 'Partition',
(a.PartitionRootPath+'\'+b.RelativeFileName) AS 'Cab File Name',
b.TotalCount AS 'Original No of Items In Cab',
b.RefCount AS 'Current No of Items in Cab',
    CASE
        WHEN b.TotalCount = 0
        THEN 'N/A'
    ELSE
        LEFT(((CAST(b.RefCount AS DECIMAL(6,2)))/(CAST(b.TotalCount AS DECIMAL(6,2)))*100),5)
    END
    AS 'Percentage Used'
FROM
EnterpriseVaultDirectory.dbo.partitionentry a,
<VSDBNAME>.dbo.collection b,
EnterpriseVaultDirectory.dbo.VaultStoreEntry c
WHERE
c.VaultStoreName = '<VSVACNAME>'
AND
a.VaultStoreEntryId = c.VaultStoreEntryId
AND
a.IdPartition = b.IdPartition


Next, list all CAB files that would be uncollected if SparseCollectionPercentage were set to <NUMREGKEYPER> for the selected vault store:

SELECT
a.PartitionName as 'Partition Name',
(a.PartitionRootPath+'\'+b.RelativeFileName) AS 'Cab File Name',
b.TotalCount AS 'Original No of Items In Cab',
b.RefCount AS 'Current No of Items in Cab',
    CASE
        WHEN b.TotalCount = 0
        THEN 'N/A'
    ELSE
        LEFT(((CAST(b.RefCount AS DECIMAL(6,2)))/(CAST(b.TotalCount AS DECIMAL(6,2)))*100),5)
    END
    AS 'Percentage Used'
FROM
EnterpriseVaultDirectory.dbo.partitionentry a,
<VSDBNAME>.dbo.collection b,
EnterpriseVaultDirectory.dbo.VaultStoreEntry c
WHERE
c.VaultStoreName = '<VSVACNAME>'
AND
a.VaultStoreEntryId = c.VaultStoreEntryId
AND
a.IdPartition = b.IdPartition
AND
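    -- Guard against division by zero when a collection has no recorded items
    CASE
        WHEN b.TotalCount = 0
        THEN 0
    ELSE
        CAST(LEFT(((CAST(b.RefCount AS DECIMAL(6,2)))/(CAST(b.TotalCount AS DECIMAL(6,2)))*100),5) AS DECIMAL(5,2))
    END
<= <NUMREGKEYPER>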


The following breaks down, per partition, approximately how much space would be required if all cab files were extracted in the same collection run after altering the SparseCollectionPercentage value:

SELECT
a.PartitionName as 'Partition Name',
COUNT(*) AS 'No of CAB Files affected',
COUNT(*) * <CABSIZE> AS 'Approximate Free Space Required in MB'
FROM
EnterpriseVaultDirectory.dbo.partitionentry a,
<VSDBNAME>.dbo.collection b,
EnterpriseVaultDirectory.dbo.VaultStoreEntry c
WHERE
c.VaultStoreName = '<VSVACNAME>'
AND
a.VaultStoreEntryId = c.VaultStoreEntryId
AND
a.IdPartition = b.IdPartition
AND
    CASE
        WHEN b.TotalCount = 0
        THEN 0
    ELSE
        CAST(left(((CAST(b.RefCount AS DECIMAL(6,2)))/(CAST(b.TotalCount AS DECIMAL(6,2)))*100),5) AS DECIMAL(5,2))
    END
<= <NUMREGKEYPER>
GROUP BY
a.PartitionName
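
The free-space estimate produced by this query is simple arithmetic: the number of affected CAB files multiplied by the configured CAB file size. The Python sketch below repeats that calculation and compares it with the free space on the drive; the function names and the sample figures (250 CAB files, 10 MB each) are hypothetical.

```python
def free_space_required_mb(affected_cab_files: int, cab_size_mb: int) -> int:
    """Approximate space needed if every affected CAB file is extracted in one run."""
    return affected_cab_files * cab_size_mb


def has_enough_space(affected_cab_files: int, cab_size_mb: int, free_mb: int) -> bool:
    """Return True when the drive's free space covers the estimate."""
    return free_mb >= free_space_required_mb(affected_cab_files, cab_size_mb)


print(free_space_required_mb(250, 10))    # 2500
print(has_enough_space(250, 10, 2000))    # False
```

Because each CAB file is assumed to be at its maximum size, the estimate is an upper bound for the extraction itself and is best treated as a rough planning figure.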
