Description
This is a living document that answers frequently asked questions about the Deduplication Engine (Content Router Service). Read this article if you have ever wondered why:
- MSDP is always busy, even with no jobs running
- TLog processing takes days to complete
- DiskVolume is down
- You are getting Status Code 83, 84, or 2074
What is Deduplication?
Deduplication is a process that significantly reduces the amount of storage space that is required for backup images. This is done by retaining only one unique instance of backup data on backend storage media. Redundant data is replaced with a pointer to the unique data copy. Deduplication occurs on both a file level and a file segment level. When two or more files are identical, deduplication stores only one copy of the file. When two or more files share identical content, deduplication breaks the files into segments and stores only one copy of each unique file segment.
The identification and tagging of unique segments of data is done by a process called fingerprinting. The unique segments of data are obfuscated and stored on backend storage media. The generated fingerprints (FPs) are managed in a relational database system on one or more hosts on which the Deduplication Engine services may be active. The Deduplication Engine is also commonly referred to as the Content Router Service. The host running the Content Router Service is known as a Content Router (CR).
Any activity that modifies the backend storage media of a Deduplication Pool generates transaction logs (tlogs). For example, movement of content, such as adding, deleting or relocating segments will generate tlogs. These tlogs are periodically sorted and are then either committed to the relational database or leveraged to remove any extraneous data from the backend.
The relational database is called the Content Router DataBase (CRDB). The proces of batching the tlogs and committing them to the CRDB is called Content Router Queue Processing (CRQP).
What Does the Content Router Do?
The CR manages data movement tasks by using spooler daemon (spoold). The most common tasks are the following:
- Backups (PUT operation)
- Verify and restores (GET operation)
- Expiration and removals (DEREFerence operation)
Two other internal housekeeping tasks, Compaction and Rebasing, also move data on the backend. All of the tasks result in tlogs being generated. Debug activity regarding these tasks and CR startup and shutdown messages are logged to the spoold.log files.
spoold caches the FPs it manages into process memory. The FP cache is populated at startup by reading the contents of the CRDB relational database and any unprocessed tlogs (more on tlogs later). As data arrives and leaves the system, the FP activity is recorded to tlogs and synchronized to spoold's cache memory. The tlog contents are sorted and processed by default twice daily at noon and midnight. This operation is referred to as CRQP. Activity related to CRQP is captured in storaged.log files.
The spoold.log and storaged.log files are located in the following locations, according to the deduplication technology in use.
NetBackup PureDisk/50x0 Appliances
UNIX: /Storage/log/spoold/
NetBackup MSDP
UNIX: /<StoragePath>/log/spoold/
Windows: X:\<StoragePath>\log\spoold\
NetBackup 52x0 Appliances
UNIX: /disk/log/spoold/
Note: The first-level directory structure immediately under the /Storage in PD, <StoragePath> in MSDP and /disk on 52x0 appliances are analogous. For the rest of the discussion, this path will be referred to <StoragePath> folder.
What is a transaction log? (files with the tlog extension)
On a Content Router, you can view TLOGs from the following three folder locations under <StoragePath>:
<StoragePath>/spool | tlogs are created in this folder when spoold moves content on the system |
<StoragePath>/queue | Folder in which tlogs are stored after the individual tlog files are closed |
<StoragePath>/processed | Folder to which tlogs are migrated while CRQP is in progress |
Example: A sampling of the tlogs from a NetBackup 5220 Appliance system
Raw tlogs before they are processed:
/disk/queue/1882.tlog
/disk/queue/1883.tlog
...
/disk/queue/7673.tlog
/disk/queue/7674.tlog
When CRQP commences, information in the tlogs is sorted before being processed. The sorting of the transaction logs creates two types of files:
/disk/queue/sorted-1882-7674.tlog | Information meant to be updated into the CRDB |
/disk/queue/sorted-1882-7674.delayed | Information about segments no longer needed by the CR |
In newer versions of the Deduplication Engine, the sorting and processing of tlogs is a multi-threaded process which results in more than one sorted tlog.
/disk/queue/partsorted-XXX-YYY-0.tlog
/disk/queue/partsorted-XXX-YYY-1.tlog
/disk/queue/partsorted-XXX-YYY-2.tlog
/disk/queue/partsorted-XXX-YYY-3.tlog
After the sorted tlog files are created, the raw tlogs are moved temporarily to the /<StoragePath>/processed folder for fail-safe purposes until processing is complete.
/disk/processed/1882.tlog
/disk/processed/1883.tlog
...
/disk/processed/7673.tlog
/disk/processed/7674.tlog
What is CRQP?
As shown in the example above, CRQP results in the creation of two sets of files: sorted *.tlog file(s) and sorted *.delayed file. The sorted *.tlog file(s) are processed first (multi-threaded in PureDisk 6.6.4/NetBackup 50x0 Appliance 1.4.3.1 and later revisions). The sorted *.delayed file is processed thereafter (this portion is not multi-threaded at the current time).
Example: A storaged.log file from a CRQP operation for a NetBackup 5220 Appliance that has been configured for MSDP
The start of a CRQP operation can be identified by a line similar to this:
January 26 12:20:02 INFO [1082194240]: Queue processing triggered by external request.
The tlog contents are sorted, and, after processing is under way, storaged logs a message to the storaged.log file once every 100,000 transactions or every 5 minutes, whichever comes first.
January 26 12:20:16 INFO [1082194240]: Starting to partition tlog file range 0000000 - 0000000.
January 26 12:24:50 INFO [1082194240]: Starting to sort tlog partition bucketes with 4 main threads and 2 assisting threads.
...
January 26 12:29:24 INFO [1082194240]: Finished to sort tlog partition bucketes in 274 seconds.
...
January 26 12:29:26 INFO [1082194240]: Preparing to sequentially process transaction logs from /disk/queue/partsorted-0000000-0000000-0.tlog to /disk/queue/partsorted-0000000-0000000-0.tlog
...
January 26 12:29:26 INFO [1082194240]: pqFileOpenPartition: Open CRDB dump partition 0 file(/disk/databases/crdbdump/objects.0) for reading.
January 26 12:29:26 INFO [1082194240]: Synchronization for transaction log /disk/queue/partsorted-0000000-0000000-0.tlog to /disk/queue/partsorted-0000000-0000000-0.tlog started, 127684973 transactions pending.
January 26 12:29:28 INFO [1082194240]: Transaction log 0000000-0000000: 100000 of 127684973 entries processed.
January 26 12:29:31 INFO [1082194240]: Transaction log 0000000-0000000: 200000 of 127684973 entries processed.
...
January 26 12:45:33 INFO [1082194240]: Transaction log 0000000-0000000: 33300000 of 127684973 entries processed.
January 26 12:45:34 INFO [1082194240]: pqFileOpenPartition: Open CRDB dump partition 1 file(/disk/databases/crdbdump/objects.1) for reading.
January 26 12:45:47 INFO [1081137472]: ImportThreadMain: Begin to do importing and indexing for CRDB partition 0.
January 26 12:45:47 INFO [1081137472]: ImportThreadMain: There are 173289346 records in file /disk/databases/crdbdump/objects2.0.
January 26 12:45:49 INFO [1082194240]: Transaction log 0000000-0000000: 33400000 of 127684973 entries processed.
...
January 26 13:11:20 INFO [1082194240]: Transaction log 0000000-0000000: 96500000 of 127684973 entries processed.
January 26 13:11:22 INFO [1082194240]: pqFileOpenPartition: Open CRDB dump partition 3 file(/disk/databases/crdbdump/objects.3) for reading.
January 26 13:11:22 INFO [1119181120]: ImportThreadMain: Begin to do importing and indexing for CRDB partition 2.
January 26 13:11:22 INFO [1119181120]: ImportThreadMain: There are 173296836 records in file /disk/databases/crdbdump/objects2.2.
January 26 13:11:22 INFO [1082194240]: Transaction log 0000000-0000000: 96600000 of 127684973 entries processed.
...
January 26 13:21:48 INFO [1082194240]: TlogProcessLog: Copying remaining records from objects to file objects2
January 26 13:21:48 INFO [1082194240]: WSRequestExt: submitting &request=5&login=agent_3_408416&passwd=********************************&action=setCRDelay&id=1&delay=3692
January 26 13:21:48 INFO [1082194240]: Number of data store commits: 10155488
January 26 13:21:48 INFO [1082194240]: Garbage found and collected in this round: 1 SO, 1536 DO.
January 26 13:21:48 INFO [1082194240]: Importing records from dump file to database table
January 26 13:21:48 INFO [1118652736]: ImportThreadMain: Begin to do importing and indexing for CRDB partition 3.
January 26 13:21:48 INFO [1118652736]: ImportThreadMain: There are 173298974 records in file /disk/databases/crdbdump/objects2.3.
January 26 14:04:04 INFO [1081137472]: Time required to import to table objects2_0: 4696.415948
January 26 14:16:21 INFO [1080609088]: Time required to import to table objects2_1: 4798.155214
January 26 14:26:55 INFO [1081137472]: Time required to build index on objects2_0 table: 1370.664940
January 26 14:26:55 INFO [1081137472]: ImportThreadMain: Import 173289346 records into CRDB table objects2.
January 26 14:30:00 INFO [1119181120]: Time required to import to table objects2_2: 4717.209217
January 26 14:36:50 INFO [1080609088]: Time required to build index on objects2_1 table: 1229.337399
January 26 14:36:50 INFO [1080609088]: ImportThreadMain: Import 173312140 records into CRDB table objects2.
January 26 14:38:03 INFO [1118652736]: Time required to import to table objects2_3: 4574.918170
January 26 14:47:51 INFO [1119181120]: Time required to build index on objects2_2 table: 1071.433587
January 26 14:47:51 INFO [1119181120]: ImportThreadMain: Import 173296836 records into CRDB table objects2.
January 26 14:55:01 INFO [1118652736]: Time required to build index on objects2_3 table: 1017.473246
January 26 14:55:01 INFO [1118652736]: ImportThreadMain: Import 173298974 records into CRDB table objects2.
January 26 14:55:01 INFO [1082194240]: Time required to build index on objects2 table: 0.088262
January 26 14:55:02 INFO [1082194240]: Time required to drop objects table: 0.486934
January 26 14:55:02 INFO [1082194240]: Time required to rename objects2 table to objects: 0.150074
...
An entry similar to the following indicates that the first phase of CRQP has completed. This step also involves indexing the large tables of the CRDB.
January 26 14:58:05 INFO [1082194240]: Transaction log 0000000-0000000 Completed. Expect: 127684973 (7789.38MB) Commit: 127684973 (27194.16MB) Retry: 0 Log: from /disk/queue/partsorted-0000000-0000000-0.tlog to /disk/queue/partsorted-0000000-0000000-0.tlog SO: Add 1653266, Ref Add 77786607, Ref Add Fail: 0, Ref Del 46308611 DO: Add 935, Ref Add 1571, Ref Add Fail: 0, Ref Del 2912 TASK: Add 339, End 353, End All 0, Del 3637 DCID: SO 1926742, SO Fail 0, DO 0, DO Fail 0 MARKER: 0, Fail 0
In this example, approximately 2.5 hours was spent updating the CRDB. The next step involves marking the segments that need to be removed from the backend data volume:
January 26 15:01:06 INFO [1082194240]: Start processing delayed operations of '/disk/queue/sorted-0000000-0000000.delayed'.
January 26 15:01:25 INFO [1082194240]: Delayed file: 100000 entries processed.
...
<logs snipped>
...
January 26 15:06:30 INFO [1082194240]: Delayed file: 4700000 entries processed.
January 26 15:06:31 INFO [1082194240]: Number of data store commits: 10164119
January 26 15:06:34 INFO [1082194240]: Number of data store commits: 10164120
January 26 15:06:36 INFO [1082194240]: Completed processing of 4777411 delayed operations of '/disk/queue/sorted-0000000-0000000.delayed'.
January 26 15:06:36 INFO [1082194240]: Number of data store commits: 10164132
After both sub processes of CRQP are completed, the crcontrol --queueinfo command output shows an updated timestamp closer to the CRQP start time (in this case Jan 26 12:20). The crcontrol tool is a command line interface to control the Content Router service.