Problem
Enterprise Vault (EV) experiences performance issues and restarting the SQL server or restarting the SQL services does improve performance but only for a short time. Various Event Log errors indicating deadlocks, transient errors or System.OutOfMemoryException errors may be seen.
Error Message
Examples of errors that may be seen in the server Event Logs.
Log Name: Veritas Enterprise Vault
Source: Enterprise Vault
Event ID: 41619
Task Category: None
Level: Information
Description:
A transient error is preventing the execution of a SQL command. Enterprise Vault will try to run the command again.
Attempt: 1 of 40.
Last Error: Transaction (Process ID 233) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
v-437-41619
Cause
The active SQL processes can be viewed by executing the following query on the SQL server servicing the Customer database:
USE master;
GO
sp_who2;
The output may show one or more stored procedures associated with the Export process are blocking other stored procedures from executing. The wait type for these blocking stored procedures shows as SLEEP_TASK.
The following explanation involves advanced SQL concepts.
SQL SLEEP_TASK waits are typically caused by a task waiting on SQL to complete a process or perform a task to allow the current task to complete. In the majority of occurrences, SLEEP_TASK waits will self clear. In the event they do not self clear, there are 2 prime culprits: TempDB issues or stale statistics.
TempDB
TempDB issues can manifest themselves by not having tempdb configured per Microsoft's recommendations (see see https://docs.microsoft.com/en-us/sql/relational-databases/databases/tempdb-database?view=sql-server-ver15#optimizing-tempdb-performance-in-sql-server) or by the TempDB database and transaction log file locations being scanned by antivirus (see https://www.veritas.com/support/en_US/article.100007613).
Contention latency may be seen when a large number of processes occur simultaneously, causing multiple concurrent latches to be queued, resulting in latency, and is usually recognised by the presence of PAGELATCH locks. Latches are locks on three types of special pages: Global Allocation Map (GAM), Shared Global Allocation Map (SGAM), and Page Free Space(PFS). A full description of these is out of the scope of this writing. At least one of each of these pages exists in all database files and are responsible for helping to identify where incoming data can be written to in the physical file (s). Updating these pages requires a latch lock, which is a short-duration lock used during the upgrade. The most common latch locks are update (UP) and exclusive (EX).
Stale Statistics
Lack of SQL Maintenance can cause the statistics to become stale. Severely stale statistics can cause cached execution plans to become outdated. The actual execution plans then rely more on parameter sniffing which, if excessive, can lead to hash spills whereby the data being processed is 'spilled' into TempDB if there is inadequate memory to execute the SQL query being processed. This may result in a large amount of space being consumed by TempDB resulting in slow performance.
Viewing hash spills depends on how the query for the process in question is run. There are typically 2 types: queries run in SQL Management Studio or stored procedures run by the application.
For queries run within the SQL Management Studio:
1. Prepare the SQL query to be executed in a new Query window.
2. Go to the Query menu and click on 'Include Actual Execution Plan'.
3. Go to the Query menu and click on 'Display Estimated Execution Plan'. This will execute the query.
4. Review the Execution Plan tab for the plan.
For stored procedures, run a SQL Profiler Trace or an Extended Events Session. To run a SQL Profiler:
1. Start SQL Profiler from Start | Microsoft SQL Server 20XX | Performance Tools | SQL Server Profiler | File | New Trace.
2. Verify Server Type is Database Engine and authentication type is Windows Authentication | click Connect.
3. Enter a trace name.
4. Select 'Blank' under 'Use the template:'.
5. Click the 'Save to file:' option and configure as follows:
5.1. Select a location and a filename (be sure to add the .trc extension to the filename). As the log file can be quite large, please verify the target drive has adequate available free space.
5.2. Select 'Save as type:' of 'SQL Server Profiler trace files (*.trc)'.
5.3. Click Save.
5.4 Set the 'Set maximum file size (MB):' value to 250.
5.5. Leave the 'Enable file rollover' option selected.
6. Click on the 'Events Selection' Tab. Select 'Show all events' and 'Show all columns'. There should not be any events selected - if selected, deselect all existing events in the left column.
7. Select Errors and Warnings: Hash Warning.
8. Click on the Run button to start tracing and then reproduce the issue in order to capture information in the Trace Log.
9. After the issue has been reproduced successfully, click on the Stop Selected Trace button, close the Trace Log window and close SQL Server Profiler.
To run an Extended Events Session:
1. Go to Management | Extended Events.
2. Right-click on Sessions | New Session Wizard.
3. Introduction page: Click Next.
4. Set Session Properties page: Enter a name for the session, then click Next.
5. Choose Template page: Do not use a template, then click Next.
6. Select Events To Capture page: Verify the 'Event library' is set to search in 'Event names only', search for and select all entries for 'hash', search for and select all entries for 'spill', then click Next.
7. Capture Global Fields page: Select the following, then click Next.
client_app_name
collect_cpu_cycle_time
database_name
event_sequence
query_hash
query_hash_signed
query_plan_hash
query_plan_hash_signed
request_id
session_id
sql_text
task_time
transaction_id
transaction_sequence
8. Set Session Event Filters page: Click Next.
9. Specify Session Data Storage page: Select 'Save data to a file for later analysis (event_file_target), enter the file name and location, leave the 'Maximum file size' and 'Enable file rollover' at the default values, set the 'Maximum number of files' to 10, then click Next.
10. Summary page: Review the Session settings and click Finish is correct to run the Session. Can also click on 'Script' to save the Session script prior to clicking on Finish.
Review of the SQL Profiler Trace/Extended Events Session should show hash spills as a yellow warning on SORT operators. Hovering the cursor over the symbol should display more information.
Solution
TempDB
The solution is to add more TempDB data files, per article https://support.microsoft.com/en-us/help/2154845/recommendations-to-reduce-allocation-contention-in-sql-server-tempdb-d.
The recommendations are:
1. Start with a number of files equal to half the number of processors but not to exceed eight (8) total data files (1 MDF + 7 NDF).
2. All data files are exactly the same size with the same autogrowth to accommodate SQL's proportional fill algorithm (if one file is larger, SQL will use that file more often, causing it to grow more often, causing it to be chosen more often again).
3. Per Microsoft referenced article - "Having the different data files on separate disks would be provide additional performance benefit. However, this is not required. The files can co-exist on the same disk volume."
Additionally, verify the TempDB locations are excluded from antivirus scanning, as listed in https://www.veritas.com/support/en_US/article.100007613.
Stale Statistics
It is recommended to run SQL Maintenance regularly in order to maintain the health of the table statistics. In some busier environments it may be necessary to run update statistics on the highly used tables several times per day. The operation can be performed with EV services online.
It is best to perform SQL Maintenance, per article https://www.veritas.com/support/en_US/article.100022023.