Analyzing SQL Performance using Performance Monitor Counters

Analyzing SQL Performance using Performance Monitor Counters

Article: 100000726
Last Published: 2019-11-14
Ratings: 2 1
Product(s): Enterprise Vault

Problem

Analyzing SQL Performance using Performance Monitor Counters

Solution

Below is a list and an explanation of Performance Monitor (PerfMon) counters, plus recommended solutions, in five areas that should be monitored for SQL performance health.

Evaluating SQL Memory Usage (memory bottlenecks)

Counters:
 
  • Memory:  Pages/sec  (avg 20)
  • Memory:  Page Faults /sec
  • Memory:  Cache Faults /sec
  • Memory:  Page Reads /sec
  • Memory:  Page Input /sec
  • Memory:  Available Bytes  (above 5MB)
  • Paging File: % Usage (should remain low)
  • Physical Disk: Disk Reads/sec
  • Physical Disk: Disk Writes/sec
  • Process: Page File Bytes (to track a particular process)
  • SQL Server:  Buffer Manager: Buffer Cache Hit Ratio (must be above 90 percent)
 
Counters Explained:
 
  • Memory: Pages/sec – measures the number of pages per second that are paged out of RAM to Virtual Memory (HDD)or ‘hard faults’ OR the reading of memory-mapping for cached memory or ‘soft faults’ (systems with a lot of memory).  Average of 20 or under is normal.

    -  This counter needs to be used in conjunction with Paging File: %Usage.  If the Pages/sec are high, but the % Usage is low, these are soft faults (cached memory).  If both are high, these are hard faults (working set replacement).
    -  To look for excessive paging, use Task Manager and add the column: Page Faults and PF Delta for current activity.
     
  • Memory:  Page Faults /sec - measures the working set.  (high count indicates working set is too large for memory to handle - could be a mix of hard and soft faults).  A page fault occurs when a program requests an address on a page that is not in the current set of memory resident pages.  What happens when a page fault occurs is that the thread that experienced the page fault is put into a Wait state while the operating system finds the specific page on disk and restores it to physical memory.
       When a thread attempts to reference a nonresident memory page, a hardware interrupt occurs that halts the executing program.  The instruction that referenced the page fails and generates an addressing exception that generates an interrupt.  There is an Interrupt Service Routine that gains control at this point and determines that the address is valid, but that the page is not resident.  The OS then locates a copy of the desired page on the page file, and copies the page from disk into a free page in RAM.  Once the copy has completed successfully, the OS allows the program thread to continue on.  The performance of applications will suffer when there is insufficient RAM and excessive hard page faults occur.  It is imperative that hard page faults are resolved in a timely fashion so that the process of resolving the fault does not unnecessarily delay the program’s execution.
     
  • Memory:  Cache Faults /sec - measures the File System Cache.  The system file cache maps open files into a portion of the system virtual address range and uses the process working set memory management mechanisms to keep the most active portions of current files resident in physical memory.  Cache faults are a type of page fault that occur when a program references a section of an open file that is not currently resident in physical memory.
     
  • Memory:  Page Reads /sec - tracks the hard and soft (memory mapping) page faults (Sustained values higher than 5 indicate a closer look at Physical Disk: Avg Read and Write Queue Length for hard page faults)
     
  • Memory:  Page Input /sec - tracks the hard page faults  (should not exceed 15)
    If you have a high rate of page faults combined with a high rate of page reads (which also show up in the Disk counters) then you may have an issue where you have insufficient RAM given the high rate of hard faults.  If Page Reads / sec stay high during heavy operations and the Physical Disk Avg Disk Queue Length remain high, RAM is the problem area.
     
  • Memory: Available Bytes – measures physical memory availability, SQL 2005 (unless manually set with limits) will use all but 10MB of available space.
    (add the /3gb switch to the boot.ini  to make the SQL kernel use only 1gb of memory instead of the default 2gb).  This figure should be above 25% of installed memory.  Note this value is dynamic and only shows last checked value, not the average.
     
  • Paging File: % Usage - This counter shows the amount of thrashing to the hard drive to service virtual memory requests.
     
  • Physical Disk: Disk Reads/sec and Writes/sec – measures disk access.  These vary widely and should be used in conjunction with the other counters.  Example: wide SCSCI drives can handle between 50 -70 I/O's per second.  A baseline should be created to do comparisons for these counters.
     
  • Process: Page File Bytes:  Page File Bytes is the current number of bytes that the chosen process has used in the paging file(s). The lack of space in paging files can prevent other processes from allocating memory.
     
  • SQL Server:  Buffer Manager: Buffer Cache Hit Ratio – measures now much the buffer is used instead of the hard disk to get data.
    (typically at 99 %, but must be near or above 90%, if not: add more memory.  The one caveat is if the SQL server is running OLTP programs, which will cause more disk reads)
 
Recommended Solutions when the memory counters are not in optimum range:
  1. Add more memory
     
  2. Have SQL server the only application running on the computer
     
  3. Move active databases to a dedicated SQL server
     
  4. Increase the maximum memory per instance (if Available Bytes is above 10MB)
     
  5. Decrease the maximum memory per instance (if Available Bytes is below 5MB)
     


Evaluating Disk Usage (HDD bottlenecks)

Counters:
 
  • Physical Disk: Avg. Disk Queue Length  (Should not be higher then the number of spindles plus 2)
  • Physical Disk: Avg. Disk Read Queue Length  (Should be less then 2)*
  • Physical Disk: Avg. Disk Write Queue Length  (Should be less then 2)*
  • Physical Disk: Avg. Disk Read /sec (Should be under 20ms, if over 50ms indicates a serious bottleneck)
  • Physical Disk: Avg. Disk Write /sec (Manufacturer dependent)

Counters Explained:
 
  • Physical Disk: Avg. Disk Queue Length – also measures the pressure on the physical disk array.  Excess of 2 per drive (3 disk array = 6) for 10+ minutes over a 24 hour period, indicates a disk bottleneck.
     
  • Physical Disk: Avg. Disk Read Queue Length  -  Excess of 2 per drive (3 disk array = 6) for 10+ minutes over a 24 hour period, indicates a disk bottleneck.
     
  • Physical Disk: Avg. Disk Write Queue Length -  Excess of 2 per drive (3 disk array = 6) for 10+ minutes over a 24 hour period, indicates a disk bottleneck.
     
  • Physical Disk: Avg. Disk Read /sec - measures the rate of read operations from the disk.
     
  • Physical Disk: Avg. Disk Write /sec - measures the rate of write operations on the disk.
 
Recommended Solutions:
  1. Verify the issue is not a memory bottleneck first
     
  2. Increase the amount of disks in the array
     
  3. Have SQL server the only application running on the computer
     
  4. Move active databases to a dedicated SQL server
 

Evaluating Processor Usage (CPU bottlenecks)

Counters:
 
  • Processor:  % Processor Time_Total
  • System:  Processor Queue Length

Counters Explained:
 
  • Processor:  % Processor Time_Total – measures the CPU utilization of each processor.  Add _Total to the count to get overall utilization of all CPU's combined.  Not to exceed 80% for 10+ minutes over a 24 period.
     
  • System:  Processor Queue Length – measures the backlog queue for processors.  Should not exceed 2 per CPU for 10+ minutes for a 24 hour period.  Example, if the server contains 4 CPU's, the count should not exceed 8 for a 10 minute period.

Recommended Solutions when both counts are too high:
  1. Add CPU's , faster CPU's or CPU's with larger L2 cache.
     
  2. 2 CPU environment, speed is more important.
     
  3. 4+ CPU environment, L2 cache is more important.
     
Recommended Solutions when Queue length is too high only:
(too many active worker threads)
  1. Move active databases to a dedicated SQL server
 

Evaluating Network Usage

Counters:
 
  • Network Interface:  Bytes Received/sec
  • Network Interface:  Bytes Sent/sec
  • Network Interface:  Bytes/sec
  • Network Interface:  Output Queue Length

Counters Explained:
 
  • Network Interface:  Bytes Received/sec – measures the rate at which bytes are received over each network adapter.  Need existing baseline to use effectively.
     
  • Network Interface:  Bytes Sent/sec - measures the rate at which bytes are sent over each network adapter.  Need existing baseline to use effectively.
     
  • Network Interface:  Bytes/sec – measures the rate at which bytes are sent and received over each network adapter.  Need existing baseline to use effectively.
     
  • Network Interface:  Output Queue Length – should always be 0, but can reach as high as 2 momentarily.  


Recommended Solutions:

  1. Verify there is not an external attach on the system
     
  2. Move active databases to a dedicated SQL server

 



Evaluating User Connections

Counter and explanation:

  • SQL Server: General Statistics:  User Connections.  Shows number of user connections (not currently connector users).    Count should not exceed 255 for 10+ minutes over a 24 hour period.
      

Recommended Solutions:

  1. Increase SQL 2005 Server configuration Maximum Worker Threads
     
  2. Move active databases to a dedicated SQL server
     
  3. Open all ports on the SQL server

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    MaxUserPort (DWORD)  64,512 (Value:Decimal)

 

Was this content helpful?