Problem
Troubleshooting volume performance in InfoScale and Storage Foundation
Solution
Table of Contents
Introduction
Checking for basic resource shortages
Verifying that the correct ASL (Array Support Library) is loaded
Disabled paths and I/O errors
Filesystem Fragmentation
Volume and subdisk bottlenecks
Stripe set considerations
RAID Contention
DMP I/O policy
Mount options
File system allocation unit size
Introduction
This article discusses basic performance troubleshooting for Veritas Storage Foundation.
Before proceeding, make sure that you know the answers to the following questions:
- What are the specific symptoms that are being observed? If available, use performance monitoring tools to determine which storage component is affected, and how the degraded performance actually differs from normal performance.
- Is the performance degradation consistent or intermittent?
- If the problem is intermittent, what patterns can be observed? Does performance become degraded at certain times, such as when other scheduled tasks are taking place?
- When was the degraded performance first observed?
- Have there been any recent changes, such as software installations, patches, driver updates, hardware changes or changes to the SAN?
It is common that a performance problem appears to originate from a particular component, but upon further investigation, is found to originate from an entirely different storage layer, or even from a component that is not directly related to storage. A best practice is to check for basic resource shortages, such as CPU, memory and disk space, before delving into more complex possibilities.
Checking for basic resource shortages
Before delving into more specific topics, first verify that there are no basic resource shortages that may account for poor performance. In particular, check the following items:
- Memory
- CPU
- Disk Space - VxFS file systems should be kept below 90% utilization.
Table 1 - Common commands and syntax to check resources
OS | Memory | CPU | Disk Space
Solaris | prstat -s rss (look for the "RSS" column) | prstat (look for the "CPU" column) | df -k (look for "capacity")
Linux | top, then type "M" to sort by memory usage (look for the "MEM" column) | top, then type "P" to sort by CPU usage (look for the "CPU" column) | df -k (look for "Use%")
AIX | nmon, then type "t" followed by "4" to sort by "Size"; or topas -P, using the cursor to highlight one of the "RES" columns | nmon, then type "t" followed by "3" to sort by "CPU"; or topas -P, using the cursor to highlight the "CPU" column (highlighted by default) | df -k (look for "%Used")
HP-UX | top (look for the "RES" column) | top (look for the "WCPU" and "CPU" columns) | df -k (look for "% allocation used")
Verify that the correct ASL (Array Support Library) is loaded
A common cause of poor performance is that a generic ASL is loaded instead of a vendor-specific ASL. While troubleshooting performance, verify that the correct ASL is loaded.
More details about determining which ASL is loaded and why a wrong ASL may be loaded can be found in this article:
"Other_disks," "scsi3_jbod" or "jbod" ASLs (Array Support Libraries) are claiming disks as generic devices
https://www.veritas.com/support/en_US/article.100029022
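As a quick check, and as a sketch only (the exact enclosure and library names will vary by array), the installed ASLs and the way each enclosure is being claimed can be listed with:
# vxddladm listsupport all (lists the installed ASLs and the arrays they support)
# vxdmpadm listenclosure all (shows each enclosure and its array type; a generic claim typically appears as "Disk" or "OTHER_DISKS")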
Disabled paths and I/O errors
- Use vxdmpadm getsubpaths to determine the status of the paths to the disks (Figure 1).
- Use vxdmpadm -e iostat show to check for I/O errors detected on each path (Figure 2).
Veritas will disable a path if serious or sustained I/O errors occur. When all paths to a disk are disabled, the server will be unable to read or write to the volume. If a path has been disabled, review the syslog for I/O errors reported by "vxdmp" or "scsi."
Although a path can be re-enabled using "vxdmpadm enable," vxdmp should automatically evaluate the status of a disabled path at five-minute intervals using a SCSI inquiry. If the inquiry succeeds, the path is automatically re-enabled. If a path remains disabled beyond this interval, it is possible that I/O errors are still being detected, warranting further investigation. Paths are not automatically re-enabled if the diskgroup has been disabled, or if vxesd is stopped. The behavior of vxdmp in response to disabled paths can be modified via the DMP tunables, which can be viewed using "vxdmpadm gettune" (a brief example follows Figure 2).
Figure 1 - Using vxdmpadm to determine the status of paths
Syntax: vxdmpadm getsubpaths
Example, with typical output:
# vxdmpadm getsubpaths
Figure 2 - Using vxdmpadm to check for errors down I/O paths
Syntax: vxdmpadm -e iostat show
Example, with typical output:
# vxdmpadm -e iostat show
Note: In this example, path sdj appears to be experiencing consistent I/O errors. Check the syslog for references to path sdj to see what errors are being reported.
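As a minimal sketch (path sdj is carried over from the example above; the commands assume the underlying hardware errors have already been resolved), a disabled path can be re-enabled manually and the restore daemon tunables reviewed as follows:
# vxdmpadm enable path=sdj (manually re-enable a single disabled path)
# vxdmpadm gettune dmp_restore_interval (interval, in seconds, at which DMP re-checks disabled paths)
# vxdmpadm gettune dmp_restore_policy (policy that DMP uses when re-checking paths)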
Filesystem Fragmentation
Filesystem fragmentation causes data blocks to be scattered throughout a filesystem in a non-contiguous manner, which reduces performance by increasing the time and disk movement required to access data. When troubleshooting performance, use /opt/VRTS/bin/fsadm to check for VxFS filesystem fragmentation.
More information about using fsadm to analyze and defragment a filesystem can be found here:
"How to interpret directory and extent fragmentation report from fsadm -E and fsadm -D output"
https://www.veritas.com/support/en_US/article.100024882
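For example, assuming a VxFS filesystem mounted at a hypothetical mount point /data (use -F vxfs instead of -t vxfs on Solaris and HP-UX), fragmentation can be reported and, if necessary, corrected online:
# /opt/VRTS/bin/fsadm -t vxfs -D -E /data (report directory and extent fragmentation)
# /opt/VRTS/bin/fsadm -t vxfs -d -e /data (defragment directories and extents)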
Volume and subdisk bottlenecks
Use vxprint to display the objects that are contained by the diskgroup (Figure 3).
From the vxprint output in Figure 3, notice that:
- Disk group datadg has three volumes: "engvol," "hrvol" and "locks."
- Each volume has one subdisk: "datadg01-02," "datadg01-01" and "datadg04-01."
- Two of the subdisks, "datadg01-02" and "datadg01-01," reside on the same disk: "datadg01."
- One of the subdisks, "datadg04-01," resides on its own disk: "datadg04."
Note: A subdisk is simply a contiguous "piece" of a volume. A volume that spans two disks is typically broken into two subdisks. A volume that only resides on a single disk might only have one subdisk, but this can vary depending on the volume structure. Subdisks are tagged with an "sd" by vxprint.
Figure 3 - Using vxprint to display a diskgroup
Syntax: vxprint -ht
Example, with typical output:
# vxprint -ht
Use vxstat to gather I/O performance statistics about this disk group (Figure 4):
In particular, look for bottlenecks:
- Does vxstat show that multiple, busy volumes (or subdisks) reside on the same disk? Moving a busy volume, or subdisk, to its own disk may improve performance.
- Does vxstat show that the I/O is composed of significantly more read operations than write operations? Mirroring a volume often improves read performance. However, mirroring also usually degrades the write performance slightly due to the increased work required to maintain multiple copies of the data.
For example, the vxstat output in Figure 4 shows that disk "datadg01" has virtually all of the I/O activity, while disk "datadg04" has none. Recall from Figure 3 that both volumes "hrvol" and "engvol" reside on disk "datadg01," while volume "locks" has disk "datadg04" to itself. In this example, performance may be improved by simply moving either "engvol" or "hrvol" to another disk. Also, notice that most of the I/O is composed of write operations; in this case, mirroring either volume for performance reasons is not recommended.
When moving a subdisk for performance reasons, the target LUN should reside on a different set of physical "spindles" (individual, physical disks) than the source LUN. Moving a subdisk to a target LUN that uses the same spindles as the source LUN is unlikely to improve performance because the same physical spindles are still being used by both subdisks. This undermines the purpose of moving the subdisk.
Figure 4 - Using vxstat to gather performance statistics about a disk group
Syntax: vxstat -g <diskgroup> -vpsduh -i <time_interval> -c <number_of_samples_to_gather>
Example, with typical output:
# vxstat -g datadg -vpsduh -i30 -c3
Note: The first sample is the cumulative total since the statistics were last reset. The statistics can be reset manually with vxstat -g <diskgroup> -r.
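As an illustrative sketch only (disk "datadg05" is a hypothetical, unused disk in the same diskgroup that resides on separate spindles), a busy volume such as "engvol" could be relocated off of "datadg01" with vxassist:
# vxassist -g datadg move engvol !datadg01 datadg05 (move the storage used by engvol off datadg01 and onto datadg05)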
Stripe set performance considerations
By striping data across multiple spindles (physical disks), I/O can be processed in parallel, increasing performance. vxtrace can be used to analyze the characteristics of the I/O that is being written to a volume. This is useful for distinguishing random I/O from sequential I/O, determining the typical length (in sectors) of each I/O transaction, and seeing how the I/O is being fragmented across multiple columns.
More information about stripe set performance can be found here:
"Stripe set performance considerations in Veritas Storage Foundation"
https://www.veritas.com/support/en_US/article.100029158
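As a minimal sketch, vxtrace can be run against one of the volumes from the earlier example; each record shows the operation type, starting block and length, which can then be reviewed for I/O size and randomness (the output file name is arbitrary):
# vxtrace -g datadg engvol > /tmp/engvol.trace (trace I/O on volume engvol; stop with Ctrl-C and review the captured records)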
RAID Contention
Many disk arrays have their own built-in RAID capability. A single "disk," or LUN, that is presented from a disk array may actually be a group of several hardware spindles (physical disks) that are part of a RAID set. This creates the possibility that volume performance may be affected by multiple RAID configurations at the same time: one on the hardware layer (controlled by the disk array) and one on the software layer (controlled by Veritas). When configuring a RAID set, it is important to consider how a RAID layout at one layer will affect the performance of the other layer.
For example, configuring a RAID-5 set within Veritas, using LUNs that are also part of a RAID-5 set within the disk array, will likely result in contention between the two RAID implementations, decreasing performance, creating additional work for the disk spindles and increasing the chance of a hardware failure.
Alternatively, it is common to combine striping without parity (RAID-0) and mirroring (RAID-1) into configurations that improve both performance and data availability.
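For example, assuming a hypothetical 20 GB volume named "appvol" and enough suitable disks in the diskgroup, vxassist can create a layered striped-and-mirrored layout in a single step:
# vxassist -g datadg make appvol 20g layout=stripe-mirror ncol=4 (stripe the data across 4 columns and mirror each column)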
DMP I/O policy
Review the DMP I/O policy for the disks. In some cases, switching to a different I/O policy may improve performance. For disk arrays that support "active/active" multipathing, "MinimumQ" (also known as "Least Queue Depth") is the default I/O policy, and it often provides the best I/O performance with little configuration required. However, the appropriate policy will depend on the environment and the type of I/O.
More information about changing the DMP I/O policy can be found here:
"How to change the DMP I/O policy and monitor for performance"
https://www.veritas.com/support/en_US/article.100029158
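As a brief sketch (the enclosure name "emc_clariion0" is only an example; use "vxdmpadm listenclosure all" to find the actual name), the current I/O policy can be displayed and changed per enclosure:
# vxdmpadm getattr enclosure emc_clariion0 iopolicy (display the current and default I/O policy)
# vxdmpadm setattr enclosure emc_clariion0 iopolicy=minimumq (switch the enclosure to the MinimumQ policy)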
Mount options
The mount options for a volume can have a significant impact on performance. In particular, adjusting the intent log mount option may increase or decrease performance by 15-20 percent. Currently, the default intent log mount option is "delaylog."
Use mount to determine the current mount options for a volume (Figure 5). Mount options can be changed by unmounting and remounting the volume while specifying the desired option. This can be done manually, using mount (Figure 5), or by modifying a system configuration file, such as /etc/fstab (or /etc/vfstab).
Figure 5 - Using mount to display the current mount options
Syntax: mount | grep -i vxfs
Example, with typical output:
# mount | grep -i vxfs
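For example, assuming a hypothetical VxFS filesystem for volume "engvol" mounted at /engdata (use -F vxfs instead of -t vxfs on Solaris and HP-UX), the intent log option can be changed by unmounting and remounting, after confirming that no applications are using the filesystem:
# umount /engdata
# mount -t vxfs -o log /dev/vx/dsk/datadg/engvol /engdata (remount with the "log" option)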
Table 2 - Basic Intent log mount options.
Mount Option | Description | Performance Considerations
log | Writes are not acknowledged until the data has actually been written to the disk. | The safest option, but typically the slowest of the three.
delaylog | Some writes are first written to filesystem cache and then later committed to the disk, after a slight delay. | The default option; balances performance and data integrity.
tmplog | Writes are only committed to the disk when the kernel write buffer is full. | Usually the fastest option, but recent changes can be lost after a system crash; best suited to temporary or easily re-created data.
"Mounting a VxFS file system" (from the Veritas Storage Foundation 6.0 Administrators Guide for Solaris)
https://sort.veritas.com/public/documents/sfha/6.0.1/solaris/productguides/html/sf_admin/ch07s03.htm
Filesystem allocation unit size
The default file system allocation unit size, commonly referred to as the "block size," for VxFS is 1 KB for file systems that are smaller than 1 TB. For file systems that are 1 TB or larger, the default allocation unit size is 8 KB.
When creating a file system, it is possible to specify a file allocation unit size by using the "-o bsize" argument with /opt/VRTS/bin/mkfs (Figure 6).
As a rough guideline, a file system with a smaller block size tends to be the most efficient for a volume that is primarily composed of small files. For a volume that contains mostly larger files, use a larger block size. Some application vendors provide recommendations for an optimal filesystem block size. It is usually best to follow their guidelines when creating a new filesystem. Ultimately, the best way to determine the optimal block size is to use benchmarking tools to measure the performance of a file system, at different block sizes, before a volume is placed into production.
Figure 6 - Specifying a file allocation unit size with /opt/VRTS/bin/mkfs
Syntax: /opt/VRTS/bin/mkfs -t|F vxfs -o bsize=<desired_block_size> <path_to_volume>
Example, with typical output:
# /opt/VRTS/bin/mkfs -t vxfs -o bsize=4096 /dev/vx/rdsk/datadg/mgmtvol
Use fstyp to determine the filesystem block size (Figure 7).
Figure 7 - Using fstyp to determine the filesystem block size
Syntax: fstyp -t|F vxfs -v <path_to_volume>
Example, with typical output:
# fstyp -t vxfs -v /dev/vx/rdsk/datadg/mgmtvol