Problem
Stripe set performance considerations for Veritas Storage Foundation
Solution
This article is part of a set on troubleshooting volume performance. Click here to start at the beginning: https://www.veritas.com/docs/000087750
Table of Contents
Introduction
Using vxtrace to determine I/O characteristics
Sequential I/O
Random I/O
Determining the current stripe unit size
Matching the stripe size to the file system allocation unit size
Introduction
(Back to top)
By striping data across multiple spindles (physical disks), I/O can be processed in parallel, increasing performance. However, the traditional advantages of software-based stripe sets are sometimes outweighed by changes and improvements in modern storage hardware. Today, disk arrays typically provide their own hardware-based striping, which should be taken into consideration to avoid layering multiple RAID implementations that may conflict with each other. Different applications, such as databases or file servers, have dissimilar I/O characteristics that are affected by striping in varying ways.
In theory, as more spindles are added to a stripe set, more I/O is processed in parallel, potentially improving performance. However, the increase in parallel processing must be weighed against the additional overhead that results from fragmenting I/O across multiple columns. As columns are added, a point of diminishing returns is eventually reached where further columns no longer provide a significant improvement in I/O, or are not worth the increased risk of a hardware failure. Every spindle added to a stripe set increases the chance that a single hardware failure will cause the entire volume to fail.
Note: Do not assume that a larger number of columns will outperform a smaller number, that one stripe unit size will outperform another, or even that a striped volume will actually outperform a concatenated volume.
There are too many variables involved in performance for such assumptions to hold in all cases, and there is no substitute for testing. Before putting a volume into production, use benchmarking tools to test I/O performance, in different layouts, in a manner that is representative of the intended production environment. This is the only reliable method to determine which layout provides the best performance.
Using vxtrace to determine I/O characteristics
(Back to top)
Vxtrace can be used to analyze the characteristics of I/O being written to a volume (Figure 1). It is useful for distinguishing random I/O from sequential I/O, determining the typical length (in sectors) of each I/O transaction, and observing how the I/O is fragmented across multiple columns. The optimal stripe unit size ultimately depends on the characteristics of the I/O generated by the application.
Finding the typical I/O length is important for determining an appropriate stripe unit size.
- I/O lengths that are larger than the stripe width will be broken across multiple columns.
- I/O lengths that are smaller than, or equal to, the stripe unit size can "fit" entirely within a single column and not use any of the others.
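The fragmentation behavior described above can be illustrated with a short Python sketch. The function below is purely illustrative (it is not part of any Veritas tool) and assumes a simple round-robin striping model: consecutive stripe units are assigned to columns in rotation.

```python
def columns_touched(offset, length, stripe_unit, ncols):
    """Return the set of column indices a single I/O touches.

    offset, length -- I/O start and size, in sectors
    stripe_unit    -- stripe unit size, in sectors
    ncols          -- number of columns in the stripe set
    """
    cols = set()
    sector = offset
    remaining = length
    while remaining > 0:
        stripe_no, off_in_unit = divmod(sector, stripe_unit)
        cols.add(stripe_no % ncols)  # round-robin column assignment
        # Advance to the end of this stripe unit or the end of the I/O
        step = min(remaining, stripe_unit - off_in_unit)
        sector += step
        remaining -= step
    return cols

# A 384-sector I/O on a 3-column volume with a 128-sector stripe unit
# is fragmented across all three columns:
print(sorted(columns_touched(0, 384, 128, 3)))   # [0, 1, 2]

# A 64-sector I/O aligned to a stripe unit boundary fits in one column:
print(sorted(columns_touched(128, 64, 128, 3)))  # [1]
```

Note that an I/O smaller than the stripe unit size can still touch two columns if it happens to straddle a stripe unit boundary, which is why alignment also matters in practice.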
Note: The vxtrace excerpts in this article are very brief to improve readability. Reviewing a larger sample is recommended in order to include data that is representative of the production environment.
Figure 1 - Using vxtrace to gather information about I/O to a volume
Syntax:
vxtrace -t <time_in_seconds> -g <diskgroup> -o dev,disk <volume> > <outputfile>

Example:
# vxtrace -t 10 -g datadg -o dev,disk engvol > /tmp/vxtrace.engvol
Sequential I/O
(Back to top)
Figure 2 shows an example of sequential I/O, as observed by vxtrace. Notice that the starting block for each I/O increments slightly from the previous operation. Also notice that the I/O length is usually 384 sectors.
For sequential I/O, optimal performance is generally achieved if I/O transactions are more frequently spread across multiple columns. This can be accomplished by using a stripe unit size that is smaller than the typical I/O length.
Figure 2 - An example of vxtrace output showing sequential I/O
53595 START write vdev vol1 block 5785984 len 384 concurrency 1 pid 5855
53596 START write disk disk_5 op 53598 block 1994368 len 128
53597 START write disk disk_3 op 53598 block 1994496 len 128
53598 START write disk disk_4 op 53598 block 1994496 len 128
53595 END write vdev vol1 block 5785984 len 384
53596 END write disk disk_5 op 53598 block 1994368 len 128
53597 END write disk disk_3 op 53598 block 1994496 len 128
53598 END write disk disk_4 op 53598 block 1994496 len 128
53603 START write vdev vol1 block 5786752 len 384 concurrency 1 pid 5855
53604 START write disk disk_5 op 53606 block 1994624 len 128
53605 START write disk disk_3 op 53606 block 1994752 len 128
53606 START write disk disk_4 op 53606 block 1994752 len 128
53603 END write vdev vol1 block 5786368 len 384
53604 END write disk disk_5 op 53602 block 1994496 len 128
53605 END write disk disk_3 op 53602 block 1994624 len 128
53606 END write disk disk_4 op 53602 block 1994624 len 128
53611 START write vdev vol1 block 5786752 len 384 concurrency 1 pid 5855
53612 START write disk disk_5 op 53606 block 1994624 len 128
53613 START write disk disk_3 op 53606 block 1994752 len 128
53614 START write disk disk_4 op 53606 block 1994752 len 128
53615 START write vdev vol1 block 5787136 len 64 concurrency 2 pid 5855
53616 START write disk disk_5 op 53610 block 1994752 len 64
Random I/O
(Back to top)
Figure 3 shows an example of random I/O. Notice that the starting block varies significantly. The I/O lengths also vary in this sample, but tend to be shorter than those in Figure 2.
For random I/O, optimal performance is generally achieved by containing each I/O transaction within a single column. To accomplish this, the stripe unit size should be larger than the average I/O size.
Figure 3 - An example of vxtrace output showing random I/O
43024 START write vdev vol1 block 33778 len 94 concurrency 1 pid 2202
43025 START write disk disk_5 op 43024 block 77042 len 14
43026 START write disk disk_3 op 43024 block 77056 len 80
43025 END write disk disk_5 op 43024 block 77042 len 14 time 3
43026 END write disk disk_3 op 43024 block 77056 len 80 time 3
43024 END write vdev vol1 op 43024 block 33778 len 94 time 3
43027 START write vdev vol1 block 1104 len 1 concurrency 1 pid 2203
43028 START write disk disk_5 op 43027 block 66128 len 1
43028 END write disk disk_5 op 43027 block 66128 len 1 time 2
43027 END write vdev vol1 op 43027 block 1104 len 1 time 2
43028 START write vdev vol1 block 1631 len 59 concurrency 1 pid 2202
43029 START write disk disk_3 op 43037 block 66399 len 33
43030 START write disk disk_4 op 43037 block 66304 len 26
43029 END write disk disk_3 op 43037 block 66399 len 33 time 3
43030 END write disk disk_4 op 43037 block 66304 len 26 time 3
43028 END write vdev vol1 op 43037 block 1631 len 59 time 3
43040 START write vdev vol1 block 36080 len 16 concurrency 1 pid 2203
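When a trace contains thousands of records, tallying the typical I/O length by eye is impractical. The short Python sketch below summarizes the len field of the top-level vdev START records from a capture file. It is an illustrative helper, not a Veritas utility, and it assumes the whitespace-separated output format shown in Figures 2 and 3.

```python
from collections import Counter

def iolen_histogram(path):
    """Tally I/O lengths (in sectors) from vdev START records in a vxtrace capture."""
    lengths = Counter()
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Count only top-level volume I/O, not the per-disk fragments.
            if "START" in fields and "vdev" in fields and "len" in fields:
                lengths[int(fields[fields.index("len") + 1])] += 1
    return lengths

# Running this against a capture such as /tmp/vxtrace.engvol shows how
# often each I/O length occurs, e.g. whether most I/Os are 384 sectors.
```

The most frequent lengths in the histogram are the values to compare against the stripe unit size when choosing a layout.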
Determining the current stripe unit size
(Back to top)
Use vxprint to determine the current stripe unit size (Figure 4).
Figure 4 shows volume "mgmtvol" with the following characteristics:
- 3 columns
- stripe unit size of 128 sectors (64KB)
- stripe width of 384 sectors (the stripe unit size multiplied by the number of columns)
Figure 4
Syntax:
vxprint -htv <volume>

Example, with typical output:
# vxprint -htv mgmtvol
Matching the stripe size to the file system allocation unit size
(Back to top)
A best practice is to set the stripe width to a multiple of the filesystem allocation unit size. For example, if the filesystem block size is 4KB, a stripe width of 384 sectors (192KB) is a valid multiple because 192KB divided by 4KB is an integer (48). Recall that the stripe width is the product of the stripe unit size and the number of columns.
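The check described above is simple arithmetic, sketched below in Python. It assumes 512-byte sectors (the unit in which vxprint reports sizes) and reuses the example values from this article (128-sector stripe unit, 3 columns, 4KB filesystem block); these are not universal defaults.

```python
SECTOR_BYTES = 512            # assumed sector size

stripe_unit_sectors = 128     # from the vxprint example in Figure 4
ncols = 3                     # from the vxprint example in Figure 4
fs_block_bytes = 4 * 1024     # 4KB filesystem block size

# Stripe width = stripe unit size x number of columns
stripe_width_sectors = stripe_unit_sectors * ncols        # 384 sectors
stripe_width_bytes = stripe_width_sectors * SECTOR_BYTES  # 196608 bytes (192KB)

# The stripe width is a valid multiple of the filesystem block size
# if the division leaves no remainder:
print(stripe_width_bytes % fs_block_bytes == 0)  # True
print(stripe_width_bytes // fs_block_bytes)      # 48 filesystem blocks per stripe
```

If the remainder were nonzero, a single filesystem block could straddle a column boundary, turning some single-block writes into two disk operations.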
Use fstyp to determine the filesystem block size (Figure 5).
Figure 5
Syntax:
fstyp -t|F vxfs -v <path_to_volume>

Example, with typical output:
# fstyp -t vxfs -v /dev/vx/rdsk/datadg/mgmtvol