Troubleshooting volume performance in InfoScale and Storage Foundation

Article: 100028692
Last Published: 2015-07-09
Product(s): InfoScale & Storage Foundation

Problem

Troubleshooting volume performance in InfoScale and Storage Foundation

Solution



Table of Contents

Introduction
Checking for basic resource shortages
Verifying that the correct ASL (Array Support Library) is loaded
Disabled paths and I/O errors
Filesystem Fragmentation
Volume and subdisk bottlenecks
Stripe set performance considerations
RAID Contention
DMP I/O policy
Mount options
Filesystem allocation unit size



 


Introduction

(Back to top)


This article discusses basic performance troubleshooting for Veritas Storage Foundation.

Before proceeding, make sure that you know the answers to the following questions:
 

  • What are the specific symptoms that are being observed? If available, use performance monitoring tools to determine which storage component is affected, and how the degraded performance actually differs from normal performance.
  • Is the performance degradation consistent or intermittent?
  • If the problem is intermittent, what patterns can be observed? Does performance become degraded at certain times, such as when other scheduled tasks are taking place?
  • When was the degraded performance first observed?
  • Have there been any recent changes, such as software installations, patches, driver updates, hardware changes or changes to the SAN?


A performance problem often appears to originate from a particular component but, upon further investigation, is found to originate from an entirely different storage layer, or even from a component that is not directly related to storage. A best practice is to check for basic resource shortages, such as CPU, memory and disk space, before delving into more complex possibilities.





Checking for basic resource shortages

(Back to top)


Before delving into more specific topics, first verify that there are no basic resource shortages that may account for poor performance. In particular, check the following items.

  • Memory
  • CPU
  • Disk Space - VxFS volumes should be less than 90% utilized.

  


Note: These are not Veritas commands. This information is provided as a convenience and should not be regarded as authoritative. Review the official documentation supplied by the vendors for their respective platforms to confirm the correct usage of these commands.


Table 1 - Common commands and syntax to check resources

Solaris

  Memory:     prstat -s rss   (look for the "RSS" column)
  CPU:        prstat          (look for the "CPU" column)
  Disk Space: df -k           (look for "capacity")

Linux

  Memory:     top    (type "M" to sort by memory usage; look for the "MEM" column)
  CPU:        top    (type "P" to sort by CPU usage; look for the "CPU" column)
  Disk Space: df -k  (look for "Use%")

AIX

  Memory:     nmon      (type "t," then "4" to sort by "Size"; look for the "Res Set" column)
              topas -P  (use the cursor to highlight one of the "RES" columns)
  CPU:        nmon      (type "t," then "3" to sort by "CPU"; look for the "CPU Used" column)
              topas -P  (use the cursor to highlight the "CPU" column; by default, this should already be highlighted)
  Disk Space: df -k     (look for "%Used")

HP-UX

  Memory:     top    (look for the "RES" column)
  CPU:        top    (look for the "WCPU" and "CPU" columns)
  Disk Space: df -k  (look for "% allocation used")


 



Verifying that the correct ASL (Array Support Library) is loaded

(Back to top)


A common cause of poor performance is that a generic ASL is loaded instead of a vendor-specific ASL. While troubleshooting performance, verify that the correct ASL is loaded.


More details about determining which ASL is loaded and why a wrong ASL may be loaded can be found in this article:

"Other_disks," "scsi3_jbod" or "jbod" ASLs (Array Support Libraries) are claiming disks as generic devices
https://www.veritas.com/docs/000088147
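
As a quick check, the following commands show which enclosures and ASLs are currently in use (a minimal sketch; output columns vary by release). Disks claimed by a generic ASL typically appear with an enclosure or array type such as "Disk" or "OTHER_DISKS."

# vxdmpadm listenclosure all
# vxddladm listsupport all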


 

 


Disabled paths and I/O errors

(Back to top)

 

  • Use vxdmpadm getsubpaths to determine the status of the paths to the disks (Figure 1). 
  • Use vxdmpadm -e iostat show to check for I/O errors that are detected for each path (Figure 2).


Veritas will disable a path if serious or sustained I/O errors occur. When all paths to a disk are disabled, the server will be unable to read from or write to the volume. If a path has been disabled, review the syslog for I/O error events reported by "vxdmp" or "scsi."

Although a path can be re-enabled using "vxdmpadm enable," vxdmp should automatically evaluate the status of a path at five-minute intervals using a SCSI inquiry. If the inquiry is successful, the path is automatically re-enabled. If a path remains disabled beyond this interval, it is possible that I/O errors are still being detected, warranting further investigation. Paths are not automatically re-enabled if the diskgroup has been disabled, or if vxesd is stopped. The behavior of vxdmp in response to disabled paths can be modified via the DMP tunables, which can be viewed using "vxdmpadm gettune."
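
A hedged example of re-enabling a path and reviewing the DMP tunables (the path name sdd is taken from Figure 1; confirm the exact syntax for your release with the vxdmpadm manual page):

# vxdmpadm enable path=sdd
# vxdmpadm gettune all

In the gettune output, the dmp_restore_interval tunable controls how often disabled paths are re-evaluated (commonly 300 seconds).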
 


Note: Although the syslog may show that vxdmp is the source of an I/O error, vxdmp itself is not usually the origin. Veritas depends on the OS device drivers to communicate with disks. When I/O errors occur, they are reported to Veritas by the device drivers. Vxdmp will report the errors that have been passed to it by the device drivers and may disable a path in response to the events.

 


Figure 1 - Using vxdmpadm to determine the status of paths


Syntax:

vxdmpadm getsubpaths


Example, with typical output:

# vxdmpadm getsubpaths

NAME      STATE[A]   PATH-TYPE[M] DMPNODENAME  ENCLR-NAME   CTLR
================================================================
sdk       ENABLED(A)   -          disk_0       ams_wms0     c8
sdr       ENABLED(A)   -          disk_0       ams_wms0     c3
sdb       ENABLED(A)   -          disk_1       ams_wms0     c8
sdc       ENABLED(A)   -          disk_1       ams_wms0     c3
sdo       ENABLED(A)   -          disk_2       ams_wms0     c8
sdt       ENABLED(A)   -          disk_2       ams_wms0     c3
sdd       DISABLED     -          disk_3       ams_wms0     c8
sdf       ENABLED(A)   -          disk_3       ams_wms0     c3
sdh       ENABLED(A)   -          disk_4       ams_wms0     c8
sdn       ENABLED(A)   -          disk_4       ams_wms0     c3
sde       ENABLED(A)   -          disk_5       ams_wms0     c8
sdi       ENABLED(A)   -          disk_5       ams_wms0     c3
sdj       ENABLED(A)   -          disk_6       ams_wms0     c8
sdp       ENABLED(A)   -          disk_6       ams_wms0     c3
sdq       ENABLED(A)   -          disk_7       ams_wms0     c8
sdu       ENABLED(A)   -          disk_7       ams_wms0     c3
sdg       ENABLED(A)   -          disk_8       ams_wms0     c8
sdl       ENABLED(A)   -          disk_8       ams_wms0     c3
sdm       ENABLED(A)   -          disk_9       ams_wms0     c8
sds       ENABLED(A)   -          disk_9       ams_wms0     c3
sda       ENABLED(A)   -          sda          other_disks  c2

 



Figure 2 - Using vxdmpadm to check for errors down I/O paths


Syntax:
  1. vxdmpadm iostat start
  2. vxdmpadm -ez iostat show interval=<time_in_seconds> count=<desired_number_of_samples>
  3. vxdmpadm iostat stop

Example, with typical output:
 

Note: In this example, path sdj appears to be experiencing consistent I/O errors. Check the syslog for references to path sdj to see what errors are being reported.

Notice that the first set of output is the cumulative total since the statistics were last reset. Resetting the statistics manually can be done with vxdmpadm iostat reset.



# vxdmpadm -ez iostat show interval=5

                       cpu usage = 36678us    per cpu memory = 192512b
                        ERROR I/Os
PATHNAME             READS    WRITES
sdd                      0         5
sdf                      0         5
sdh                      0         7
sdn                      0         5
sde                      0         5
sdi                      0         7
sdj                      0        10
sdp                      0         8

sdj                      0         2

sdj                      0         6

sdj                      0         3


sdj                      0         1
 


 




Filesystem Fragmentation

(Back to top)


Filesystem fragmentation causes data blocks to be scattered through a filesystem in a non-contiguous manner. This reduces performance by increasing the amount of time and movement required to access data blocks. When troubleshooting performance, use /opt/VRTS/bin/fsadm to check for VxFS filesystem fragmentation.


More information about using fsadm to analyze and defragment a filesystem can be found here:

"How to interpret directory and extent fragmentation report from fsadm -E and fsadm -D output"
https://www.veritas.com/docs/000082709
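
A hedged example of checking and reducing fragmentation with fsadm (assumes a VxFS filesystem mounted at /vol1; schedule defragmentation during periods of low I/O activity):

# /opt/VRTS/bin/fsadm -D /vol1        (report directory fragmentation)
# /opt/VRTS/bin/fsadm -E /vol1        (report extent fragmentation)
# /opt/VRTS/bin/fsadm -d -e /vol1     (defragment directories and extents)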


 




Volume and subdisk bottlenecks

(Back to top)


Use vxprint to display the objects that are contained by the diskgroup (Figure 3).


From the vxprint output in Figure 3, notice that:

  • Disk group datadg has three volumes: "engvol," "hrvol" and "locks."
  • Each volume has one subdisk: "datadg01-02," "datadg01-01" and "datadg02-01."
  • Two of the subdisks, "datadg01-02" and "datadg01-01," reside on the same disk: "datadg01."
  • One of the subdisks, "datadg02-01," resides on its own disk: "datadg02."

Note: A subdisk is simply a contiguous "piece" of a volume. A volume that spans two disks is typically broken into two subdisks. A volume that only resides on a single disk might only have one subdisk, but this can vary depending on the volume structure. Subdisks are tagged with an "sd" by vxprint.




Figure 3 - Using vxprint to display a diskgroup


Syntax:

vxprint -ht


Example, with typical output:

# vxprint -ht

dg datadg       default      default  10000    1336408747.34.Server101

dm datadg01     disk_3       auto     65536    2027264  -
dm datadg02     disk_4       auto     65536    2027264  -

v  engvol        -            ENABLED  ACTIVE   819200   SELECT    -        fsgen
pl engvol-01    engvol       ENABLED  ACTIVE   819200   CONCAT    -        RW
sd datadg01-02   engvol-01    datadg01 1024000  819200   0         disk_3   ENA

v  hrvol         -            ENABLED  ACTIVE   1024000  SELECT    -        fsgen
pl hrvol-01     hrvol        ENABLED  ACTIVE   1024000  CONCAT    -        RW
sd datadg01-01   hrvol-01     datadg01 0        1024000  0         disk_3   ENA

v  locks         -            ENABLED  ACTIVE   102400   SELECT    -        fsgen
pl locks-01     locks        ENABLED  ACTIVE   102400   CONCAT    -        RW
sd datadg02-01   locks-01     datadg02 0        102400   0         disk_4   ENA

 


Use vxstat to gather I/O performance statistics about this disk group (Figure 4):

In particular, look for bottlenecks:

  • Does vxstat show that multiple, busy volumes (or subdisks) reside on the same disk? Moving a busy volume, or subdisk, to its own disk may improve performance.
  • Does vxstat show that the I/O is composed of significantly more read operations than write operations? Mirroring a volume often improves read performance. However, mirroring also usually degrades the write performance slightly due to the increased work required to maintain multiple copies of the data.


For example, the vxstat output in Figure 4 shows that disk "datadg01" has virtually all of the I/O activity, while disk "datadg02" has none. Recall from Figure 3 that both volumes "hrvol" and "engvol" reside on disk "datadg01," while volume "locks" has disk "datadg02" to itself. In this example, performance may be improved by simply moving either "engvol" or "hrvol" to another disk. Also, notice that most of the I/O is composed of write operations. In this case, mirroring either volume for performance reasons is not recommended.


Note: In this article, the term "disk" is used in a generic sense. A "disk" that is presented across a SAN typically refers to a LUN, which is associated with a logical group of multiple, physical disks.

When moving a subdisk for performance reasons, the target LUN should reside on a different set of physical "spindles" (individual, physical disks) than the source LUN. Moving a subdisk to a target LUN that uses the same spindles as the source LUN is unlikely to improve performance because the same physical spindles are still being used by both subdisks. This undermines the purpose of moving the subdisk.
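
A hedged sketch of relocating a busy volume (uses the disk group and disk names from Figure 3; another disk with sufficient free space must already exist in the disk group, and the syntax should be confirmed for your release):

# vxdg -g datadg free                        (confirm that another disk has enough free space)
# vxassist -g datadg move hrvol !datadg01    (move the storage for hrvol off disk datadg01)

The "!datadg01" argument excludes the source disk, so vxassist relocates the volume's subdisks to other disks in the disk group.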

 




Figure 4 - Using vxstat to gather performance statistics about a disk group


Syntax:

vxstat -g <diskgroup> -vpsduh -i <time_interval> -c <number_of_samples_to_gather>


Example, with typical output:
 
Note: Notice that the first sample is the cumulative total since the statistics were last reset. Resetting the statistics manually can be done with vxstat -g <diskgroup> -r.



# vxstat -g datadg -vpsduh -i30 -c3

                      OPERATIONS          BYTES           AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

Tue 09 Apr 2013 10:33:42 AM PDT
dm  datadg01           217     97528      638k     6068m  13.53   3.54
dm  datadg02             0         0         0         0   0.00   0.00
vol engvol              93     47796      268k     2957m  16.54  78.72
pl  engvol-01           93     47796      268k     2957m  16.54  78.72
sd  datadg01-02         93     47796      268k     2957m  16.54  78.72
vol hrvol               93     49580      268k        3g  13.29  17.59
pl  hrvol-01            93     49580      268k        3g  13.29  17.59
sd  datadg01-01         93     49580      268k        3g  13.29  17.59
vol locks                0         0         0         0   0.00   0.00
pl  locks-01             0         0         0         0   0.00   0.00
sd  datadg02-01          0         0         0         0   0.00   0.00

Tue 09 Apr 2013 10:34:12 AM PDT
dm  datadg01             0     11580         0      718m   0.00  51.32
dm  datadg02             0         0         0         0   0.00   0.00
vol engvol               0      1089         0       68m   0.00 418.25
pl  engvol-01            0      1089         0       68m   0.00 418.25
sd  datadg01-02          0      1089         0       68m   0.00 418.25
vol hrvol                0     10491         0      651m   0.00  13.23
pl  hrvol-01             0     10491         0      651m   0.00  13.23
sd  datadg01-01          0     10491         0      651m   0.00  13.23
vol locks                0         0         0         0   0.00   0.00
pl  locks-01             0         0         0         0   0.00   0.00
sd  datadg02-01          0         0         0         0   0.00   0.00

Tue 09 Apr 2013 10:34:42 AM PDT
dm  datadg01             0     10130         0      629m   0.00 367.85
dm  datadg02             0         0         0         0   0.00   0.00
vol engvol               0      4445         0      276m   0.00 819.02
pl  engvol-01            0      4445         0      276m   0.00 819.02
sd  datadg01-02          0      4445         0      276m   0.00 819.02
vol hrvol                0      5685         0      353m   0.00  15.09
pl  hrvol-01             0      5685         0      353m   0.00  15.09
sd  datadg01-01          0      5685         0      353m   0.00  15.09
vol locks                0         0         0         0   0.00   0.00
pl  locks-01             0         0         0         0   0.00   0.00
sd  datadg02-01          0         0         0         0   0.00   0.00

 





 

Stripe set performance considerations

(Back to top)


By striping data across multiple spindles (physical disks), I/O can be processed in parallel, increasing performance. The vxtrace utility can be used to analyze the characteristics of I/O being written to a volume. This is useful for distinguishing random I/O from sequential I/O, determining the typical length (in sectors) of each I/O transaction, and seeing how the I/O is being fragmented across multiple columns.


More information about stripe set performance can be found here:

"Stripe set performance considerations in Veritas Storage Foundation"
https://www.veritas.com/docs/000088298
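
A hedged example of tracing I/O on a volume with vxtrace (uses the datadg disk group and engvol volume from Figure 3; press Ctrl-C to stop the trace):

# vxtrace -g datadg engvol

Each trace record includes the offset and length of an I/O request, which helps determine whether the stripe unit size matches the typical I/O size.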



 



RAID Contention

(Back to top)

 

Many disk arrays have their own built-in RAID capability. A single "disk," or LUN, that is presented from a disk array may actually be a group of several hardware spindles (physical disks) that are part of a RAID set. This creates the possibility that volume performance may be affected by multiple RAID configurations at the same time: one on the hardware layer (controlled by the disk array) and one on the software layer (controlled by Veritas). When configuring a RAID set, it is important to consider how a RAID layout at one layer will affect the performance of the other layer.

For example, configuring a RAID-5 set within Veritas, using LUNs that are also a part of a RAID-5 set within the disk array, will likely result in contention between the two RAID logics, decreasing performance, creating additional work for the disk spindles and increasing the chance of a hardware failure.

Alternatively, it is common to combine striping without parity (RAID-0) and mirroring (RAID-1) into configurations that improve both performance and data availability.
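
A hedged example of creating a volume that combines striping and mirroring with vxassist (the volume name, size, and column count are illustrative only; the disk group must contain enough disks to satisfy the layout):

# vxassist -g datadg make stripevol 10g layout=stripe-mirror ncol=4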





DMP I/O policy

(Back to top)


Review the DMP I/O policy for the disks. In some cases, switching to a different I/O policy may improve performance. For disk arrays that support "active/active" multipathing, "MinimumQ" (also known as "Least Queue Depth") is the default I/O policy, and it often provides the best I/O performance with little configuration required. However, the appropriate policy will depend on the environment and the type of I/O.


More information about changing the DMP I/O policy can be found here:

"How to change the DMP I/O policy and monitor for performance"
https://www.veritas.com/docs/000010626
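
A hedged example of displaying and changing the I/O policy for an enclosure (uses the ams_wms0 enclosure name from Figure 1; substitute the enclosure name reported by "vxdmpadm listenclosure all"):

# vxdmpadm getattr enclosure ams_wms0 iopolicy
# vxdmpadm setattr enclosure ams_wms0 iopolicy=minimumq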







Mount options

(Back to top)

The mount options for a volume can have a significant impact on performance. In particular, adjusting the intent log mount option may increase or decrease performance by 15-20 percent. Currently, the default intent log mount option is "delaylog."

Use mount to determine the current mount options for a volume (Figure 5). Mount options can be changed by dismounting and mounting the volume while specifying the desired option. This can be done manually, using mount (Figure 5), or by modifying a system configuration file, such as /etc/fstab (or /etc/vfstab).


Figure 5 - Using mount to display the current mount options


Syntax:

mount | grep -i vxfs


Example, with typical output:

# mount | grep -i vxfs

/dev/vx/dsk/datadg/vol1 on /vol1 type vxfs ( rw,delaylog,largefiles,ioerror=mwdisable )
/dev/vx/dsk/datadg/locks on /var/tmp/locks type vxfs ( rw,delaylog,largefiles,ioerror=mwdisable )
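
A hedged example of switching the intent log option by remounting (uses the /vol1 filesystem from Figure 5; on Solaris and HP-UX, substitute "mount -F vxfs" for "mount -t vxfs"):

# umount /vol1
# mount -t vxfs -o log /dev/vx/dsk/datadg/vol1 /vol1
# mount | grep vol1            (confirm that the new option is in effect)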

 




Table 2 - Basic intent log mount options

log
  Description: Writes are not acknowledged until the data has actually been written to the disk.
  Performance considerations:
  • 15-20 percent slower performance when compared to delaylog.
  • Greatest level of data integrity.

delaylog
  Description: Some writes are first written to the filesystem cache and then committed to the disk after a slight delay.
  Performance considerations:
  • 15-20 percent faster performance when compared to log.
  • If a volume is dismounted ungracefully, the most recent writes may be lost.
  • The default setting in current versions.

tmplog
  Description: Writes are only committed to the disk when the kernel write buffer is full.
  Performance considerations:
  • Even faster performance than delaylog.
  • Much greater risk of losing recent writes. Only recommended for temporary data.

 


A detailed explanation of each of the mount options, including information on other mount options can be found here:

"Mounting a VxFS file system" (from the Veritas Storage Foundation 6.0 Administrators Guide for Solaris)
https://sort.veritas.com/public/documents/sfha/6.0.1/solaris/productguides/html/sf_admin/ch07s03.htm

 




Filesystem allocation unit size

(Back to Top)


The default file system allocation unit size for VxFS, commonly referred to as the "block size," is 1 KB for file systems smaller than 1 TB. For file systems of 1 TB or larger, the default allocation unit size is 8 KB.

When creating a file system, it is possible to specify a file allocation unit size by using the "-o bsize" argument with /opt/VRTS/bin/mkfs (Figure 6).

As a rough guideline, a file system with a smaller block size tends to be the most efficient for a volume that is primarily composed of small files. For a volume that contains mostly larger files, use a larger block size. Some application vendors provide recommendations for an optimal filesystem block size. It is usually best to follow their guidelines when creating a new filesystem. Ultimately, the best way to determine the optimal block size is to use benchmarking tools to measure the performance of a file system, at different block sizes, before a volume is placed into production.


Note: Changing the file system allocation unit size requires reformatting the volume.


Figure 6 - Specifying a file allocation unit size with /opt/VRTS/bin/mkfs


Syntax:

/opt/VRTS/bin/mkfs -t|F vxfs -o bsize=<desired_block_size> <path_to_volume>


Example, with typical output:

# /opt/VRTS/bin/mkfs -t vxfs -o bsize=4096 /dev/vx/rdsk/datadg/mgmtvol

    version 9 layout
    102400 sectors, 12800 blocks of size 4096 , log size 256 blocks
    rcq size 256 blocks
    largefiles supported

 




Use fstyp to determine the filesystem block size (Figure 7).

Figure 7 - Using fstyp to determine the filesystem block size


Syntax:

fstyp -t|F vxfs -v <path_to_volume>


Example, with typical output:

# fstyp -t vxfs -v /dev/vx/rdsk/datadg/mgmtvol

vxfs
magic a501fcf5  version 9  ctime Wed 10 Apr 2013 11:37:59 AM PDT
logstart 0  logend 0
bsize  4096 size  12800 dsize  12800  ninode 0  nau 0
defiextsize 0  ilbsize 0  immedlen 96  ndaddr 10
aufirst 0  emap 0  imap 0  iextop 0  istart 0
bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0
nindir 2048  aulen 32768  auimlen 0  auemlen 2
auilen 0  aupad 0  aublocks 32768  maxtier 15
inopb 16  inopau 0  ndiripau 0  iaddrlen 2   bshift 12
inoshift 4  bmask fffff000  boffmask fff  checksum f66f5f0c
oltext1 14  oltext2 1030  oltsize 1  checksum2 0
free 11993  ifree 0
efree  1 0 0 1 1 2 2 0 2 2 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

 

 

 

 

 
