SFHA 5.1 Virtualization (Solaris): IBM SVC managed devices require the vdc_timeout value to be set in the GUEST domain to ensure that DMP maintains I/O service when the LDOM SERVICE domain server is rebooted or restarted
Problem
If the SERVICE domain panics, hangs, or is intentionally restarted, file system read/write operations within the GUEST domain may hang while waiting for I/O service.
Sample configuration
In this instance, the SERVICE domain configuration consists of two Solaris Logical Domains (LDOMs), referred to here as the "Primary" and "Alternate" SERVICE domain servers.
Each SERVICE domain hosts the virtual disk backend via two paths. The backend, which is where the data is actually stored, is then presented as a virtual disk to the GUEST domain.
GUEST domain hosts the "virtual disk"
# echo | format
<snippet of paths to a single virtual disk with 4 paths>
1. c0d1 <IBM-2145-0000 cyl 10238 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@1
2. c0d2 <IBM-2145-0000 cyl 10238 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@2
3. c0d3 <IBM-2145-0000 cyl 10238 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@3
4. c0d4 <IBM-2145-0000 cyl 10238 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@4
<end of snippet>
Veritas Volume Manager (VxVM) view of the virtual disk in the GUEST domain
In this instance, the provisioned virtual disk presents the VxVM disk group "datadg", which is imported in the GUEST domain.
# vxdisk path | grep -w datadg
c0d4s2 san_vc0_1 disk01 datadg ENABLED
c0d3s2 san_vc0_1 disk01 datadg ENABLED
c0d2s2 san_vc0_1 disk01 datadg ENABLED
c0d1s2 san_vc0_1 disk01 datadg ENABLED
# vxdmpadm listenclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS ARRAY_TYPE LUN_COUNT
=======================================================================================
san_vc0 SAN_VC 020063c081b2XX00 CONNECTED A/A-A-IBMSVC 1
Veritas Volume Manager (VxVM) view of the virtual disk backend content on the "Primary" SERVICE domain
# vxdisk -o alldgs list | grep -w datadg
san_vc0_10 auto:cdsdisk - (datadg) online
# vxdisk path | grep -w san_vc0_10
c2t50050768012042C2d18s2 san_vc0_10 - - ENABLED
c2t50050768012040D9d18s2 san_vc0_10 - - ENABLED
# vxdmpadm listenclosure all
ENCLR_NAME  ENCLR_TYPE  ENCLR_SNO         STATUS     ARRAY_TYPE    LUN_COUNT
=======================================================================================
san_vc0     SAN_VC      020063c081b2XX00  CONNECTED  A/A-A-IBMSVC  70   <<<<<
san_vc1     SAN_VC      020064e0a100XX00  CONNECTED  A/A-A-IBMSVC  2
disk        Disk        DISKS             CONNECTED  Disk          2
Veritas Volume Manager (VxVM) view of the virtual disk backend content on the "Alternate" SERVICE domain
# vxdisk -o alldgs list | grep -w datadg
san_vc0_10 auto:cdsdisk - (datadg) online
# vxdisk path | grep -w san_vc0_10
c0t50050768011042C2d18s2 san_vc0_10 - - ENABLED
c0t50050768011040D9d18s2 san_vc0_10 - - ENABLED
# vxdmpadm listenclosure all
ENCLR_NAME  ENCLR_TYPE  ENCLR_SNO         STATUS     ARRAY_TYPE    LUN_COUNT
=======================================================================================
san_vc0     SAN_VC      020063c081b2XX00  CONNECTED  A/A-A-IBMSVC  70
san_vc1     SAN_VC      020064e0a100XX00  CONNECTED  A/A-A-IBMSVC  2
If the Primary SERVICE domain server is rebooted, the two paths from the Primary server will be lost, leaving the two remaining paths from the Alternate SERVICE domain server.
As a direct result of losing one of the SERVICE domain servers, some of the writes may now hang within the GUEST domain.
Without manual configuration of the virtual disk client (VDC) away from its default design, the VDC driver will block I/O. Normal service resumes automatically when the impacted SERVICE domain returns to normal operation.
Error Message
To isolate the issue, a system crash dump is obtained from the Solaris-based GUEST domain server while a simple "cp" command-line operation is in progress on the GUEST server as the Primary SERVICE domain server is rebooted.
In this instance, the Solaris crash dump is analyzed using the Solaris SCAT utility.
Sample output
SolarisCAT(vmcore.0/10V)> proc | grep cp
0x6001e59d9b0   8850   8823  0  2383872  1867776  344064  1  cp /usr/bin/7z /usr/bin/7za /usr/bin/7zr /usr/bin/acctcom /usr/bin/adb /usr/bin
0x6001b897248   8767   8663  0  2465792  1810432  286720  0  cp /usr/lib/0@0.so.1 /usr/lib/32 /usr/lib/64 /usr/lib/7z /usr/lib/abi /usr/lib/

The process address for the cp command is specified with the proc operation to extract the thread address.

SolarisCAT(vmcore.0/10V)> proc 0x6001e59d9b0
addr          PID    PPID   RUID/UID   size       RSS      swresv   time   command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x6001e59d9b0 8850   8823   0          2383872    1867776  344064   1      cp /usr/bin/7z /usr/bin/7za /usr/bin/7zr /usr/bin/acctcom /usr/bin/adb /usr/bin
  thread: 0x3000ff9a7a0  state: slp  wchan: 0x30003614648  sobj: condition var (from vdc:vdc_send_request+0xfc)

The thread output is then shown below, indicating that the issue resides within the VDC layer.

SolarisCAT(vmcore.0/10V)> thread 0x3000ff9a7a0
==== user (LWP_SYS) thread: 0x3000ff9a7a0  PID: 8850 ====
cmd: cp /usr/bin/7z /usr/bin/7za /usr/bin/7zr /usr/bin/acctcom /usr/bin/adb /usr/bin
t_wchan: 0x30003614648  sobj: condition var (from vdc:vdc_send_request+0xfc)
t_procp: 0x6001e59d9b0  p_as: 0x6001d726bd0  size: 2383872  RSS: 1867776
  hat: 0x3000fe5d680  cnum: CPU0:1/19630  cpusran: 3
zone: global
t_stk: 0x2a101ba3ae0  sp: 0x2a101ba19a1  t_stkbase: 0x2a101b9e000
t_pri: 60(TS)  t_tid: 1  pctcpu: 0.000098
t_lwp: 0x6001359c0d0  machpcb: 0x2a101ba3ae0
mstate: LMS_SLEEP  ms_prev: LMS_SYSTEM
  ms_state_start: 2 minutes 13.82255781 seconds earlier
  ms_start: 2 minutes 13.84495807 seconds earlier
psrset: 0  last CPU: 3
idle: 13382 ticks (2 minutes 13.82 seconds)  <------------------------------------------
start: Wed May 9 01:38:18 2012
age: 134 seconds (2 minutes 14 seconds)
syscall: #4 write(, 0xffbe4dd8) (sysent: genunix:write32+0x0)
tstate: TS_SLEEP - awaiting an event
tflg:   T_DFLTSTK - stack is default size
tpflg:  TP_TWAIT - wait to be freed by lwp_wait
        TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SMSACCT - process is keeping micro-state accounting
        SMSFORK - child inherits micro-state accounting
pc: genunix:cv_wait+0x38: call unix:swtch

genunix:cv_wait+0x38(, 0x30003614640, 0x1, 0x0, 0x1)
vdc:vdc_send_request+0xfc(0x30003614640, 0x2, 0x3001059e000, 0x10000, 0x2, 0x53ab4, 0x6001fc6dcc0, 0x1, 0x7)  <------- stuck in the vdc driver !!!!!
vdc:vdc_do_op+0x64(0x30003614640, 0x2, 0x3001059e000, 0x10000, 0x2, , 0x6001fc6dcc0, 0x1, 0x7)
vdc:vdc_strategy+0xcc(0x6001fc6dcc0)
genunix:bdev_strategy() - frame recycled
vxdmp:gendmpstrategy+0x43c(0x6001fc6dcc0)
vxio:vol_dev_strategy(0x6001fc6dcc0) - frame recycled
vxio:voldiskiostart+0x530(0x6001c10a900, 0x6001337b880, 0x0, 0x53ab4, 0x80, 0x0)
vxio:vol_subdisksio_start+0x534(0x6001c10a900, 0x2a101ba2bb8)
vxio:volkcontext_process+0xcc(0x2a101ba2bb8)
vxio:volkiostart+0xbe4(0x6001e2f4000, 0x2a101ba2bb8, 0x0)
vxio:vxiostrategy+0x74(0x6001e2f4000, 0x1999c00, 0x1c, 0x1c)
vxfs:vx_snap_strategy(, 0x6001e2f4000, 0x0) - frame recycled
vxfs:vx_io_startnowait+0x484(0x6001b868b00, 0x6001e2f4000)
vxfs:vx_io_start+0x18(0x6001b868b00, 0x6001e2f4000, 0x7aba3000, 0x7aba3, 0x7a800, 0x400)
vxfs:vx_flush_pages+0x280(0x6001ff6b340, 0x0, 0x400, 0x0)
vxfs:vx_putpage_dirty+0xec(0x6001ff70000, 0x30000, 0x10000, 0x400, 0x0, 0x0)
vxfs:vx_do_putpage+0xb8(0x6001ff6b340, 0x30000, 0x10000, 0x400, 0x0)
vxfs:vx_write_flush+0x148(0x6001ff70000, 0x40000, 0x400, 0x1)
vxfs:vx_write_default+0x8a0(0x6001ff70000, 0x2a101ba3a98, 0x0, 0x644cc, 0x1, 0x0, , , 0x0)
vxfs:vx_write1+0xed8(0x6001ff6b340, 0x2a101ba3a98, 0x0, 0x6001083ebd8, 0x1, 0x6001248c080)
vxfs:vx_write_common_slow+0x784(0x6001ff6b340, 0x2a101ba3a98, 0x0, 0x0, 0x6001083ebd8, 0xf044)
vxfs:vx_write_common+0x508(0x6001ff6b340, 0x2a101ba3a98, 0x0, 0x0, 0x0, 0x0, 0x6001083ebd8)
vxfs:vx_write+0x28(0x6001ff6b340, 0x2a101ba3a98, 0x0, 0x6001083ebd8, 0x0, 0x8000)
genunix:fop_write+0x20(0x6001ff6b340, 0x2a101ba3a98, 0x0, 0x6001083ebd8, 0x0)
genunix:write+0x268(0x5)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --
Cause
Virtual Disk Timeout
By default, if the SERVICE domain providing access to a virtual disk backend is down, all I/O from the GUEST domain to the corresponding virtual disk is blocked. I/O automatically resumes when the SERVICE domain is operational and is servicing I/O requests to the virtual disk backend.
However, there are some cases when file systems or applications might not want the I/O operation to block, but for it to fail and report an error if the service domain is down for too long. It is now possible to set a connection timeout period for each virtual disk, which can then be used to establish a connection between the virtual disk client on a guest domain and the virtual disk server on the service domain. When that timeout period is reached, any pending I/O and any new I/O will fail as long as the service domain is down and the connection between the virtual disk client and server is not reestablished.
This timeout can be set by doing one of the following:
Using the ldm add-vdisk command.
ldm add-vdisk timeout=seconds disk-name volume-name@service-name ldom
Using the ldm set-vdisk command.
ldm set-vdisk timeout=seconds disk-name ldom
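As a concrete sketch of the syntax above, run from the control domain (the virtual disk name "vdisk1" and guest domain name "ldg1" below are illustrative placeholders, not names from the configuration in this article):

```shell
# Set a 30-second connection timeout on an existing virtual disk
# (hypothetical names: vdisk1 = virtual disk, ldg1 = guest domain)
ldm set-vdisk timeout=30 vdisk1 ldg1

# Confirm the timeout now appears in the guest domain's disk configuration
ldm list -o disk ldg1
```

Note that a change made with ldm set-vdisk takes effect for the named virtual disk only, whereas the /etc/system tunable described below applies to every virtual disk in the guest domain.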
Specify the timeout in seconds. If the timeout is set to 0, the timeout is disabled and I/O is blocked while the service domain is down (this is the default setting and behavior).
Alternatively, the timeout can be set by adding the following line to the /etc/system file on the guest domain.
set vdc:vdc_timeout=seconds
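As a sanity check after editing /etc/system and rebooting, the value active in the running kernel can be inspected from the GUEST domain. The mdb command below is a common Solaris technique for reading a kernel tunable and assumes the vdc module is loaded (run as root):

```shell
# Show the vdc_timeout entry configured in /etc/system, if any
grep vdc_timeout /etc/system

# Read the value currently active in the running kernel
# (prints the tunable as a decimal integer; 0 means no timeout)
echo 'vdc_timeout/D' | mdb -k
```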
Reference: http://docs.oracle.com/cd/E19604-01/821-0406/virtualdisktimeout/index.html
Note: If this tunable is set, it overwrites any timeout setting done using the ldm CLI. Also, the tunable sets the timeout for all virtual disks in the guest domain.
Solution
In an effort to resolve the hung state in the GUEST domain environment, the vdc_timeout kernel tunable can be used to define the timeout value for all disks residing in the GUEST domain by adding an entry to the /etc/system file.
Note: The vdc_timeout setting equates to the number of seconds before the timeout is reached.
If vdc_timeout is 0 then no timeout is set.
To set the vdc_timeout, add the following line entry in the /etc/system file on the GUEST domain and reboot the GUEST domain for the amendment to take effect:
set vdc:vdc_timeout=30
In the above instance, the timeout is set to 30 seconds.
Once the "vdc_timeout" setting has been defined and the server rebooted, the hung state should no longer occur in the GUEST domain.
Applies To
Array configuration:
The IBM SVC arrays must be configured in ALUA mode as specified in the Hardware article:
000031529
LDOMs
Solaris Logical Domains (LDoms) is a virtualization technology on the Solaris SPARC platform that enables the creation of independent virtual machine environments on the same physical system.
LDoms are a virtualized computing environment abstracted from all physical devices, which allow you to consolidate and centrally manage your workloads on one system.
The logical domains can be assigned roles such as Control domain, Service domain, I/O domain, and Guest domain.
Each domain is a full virtual machine where the operating systems can be started, stopped, and rebooted independently.
https://sort.veritas.com/public/documents/sf/5.1/solaris/pdf/sfha_virtualization.pdf
Figure 1.0
Storage Foundation (SF) stack model with Solaris Logical Domains
Note: The SERVICE domain may be referred to as the CONTROL domain.
VDC design
The default behavior of the virtual disk client (VDC) is to block I/O when connectivity to a SERVICE domain is lost or the SERVICE domain is not operating normally. The VDC timeout property can be set to have the I/O time out instead. When this property is not present, or is set to 0, there is no timeout and I/O will block indefinitely.
A virtual disk consists of two components:
The virtual disk itself, which appears in the GUEST domain, and the virtual disk backend, which is where the data is stored and where virtual I/O ends up.
The virtual disk backend is exported from a SERVICE domain by the virtual disk server (vds) driver. The vds driver communicates with the virtual disk client (vdc) driver in the GUEST domain through the hypervisor using a logical domain channel (LDC).
A virtual disk appears as /dev/[r]dsk/cXdYsZ devices in the GUEST domain.
Note: The virtual disk backend can be a physical disk, a physical disk slice, a file, a volume from a volume management framework, such as the Zettabyte File System (ZFS), Solaris Volume Manager (SVM), Veritas Volume Manager (VxVM), or any disk pseudo device accessible from the service domain.
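The export path described above can be sketched as the following two steps, run from the control domain. All names are illustrative assumptions (a physical disk slice as the backend, a virtual disk server named primary-vds0, and a guest domain named ldg1), not values from this article's configuration:

```shell
# On the control domain: export a backend (here an illustrative physical
# disk device) as volume vol1 through the virtual disk server primary-vds0
ldm add-vdsdev /dev/dsk/c2t1d0s2 vol1@primary-vds0

# Present the exported backend to guest domain ldg1 as virtual disk vdisk1,
# with a 30-second connection timeout as recommended in this article
ldm add-vdisk timeout=30 vdisk1 vol1@primary-vds0 ldg1
```

Inside the guest, the new disk then appears as a /dev/[r]dsk/cXdYsZ device, as described above.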