LVMVolumeGroup resource faults when a node is rebooted.

Article: 100027706
Last Published: 2012-09-24
Product(s): InfoScale & Storage Foundation

Problem

LVMVolumeGroup resources may fault when a passive node in the cluster is rebooted.

Error Message

VCS ERROR V-16-2-13067 (node-name) Agent is calling clean for resource(vg_resource_name) because the resource became OFFLINE unexpectedly, on its own.

Cause

Concepts:

By design, an LVM Volume Group that has been imported (via the `vgimport` command) is in a read/write state on every node that can access the shared storage on which the Volume Group resides. When VCS is configured to monitor and manage such Volume Groups, this behavior would cause concurrency violations. For this reason, LVM Volume Groups originally required the disk reservation agent and did not support multi-pathing: the disk reservation agent prevented concurrency violations by blocking the passive node's access to the shared storage that contains the Volume Group. Because SCSI-2 reservations do not support multi-pathing, no multi-pathing solution is supported while disk reservation is in use.

LVM Volume Group tagging removes these restrictions. With tagging, a user can assign a tag to an LVM Volume Group that associates it with a specific node, which eliminates both the disk reservation requirement and the restriction on multi-pathing. With tagging, LVM Volume Groups are supported with VERITAS DMP (Dynamic Multi-Pathing) and without the disk reservation agent.
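As an illustration only (vgdata and node01 are placeholder names, and the exact configuration applied by the LVMVolumeGroup agent may differ by release), a Volume Group can be tied to a node with an LVM tag plus an activation restriction in /etc/lvm/lvm.conf:

# Run on the node that should own the Volume Group (example names):
vgchange --addtag node01 vgdata

# In /etc/lvm/lvm.conf on every cluster node, allow activation only of
# Volume Groups carrying the local node's tag:
activation {
    volume_list = [ "@node01" ]
}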

KNOWN ISSUES:


Even though LVM Volume Groups support tagging, a user can still run the `vgexport` command on a passive node. When this happens, the active node that has the LVM Volume Group online detects, on its next monitor cycle, that the resource has gone offline outside of VCS.
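For example (node and Volume Group names are placeholders), exporting the group from the passive node marks it as exported in the on-disk metadata, which the active node then reports as an unexpected offline:

# Example only: vgdata is online under VCS on node01 and inactive on node02.
# Because the VG is inactive on node02, vgexport succeeds there and flags
# the shared metadata as exported.
node02# vgexport vgdata
# On node01, the next monitor cycle reports the LVMVolumeGroup resource
# as unexpectedly OFFLINE and VCS calls the clean entry point.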

When Native VxDMP is used as the multi-pathing solution, a script named vxvm-boot runs at boot time. This script issues the following commands:

vgchange -a n vg
vgexport vg
vgimport vg
vgchange -a y vg

These commands are needed because OS (Operating System) device names can change across reboots in Linux. To ensure that LVM imports VG's (Volume Groups) on a dmpnode, the filter attribute in lvm.conf is set to reject sd (SCSI disk) devices of type "/dev/sd*" (the paths under a dmpnode) so that LVM does not use them. If the OS device names change after a reboot, the filter no longer matches and the LVM Volume Group(s) can be imported on OS devices rather than on a VERITAS dmpnode. To ensure that LVM VG's are imported on dmpnode(s), `/etc/init.d/vxvm-boot start`, run at boot time, issues the deport and import operations shown above.
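For reference, the lvm.conf filter described above typically accepts the VxDMP device nodes and rejects the underlying SCSI paths. The entries below are an illustration only; the exact patterns written when DMP native support is enabled can differ by release:

# Illustrative excerpt of /etc/lvm/lvm.conf:
devices {
    # Accept VxDMP device nodes so Volume Groups import on dmpnodes,
    # and reject the /dev/sd* paths underneath them.
    filter = [ "a|/dev/vx/dmp/.*|", "r|/dev/sd.*|", "r|.*|" ]
}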

Solution

Set the ToleranceLimit attribute of the LVMVolumeGroup resource type to 1:

haconf -makerw
hatype -modify LVMVolumeGroup ToleranceLimit 1
haconf -dump -makero
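To confirm the change, the attribute can be displayed (a standard hatype query, shown here for illustration):

hatype -display LVMVolumeGroup | grep ToleranceLimit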

Note: The default setting for ToleranceLimit is 0, which means that a resource found offline during a monitor cycle is faulted immediately. Setting ToleranceLimit to 1 causes only an informational message to be logged the first time the resource is found offline:

VCS INFO V-16-2-13075 (node-name) Resource(vg-resource-name) has reported unexpected OFFLINE 1 times, which is still within the ToleranceLimit(1).

If the resource is still found offline on the next monitor cycle, it is then faulted.

Engineering has filed a request with Red Hat for an enhancement of the LVM filtering mechanism.

 

Note: This issue only affects users running VERITAS Native DMP with LVM Volume Groups and VCS. Users who do not use VERITAS Native DMP with LVM Volume Groups will not encounter this issue.


Applies To

LVMVolumeGroup (Logical Volume Management)
VxDMP (VERITAS Dynamic Multi-Pathing)
VCS (VERITAS Cluster Server) with an LVMVolumeGroup resource configured.

 
