Problem:
DiskReservation agent and "reservation conflict" error messages.
Error Message:
Apr 2 03:12:25 sprs1950a0-27 kernel: sd 1:0:0:0: reservation conflict
Apr 2 03:12:25 sprs1950a0-27 kernel: sd 1:0:0:0: SCSI error: return code = 0x00000018
Apr 2 03:12:25 sprs1950a0-27 kernel: end_request: I/O error, dev sdc, sector 0
How the DiskReservation Agent works:
The DiskReservation agent prevents access to one or more devices by setting a SCSI-2 reservation on them, which blocks other nodes from accessing those devices. Any access to a reserved device via an ioctl call from a node that has not exclusively set the SCSI-2 reservation returns status RESERVATION CONFLICT, and the error message is logged to the console and to /var/log/messages. The SCSI "reservation conflict" message comes from the SCSI midlayer driver; the "diskres" driver cannot prevent it from appearing.
Note: any third-party application, script, or command that scans or probes a device on which a SCSI-2 reservation is exclusively set will produce a "reservation conflict" error. Commands such as vgscan, lvscan, fdisk, sfdisk, and `vxdctl enable` typically scan or probe devices.
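As an illustration (the host name, device, and timestamps below are hypothetical samples, not taken from a live system), the conflicts that such probes generate can be counted in the kernel log with a simple grep; on a cluster node you would run the same grep against /var/log/messages:

```shell
# Count SCSI "reservation conflict" events in a sample of kernel log lines.
# The here-document stands in for /var/log/messages on a cluster node.
grep -c "reservation conflict" <<'EOF'
Apr  2 03:12:25 node1 kernel: sd 1:0:0:0: reservation conflict
Apr  2 03:12:25 node1 kernel: sd 1:0:0:0: SCSI error: return code = 0x00000018
Apr  2 03:12:25 node1 kernel: end_request: I/O error, dev sdc, sector 0
EOF
```

A count that rises in step with the OfflineMonitorInterval points at monitor probes rather than application I/O.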
How the LVMVolumeGroup and LVMLogicalVolume agents use the DiskReservation agent:
The LVMVolumeGroup and LVMLogicalVolume agents require that a DiskReservation resource be configured. The DiskReservation resource is required because LVM volume groups are always active on all nodes, meaning they are accessible from any node at any time. For VERITAS Cluster Server this is an issue, as it would result in a concurrency violation. To work around this problem, the DiskReservation agent is used to ensure that only one node has access to the LVM volume group.
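A minimal main.cf sketch of this pairing (the resource names, device path, and volume group name are hypothetical; verify the attribute names against the bundled agents reference for your VCS release):

```
DiskReservation dres1 (
    Disks = { "/dev/sdc" }
    )

LVMVolumeGroup lvmvg1 (
    VolumeGroup = datavg
    )

lvmvg1 requires dres1
```

The `requires` link makes VCS set the reservation before bringing the volume group online, so only the node holding the reservation can use it.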
Workaround to minimize the errors produced by the VCS agent monitor programs when using the DiskReservation agent in conjunction with LVMVolumeGroup and LVMLogicalVolume resources:
When the LVMVolumeGroup and LVMLogicalVolume resources are configured under VERITAS Cluster Server, the LVM commands vgscan and lvscan are used to get the status of the resources during the OfflineMonitorInterval. Because the devices have a SCSI-2 reservation set, each offline-monitor probe causes the SCSI midlayer driver to produce errors and log them to the console and to /var/log/messages. You can reduce these errors by disabling the OfflineMonitorInterval, or by extending it from the default 300 seconds (5 minutes) to 600 seconds (10 minutes).
Disabling the OfflineMonitorInterval prevents the SCSI reservation conflict errors from being reported to the console and to /var/log/messages when the offline monitor routine runs. The cluster configuration must be made writable before the type attributes can be modified:
haconf -makerw
hatype -modify DiskReservation OfflineMonitorInterval 0
hatype -modify LVMVolumeGroup OfflineMonitorInterval 0
hatype -modify LVMLogicalVolume OfflineMonitorInterval 0
haconf -dump -makero
Alternatively, extend the OfflineMonitorInterval from the default 300 seconds to 600 seconds so that offline resources are probed, and the messages logged, half as often:
haconf -makerw
hatype -modify DiskReservation OfflineMonitorInterval 600
hatype -modify LVMVolumeGroup OfflineMonitorInterval 600
hatype -modify LVMLogicalVolume OfflineMonitorInterval 600
haconf -dump -makero
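To confirm that the change took effect, the current value can be queried per resource type (a sketch; assumes a running VCS cluster with the hatype command in PATH):

```
hatype -display DiskReservation -attribute OfflineMonitorInterval
hatype -display LVMVolumeGroup -attribute OfflineMonitorInterval
hatype -display LVMLogicalVolume -attribute OfflineMonitorInterval
```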
What is the OfflineMonitorInterval:
- This is the duration (in seconds) between two consecutive monitor calls for an offline resource. If set to 0, offline resources are not monitored.
- The default is 300 seconds for most resource types.
Symantec's recommendation is to leave the default settings, but if the administrator wants to minimize the "reservation conflict" messages produced by the SCSI midlayer driver, the settings above can be configured.