On AIX, HP-UX and Linux platforms, any disk larger than 1 terabyte that is initialized as a CDS (Cross-Platform Data Sharing) disk under VxVM may experience block-level data corruption at the 1 terabyte boundary. The corruption pattern matches the CDS disk’s backup label, which is similar to the Solaris SMI label.
Any CDS disk resize from less than 1 terabyte to more than 1 terabyte can also lead to this issue.
User data is overwritten by the following text pattern:
<DISK-IDENTIFICATION> cyl <number-of-cylinders> alt 2 hd <number-of-tracks> sec <number-of-sectors-per-track>
"DISKARRAY ABC cyl 65533 alt 2 hd 24 sec 424"
For VxVM versions prior to 5.1SP1, the CDS (Cross-Platform Data Sharing) implementation is based on the Solaris SPARC VTOC (Volume Table Of Contents) disk labeling scheme, and the Solaris VTOC disk label does not support physical disks greater than 1 terabyte. The CDS disk label describes the disk geometry: the number of cylinders, heads and sectors, and the sector size. The product of these values gives the disk capacity. The disk geometry and capacity can also be obtained through SCSI mode sense commands, and in general the values obtained through these two methods should match.
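The capacity comparison described above is simple arithmetic. The following is a minimal sketch in Python, not VxVM code. It assumes 512-byte sectors, takes the label geometry from the sample label text earlier in this article (counting the 2 alternate cylinders, which is an assumption), and uses a hypothetical raw geometry standing in for the values a SCSI mode sense command might report.

```python
def capacity_bytes(cylinders, heads, sectors_per_track, sector_size=512):
    """Disk capacity as the product of the geometry values."""
    return cylinders * heads * sectors_per_track * sector_size


# Geometry as described by the disk label: 65533 data + 2 alternate cylinders.
# Including the alternate cylinders is an assumption made for this sketch.
label_geometry = (65535, 24, 424)

# Hypothetical geometry standing in for SCSI mode sense output.
raw_geometry = (131070, 24, 424)

label_capacity = capacity_bytes(*label_geometry)
raw_capacity = capacity_bytes(*raw_geometry)

if label_capacity != raw_capacity:
    print(
        f"Mismatch: label gives {label_capacity} bytes, "
        f"raw geometry gives {raw_capacity} bytes"
    )
```

When the two capacities differ, as in this hypothetical case, the label no longer describes the whole device.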
On AIX and Linux, for disks greater than 1 terabyte, if the disk capacity calculated from the raw geometry (the geometry collected from the SCSI mode sense command) does not match the disk capacity calculated from the disk’s label, the VxVM CDS disk size may be greater than the size described by the disk’s label. VxVM writes the CDS disk’s backup labels on the last track of the last cylinder, a location calculated from the primary CDS disk label residing in block 0. For a disk larger than 1 terabyte, the VxVM disk size (private region plus public region) is greater than 1 terabyte, so the public (user data) region may extend beyond 1 terabyte. Because the maximum disk size permitted by a CDS disk label is 1 terabyte, the CDS backup labels are then written inside the public region, over user data.
A disk or diskgroup flush, or a VxVM disk online operation (which refreshes the private region), can rewrite the CDS backup label in the public (user data) region.
The following is a description of the problem according to the Etrack incident listed in the Supplemental Materials section of this article.
The CDS disk maintains a Sun VTOC in the zeroth block of the disk. This VTOC holds disk geometry information such as the number of cylinders, tracks and sectors per track. By design of the Sun VTOC, these values are limited to a maximum of 65535, which limits the disk capacity to 1 TB. As required by Sun, a few backup VTOC labels have to be maintained on the last track of the disk.
VxVM 5.0 MP3 RP3 allows a CDS disk to be set up on a disk with a capacity of more than 1 TB. The data region of the CDS disk then spans more than 1 TB, using all the accessible cylinders of the disk. As mentioned above, the VTOC labels are written at the zeroth block and on the last track, treating the disk capacity as 1 TB. The backup labels therefore fall into the data region of the CDS disk, causing the data corruption.
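The overlap described above can be sketched as follows. This is an illustration only and does not reproduce VxVM's actual on-disk layout. The function names are hypothetical, the label geometry is taken from the sample label text in this article with the alternate cylinders counted (an assumption), and the data region boundaries are invented values in units of sectors.

```python
def last_track_lba_range(cylinders, heads, sectors_per_track):
    """Sector range (start inclusive, end exclusive) of the last track
    of the last cylinder, as implied by the label geometry."""
    total_tracks = cylinders * heads
    start = (total_tracks - 1) * sectors_per_track
    return start, start + sectors_per_track


def backup_labels_overlap_data(label_geometry, data_start_lba, data_end_lba):
    """True if the label's last track lies inside the given data region."""
    start, end = last_track_lba_range(*label_geometry)
    return start < data_end_lba and data_start_lba < end


# Label geometry: 65533 data + 2 alternate cylinders (assumption), 24 heads,
# 424 sectors per track.
label_geometry = (65535, 24, 424)

# Hypothetical data region that extends past the capacity the label describes.
print(backup_labels_overlap_data(label_geometry, 65536, 1_300_000_000))

# Hypothetical data region that ends before the label's last track.
print(backup_labels_overlap_data(label_geometry, 65536, 600_000_000))
```

The first call models a data region sized from the raw geometry, so the label's last track falls inside it. The second models a data region that stays within the label's capacity.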
Suppress writing the backup labels to prevent the data corruption.
Move (relocate) the user data to CDS disk of size less than 1 terabyte.
BEST PRACTICES OR RECOMMENDATIONS:
Upgrade to VxVM 5.1SP1RP4, which supports disks larger than 1 terabyte across platforms with the EFI-GPT label. VxVM 5.1SP1RP4 also allows CDS migration to other UNIX platforms.
Please run both scripts to determine if your server can run into both issues.
The CPI HF has been posted at the following links:
For 5.1SP1: https://sort.veritas.com/patch/detail/6559
The CPI HF detects LUNs attached to the server that can run into this issue and proactively determines whether vxconfigd should be started. This allows customers to continue the upgrade to 5.1SP1RP4 without the risk of possible diskgroup corruption.
Download CPI HF and upgrade as follows:
Upgrade to 5.1SP1: ./installer -require <location of 5.1SP1 CPI HF> -nostart
Upgrade to 5.1SP1RP4: ./installrp -require <location of 5.1SP1RP4 CPI HF> -nostart
Since drive geometry can vary, the exact location of corruption can also vary from one drive to another.
The cds_largelun_verify.sh script (a diagnostic tool) helps determine whether any VxVM CDS disk is affected by this issue and, if so, reports the exact location of the possible data corruption. The tool also displays those specific data blocks so that the user can determine whether the corruption was caused by VxVM commands writing the backup label.
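When inspecting a displayed block by hand, the label text pattern shown earlier in this article is what to look for. The sketch below is one possible way to check a block for that pattern in Python. It is not the logic of cds_largelun_verify.sh, and the function name and regular expression are assumptions based only on the pattern quoted in this article.

```python
import re

# Matches: <DISK-IDENTIFICATION> cyl <n> alt <n> hd <n> sec <n>
# The identification is taken to be a run of printable ASCII characters.
LABEL_RE = re.compile(rb"([\x20-\x7e]+?) cyl (\d+) alt (\d+) hd (\d+) sec (\d+)")


def find_label_text(block: bytes):
    """Return the parsed label fields if the block contains the pattern,
    otherwise None."""
    match = LABEL_RE.search(block)
    if match is None:
        return None
    ident, cyl, alt, hd, sec = match.groups()
    return {
        "ident": ident.decode("ascii").strip(),
        "cyl": int(cyl),
        "alt": int(alt),
        "hd": int(hd),
        "sec": int(sec),
    }


# A 512-byte block that starts with the sample label text from this article.
sample = b"DISKARRAY ABC cyl 65533 alt 2 hd 24 sec 424".ljust(512, b"\x00")
print(find_label_text(sample))
```

A match only suggests that a block may hold a backup label. Whether that block belongs to user data still has to be established from the location the diagnostic tool reports.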
The behavior of CDS disk initialization on AIX, Solaris, Linux and HP-UX is as follows:
AIX:
In older VxVM releases, the VxVM disk public region is determined from the raw geometry. Raw geometry does not place any limit on disk size, so the VxVM disk public region can exceed 1 terabyte on a CDS disk larger than 1 terabyte.
SOLARIS:
In all VxVM releases, the VxVM disk public region size is determined from the Solaris SPARC VTOC or x86 fdisk label, not from the raw geometry (SCSI mode sense) as is done on AIX. The Solaris VTOC/fdisk label cannot support more than 1 terabyte, so the VxVM disk public region size will not exceed 1 terabyte on a CDS disk.
LINUX:
The VxVM disk public region size is always determined from the existing Solaris VTOC, GPT or MSDOS label. Linux allows creation of a Solaris VTOC in which the s2 size is up to 2 terabytes.
The GPT label does not place any size limit, so the VxVM disk public region can exceed 1 terabyte on CDS disks on the Linux platform.
HP-UX:
In older VxVM releases, the VxVM disk public region is determined from the raw geometry. Raw geometry does not place any limit on disk size, so the VxVM disk public region can exceed 1 terabyte on a CDS disk larger than 1 terabyte.