During Cluster Volume Manager (CVM) master takeover, all shared diskgroups undergo a re-import. If even a single shared diskgroup fails to import, Volume Manager (VxVM) disables all shared diskgroups (dgdisable), and the node attempting the takeover (the new master) leaves the cluster.
NOTE: If the cause of the diskgroup import failure is common to all nodes in the cluster, master takeover can fail on each node in turn, resulting in a cluster-wide failure.
Example: a 3-node cluster (nodeA, nodeB and nodeC) with shared diskgroups dgA and dgB
- nodeA is shut down.
- nodeB attempts master takeover.
- On nodeB, if the import of shared diskgroup dgA fails during re-import with the error messages below, all shared diskgroups are disabled and nodeB is evicted:
vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-16066 da_dg_reimport: disk <disk id> not found
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 dg_import_master: failed to import dg <diskgroup> , error 183
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 master_takeover: error in disk group reimport: Disk for disk group not found, errno 0
- Like nodeB, nodeC encounters the same failure and is evicted, resulting in a total cluster outage:
Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname1: Disabled by errors
Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname2: Disabled by errors
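When reviewing syslog after a failed takeover, it can help to pull out which diskgroup failed to import and which diskgroups were subsequently disabled. The following is a minimal sketch, not a Veritas tool: the function name scan_vxconfigd_log is hypothetical, and the patterns assume the messages appear exactly in the form quoted in this article.

```python
import re

# Patterns are based only on the message text quoted in this article;
# other VxVM releases may word these messages differently (assumption).
IMPORT_FAIL_RE = re.compile(
    r"dg_import_master: failed to import dg (\S+)\s*, error (\d+)"
)
DISABLED_RE = re.compile(r"V-5-1-7934 Disk group (\S+): Disabled by errors")


def scan_vxconfigd_log(lines):
    """Return (failed_imports, disabled_dgs) found in the given syslog lines.

    failed_imports: list of (diskgroup, error_code) tuples
    disabled_dgs:   diskgroup names, de-duplicated, in order of appearance
    """
    failed = []
    disabled = []
    for line in lines:
        match = IMPORT_FAIL_RE.search(line)
        if match:
            failed.append((match.group(1), int(match.group(2))))
            continue
        match = DISABLED_RE.search(line)
        if match and match.group(1) not in disabled:
            disabled.append(match.group(1))
    return failed, disabled
```

If a single diskgroup appears in the failed list while several appear in the disabled list, the log is consistent with the behavior described in this article.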
The code path traversed during CVM master takeover exposed a bug that resulted in all shared diskgroups being disabled due to a single shared diskgroup's failure to import.
Code changes were made to disable only the specific shared diskgroup that failed to import. Re-import now skips the failed diskgroup and continues importing the remaining shared diskgroups, which prevents the node from being evicted from the cluster.
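The difference between the old and the fixed behavior can be illustrated with a small conceptual model. This is not VxVM source code: the function reimport_shared_dgs, its parameters, and the try_import callback are hypothetical names used only to show the control flow described above.

```python
def reimport_shared_dgs(diskgroups, try_import, isolate_failures=True):
    """Conceptual model of the shared diskgroup re-import loop.

    diskgroups:       iterable of diskgroup names (hypothetical input)
    try_import:       callable returning True if the import succeeds
                      (hypothetical stand-in for the real import step)
    isolate_failures: True models the fixed behavior, False the old one

    Returns (imported, disabled, takeover_ok).
    """
    diskgroups = list(diskgroups)
    imported, disabled = [], []
    for dg in diskgroups:
        if try_import(dg):
            imported.append(dg)
        elif isolate_failures:
            # Fixed behavior: disable only the failing diskgroup, keep going.
            disabled.append(dg)
        else:
            # Old behavior: one failure disables every shared diskgroup and
            # the takeover fails, so the node leaves the cluster.
            return [], diskgroups, False
    return imported, disabled, True
```

Using the example from this article, with an import that fails only for dgA, the fixed model leaves dgB imported and the takeover succeeds, whereas the old model disables both dgA and dgB and the takeover fails.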
FIX INTEGRATED IN THE FOLLOWING PATCH(es)/VERSIONS:
Patch links for SFHA 5.1SP1RP3
PLATFORMS: All (Solaris, HP-UX, Linux and AIX)
VxVM versions: 5.0.x, 5.1.x and 6.0.x