Failure to (re)import a single shared diskgroup during Cluster Volume Manager master takeover results in all shared diskgroups being disabled and node eviction from cluster

Problem

During Cluster Volume Manager (CVM) master takeover, all shared diskgroups are re-imported. If a single shared diskgroup fails to import, Volume Manager (VxVM) disables all shared diskgroups (dgdisable), and the node attempting to become the new master leaves the cluster.

NOTE: If the cause of the diskgroup import failure is common to all nodes in the cluster, this can result in cascading master takeover failures and a cluster-wide outage.

Example: a 3-node cluster (nodeA, nodeB & nodeC) with shared diskgroups dgA & dgB
- nodeA is shut down.
- nodeB attempts master takeover.
- On nodeB, if the import of shared diskgroup dgA fails during re-import with the error messages below, all shared diskgroups are disabled and nodeB is evicted:
vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-16066 da_dg_reimport: disk <disk id> not found
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 dg_import_master: failed to import dg <diskgroup> , error 183
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 master_takeover: error in disk group reimport: Disk for disk group not found, errno 0

- Like nodeB, nodeC encounters the same failure and is evicted, resulting in a total cluster outage.
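
The cascade in the example above can be illustrated with a toy model. This is a minimal sketch, not VxVM code: the function name `simulate_takeovers` and its data model are hypothetical, and it assumes only the pre-fix behavior described in this article, where any single import failure evicts the node attempting takeover.

```python
def simulate_takeovers(candidates, diskgroups, failing):
    """Toy model of the pre-fix behavior (hypothetical, not VxVM code).

    candidates: nodes that attempt master takeover, in order.
    diskgroups: shared diskgroups that are re-imported at takeover.
    failing:    diskgroups whose import fails on every node.
    Returns (new_master_or_None, evicted_nodes).
    """
    evicted = []
    for node in candidates:
        # Pre-fix: one failed import disables every shared diskgroup
        # and the node attempting takeover leaves the cluster.
        if any(dg in failing for dg in diskgroups):
            evicted.append(node)
            continue
        return node, evicted
    return None, evicted


# nodeA has been shut down; nodeB and nodeC attempt takeover in turn.
master, evicted = simulate_takeovers(["nodeB", "nodeC"], ["dgA", "dgB"], {"dgA"})
print(master, evicted)  # None ['nodeB', 'nodeC']
```

Because dgA fails on every node in this model, no candidate survives takeover and the cluster is left without a master.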

Error Message

Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname1: Disabled by errors
Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname2: Disabled by errors
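
When triaging system logs, it can help to compare which diskgroups failed to import against which were disabled. The following is a minimal sketch that assumes the message formats quoted in this article; the function name `summarize` and the sample lines are illustrative only, and real logs may differ in spacing or wording.

```python
import re

# Patterns derived from the message formats quoted in this article.
IMPORT_FAILED = re.compile(
    r"dg_import_master: failed to import dg (\S+)\s*, error (\d+)"
)
DISABLED = re.compile(r"V-5-1-7934 Disk group (\S+): Disabled by errors")


def summarize(lines):
    """Return (failed_imports, disabled_diskgroups) found in log lines."""
    failed, disabled = [], []
    for line in lines:
        m = IMPORT_FAILED.search(line)
        if m:
            failed.append(m.group(1))
            continue
        m = DISABLED.search(line)
        if m:
            disabled.append(m.group(1))
    return failed, disabled


# Illustrative sample built from the formats above (not a real capture).
sample = [
    "vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 dg_import_master: "
    "failed to import dg dgname1 , error 183",
    "Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] "
    "V-5-1-7934 Disk group dgname1: Disabled by errors",
    "Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] "
    "V-5-1-7934 Disk group dgname2: Disabled by errors",
]

failed, disabled = summarize(sample)
# Diskgroups that were disabled although their own import did not fail.
collateral = [dg for dg in disabled if dg not in failed]
print(failed, disabled, collateral)
```

A non-empty `collateral` list alongside a single failed import matches the symptom described in this article.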
 

Cause

The code path traversed during CVM master takeover exposed a bug that resulted in all shared diskgroups being disabled due to a single shared diskgroup's failure to import.

Solution

Code changes were made to disable only the specific shared diskgroup that failed to import. Re-import now skips the failed diskgroup and continues importing the remaining shared diskgroups, which prevents node eviction from the cluster.
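
The difference between the old and new behavior can be sketched as a simple loop. This is a hypothetical model, not the VxVM source: the function `reimport_shared_diskgroups`, the `try_import` callback, and the `isolate_failures` flag are names invented for illustration.

```python
def reimport_shared_diskgroups(diskgroups, try_import, isolate_failures=True):
    """Hypothetical model of the re-import loop (not VxVM source).

    try_import(dg) returns True on success, False on failure.
    Returns (imported, disabled, node_stays_in_cluster).
    """
    imported, disabled = [], []
    for dg in diskgroups:
        if try_import(dg):
            imported.append(dg)
        elif isolate_failures:
            # Post-fix: disable only the failed diskgroup and keep going.
            disabled.append(dg)
        else:
            # Pre-fix: disable every shared diskgroup and abort takeover.
            return [], list(diskgroups), False
    return imported, disabled, True


def dga_fails(dg):
    """Stand-in import routine in which only dgA fails."""
    return dg != "dgA"


post_fix = reimport_shared_diskgroups(["dgA", "dgB"], dga_fails)
pre_fix = reimport_shared_diskgroups(["dgA", "dgB"], dga_fails, isolate_failures=False)
print(post_fix)  # (['dgB'], ['dgA'], True)
print(pre_fix)   # ([], ['dgA', 'dgB'], False)
```

In the post-fix case dgB remains imported and the node stays in the cluster; only dgA is disabled.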

WORKAROUND: NONE 

FIX INTEGRATED IN THE FOLLOWING PATCH(es)/VERSIONS:

-6.0RP1HF1

-5.1SP1RP3 

 

Patch links for SFHA 5.1SP1RP3

AIX

https://sort.symantec.com/patch/detail/6806

AIX 7.1

https://sort.symantec.com/patch/detail/6807


Solaris SPARC

https://sort.symantec.com/patch/detail/6816

https://sort.symantec.com/patch/detail/6817


Solaris x64

https://sort.symantec.com/patch/detail/6818

https://sort.symantec.com/patch/detail/6819


RHEL5 x86_64

https://sort.symantec.com/patch/detail/6808

https://sort.symantec.com/patch/detail/6809


RHEL6 x86_64

https://sort.symantec.com/patch/detail/6814

https://sort.symantec.com/patch/detail/6815


SLES10 x86_64

https://sort.symantec.com/patch/detail/6811


SLES11 x86_64

https://sort.symantec.com/patch/detail/6813
 


Applies To

PLATFORMS: ALL (Solaris, HP-UX, Linux & AIX)

VxVM versions: 5.0.x, 5.1.x & 6.0.x
