Failure to (re)import a single shared diskgroup during Cluster Volume Manager master takeover results in all shared diskgroups being disabled and node eviction from cluster

  • Article ID:100008863

Problem

During Cluster Volume Manager (CVM) master takeover, all shared diskgroups are re-imported. If a single shared diskgroup fails to import, Volume Manager (VxVM) disables all shared diskgroups (dgdisable), and the node attempting the takeover (the new master) leaves the cluster.

Note: If the cause of the diskgroup import failure is common to all nodes in the cluster, this can result in cascading master takeover failures and a cluster-wide outage.
Example: a 3-node cluster (nodeA, nodeB and nodeC) with shared diskgroups dgA and dgB
- nodeA is shut down.
- Master takeover is attempted on nodeB.
- On nodeB, if the re-import of shared diskgroup dgA fails with the error messages below, all shared diskgroups are disabled and nodeB is evicted:
vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-16066 da_dg_reimport: disk <disk id> not found
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 dg_import_master: failed to import dg <diskgroup> , error 183
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 master_takeover: error in disk group reimport: Disk for disk group not found, errno 0
 
- Like nodeB, nodeC encounters the same failure and is evicted, resulting in a total cluster outage.
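
The cascade in the example above can be illustrated with a small toy model. This is only a sketch of the behaviour described in this article, not VxVM code: the function name simulate_takeovers and its parameters are hypothetical, and it assumes the import failure has a cause common to every node.

```python
def simulate_takeovers(candidates, diskgroups, failing_dgs):
    """Toy model of the pre-fix behaviour (hypothetical, not VxVM code).

    candidates:  nodes that attempt master takeover, in order.
    diskgroups:  shared diskgroups that must be re-imported.
    failing_dgs: diskgroups whose import fails on every node.
    Returns (new_master_or_None, evicted_nodes).
    """
    evicted = []
    for node in candidates:
        if any(dg in failing_dgs for dg in diskgroups):
            # Pre-fix: one failed import disables all shared diskgroups
            # and the candidate master leaves the cluster.
            evicted.append(node)
            continue
        return node, evicted
    return None, evicted


# nodeA has been shut down, so nodeB and then nodeC attempt the takeover.
master, evicted = simulate_takeovers(["nodeB", "nodeC"], ["dgA", "dgB"], {"dgA"})
print(master, evicted)
```

With dgA failing on every node, no candidate survives the takeover, which corresponds to the total cluster outage described above.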

Error Message

Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname1: Disabled by errors
Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname2: Disabled by errors
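
To check a system log for this symptom, the messages above can be matched by their message ID. The following is a minimal sketch that assumes the log lines follow exactly the format shown in this article; the helper name disabled_diskgroups is hypothetical, and the log file location is left to the caller because it varies by platform.

```python
import re

# Matches the V-5-1-7934 message shown above and captures the diskgroup name.
DISABLED_RE = re.compile(r"V-5-1-7934 Disk group (\S+): Disabled by errors")


def disabled_diskgroups(log_lines):
    """Return diskgroup names reported as disabled, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = DISABLED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] "
    "V-5-1-7934 Disk group dgname1: Disabled by errors",
    "Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] "
    "V-5-1-7934 Disk group dgname2: Disabled by errors",
]
print(disabled_diskgroups(sample))
```

Several shared diskgroups reported as disabled at the same timestamp during a master takeover is consistent with the problem described in this article.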
 

Cause

The code path traversed during CVM master takeover exposed a bug that resulted in all shared diskgroups being disabled due to a single shared diskgroup's failure to import.

Solution

Code changes were made to disable only the specific shared diskgroup that failed to import. The re-import now skips the failed diskgroup and continues importing the remaining shared diskgroups, which prevents the node from being evicted from the cluster.
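
The corrected behaviour can be sketched as a loop that handles each diskgroup independently. This is an illustration of the logic described above under stated assumptions, not the actual VxVM change: reimport_after_fix and the try_import callable are hypothetical names.

```python
def reimport_after_fix(diskgroups, try_import):
    """Illustrative post-fix re-import loop (hypothetical, not VxVM code).

    try_import is a caller-supplied callable that returns True when the
    given diskgroup imports successfully.
    Returns (imported, disabled) lists of diskgroup names.
    """
    imported, disabled = [], []
    for dg in diskgroups:
        if try_import(dg):
            imported.append(dg)
        else:
            # Only the failing diskgroup is disabled; the loop continues.
            disabled.append(dg)
    return imported, disabled


# dgA fails to import; dgB is still imported and the takeover can proceed.
imported, disabled = reimport_after_fix(["dgA", "dgB"], lambda dg: dg != "dgA")
print(imported, disabled)
```

In this model the failure of dgA no longer affects dgB, so the new master keeps the healthy diskgroups online instead of leaving the cluster.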

WORKAROUND: NONE 

FIX INTEGRATED IN THE FOLLOWING PATCH(es)/VERSIONS :

-6.0RP1HF1

-5.1SP1RP3 

 

Patch links for SFHA 5.1SP1RP3

AIX

https://sort.Veritas.com/patch/detail/6806

AIX 7.1

https://sort.Veritas.com/patch/detail/6807


Solaris SPARC

https://sort.Veritas.com/patch/detail/6816

https://sort.Veritas.com/patch/detail/6817


Solaris x64

https://sort.Veritas.com/patch/detail/6818

https://sort.Veritas.com/patch/detail/6819


RHEL5 x86_64

https://sort.Veritas.com/patch/detail/6808

https://sort.Veritas.com/patch/detail/6809


RHEL6 x86_64

https://sort.Veritas.com/patch/detail/6814

https://sort.Veritas.com/patch/detail/6815


SLES10 x86_64

https://sort.Veritas.com/patch/detail/6811


SLES11 x86_64

https://sort.Veritas.com/patch/detail/6813
 


Applies To

PLATFORMS: ALL (Solaris, HP-UX, Linux and AIX)

VxVM versions: 5.0.x, 5.1.x and 6.0.x

Related Articles

Cluster Volume Manager (CVM) master takeover is prone to failures when a mix of clone and standard devices are present in a diskgroup
