After the slave node was rebooted, or after an operation such as device scanning was performed on it, the slave node failed to join the cluster with a "Cannot find disk" error.
As a workaround, the problem clears when the customer runs hastop -local / hastart and then vxdctl enable on the affected node.
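The workaround above can be sketched as the following command sequence, run as root on the slave node that failed to join (a sketch assuming a standard VCS/VxVM installation with these commands in the PATH):

```shell
# Stop the VCS engine (HAD) on this node only
hastop -local

# Restart VCS on this node; the node re-attempts the cluster/CVM join
hastart

# Ask vxconfigd to rescan devices and rebuild the VxVM device list,
# so shared disks leave the "error" state before the join is retried
vxdctl enable

# Confirm the shared disks are no longer in the error state
vxdisk list
```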
From the engine_A.log:
2011/03/31 19:01:19 VCS ERROR V-16-10001-1005 (xxxx)
CVMCluster:???:monitor:node - state: out of cluster reason: Cannot find disk on
slave node: retry to add a node failed
From the messages:
Mar 31 19:01:11 xxxx vxio: [ID 567674 kern.notice] NOTICE: VxVM vxio V-5-3-0
joinsio_done: Overlapping reconfiguration, failing the join for node 1. The
join will be retried.
Mar 31 19:01:11 xxxx vxio: [ID 317193 kern.notice] NOTICE: VxVM vxio V-5-3-0
abort_joinp: aborting joinp for node 1 with err 17
Mar 31 19:01:11 xxxx vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-12144
CVM_VOLD_JOINOVER command received with error
We noticed that all shared disks on the slave node were in the error state. Before the node can join the cluster, the shared disks must be accessible from it. From the vxdisk list output:
c1t50060E80035BDF10d4s2 auto - - error
Even prtvtoc could not read the OS raw device:
prtvtoc: /dev/rdsk/c1t50060E80035BDF10d4s2: Unable to read Disk geometry errno = 0x16
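Before letting the node retry the join, disk accessibility can be verified from the slave node itself. A minimal check sketch, using the device name from the logs above:

```shell
# VxVM's view of the device: STATUS should read "online", not "error"
vxdisk list c1t50060E80035BDF10d4

# The OS itself should be able to read the label/geometry of the raw slice
prtvtoc /dev/rdsk/c1t50060E80035BDF10d4s2

# A raw read of the first few blocks confirms the basic I/O path is healthy
dd if=/dev/rdsk/c1t50060E80035BDF10d4s2 of=/dev/null bs=512 count=16
```

If any of these fail, the problem lies below VxVM (HBA, SAN, or array side) and should be resolved before retrying the cluster join.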
Since even the OS raw device could not be read, the problem appeared to be on the array side, possibly a delay in device recognition.
While double-checking whether the required conditions were met, we found an unsupported configuration on the array side with respect to the requirements described in our HCL. The array's system mode option had been set to only "186", whereas the HCL recommends "186, 254". The customer had a Hitachi engineer change it to "186, 254". After that, we repeated the same test: the issue no longer occurred and everything worked correctly. Consequently, this problem turned out not to be caused on our side.
Environment:
2-node SFRAC 5.0 MP3RP4 cluster
Solaris 10 SPARC