FSS node join failed after adding new VMDK disk to Virtual Machine

Article: 100076689
Last Published: 2026-01-30
Product(s): InfoScale & Storage Foundation

Problem

The FSS node join fails after a new VMDK disk is added to the virtual machine.

This issue is currently reported in InfoScale 7.4.2 and occurs when a node is outside the cluster while new VMDK LUNs are added to the servers, followed by the execution of either dmpdr refresh or vxddladm assign names.

Error Message

The slave node join fails with the following messages in the OS syslog:

Jan 28 20:54:05 rhel81 vxvm:vxconfigd[58555]: V-5-1-18996 setup_remote_disks: da_online failed with error 142 for remote disk VMware%5FVirtual%20disk%5Fvmdk%5F6000C29AF0D8CC6AC1B94184A09BE356, returning error VE_REMOTE_DA_FAIL
Jan 28 20:54:05 rhel81 vxvm:vxconfigd[58555]: V-5-1-19007 slave_response: setup_remote_disks failed with return value 478. Aborting join.
Jan 28 20:54:05 rhel81 vxvm:vxconfigd[58555]: V-5-1-11092 cleanup_client: (Slave failed to create remote disk) 478
Jan 28 20:54:05 rhel81 vxvm:vxconfigd[58555]: V-5-1-11467 kernel_fail_join() : #011#011Reconfiguration interrupted: Reason is retry to add a node failed (13, 0)
Jan 28 20:54:05 rhel81 kernel: VxVM vxio V-5-0-164 Failed to join cluster test, aborting :
Jan 28 20:54:05 rhel81 kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 0 being failed :
Jan 28 20:54:05 rhel81 kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 0 with err 11 :
Jan 28 20:54:05 rhel81 kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 1 being failed :
Jan 28 20:54:05 rhel81 kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 1 with err 11 :

Cause

In an FSS cluster configured in a VMware hypervisor environment, adding a new VMDK disk to a cluster node with the dmpdr tool regenerates all local DMP device names. This causes inconsistencies in the kernel's disk connectivity map, which in turn fails I/Os issued on remote disks and results in node join failure.
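The inconsistency can usually be seen by comparing the DMP device names and disk states on the rejoining node with those on the other cluster nodes. The commands below are standard VxVM utilities, but the exact output columns may vary by release; this is only a diagnostic sketch.

Commands:
# List the DMP device names known to this node
vxdmpadm getdmpnode
# Show disk access (DA) names, states, and disk group membership as seen by VxVM
vxdisk -o alldgs list

Names that differ between nodes, or disks left in an error state, indicate the renamed device records described above.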

Solution

The following workaround steps are recommended to overcome the problem (that is, to join the slave node to the cluster):

1) For existing DMP disks, user-defined names can be set without downtime. However, if the system already shows problems such as incorrect or duplicated names, or disks in error states, a reboot is required first to correct them. After the reboot, user-defined names can be assigned to each disk.

Note :
A reboot is required only if the system already has issues such as disks in an error state, duplicate DA records, or renamed DA names from previous VMDK additions. If no such problems exist, a reboot is not necessary.
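As a quick check before deciding whether a reboot is needed, review the disk and DMP node listings on the affected node. This is only an illustrative sketch using standard VxVM commands; the grep pattern is a simple filter, not an official procedure.

Commands:
# Look for disks reported in an error state
vxdisk list | grep -i error
# Review the current DMP node names for duplicates or unexpected renames
vxdmpadm getdmpnode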

2) Add new VMDK disks only after completing the above step. Once the disk is visible to VxVM, assign a unique user-defined name to the new disk before including it in a disk group, as sketched below.
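As a sketch of this step, after the VMDK is presented to the virtual machine, rescan for it and confirm that it is visible to VxVM before assigning the user-defined name and adding it to a disk group. The commands are standard VxVM utilities; the sequence is illustrative.

Commands:
# Rescan for newly added devices and refresh VxVM's view of the disks
vxdisk scandisks
# Confirm the new disk is visible and note its DMP node name
vxdisk list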

Use the following command to set a user-defined name for the DMP device:

Command :
vxdmpadm setattr dmpnode <dmpnodename> name=<unique_name_within_cluster>

Example:
vxdmpadm setattr dmpnode rhel77_vmdk0_0 name=rhel77_vmdk0_001

Note: These names are stored in /etc/vx/dmpnames.info and persist across reboots. Running vxddladm -c assign names will clear the entry, so avoid using this command.

 

If the DMP device names are maintained consistently across the cluster, the above issue can be avoided.
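One way to confirm consistency, as a manual comparison rather than an official verification procedure, is to run the same listing on every cluster node and verify that each shared disk reports the same user-defined name and that persistent naming is in effect.

Commands:
# Check the disk naming scheme and persistence settings
vxddladm get namingscheme
# Run on each node and compare the device names for shared disks
vxdmpadm getdmpnode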

 

A supported hotfix has been made available for this issue. Please contact Technical Support to obtain this fix. This hotfix has not yet gone through extensive QA testing. Consequently, if you are not adversely affected by this problem and have a satisfactory temporary workaround in place, we recommend that you wait for the public release of this hotfix.

 

References

JIRA: STESC-9848
