Veritas Volume Manager 5.1 SP1 RP1: executing "vxdisk updateudid" on a disk initially reported as "online udid_mismatch" and subsequently as "online invalid" can result in the vxconfigd daemon core dumping

Problem

When attempting to recover a series of failed devices back into an imported diskgroup, the "vxdisk updateudid <da-name>" command was executed to clear the harmless "udid_mismatch" flag from each disk.

As a result of running the command, the vxconfigd daemon core dumped.
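
For illustration only, the failing sequence resembles the following. The device access name "emc0_01" is hypothetical, and the exact "vxdisk list" output layout may vary by platform and release:

# vxdisk list | grep emc0_01
emc0_01      auto:cdsdisk   -        -       online udid_mismatch
(the disk state subsequently changes)
emc0_01      auto           -        -       online invalid
# vxdisk updateudid emc0_01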

Error Message

Sample pstack output for the vxconfigd core file:

# cd /

# file core
core:           ELF 32-bit MSB core file SPARC Version 1, from 'vxconfigd'

# ls -la core
-rw-------   1 root     root     27801826 Nov 19 00:40 core

# pstack core
core 'core' of 59:      vxconfigd -x syslog -m boot
-----------------  lwp# 1 / thread# 1  --------------------
 00116364 priv_join (8427b8, 6e9150, 6, 2, 36f000, 6daa08) + c
 0009efb0 req_disk_updateudid (0, 1, 0, 6e9150, 6c2e58, 6e9148) + 230
 00133ef4 request_loop (0, 379f88, 811b20, a8c0, ffffffff, 1537e) + b38
 000ffac4 main     (36d800, 3cf400, 38d400, 2ec000, ffbffe3c, 0) + fd0
 00041cf0 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 139 / thread# 139  --------------------
 ff1c9594 __lwp_park (0, 3cad38, 0, 0, 6d0dc, 0) + 14
 ff1c35d8 cond_wait_queue (3cad28, 3cad38, 0, 0, 1c00, 0) + 4c
 ff1c3b20 cond_wait (3cad28, 3cad38, 0, 1c00, 0, 3cad38) + 10
 ff1c3b5c pthread_cond_wait (3cad28, 3cad38, 0, 0, 3cad38, ff1c2a78) + 8
 00134acc vold_dispatch_requests (2, 379c00, 3cad38, 377e80, 3cac00, 3cac00) + 7c
 ff1c94f0 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 140 / thread# 140  --------------------
 ff1ccd90 _pause   (
[remainder of pstack output truncated]

Cause

This behavior is caused by a product defect, tracked as Etrack incident 2189812.

Solution

Symantec engineering is working on a fix for this issue at this time.

Workaround

If the vxconfigd daemon has died, it can be restarted by running:

# vxconfigd

To validate that the vxconfigd daemon has been restarted successfully, type:

# vxdctl mode

The command should state that the vxconfigd daemon is in an enabled state:

# vxdctl mode
mode: enabled
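
If desired, the restart and validation steps can be combined into a small shell sketch (illustrative only, assuming the VxVM binaries are in the PATH):

#!/bin/sh
# Restart vxconfigd if it is not reporting an enabled state.
if ! vxdctl mode | grep -q "mode: enabled"; then
    vxconfigd                 # restart the daemon
    vxdctl mode               # should now report "mode: enabled"
fi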

Recommendations

To improve the chances of diskgroup recovery, it is recommended that the number of configuration backup copies saved in the /etc/vx/cbr/bk/<disk_group> directory be increased from the default of "1" (for all VRTSvxvm 5.x releases) to "3" where possible.

More than one copy can be retained by creating the /etc/vx/cbr/bk_config file and setting "NUM_BK=<value>" to the number of configuration copies to be maintained:

# echo "NUM_BK=3" >> /etc/vx/cbr/bk_config

The above setting increases the number of configuration copies kept to "3" for each diskgroup.
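
To confirm the setting took effect (illustrative only; "<disk_group>" is a placeholder for an actual diskgroup name, and backup copies accumulate as configuration changes occur):

# cat /etc/vx/cbr/bk_config
NUM_BK=3
# ls /etc/vx/cbr/bk/<disk_group>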
