Important Update: Cohesity Products Knowledge Base Articles


All Cohesity Knowledge Base articles are now managed via the Cohesity Support Portal: https://support.cohesity.com/s/searchunify. The articles available here may not reflect the latest information and may no longer be accessible.

Vxstat can cause a system panic in InfoScale 7.4.1 and 7.4.2 "Kernel panic - not syncing: Hard LOCKUP"

Article: 100062756
Last Published: 2024-01-30
Product(s): InfoScale & Storage Foundation

Problem

Vxstat can cause a system panic in InfoScale 7.4.1 and 7.4.2 "Kernel panic - not syncing: Hard LOCKUP".

Vxstat can be executed directly by the user, and it is also executed in the background as part of other VxVM operations. For example, running VRTSexplorer causes vxstat to run in the background. If the bug is present, any execution of vxstat can lead to a system panic.

 

Error Message

The following panic string was seen:

PANIC: "Kernel panic - not syncing: Hard LOCKUP"
 

Backtrace from the vmcore:

crash> bt
PID: 3615 TASK: ffff8ca06818e300 CPU: 19 COMMAND: "vxstat"
#0 [ffff8ca61efc8980] machine_kexec at ffffffffa7069514
#1 [ffff8ca61efc89e0] __crash_kexec at ffffffffa7129e82
#2 [ffff8ca61efc8ab0] panic at ffffffffa77ab713
#3 [ffff8ca61efc8b30] nmi_panic at ffffffffa709f523
#4 [ffff8ca61efc8b40] watchdog_overflow_callback at ffffffffa7157409
#5 [ffff8ca61efc8b58] __perf_event_overflow at ffffffffa71b32a7
#6 [ffff8ca61efc8b90] perf_event_overflow at ffffffffa71bcd64
#7 [ffff8ca61efc8ba0] handle_pmi_common at ffffffffa700acf0
#8 [ffff8ca61efc8de0] intel_pmu_handle_irq at ffffffffa700afef
#9 [ffff8ca61efc8e38] perf_event_nmi_handler at ffffffffa77bb039
#10 [ffff8ca61efc8e58] nmi_handle at ffffffffa77bc9cc
#11 [ffff8ca61efc8eb0] do_nmi at ffffffffa77bcbed
#12 [ffff8ca61efc8ef0] end_repeat_nmi at ffffffffa77bbdf4
[exception RIP: native_queued_spin_lock_slowpath+470]
RIP: ffffffffa711ec86 RSP: ffff8ca0c6f9faf0 RFLAGS: 00000002
RAX: 0000000000000001 RBX: 0000000000000082 RCX: 0000000000000001
RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff8c99f9ec8f70
RBP: ffff8ca0c6f9faf0 R8: 0000000000000101 R9: 0000000000000040
R10: 00007ffee9828a88 R11: 0000000000000000 R12: 0000000000000013
R13: ffff8ca0c6f9fc50 R14: 00007ffee9828a70 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- ---
#13 [ffff8ca0c6f9faf0] native_queued_spin_lock_slowpath at ffffffffa711ec86
#14 [ffff8ca0c6f9faf8] queued_spin_lock_slowpath at ffffffffa77ac21a
#15 [ffff8ca0c6f9fb08] _raw_spin_lock_irqsave at ffffffffa77ba7bb
#16 [ffff8ca0c6f9fb20] volget_rwspinlock at ffffffffc0bc244a [vxio]
#17 [ffff8ca0c6f9fb40] vol_get_one_io_stat at ffffffffc0c32015 [vxio]
#18 [ffff8ca0c6f9fc20] volinfo_ioctl at ffffffffc0c3698f [vxio]
#19 [ffff8ca0c6f9fca8] volsioctl_real at ffffffffc0ce5cf5 [vxio]
#20 [ffff8ca0c6f9fd80] vols_ioctl at ffffffffc039e452 [vxspec]
#21 [ffff8ca0c6f9fda0] vols_unlocked_ioctl at ffffffffc039e4c1 [vxspec]
#22 [ffff8ca0c6f9fdb0] do_vfs_ioctl at ffffffffa7271918
#23 [ffff8ca0c6f9fe30] sys_ioctl at ffffffffa7271bb1
#24 [ffff8ca0c6f9fe70] unload_network_ops_symbols at ffffffffc037b9e8 [falcon_lsm_pinned_15508]
#25 [ffff8ca0c6f9fee0] unload_network_ops_symbols at ffffffffc0661688 [falcon_lsm_pinned_16004]
#26 [ffff8ca0c6f9ff50] system_call_fastpath at ffffffffa77c539a
RIP: 00007f20fe29b4a7 RSP: 00007ffee982a6b0 RFLAGS: 00010203
RAX: 0000000000000010 RBX: 0000000001b8e030 RCX: 0000000000000000
RDX: 00007ffee9828920 RSI: 00000000564f4c96 RDI: 0000000000000005
RBP: 00007ffee9828a90 R8: 0000000000000005 R9: 0000000000000060
R10: 0000000000000007 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000001b8a190 R14: 00007ffee9828930 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
crash>

 

Cause

The crash is caused by a spinlock that remains held while vxstat copies statistics out to user space.

While collecting the I/O statistics, the vxstat thread acquires a spinlock and then copies the data to user space. If a page fault occurs during that copy, the thread relinquishes the CPU and another thread is scheduled. If the newly scheduled thread requests the same spinlock that the vxstat thread still holds, it spins without making progress, and the NMI watchdog eventually reports a hard lockup.
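The failure mode can be sketched in user space. This is an illustrative simulation only, not the actual vxio kernel code: Python's `threading.Lock` stands in for the kernel spinlock, a `sleep` stands in for the page fault that blocks the lock holder, and the thread names are hypothetical.

```python
import threading
import time

stat_lock = threading.Lock()

def vxstat_thread():
    # Buggy pattern: the lock is held across a step that can block
    # (standing in for a page fault during the copy to user space).
    with stat_lock:
        time.sleep(0.5)  # lock holder loses the CPU while holding the lock

def other_thread(result):
    # A second thread wanting the same lock makes no progress. On a real
    # CPU spinning with interrupts disabled, the NMI watchdog eventually
    # declares "Hard LOCKUP"; here we just give up after a timeout.
    result["got_lock"] = stat_lock.acquire(timeout=0.1)

result = {}
a = threading.Thread(target=vxstat_thread)
b = threading.Thread(target=other_thread, args=(result,))
a.start()
time.sleep(0.05)  # let the vxstat thread take the lock first
b.start()
a.join()
b.join()
print(result)  # {'got_lock': False}
```

The second thread never acquires the lock while the first one is blocked inside the critical section, which is the essence of the lockup window described above.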

 

Solution

The code has been changed to release the spinlock before copying the data out to user space during vxstat collection. The fix is included in the patches listed below.
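The corrected pattern can be sketched as follows. Again this is a hedged user-space illustration, not the vxio source: the idea is to take a snapshot of the statistics under the lock (a fast, in-memory copy that cannot fault), release the lock, and only then perform the potentially blocking copy-out.

```python
import threading

stat_lock = threading.Lock()
live_stats = {"reads": 100, "writes": 200}  # hypothetical stats record

def get_one_io_stat_fixed():
    # Snapshot under the lock: a plain in-memory copy that cannot block.
    with stat_lock:
        snapshot = dict(live_stats)
    # Lock released; the slow copy to user space (where a page fault may
    # sleep) now runs without the spinlock held, so other threads that
    # need the lock can make progress.
    return snapshot

print(get_one_io_stat_fixed())  # {'reads': 100, 'writes': 200}
```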

InfoScale 7.4.1 on RHEL7: https://www.veritas.com/support/en_US/downloads/update.UPD691569

InfoScale 7.4.1 on RHEL8: https://www.veritas.com/support/en_US/downloads/update.UPD356743

InfoScale 7.4.2 on RHEL7: https://www.veritas.com/support/en_US/downloads/update.UPD226123

InfoScale 7.4.2 on RHEL8: https://www.veritas.com/support/en_US/downloads/update.UPD424106

 

References

Etrack : 4010207
Content in the knowledge base article has been created with the assistance of an artificial intelligence language model.
