Some VxVM commands are very I/O intensive and can occasionally panic the system. The root cause of a panic can be determined by examining the system core file, but only if the system has already been configured to generate and save one.
If the root cause is an I/O-related hang, /var/log/messages may contain no indication of the hang or panic, and a core analysis may be needed.
If the problem is repeatable, enable kdump and install the crash and kernel debuginfo RPMs on the machine. This example runs Red Hat Enterprise Linux Server release 5.5 (Tikanga). Verify that the kernel-headers, kernel-debuginfo-common, kernel-debuginfo, kdump (provided by the kexec-tools package) and crash packages are installed.
Some of these RPMs are on the installation media. The kernel-debuginfo and kernel-debuginfo-common RPMs must be downloaded from Red Hat at: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/
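A quick way to check for the packages is to query each one with `rpm -q`. The sketch below uses two hypothetical helpers, `required_pkgs` and `missing_pkgs` (not part of any RHEL tool). It assumes a standard (non-Xen) kernel, so the debuginfo package names end in the exact `uname -r` string, and it assumes kdump comes from kexec-tools.

```shell
# Hypothetical helpers (not part of any RHEL tool).
# Assumptions: a standard (non-Xen) kernel, so the debuginfo package names
# end in the exact `uname -r` string (e.g. 2.6.18-194.el5), and kdump is
# provided by the kexec-tools package.
required_pkgs() {
  local kver=$1
  printf '%s\n' \
    kernel-headers \
    "kernel-debuginfo-common-$kver" \
    "kernel-debuginfo-$kver" \
    kexec-tools \
    crash
}

# Print each required package that `rpm -q` reports as not installed.
missing_pkgs() {
  local pkg
  for pkg in $(required_pkgs "$1"); do
    rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Usage: missing_pkgs "$(uname -r)"    # no output means nothing is missing
```

The debuginfo packages must match the crashed kernel's version-release exactly, which is why the kernel release is passed in rather than hard-coded.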
First, configure kdump. Few administrators seem to do this on Linux systems, but it can be done with the graphical tool /usr/bin/system-config-kdump. This utility reserves 128 MB of system memory for the "crash kernel" that writes the dump. The reservation is a boot-time setting, so it takes effect only after the next reboot.
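After the reboot, the reservation can be confirmed from the kernel command line. The sketch below assumes the reservation appears as a `crashkernel=SIZE[@OFFSET]` boot parameter (for example `crashkernel=128M@16M`); `crashkernel_param` is a hypothetical helper, and the commented `chkconfig`/`service` lines assume RHEL 5 SysV init.

```shell
# Hypothetical helper: print the crashkernel= value from a kernel command
# line, or return 1 if there is none. Assumes the reservation made by
# system-config-kdump shows up as crashkernel=SIZE[@OFFSET] after a reboot.
crashkernel_param() (
  set -f                    # split on whitespace without glob expansion
  for arg in $1; do
    case $arg in
      crashkernel=*) printf '%s\n' "${arg#crashkernel=}"; exit 0 ;;
    esac
  done
  exit 1
)

# Usage on the live system (RHEL 5 SysV init):
#   crashkernel_param "$(cat /proc/cmdline)" || echo "no crash kernel memory reserved"
#   chkconfig kdump on      # start the kdump service at boot
#   service kdump status    # check whether kdump is operational
```

The function body runs in a subshell so that `set -f` does not leak into the caller's shell.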
Add these parameters to /etc/sysctl.conf, or modify them if they are already present:
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1
# Reboot automatically 60 seconds after a kernel panic
kernel.panic = 60
To apply these parameters immediately from the command line (the /etc/sysctl.conf entries make them persistent across reboots):
# sysctl -w kernel.sysrq=1
# sysctl -w kernel.panic=60
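To confirm that the persistent and live values agree, the file can be compared with `sysctl -n`. This is a sketch with a hypothetical helper, `conf_value`, which assumes the usual sysctl.conf syntax: `key = value` lines, with comment lines starting with `#` or `;`.

```shell
# Hypothetical helper: print the value assigned to KEY in a
# sysctl.conf-style FILE. Comment lines (# or ;) are skipped, spaces around
# "=" are allowed, and if KEY appears more than once the last value wins.
conf_value() {
  local file=$1 key=$2
  awk -F= -v key="$key" '
    /^[ \t]*[#;]/ { next }                      # skip comment lines
    {
      k = $1; gsub(/[ \t]/, "", k)              # trim the key
      if (k == key && index($0, "=") > 0) {
        v = substr($0, index($0, "=") + 1)      # text after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", v)          # trim the value
        val = v
      }
    }
    END { if (val != "") print val }
  ' "$file"
}

# Usage:
#   for key in kernel.sysrq kernel.panic; do
#     echo "$key: file=$(conf_value /etc/sysctl.conf "$key") live=$(sysctl -n "$key")"
#   done
```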
----- After the Crash ----
In this test case, the panic was reproduced and the system core was written to /var/crash/<date:time>/vmcore. Start the crash analysis from that directory with the following command:
# crash /boot/System.map-2.6.18-194.el5 /usr/lib/debug/lib/modules/2.6.18-194.el5/vmlinux ./vmcore
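The command above hard-codes the kernel release and expects to be run from the dump directory. A small wrapper can locate the newest dump instead. This sketch assumes dumps are saved as /var/crash/<date:time>/vmcore and that the installed debuginfo vmlinux matches the kernel that crashed; `latest_vmcore` and `run_crash` are hypothetical names.

```shell
# Hypothetical helpers. Assumptions: kdump saves each dump as
# <dir>/<date:time>/vmcore (default dir /var/crash), and the debuginfo
# vmlinux for the crashed kernel release is installed.
latest_vmcore() {
  local dir=${1:-/var/crash} f newest=""
  for f in "$dir"/*/vmcore; do
    [ -f "$f" ] || continue                 # skip an unmatched glob
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f                             # keep the most recently modified
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Start crash with the same arguments as the command shown above.
run_crash() {
  local kver=$1 core=$2
  crash "/boot/System.map-$kver" \
        "/usr/lib/debug/lib/modules/$kver/vmlinux" \
        "$core"
}

# Usage: run_crash 2.6.18-194.el5 "$(latest_vmcore)"
```

Selecting by modification time rather than by directory name avoids depending on the exact <date:time> naming format.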
The task that was running when the system panicked was the OS "vol_id" command. By default, crash prints a system summary when it starts; an excerpt follows:
SYSTEM MAP: /boot/System.map-2.6.18-194.el5
DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.18-194.el5/vmlinux (2.6.18-194.el5)
DATE: Thu Nov 11 13:09:34 2010
LOAD AVERAGE: 0.05, 0.23, 0.12
VERSION: #1 SMP Tue Mar 16 21:52:39 EDT 2010
MACHINE: x86_64 (1596 Mhz)
MEMORY: 2 GB
PANIC: "Oops: 0000  SMP " (check log for details)
TASK: ffff81005214b820 [THREAD_INFO: ffff810051fe8000]
STATE: TASK_RUNNING (PANIC)
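When many dumps need triage, individual fields can be pulled from a saved copy of this summary. The sketch assumes the summary was saved to a text file (for example by redirecting the output of crash's `sys` command, which redisplays it); `sys_field` is a hypothetical helper that reads the summary on standard input.

```shell
# Hypothetical helper: print the value of one field (e.g. PANIC, DATE,
# MEMORY) from crash's system summary on stdin. Field names are matched
# exactly, including ones with spaces such as "DEBUG KERNEL".
sys_field() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)                  # crash indents these lines
      if (index(line, key ":") == 1) {
        val = substr(line, length(key) + 2)     # text after "KEY:"
        sub(/^[ \t]+/, "", val)
        print val
        exit
      }
    }'
}

# Usage: sys_field PANIC < sys.txt
```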
The most useful piece of information is the stack trace, or "backtrace". Typing "bt" at the crash> prompt prints it:
PID: 7957 TASK: ffff81005214b820 CPU: 3 COMMAND: "vol_id"
#0 [ffff810051fe9730] crash_kexec at ffffffff800aeb6b
#1 [ffff810051fe97f0] __die at ffffffff80066157
#2 [ffff810051fe9830] do_page_fault at ffffffff80067dd7
#3 [ffff810051fe9920] error_exit at ffffffff8005ede9
[exception RIP: part_round_stats+19]
RIP: ffffffff801447a1 RSP: ffff810051fe99d8 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff81007ab57ac0 RCX: d600000000000000
RDX: 0000000000000000 RSI: 8000000000000000 RDI: ffff81007ab57ac0
RBP: 0000000100000d7e R8: 000000000000000f R9: 0000000000000000
R10: ffff810009930388 R11: ffffffff8014c80a R12: 0000000000000000
R13: 0000000000000001 R14: 00000000013efd00 R15: 0000000000800032
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff810051fe99f0] drive_stat_acct at ffffffff80144969
The backtrace shows that the exception occurred in part_round_stats (at offset +19), called from drive_stat_acct. Searching Red Hat's Bugzilla for these function names turned up a possible match with a known bug.
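The search terms can be extracted from a saved backtrace rather than copied by hand. This sketch assumes the `bt` output was saved to a text file (for example with crash's output redirection at the prompt); `bt_functions` and `exception_rip` are hypothetical helpers that read that text on standard input.

```shell
# Hypothetical helpers that read saved `bt` output on stdin.
# bt_functions: print the function name of each "#N [addr] func at addr" frame.
bt_functions() {
  awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print $3 }'
}

# exception_rip: print the function+offset from "[exception RIP: ...]".
exception_rip() {
  sed -n 's/.*\[exception RIP: \([^]]*\)].*/\1/p'
}

# Usage:
#   exception_rip < bt.txt     # e.g. part_round_stats+19
#   bt_functions  < bt.txt
```

In this trace, frames #0 to #3 (crash_kexec, __die, do_page_fault, error_exit) are the generic page-fault and dump path, so the exception RIP and its caller (drive_stat_acct) are the more specific search terms.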
This concludes the research needed to identify the cause of the crash. In this case, Red Hat will provide a fix or workaround for the problem.
This document uses the following configuration:
Red Hat Enterprise Linux 5.5, Veritas Storage Foundation (SF) 5.1, EMC CLARiiON disk with multipathing.