When Oracle Clusterware (10g Release 2 or 11g Release 1) panics the server, for example after losing storage access to the voting disks, the system does not perform a core dump because of an incorrect setting in the Oracle Clusterware init scripts.
This issue applies to AIX systems only.
The SLOW_REBOOT parameter in the Oracle Clusterware init script (init.cssd) is incorrectly set to “fast boot” (/usr/sbin/fastboot -n -q) for configurations that use vendor clusterware.
As a result, the system reboots without performing a core dump when it is panicked by Oracle Clusterware, which makes root cause analysis and troubleshooting difficult.
Modify the init.cssd script to update the SLOW_REBOOT parameter as follows:
Replace the following existing line:
SLOW_REBOOT="/bin/kill -HUP `$CAT /etc/syslog.pid`; /bin/sync & $SLEEP 2; /usr/sbin/fastboot -n -q"
With the following line:
SLOW_REBOOT="/bin/kill -HUP `$CAT /etc/syslog.pid`; /bin/sync & $SLEEP 2; /usr/bin/sysdumpstart -p"
With this update, Oracle Clusterware calls sysdumpstart -p, which starts a system dump to the primary dump device, so that the core dump is collected if Oracle Clusterware panics the system.
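The manual edit above could also be scripted. The following is a minimal sketch, not a supported procedure: the helper name patch_slow_reboot is hypothetical, and the location of init.cssd is an assumption that the caller must supply, because it is not stated here. The sketch saves a copy of the script with a .orig suffix and changes only lines that assign SLOW_REBOOT.

```shell
#!/bin/sh
# Sketch: switch SLOW_REBOOT in init.cssd from fastboot to sysdumpstart.
# patch_slow_reboot is a hypothetical helper name; the caller passes the
# path to init.cssd, since its location is an assumption of this example.
patch_slow_reboot() {
    file="$1"
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }

    # Nothing to do if no SLOW_REBOOT line still calls fastboot.
    if ! grep '^[[:space:]]*SLOW_REBOOT=.*/usr/sbin/fastboot -n -q' "$file" >/dev/null; then
        echo "no SLOW_REBOOT line using fastboot in $file" >&2
        return 0
    fi

    # Keep a copy of the original script before editing.
    cp -p "$file" "$file.orig" || return 1

    # Rewrite only SLOW_REBOOT assignment lines. Output goes to a temp file
    # because not every sed implementation supports in-place editing.
    tmp="$file.tmp.$$"
    sed '/^[[:space:]]*SLOW_REBOOT=/s|/usr/sbin/fastboot -n -q|/usr/bin/sysdumpstart -p|' \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Copy the contents back so the file keeps its ownership and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

To use it, run patch_slow_reboot as root with the full path to init.cssd on the affected node, then compare the file against its .orig copy to confirm that only the SLOW_REBOOT line changed.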
Oracle Clusterware 10g Release 2 or 11g Release 1 on AIX