RAC cluster panic: duplicate "init.cssd" processes running

Problem

The node rebooted when the cluster engine was started.

Two "init.cssd" processes were found running at the time.

Error Message

 DIAGNOSTIC STEPS:

core file: /evidence/mtv/51/419-015-751/2012-08-15/vmcore.0
user: Super-User (root:0)
release: 5.10 (64-bit)
version: Generic_142909-17
machine: sun4v
node name: usbdc3me003
hw_provider: Sun_Microsystems
system type: SUNW,T5240 (UltraSPARC-T2+)
hostid: 8534c92e
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/dsk/c1t0d0s1(16G)
time of crash: Wed Aug 15 16:28:13 GMT 2012
age of system: 653 days 13 hours 54 minutes 1.78 seconds
panic CPU: 23 (64 CPUs, 31.7G memory, 1 nodes)
panic string: forced crash dump initiated at user request

------------------------------

panic string: forced crash dump initiated at user request
==== panic user (LWP_SYS) thread: 0x30078b48860 PID: 2047 on CPU: 23 affinity CPU: 23 ====
cmd: /sbin/uadmin 5 1
t_procp: 0x3005c7db2f0
p_as: 0x300a56cd6b0 size: 2686976 RSS: 1662976
hat: 0x30071636480
cnum: CPU16:19475/5986
cpusran: 23
zone: global
t_stk: 0x2a104c99ae0 sp: 0x2a104c990b1 t_stkbase: 0x2a104c94000
t_pri: 59(TS) t_tid: 1 pctcpu: 0.058453
t_lwp: 0x3007e761650 machpcb: 0x2a104c99ae0
mstate: LMS_SYSTEM ms_prev: LMS_USER
ms_state_start: 0.570323688 seconds later
ms_start: 0.549774969 seconds later
psrset: 0 last CPU: 23
idle: 2 ticks (0.02 seconds)
start: Wed Aug 15 16:28:13 2012
age: 0 seconds (0 seconds)
syscall: #55 uadmin(, 0xffbffb18) (sysent: genunix:uadmin+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting

pc: unix:panic+0x1c: call unix:vpanic

unix:panic+0x1c(0x1299240, 0x1202400, 0x1, 0x183f800, 0x183f800, 0x0)
genunix:kadmin+0x544(, 0x1, 0x0, 0x60031c17d98)
genunix:uadmin+0x11c(, 0x1)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --

------------------------------

Walking the parent process ID (PPID) tree:

CAT(vmcore.0/10V)> proc -t 2047
addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x3005c7db2f0 2047 1636 0 2686976 1662976 311296 1 /sbin/uadmin 5 1
thread: 0x30078b48860 state: onpr wchan: 0x0 sobj: undefined
idle: 2 ticks (0.02 seconds)


CAT(vmcore.0/10V)> proc 1636
addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x300fcc870b0 1636 6442 0 1900544 1630208 286720 13 /bin/sh /etc/init.d/init.cssd daemon
thread: 0x30053e055e0 state: slp wchan: 0x300fcc87170 sobj: condition var (from genunix:waitid+0x484)

CAT(vmcore.0/10V)> proc -t 6442
addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x300743faf18 6442 1 0 1900544 1343488 286720 9 /bin/sh /etc/init.d/init.cssd fatal
thread: 0x3005483aee0 state: slp wchan: 0x300743fafd8 sobj: condition var (from genunix:waitid+0x484)
idle: 22 ticks (0.22 seconds)
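
The PPID walk above can be sketched in code. This is a minimal illustration, not part of the CAT tooling: the helper name walk_ancestry and the PROCS table are hypothetical, and the table simply restates the PID, PPID and command values from the three proc listings.

```python
# Hypothetical table: PID -> (PPID, command), copied from the proc output above.
PROCS = {
    2047: (1636, "/sbin/uadmin 5 1"),
    1636: (6442, "/bin/sh /etc/init.d/init.cssd daemon"),
    6442: (1, "/bin/sh /etc/init.d/init.cssd fatal"),
}


def walk_ancestry(pid, procs):
    """Follow PPID links upward from pid.

    Stops at the first PID that is missing from the table (here PID 1,
    which is not listed) or if a loop is detected.
    """
    chain = []
    seen = set()
    while pid in procs and pid not in seen:
        seen.add(pid)
        ppid, command = procs[pid]
        chain.append((pid, command))
        pid = ppid
    return chain


if __name__ == "__main__":
    for pid, command in walk_ancestry(2047, PROCS):
        print(pid, command)
```

Starting from the panicking uadmin process (PID 2047), the chain passes through the two init.cssd shell scripts before reaching PID 1.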

------------------------------

No busy devices:

CAT(vmcore.0/10V)> dev busy

Scanning for busy devices:
No busy/hanging devices found
Scanning for threads in biowait:

no threads in biowait() found.

Scanning for procs with aio:
CAT(vmcore.0/10V)>

------------------------------
CAT(vmcore.0/10V)> tlist pinned
==== user (LWP_USER) thread: 0x300863c74e0 PID: 14670 on CPU: 8 ====
cmd: ./SunOS/device_config.SunOS
t_procp: 0x300bfd0fa48
p_as: 0x300ae441de8 size: 4259840 RSS: 1875968
hat: 0x3005171e940
cnum: CPU0:88420/97 CPU8:27872/55 CPU16:19475/35 CPU24:20629/53 CPU64:88584/312 CPU72:34825/83 CPU80:30403/110 CPU88:29733/135
cpusran: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
zone: global
t_stk: 0x2a100137ae0 sp: 0x2a1001372e1 t_stkbase: 0x2a100132000
t_pri: 0(TS) t_tid: 1 pctcpu: 99.992119
t_lwp: 0x3005cd51898 machpcb: 0x2a100137ae0
mstate: LMS_USER ms_prev: LMS_SYSTEM
ms_state_start: 0.570387612 seconds later
ms_start: 390 days 15 hours 14 minutes 16.315036382 seconds earlier
psrset: 0 last CPU: 8
idle: 35 ticks (0.35 seconds)
start: Fri Jul 22 01:08:06 2011
age: 33751207 seconds (390 days 15 hours 20 minutes 7 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
TS_SIGNALLED - thread was awakened by cv_signal()
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting

pc: unix:utl0+0x4c: jmpl %l3, %o7 ( call %l3 )

unix:user_rtt+0x0()
-- switch to user thread's user stack --


1 pinned thread found.

------------------------------

Cause

Two init.cssd processes were running at the time of the panic:

CAT(vmcore.0/10V)> proc 1636

addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x300fcc870b0 1636 6442 0 1900544 1630208 286720 13 /bin/sh /etc/init.d/init.cssd daemon
thread: 0x30053e055e0 state: slp wchan: 0x300fcc87170 sobj: condition var (from genunix:waitid+0x484)

CAT(vmcore.0/10V)> proc -t 6442
addr PID PPID RUID/UID size RSS swresv time command
============= ====== ====== ========== ========== ======== ======== ====== =========
0x300743faf18 6442 1 0 1900544 1343488 286720 9 /bin/sh /etc/init.d/init.cssd fatal
thread: 0x3005483aee0 state: slp wchan: 0x300743fafd8 sobj: condition var (from genunix:waitid+0x484)
idle: 22 ticks (0.22 seconds)

Solution

The customer confirmed that init.cssd was indeed running twice when the server crashed:
1) Started by un as part of VCS startup (hastart) on Aug 15th.
2) Started manually by someone earlier, outside of VCS control.
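
A check for this condition on a live node, before starting the cluster engine, might look like the sketch below. It is an assumption-laden illustration rather than a vendor procedure: it expects text in the "PID PPID COMMAND" layout produced by `ps -e -o pid,ppid,args`, the function names are hypothetical, and the SAMPLE text is made up (PID 9001 is invented to show a duplicate). What counts as a duplicate depends on the configuration; this sketch only reports identical init.cssd command lines that appear more than once.

```python
from collections import defaultdict

# Hypothetical sample input; PID 9001 is invented for illustration.
SAMPLE = """\
  PID  PPID COMMAND
 6442     1 /bin/sh /etc/init.d/init.cssd fatal
 1636  6442 /bin/sh /etc/init.d/init.cssd daemon
 9001     1 /bin/sh /etc/init.d/init.cssd fatal
"""


def find_init_cssd(ps_output):
    """Group PIDs by command line for every process that mentions init.cssd."""
    by_command = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and malformed lines
        pid, _ppid, command = fields
        if "init.cssd" in command:
            by_command[command.strip()].append(int(pid))
    return dict(by_command)


def duplicated(by_command):
    """Keep only the command lines that are running more than once."""
    return {cmd: pids for cmd, pids in by_command.items() if len(pids) > 1}


if __name__ == "__main__":
    for cmd, pids in duplicated(find_init_cssd(SAMPLE)).items():
        print("duplicate:", cmd, "PIDs:", pids)
```

On a real system, the text passed to find_init_cssd would come from the ps command rather than from SAMPLE.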

