After one of the nodes in SF RAC cluster crashed, the remaining nodes hung and had to be rebooted.

  • Article ID: 000084305


In an SF RAC environment, one of the cluster nodes crashed for a non-Veritas-related reason.  After the crash, the remaining nodes attempted to reconnect and redistribute the load, reaching the LMX maximum-connections limit.

Error Message

The following message is found in the syslog:

VCS RAC LMX ERROR V-10-1-00011 lmxopen return, no devices available

Oracle errors:

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:lcin: open of lmx device (/dev/l failed with status: 16
ORA-27301: OS failure message: Device busy
ORA-27302: failure occurred at: vcsipc_lmxci
ORA-27303: additional information: lcin: open of lmx device (/dev/lmx) failed: errno 16[Device busy], c0x1038d5f88


LMX is a kernel module that provides communication between processes of the database instances on multiple cluster nodes.  Whenever a query or I/O is performed, the client needs to communicate with the other instances.  These communications are registered via the VCSMM module.  The number of allowed registrations is defined by the slave_members value in the /kernel/drv/vcsmm.conf file.

The value may be verified as follows:

# echo "mm_slave_max/D" | mdb -k
        mm_slave_max: 8192

The value above (8192) is the default in SF RAC versions prior to 5.1. In versions 5.1 and later the default is increased to 32,768 (32K).
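As a quick sanity check, the reported value can be compared against these defaults with ordinary shell arithmetic. This is only an illustrative sketch; the value assigned below is a placeholder for whatever mdb reported on your node:

```shell
# Placeholder: substitute the value reported by: echo "mm_slave_max/D" | mdb -k
current=8192

# Defaults from this article: 8192 before SF RAC 5.1, 32768 in 5.1 and later.
if [ "$current" -lt 32768 ]; then
    echo "mm_slave_max ($current) is below the 5.1+ default of 32768"
else
    echo "mm_slave_max ($current) meets the 5.1+ default"
fi
```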


To change the maximum number of allowed connections, the entire RAC instance and the cluster must be brought down.  The value must be set identically on all cluster nodes; otherwise, VCSMM will fail to start.
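Before restarting VCSMM, it is worth confirming the values really do match. A hypothetical check, given the mm_slave_max values collected from each node with the mdb command shown earlier (the three values below are placeholders):

```shell
# Placeholders: one mm_slave_max value per cluster node, collected manually.
values="16384 16384 16384"

first=""
for v in $values; do
    [ -n "$first" ] || first=$v
    if [ "$v" != "$first" ]; then
        echo "mismatch: $v differs from $first - VCSMM will fail to start"
        exit 1
    fi
done
echo "all nodes report the same value ($first)"
```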

Here is a procedure to modify the maximum number of slave connections.  These steps should be performed in parallel on all nodes of the cluster:

1. Shut down the Oracle service groups on each system:

      # hagrp -offline <sg> -sys <system>

2. Stop all Oracle client processes on each system

3. Unconfigure the VCSMM module:

      # /sbin/vcsmmconfig -U

4. Unload “vcsmm” kernel module:

   # modinfo | grep vcsmm

   # modunload -i <vcsmm_module_ID>

5. Modify /kernel/drv/vcsmm.conf to look like this:

     name="vcsmm" parent="pseudo" slave_members=16384 instance=0;

6. Restart VCSMM:

      # /sbin/vcsmmconfig -c

7. Confirm that VCSMM port is open in GAB (port ‘o’):

      # gabconfig -a

8. Start VCS on each node:

      # hastart
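The eight steps above can be sketched as a single per-node script. This is only an illustrative outline, not a supported tool: the service-group and system names are placeholders, the module ID must be taken from the actual modinfo output, and with DRYRUN=1 (the default here) the script merely prints each command instead of executing it.

```shell
#!/bin/sh
# Sketch of steps 1-8 for one node. Placeholders: oracle_sg, sysA, module ID 123.
# DRYRUN=1 (default) prints each command instead of running it.
DRYRUN=${DRYRUN:-1}
run() {
    if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run hagrp -offline oracle_sg -sys sysA   # 1. offline the Oracle service group
                                         # 2. stop Oracle client processes (site-specific)
run /sbin/vcsmmconfig -U                 # 3. unconfigure VCSMM
run modunload -i 123                     # 4. unload vcsmm; 123 is a placeholder
                                         #    taken from "modinfo | grep vcsmm"
                                         # 5. edit /kernel/drv/vcsmm.conf by hand
run /sbin/vcsmmconfig -c                 # 6. restart VCSMM
run gabconfig -a                         # 7. confirm GAB port 'o' is open
run hastart                              # 8. start VCS
```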


It is possible to modify the tunable on a live kernel as follows (this is not recommended on a production system):

# echo "mm_slave_max/W 4000" | mdb -kw

In this example, '4000' is the hexadecimal value of the desired maximum number of slave connections; the decimal value is 16384.  Note that mdb interprets unprefixed numbers as hexadecimal by default.
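The hex/decimal conversion can be double-checked with printf in any POSIX shell before feeding a value to mdb:

```shell
# Hex -> decimal: the value written by the mdb command above
printf '%d\n' 0x4000     # prints 16384

# Decimal -> hex: convert a desired maximum into the form mdb expects
printf '%x\n' 16384      # prints 4000
```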



