Cluster node becomes unresponsive due to Process offline monitoring registrations through the Asynchronous Monitoring Framework on some service packs of AIX 7.1 and 6.1

Article: 100013295
Last Published: 2015-06-19
Product(s): InfoScale & Storage Foundation

Problem

This article describes an issue in which a cluster node becomes unresponsive because of AMF (Asynchronous Monitoring Framework) process registrations on certain AIX service packs, and provides a workaround for the issue.

AIX operating systems facing the AMF issue:

  • AIX 6.1 TL9 SP3 and SP4
  • AIX 7.1 TL2 SP5
  • AIX 7.1 TL3 SP3 and SP4
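
As a quick way to compare a node against the list above, the following is a minimal sketch rather than a supported tool. It assumes that `oslevel -s` reports the level as release-TL-SP-build (for example 7100-03-04-1441). The function name is_affected_level is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch only: the patterns below mirror the affected levels listed in
# this article, assuming the release-TL-SP-build format of `oslevel -s`.

# Return 0 if the given level string matches an affected service pack.
is_affected_level() {
    case "$1" in
        6100-09-03-*|6100-09-04-*) return 0 ;;   # AIX 6.1 TL9 SP3 and SP4
        7100-02-05-*)              return 0 ;;   # AIX 7.1 TL2 SP5
        7100-03-03-*|7100-03-04-*) return 0 ;;   # AIX 7.1 TL3 SP3 and SP4
        *)                         return 1 ;;
    esac
}

# Run the check only where the AIX oslevel command is available.
if command -v oslevel >/dev/null 2>&1; then
    level=$(oslevel -s)
    if is_affected_level "$level"; then
        echo "$level: listed as affected"
    else
        echo "$level: not in the affected list"
    fi
fi
```

A match only indicates that the operating system level is on the list. Whether the node is exposed also depends on the installed AMF version and on whether an ifix has been applied.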

AMF versions impacted:

  • 5.1SP1 and later
  • 6.0 and later
  • 6.1 and later
  • 6.2
  • 7.x

Error Message

 

 

Cause

The above-mentioned service packs of AIX 7.1 and AIX 6.1 prevent the AMF driver from reading process information from the /proc/<pid>/psinfo file while the process is in the “EXEC” state.

Whenever AIX starts (EXECs) a new process, the AMF driver receives a callback from the AIX operating system. In the callback context, the AMF driver reads the process information. On the affected service packs this causes a deadlock: the read call waits for the EXEC operation to complete, while the EXEC operation cannot complete until the callback, and therefore the read call, returns.

Refer to the AIX authorized program analysis report (APAR) from IBM for detailed information:

 

Solution

Install the IBM service pack AIX 6.1 TL9 SP5 or AIX 7.1 TL3 SP5. Alternatively, on AIX 7.1 TL3 SP4, install the ifix for APAR IV66484. For information related to the APAR, contact IBM Support.


Additional workaround

Disable Process offline monitoring with AMF.

Before you upgrade to one of the affected levels (AIX 7.1 TL2 SP5, AIX 7.1 TL3 SP3 or SP4, or AIX 6.1 TL9 SP3 or SP4), disable Process offline monitoring for the Intelligent Monitoring Framework (IMF) aware agents, such as the Process agent and the Application agent.

The steps to disable Process offline monitoring with AMF are as follows:

1. Check the current IMF Mode attribute value.

# hatype -display <resource-type> -attribute IMF

# Type Attribute Value

Process IMF Mode 3 MonitorFreq 1 RegisterRetryLimit 3
 

2. If the value of the Mode attribute is 3, set it to 2 to disable Process offline monitoring.

# hatype -modify <resource-type> IMF -update Mode 2

# hatype -display <resource-type> -attribute IMF

# Type Attribute Value

Process IMF Mode 2 MonitorFreq 1 RegisterRetryLimit 3
 

3. If the value of the Mode attribute is 1, set it to 0 to disable Process offline monitoring.

# hatype -modify <resource-type> IMF -update Mode 0

# hatype -display <resource-type> -attribute IMF

# Type Attribute Value

Process IMF Mode 0 MonitorFreq 1 RegisterRetryLimit 3
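
The three steps above can be combined into a small helper when several resource types need the same change. The following is a minimal sketch, not a supported script. It assumes that the `hatype -display` output has the layout shown in the examples above. The function names parse_imf_mode, offline_disabled_mode, and disable_offline_monitoring are hypothetical.

```shell
#!/bin/sh
# Sketch only: wraps the hatype commands from the steps above.
# Assumes the cluster configuration is already writable.

# Read `hatype -display` output on stdin and print the value that
# follows the "Mode" key, for example 3 for
# "Process IMF Mode 3 MonitorFreq 1 RegisterRetryLimit 3".
parse_imf_mode() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "Mode") { print $(i + 1); exit } }'
}

# Map the current Mode to the value with Process offline monitoring
# disabled (3 -> 2, 1 -> 0). Any other value is printed unchanged.
offline_disabled_mode() {
    case "$1" in
        3) echo 2 ;;
        1) echo 0 ;;
        *) echo "$1" ;;
    esac
}

# Apply the change to one resource type, such as Process or Application.
disable_offline_monitoring() {
    rtype=$1
    current=$(hatype -display "$rtype" -attribute IMF | parse_imf_mode)
    if [ -z "$current" ]; then
        echo "$rtype: could not read the IMF Mode value" >&2
        return 1
    fi
    target=$(offline_disabled_mode "$current")
    if [ "$target" = "$current" ]; then
        echo "$rtype: Mode is $current, nothing to change"
        return 0
    fi
    hatype -modify "$rtype" IMF -update Mode "$target"
    echo "$rtype: Mode changed from $current to $target"
}
```

For example, calling `disable_offline_monitoring Process` followed by `disable_offline_monitoring Application` covers the two agents named earlier. Confirm the result afterward with the `hatype -display` command from step 1.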
 
