How to improve the online/offline/monitor times of VCS resources in large setups by tuning the NumThreads attribute of VCS Agents
Problem
In large setups it is not unusual to have hundreds of resources of a particular agent type. In such scenarios, the VCS agents may take far longer to online, offline, and monitor these resources than they would in smaller setups. If several resource types each have a large number of resources configured, this may also result in monitor timeouts. This in turn increases failover times, and switching Service Groups to a different node may take much longer.
Cause
The default value of the NumThreads attribute of VCS agents is 10, which can become a bottleneck when a single agent has to manage hundreds of resources.
Solution
The VCS Agent Framework defines the NumThreads attribute for each agent type, which specifies the maximum number of service threads that an agent is allowed to create. Service threads are the threads in the agent that service resource commands, the main one being entry point processing. NumThreads does not control the number of threads used for other internal purposes.
Agents dynamically create service threads depending on the number of resources that the agent has to manage. As long as the number of resources is below the NumThreads value, adding a new resource causes the agent to create an additional service thread. Conversely, if the number of resources falls below the NumThreads value as a result of resource deletion, the agent deletes service threads accordingly. Since the engine starts an agent for a type only when at least one resource of that type exists in the configuration, an agent always has at least one service thread. Setting NumThreads to 1 therefore prevents any additional service threads from being created even as more resources are added.
The default value of the NumThreads attribute is 10 for most agents (with a few exceptions where NumThreads is restricted to 1), and this value is sufficient for most standard configurations. However, in large setups with hundreds of resources of a particular resource type, the NumThreads value for that agent can be increased to reduce the overall time required for the standard online/offline/monitor entry points to complete across all resources of that type. Please note that the maximum value that can be set for NumThreads is 30.
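The effect of NumThreads can be reasoned about with simple arithmetic: with T service threads, N resources are processed in roughly ceil(N/T) sequential batches per monitor cycle. The sketch below is a rough model, not VCS internals, and the resource count of 300 is a hypothetical example:

```shell
# Rough model: with T service threads, N resources are serviced in
# approximately ceil(N/T) sequential batches per monitor cycle.
batches() {
    n=$1; t=$2
    echo $(( (n + t - 1) / t ))
}

# Hypothetical example: 300 Mount resources.
batches 300 10   # default NumThreads=10 -> 30 batches
batches 300 30   # NumThreads=30         -> 10 batches
```

Raising NumThreads from 10 to 30 for 300 resources thus cuts the number of sequential batches per cycle by a factor of three, which is why the entry points complete sooner overall.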
The NumThreads attribute can be modified using either of the two methods shown below. Consider an environment with hundreds of Mount resources configured in VCS, where it would be appropriate to test the improvement in the online/offline/monitor times of those Mount resources. To do so, increase the NumThreads value for the VCS Mount agent from the default of 10 to the desired value, up to the maximum of 30, as shown below. Please note that some VCS agents do not support increasing the NumThreads attribute: for instance, VCS clusters running on AIX and utilizing the LVMVG agent must not change the NumThreads value for the LVMVG agent type from its default of 1, as documented in article 000030815. Further, the DiskGroup and Volume agents on all platforms are restricted to a NumThreads value of 1.
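Before tuning, it helps to confirm how many resources of a type actually exist. On a live cluster this could be done by piping `hares -list Type=Mount` (which prints one "resource system" pair per line, so each resource appears once per system) through a small counting pipeline. The function name and the sample output below are illustrative, not VCS output captured from a real cluster:

```shell
# Count unique resources from `hares -list Type=<type>` style output,
# where each line is a "resource system" pair (one line per system).
count_resources() {
    awk '{print $1}' | sort -u | wc -l | tr -d ' '
}

# On a live cluster (hypothetical invocation):
#   hares -list Type=Mount | count_resources

# Demonstrated here on sample output for a two-node cluster:
sample='mnt_app01 nodeA
mnt_app01 nodeB
mnt_app02 nodeA
mnt_app02 nodeB
mnt_app03 nodeA
mnt_app03 nodeB'
printf '%s\n' "$sample" | count_resources   # -> 3
```

If the count is well below the current NumThreads value, increasing the attribute will not help; the tuning only pays off when resources substantially outnumber service threads.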
METHOD 1: Modify the Mount agent's NumThreads attribute value using the command line:
Check the current value of the NumThreads attribute for the Mount agent:
# hatype -display Mount -attribute NumThreads
#Type Attribute Value
Mount NumThreads 10
Modify the value of the NumThreads attribute for the Mount agent:
# haconf -makerw
# hatype -modify Mount NumThreads 30
# haconf -dump -makero
Verify the new value of the NumThreads attribute for the Mount agent:
# hatype -display Mount -attribute NumThreads
#Type Attribute Value
Mount NumThreads 30
METHOD 2: Modify the Mount agent's NumThreads attribute value by editing the types definition file (/etc/VRTSvcs/conf/config/types.cf):
This method requires a restart of the VCS cluster services. No production outage is needed, since only the cluster services are restarted while the applications remain running.
Freeze the systems persistently, dump the configuration, and stop the cluster services only:
# haconf -makerw
# hasys -freeze <system_name> -persistent
# haconf -dump -makero
# hastop -all -force
Wait for the VCS services to stop completely and ensure that none of the VCS agent processes are still running or in a zombie state in the OS process table.
Perform the following steps on each node of the cluster:
# hastatus -sum
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available
# ps -aef | egrep -i "had|agent"
If any VCS-related processes are still running, kill them manually:
# kill -9 <pid>
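When checking for leftover processes, the plain `ps -aef | egrep -i "had|agent"` shown above will also match the grep command itself. A common shell idiom avoids that by bracketing the first character of each pattern. The helper name and the sample `ps` lines below are illustrative:

```shell
# Filter ps output for leftover VCS processes. The [h]/[a] bracket
# trick prevents the grep command from matching its own process entry.
vcs_leftovers() {
    grep -Ei '[h]ad|[a]gent'
}

# On a live system (hypothetical invocation):
#   ps -aef | vcs_leftovers

# Demonstrated here on sample ps output:
sample='root  1234     1  0 10:00 ?  00:00:01 /opt/VRTSvcs/bin/had
root  1240  1234  0 10:00 ?  00:00:00 /opt/VRTSvcs/bin/MountAgent -type Mount
root  2001     1  0 09:00 ?  00:00:00 /usr/sbin/sshd'
printf '%s\n' "$sample" | vcs_leftovers   # prints the had and MountAgent lines
```

An empty result confirms that no VCS daemons or agents survived the shutdown and it is safe to edit types.cf.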
Now open the types definition file "/etc/VRTSvcs/conf/config/types.cf" in a text editor such as vi and navigate to the Mount type definition. Note that the file must be updated separately on each node of the cluster, since changes made on one system are not propagated to the other nodes.
# vi /etc/VRTSvcs/conf/config/types.cf
...
type Mount (
static keylist SupportedActions = { "mountpoint.vfd", "mounted.vfd", "vxfslic.vfd", chgmntlock, "mountentry.vfd" }
static str ArgList[] = { MountPoint, BlockDevice, FSType, MountOpt, FsckOpt, SnapUmount, CkptUmount, SecondLevelMonitor, SecondLevelTimeout, OptCheck, CreateMntPt, MntPtPermission, MntPtOwner, MntPtGroup, AccessPermissionChk, RecursiveMnt, VxFSMountLock }
static str IMFRegList[] = { MountPoint, BlockDevice, FSType }
int SnapUmount
str MountOpt
boolean RecursiveMnt = 0
int VxFSMountLock = 1
boolean SecondLevelMonitor = 0
int CreateMntPt
str MntPtOwner
str FsckOpt
int ReuseMntPt
str MntPtGroup
str BlockDevice
int OptCheck
str MountPoint
int CkptUmount = 1
str MntPtPermission
int AccessPermissionChk
int SecondLevelTimeout = 30
str FSType
)
....
As the snippet of the types.cf file shows, the NumThreads attribute does not appear in the current default configuration, since attributes at their default value of 10 are not written to the file.
# hatype -display Mount -attribute NumThreads
#Type Attribute Value
Mount NumThreads 10
Add the NumThreads attribute to the type definition with the desired value. Remember that the maximum possible NumThreads value is 30.
---
type Mount (
static keylist SupportedActions = { "mountpoint.vfd", "mounted.vfd", "vxfslic.vfd", chgmntlock, "mountentry.vfd" }
static int NumThreads = 30
static str ArgList[] = { MountPoint, BlockDevice, FSType, MountOpt, FsckOpt, SnapUmount, CkptUmount, SecondLevelMonitor, SecondLevelTimeout, OptCheck, CreateMntPt, MntPtPermission, MntPtOwner, MntPtGroup, AccessPermissionChk, RecursiveMnt, VxFSMountLock }
static str IMFRegList[] = { MountPoint, BlockDevice, FSType }
int SnapUmount
str MountOpt
boolean RecursiveMnt = 0
int VxFSMountLock = 1
boolean SecondLevelMonitor = 0
int CreateMntPt
str MntPtOwner
str FsckOpt
int ReuseMntPt
str MntPtGroup
str BlockDevice
int OptCheck
str MountPoint
int CkptUmount = 1
str MntPtPermission
int AccessPermissionChk
int SecondLevelTimeout = 30
str FSType
)
---
After making the changes, save the file and quit the editor. Once the file has been updated on every node individually, start the cluster services again on each node:
# hastart
Verify the new NumThreads value in the configuration:
# hatype -display Mount -attribute NumThreads
#Type Attribute Value
Mount NumThreads 30
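The verification can also be scripted by parsing the three-column `hatype -display` output (Type, Attribute, Value), skipping the `#Type` header line. The helper name and the sample output below are illustrative:

```shell
# Extract the NumThreads value for a given type from `hatype -display`
# output (columns: Type Attribute Value; the "#Type ..." header is skipped
# because its first field does not match the type name).
numthreads_of() {
    awk -v t="$1" '$1 == t && $2 == "NumThreads" {print $3}'
}

# On a live cluster (hypothetical invocation):
#   hatype -display Mount -attribute NumThreads | numthreads_of Mount

# Demonstrated here on sample output:
sample='#Type      Attribute    Value
Mount      NumThreads   30'
printf '%s\n' "$sample" | numthreads_of Mount   # -> 30
```

Running this on each node confirms that the edited types.cf was picked up everywhere before the systems are unfrozen.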
Finally, unfreeze all the nodes in the cluster and dump the configuration:
# haconf -makerw
# hasys -unfreeze -persistent <system_name>
# haconf -dump -makero