General explanation about monitor interval & timeout

Article: 100021530
Last Published: 2022-01-08
Product(s): InfoScale & Storage Foundation

Problem

General explanation about monitor interval & timeout

Solution


# What is the monitor for a resource in VCS?

'monitor' typically contains the logic to determine the status of a resource.

'monitor' is called after 'online' or 'offline' completes, to determine whether bringing the resource online or taking it offline was effective.

'monitor' is also called periodically to detect whether the resource was brought online or taken offline unexpectedly.

Unless the relevant attributes have been changed from their default values, the 'monitor' runs every 60 seconds (the default value of 'MonitorInterval') when a resource is online. When a resource is expected to be offline, the 'monitor' runs every 300 seconds (the default value of 'OfflineMonitorInterval').

The 'monitor' returns the resource status (online, offline, or unknown) and a confidence level of 0-100. The confidence level is returned only when the resource status is online. It is informative only and is not used by the engine.

The return value is interpreted as follows:


- 100: indicates offline.
- 101: indicates online and confidence level 10.
- 102: indicates online and confidence level 20.
- 103~109: indicates online and confidence level 30~90.
- 110: indicates online and confidence level 100.
- other values: the status is considered unknown.
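The mapping above can be expressed as a small decoder. This is a minimal sketch for illustration only; the function name `decode_monitor_return` is hypothetical and is not part of VCS.

```python
from typing import Optional, Tuple


def decode_monitor_return(code: int) -> Tuple[str, Optional[int]]:
    """Translate a monitor return value into (status, confidence level).

    Hypothetical helper: it only restates the list above.
    The confidence level is None unless the status is online.
    """
    if code == 100:
        return ("offline", None)
    if 101 <= code <= 110:
        # 101 -> 10, 102 -> 20, ..., 110 -> 100
        return ("online", (code - 100) * 10)
    return ("unknown", None)


for value in (100, 101, 105, 110, 1):
    print(value, decode_monitor_return(value))
```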


# MonitorInterval & MonitorTimeout

MonitorInterval
- This is the duration (in seconds) between two consecutive monitor calls for an online or transitioning resource.
- The default is 60 seconds for most resource types.

OfflineMonitorInterval
- This is the duration (in seconds) between two consecutive monitor calls for an offline resource. If set to 0, offline resources are not monitored.
- The default is 300 seconds for most resource types.

MonitorTimeout
- This value defines the maximum time (in seconds) within which the monitor must finish; otherwise it is terminated.
- The default value is 60 seconds for most resource types.
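As a rough illustration of how the two interval attributes are chosen, the sketch below returns the delay before the next monitor call. The function and parameter names are hypothetical; only the attribute semantics and default values come from the descriptions above.

```python
from typing import Optional


def next_monitor_delay(online_or_transitioning: bool,
                       monitor_interval: int = 60,
                       offline_monitor_interval: int = 300) -> Optional[int]:
    """Return the number of seconds until the next monitor call.

    Hypothetical helper, not VCS code. Returns None when no monitor
    call is scheduled (OfflineMonitorInterval set to 0).
    """
    if online_or_transitioning:
        return monitor_interval
    if offline_monitor_interval == 0:
        # Offline resources are not monitored when the value is 0.
        return None
    return offline_monitor_interval


print(next_monitor_delay(True))                               # 60
print(next_monitor_delay(False))                              # 300
print(next_monitor_delay(False, offline_monitor_interval=0))  # None
```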


# Tests for 'MonitorInterval' & 'MonitorTimeout'

1. Configuration of service group & resource

group seleeSG (
SystemList = { hpux1 = 0, hpux2 = 1 }
AutoFailOver = 0
AutoStartList = { hpux1, hpux2 }
)

Application seleeApp (
StartProgram = "/selee/start"
StopProgram = "/selee/stop"
CleanProgram = "/selee/clean"
MonitorProgram = "/selee/monitor"
)

[hpux2:/selee]hatype -display Application
#Type Attribute Value
......
Application FaultOnMonitorTimeouts 4
......
Application MonitorInterval 75
......
Application MonitorTimeout 60
......
Application ToleranceLimit 0
[hpux2:/selee]
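To check these values in a script, the three-column output shown above can be parsed as follows. This is a sketch that assumes each attribute line has the form "Type Attribute Value", as in the excerpt; the function name is hypothetical, and attributes with more complex values may need extra handling.

```python
def parse_type_display(text: str, type_name: str) -> dict:
    """Collect attribute/value pairs for one resource type.

    Hypothetical helper: it keeps only lines whose first column equals
    type_name, so the header, prompts, and elided lines are ignored.
    """
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == type_name:
            attrs[parts[1]] = parts[2].strip()
    return attrs


sample = """#Type Attribute Value
......
Application FaultOnMonitorTimeouts 4
......
Application MonitorInterval 75
......
Application MonitorTimeout 60
......
Application ToleranceLimit 0
"""

print(parse_type_display(sample, "Application"))
```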


2. Simulation of a hung monitor thread (interval: 75 secs, time-out: 60 secs)

2009/07/17 14:01:32 SELEE_MONITOR: Monitor is called
2009/07/17 14:01:32 SELEE_MONITOR: ONLINE
2009/07/17 14:02:47 SELEE_MONITOR: Monitor is called (14:02:47 - 14:01:32 = 75 secs)
2009/07/17 14:02:47 SELEE_MONITOR: ONLINE
2009/07/17 14:04:02 SELEE_MONITOR: Monitor is called (14:04:02 - 14:02:47 = 75 secs)
2009/07/17 14:04:02 SELEE_MONITOR: ONLINE
2009/07/17 14:05:17 SELEE_MONITOR: Monitor is called (14:05:17 - 14:04:02 = 75 secs; monitor got hung)
2009/07/17 14:05:27 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:05:37 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:05:47 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:05:57 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:06:07 SELEE_MONITOR: Monitor is sleeping for 50 secs
2009/07/17 14:06:12 SELEE_MONITOR: Monitor is sleeping for 55 secs
2009/07/17 14:06:13 SELEE_MONITOR: Monitor is sleeping for 56 secs
2009/07/17 14:06:14 SELEE_MONITOR: Monitor is sleeping for 57 secs
2009/07/17 14:06:15 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 14:06:16 SELEE_MONITOR: Monitor is sleeping for 59 secs (14:06:16 - 14:05:17 = 59 secs; 1st time-out)
2009/07/17 14:06:18 VCS ERROR V-16-2-13027 (hpux2) Resource(seleeApp) - monitor procedure did not complete within the expected time.
2009/07/17 14:06:32 SELEE_MONITOR: Monitor is called (14:06:32 - 14:05:17 = 75 secs; monitor got hung)
2009/07/17 14:06:42 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:06:52 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:07:02 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:07:12 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:07:22 SELEE_MONITOR: Monitor is sleeping for 50 secs
2009/07/17 14:07:27 SELEE_MONITOR: Monitor is sleeping for 55 secs
2009/07/17 14:07:28 SELEE_MONITOR: Monitor is sleeping for 56 secs
2009/07/17 14:07:29 SELEE_MONITOR: Monitor is sleeping for 57 secs
2009/07/17 14:07:30 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 14:07:31 SELEE_MONITOR: Monitor is sleeping for 59 secs
2009/07/17 14:07:32 SELEE_MONITOR: Monitor is sleeping for 60 secs (14:07:32 - 14:06:32 = 60 secs; 2nd time-out)
2009/07/17 14:07:47 SELEE_MONITOR: Monitor is called (14:07:47 - 14:06:32 = 75 secs; monitor got hung)
2009/07/17 14:07:57 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:08:07 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:08:17 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:08:27 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:08:37 SELEE_MONITOR: Monitor is sleeping for 50 secs
2009/07/17 14:08:42 SELEE_MONITOR: Monitor is sleeping for 55 secs
2009/07/17 14:08:43 SELEE_MONITOR: Monitor is sleeping for 56 secs
2009/07/17 14:08:44 SELEE_MONITOR: Monitor is sleeping for 57 secs
2009/07/17 14:08:45 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 14:08:46 SELEE_MONITOR: Monitor is sleeping for 59 secs
2009/07/17 14:08:47 SELEE_MONITOR: Monitor is sleeping for 60 secs (14:08:47 - 14:07:47 = 60 secs; 3rd time-out)
2009/07/17 14:09:02 SELEE_MONITOR: Monitor is called (14:09:02 - 14:07:47 = 75 secs; monitor got hung)
2009/07/17 14:09:12 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:09:22 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:09:32 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:09:42 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:09:52 SELEE_MONITOR: Monitor is sleeping for 50 secs
2009/07/17 14:09:57 SELEE_MONITOR: Monitor is sleeping for 55 secs
2009/07/17 14:09:58 SELEE_MONITOR: Monitor is sleeping for 56 secs
2009/07/17 14:09:59 SELEE_MONITOR: Monitor is sleeping for 57 secs
2009/07/17 14:10:00 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 14:10:01 SELEE_MONITOR: Monitor is sleeping for 59 secs
2009/07/17 14:10:02 SELEE_MONITOR: Monitor is sleeping for 60 secs (14:10:02 - 14:09:02 = 60 secs; 4th time-out)
2009/07/17 14:10:03 VCS ERROR V-16-2-13210 (hpux2) Agent is calling clean for resource(seleeApp) because 4 successive invocations of the monitor procedure did not complete within the expected time.
2009/07/17 14:10:03 SELEE_CLEAN: clean is called
2009/07/17 14:10:03 SELEE_CLEAN: clean is done
2009/07/17 14:10:04 VCS INFO V-16-2-13068 (hpux2) Resource(seleeApp) - clean completed successfully.
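The timing in this log can be reproduced with simple arithmetic. The sketch below assumes that MonitorTimeout is shorter than MonitorInterval, as in this test, so that a new monitor call starts every MonitorInterval seconds; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def seconds_until_clean(monitor_interval: int,
                        monitor_timeout: int,
                        fault_on_monitor_timeouts: int) -> int:
    """Seconds from the first hung monitor call to the last time-out.

    Assumption: monitor_timeout < monitor_interval, so every call
    starts on schedule and the last one times out after monitor_timeout.
    """
    if monitor_timeout >= monitor_interval:
        raise ValueError("this sketch assumes timeout < interval")
    return (fault_on_monitor_timeouts - 1) * monitor_interval + monitor_timeout


first_hung_call = datetime(2009, 7, 17, 14, 5, 17)
elapsed = seconds_until_clean(75, 60, 4)
print(elapsed)                                       # 285
print(first_hung_call + timedelta(seconds=elapsed))  # 2009-07-17 14:10:02
```

The result, 14:10:02, is the time of the 4th time-out in the log; the agent calls clean one second later.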


3. Simulation in which the monitor hang clears (interval: 75 secs, time-out: 60 secs)

2009/07/17 14:18:38 SELEE_MONITOR: Monitor is called
2009/07/17 14:18:38 SELEE_MONITOR: ONLINE
2009/07/17 14:19:53 SELEE_MONITOR: Monitor is called (14:19:53 - 14:18:38 = 75 secs)
2009/07/17 14:19:53 SELEE_MONITOR: ONLINE
2009/07/17 14:21:08 SELEE_MONITOR: Monitor is called (14:21:08 - 14:19:53 = 75 secs)
2009/07/17 14:21:08 SELEE_MONITOR: ONLINE
2009/07/17 14:22:24 SELEE_MONITOR: Monitor is called (14:22:24 - 14:21:08 = 76 secs; monitor got hung)
2009/07/17 14:22:34 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:22:44 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:22:54 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:23:04 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:23:14 SELEE_MONITOR: Monitor is sleeping for 50 secs
2009/07/17 14:23:19 SELEE_MONITOR: Monitor is sleeping for 55 secs
2009/07/17 14:23:20 SELEE_MONITOR: Monitor is sleeping for 56 secs
2009/07/17 14:23:21 SELEE_MONITOR: Monitor is sleeping for 57 secs
2009/07/17 14:23:22 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 14:23:23 SELEE_MONITOR: Monitor is sleeping for 59 secs
2009/07/17 14:23:24 SELEE_MONITOR: Monitor is sleeping for 60 secs (14:23:24 - 14:22:24 = 60 secs; 1st time-out)
2009/07/17 14:23:25 VCS ERROR V-16-2-13027 (hpux2) Resource(seleeApp) - monitor procedure did not complete within the expected time.
2009/07/17 14:23:38 SELEE_MONITOR: Monitor is called (14:23:38 - 14:22:24 = 74 secs; monitor got hung)
2009/07/17 14:23:48 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:23:58 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:24:08 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:24:18 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:24:29 SELEE_MONITOR: Monitor is sleeping for 50 secs
2009/07/17 14:24:34 SELEE_MONITOR: Monitor is sleeping for 55 secs
2009/07/17 14:24:35 SELEE_MONITOR: Monitor is sleeping for 56 secs
2009/07/17 14:24:36 SELEE_MONITOR: Monitor is sleeping for 57 secs
2009/07/17 14:24:37 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 14:24:38 SELEE_MONITOR: Monitor is sleeping for 59 secs (14:24:38 - 14:23:38 = 60 secs; 2nd time-out)
2009/07/17 14:24:53 SELEE_MONITOR: Monitor is called (14:24:53 - 14:23:38 = 75 secs; monitor got hung)
2009/07/17 14:25:03 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:25:13 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:25:23 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:25:33 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:25:33 SELEE_MONITOR: ONLINE (monitor hang cleared)
2009/07/17 14:25:34 VCS INFO V-16-2-13026 (hpux2) Resource(seleeApp) - monitor procedure finished successfully after failing to complete within the expected time for (2) consecutive times.
2009/07/17 14:26:08 SELEE_MONITOR: Monitor is called (14:26:08 - 14:24:53 = 75 secs)
2009/07/17 14:26:08 SELEE_MONITOR: ONLINE
......
2009/07/17 14:28:38 SELEE_MONITOR: Monitor is called
2009/07/17 14:28:48 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:28:58 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:29:08 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:29:18 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:29:18 SELEE_MONITOR: ONLINE
2009/07/17 14:29:53 SELEE_MONITOR: Monitor is called (14:29:53 - 14:28:38 = 75 secs)
2009/07/17 14:30:03 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 14:30:13 SELEE_MONITOR: Monitor is sleeping for 20 secs
2009/07/17 14:30:23 SELEE_MONITOR: Monitor is sleeping for 30 secs
2009/07/17 14:30:33 SELEE_MONITOR: Monitor is sleeping for 40 secs
2009/07/17 14:30:33 SELEE_MONITOR: ONLINE
2009/07/17 14:31:08 SELEE_MONITOR: Monitor is called (14:31:08 - 14:29:53 = 75 secs)
2009/07/17 14:31:08 SELEE_MONITOR: ONLINE
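The behaviour seen in tests 2 and 3 is consistent with a counter of consecutive time-outs that is reset whenever a monitor call completes. The following sketch models that counter; it is an interpretation of the logs, not the agent framework's actual code, and the function name and event strings are hypothetical.

```python
def replay_monitor_results(results, fault_on_monitor_timeouts=4):
    """Replay monitor outcomes and return the resulting events.

    results: sequence of "timeout" or a status such as "online".
    Hypothetical model derived from the logs above.
    """
    consecutive = 0
    events = []
    for outcome in results:
        if outcome == "timeout":
            consecutive += 1
            if consecutive >= fault_on_monitor_timeouts:
                events.append("clean")
                # Assumption: the counter starts over after clean is called.
                consecutive = 0
        else:
            if consecutive:
                events.append(f"finished after {consecutive} consecutive time-outs")
            consecutive = 0
    return events


# Test 2: four successive time-outs lead to clean.
print(replay_monitor_results(["online", "timeout", "timeout", "timeout", "timeout"]))
# Test 3: the hang clears after two time-outs, so clean is never called.
print(replay_monitor_results(["online", "timeout", "timeout", "online", "online"]))
```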


4. Simulation of a hung monitor thread (interval: 60 secs, time-out: 60 secs)

2009/07/17 12:57:43 SELEE_MONITOR: Monitor is called
2009/07/17 12:57:43 SELEE_MONITOR: ONLINE
2009/07/17 12:58:43 SELEE_MONITOR: Monitor is called
2009/07/17 12:58:43 SELEE_MONITOR: ONLINE
2009/07/17 12:59:43 SELEE_MONITOR: Monitor is called (monitor got hung)
2009/07/17 12:59:53 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 13:00:03 SELEE_MONITOR: Monitor is sleeping for 20 secs
......
2009/07/17 13:00:42 SELEE_MONITOR: Monitor is sleeping for 59 secs
2009/07/17 13:00:43 SELEE_MONITOR: Monitor is sleeping for 60 secs (13:00:43 - 12:59:43 = 60 secs; 1st time-out; the monitor interval had already passed, so this time was ignored)
2009/07/17 13:00:44 VCS ERROR V-16-2-13027 (hpux2) Resource(seleeApp) - monitor procedure did not complete within the expected time.
2009/07/17 13:01:43 SELEE_MONITOR: Monitor is called (13:01:43 - 13:00:44 = 59 secs)
2009/07/17 13:01:53 SELEE_MONITOR: Monitor is sleeping for 10 secs
2009/07/17 13:02:03 SELEE_MONITOR: Monitor is sleeping for 20 secs
......
2009/07/17 13:02:41 SELEE_MONITOR: Monitor is sleeping for 58 secs
2009/07/17 13:02:42 SELEE_MONITOR: Monitor is sleeping for 59 secs (13:02:42 - 13:01:43 = 59 secs; 2nd time-out; the monitor interval had already passed, so this time was ignored)
2009/07/17 13:03:43 SELEE_MONITOR: Monitor is called (13:03:43 - 13:02:42 = 61 secs)


# Note: If the monitor thread is still running (including when it is hung), the agent does not restart the monitor thread.
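The difference between test 2 and test 4 can be reproduced with a simplified scheduling model: a timer fires every MonitorInterval seconds, and a firing is skipped while the previous monitor call is still running. This is only a sketch that matches the logs above, not the agent framework's real scheduler. The function name is hypothetical, and it assumes that a firing which coincides exactly with the time-out is also skipped.

```python
def hung_monitor_start_times(monitor_interval, monitor_timeout, hung_calls):
    """Start times, in seconds after the first hung call, of hung monitor calls.

    Simplified model: every call hangs and is terminated after
    monitor_timeout seconds.
    """
    starts = []
    tick = 0
    busy_until = -1  # time at which the running monitor is terminated
    while len(starts) < hung_calls:
        if tick > busy_until:
            starts.append(tick)
            busy_until = tick + monitor_timeout
        # Otherwise the previous call is still running: skip this firing.
        tick += monitor_interval
    return starts


print(hung_monitor_start_times(75, 60, 4))  # [0, 75, 150, 225]
print(hung_monitor_start_times(60, 60, 3))  # [0, 120, 240]
```

With a 75-second interval the calls stay 75 seconds apart (14:05:17, 14:06:32, 14:07:47, 14:09:02). With a 60-second interval every other firing is skipped, so the calls are 120 seconds apart (12:59:43, 13:01:43, 13:03:43).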

 
