Problem: Zone resource faults with message 'zoneadm: failed to get zone name: Invalid argument'
The current Zone agent within VCS is exposed to a Solaris bug (Sun incident 6757506) whereby running multiple zoneadm commands at the same time can cause zoneadm to report erroneous information. Because the monitor script of the Zone agent uses the zoneadm command to query zone information, such a failure can cause VCS to treat zones as offline when they are actually still online. If this issue has been hit, the following sequence of messages may appear in the VCS engine log:
VCS INFO V-16-2-13001 (HOSTNAME) Resource(TEST1_ZONE): Output of the completed operation (monitor) zoneadm: failed to get zone name: Invalid argument
VCS INFO V-16-2-13001 (HOSTNAME) Resource(TEST2_ZONE): Output of the completed operation (monitor) zoneadm: failed to get zone name: Invalid argument
VCS ERROR V-16-2-13067 (HOSTNAME) Agent is calling clean for resource(TEST1_ZONE) because the resource became OFFLINE unexpectedly, on its own.
VCS ERROR V-16-2-13067 (HOSTNAME) Agent is calling clean for resource(TEST2_ZONE) because the resource became OFFLINE unexpectedly, on its own.
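The signature above (a zoneadm error in the monitor output, followed by an unexpected-offline clean for the same resource) can be checked for mechanically. The following is a minimal Python sketch, not part of VCS: the function name find_affected_resources is hypothetical, and the matching relies only on the message IDs and text shown in the log lines above. It assumes the zoneadm error appears either on the same line as the V-16-2-13001 message or on a line directly after it.

```python
import re

# Matches the resource name in either form seen in the log:
# "Resource(NAME)" or "resource(NAME)".
RESOURCE_RE = re.compile(r"[Rr]esource\(([^)]+)\)")

ZONEADM_ERROR = "zoneadm: failed to get zone name: Invalid argument"
MONITOR_OUTPUT_ID = "V-16-2-13001"      # "Output of the completed operation (monitor)"
UNEXPECTED_OFFLINE_ID = "V-16-2-13067"  # "Agent is calling clean ..."


def find_affected_resources(lines):
    """Return resources that logged the zoneadm error and were later cleaned
    because they went OFFLINE unexpectedly (hypothetical helper)."""
    pending = None   # resource named by the most recent monitor-output line
    errored = set()  # resources whose monitor output contained the zoneadm error
    affected = []    # result, in order of first appearance
    for line in lines:
        match = RESOURCE_RE.search(line)
        resource = match.group(1) if match else None
        if MONITOR_OUTPUT_ID in line and resource:
            pending = resource
        if ZONEADM_ERROR in line:
            # Assumption: if the error is on its own line, it belongs to the
            # most recent monitor-output message.
            name = resource or pending
            if name:
                errored.add(name)
        elif (UNEXPECTED_OFFLINE_ID in line
              and resource in errored
              and resource not in affected):
            affected.append(resource)
    return affected
```

To use it, pass the lines of the engine log, for example find_affected_resources(open(path)), where path is the location of the VCS engine log on the affected node. An empty result suggests the faults have a different cause.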
Sun plans to rectify the issue in the underlying zoneadm command in Solaris 11. To help work around the issue, VCS will incorporate revised checks (via e1956481) in the following patches:
VCS 50MP3RP4 & 51RP2
In the meantime, to lessen the chance of the issue recurring, consider reducing the NumThreads attribute of the Zone agent from its default of 10 to 5. If the errors persist, consider reducing it further to 1. This forces the Zone agent to run fewer zoneadm threads at any one time, thus reducing the chance of hitting the zoneadm bug.
The caveat with tuning down NumThreads is that offlining, onlining, and monitoring of zone resources will all take a little longer. If this presents a problem, consider raising NumThreads slightly.
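The NumThreads change described above can be sketched as a small Python helper that builds the commands and, by default, only prints them. This is an illustration, not an official procedure: the function names numthreads_commands and apply_numthreads are hypothetical, and the sequence (haconf -makerw, hatype -modify, haconf -dump -makero) is assumed to be the usual way of changing a type-level attribute in the installed VCS release. Check it against the documentation for that release before running it.

```python
import subprocess


def numthreads_commands(value, agent_type="Zone"):
    """Build the VCS commands that set NumThreads for an agent type.

    Assumption: the configuration is opened read-write, the type attribute
    is modified, and the configuration is then saved and made read-only.
    """
    if not isinstance(value, int) or value < 1:
        raise ValueError("NumThreads must be a positive integer")
    return [
        ["haconf", "-makerw"],
        ["hatype", "-modify", agent_type, "NumThreads", str(value)],
        ["haconf", "-dump", "-makero"],
    ]


def apply_numthreads(value, dry_run=True):
    """Print the commands (dry run) or execute them on a VCS node."""
    commands = numthreads_commands(value)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
    return commands
```

Calling apply_numthreads(5) prints the three commands for review. Passing dry_run=False executes them, which only makes sense on a cluster node with the VCS commands in the PATH and sufficient privileges.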
For further information on the bug, please refer to Sun bug id: 6757506