Problem
The listener agent is taken offline after several "invalid owner" errors, although the user shown in the errors is the owner of the Oracle binaries and exists both in LDAP and in the /etc/passwd file.
The issue is intermittent.
Error Message
2011/06/15 11:37:15 VCS ERROR V-16-20002-204 (server-1) Netlsnr:listner_XXXXX:monitor:Invalid owner USERNAME for Oracle executables was specified
2011/06/21 19:19:46 VCS ERROR V-16-20002-204 (server-1) Oracle:ora_XXXXX:monitor:Invalid owner USERNAME for Oracle executables was specified
2011/08/02 13:35:12 VCS INFO V-16-20002-211 (server-1) Netlsnr:listner_XXXXX:monitor:Monitor procedure /opt/VRTSvcs/bin/Netlsnr/LsnrTest.pl returned the output: su: Unknown id: USERNAME
2011/08/02 13:35:12 VCS ERROR V-16-2-13067 (server-1) Agent is calling clean for resource(listner_XXXXX) because the resource became OFFLINE unexpectedly, on its own.
2011/08/02 13:35:12 VCS NOTICE V-16-20002-42 (server-1) Netlsnr:listner_XXXXX:clean:Listener(LISTENER) kill TERM 7297
2011/08/02 13:35:23 VCS INFO V-16-2-13068 (server-1) Resource(listner_XXXXX) - clean completed successfully.
Cause
Although the customer has both ldap and files configured for passwd in nsswitch.conf, when LDAP is slow to return the information, the VCS agent receives a "NULL" value for the user/password lookup, and the code interprets that as "invalid owner".
In the following error message, the monitor is trying to "su" to the Oracle user, but LDAP does not answer fast enough for the OS to recognize the user, so the error "su: Unknown id" is displayed.
Netlsnr:listner_XXXXX:monitor:Monitor procedure /opt/VRTSvcs/bin/Netlsnr/LsnrTest.pl returned the output: su: Unknown id: USERNAME
The Oracle agent for version 5.0 still uses the function getpwnam() to ask the operating system about users/passwords, which is out of date and not recommended for configurations that use LDAP or NIS authentication, as stated in the Solaris 10 nsswitch.conf manpage:
"Many of the databases have enumeration functions: passwd has getpwent(), hosts has gethostent(), and so on. These were reasonable when the only source was files but often make little sense for hierarchically structured sources that contain large numbers of entries, much less for multiple sources. The interfaces are still provided and the implementations strive to provide reasonable results, but the data returned may be incomplete (enumeration for hosts is simply not supported by the dns source), inconsistent (if multiple sources are used), formatted in an unexpected fashion (for a host with a canonical name and three aliases, the nisplus source will return four hostents, and they may not be consecutive), or very expensive (enumerating a passwd database of 5,000 users is probably a bad idea). Furthermore, multiple threads in the same process using the same reentrant enumeration function (getXXXent_r() are supported beginning with SunOS 5.3) share the same enumeration position; if they interleave calls, they will enumerate disjoint subsets of the same database...."
Solution
Ensure LDAP response is fast enough.
Newer versions of Storage Foundation no longer use getpwnam() in the agents to retrieve user/password information.
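To check whether lookups for the Oracle owner are slow or intermittently failing on a node, one option is to time repeated name-service lookups outside of VCS. The following is a minimal sketch only, not part of the VCS agents: the function names probe_user and summarize, the default of 10 attempts, and the "oracle" placeholder user name are all assumptions made for illustration. It relies on Python's standard pwd module, which resolves the user through the same nsswitch.conf sources as the operating system.

```python
import pwd
import sys
import time


def probe_user(username, attempts=10, delay=1.0):
    """Look up `username` repeatedly; return a list of (found, seconds) tuples."""
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # Resolved through the sources listed for passwd in nsswitch.conf.
            pwd.getpwnam(username)
            found = True
        except KeyError:
            # No entry came back for this attempt.
            found = False
        elapsed = time.monotonic() - start
        results.append((found, elapsed))
        if i < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return results


def summarize(results):
    """Return (number of failed lookups, slowest lookup in seconds)."""
    failures = sum(1 for found, _ in results if not found)
    slowest = max((elapsed for _, elapsed in results), default=0.0)
    return failures, slowest


if __name__ == "__main__":
    # "oracle" is a placeholder; pass the real owner name as the first argument.
    user = sys.argv[1] if len(sys.argv) > 1 else "oracle"
    failures, slowest = summarize(probe_user(user))
    print(f"user={user} failures={failures} slowest={slowest:.3f}s")
```

Occasional failed lookups, or a slowest lookup far above the typical time, for a user that is known to exist would be consistent with the intermittent LDAP delay described under Cause.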
Applies To
Cluster Server 5.0.X.
LDAP authentication.