All listeners in the cluster registered errors and stopped accepting connections at the same time. Because they were configured as critical resources, the fault event brought all databases down as a result. The service groups failed over, but could not start until several hours had passed.
2011/07/10 00:29:47 VCS INFO V-16-20002-211 (ukblx204) Netlsnr:int10p_listener:monitor:Monitor procedure /opt/VRTSvcs/bin/Netlsnr/LsnrTest.pl returned the output:
LD_LIBRARY_PATH - /usr/lib:
LSNRCTL for Solaris: Version 10.2.0.4.0 - Production on 10-JUL-2011 00:29:46
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=pint10db.ukhx.astrazeneca.net)(PORT=1525)))
TNS-12545: Connect failed because target host or object does not exist
TNS-12560: TNS:protocol adapter error
TNS-00515: Connect failed because target host or object does not exist
Solaris Error: 146: Connection refused
/usr/lib:/pgen/d01/app/oracle/product/10.2.0/db_1/lib:
ORA-00119: invalid specification for system parameter LOCAL_LISTENER
ORA-00130: invalid listener address '(ADDRESS=(PROTOCOL=TCP)(HOST=PGENDB.UKHX.ASTRAZENECA.NET)(PORT=1537))'
There are no error messages in the explorer, other than the ones listed here, that could point to the cause, and the customer had no idea what had happened, as the issue occurred in the middle of the night.
However, the environment as a whole pointed toward the cause of the problem:
- No network loss detected.
- No hardware problems detected.
- No database issues.
- No faulty resources other than the listeners.
- The host names used in the database connection strings do not appear in the /etc/hosts configuration, so they can only be resolved through DNS.
- No configuration changes.
The only thing that could have made all cluster listeners fail at the same time, with the same errors ("invalid listener address" and "Connect failed because target host or object does not exist"), is a total DNS failure.
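The name-resolution hypothesis can be checked directly on a node. The following is a minimal sketch in Python, not part of the original investigation: it pulls the (HOST=...) values out of listener address strings such as the ones in the log above and tries to resolve each one. The function names extract_hosts and check_resolution are hypothetical, and the script assumes a Python interpreter is available on the node.

```python
import re
import socket

# Matches the (HOST=...) clause of an Oracle Net address string.
HOST_RE = re.compile(r"\(HOST=([^)]+)\)", re.IGNORECASE)


def extract_hosts(text):
    """Return unique, lower-cased host names from (HOST=...) clauses, in order."""
    seen = []
    for name in HOST_RE.findall(text):
        name = name.strip().lower()
        if name not in seen:
            seen.append(name)
    return seen


def check_resolution(hosts, resolver=socket.gethostbyname):
    """Map each host to its resolved address, or None if resolution fails."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


# Example use with the addresses taken from the log excerpt:
#   log = "(ADDRESS=(PROTOCOL=TCP)(HOST=PGENDB.UKHX.ASTRAZENECA.NET)(PORT=1537))"
#   for host, addr in check_resolution(extract_hosts(log)).items():
#       print(host, addr or "DOES NOT RESOLVE")
```

If every listener host name fails to resolve while the network itself is up, that is consistent with a DNS outage rather than with a fault in any single listener.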
- Ensure DNS redundancy.
- Add the host names used in the database connection strings to the /etc/hosts file.
- Configure the listeners as non-critical resources, so that a listener fault does not also stop the databases, since other products may be using different types of connections.
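To verify the /etc/hosts recommendation, a small check can report which listener host names are still missing from the file. This is a minimal sketch under stated assumptions: parse_hosts and missing_from_hosts are hypothetical helper names, and the list of required names has to be supplied by the administrator from the actual connection strings.

```python
def parse_hosts(hosts_text):
    """Return the set of lower-cased names defined in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) >= 2:  # an address followed by one or more names
            names.update(name.lower() for name in fields[1:])
    return names


def missing_from_hosts(required, hosts_text):
    """Return the required names that the hosts-file text does not define."""
    defined = parse_hosts(hosts_text)
    return [name for name in required if name.lower() not in defined]


# Example use on a cluster node:
#   with open("/etc/hosts") as fh:
#       print(missing_from_hosts(
#           ["pint10db.ukhx.astrazeneca.net", "pgendb.ukhx.astrazeneca.net"],
#           fh.read()))
```

An empty result means every listed name can be resolved locally even when DNS is unavailable, provided the name service switch on the node consults files before DNS.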