Problem
NBPEM (NetBackup Policy Execution Manager) appears to be hung, or new backup jobs appear in the Activity Monitor very slowly.
Note: General hostname lookup tests, NetBackup network test tools (nbdiag), NetBackup commands (bpclntcmd), and other applications do not show hostname resolution problems. This can be because the hosts being resolved are already cached, so the operating system does not need to perform an actual hostname lookup.
Error Message
NBPEM log example:
01/03/13 01:25:52.497 [Debug] NB 51216 nbpem 116 PID:6308 TID:14 File ID:116 [No context] 1 [PemNameMgrBase::updateHostEntry] (ID:101849a08) Lookup of host < client_name > took too long to complete : 14 seconds.(PemNameMgrBase.cpp:951)
01/03/13 01:26:06.548 [Debug] NB 51216 nbpem 116 PID:6308 TID:14 File ID:116 [No context] 1 [PemNameMgrBase::updateHostEntry] (ID:10184a768) Lookup of host < client_name > took too long to complete : 14 seconds.(PemNameMgrBase.cpp:951)
NOTE: With ' DebugLevel=1 ' set, the NBPEM log displays hostname lookups that take longer than 5 seconds. To see lookup delays between 1 and 5 seconds, set the NBPEM logging to ' DebugLevel=5 ' or above.
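To see which clients are affected and how badly, the slow-lookup messages can be tallied per hostname. The following is a minimal sketch in Python, not a NetBackup tool. It assumes the message text matches the two sample lines above, including the angle brackets around the client name, which may differ in a real log. The function name summarize_slow_lookups is hypothetical.

```python
import re
from collections import defaultdict

# Pattern derived from the sample log lines above (an assumption);
# adjust it if the real log formats the message differently.
SLOW_LOOKUP = re.compile(
    r"Lookup of host\s*<?\s*(?P<host>[^<>\s]+)\s*>?\s*"
    r"took too long to complete\s*:\s*(?P<secs>\d+)\s*seconds"
)

def summarize_slow_lookups(lines):
    """Return {hostname: (message_count, longest_delay_in_seconds)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SLOW_LOOKUP.search(line)
        if match is None:
            continue  # not a slow-lookup message
        entry = stats[match.group("host")]
        entry[0] += 1
        entry[1] = max(entry[1], int(match.group("secs")))
    return {host: (count, longest) for host, (count, longest) in stats.items()}
```

Any iterable of text lines can be passed in, for example an open file object for an exported NBPEM log. Clients with the highest counts or longest delays are the first candidates for the lookup tests described in the Solution section.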
Cause
Since NetBackup 7.1, NetBackup has been able to cache hostnames, which helps with repeated hostname lookups. By default, an entry is held in the NetBackup hostname cache for 60 minutes. If a hostname lookup is required after the hostname has been cleared from the NetBackup cache, the operating system must resolve the hostname using the function ' getaddrinfo() '.
In general, the operating system also caches hostnames, but its cache entries expire after a period of time. Once the operating system cache can no longer resolve the hostname, the operating system must determine which hostname resolution services are available to resolve the requested address. Any delay in resolving a hostname that is not available in the cache causes delays for NetBackup and in turn hinders overall performance.
If the operating system is unable to resolve the hostname within a fraction of a second (under 0.25 second), NetBackup can suffer delays.
Solution
Resolve the hostname lookup problems. This does not mean there are typos or other problems with hostnames or IP addresses. The problems will be due to one or more of the following: invalid entries in the resolver configuration files (for example, /etc/resolv.conf on Unix), invalid routes, very poor network response due to hardware problems, offline or unavailable DNS or NIS servers, or recently added operating system patches.
The key is to test hostnames that are not cached, using the ' ping ' command. If the hostname lookup takes longer than 1 second, NetBackup may experience delays when starting backups. To ensure the hostnames are not cached, flush the cache and/or stop the name caching service daemon before testing. The nslookup command can also be used to verify DNS name resolution (if DNS is used for name resolution in the environment).
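The same check can be timed for a whole list of clients instead of one ping at a time. The following is a minimal sketch in Python, assuming a Python interpreter is available on the master server. It calls socket.getaddrinfo(), which uses the operating system resolver, so operating system caching still applies and the cache should be flushed first as described above. The function names time_lookup and report_slow are hypothetical, and the 1 second default threshold is taken from the paragraph above.

```python
import socket
import time

def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Time one resolver call; return (elapsed_seconds, error_or_None)."""
    start = time.perf_counter()
    try:
        resolver(hostname, None)
        error = None
    except socket.gaierror as exc:
        error = exc  # the name could not be resolved at all
    return time.perf_counter() - start, error

def report_slow(hostnames, threshold=1.0, resolver=socket.getaddrinfo):
    """Return (hostname, elapsed_seconds, error) for slow or failed lookups."""
    slow = []
    for name in hostnames:
        elapsed, error = time_lookup(name, resolver)
        if elapsed > threshold or error is not None:
            slow.append((name, elapsed, error))
    return slow
```

For example, report_slow(["client_name"]) returns an empty list when the lookup succeeds within the threshold. The resolver parameter exists only so the timing logic can be exercised without a network; in normal use it is left at its default.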
To work around the problems, add *all* client names that exist in active policies to the local hosts file on the master server ( /etc/hosts on Unix, or C:\windows\system32\drivers\etc\hosts on Windows ). This improves performance and helps prevent delays in starting the backups.
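Candidate hosts file entries can be drafted from a list of client names rather than typed by hand. The following is a minimal sketch in Python, assuming the client names currently resolve to the correct IPv4 addresses, even if slowly. The function name build_hosts_entries is hypothetical, and the output should be reviewed before anything is added to the hosts file.

```python
import socket

def build_hosts_entries(client_names, resolver=socket.getaddrinfo):
    """Return (entries, unresolved): hosts-file lines and names that failed."""
    entries, unresolved = [], []
    for name in client_names:
        try:
            results = resolver(name, None, socket.AF_INET)
        except socket.gaierror:
            unresolved.append(name)  # handle these names manually
            continue
        if not results:
            unresolved.append(name)
            continue
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address. Only the first address is used.
        address = results[0][4][0]
        entries.append(f"{address}\t{name}")
    return entries, unresolved
```

Each returned entry is one line in the usual hosts file layout: the IP address, a tab, then the client name. Names returned in the unresolved list need to be looked up by other means.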