The nbpxyhelper log shows "(database is locked)"

Article: 100044566
Last Published: 2022-07-11
Ratings: 1 2
Product(s): NetBackup & Alta Data Protection

Problem

This issue is known to occur on a heavily loaded Master Server running NetBackup 8.1 or newer. Database application backups are frequently involved when the issue occurs, for options such as Lotus, SAP, Oracle, SQL, or Exchange.

Some jobs may produce Status 25 errors in Activity Monitor with Job Details showing the following:

12/12/2018 13:49:23 - Error nbjm(pid=4625) [PROXY] Connecting host: <hostname>
12/12/2018 13:49:23 - Error nbjm(pid=4625) [PROXY] ConnectionId: {906ED914-FE3E-11E8-B7FC-C92A40367DE3}:OUTBOUND
12/12/2018 13:49:23 - Error nbjm(pid=4625) [PROXY] pid: 4008
12/12/2018 13:49:23 - Error nbjm(pid=4625) [PROXY] Received status: 7643 with message Unable to perform peer host name validation.
12/12/2018 13:49:23 - Error nbjm(pid=4625) [PROXY] Encountered error (VALIDATE_PEER_HOST_PROTOCOL_RUNNING) while processing(ValidatePeerHostProtocol).
Cannot connect on socket (25)

 

Error Message

The nbpxyhelper log file is created in the unified log format.  It can be converted using the following command:
vxlogview -d all -i 486   (convert all the log content in the nbpxyhelper folder)
vxlogview -d all -i 486 -t 00:50:00   (to convert the last 50 minutes of nbpxyhelper log content)

Command location - vxlogview:

Windows: install_path\NetBackup\bin\
Unix: /usr/openv/netbackup/bin

 

Each line in the nbpxyhelper log will be marked as associated with the ConnectionId from the Job Monitor.  Find all lines in the nbpxyhelper log which are marked with the desired ConnectionId.

Example:

12/02/2018 07:58:30.451 [Debug] NB 51216 nbpxyhelper 486 PID:1666 TID:140737200039680 File ID:486 [{906ED914-FE3E-11E8-B7FC-C92A40367DE3}:OUTBOUND] 1 [_fetch_cs_cache_info] DBG - Failed while gathering results from query(SELECT * FROM hostname_to_hostid WHERE name=? AND domain=? AND birth<= strftime('%s', 'now') AND ttl> strftime('%s', 'now');): 5(database is locked)

In order to make the log snip below easier to read, each line has been shortened by removing some prefix information.

 

nbpxyhelper (OID 486) log snip:

This sequence attempts to identify if the connecting peer is a valid host.  This sequence is designed to be accomplished quickly by storing a list of known 'valid' peers in what is called a white-list cache.  NetBackup first attempts to check this cache for the peer hostname:  

[validateHostDetailsFromCache] DBG - Checking cache for hostname:<client_name> hostid: for master:<master_name>
 

The white-list cache database is busy, and the attempt to access it fails:

[_fetch_cs_cache_info] DBG - Failed while gathering results from query(SELECT * FROM hostname_to_hostid WHERE name=? AND domain=? AND birth<= strftime('%s', 'now') AND ttl> strftime('%s', 'now');): 5(database is locked)
[_fetch_cs_cache_info] DBG - Failed while gathering results from query(SELECT * FROM hostname_to_hostid WHERE name=? AND domain=? AND birth<= strftime('%s', 'now') AND ttl> strftime('%s', 'now');): 5(database is locked)

<repeated lines removed>

 

The failure is deemed non-fatal, and NetBackup attempts to connect directly to Web Services on the Master Server to get the information:

[_fetch_cs_cache_info] DBG - SQLITE_BUSY_OR_LOCKED , Not a fatal error
[validateCacheStatus] DBG - Entering
[validateCacheStatus] DBG - SQLITE_BUSY_OR_LOCKED , Performing fetch from WEB API
[validateCacheStatus] DBG - Exiting
[validateHostDetailsFromCache] DBG - Exiting
[ValidatePeerHostProtocol::validateAndInitializePeerValidation] Peer host validation request NOT served from cache; request will now be sent to server 

[NBClientAsyncCURL::connectNioPBX] DBG - Connecting to :[<master_name>].

 

But Web Services is so busy, it does not reply within 30 seconds ... and this attempt also fails:

[NBClientAsyncCURL::performOperation] DBG - Received http response code:503 from server. Will retry
[NBClientAsyncCURL::performOperation] DBG - Retrying previous connection
[NBClientAsyncCURL::performOperation] DBG - Connection retry timeout: 30 seconds reached with current time: 1543755544, m_connectionStartTime: 1543755512 returning ASYNC_FAILED

 

The attempt to validate the peer host fails after 30 seconds.  Error statements are issued, and eventually make their way into Activity Monitor Job Details:

[ValidatePeerHostProtocol::mapStateMachineStatusMsg] Async operation completed for: m_remoteHostId: c6f62c7d-eec8-4956-9aa5-17c869e7a048, m_remoteHostName: <client_name> 
[ValidatePeerHostProtocol::mapStateMachineStatusMsg] json response: {"errorCode":"8625","errorMessage":"Server is unavailable to process the request. Please try later."}
[ValidatePeerHostProtocol::mapStateMachineStatusMsg] Error: Web API call failed with http status: [503
[ValidatePeerHostProtocol::setState] Transitioning state from: VALIDATE_PEER_HOST_PROTOCOL_RUNNING to: VALIDATE_PEER_HOST_PROTOCOL_ERROR 
[ValidatePeerHostProtocol::setState] Unable to perform peer host name validation., nbu status = 7643, severity = 2, Additional Message: [PROXY] Encountered error (VALIDATE_PEER_HOST_PROTOCOL_RUNNING) while processing(ValidatePeerHostProtocol)., ...snip...
[StatusProtocol::start] Output status: {"status": 7643, "remote_proxy_info": {"ca_roots": ["f96061a4-f87f-4a5d-b9cd-3dfb3956e9ee"], "ca_roots_excluded": ["UNCONSTRAINED"],  <rest of line removed for clarity>
[ProxyConnection::setState] [1920] Transitioning state from: PROXY_CONNECTION_SENDING_STATUS to: PROXY_CONNECTION_ERROR 


 

Cause

The operation of performing peer host name validation is performed by vnetd  (inbound_proxy /  outbound_proxy daemons) and is logged within the nbpxyhelper log folder.  The error observed in Activity Monitor occurs when peer host validation has failed.

Log location - nbpxyhelper:

Windows: install_path\NetBackup\logs\nbpxyhelper
Unix: /usr/openv/logs/nbpxyhelper

 

Solution

For NetBackup versions above 8.2 through 10.0.x:

Modify the NetBackup configuration on all media servers (including the primary server if it is acting as a media server) to minimize the number of peer host validation requests to nbwmc during bursts of jobs for the same client.

echo DISABLE_WHITELISTCACHE_CLEANUP_ON_JOB_START=1|nbsetconfig

For NetBackup 8.1.2 and 8.2:

Make the same configuration change, but it requires a companion Emergency Engineering Binary (EEB) to implement the change.  Install EEB 3975178 or 3991273 respectively.

For older NetBackup versions 8.1, 8.1.1:

There are several measures which can be taken to mitigate the situation.  The first two steps can be done without the third.  It is recommended to start with the first two steps, and see if the issue is resolved.  For the most efficient operation, contact Veritas Support for the EEB associated with Step 3.

Step 1. Increase the amount of time a host's name remains in the hostname validation cache from the default of 60 minutes.  Add the following line to the web.conf file:
ttl.connectionContinue.minutes=360

File location - web.conf:

Windows: install_path\var\global\wsl\config\web.conf
Linux: /usr/openv/var/global/wsl/config/web.conf

(Step 1. is only for the Master Server)

(This change will persist from NetBackup 8.1.2 to NetBackup 8.2 following an upgrade.)

Step 2. Enable WAL (Write-Ahead Logging) mode in the whitelist cache database to allow Reading and Writing to proceed concurrently.

Note: Step 2 is for the Master and Media Servers, if sqlite3 utility is not installed on the system, then /usr/openv/pdde/pdopensource/bin/nbsqlite can be used instead. Available on unix at NBU 8.2 and newer.

(This change will not persist from NetBackup 8.1 2 to NetBackup 8.2, reapply the change following the upgrade.)

To check the current mode setting (on Linux):
A. Run the following:  /usr/openv/pdde/pdopensource/bin/sqlite3 /usr/openv/var/vnetd/whitelist_cache.db
B. At the sqlite> prompt, enter: pragma journal_mode;   <press enter, to see the current mode setting>
C. At the sqlite> prompt, enter: .quit   <press enter, to exit the sqlite interface>

Note: the default mode is 'delete'
Note: if the mode shows 'WAL' no changes need to be made
Note: the value is not case sensitive

To adjust the mode setting to WAL (on Linux):
A. Run the following: 

Pre-9.1  /usr/openv/pdde/pdopensource/bin/sqlite3 /usr/openv/var/vnetd/whitelist_cache.db

9.1+  /usr/openv/pdde/pdopensource/bin/nbsqlite /usr/openv/var/vnetd/allowedlist_cache.db


B. At the sqlite> prompt, enter: pragma journal_mode = WAL;   <press enter to adjust the current mode setting>
C. At the sqlite> prompt, enter: .quit   <press enter to exit the sqlite interface>

Note: Once adjusted, the setting will persist through NetBackup restarts.


To check the current mode setting (on Windows):
1. Go to the following URL https://www.sqlite.org/download.html and download the current version of sqlite tools for windows (currently sqlite-tools-win32-x86-3270200.zip)
2. Extract the .zip file and place sqlite3.exe into a configured PATH location (example: C:\windows)

A. From an Administrator CMD prompt, run the following: 
sqlite3 "install_path\NetBackup\var\vnetd\whitelist_cache.db" (adjust path as needed)
B. At the sqlite> prompt, enter: pragma journal_mode;   <press enter to see the current mode setting>
C. At the sqlite> prompt, enter: .quit   <press enter to exit the sqlite interface>

Note: the default mode is 'delete'
Note: if the mode shows 'WAL' no changes need to be made
Note: the value is not case sensitive

To adjust the mode setting to WAL (on Windows):
A. From an Administrator CMD prompt, run the following: 
sqlite3 "install_path\NetBackup\var\vnetd\whitelist_cache.db" (adjust path as needed)
B. At the sqlite> prompt, enter: pragma journal_mode = WAL;   <press enter to adjust the current mode setting>
C. At the sqlite> prompt, enter: .quit   <press enter to exit the sqlite interface>

Note: Once adjusted, the setting will persist through NetBackup restarts.

 

Step 3. In addition to the above tuning changes, an EEB exists which will help to avoid a deadlock issue when multiple connections rapidly access the whitelist cache database. The tuning and code changes for the EEB are included in 8.2.

The binary exists for the following versions of NetBackup:

  • NetBackup 8.1: Etrack 3957949 (EEB1)
  • NetBackup 8.1.1: Etrack 3950392 (EEB1)
  • NetBackup 8.1.2: Etrack 3965917 (EEB1)

If you think you might be a candidate for this EEB, please open a case with Veritas Support.

(Step 3. is for Media Servers including the primary server if this also has a media server role)

References

Etrack : 3957950

Was this content helpful?