The Java administration console is unable to connect to PBX, login fails with a status 526.

Article: 100043398
Last Published: 2018-12-24
Product(s): NetBackup

Problem

Attempts to log in via the Java administration console fail with status 526.  Monitoring the PBX log on the master server shows that the console never connects to PBX.

The problem may be intermittent: after several attempts, the console may eventually log in.

Error Message

Unable to login, status: 526

Cause

The PBX port on the master server is receiving TCP connection attempts whose SYN handshake never completes, so the incoming connections are never established.  If the number of concurrent incomplete connections exceeds the listen backlog for PBX, the incomplete connections prevent connections that could otherwise complete from entering the listen backlog and being accepted.  The O/S typically times out incomplete connections after a few seconds, which allows other connections into the backlog to be accepted by PBX.  However, if the ratio of incomplete incoming connections to complete incoming connections is high, the situation is equivalent to a denial of service (DoS) attack on PBX (TCP port 1556).

This will likely be most noticeable when the Java console cannot connect to PBX on the master server.  It can also prevent NetBackup processes on other hosts from connecting, leading to various job and operation failures.

Monitoring the PBX log in real time shows some connections being accepted, but not the ones of interest, and there may be pauses in activity while the O/S waits for incomplete connections to time out.

To tail the PBX log run:

vxlogview -p 50936 -i 103 -S

Additional testing found that a telnet test would also sometimes fail to connect to the PBX port on the master server.  Sometimes telnet would connect, but only after a delay of several seconds.  For example:

telnet master1 1556
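The telnet check described above can be repeated from a script so that failures and delays are timed rather than judged by eye.  The following is a minimal sketch, not a NetBackup tool: the function name probe_tcp and the 10 second timeout are assumptions chosen for illustration.

```python
import socket
import time


def probe_tcp(host, port=1556, timeout=10.0):
    """Attempt one TCP connection and return (connected, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection completes the three-way handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            connected = True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        connected = False
    return connected, time.monotonic() - start
```

Calling probe_tcp("master1") several times in a row and comparing the elapsed times gives a rough picture of how often the connection stalls or fails.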

The condition can be confirmed by checking for incomplete connections to the PBX port 1556 on the master server using the netstat command.  This works on Windows and *NIX operating systems and will show connections in SYN_RCVD state indicating they are incomplete.  The inbound connections will show the IP address of the master server, and port 1556, in the left column of the output.  The right column is the remote or connecting IP address and port number.

netstat -n -a
...snip...
... 10.1.1.1.1556    11.1.1.11.6000 ... SYN_RCVD
... 10.1.1.1.1556    12.1.1.12.4836 ... SYN_RCVD
... 10.1.1.1.1556    11.1.1.11.2849 ... SYN_RCVD
...snip...
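When many connections are listed, it can help to count the incomplete ones per remote address.  The sketch below parses saved netstat output under some assumptions: IPv4 addresses only, address and port joined by either "." or ":", and the state in the last column.  The function name and the extra state spellings are illustrative; some platforms label the state differently (for example SYN_RECV on Linux), so the accepted names are a parameter.

```python
import re
from collections import Counter

# Matches an IPv4 address followed by "." or ":" and a port number,
# covering both the "10.1.1.1.1556" and "10.1.1.1:1556" output styles.
ADDR_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})[.:](\d+)")


def half_open_by_remote(netstat_text, port=1556,
                        states=("SYN_RCVD", "SYN_RECV", "SYN_RECEIVED")):
    """Count incomplete inbound connections to the port, keyed by remote IP."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if not fields or fields[-1] not in states:
            continue
        addrs = ADDR_RE.findall(line)
        if len(addrs) < 2:
            continue
        # First address is the local side, second is the remote side.
        local_port = int(addrs[0][1])
        remote_ip = addrs[1][0]
        if local_port == port:
            counts[remote_ip] += 1
    return counts
```

Hosts that appear repeatedly in the result are the first candidates for a routing or source binding review.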

If netstat is run on the connecting remote host(s), it will show the same connections in a SYN_SENT state.  Connections in this state will similarly time out and be dropped by the TCP stack on that host.

Typically, this DoS condition is unintentional and is created by a group of remote hosts running NetBackup processes that are trying to connect to the master server, but are prevented by some conflict in the network path either to or from the master server.  See the hosts with IP addresses 11.1.1.11 and 12.1.1.12 in the example above.

Three scenarios have been observed.  All involve either the master server having multiple network interfaces (NICs) or the remote hosts having multiple network interfaces, and an asymmetrical network route being created.  In such a route, the TCP packets sent to the master server use one route/NIC, and the reply packets returned to the remote host use a different route/NIC.  As a result, packets may only be delivered in one direction, and not have a return route in the other direction.

In one scenario, the routing table on one of the remote hosts was configured to place packets onto a network segment other than the one on which the master server, according to its own routing table, expected packets from those IP addresses to arrive.

In a second scenario, both network routes exist, but a network stack along one of the routes detects that it is only seeing packets in one direction and drops the connection.  The drop could be performed by a firewall or switch, or even the TCP stack on the master server or the remote host.

In a third scenario, NetBackup on the remote host is configured contrary to the network routing configuration and requests a source binding (source IP address) for the connection that does not match the local routing table.  The outbound TCP packet therefore exits the NIC that matches the routing table, but bears the source IP address of a second NIC, so the reply TCP packet is addressed and routed to the second NIC.  The resulting asymmetrical route then leads to the first or second scenario above.

These scenarios cause PBX connection failures when the number of such concurrent incomplete connections exceeds the TCP listen backlog for PBX.  When PBX starts, it requests the O/S to provide a TCP incomplete connection queue of size 5.  AIX increases this by a factor of 1.5 for the queue length (i.e. 8 connections); other operating systems may pad by more or less.  Exceeding the queue length may not cause NetBackup to fail outright because some TCP stacks will retransmit the TCP packets and they may arrive later when there is space in the queue.  NetBackup will also retry the connection several times.  But if too many incomplete connections exist and respawn quickly enough, legitimate connections (such as from the Java admin console) may not enter the backlog, even with multiple retries, and must give up and fail.
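The queue arithmetic above can be made concrete with a rough estimate.  The sketch below is an illustration only: the rounding up of 7.5 to 8 is assumed so that the result matches the AIX figure quoted in the text, and the arrival rate and timeout passed to the functions are hypothetical values, not measurements.  The average number of incomplete connections held at once is estimated as arrival rate multiplied by the time each one lingers before the O/S drops it.

```python
import math


def effective_queue_length(requested_backlog=5, pad_factor=1.5):
    """Queue length after O/S padding, rounded up (assumed: 5 * 1.5 -> 8)."""
    return math.ceil(requested_backlog * pad_factor)


def average_half_open(stalled_per_second, timeout_seconds):
    """Estimate how many incomplete connections are held at once."""
    return stalled_per_second * timeout_seconds


def queue_saturated(stalled_per_second, timeout_seconds, queue_length):
    """True when stalled connections alone would keep the queue full."""
    return average_half_open(stalled_per_second, timeout_seconds) >= queue_length
```

For example, with a queue of 8 and a hypothetical 3 second timeout, 3 stalled attempts per second would be enough to keep the queue full on average, while 1 per second would not.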

On AIX, the current queue length (q0len) and the requested backlog (qlimit) can be displayed using the 'netstat -n -a -o' command.  Review the output for port 1556 that is in a LISTEN state.  If q0len is consistently 8, then incoming connections are being denied at that point in time.

netstat -n -a -o
...snip...
tcp4   0   0 *.1556   *.* LISTEN
...snip...
    q0len:8 qlen:0 qlimit:5 so_state: (PRIV)
...snip...
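Because the condition only matters when q0len stays at its maximum, it can be useful to extract the values from repeated captures.  The sketch below assumes the output layout shown in the excerpt above (a tcp line for the socket, followed later by a line containing q0len, qlen, and qlimit); the function name is hypothetical and the real output may contain additional lines.

```python
import re

QUEUE_RE = re.compile(r"q0len:(\d+)\s+qlen:(\d+)\s+qlimit:(\d+)")


def pbx_listen_queue(netstat_text, port=1556):
    """Return (q0len, qlen, qlimit) for the LISTEN socket on port, or None."""
    in_listener = False
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("tcp"):
            # A new socket entry begins; note whether it is the listener.
            in_listener = (
                len(fields) >= 6
                and fields[3].endswith(f".{port}")
                and fields[-1] == "LISTEN"
            )
            continue
        match = QUEUE_RE.search(line)
        if in_listener and match:
            return tuple(int(value) for value in match.groups())
    return None
```

Collecting the first value of the returned tuple every few seconds shows whether q0len is pinned at 8 or only touches it briefly.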

AIX also provides a count of incoming connections that have been dropped in the past due to a full listen backlog.  This count is not specific to any one TCP port.  However, if the count is not incrementing, then connections are not currently being prevented from entering the queues of any listening services.  E.g.

netstat -s
...snip...
    0 discarded due to listener's queue full
...snip...
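Since the counter is cumulative, the useful signal is whether it changes between two captures.  The following sketch compares two saved copies of the 'netstat -s' output; it assumes the counter line is worded exactly as in the excerpt above, and the function names are illustrative.

```python
import re

DISCARD_RE = re.compile(r"(\d+)\s+discarded due to listener's queue full")


def listener_discards(netstat_s_text):
    """Extract the 'listener's queue full' discard counter, or None."""
    match = DISCARD_RE.search(netstat_s_text)
    return int(match.group(1)) if match else None


def discards_increasing(earlier_text, later_text):
    """True when the counter grew between two 'netstat -s' captures."""
    before = listener_discards(earlier_text)
    after = listener_discards(later_text)
    if before is None or after is None:
        raise ValueError("counter not found in netstat -s output")
    return after > before
```

A counter that stays flat while logins fail suggests the cause lies elsewhere.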

Solution

Ensure the routing tables on the master server and the remote hosts cause packets to be exchanged over a single NIC on each host.  Use 'netstat -r' to view the routing table.  With the proper routes in place, the master server responds out the same network interface on which the inbound connection was received, and the reply packets arrive at the remote host on the same NIC from which the initial packets were sent.
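As a cross-check on a remote host, the source address that the routing table would select for traffic to the master server can be queried without sending any packets.  This is a general sockets technique rather than a NetBackup feature; the sketch assumes IPv4, and the function names are hypothetical.

```python
import socket


def routed_source_ip(destination, port=1556):
    """Return the local IPv4 address the routing table selects for destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only asks the
        # kernel to choose a route and a matching source address.
        sock.connect((destination, port))
        return sock.getsockname()[0]


def binding_matches_route(configured_name, destination, port=1556):
    """Check that a configured source host name resolves to the routed address."""
    configured_ips = {
        info[4][0]
        for info in socket.getaddrinfo(configured_name, None, socket.AF_INET)
    }
    return routed_source_ip(destination, port) in configured_ips
```

If binding_matches_route returns False for the host name that NetBackup is configured to bind to, the outbound packets are likely to leave one NIC while carrying the address of another.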

In addition, use nbgetconfig on the remote hosts to confirm that NetBackup on those hosts is not configured to request a source binding that conflicts with the host routing table for the route to the master server.  These configurations are rarely, if ever, needed.  If present, see the Related Articles below for additional details about recommended and safe usage.

  • The required interface, if present, must match the network interface the host routing table uses to contact the master server.
  • Any preferred network that matches the master server IP address and specifies a source binding must match the network interface the host routing table uses to contact the master server.
  • Any preferred network that prohibits the use of an address or address range must not specify or include any IP addresses local to the host.

For example:

nbgetconfig
...snip...
REQUIRED_INTERFACE = my_hostname_matching_route_to_the_master
...snip...
PREFERRED_NETWORK = master_hostname MATCH my_hostname_matching_route_to_the_master
PREFERRED_NETWORK = not_any_of_my_hostnames PROHIBITED
...snip...
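When reviewing several hosts, the relevant entries can be pulled out of saved nbgetconfig output for comparison.  The sketch below handles only the "NAME = value" layout shown in the example above and splits each value on whitespace; the function name is hypothetical, and it does not interpret the entries.

```python
def source_binding_entries(nbgetconfig_text):
    """Collect REQUIRED_INTERFACE and PREFERRED_NETWORK values from the output."""
    entries = {"REQUIRED_INTERFACE": [], "PREFERRED_NETWORK": []}
    for line in nbgetconfig_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Keep only well-formed lines for the two settings of interest.
        if sep and key in entries:
            entries[key].append(value.strip().split())
    return entries
```

An empty list for both settings means the host requests no source binding, which is the usual and preferred case.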

References

Etrack : 3950272
