NetBackup jobs fail after /mnt/nbdata partition becomes full on Flex appliance media server instance

Article: 100051463
Last Published: 2021-09-29
Product(s): Appliances

Problem

Jobs fail with connection errors, such as status 25 (cannot connect on socket), and image cleanup jobs fail with status 2027 (Media server is not active). On the Flex appliance media server instance, usage of the /mnt/nbdata filesystem exceeds 90% at times and reaches 100% during backups and some image cleanup jobs.
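
To confirm the symptom, the usage of the mount can be checked from inside the media server instance. The following is a minimal sketch, not a Veritas-provided tool: it assumes Python 3 is available in the instance, and the function names and the 90% warning level (taken from the figure above) are illustrative choices. The percentage is computed as used / (used + available), which approximates the Use% column of df.

```python
import shutil

# Hypothetical alert level, taken from the 90% figure in the Problem text.
WARN_THRESHOLD = 90.0


def usage_percent(path):
    """Return used space on the filesystem containing `path`, in percent.

    Uses used / (used + available), which approximates df's Use% column.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0  # pseudo-filesystems can report zero blocks
    return 100.0 * usage.used / denominator


def classify(percent, warn=WARN_THRESHOLD):
    """Map a usage percentage to 'ok', 'warning', or 'full'."""
    if percent >= 100.0:
        return "full"
    if percent >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    mount = "/mnt/nbdata"  # mount point named in this article
    pct = usage_percent(mount)
    print(f"{mount}: {pct:.1f}% used ({classify(pct)})")
```

Running the check repeatedly during a backup window shows whether the filesystem only reaches 100% under load, as described above.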
 

Error Message

Example, backup job failure snippet:

Sep 22, 2021 12:26:01 AM - Error bpbrm (pid=8301) db_FLISTsend failed: cannot connect on socket (25)
Sep 22, 2021 12:26:01 AM - Warning bptm (pid=9001) cannot update image database to add completed fragment, error = cannot connect on socket
Sep 22, 2021 12:26:01 AM - Info bptm (pid=9001) EXITING with status 25 <----------

Example, image cleanup failure:

Sep 22, 2021 12:33:49 AM - Info nbdelete (pid=9101) deleting expired images. Media Server: <flex_app_server_name> Media: @aaaaX
Sep 22, 2021 12:33:49 AM - Error nbdelete (pid=9101) Cannot obtain resources for this job : error [2027]
Sep 22, 2021 12:33:49 AM - requesting resource  @aaaaX
Sep 22, 2021 12:33:49 AM - Error nbjm (pid=8302) NBU status: 2027, EMM status: Media server is not active
Sep 22, 2021 12:33:49 AM - Error nbjm (pid=8302) NBU status: 2027, EMM status: Media server is not active
Media server is not active  (2027)

 

Cause

The /mnt/nbdata filesystem was designed to hold configuration data, not logs or large VxUpdate packages (.sja files). If the /mnt/nbdata/usr/openv/var/proxy_peers directory does not have space to create the needed subdirectories and files, the instance will stop accepting connections. (ref: status 25)

Additionally, because this mount controls the connections to the appliance Docker instances, a full (100%) filesystem can cause the media server to be reported in a "down" state. (ref: status 2027)

In this example...

  • The VxUpdate packages (.sja files) in the /usr/openv/var/global/repo/nb directory were consuming a large amount of space, which disrupted the connections as described above and caused the media server to be reported as down.
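
To see whether VxUpdate packages are the main consumer in a given environment, the largest .sja files can be listed before opening the support case. This is a read-only sketch under the same assumption that Python 3 is available in the instance. The helper name largest_files and its parameters are hypothetical. It does not delete anything; cleanup should follow the Solution section.

```python
import os


def largest_files(root, suffix=".sja", limit=10):
    """Return up to `limit` (size_bytes, path) tuples under `root`, largest first.

    Only files whose names end with `suffix` are considered. Files that
    vanish or cannot be read during the walk are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            full = os.path.join(dirpath, name)
            try:
                # lstat so that symbolic links are not followed
                found.append((os.lstat(full).st_size, full))
            except OSError:
                continue
    found.sort(reverse=True)
    return found[:limit]


if __name__ == "__main__":
    # Directory named in this article; adjust if the repository lives elsewhere.
    for size, path in largest_files("/usr/openv/var/global/repo/nb"):
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

The output can be attached to the Technical Support case to help decide between cleanup, resizing, and the symbolic link workaround.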

Note: For Flex 1.1 and 1.2, see the known issue in Related Articles below.
 

Solution

Open a Technical Support case with the Flex Appliance team to assist with one or more of the following options:

  1. Cleaning up the /mnt/nbdata filesystem.
  2. Resizing the /mnt/nbdata filesystem.
  3. Workaround: Creating supported symbolic links to free up space on the /mnt/nbdata filesystem.

 

Applies to:
The filesystem mounted on /mnt/nbdata shows Use% at 100%
Flex Appliance 2.0.1 (not a version specific issue)

References

JIRA : APPSOL-142633
