The NetBackup Web Management Console (nbwmc) service fails to start after a cluster switch or failover.
Problem:
After an install or upgrade of NetBackup, the NetBackup Web Management Console (nbwmc) service fails to start when the NetBackup group is switched or fails over to the secondary node in a Linux VERITAS Cluster Server (VCS) environment.
Error Message:
Example: Only the error messages are included below.
/usr/openv/wmc/webserver/logs/catalina.2019-08-12.log
12-Aug-2019 11:41:57.100 SEVERE [main] org.apache.catalina.core.StandardService.initInternal Failed to initialize connector [Connector[com.netbackup.tomcat.connector.nio.NBHttp11NioProtocol-3652]]
org.apache.catalina.LifecycleException: Failed to initialize component [Connector[com.netbackup.tomcat.connector.nio.NBHttp11NioProtocol-3652]]
Caused by: org.apache.catalina.LifecycleException: Protocol handler initialization failed
Caused by: java.lang.IllegalArgumentException: /usr/openv/var/global/wsl/credentials/nbwebservice.jks (Permission denied)
Caused by: java.io.FileNotFoundException: /usr/openv/var/global/wsl/credentials/nbwebservice.jks (Permission denied)
12-Aug-2019 11:42:07.711 SEVERE [localhost-startStop-2] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.springframework.web.context.ContextLoaderListener]
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactory' defined in com.netbackup.config.PersistenceContext: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean]: Factory method 'entityManagerFactory' threw exception; nested exception is java.lang.IllegalArgumentException: DataSource must not be null
Caused by: java.lang.IllegalArgumentException: DataSource must not be null
In another example:
[Thu Aug 08 10:42:29 EDT 2019 - C3P0PostProcessingDataSource] C3P0PostProcessingDataSource could not be created. Could not get the DB password: Exception: Could not read the pwd as /usr/openv/var/global/wsl/config/web.conf does not exist
When Web Services are started, the catalina log is updated. Its first lines show the problem (Permission denied):
23-Jun-2022 11:32:07.762 WARNING [main] org.apache.catalina.startup.Catalina.parseServerXml Unable to load server configuration from [/opt/VRTSnbu/var/global/wsl/webserver/conf/server.xml]   <<<<< correct path
java.io.FileNotFoundException: /opt/VRTSnbu/var/global/wsl/conf/server.xml (Permission denied)   <<<<< wrong path, please ignore
Note: The second line reports the wrong path, "<installPath>/var/global/wsl/conf/server.xml", for server.xml. The path is NOT the problem; the message is simply misleading.
Cause:
The two cluster nodes have a different user ID and/or group ID for nbwebsvc and nbwebgrp. This happens when, during the initial setup of nbwebsvc and nbwebgrp, another user or group on the second node already occupied the same user ID and/or group ID.
Example:
Notice that the "user id" and "group id" are not the same for nbwebsvc and nbwebgrp on each cluster node.
Node one:
# id nbwebsvc
uid=500(nbwebsvc) gid=501(nbwebgrp) groups=501(nbwebgrp)
Node two:
# id nbwebsvc
uid=502(nbwebsvc) gid=503(nbwebgrp) groups=503(nbwebgrp)
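The comparison above can be scripted. The following is a minimal sketch, assuming the output of `id nbwebsvc` has been collected from each node by hand; the function names `extract_ids` and `same_ids` are hypothetical helpers, not part of NetBackup.

```shell
# Extract "uid:gid" from one line of `id` output,
# e.g. "uid=500(nbwebsvc) gid=501(nbwebgrp) groups=501(nbwebgrp)" -> "500:501".
extract_ids() {
    printf '%s\n' "$1" |
        sed -n 's/^uid=\([0-9][0-9]*\)[^ ]* gid=\([0-9][0-9]*\).*/\1:\2/p'
}

# Succeed only when both lines carry the same numeric uid and gid.
same_ids() {
    a=$(extract_ids "$1")
    b=$(extract_ids "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example: paste the `id nbwebsvc` line from each node.
#   same_ids "$node_one_line" "$node_two_line" || echo "uid/gid differ between nodes"
```

With the two lines shown in this example, `same_ids` fails, because 500:501 on node one does not equal 502:503 on node two.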
This fails on a clustered server because the cluster uses a shared disk, with links to it for all of the NetBackup databases and the web server. When NetBackup moves between nodes, the binaries stay on each node, but the shared disk that contains the databases and the web server moves from one node to the other.
To view the shared disk on the active node, use the Linux 'df' command.
The following output is typical for NetBackup on VCS.
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vx/dsk/nbudg/dbvol
40G 493M 37G 2% /opt/VRTSnbu
The /opt/VRTSnbu mount is the shared drive.
The links are contained in the NBU_RSP file.
# cat /usr/openv/netbackup/bin/cluster/NBU_RSP
LINK=volmgr/misc/robotic_db
LINK=netbackup/db
LINK=netbackup/vault/sessions
LINK=var/global
This is the link for /usr/openv/var/global to /opt/VRTSnbu/var/global
# ls -l /usr/openv/var
lrwxrwxrwx 1 root root 23 Aug 12 13:49 global -> /opt/VRTSnbu/var/global
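To confirm on the active node that a linked directory really resolves onto the shared disk, a check along the following lines may help. This is a sketch only: `resolves_under` is a hypothetical helper, and it assumes a `readlink` that supports `-f` (GNU coreutils).

```shell
# Succeed when LINK ($1) resolves to a path at or below SHARED_ROOT ($2).
# Both arguments are canonicalised so intermediate symlinks do not matter.
resolves_under() {
    target=$(readlink -f "$1") || return 1
    root=$(readlink -f "$2") || return 1
    case "$target" in
        "$root"|"$root"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with the paths from this article:
#   resolves_under /usr/openv/var/global /opt/VRTSnbu && echo "on shared disk"
```

On the passive node the shared disk is not mounted, so the check is only meaningful on the node that currently owns the NetBackup group.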
Explanation of why the users must be the same:
An initial install or upgrade of NetBackup is started on the active node. During the web server setup, the nbwebsvc and nbwebgrp permissions are set not only on the files and folders on the node but also on the files and folders on the shared disk for the databases and web server.
After the active node is up and running the passive node is then upgraded. However, the passive node does not have the shared disk or any of the shared web components. Therefore, the install/upgrade only sets permissions on the files and folders on the passive node.
In this example, the setup/upgrade on the active node set permissions for nbwebsvc and nbwebgrp, with user ID 500 and group ID 501, on the node's own files and folders, the shared databases, and the shared web server.
On the passive node, the permissions on the node's own files and folders were set in exactly the same way for nbwebsvc and nbwebgrp, except that the user ID and group ID are 502 and 503 instead of 500 and 501.
When the NetBackup group is switched or fails over to the passive node, the node's own files and folders have the right ownership to start the NetBackup Web Management Console (nbwmc) as nbwebsvc and nbwebgrp with user ID 502 and group ID 503. The web server files on the shared disk, however, are still owned by user ID 500 and group ID 501. During nbwmc startup, the catalina log reports "Permission denied" for the shared files and the web server does not start.
Example:
After the failover, nbwmc does not start. The file ownership is set to the numeric IDs 500 and 501, not to the IDs that nbwebsvc and nbwebgrp have on this node.
# ls -ln /usr/openv/var/global/wsl/credentials/nbwebservice.*
-rw-r----- 1 500 501 2.8K Aug 11 17:25 /usr/openv/var/global/wsl/credentials/nbwebservice.bcfks
# ls -ln /usr/openv/var/global/wsl/config/web.conf
-rwxrwx--- 1 500 501 68 Aug 11 17:22 /usr/openv/var/global/wsl/config/web.conf
These are the same files, listed using the direct path on the shared disk. Note the same ownership by 500 and 501, not by nbwebsvc and nbwebgrp.
# ls -ln /opt/VRTSnbu/var/global/wsl/credentials/nbwebservice.*
-rw-r----- 1 500 501 2.8K Aug 11 17:25 /opt/VRTSnbu/var/global/wsl/credentials/nbwebservice.bcfks
# ls -ln /opt/VRTSnbu/var/global/wsl/config/web.conf
-rwxrwx--- 1 500 501 68 Aug 11 17:22 /opt/VRTSnbu/var/global/wsl/config/web.conf
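Rather than checking files one at a time, the shared web server tree can be scanned for entries whose numeric owner or group differs from the local IDs. The sketch below uses the standard `find` tests `-uid` and `-gid`; `list_mismatched` is a hypothetical helper name. Not every file under the tree is necessarily meant to belong to nbwebsvc, so treat the output as a list of candidates to inspect.

```shell
# Print every entry under DIR ($1) whose numeric owner is not UID ($2)
# or whose numeric group is not GID ($3).
list_mismatched() {
    find "$1" \( ! -uid "$2" -o ! -gid "$3" \) -print
}

# Example on the node where nbwmc fails to start (paths from this article):
#   list_mismatched /opt/VRTSnbu/var/global/wsl \
#       "$(id -u nbwebsvc)" "$(id -g nbwebsvc)"
```

In the scenario described here, running this on node two would list the files owned by 500 and 501, because nbwebsvc and nbwebgrp resolve to 502 and 503 on that node.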
Solution:
Ensure that the user id and group id are the same for nbwebsvc and nbwebgrp on both nodes.
In this scenario, the administrator created nbwebsvc and nbwebgrp using TN100023872 (see related articles). However, on node 2, another user and group already occupied user ID 500 and group ID 501.
Note: In this scenario, nbwebsvc and nbwebgrp use 500 and 501, but the IDs may vary in every environment. What matters is that they match on each node, not that they are 500 and 501.
1. On the second node, check for and remove any user and group occupying user ID 500 and group ID 501.
2. To assign a new UID to user nbwebsvc:
# usermod -u 500 nbwebsvc
3. To assign a new GID to group nbwebgrp:
# groupmod -g 501 nbwebgrp
4. Verify:
# id nbwebsvc
uid=500(nbwebsvc) gid=501(nbwebgrp) groups=501(nbwebgrp)
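For step 1, one way to see whether a numeric ID is already taken is to query the system databases with `getent`, which accepts a numeric key for both `passwd` and `group`. This is a minimal sketch; `uid_owner` and `gid_owner` are hypothetical helper names, and 500 and 501 are only the IDs from this example.

```shell
# Print the name of the account that currently holds a numeric UID
# (prints nothing when the UID is free).
uid_owner() {
    getent passwd "$1" | cut -d: -f1
}

# Print the name of the group that currently holds a numeric GID
# (prints nothing when the GID is free).
gid_owner() {
    getent group "$1" | cut -d: -f1
}

# Example with the IDs from this article (substitute the IDs used on node one):
#   echo "uid 500 held by: $(uid_owner 500)"
#   echo "gid 501 held by: $(gid_owner 501)"
```

An empty result means the ID is free on that node. Any name other than nbwebsvc or nbwebgrp is the conflicting account or group that step 1 refers to.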
Next, reset the file permissions on the second node.
5. Change to the install directory:
# cd /usr/openv/wmc/bin/install/
6. Run wmcUtils to apply the user and group:
# ./wmcUtils -changeUser nbwebsvc nbwebgrp
Stopping NetBackup Web Management Console daemon..
Updating bp.conf with new values for user and group..
Calling configureEnv to update the new user and group values.
Calling setupWmc to set the required permissions for the new user and group..
Starting NetBackup Web Management Console daemon..
7. Now that the user and group match and permissions on the second node match those of the first node, switch the NetBackup group over to the second node.
8. Verify that the files and folders are now owned by nbwebsvc and nbwebgrp:
# ls -l /opt/VRTSnbu/var/global/wsl/credentials/nbwebservice.jks
-rw-r----- 1 nbwebsvc nbwebgrp 2778 Aug 11 17:25 /usr/openv/var/global/wsl/credentials/nbwebservice.bcfks
# ls -l /opt/VRTSnbu/var/global/wsl/config/web.conf
-rwxrwx--- 1 nbwebsvc nbwebgrp 68 Aug 11 17:22 /opt/VRTSnbu/var/global/wsl/config/web.conf