Veritas VxVM and SystemD integration

Article: 100034008
Last Published: 2022-01-11
Product(s): InfoScale & Storage Foundation

Problem:

Beginning with RHEL7 and SUSE12, a new startup solution was introduced: SystemD. Previously, SysV scripts started sequentially; SystemD starts its services in parallel. This highlighted a timing issue for "services" that depend on a mount point on a VxVM device being available at the time the "service" is started.

 

Appliance EEB: https://www.veritas.com/support/en_US/article.100048097

 

Cause:

  1. Udev integration is needed to register the /dev/vx/dsk/<dg>/<volume> devices back to SystemD.
  2. VxVM requires the udev daemon to re-read the udev rules after VxVM has fully started up.
  3. Service dependencies must be created by adding an order-after-vxvm.conf entry into /etc/systemd/system/<service>.service.d.
  4. The remote-fs-pre.target must be modified so that it has a dependency on vxvm-boot.service.
  5. The SystemD default timeout of 90 seconds is insufficient to allow the Veritas services to start up and the devices to be made available via UDEV.
  6. The VEKI startup script does not include an LSB header, so newer versions of SystemD set the dependency order for the VEKI startup script to start after VxVM.
  7. The VxVM startup script has a dependency on the services "iscsi iodrive raw", which means VxVM starts later, after the operating system has probed all devices.
  8. The "nofail" attribute for Veritas VxVM devices in /etc/fstab causes a shutdown issue for other SystemD services that are dependent on the VxFS filesystem.
  9. systemd-udevd: "inotify_add_watch(7, /dev/VxDMP[X]) failed: No such file or directory" messages in the journal/messages log.
  10. systemd-fsck reports the following messages: "systemd-fsck[xxxx]: UX:vxfs fsck.vxfs: ERROR: V-3-20113: Cannot open: No such device or address"
  11. VxVM has an additional 2-minute sleep to wait for InfiniBand devices to become available. This can create an additional delay if you are not using these devices in your environment.
  12. With the native SystemD services for VeKI.service and VxFS.service, we have simplified the dependency order.
  13. SystemD VxFS .mount units may reach the SystemD default timeout, and processes called by mount can be subsequently killed.
  14. Volumes that are under an RVG will report an IO error, as the RVG has not yet been enabled/active.
  15. Systemd-fsck@[xxx] start failed with result 'dependency'.
  16. The NetBackup service may fail if VxVM volumes are not yet available.
  17. VxFS checkpoints included in the /etc/fstab are not getting mounted at boot.
  18. Added vcs_vols attribute to /etc/vx/systemd/system.conf
  19. Enabled the "-b" option for NetBackup (NBU). Certain InfoScale (i.e. VxVM/VxFS) versions include the IA generator scripts, but not the NBU generator scripts.
  20. Updated the NBU generator script to work with the NBU Build Your Own (BYO) diskgroups and volumes.

 

Solution:

Point 1:

This issue has been addressed in a hot-fix (HF) for VxVM; please contact your local support to obtain the latest HF: (3897047)
NOTE: The HF does not address the issue where you are passing the filesystem check option for the device in the /etc/fstab. [For this, see points 10 and 15]
  
Point 2:

This issue has been addressed in a hot-fix (HF) for VxVM; please contact your local support to obtain the latest HF: (3917636)

Point 3:

If you run “vrts-systemd.sh -ec <service> -m <mount_point1>,<mount_point2>”, this will set up the dependency so that the custom service starts after the VxVM services have started.

This adds a dependency to the “service” for the dependent mounts, so it waits for the required mount points to be mounted before the “service” is started.
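Conceptually, the drop-in created under /etc/systemd/system/<service>.service.d/ is a standard SystemD ordering override. A minimal sketch of what such a drop-in can look like (the service and mount names are illustrative, not the exact output of vrts-systemd.sh):

```ini
# /etc/systemd/system/myapp.service.d/order-after-vxvm.conf  (illustrative)
[Unit]
# Wait for VxVM to finish bringing volumes online...
After=vxvm-boot.service
# ...and require the VxFS mount points the service depends on.
RequiresMountsFor=/data1 /data2
```

RequiresMountsFor= is the native SystemD way to express "this unit needs these paths mounted first"; it pulls in and orders the corresponding .mount units automatically.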

Point 4:

A SystemD generator script has been created to add a dependency on the “vxvm-recover.service” for the remote-fs-pre.target unit. 

Point 5:

The SystemD default timeout of 90 seconds is insufficient to allow the Veritas services to start up and the devices to be made available via UDEV. Newer versions of SystemD allow you to set this timeout at a device level.

When you enable the VxVM/SystemD integration, it adds a SystemD generator script that sets a default of 300 seconds for VxVM devices. If this is insufficient, you can increase the value in the /etc/vx/systemd/system.conf file.
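The generator effectively writes a per-device drop-in. A minimal sketch, assuming a volume testvol in diskgroup testdg (the exact file name and path are illustrative, not the generator's actual output):

```ini
# /run/systemd/generator/dev-vx-dsk-testdg-testvol.device.d/50-timeout.conf
[Unit]
# Allow up to 300 seconds for the VxVM device to appear via UDEV
# (value taken from vxvm_device_timeout in /etc/vx/systemd/system.conf).
JobTimeoutSec=300
```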

Point 6:

This issue has been addressed in a hot-fix (HF) for VeKI; please contact your local support to obtain the latest HF: (3923934)

This issue has currently only been seen on SUSE Linux.

Point 7:

The VxVM startup script has a dependency on the services "iscsi iodrive raw", which means VxVM starts later, after the operating system has probed all devices. In cases where customers have not installed these packages, SystemD moves the dependency for vxvm-boot.service to basic.target, so Volume Manager starts earlier in the boot-up sequence, while LUNs are still being probed by the operating system.

This means that not all devices are seen at the time VxVM starts up, and you will need to run "vxconfigd -k" or "vxdisk scandisks [new]" for Volume Manager to pick up the remaining disks.

To allow the operating system time to probe all the devices before Veritas Volume Manager starts, we add a dependency on the network service in the /etc/init.d/vxvm-boot script.

Please contact your local support to obtain the latest HF: (3925377)
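For SysV-style scripts, SystemD derives its ordering from the LSB header, so the fix amounts to declaring the dependency there. A hedged sketch of the relevant header lines in /etc/init.d/vxvm-boot (illustrative shape only, not the actual HF content):

```shell
### BEGIN INIT INFO
# Provides:          vxvm-boot
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 5
# Default-Stop:      0 1 6
# Short-Description: Start Veritas Volume Manager after device probing
### END INIT INFO
```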

Point 8:

Remove the "nofail" entries for Veritas /dev/vx/dsk* devices from the /etc/fstab, as it has been noted that during a system shutdown the devices marked with "nofail" in the /etc/fstab are unmounted before the SystemD services that depend on them are shut down.

When you run “vrts-systemd.sh -ex”, it will check whether “nofail” has been added to the /etc/fstab and will remove it automatically.
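The automatic cleanup can be approximated with a single sed pass over the fstab. The sketch below (file contents hypothetical, and not the actual vrts-systemd.sh logic) removes "nofail" only from Veritas /dev/vx/dsk entries and leaves native devices untouched:

```shell
# Work on a throwaway copy of the fstab for illustration.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
/dev/vx/dsk/testdg/testvol /data1 vxfs defaults,nofail 0 0
/dev/sda1 /boot xfs defaults,nofail 0 2
EOF

# Strip "nofail" from the mount options of /dev/vx/dsk lines only.
sed -i '\|^/dev/vx/dsk|{s/,nofail//;s/nofail,//}' "$fstab"

cat "$fstab"
```

After running this, the VxFS line reads "defaults" while the native /dev/sda1 line keeps its "nofail" option.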

Point 9:

The latest 99-vxdmp-remove-blockdev.rules that is bundled with this knowledge document addresses the issue.

Point 10:

This is due to a timing issue between UDEV reporting back to SystemD that the device is available and the volume state.  

Messages logged:
systemd-fsck[xxx]: UX:vxfs fsck.vxfs: ERROR: V-3-20113: Cannot open : No such device or address
systemd-fsck[xxx]: fsck failed with error code 31.
systemd-fsck[xxx]: Running request emergency.target/start/replace
systemd[1]:

SystemD is trying to run an FSCK on a device that has not yet been enabled. The vx-vols-systemd-alias script included within the VxVM/SystemD integration knowledge document includes the necessary changes. Please run “vrts-systemd.sh -ex” to implement this.


Point 11:

This issue has been addressed in a hot-fix (HF) for VxFS, please contact your local support to obtain the latest HF: (3947265) 

Point 12:

This issue has been addressed in a hot-fix (HF) for VeKI/VxFS for the InfoScale 7.3.1 product, please contact your local support to obtain the latest HF: (VeKI:3954504/ VxFS:3952837) 

Point 13:

Occasionally the VxFS filesystem will require an FSCK or log replay during the mount (if the VxFS filesystem has not been unmounted cleanly).
To avoid this timeout situation, we have created a SystemD generator script to increase the [vxfs].mount timeout for VxFS mounts in the /etc/fstab and to create a [vxfs].mount dependency for VxFS mounts that are being exported via NFS.
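Mechanically, the generator raises the timeout with a per-mount drop-in. A minimal sketch for a VxFS mount on /data1 (the file name and path are illustrative, not the generator's actual output):

```ini
# /run/systemd/generator/data1.mount.d/50-vxfs-timeout.conf  (illustrative)
[Mount]
# Allow time for a possible FSCK/log replay during mount
# (value taken from vxfs_mount_timeout in /etc/vx/systemd/system.conf).
TimeoutSec=300
```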


Point 14:

Volumes that are under an RVG (Replicated Volume Group) will report an IO error similar to the messages you see below:

Messages logged:
systemd[1]: Starting File System Check on /dev/vx/dsk/vvrdg/vvrdata1...
systemd-fsck[4977]: UX:vxfs fsck.vxfs: ERROR: V-3-20113: Cannot open : No such device or address  
systemd-fsck[4977]: fsck failed with error code 31.
systemd-fsck: UX:vxfs fsck.vxfs: ERROR: V-3-20005: read of super-block on /dev/vx/dsk/vvrdg/vvrdata1 failed: Input/output error 

This is due to the fact that the volumes are in an RVG that has yet to be enabled. We have modified the vx-vols-systemd-alias script to check whether the volume is in an RVG and take the following actions:

• If the RVG is in a "RECOVER" state we will start the VVR processes and run a "vxrvg recover" on the RVG that contains the volume.
• If the RVG state is "DISABLED" we will start the RVG so that it's in an "ENABLED" state. This will make the volumes accessible.
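The state handling above can be sketched as a small decision helper. The function below only mirrors the documented logic; the real vx-vols-systemd-alias script determines the state itself (e.g. via VxVM commands), which this sketch does not do, and the <dg>/<rvg> placeholders are deliberately left unfilled:

```shell
# Decide what to do for a volume's RVG, given the RVG state string.
rvg_action() {
    case "$1" in
        RECOVER)
            # Start the VVR processes and recover the RVG.
            echo "vxrvg -g <dg> recover <rvg>" ;;
        DISABLED)
            # Start the RVG so it becomes ENABLED and the volumes are accessible.
            echo "vxrvg -g <dg> start <rvg>" ;;
        ENABLED)
            echo "no action: volumes already accessible" ;;
        *)
            echo "unexpected RVG state: $1" ;;
    esac
}

rvg_action RECOVER
rvg_action DISABLED
```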

Point 15:

The SystemD device timeout can be reached for VxVM volumes where the fsck passno has been set in the /etc/fstab, causing systemd-fsck to fail with a dependency error. If needed, increase the device timeout via the vxvm_device_timeout attribute in /etc/vx/systemd/system.conf (see point 5).

Messages logged:
systemd: Dependency failed for File System Check on /dev/vx/dsk/testdg/testvol.
systemd: Job systemd-fsck@dev-vx-dsk-testdg-testvol.service/start failed with result 'dependency'.
systemd: Job dev-vx-dsk-testdg-testvol.device/start failed with result 'timeout'.
systemd: Job dev-vx-dsk-testdg-testvol.device/start timed out.
systemd: Timed out waiting for device dev-vx-dsk-testdg-testvol.device.

Point 16:

The NetBackup service may fail if the VxVM volumes that it requires have not yet started; please implement the VxVM/SystemD integration script. This adds a SystemD generator script. For NBU appliances, it will check for the “nbuapp” disk-group volumes in the /etc/fstab and add the necessary mounts to the dependency order.

For systems running build-your-own environments where the NetBackup product is installed and entries exist in the fstab for /usr/openv or /opt/openv, the SystemD generator script will check for these mounts and add a dependency for the NetBackup service.

These volumes are defined via the  “nbu_vols” attribute in the /etc/vx/systemd/system.conf file. 

Point 17:

It has been observed that VxFS checkpoints added to the /etc/fstab were not getting mounted at boot time. Additionally, the "mount -a" command would not mount the checkpoints. The vx-vols-systemd-alias script and the VxFS generator scripts have been updated to map the checkpoint devices to the underlying filesystem.

Point 18:

Added the vcs_vols attribute to /etc/vx/systemd/system.conf file. See “VCS integration with SystemD services or mounts” knowledge document in the “Related Knowledge Base Articles” below.

 

Point 19:

To enable the Netbackup script use "vrts-systemd.sh -eb". This will set up two options in /etc/vx/systemd/system.conf if they don't already exist.

  • nbu_dg: Use this if you have diskgroups that need to be mounted before the netbackup.service is started.
  • nbu_vols: Use this if you wish to specify certain volumes that need to be mounted before the netbackup.service is started, but not all the volumes in the diskgroup are required.

 

Point 20:

The generator scripts have been modified to remove the hard-coded check for nbuapp and will now check for diskgroups that have been set in the "nbu_dg" attribute in the /etc/vx/systemd/system.conf file, which has the "nbuapp" and "vnetbkup-cat-vg" diskgroups predefined by default:

nbu_dg="nbuapp vnetbkup-cat-vg"
nbu_vols="/usr/openv /opt/openv"



Installation instructions:

Please use the self-extracting attached “vrts-selfextract.sh” that has the available scripts mentioned in the points above:
# sh vrts-selfextract.sh     [This will extract the vrts-systemd.sh script and the vrts-systemd directory to the current working directory]
# chmod u+x vrts-systemd.sh
# ./vrts-systemd.sh -ex      [This will install the latest files to their respective locations]
# systemctl disable vxvm-fstab-automount.service  [This will disable vxvm-fstab-automount.service, as it is no longer required]

If the timeout values are not sufficient for your environment, these can be changed in the /etc/vx/systemd/system.conf file:

vxvm_device_timeout=300
vxfs_mount_timeout=300
vxfs_fsck_timeout=300
nbu_vols="/usr/openv /opt/openv"

 

 

References

JIRA : SDIOCFT-265
