NetBackup maintenance mode feature

Created: 27 Dec 2012
Posted by bbahnmiller

NetBackup needs a “maintenance mode”. The product is complex enough that shutting it down for maintenance takes many steps, many of which have to occur in a specific order.

It would be nice to have a maintenance mode button in the GUI (and an equivalent command-line command) that would:

  1. Suspend all backup requests, similar to "nbpemreq -suspend_scheduling".
  2. Suspend Storage Lifecycle Policies (SLPs).
  3. Suspend PureDisk operations.
  4. Suspend running backups (those that can be suspended).
  5. List running backups that cannot be suspended and give the option to kill the jobs or wait until they finish.
  6. List running duplications (SLP or otherwise) and give the option to kill the duplications or wait until they finish.
  7. Wait for all tape drives to empty.
  8. Allow maintenance mode to stay in effect across a reboot (or a restart of the NetBackup daemons).
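
To make the ordering requirement concrete, here is a minimal Python sketch of the entry sequence above. It is only an illustration, not an existing NetBackup feature: the step labels, `run_in_order`, and `dry_run` are hypothetical names, and every step is a dry run that prints what it would do. The only real command mentioned is the `nbpemreq -suspend_scheduling` call quoted in step 1.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a step is a label plus a callable returning True on success.
Step = Tuple[str, Callable[[], bool]]

# Labels mirror steps 1-7 of the list above; step 8 (persistence) is a property
# of the mode itself rather than an action, so it is not listed here.
ENTER_LABELS = [
    "suspend scheduling (nbpemreq -suspend_scheduling)",
    "suspend SLPs",
    "suspend PureDisk operations",
    "suspend running backups that support suspension",
    "list non-suspendable backups: kill or wait",
    "list running duplications: kill or wait",
    "wait for all tape drives to empty",
]


def dry_run(label: str) -> Callable[[], bool]:
    """Build a placeholder action that only prints the step and reports success."""
    def action() -> bool:
        print(f"[dry run] {label}")
        return True
    return action


def run_in_order(steps: List[Step]) -> List[str]:
    """Run steps strictly in sequence and stop at the first one that fails.

    Returns the labels of the steps that completed, so the operator can see
    exactly how far the sequence got.
    """
    completed = []
    for label, action in steps:
        if not action():
            break
        completed.append(label)
    return completed


if __name__ == "__main__":
    steps = [(label, dry_run(label)) for label in ENTER_LABELS]
    finished = run_in_order(steps)
    print(f"{len(finished)} of {len(steps)} steps completed")
```

Stopping at the first failed step matters here: if scheduling could not be suspended, there is little point in draining tape drives that new jobs will immediately reclaim.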

We have a manual process we go through to make sure we can work on tape libraries, VTLs, media servers, etc. It would be very nice to have a single point of control for this. The ability to keep the system quiet when the master server has to be rebooted would be very beneficial as well. Many times you need to verify device configurations after a reboot, and having to pause everything again after the reboot can be painful and time-consuming.
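
One simple way to keep a "quiet" state across a reboot is a flag file that startup scripts check before letting anything run. The sketch below assumes that approach. The file name and all function names are hypothetical, and nothing here reflects how NetBackup stores its own state.

```python
import json
import tempfile
from pathlib import Path


def set_maintenance(flag: Path, reason: str) -> None:
    """Record maintenance mode on disk so it is still visible after a reboot."""
    flag.write_text(json.dumps({"maintenance": True, "reason": reason}))


def in_maintenance(flag: Path) -> bool:
    """Maintenance mode is considered active while the flag file exists."""
    return flag.exists()


def read_reason(flag: Path) -> str:
    """Return the recorded reason, or an empty string if not in maintenance."""
    if not flag.exists():
        return ""
    return json.loads(flag.read_text()).get("reason", "")


def clear_maintenance(flag: Path) -> None:
    """Leave maintenance mode; safe to call even if the flag is already gone."""
    if flag.exists():
        flag.unlink()


if __name__ == "__main__":
    # A temporary directory stands in for whatever persistent location
    # a real implementation would choose.
    with tempfile.TemporaryDirectory() as tmp:
        flag = Path(tmp) / "maintenance.flag"  # hypothetical file name
        set_maintenance(flag, "tape library firmware upgrade")
        print(in_maintenance(flag), read_reason(flag))
        clear_maintenance(flag)
        print(in_maintenance(flag))
```

A startup wrapper could call `in_maintenance` first and skip resuming scheduling while the flag is present, which would leave time to verify device configurations after the reboot.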

Once you are done with the maintenance, you would also need to resume NetBackup operations.

  1. Wait for all media servers to become active for tape and disk.
  2. Resume scheduling - "nbpemreq -resume_scheduling".
  3. Resume all PureDisk (PD) activities.
  4. Resume suspended backups.
  5. Restart killed backups (optional, on a case-by-case basis).
  6. Resume SLPs.
  7. Resume duplications.
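
The resume sequence above hinges on step 1: nothing should restart until every media server reports active. Below is a minimal Python sketch of that gate, again as a dry run. `get_states` is a hypothetical callable that returns a mapping of media server name to "active" status. How that status would actually be obtained is left open, and the remaining steps only print what they would do.

```python
import time
from typing import Callable, Dict, List

# Labels mirror steps 2-7 of the resume list above.
RESUME_STEPS = [
    "resume scheduling (nbpemreq -resume_scheduling)",
    "resume PureDisk activities",
    "resume suspended backups",
    "restart killed backups (operator decides case by case)",
    "resume SLPs",
    "resume duplications",
]


def wait_until_all_active(
    get_states: Callable[[], Dict[str, bool]],
    timeout_s: float,
    poll_s: float = 5.0,
) -> bool:
    """Poll until every media server is active, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = get_states()
        # An empty mapping is treated as "not ready" rather than vacuously true.
        if states and all(states.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def resume(
    get_states: Callable[[], Dict[str, bool]],
    timeout_s: float,
    poll_s: float = 5.0,
) -> List[str]:
    """Run the resume steps in order, but only once the media-server gate passes."""
    if not wait_until_all_active(get_states, timeout_s, poll_s):
        return []
    executed = []
    for step in RESUME_STEPS:
        print(f"[dry run] {step}")
        executed.append(step)
    return executed


if __name__ == "__main__":
    # Simulated status: media1 is down on the first poll and up on the second.
    polls = iter([
        {"media1": False, "media2": True},
        {"media1": True, "media2": True},
    ])
    resume(lambda: next(polls), timeout_s=10, poll_s=0)
```

Returning an empty list when the gate times out keeps the system in maintenance mode instead of resuming against servers that are not ready.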

There are probably some things I'm forgetting here, but this would be a good start. Oftentimes, after maintenance and reboots, there are a lot of things to clean up, and you end up with a lot of failed backups that really shouldn't be reported as failed (especially when you have automatic ticketing systems involved).
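
As an illustration of the ticketing point, here is a small Python sketch that separates failures that ended inside a known maintenance window from failures that deserve a ticket. The `JobResult` record and `split_failures` are hypothetical. The sketch also assumes, for simplicity, that any nonzero status counts as a failure; a real report would need its own rule for partial successes.

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class JobResult(NamedTuple):
    """Hypothetical job record; not a NetBackup data structure."""
    job_id: int
    ended: datetime
    status: int  # assumption for this sketch: 0 = success, nonzero = failure


def split_failures(
    jobs: List[JobResult],
    window_start: datetime,
    window_end: datetime,
) -> Tuple[List[JobResult], List[JobResult]]:
    """Return (ticket_worthy, maintenance_related) failures.

    A failure is treated as maintenance-related when the job ended inside
    the maintenance window (boundaries inclusive).
    """
    ticket_worthy, maintenance_related = [], []
    for job in jobs:
        if job.status == 0:
            continue  # successful jobs never need a ticket
        if window_start <= job.ended <= window_end:
            maintenance_related.append(job)
        else:
            ticket_worthy.append(job)
    return ticket_worthy, maintenance_related


if __name__ == "__main__":
    start = datetime(2012, 12, 27, 22, 0)
    end = datetime(2012, 12, 28, 2, 0)
    jobs = [
        JobResult(1, datetime(2012, 12, 27, 21, 30), 0),   # success
        JobResult(2, datetime(2012, 12, 27, 23, 15), 50),  # failed during maintenance
        JobResult(3, datetime(2012, 12, 28, 4, 0), 96),    # failed after maintenance
    ]
    tickets, suppressed = split_failures(jobs, start, end)
    print("ticket:", [j.job_id for j in tickets])
    print("suppress:", [j.job_id for j in suppressed])
```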