Snapshot deletion failure should cause snapshot job to complete with a status 1
I have found an instance where the snapshot for a VMware backup fails to delete with a status 40, yet the snapshot job still completes with a status 0 as if it were successful. Please see the excerpt from the job details below. When I raised this with support as a flaw, they stated that it is intentional: the deletion failure should not cause the loss of the backup image, and NetBackup deletes the snapshot the next time the backup runs.
The problem with this approach shows up with VMs that only get weekly backups or VMs with heavy IO. In both cases the snapshot could easily exhaust the free space in the datastore and cause the VMs in that datastore to crash. Because a virtualized database server could easily meet both of these conditions, I think this problem will become more prevalent as more people begin to rely on VM database servers. I have firsthand experience with this: one of our cloud customer's production SQL VMs crashed after holding a snapshot for almost 24 hours.
My idea is that the snapshot job should complete with a status 1 to indicate that the backup completed with some minor errors, much like the way skipped files are handled in OS backups. This would alert the administrator that something was wrong, but would still allow you to keep the backup image generated from the job.
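As a rough sketch of the behavior I am proposing: the function and constant names below are made up for illustration and are not NetBackup internals or APIs. The only assumptions are that 0 means success and 1 means partial success.

```python
# Hypothetical illustration only; none of these names exist in NetBackup.
STATUS_SUCCESS = 0   # job completed successfully
STATUS_PARTIAL = 1   # job completed, but with minor errors


def final_job_status(backup_status: int, snapshot_delete_status: int) -> int:
    """Return the status the snapshot job would report under this proposal."""
    # A real backup failure keeps its own status code.
    if backup_status != STATUS_SUCCESS:
        return backup_status
    # The backup image is good, but the snapshot was left behind:
    # report partial success so the administrator gets alerted.
    if snapshot_delete_status != STATUS_SUCCESS:
        return STATUS_PARTIAL
    return STATUS_SUCCESS


# The case from my job: image validated, snapshot delete failed with 40.
print(final_job_status(0, 40))
```

With this logic the backup image is still kept, and the job no longer looks clean when the snapshot is left on the datastore.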
5/21/2014 12:49:16 PM - end VMware, Validate Image; elapsed time: 0:00:00
5/21/2014 12:49:16 PM - begin VMware, Delete Snapshot
5/21/2014 12:49:18 PM - end VMware, Delete Snapshot; elapsed time: 0:00:02
5/21/2014 12:49:18 PM - end operation
the requested operation was successfully completed(0)