VIDEO: NetBackup Support Screencast Demo: Troubleshooting Exchange Backup and Restore Problems with NetBackup 6.5 and 7.0
The video linked to is a part of the NetBackup Support Screencast Demo Video series. It discusses Troubleshooting Exchange Backup and Restore Problems with NetBackup 6.5 and 7.0.
TRANSCRIPT OF VIDEO:
Welcome to the NetBackup Support Screencast Demo Video series.
These videos deliver how-to demonstrations of a variety of NetBackup functions.
They assume fundamental NetBackup knowledge.
If you need basic NetBackup training, please go to http://education.symantec.com, where you will find a listing of instructor-led classroom training as well as self-paced computer-based courses for NetBackup.
This video is Troubleshooting Exchange Backup and Restore Problems with NetBackup 6.5 and 7.0. Within this video we will discuss:
1. Errors which most likely indicate a problem that is not specific to Exchange backups and restores (such as connectivity issues and tape drive issues).
2. Checking setup and usage (both environmental and NetBackup aspects and several situation-specific considerations).
3. Differential Problem Analysis
4. Which logs to enable and review for troubleshooting Exchange backups and restores.
5. Resources to utilize to continue troubleshooting
Narrowing the Scope: Exchange or Not?
Within the broad scope of NetBackup problems, some errors will be related to the Exchange client portion of the operation, but other issues will have nothing to do with the Exchange client and need to be pursued from other angles that are not addressed in this video. For instance, a problem with a tape drive will not be best resolved by looking at the Exchange policy configuration.
Therefore, to maximize your troubleshooting, we need to first identify what area of NetBackup has the problem. Often, this is best done using the error code for the job.
Media Manager Issues
Media Manager issues are problems related to the media server and writing the backed up data to media (whether tape or disk). While there are a number of media manager status codes, ones that most commonly occur are shown here:
79 could not connect to vmd on host
83 media open error
84 media write error
85 media read error
86 media position error
213 no storage units available for use
800 resource request failed
830 EMM database: drives unavailable or down
These codes most likely indicate a media manager issue in NetBackup that is not specific to Exchange.
NetBackup, as the name implies, is a heavy user of your network. Therefore, any issues in connectivity, whether poor network performance, name resolution problems, routing issues or a variety of other issues, can cause NetBackup failures. While there are a number of networking or connectivity status codes, ones that most commonly occur are shown here:
23 socket read failed
24 socket write failed
25 cannot connect on socket
41 network connection timed out
42 network read failed
46 server not allowed access
57 client connection refused
58 can't connect to client
59 access to the client was not allowed
These codes most likely indicate a networking issue affecting NetBackup that is not specific to Exchange.
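As a rough illustration of this first triage step, the two lists of status codes above can be turned into a small lookup. This is only a sketch: the function name classify_status and the category labels are made up for this example, and the sets contain only the codes listed in this video, not every media manager or connectivity code that NetBackup can return.

```python
# Status codes copied from the two lists in this video. Any other code
# is reported as "other", meaning setup, usage and logs should be checked.
MEDIA_MANAGER_CODES = {79, 83, 84, 85, 86, 213, 800, 830}
CONNECTIVITY_CODES = {23, 24, 25, 41, 42, 46, 57, 58, 59}


def classify_status(code: int) -> str:
    """Return a rough problem area for a NetBackup job status code.

    Hypothetical helper for illustration; not part of NetBackup.
    """
    if code in MEDIA_MANAGER_CODES:
        return "media manager"
    if code in CONNECTIVITY_CODES:
        return "connectivity"
    return "other"


if __name__ == "__main__":
    for code in (84, 58, 72):
        print(code, classify_status(code))
```

A result of "other" does not prove the problem is specific to Exchange; it only means the code is not one of the common non-Exchange codes listed here.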
Setup and Usage:
If the status code is not related to media manager or connectivity issues, the next step is to confirm that the setup and usage are correct. While this may seem too basic of a step, incorrect setup and usage is the most common cause of problems within NetBackup. If a NetBackup Support Screencast demo video exists for the activity you are trying, please review it and the operation you are attempting to accomplish. If a video does not exist, please review the documentation for what you are trying to do.
As part of the review of your setup and usage, there are some specific environmental factors which lie outside NetBackup that should be reviewed. Specifically, please check:
For both Backups and Restores:
Service Account permissions: If a service account is being used for the NetBackup client service on the Exchange server, ensure it has the correct permissions. Normally, it should be a Domain account with local administrative rights, as well as full Exchange administrative rights.
Exchange Database status: If an Exchange database is not online and healthy, it may cause a backup to fail. Check this in Exchange System Manager, correct any problems and retry the backup.
Microsoft Volume Shadowcopy Service (VSS): Exchange snapshot backups rely heavily on Microsoft Windows’ Volume Shadowcopy Service (VSS) framework. Problems with VSS can cause snapshot backups to fail. Run the vssadmin list writers command on the Exchange server to get a status of the VSS Exchange Writer and any VSS Hardware Provider Writers that may be installed. If problems are seen in the vssadmin output, please address those and retry the backup.
Replication Status: Database replication is common within Exchange 2007 and 2010 environments; specifically, Cluster Continuous Replication (CCR) for Exchange 2007 and Database Availability Group (DAG) for Exchange 2010. Replication problems can cause backups to fail or, if the backup is successful, they can cause Exchange to not truncate the transaction logs.
To check the replica copy status, the commands shown here can be used in the Exchange command shell:
For Exchange 2007: Get-StorageGroupCopyStatus -Server <mailbox_server>
For Exchange 2010: Get-MailboxDatabaseCopyStatus -Server <mailbox_server>
To check the Exchange 2010 DAG: Get-DatabaseAvailabilityGroup -Status
If problems are seen with replication, please resolve them and retry the backup.
Maintenance Jobs: Maintenance jobs for either Exchange or Windows scheduled to overlap with backup jobs may cause backup jobs to fail. It is strongly recommended that Exchange server maintenance jobs and NetBackup Exchange backup jobs are configured so that they do not conflict with each other.
Hidden Administrative Shares: Exchange restores require the default hidden administrative shares exist (e.g. \\exchangeserv\R$ for the R: drive on the mailbox server called exchangeserv). Failure to have this hidden share is sometimes seen on volumes that were added to the Exchange server for the purpose of the restore.
Recovery Storage Group (RSG) paths: For Exchange 2003 RSG restores from snapshot backups, Exchange requires that the system path and the log path be the same.
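As one way to act on the VSS item in the list above, the output of vssadmin list writers can be saved to a text file and scanned for writers that are not healthy. The sketch below assumes the usual layout of that output ("Writer name:", "State:" and "Last error:" lines per writer); the sample text is illustrative only and was not captured from a real server, and the function name find_unhealthy_writers is made up for this example.

```python
# Illustrative sample only; real output also contains writer IDs
# and may list many more writers.
SAMPLE_OUTPUT = """\
Writer name: 'Microsoft Exchange Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'System Writer'
   State: [9] Failed
   Last error: Timed out
"""


def find_unhealthy_writers(text: str) -> list:
    """Return writers whose state is not Stable or whose last error is set."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            current = {"name": name, "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return [
        w for w in writers
        if "Stable" not in w["state"] or w["last_error"] != "No error"
    ]


if __name__ == "__main__":
    for writer in find_unhealthy_writers(SAMPLE_OUTPUT):
        print(writer["name"], "|", writer["state"], "|", writer["last_error"])
```

Any writer reported by this check should be investigated on the Exchange server before the backup is retried.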
Additionally, within NetBackup there are several items that we recommend checking because we have seen a number of failures result from them:
Service account mailboxes: If you have a NetBackup service account configured, you will need a mailbox associated with that account. The mailbox must not be hidden from the Global Address List in Exchange and must be initialized by having sent and received e-mail.
Cluster hostname usage: When backing up or restoring an Exchange cluster, use the Exchange virtual cluster hostname rather than the physical node name of the Exchange server.
Snapshot method: Exchange 2010 backup policies must be configured for VSS snapshots. Failure to do so will result in a status code 72.
Offhost Backup Operating Version: When configuring the NetBackup Exchange policy for an off-host backup, make sure that the off-host is the same Windows version and hardware platform as the Exchange mailbox server being backed up.
Restores to an Exchange cluster: Restores to an Exchange cluster should use a destination client name of the Exchange cluster virtual host name, not the physical node name of the server. Because the Destination Client field cannot be specified on a restore from a client Backup, Archive, Restore (BAR) GUI, restores to an Exchange cluster must be done from the master server’s BAR GUI or a Remote Administration Console host.
Roll-Forward Recovery: When restoring from a full backup that occurred BEFORE another Exchange full backup had been done, a gap will exist in the transaction logs because they were truncated after the second full backup. This gap will cause the Roll-Forward Recovery to fail, so a Roll-Forward should not be attempted.
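The Roll-Forward Recovery rule in the last item can be expressed as a simple date comparison. This is a simplified sketch, assuming that every later full backup truncated the transaction logs; the function name roll_forward_is_safe and the dates are hypothetical.

```python
from datetime import datetime
from typing import Iterable


def roll_forward_is_safe(restored_full: datetime,
                         full_backups: Iterable[datetime]) -> bool:
    """Return True only if no full backup ran after the one being restored.

    A later full backup is assumed to have truncated the transaction logs,
    which leaves a gap that makes a roll-forward fail.
    """
    return not any(taken > restored_full for taken in full_backups)


if __name__ == "__main__":
    fulls = [datetime(2010, 5, 1, 22, 0), datetime(2010, 5, 8, 22, 0)]
    # Restoring the older full: a newer full exists, so do not roll forward.
    print(roll_forward_is_safe(fulls[0], fulls))
    # Restoring the most recent full: no gap is expected from this rule.
    print(roll_forward_is_safe(fulls[1], fulls))
```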
Differential Problem Analysis
In addition to checking the general environment, configuration and utilization, identifying differences between working and non-working backup and restore operations or differences in the success of the same operation over time can help isolate the source of the problem.
Answering the following questions will help in this process:
1. Did the failed operation work correctly in the past? If it did, what changes occurred between the time it worked and when it did not?
2. Is the problem intermittent? That is, does the operation sometimes succeed and sometimes fail and what differences exist between when it works and when it does not work (for instance, full backups always fail, but incremental always succeed, or backups always fail at 3pm but always work at 4pm)?
3. Are there any similar operations that fail in a similar manner (for instance, two separate clients have the same failure or a file system backup on the affected Exchange client is also failing)?
4. Are there any similar operations that succeed (for instance, does one Exchange server fail to back up, but all others back up successfully)?
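If job history has been exported to a list of records, the questions above can be explored by grouping the jobs on one attribute at a time and comparing failure rates. The following is a minimal sketch under stated assumptions: the record fields (client, schedule, status) and the sample data are hypothetical, and a status of 0 is treated as success.

```python
from collections import defaultdict


def failure_rate_by(jobs, key):
    """Return {attribute value: fraction of jobs that failed} for one key."""
    counts = defaultdict(lambda: [0, 0])  # value -> [failed, total]
    for job in jobs:
        bucket = counts[job[key]]
        bucket[1] += 1
        if job["status"] != 0:  # assumption: 0 means the job succeeded
            bucket[0] += 1
    return {value: failed / total for value, (failed, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical job history for two Exchange clients.
    jobs = [
        {"client": "exch1", "schedule": "full", "status": 13},
        {"client": "exch1", "schedule": "incremental", "status": 0},
        {"client": "exch2", "schedule": "full", "status": 13},
        {"client": "exch2", "schedule": "incremental", "status": 0},
    ]
    print(failure_rate_by(jobs, "schedule"))
    print(failure_rate_by(jobs, "client"))
```

An attribute whose values split cleanly into always-failing and always-succeeding groups (here, the schedule type rather than the client) is a good place to start looking for the difference.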
Logging and Research
Before turning on diagnostic logging within NetBackup, it is easiest to look at the basic information and error messages provided by NetBackup and Exchange. Review the job’s detailed status information in the Activity Monitor and the BAR GUI’s job Progress Log. The job’s detailed status can be found by double-clicking on the failed NetBackup Job in Activity Monitor. Look for errors, especially errors related to BEDS. Make note of the approximate time of the error as well as the exact error text and code (including the hexadecimal error code if it is a BEDS error).
The job Progress Log can be found in the NetBackup Backup, Archive and Restore (BAR) GUI. In the Windows BAR GUI it can be found by clicking the View Status button in the toolbar or found in the File menu. In the Java BAR GUI, the Progress Log can be found under the job’s Task Progress tab. The Progress Log will contain much information that is very similar to the job detailed status information, but will often contain more detail.
Because NetBackup interacts with Exchange through an Application Programming Interface (API), problems with backups or restores may occur within Exchange, but be observed as errors in the NetBackup job. Frequently, if an underlying Exchange issue occurs, it will be reported within the Windows Application Event Log or System Event Log. Thus, it is VERY important to review these logs in addition to the NetBackup job information.
The Windows Event Logs should be reviewed on all involved hosts, including all cluster nodes and the host used for any off-host backups. In the Application Event Log, relevant messages will mostly be seen coming from processes beginning with Exchange or ESE (which are both for Exchange errors) or VSS (for snapshot and replication errors). Review any warning or error messages occurring at or around the time of the NetBackup errors. Make note of the specific event IDs and error or warning text.
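If the event logs have been exported into records, the review described above amounts to a filter on severity, source and time. The sketch below is an assumption-laden illustration: the Event record, the 15-minute window, the sample events and their event IDs are all hypothetical, and the source prefixes are taken from this video, so the source names on your servers may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    time: datetime
    source: str
    level: str      # e.g. "Information", "Warning", "Error"
    event_id: int
    message: str


def relevant_events(events, failure_time,
                    window=timedelta(minutes=15),
                    prefixes=("Exchange", "ESE", "VSS")):
    """Keep warnings and errors from the given sources near the failure time."""
    return [
        e for e in events
        if e.level in ("Warning", "Error")
        and e.source.startswith(prefixes)
        and abs(e.time - failure_time) <= window
    ]


if __name__ == "__main__":
    failed_at = datetime(2010, 5, 8, 22, 30)
    # Hypothetical events; the IDs and messages are placeholders.
    events = [
        Event(datetime(2010, 5, 8, 22, 28), "VSS", "Error", 1001, "snapshot problem"),
        Event(datetime(2010, 5, 8, 22, 29), "ESE", "Information", 1002, "routine"),
        Event(datetime(2010, 5, 8, 18, 0), "ESE", "Error", 1003, "unrelated, hours earlier"),
        Event(datetime(2010, 5, 8, 22, 31), "Print", "Error", 1004, "other source"),
    ]
    for e in relevant_events(events, failed_at):
        print(e.time, e.source, e.event_id, e.message)
```

The same filter would need to be run against the logs from every involved host, including all cluster nodes and any off-host backup host.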
Once you have collected information about the errors showing in the NetBackup Activity Monitor and the Windows System Event Log and Application Event Log, please search within the NetBackup Support Website (http://www.symantec.com/business/support/index?page=landing&key=15143). Of special note when going to the NetBackup Support Website are the “Late Breaking News” bulletins that are on the NetBackup Support Website under “Known Issues”. The Late Breaking News bulletins have been created to provide updates on Documentation and Known Issues discovered post-release. These documents attempt to highlight the most common known issues and concerns reported by customers. Therefore, we strongly recommend reviewing these regularly to ensure awareness of issues that may affect you. These may be particularly relevant if you are encountering a problem shortly after upgrading NetBackup to a new version.
In addition to searching on the NetBackup Support Website, if you observe any Windows or Exchange warning or error messages in the Windows event logs, please use your preferred web search engine to search for information related to the cause of those errors.
If the problem cannot be resolved through the avenues discussed, additional higher-verbosity logging will need to be gathered to troubleshoot. For a discussion of how to enable logging, please see our Technote “Best Practice recommendations for enabling and gathering NetBackup logging” at this URL: http://www.symantec.com/docs/TECH47372
When troubleshooting Exchange issues, the specific logs enabled will vary by the type of operation being performed.
For all backups and restores, the following logs will typically be relevant:
· NetBackup master server: legacy logs bprd and bpdbm; Veritas Unified Logging (VxUL) logs nbjm and nbpem
· NetBackup media server: legacy logs bpbrm and bptm
For Backups of Exchange:
· Streaming Information Store and Legacy Mailbox/Public Folders
- Legacy log bpbkar on the NetBackup Client (can get large with Legacy Mailbox/Public Folders and high verbosity)
· Snapshot (including Exchange 2007 CCR and Exchange 2010 DAG)
- Legacy logs to set up on each NetBackup Client node if clustered:
- bpbkar (usually no bpbkar log on active node if passive node backup)
- bpfis (can get large with high verbosity and many transaction logs)
- beds (for Exchange 2007 and higher)
- Legacy log bpresolver for Exchange 2010, on the node that the DAG resides on
· Granular Mailbox/Public Folders
- legacy logs on the Exchange client: bpcd, bpbkar, nbfsd, and beds
- For Exchange 2003 and 2007 Exchange clients with NetBackup 6.5: legacy log ncf
- For Exchange 2003 and 2007 Exchange clients with NetBackup 7.0: VxUL log ncflbc
- For Exchange 2010 clients: VxUL log ncflbc
- On the media server: legacy log nbfsd
- Same as Snapshot but collected from both client and off-host
- If using VCS cluster, collect the vxvm log
For Restores of Exchange:
- Legacy log tar on the destination client
- Legacy log bpresolver for Exchange 2010, on the node that the DAG resides on
· Granular Mailbox/Public Folders
- Legacy logs bpcd, nbfsd, and beds
- For Exchange 2003 and 2007 with NetBackup 6.5: legacy log ncf
- For Exchange 2003 and 2007 with NetBackup 7.0: ncflbc (browse) and ncfgre (restore)
- For Exchange 2010: VxUL logs ncflbc (browse) and ncfgre (restore)
- On the media server: legacy log nbfsd
Reviewing NetBackup Logs
In the NetBackup logs, look for common strings that indicate an error or problem. While not the only indicators of a problem, messages that contain the strings shown here may be relevant:
Within Legacy logs: "ERR -", "FTL -", "<16>" or "<32>"
Within VxUL logs: "Error" or "Fail"
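A simple way to apply these strings is to scan each gathered log for lines that contain any of them. This is a minimal sketch: the function name suspicious_lines and the sample log lines are hypothetical and only loosely resemble real NetBackup log entries, and the marker lists contain only the strings named above.

```python
# Marker strings taken from this video.
LEGACY_MARKERS = ("ERR -", "FTL -", "<16>", "<32>")
VXUL_MARKERS = ("Error", "Fail")


def suspicious_lines(lines, markers):
    """Return (line number, text) for every line containing a marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if any(marker in line for marker in markers)
    ]


if __name__ == "__main__":
    # Hypothetical legacy log content for illustration only.
    sample = [
        "22:30:01 <2> example: starting backup\n",
        "22:30:05 <16> example: ERR - snapshot preparation failed\n",
        "22:30:06 <4> example: cleaning up\n",
    ]
    for number, text in suspicious_lines(sample, LEGACY_MARKERS):
        print(number, text)
```

To scan a real file, the open file object can be passed as the lines argument, using LEGACY_MARKERS for legacy logs and VXUL_MARKERS for VxUL logs. Matching lines are a starting point only; the lines just before them often show what led up to the error.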
If you’ve pursued the prior methods for troubleshooting your issue without success, we would recommend contacting NetBackup Support if you have a valid support contract.
If you have a Severity 1 (Emergency) problem, please call your local Symantec Support telephone number. However, if you have a Severity 2, 3 or 4 issue and have done research and troubleshooting, one of the best and fastest ways to convey the information you have gathered about your problem is through the MySupport website, available at http://www.symantec.com/business/support/index?page=cdlogin. Our MySymantec portal will allow you to electronically open a case and update the case information with details of the problem. Additionally, you can upload any logs that you have gathered. We will then work with you to resolve the problem.
The video can be viewed by clicking on the following link: www.symantec.com/tv/products/details.jsp