Error on a request to write a media location identifier during long backups to deduplication folders across a WAN
Problem
During a backup to a deduplication folder, when new media is required, Backup Exec reports an error stating that it could not write a media location identifier (file mark) to the media. This can happen when a job takes a long time to fill a piece of media or to complete (more than 30 minutes per piece of media). During this time, Backup Exec holds open a session that is not transmitting any data. Network devices along the route between the remote agent and the media server, such as routers and firewalls, can interpret this lack of traffic as an abandoned session and arbitrarily terminate it.
This problem typically occurs at approximately the same point in each run. If the job keeps failing after the same elapsed time (for example, about 1.5 hours into the job), this is a likely cause.
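The failure pattern above can be modeled with a short Python sketch. It assumes a hypothetical stateful device that drops any session idle for longer than a fixed timeout (3600 seconds here, chosen only for illustration); the function names `drop_time` and `add_keepalives` are also hypothetical. A session that carries data at the start and is next needed 5400 seconds (1.5 hours) later is dropped at the same point on every run, while periodic keep-alive traffic keeps every idle gap below the timeout.

```python
def drop_time(packet_times, idle_timeout):
    """Return when a device with the given idle timeout would drop the
    session, or None if no gap between packets exceeds the timeout.

    packet_times: sorted times (in seconds) at which traffic crosses the session.
    """
    last = packet_times[0]
    for t in packet_times[1:]:
        if t - last > idle_timeout:
            # The device removes the session once it has been idle this long.
            return last + idle_timeout
        last = t
    return None


def add_keepalives(packet_times, interval):
    """Insert a keep-alive packet whenever the session has been idle for
    `interval` seconds, mimicking OS- or application-level keep-alives."""
    out = [packet_times[0]]
    for t in packet_times[1:]:
        while t - out[-1] > interval:
            out.append(out[-1] + interval)
        out.append(t)
    return out


if __name__ == "__main__":
    session = [0, 5400]  # data at start, next use 1.5 hours later
    print(drop_time(session, 3600))                      # dropped at the hypothetical timeout
    print(drop_time(add_keepalives(session, 5), 3600))   # keep-alives every 5 s
```

Because the timeout is a property of the device rather than of the job, the drop lands at the same elapsed time on every run, which matches the symptom described above.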
Error Message
Storage device "Dedupe Folder:2" reported an error on a request to write a media location identifier (file mark). Error reported: A device attached to the system is not functioning.
Cause
Turning on the Debug Monitor for active debugging (SGMon) shows the following errors:
[5216] 09/09/10 13:34:22 ndmp_readit: Caught message on closed connection. Socket 0xc2c len 0xffffffff
[5216] 09/09/10 13:34:22 ndmp_readit: ErrorCode :: 10054 : An existing connection was forcibly closed by the remote host.
Windows Sockets error 10054 (WSAECONNRESET) indicates that one of Backup Exec's sessions from the remote agent to the media server was terminated. The error is not reported at the moment the session is dropped; it is reported the next time Backup Exec uses that session to communicate between the remote agent and the media server.
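To show what error 10054 looks like from the application side, the following Python sketch forces a reset on a local TCP connection. It is an illustration only, not Backup Exec's code: the peer stands in for the device that tears down the session by closing with `SO_LINGER` set to a zero timeout, which sends a TCP RST instead of a normal FIN. It assumes a Linux or macOS system, where `struct linger` is two C `int`s. The function name `reset_error_on_idle_connection` is hypothetical.

```python
import socket
import struct


def reset_error_on_idle_connection():
    """Open a local TCP connection, abort it from the far side, and return
    the exception the idle client sees when it next uses the socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # l_onoff=1, l_linger=0: close() discards the connection with a RST,
    # simulating a device that abruptly tears down an "abandoned" session.
    server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    server_side.close()

    try:
        client.recv(1)  # the next use of the idle session surfaces the error
    except ConnectionResetError as exc:
        return exc
    finally:
        client.close()
        listener.close()
    return None


if __name__ == "__main__":
    print(repr(reset_error_on_idle_connection()))
```

Python maps the reset to `ConnectionResetError` on every platform; on Windows the exception carries `winerror` 10054, the same code shown in the SGMon log.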
Note: Refer to the related article on how to use SGMon for debugging.
Solution
Note: As of Backup Exec 2010 R3 Hotfix 159965, Backup Exec sends its own keep-alive messages across deduplication connections. After this hotfix is applied, it should no longer be necessary to change the system-wide keep-alive settings below.
Windows can be configured to send keep-alive messages across connections that are otherwise inactive. Set the KeepAliveTime and KeepAliveInterval values to five seconds (5000 ms). Consult your Windows documentation for specific instructions on how to do this.
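If you prefer to prepare this change as a file, the following Python sketch builds the text of a `.reg` file for the two values named above. It assumes the values are `REG_DWORD` entries, expressed in milliseconds, under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters`; verify this against your Windows documentation before importing anything. The function name `keepalive_reg_file` is hypothetical.

```python
# Assumed location of the system-wide TCP/IP keep-alive settings.
TCPIP_PARAMS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def keepalive_reg_file(keepalive_time_ms=5000, keepalive_interval_ms=5000):
    """Return .reg file text setting KeepAliveTime and KeepAliveInterval
    (both in milliseconds) as REG_DWORD values."""
    for name, value in (("KeepAliveTime", keepalive_time_ms),
                        ("KeepAliveInterval", keepalive_interval_ms)):
        if not 0 < value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must fit in a non-zero DWORD, got {value}")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TCPIP_PARAMS}]",
        # .reg files write DWORDs as eight hex digits: 5000 ms -> 00001388
        f'"KeepAliveTime"=dword:{keepalive_time_ms:08x}',
        f'"KeepAliveInterval"=dword:{keepalive_interval_ms:08x}',
        "",
    ]
    return "\r\n".join(lines)  # .reg files use CRLF line endings


if __name__ == "__main__":
    print(keepalive_reg_file())
```

Write the result with, for example, `open("keepalive.reg", "w", encoding="utf-16", newline="")` so the CRLF line endings are kept as-is, and review the file before importing it. Changes to these TCP/IP parameters typically take effect only after the server is restarted.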
Note that while this has resolved the problem in most cases, in some situations intervening routers ignore the keep-alive messages and terminate the session after a set period of time anyway. If this is happening, the configuration of those routers must be changed to resolve the problem.
Applies To
This issue is most often associated with backups running over a slow link, such as a WAN.