Backup Failure - MrxSmb ID 50 - Delayed Write Failed
Every night, a small proportion of the SQL backups fail with multiple instances of the following error in the System Event log: MRxSmb Event ID 50. Different databases are affected each night. The symptoms are very similar to those described in KB890352. However, that article only applies to 32-bit Windows Server 2003 before SP2. Our servers have SP2, and the one logging this error is 64-bit.
Server A is a clustered SQL Server 2005, and is the source of the failing SQL Server backups. It is running 2003 R2 SP2 EE 64-bit.
Server B is the backup server. It is running 2003 R2 SP2 EE 32-bit.
Error in Server A's System Event Log:
"{Delayed Write Failed} Windows was unable to save all the data for the file \Device\LanmanRedirector. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.".
The data comes from SAN 1 attached to Server A. The SQL Server backup jobs back up the data across a 1Gb network to Server B, which has backup LUNs attached to SAN 2. Server A uses a UNC path to connect to Server B for backups.
The “delayed write failed” errors coincide with writing large backup files. These writes cause a very high Disk Queue Length (7 – 30) and 100% disk load to be reported for the backup LUN on Server B.
The MrxSMB Event ID 50 has a status code (the final 32-bit value in the event data) of c000020c. This translates as “Connection Disconnected”. This is the same status code described in KB890352.
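For anyone checking the same value in their own event data, here is a minimal Python sketch of how an NTSTATUS code such as c000020c breaks down into its bit fields. It assumes the value is read as a 32-bit word from the Event Viewer "Words" view; the function name `decode_ntstatus` and the one-entry lookup table are illustrative only, not part of any Microsoft tool.

```python
# Tiny hand-picked lookup table (not exhaustive); only the code from this thread.
KNOWN_STATUS = {
    0xC000020C: "STATUS_CONNECTION_DISCONNECTED",
}

# The top two bits of an NTSTATUS value give the severity.
SEVERITY = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}


def decode_ntstatus(hex_text):
    """Split a 32-bit NTSTATUS value (given as hex text) into its bit fields."""
    value = int(hex_text, 16) & 0xFFFFFFFF
    return {
        "value": f"0x{value:08X}",
        "severity": SEVERITY[value >> 30],        # bits 31-30
        "customer": bool((value >> 29) & 1),      # bit 29: customer-defined flag
        "facility": (value >> 16) & 0x0FFF,       # bits 27-16
        "code": value & 0xFFFF,                   # bits 15-0
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


if __name__ == "__main__":
    for field, content in decode_ntstatus("c000020c").items():
        print(f"{field}: {content}")
```

The severity field comes out as Error and the name as STATUS_CONNECTION_DISCONNECTED, which matches the “Connection Disconnected” reading above.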
Moving the backups so that they are saved to local disk on a third server, Server C, eliminates the Delayed Write Failed errors. However, Server C does not have enough free space to hold all of the backups.
Manual file copies of very large files between Server A and Server B using the UNC path do not cause the event log error. This implies that the problem is with SQL Server 2005's backup job process.
We have changed the network topology so that Server A and B are in the same VLAN.
All servers have all the latest BIOS, firmware and drivers.
AntiVirus is configured to exclude BAK, TRN, MDF and LDF files.
Any help would be appreciated!
All Replies
- By backing up to a network share (which is what I am interpreting you are doing), you're effectively using NAS. I assume you read the KB regarding SQL and NAS: http://support.microsoft.com/default.aspx/kb/304261. I have a feeling this is related to your problem, but if it is, the problem really isn't SQL Server. Copying files from A to B has nothing to do with how SQL interacts with NAS.
Or are you using a third party backup software program and not using SQL Server to generate backups?
Allan Hirt Blog: http://www.sqlha.com/blog Author: Pro SQL Server 2008 Failover Clustering (Apress - due out June, 2009)
- Thanks for your input, Allan. With reference to the article you mention, the database files are stored on a SAN local to the SQL Server. It is only the backup that traverses the network. This is a supported configuration. We are using the built-in SQL Server backup jobs, not a third-party tool.
- Hey Hexen, any luck on this? I am having a similar issue. I am replicating data over my WAN to a NAS device. It is a bit more complicated than your setup, but basically the same thing. I was wondering if you found a solution that might work for me. Thanks, Mark Breaux
- I am also having the same problem. Any help or feedback from your experience would be great. Thanks.