Exchange 2010 Mailbox database corrupted

    Question

  • We experienced an unfortunate problem with the Hyper-V virtual hard drive files containing our Exchange server, which ended up corrupting our only Mailbox database store. That file was about 16GB, and is now reported as 0KB, with the system reporting extra free storage space. Exchange cannot mount the Mailbox database. I have run ESEUTIL to repair the database, but errors were reported.

    The problem started because our 16GB of e-mail was generating over 220GB of transaction logs, which filled up the hard drive. I attempted to increase the virtual hard drive size so I could run the server backup and enable circular logging to reduce the transaction logs, but I ended up expanding the drive while a snapshot existed. Long story short, the virtual hard drive has been merged with the snapshot and is finally operational; however, it appears that the Mailbox database store became corrupt in the process.

    I have all 220GB of transaction logs (current) as well as a full vhd backup and transaction logs from the November 6th backup. I need to restore the Mailbox database store somehow. One option I have thought about is to take the Mailbox database store from the backup and use the transaction logs from the current image to rebuild it. Another option is to somehow repair the corrupted database store and not use the backup. Unfortunately, I cannot seem to find any documentation for either.

    Any suggestions are welcomed.

    Thursday, January 27, 2011 11:14 PM

All replies

  • Take a copy of the current database file before you start.
    You then simply restore the database and the logs will replay on their own.

    I presume that you have been through the DR section of the help?

    http://technet.microsoft.com/en-us/library/dd876874.aspx

    Another option would be to repair the Exchange database. That is documented in numerous places as it hasn't changed since Exchange 2000. Once the database is successfully repaired, I would create a new database and migrate all content to it. I don't like to leave a repaired database in production.

    However the main advice I give in this kind of scenario is the same to everyone.

    Don't try it on your own. Call Microsoft support, pay their fee and get expert assistance. That will prove to be the most cost-effective thing in the long run and will increase the chances of a successful restore considerably. They can do the one thing I (or anyone else on a forum) cannot, which is physically look at the server and ensure that everything is in place for a successful restore.

    Simon.


    Simon Butler, Exchange MVP
    • Marked as answer by a.boell Thursday, February 03, 2011 5:00 AM
    Friday, January 28, 2011 12:06 AM
  • Simon,

    Thank you for your response. I have taken your suggestion to contact Microsoft on this issue. While the issue is not currently resolved, the support technician did have me place the database file from the backup into the server and attempt to have the logs replay on their own. Unfortunately, that did not work. His next suggestion was to completely remove the database file from the folder and have Exchange recreate the database from the logs, since all logs since the server was established have been kept. This too, unfortunately, did not work. His final suggestion was to create a new database and use the individual .ost and .pst files to repopulate the missing data.

    My problem with the last solution is that not all of my users use a dedicated machine, so they would not have an .ost file available. What I am attempting now: I have turned off my mail server and fired up my backed up copy of my mail server from November (while disabling the ability to receive external messages), just to see how the interaction between Outlook on my desktop and the Exchange server will work. Currently, my Outlook has populated 6 messages to the Exchange server. I'm sure it will eventually work, but it appears to take time. While I am still at risk of losing 20% of my staff's e-mails from the last 3 months, this is far better than relying only on the .ost files from those with Outlook.

    Given the success of firing up the backup (minimal success, but right now I'll take anything), I wonder if I can inject the transaction logs from the current mail server to the backup to have it replay those.  Just a thought.

    Thanks,

    Andy

    Saturday, January 29, 2011 12:29 AM
  • I decided to explore my idea of taking the logs and copying them to the backed up server copy (after completely backing up my progress thus far). When I restarted the server, the database was not mounted and wouldn't mount with these extra logs in place. At this point I attempted to run various commands from ESEUTIL just in case the holdup was something small. That didn't seem to work. Afterwards, I attempted to follow the Microsoft tech's second suggestion of removing the .edb file to see whether the server would replay the transaction logs. That did not work. Then as a last resort, I went back to ESEUTIL and ran the recovery mode with a .edb file in place, and it appears to be recreating the database from scratch.

    I did try this on my other server; however, it didn't work there. It really makes me wonder if there was more corrupted on that server than just the mailbox database. The fact that it hasn't failed on me yet is remotely encouraging. Unfortunately, this recovery mode has been running for about 10 hours and appears to be only about 16% completed. At this point, I'm going to allow this process to finish, as I feel this is the only possible way I'm going to experience a minimal loss of data. If this doesn't work I plan to restore my backed up copy from November and allow the .ost and .pst files to repopulate it and deal with the mail loss for everyone else...that is unless anyone else has any other solutions.

    My concern now is what will happen to the messages that are being sent to me from outside.   All of our e-mail originates from a single location (an e-mail spam filter service that typically queues the e-mails for us when we're down).  I've noticed previously that 4 days seems to be about the max before messages completely die off.  I'm just not sure if there is anything I can do to prevent loss from these messages as we are approaching 3 days without being online.

    I will post the results of this scan back here and will check for other suggestions. 

    Thanks.

    • Marked as answer by a.boell Thursday, February 03, 2011 5:00 AM
    Saturday, January 29, 2011 2:17 PM
  • If the antispam service only queues for four days, then I would be looking for another antispam service. This would be one of the key features that I would expect from a service like that. Four days isn't much more than regular MX record delivery.

    The easiest way to get the email in to somewhere under your control is to simply build a Windows 2003 server and install IIS and the SMTP component on it. Set up each domain that your Exchange server is responsible for in SMTP, and set it to allow relaying and to use a smart host. Enter the smart host as an internal IP address that is not in use (so the email queues). Change the timeouts on the SMTP server so email isn't rejected. Your email will then come in and queue on this server. When you have Exchange running again, change the smart host to the Exchange server and then it will deliver email correctly. Basically an SMTP gateway approach. http://www.amset.info/exchange/gateway.asp

    It did surprise me that you took my advice to call Microsoft, as many don't. They come to forums to try and save the fee. However what you went through with Microsoft would have taken hours to go through on a forum, and with the best will in the world, that isn't something that should really be done for "free" on a forum.

    The Exchange logs are very sensitive, and you have to get everything just right for them to replay correctly. However it would appear that you eventually have everything in place. It isn't quick, but then you have a lot of logs to process. When you are doing things like offline defrag, the rough rule of thumb is at max 4 GB per hour. If you have done 16% of 220 GB in 10 hours, then you are going along at around 3.5 GB an hour, so you aren't far off the maximum rate at which the database can process content. I would leave it to run; you may well find you have most things back.
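
    The throughput arithmetic above can be sketched in a few lines of Python. This is only a rough estimate: it assumes the replay rate stays constant for the whole run, which may not hold in practice, and the function name is made up for illustration.

```python
def replay_estimate(total_gb, fraction_done, hours_elapsed):
    """Return (rate in GB/hour, hours remaining), assuming a constant rate."""
    done_gb = total_gb * fraction_done
    rate = done_gb / hours_elapsed
    remaining_hours = (total_gb - done_gb) / rate
    return rate, remaining_hours


# Figures from this thread: 16% of 220 GB processed in 10 hours.
rate, remaining = replay_estimate(total_gb=220, fraction_done=0.16, hours_elapsed=10)
print(f"{rate:.2f} GB/hour, about {remaining:.1f} hours remaining")
# prints "3.52 GB/hour, about 52.5 hours remaining"
```

    At that rate the whole run would take a little over 60 hours, which is why leaving it alone is the sensible option.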

    Mailbox database corruption is quite rare and is often caused by external influences, such as faulty hardware. Therefore it is quite reasonable to expect that if there is corruption in the database, there could be problems elsewhere. The server that was the original home for Exchange should probably be checked carefully to see if there are problems being reported - particularly with the RAID card or hard disks. That could point to where a problem occurred.

    Simon.


    Simon Butler, Exchange MVP
    Saturday, January 29, 2011 2:32 PM
  • At the point I was at, contacting Microsoft made the most sense.  Interestingly enough I was willing to pay for their assistance but I contacted the Microsoft licensing department and it appears that our licensing includes a TechNet Plus Direct benefit.  I'm a one-man operation and didn't have an opportunity to get through all the fine details of my Microsoft licensing; lesson learned.  I'm not sure exactly what that benefit includes, but that support call may be included at no cost, we'll have to see.

    I will be contacting my anti-spam service to see exactly how long they will queue our e-mails. I will also look into the SMTP gateway idea as any loss of e-mail for any reason gets under my skin. Even if I can completely restore my database with this latest recovery attempt, I still stand to lose some incoming e-mails. Given the circumstances, that's a loss I'm happy to deal with, but if there are ways of preventing even that from occurring, I'm all over it.

    It also turns out our local mail archiving was misconfigured, but for the good.  Our company policy is to archive e-mails for 30 days.  While I thought I set it up to do so, I messed up, so I have all messages from last July through January 23rd archived.  There must have been some disruption between the archive server and the Exchange server the afternoon of Jan 23rd as no messages beyond then have been archived, but that will allow me to give everyone back messages through January 23rd, even if the recovery process fails and I have to use the November backup.

    I thank you for your assistance as you obviously know what you're talking about.  I will post the results of the recovery process.

    Saturday, January 29, 2011 4:25 PM
  • In looking at the process I've gone through so far, I am wondering why the recovery didn't work when I had the backed up database with the current logs.  The more I think about it, I am a little scared to throw away these logs to save space just in case something like this happens again.  On the other hand, having our small school generate over 30 GB of log files per month is not only going to eat up a ton of space, but if I have to rebuild it again, say a year from now, I'm looking to be down for well over a week since the rebuild process can only process 4GB/hr.

    Is the recovery process intended to take a backed up database copy and recover it by using the current logs? If so, what is the process to do that and where did I mess up? If I just missed a step here, I'm fine with that, but if I did things correctly and it still failed, I'm a little concerned about how to prepare for the future.

    Any thoughts on this?

    Andy

    Saturday, January 29, 2011 7:16 PM
  • Normally, when a full backup of a database is taken, the transaction logs already committed to the database are flushed automatically.

    Example...

    If you perform a full EDB backup on Sunday @ 12PM, then log files already committed to the EDB are flushed (deleted) from disk. If the EDB file is somehow corrupted or the disk is lost on Tuesday @ 2PM, you restore the EDB file from the Sunday backup and replay the logs generated from Sun @ 12PM through Tue @ 2PM, and have essentially zero lost data.

    This is why if you only have one copy of data we suggest the EDB file and Transaction Log files are stored on different spindles. If the EDB drive is lost you can restore from backup and replay logs and not lose anything. If the Transaction Log drive is lost you only lose the data that was not yet committed to the EDB (which is usually minimal).
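
    One way to picture the "restore from backup, then replay logs" step is as a continuity check: the replay can only roll forward if every log generation from the backup point onward is present. The Python below is a simplified sketch of that idea only. The generation numbers and the function name are hypothetical, and a real replay checks more than numbering.

```python
def missing_generations(first_needed, last_needed, available):
    """Return the log generations in [first_needed, last_needed] that are absent."""
    have = set(available)
    return [g for g in range(first_needed, last_needed + 1) if g not in have]


# Hypothetical generation numbers, for illustration only.
logs_on_disk = [0x1A, 0x1B, 0x1D, 0x1E]
gaps = missing_generations(0x1A, 0x1E, logs_on_disk)
print([hex(g) for g in gaps])  # prints ['0x1c']
```

    An empty result would mean the sequence is unbroken in this simplified model; any gap means the roll forward cannot get past that point.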

    I would suggest the following reading material.

    Understanding the Exchange 2010 Store: http://technet.microsoft.com/en-us/library/bb331958.aspx

    Understanding Backup, Restore, and Disaster Recovery in 2010: http://technet.microsoft.com/en-us/library/dd876874.aspx

     


    Microsoft Premier Field Engineer, Exchange
    MCSA 2000/2003, CCNA
    MCITP: Enterprise Messaging Administrator 2010
    Former Microsoft MVP, Exchange Server
    My posts are provided “AS IS” with no guarantees, no warranties, and they confer no rights.
    Saturday, January 29, 2011 7:31 PM
  • Well that's what I thought.  Hmmm, wonder why it didn't work in my case.  I guess the difference is that I didn't use the Windows Server Backup program for the backup; I just did a file copy when the virtual machine was powered off.  So when you use the Windows Server Backup program and the transaction logs are flushed, does the numbering restart?  If so, then that was my problem and it would make sense why it didn't work.

    Thank you for those links. I have read most of the material while waiting for files to copy over the last few days. It's unfortunate I didn't familiarize myself with this information before this incident happened.

    We're at around 22% completed...only time will tell.

    Thanks,

    Andy

    Saturday, January 29, 2011 7:46 PM
  • So when you use the Windows Server Backup program and the transaction logs are flushed, does the numbering restart?  If so, then that was my problem and it would make sense why it didn't work.


    VSS backups are the only supported 2010 backups, so a file copy of a virtual machine or the EDB file itself wouldn't be considered a true backup in the sense of Exchange. Lesson learned, we all go through it at some point, no worries... you're on a good path to recovery now. :)

    The numbering doesn't restart; the old logs just get flushed and new logs with higher #s keep getting generated. Let's say you do backups once a week. Then let's say you try to do a restore from a week ago and find out the tape with the EDB file is somehow bad and can't be restored, but you can restore the log files from that time period. Now you have to go back two weeks, restore the two-week-old database, and play two weeks' worth of logs into it. If we reset the numbering after each full backup you wouldn't be able to take that kind of drastic measure.

    There are 68,719,476,735 log files that can be created for each log stream before a reset is needed. If you look at the log file names, they'll be in a format like E##, then numbers and letters with .LOG at the end. The E## is the log generation stream; each DB gets one, so there will be log streams with E00, E01, E02, etc... when there are >1 DBs. The other characters after E## are in hex form, so plug them into a hex-to-decimal converter and you'll see which # log file it actually is. If you ever run the risk of running out, where all the values are getting close to E##FFFFFFFFF.Log, you'll get a warning in the event log, and then there is a process you can use to reset the numbering back to zero and start over.
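
    The hex-to-decimal step can also be done with a short Python sketch rather than a converter. It assumes only the naming pattern described above (E## followed by hex digits and .log). The file name in the example is made up, and the function name is hypothetical.

```python
import re

# E## stream prefix, then the generation number in hex, then .log
LOG_NAME = re.compile(r"^(E[0-9A-Fa-f]{2})([0-9A-Fa-f]+)\.log$", re.IGNORECASE)


def parse_log_name(name):
    """Split a log file name into (stream prefix, generation as a decimal int)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a numbered transaction log name: {name!r}")
    return match.group(1).upper(), int(match.group(2), 16)


# Made-up file name, for illustration only.
print(parse_log_name("E000001A2F3.log"))  # prints ('E00', 107251)

# Nine hex F digits give the maximum quoted above.
print(int("F" * 9, 16))  # prints 68719476735
```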


    Microsoft Premier Field Engineer, Exchange
    MCSA 2000/2003, CCNA
    MCITP: Enterprise Messaging Administrator 2010
    Former Microsoft MVP, Exchange Server
    My posts are provided “AS IS” with no guarantees, no warranties, and they confer no rights.
    Saturday, January 29, 2011 8:05 PM
  •  Another update.

    After waiting 78 hours for the repair to complete, it terminated with error -515 at about 96%.  I attempted to mount it in EMC and that failed.  I then ran eseutil /mh and it reported that it was in a clean shutdown state.  So I ran eseutil /p and the database was successfully repaired.  Then I ran eseutil /d and it was successfully defragmented.  However, it still would not mount in EMC.
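
    For anyone repeating the eseutil /mh check, the field that matters is the State line in the header dump. Below is a minimal Python sketch for pulling it out of captured output. The sample text is an abbreviated, illustrative excerpt rather than a real dump, and the function name is hypothetical.

```python
import re


def database_state(header_dump):
    """Return the value of the 'State:' line from a header dump, or None."""
    match = re.search(r"^[ \t]*State:[ \t]*(.+?)[ \t]*$", header_dump, re.MULTILINE)
    return match.group(1) if match else None


# Abbreviated, illustrative excerpt; a real header dump has many more fields.
sample = """
            State: Clean Shutdown
"""
print(database_state(sample))  # prints Clean Shutdown
```

    A database in Clean Shutdown does not need its transaction logs in order to mount, whereas Dirty Shutdown means logs still have to be replayed first.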

    As I sat there contemplating any options other than going back to my November backup, it occurred to me that the last 4 days were spent waiting for a new database to be created. The database itself was reported to be in the proper state; there were just some errors with the transaction logs, and not many of them comparatively speaking. So I deleted all the transaction logs (they were backed up, just in case) and attempted to mount the database...and it worked! I logged into OWA and things appeared as they were on Wednesday.

    In reviewing the conversations on this thread:

    Another option would be to repair the Exchange database. That is documented in numerous places as it hasn't changed since Exchange 2000. Once the database is successfully repaired, I would create a new database and migrate all content to it. I don't like to leave a repaired database in production.

    How can I perform the migration properly?

    Once I get that done, I noticed that Exchange 2010 SP1 is available and it seems like I should upgrade to that, but I'm not sure when it is safe to do so. Any suggestions? And of course, I will be creating a backup using Windows Server Backup to prevent this issue from happening again. That will probably be my last step of those listed, unless others suggest another option.

    Thanks,

    Andy

     

    Tuesday, February 01, 2011 6:19 PM
  • If you have repaired a database, then you create a new database.
    Then use move mailbox to move the data to the new database. Once all mailboxes have been moved you can remove the database. At that point immediately restart the information store service so that any system mailboxes are recreated. You then have a clean database.

    I would install Exchange 2010 SP1 as soon as you are confident the server and data are stable.

    Simon.


    Simon Butler, Exchange MVP
    Tuesday, February 01, 2011 9:06 PM
  • Hi Andy,

    Any update for your issue?

    Regards!
    Gavin
    Wednesday, February 02, 2011 9:00 AM
    Moderator
  • Yesterday I moved the data to a new database. I also attempted to perform a backup using Windows Server Backup, but it stopped with an error. I think it errored out because, before the backup of the Exchange server had completed, I began to run a backup on the Hyper-V server where the Exchange server was running. I suspect there was a conflict in trying to back up a vhd while the server on that vhd was itself performing a backup. I've tried it again this morning, but when I open the backup program it just says reading data; please wait. I know this isn't the correct forum to post questions about it, but other threads that I've read just say give it time and it will complete.

    Once I can get the backup completed, hopefully I will see the transaction logs created since yesterday reduce.  Then when I feel that is functioning as expected, I will upgrade to SP1.

    I'll report back in once I can get the backup to complete and hopefully the transaction logs will reduce in size.

    Thanks,

    Andy

    Wednesday, February 02, 2011 2:49 PM