EDB Corruption Errors 228, 233, 234, 530

  • Question

  • Just recently we had an incident in which a couple of switches were rebooted unintentionally, which caused a backup process (Veeam) that uses VM snapshots (on VMware) to go haywire. I thought it was just the backup job that had been corrupted, but then, starting right after the last snapshot was removed, I began getting errors every 5 minutes: primarily event ID 233, but also 234, 228, and 530. Here is what they state:

    228: At '5/15/2015 11:42:26 AM', the copy of database 'DB' on this server encountered an error that couldn't be automatically repaired without running as a passive copy and failover was attempted. The error returned was "There is only one copy of this mailbox database (DB). Automatic recovery is not available.". For more information about the failure, consult the Event log on the server for "ExchangeStoreDb" events.

    233: At '5/15/2015 11:46:03 AM', database copy 'DB' on this server encountered an error. For more information, consult the Event log for "ExchangeStoreDb" or "MSExchangeRepl" events.

    234: At '5/15/2015 11:42:26 AM', the copy of database 'DB' on this server encountered a serious I/O error that may have affected all copies of the database. For information about the failure, consult the Event log on the server for "ExchangeStoreDb" or "MSExchangeRepl" events. All data should be immediately moved out of this database into a new database.

    530: Information Store (3468) DB: The database page read from the file "F:\DB\Database\DB.edb" at offset 238081900544 (0x000000376ec98000) (database page 7265682 (0x6EDD92)) for 32768 (0x00008000) bytes failed verification due to a lost flush detection timestamp mismatch. The read operation will fail with error -1119 (0xfffffba1).  If this condition persists, restore the database from a previous backup.  This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

    So I figured, well, I can just create a new database and migrate the mailboxes over, but many of them fail to migrate. For those mailboxes, I've tried running the PowerShell command to repair them (New-MailboxRepairRequest), but that fails too, with the following error:

    10049: Online integrity check for request 0ab17d2b-bd15-4161-b4df-0dfcfd16c4d6 failed with error -1119.
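    For reference, the commands involved were along these lines. This is only a sketch: the database name "DB2", server "MBX01", paths, and mailbox address are placeholders, and the corruption types listed are just the usual ones for an online repair on 2010 SP1.

        # Create a replacement database and move mailboxes into it
        # (names and paths below are placeholders)
        New-MailboxDatabase -Name "DB2" -Server "MBX01" -EdbFilePath "G:\DB2\Database\DB2.edb" -LogFolderPath "G:\DB2\Logs"
        Mount-Database -Identity "DB2"
        Get-Mailbox -Database "DB" | New-MoveRequest -TargetDatabase "DB2"

        # Online repair of a single mailbox (this is the request that fails with -1119)
        New-MailboxRepairRequest -Mailbox "user@domain.com" -CorruptionType SearchFolder,AggregateCounts,ProvisionedFolder,FolderView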

    Exporting to a PST file fails as well, and users report that archiving through Outlook fails once it reaches a corrupted folder.
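    (The export attempt was roughly the following; note that on 2010 SP1 this cmdlet requires the "Mailbox Import Export" role to be assigned first, and the account, mailbox, and UNC path here are placeholders:)

        # One-time role assignment, then export the mailbox to a PST on a share
        # (placeholder account, mailbox, and share path)
        New-ManagementRoleAssignment -Role "Mailbox Import Export" -User "admin"
        New-MailboxExportRequest -Mailbox "user@domain.com" -FilePath "\\server\share\user.pst"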

    I thought this was only happening to one of the databases, so we figured we'd migrate as many mailboxes as we could to a new drive and then announce data loss for the rest. Right now, we're copying the last good backup of the EDB and the log files to the drive, to mount in the old one's place in hopes that we can get away from the errors. Unfortunately, due to drive constraints, we were forced to enable circular logging on this database, but we're okay with the one to two days of data loss for that particular database.

    The disturbing part is that once we dismounted the corrupted database, we started receiving the same errors for two other databases... Fortunately, at least those aren't nearly as big, and they do not have circular logging enabled, so we might be able to do a full restore, assuming the log files are not corrupted. However, I am worried that there is a bigger problem, such as drive failure.
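    (For anyone in a similar spot, the checks and settings involved look roughly like this; the paths come from the 530 event above, and the database name is a placeholder:)

        # Take the suspect database offline and inspect its header;
        # a cleanly detached, healthy database reports "State: Clean Shutdown"
        Dismount-Database -Identity "DB" -Confirm:$false
        eseutil /mh "F:\DB\Database\DB.edb"

        # The circular-logging trade-off: caps log disk usage, but limits you
        # to restoring the last full backup (no log replay past it)
        Set-MailboxDatabase -Identity "DB" -CircularLoggingEnabled $true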

    I am wondering if anyone can offer some advice for this scenario, and I want to make sure that I am going down the right path by simply running the restore process for each DB that gets this error until we can move everything to new storage. I am on Exchange 2010 SP1, and we have been working hard over the last few months to get our environment ready for 2013 (we purchased new storage for that deployment). Sorry for the lengthy post, and please let me know if you need any further info from me.

    Thank you in advance for your time!


    • Edited by Scott_42 Friday, May 15, 2015 7:49 PM
    Friday, May 15, 2015 7:48 PM

Answers

  • Yes, it definitely sounds like a disk-related issue. It could be the actual disk, or even the Ethernet controller, driver, or firmware, or perhaps a latency issue caused by the switch; although, if you moved it to another drive and it resolved without changing anything else, then it's definitely something on that disk/LUN.

    Glad you were able to resolve it by moving it to another drive.

    Let me know if you discover the cause, as it's always good to have that knowledge for the future.

    Search, Recover, & Extract Mailboxes, Folders, & Email Items from Offline Exchange Mailbox and Public Folder EDB's and Live Exchange Servers or Import/Migrate direct from Offline EDB to Any Production Exchange Server, even cross version i.e. 2003 --> 2007 --> 2010 --> 2013 with Lucid8's DigiScope

    Monday, May 18, 2015 3:17 PM

All replies

  • UPDATE:

    We restored the last backup we had of the EDB for the one database in question. It replayed the few log files that we had, threw a BUNCH of errors (mostly 228, but also 203 and 474), and then, all of a sudden, everything went back to normal: no errors, no corrupted mailboxes, and I am even able to migrate the same mailboxes that failed earlier. It's been almost an hour since we mounted the EDB from backup, and the errors for the other EDBs that were reported as corrupted have also ceased. It's almost as if putting that EDB from backup back in place brought the drive back into consistency, and/or put all of Exchange back into consistency. I'll wait to celebrate until the 1 AM maintenance cycle runs and see if it kicks out more errors, but if anyone cares to elaborate or explain, that would be helpful. I am obviously not an Exchange expert nor a storage expert, so I am only making educated guesses at this point. Otherwise, if this remains stable, then perhaps we've bought ourselves enough time to finish our Exchange 2013 deployment...
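    (For the curious, the replay step amounts to something like the following; the "E00" log prefix is an assumption on my part, and the paths are the ones from the 530 event earlier:)

        # Soft recovery: replay the available transaction logs into the restored copy
        # ("E00" is the assumed log prefix; check the actual log file names)
        eseutil /r E00 /l "F:\DB\Logs" /d "F:\DB\Database"

        # Verify the result; the header should now report "State: Clean Shutdown"
        eseutil /mh "F:\DB\Database\DB.edb"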



    • Edited by Scott_42 Saturday, May 16, 2015 1:13 AM
    Saturday, May 16, 2015 1:10 AM
  • So very odd. I ran into this same issue with a client the other day; they were using Exchange 2010 as well. In summary, here is what took place:

    1. The customer had two Exchange servers with numerous DBs, and originally all DBs were on local storage; however, they were running low on disk space.

    2. Since they had an iSCSI array with lots of storage, they created new LUNs on the array for each box and then moved the EDBs and logs to the array on Saturday.

    3. All went well until Sunday night, when the stores started going down.

    4. They mucked with it until Monday, when I finally got involved, and upon examination there were numerous very serious errors, all pointing at the iSCSI array.

    5. We finally discovered that they were sharing the same IQN between both servers, which means they were stepping all over each other, i.e. both machines were trying to read from and write to the same LUN, which was causing all the corruption.

    6. Cutting to the chase: after shutting down all the DBs, making new LUNs, copying data off to a private LUN on one of the servers, and letting the second server retain ownership of the original LUN, we had two of the databases start squawking about "the copy of database 'DB' on this server encountered an error that couldn't be automatically repaired without running as a passive copy and failover was attempted". The weird part is that these were just standalone DBs, i.e. not part of a DAG and never had been.

    7. We tried the same things you did, to no avail. Finally the DBs cratered; luckily, however, we had backups of the damaged DBs, so we dial-toned the production DBs to start fresh and then used our DigiScope tool to open the copy of the damaged DB via our forensic mount option and restore all the data to the new, clean stores.

    So with that said, if your situation has any similar threads, you should at least make offline copies of any DBs that are squawking, in case you have to do a recovery, and also check your system event logs to ensure that the IP flip issue you discovered did not cause any disk-related errors.
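    As a quick sanity check, something along these lines will show whether each box presents a unique initiator IQN and will surface recent disk or iSCSI errors in the System log (a rough sketch; the provider filter below covers only the usual suspects, and Get-InitiatorPort requires Server 2012 or later):

        # Each server should report a *different* initiator IQN
        # (on older systems, check the Configuration tab of the iSCSI Initiator
        # control panel instead of using Get-InitiatorPort)
        (Get-InitiatorPort).NodeAddress

        # Look for recent error-level events from disk/iSCSI-related providers
        Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 2 } -MaxEvents 500 |
            Where-Object { $_.ProviderName -match 'disk|iScsiPrt|Ntfs' } |
            Format-Table TimeCreated, ProviderName, Id, Message -AutoSize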



    Saturday, May 16, 2015 1:37 AM
  • Yes, we have a very similar setup indeed. Luckily, the system log looks clean so far, but I am still suspicious of our SCSI drives. At least it looks like things are stable for the moment; I'll keep monitoring the application log, and we are making it a priority to get the new Exchange 2013 deployment completed and all of the mailboxes migrated as soon as possible. Otherwise, we may consider at least migrating the mailboxes in that database to a new one for the time being. I will also take your advice and try to get a copy of the EDB. I am very curious how your client managed to get duplicate IQNs, though.
    Saturday, May 16, 2015 2:23 AM
  • Yes, I would make a copy of any EDBs acting odd or giving errors. The customer ignored that advice at first, because they were more concerned with keeping users up, and they lost one of their DBs entirely; after that they listened, and we fixed the rest up.

    So the IQN issue was a purposeful mistake, i.e. the storage guy thought it was fine to share IQNs because he had read an article showing how to do this for Exchange. Well, when I asked him which article, it turned out to be about CLUSTERED servers, for which it is true (but we were not using clusters): you can do that for the quorum drive because, even though multiple servers have access, it's set up so that no more than one server will talk to that LUN at a time, i.e. the passive nodes just wait to take over if the active node fails. Anyway, in the end all of it was caused by a mistake...



    Saturday, May 16, 2015 3:46 AM
  • Well, we are certainly not ignoring that advice... In cases like this, I'd actually prefer some downtime to make sure that no new data gets lost on top of everything. I like that term, "purposeful mistake" :). I guess I can see how that could happen with the IQN, especially if you haven't done a lot of Windows clustering before.

    Well, as an update to my issue, I found some evidence that points to the drives. Although I still can't find anything specific in any of the event logs for the LUN, I had a mailbox fail to migrate out of one of my EDBs. Since it was one of our smaller EDBs, I decided to try moving the whole database to a different drive, and after that move, the mailbox migration succeeded. I can only assume that parts of the disk are becoming corrupted, or that Exchange is having a hard time reading its own EDB when it sits on certain drive sectors.
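    (The database move itself is a one-liner, though it dismounts the database for the duration of the file copy; the database name and drive paths here are placeholders:)

        # Relocate the EDB and logs to a different drive
        # (placeholder name/paths; the database stays offline during the copy)
        Move-DatabasePath -Identity "SmallDB" -EdbFilePath "G:\SmallDB\SmallDB.edb" -LogFolderPath "G:\SmallDB\Logs"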

    Monday, May 18, 2015 2:59 PM