Bogus ESE 474 -1018 Errors on Exchange 2010 SP3ru4 running on HyperV

  • Question

  • I have a one-month-old, fully patched virtual Server 2008 R2 + Exchange 2010 SP3 RU4 install (with 8 vCPUs and 32 GB) on a Server 2012 Hyper-V host system. It was migrated from Exchange 2003; everything is running well and backups are working fine.

    Last week, at 1:32 AM on Feb 1, I got the dreaded Exchange ESE Event ID 474 (Database Page Cache) on my mailbox database ("The read operation will fail with error -1018").

    I followed the steps in KB314917, took the database offline, and used eseutil to run some read-only checks. My original ESE error was:

    "Information Store (8996) Mailbox Database 2013122812: The database page read from the file "G:\Program Files\Microsoft\Exchange Server\V14\Mailbox\Mailbox Database 2013122812\Mailbox Database 2013122812.edb" at offset 4978507776 (0x0000000128be0000) (database page 151931 (0x2517B)) for 32768 (0x00008000) bytes failed verification due to a page checksum mismatch. The expected checksum was [2d3ed2c1ba3c60c3:3edac1250af8e68a:6c9e6c9e11033ced:770988f676e6ddd5] and the actual checksum was [2d26d2d94790c331:3edac1250af8e68a:57165716d727a497:770988f676e6ddd5]. The read operation will fail with error -1018 (0xfffffc06). If this condition persists then please restore the database from a previous backup. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem."
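
    As a sanity check on the numbers in that event, the byte offset, read size, and page number can be tied together with a little arithmetic. The sketch below is only an illustration: the "minus one" relationship is inferred from this one event (offset / page size = 151932, reported page 151931), and the function name is made up for the example.

```python
# Values copied from the event 474 text quoted in the post.
OFFSET_BYTES = 4978507776   # 0x0000000128be0000
PAGE_SIZE = 32768           # 0x00008000, the size of the failed read
REPORTED_PAGE = 151931      # 0x2517B


def page_from_offset(offset: int, page_size: int) -> int:
    """Map a page-aligned file offset to a database page number.

    Assumption (inferred from this event, not taken from any API):
    page N starts at byte (N + 1) * page_size, so N = offset / page_size - 1.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if offset % page_size != 0:
        raise ValueError("offset is not aligned to the page size")
    return offset // page_size - 1


if __name__ == "__main__":
    page = page_from_offset(OFFSET_BYTES, PAGE_SIZE)
    print(f"offset {OFFSET_BYTES:#x} -> page {page} ({page:#x})")
    print("matches event text:", page == REPORTED_PAGE)
```

    If the computed page matches the page named in the event, then the /p value passed to eseutil is at least pointing at the same region of the file that the failed read touched.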

    This indicates the issue was with page 151931, so I ran eseutil /M "Mailbox Database 2013122812.edb" /p151931 to dump that page, but it came back correct: all checksums were OK. Finally, I checked the entire database with eseutil /K "Mailbox Database 2013122812.edb" and eseutil /G "Mailbox Database 2013122812.edb", and the whole thing came back clean in both cases, which implies this is not a real error. I've had two more ESE 474 errors since then, and both times the bad pages came back clean, with no issues in the database (the Exchange server and my backups continue to work fine).
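
    Since the page later dumped clean, it may be worth looking at how the expected and actual checksums in the original event differ. The sketch below just splits the two bracketed values into their four colon-separated 64-bit words and XORs them. What each word represents is not stated in the event, so they are treated as opaque values, and the helper names are made up for the example.

```python
from typing import List, Tuple

# Checksums copied from the event 474 text quoted in the post.
EXPECTED = "2d3ed2c1ba3c60c3:3edac1250af8e68a:6c9e6c9e11033ced:770988f676e6ddd5"
ACTUAL = "2d26d2d94790c331:3edac1250af8e68a:57165716d727a497:770988f676e6ddd5"


def parse_words(text: str) -> List[int]:
    """Split a colon-separated checksum string into integer words."""
    return [int(word, 16) for word in text.split(":")]


def differing_words(expected: str, actual: str) -> List[Tuple[int, int]]:
    """Return (index, xor) for every word that differs between the two."""
    exp, act = parse_words(expected), parse_words(actual)
    if len(exp) != len(act):
        raise ValueError("checksums have different numbers of words")
    return [(i, e ^ a) for i, (e, a) in enumerate(zip(exp, act)) if e != a]


if __name__ == "__main__":
    for index, xor in differing_words(EXPECTED, ACTUAL):
        flipped = bin(xor).count("1")
        print(f"word {index}: xor={xor:016x} bits differing={flipped}")
```

    For this event the first and third words differ while the second and fourth match. That only shows which fields disagreed at the time of the read; it does not say whether the data on disk or the copy that was read was at fault.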

    It appears this is a bogus/false error. Any ideas why? As I'm under Hyper-V, I'm not sure who to contact as my "hardware vendor": the database resides on a fixed Hyper-V VHDX disk on an LSI SAS RAID6 array which reports no errors, and we had no power glitches, new updates, or anything else that could explain it. I had a look through the event logs on the Hyper-V host as well and saw nothing of interest.

    I am running ESET antivirus for Exchange, and it has exclusions for the database. I also turned off the real-time component after the 2nd error and still got the 3rd error, so it's not that.

    Looking through the logs around the time of the initial problem, the only thing out of the ordinary is that the system had cleaned up 10 deleted mailboxes the night before (which was not unexpected), and there were a bunch of 9863 events ("An invalid event history watermark has been detected by background cleanup. The watermark will be deleted.") in the logs around the time of the cleanup (which should be harmless?).

    Not sure what to do next.  Ideas?

    Thanks.


    -- Al

    Friday, February 7, 2014 8:35 PM

All replies

  • In my experience, a -1018 error is never false, and Exchange is being truthful that something at the storage level failed. If you can't readily identify the storage issue, be prepared to create a new database and move the mailboxes if this happens again, and ensure you have good backups.

    I would ensure that all the AntiVirus exclusions are set:

    http://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx

    more info

    http://support.microsoft.com/kb/314917

    Understanding and analyzing -1018, -1019, and -1022 Exchange database errors


    Twitter!: Please Note: My Posts are provided “AS IS” without warranty of any kind, either expressed or implied.


    Friday, February 7, 2014 8:55 PM
    Moderator
  • I believe I already addressed both of the links you provided. I verified the AV exclusions and even turned off real-time scanning for a bit. As I noted in my post, I used KB314917, fully checked the database, and found no errors. As ESE 474 -1018 events are supposed to be unrecoverable read errors, I don't see how it is possible that all the errors corrected themselves unless the errors themselves are not real. Other operations such as backups (both full and incremental) are running without issues. It is only these strange ESE events in the middle of the night that have me stumped (all Exchange overnight maintenance tasks complete normally, BTW).

    As I also stated, I don't see any storage issues, and since I am running as a VM under Hyper-V, I'm not sure where to go regarding "disk" issues anyway, as the disk is virtualized. My Hyper-V host shows no issues on its RAID arrays.

    Thanks.


    -- Al

    Friday, February 7, 2014 9:53 PM
  • You may want to consider opening a case with Microsoft to get to the bottom of this. They can look deeper at all the dumps and maybe point you in the right direction. As I mentioned, I rarely see -1018s anymore, but when I have seen them in the past, it was always a valid indicator of a problem somewhere.


    Friday, February 7, 2014 10:11 PM
    Moderator
  • I don't even know how to open a case; I'm just an end user here.

    My backup program lets me make a "sandbox restore" clone of my VM. I'm going to try an offline defrag on a clone (after rechecking the DB and making a backup) and see if that helps. I think that will clear the error counters anyway and maybe eliminate the nightly ExchangeStoreDB 233 errors, which I think are keyed off those counters?

    I will also do a full memory test on my Hyper-V server, but you'd think I'd have bigger issues if my ECC RAM was flaky.

    Not sure what else to do.

    Thanks.


    -- Al

    Tuesday, February 11, 2014 5:57 PM