Diagnosing Repeated CHKDSK Errors


  • Hello all!

    We have a variety of servers running Windows Server 2003 Standard SP2. On two -- and only two -- of them, chkdsk repeatedly finds errors in the file system.

    We have an automated script that runs a read-only chkdsk on each of our servers daily as a precaution, because we experienced file system corruption once in the past. On most of our servers it routinely finds no problems. On these two, however, it repeatedly finds errors. When it does, we take the server offline, run a full chkdsk that is allowed to fix the errors, and then run another chkdsk to verify that the errors are gone. That check always comes back "clean," but within a couple of days the errors return.
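
The script itself isn't shown in this thread. As a minimal sketch, assuming a Python 3 interpreter is available to drive it, a daily read-only check could keep chkdsk's full output and branch on its exit code. The exit-code meanings below are taken from Microsoft's chkdsk documentation and should be checked against the version on your servers; the function names are hypothetical. The optional `/v` switch, which that documentation describes as displaying cleanup messages on NTFS, may add some detail to the output.

```python
import subprocess
from typing import List, Tuple

# Exit codes as listed in Microsoft's chkdsk documentation
# (verify against the documentation for your Windows version).
CHKDSK_EXIT_CODES = {
    0: "no errors found",
    1: "errors found and fixed",
    2: "disk cleanup performed, or not performed because /f was not specified",
    3: ("could not check the disk, errors could not be fixed, "
        "or errors were not fixed because /f was not specified"),
}


def build_command(volume: str = "C:", verbose: bool = False) -> List[str]:
    """Build a read-only chkdsk command line (no /f, so nothing is changed)."""
    cmd = ["chkdsk", volume]
    if verbose:
        cmd.append("/v")  # on NTFS, documented as displaying cleanup messages
    return cmd


def describe(exit_code: int) -> str:
    """Translate a chkdsk exit code into its documented meaning."""
    return CHKDSK_EXIT_CODES.get(exit_code, "unknown exit code %d" % exit_code)


def needs_attention(exit_code: int) -> bool:
    """In read-only mode, any nonzero exit code deserves a closer look."""
    return exit_code != 0


def run_readonly_check(volume: str = "C:", verbose: bool = False) -> Tuple[int, str]:
    """Run chkdsk read-only (Windows only) and return (exit_code, full output)."""
    result = subprocess.run(
        build_command(volume, verbose),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold any error text into the saved output
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

For example, a scheduled job could call `run_readonly_check("C:")`, archive the returned output with a date stamp, and raise an alert whenever `needs_attention(code)` is true. Keeping the full text of every run, rather than just pass/fail, makes it possible to compare from day to day which messages recur. The same exit-code check also fits in a batch file that tests `%ERRORLEVEL%` after running chkdsk.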

    I have run extensive hardware diagnostics to rule out hardware failures that could be causing these problems, and they all report clean. In addition, because two separate servers are affected, I think it unlikely that both are suffering simultaneous hardware failures that are going undetected.

    These two servers do have one thing in common, however: both are involved in a nightly data processing job that writes a great deal of data to them. That leads me to suspect that something during this job is corrupting the file system. But with only a generic "errors were found" message from chkdsk, it's hard to know where to start looking.

    Can anyone offer advice on how to pry more information out of chkdsk about the specific files, folders, etc. that are showing errors or inconsistencies? Or perhaps recommend a third-party tool that could provide more detail on problems in the file system? (These are NTFS volumes, BTW, and we are not doing surface scans for bad sectors during this process.)

    Does anyone have any more general thoughts on possible causes for repeated errors such as these?

    Thanks so much!

    - Tom
    Wednesday, November 18, 2009 10:00 PM


All replies

  • Additional Information:

    I dug up the following Event Viewer entry from the last time we actually took the server down and ran a full chkdsk /f on it. Does this information shed any light on the source of the problems? The hexadecimal file identifiers don't really help me much in determining where the problem lies.


    Event Type:    Information
    Event Source:    Winlogon
    Event Category:    None
    Event ID:    1001
    Date:        10/26/2009
    Time:        11:49:56 PM
    User:        N/A
    Computer:    STEWIE
    Checking file system on C:
    The type of the file system is NTFS.

    A disk check has been scheduled.
    Windows will now check the disk.                        
    Cleaning up instance tags for file 0xda175.
    Cleaning up instance tags for file 0x108d06.
    Cleaning up instance tags for file 0x17b0a4.
    Cleaning up instance tags for file 0x17d3b1.
    Cleaning up minor inconsistencies on the drive.
    Cleaning up 20 unused index entries from index $SII of file 0x9.
    Cleaning up 20 unused index entries from index $SDH of file 0x9.
    Cleaning up 20 unused security descriptors.
    CHKDSK is verifying Usn Journal...
    Usn Journal verification completed.
    CHKDSK discovered free space marked as allocated in the
    master file table (MFT) bitmap.
    Windows has made corrections to the file system.

     104856223 KB total disk space.
      89527512 KB in 1636225 files.
        642248 KB in 51235 indexes.
             0 KB in bad sectors.
       1797055 KB in use by the system.
         65536 KB occupied by the log file.
      12889408 KB available on disk.

          4096 bytes in each allocation unit.
      26214055 total allocation units on disk.
       3222352 allocation units available on disk.

    Internal Info:
    e0 d3 19 00 b0 bf 19 00 81 c0 28 00 00 00 00 00  ..........(.....
    c9 01 00 00 02 00 00 00 5e 03 00 00 00 00 00 00  ........^.......
    d6 59 78 7f 00 00 00 00 a6 8c a1 a0 03 00 00 00  .Yx.............
    9c 4b ab 76 00 00 00 00 00 00 00 00 00 00 00 00  .K.v............
    00 00 00 00 00 00 00 00 fa 41 19 a1 04 00 00 00  .........A......
    10 b0 0f 8f 00 00 00 00 3d 9f 82 7c 18 00 00 00  ........=..|....
    81 f7 18 00 00 00 00 00 00 60 53 58 15 00 00 00  .........`SX....

    Windows has finished checking your disk.
    Please wait while your computer restarts.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
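
Assuming the "file 0x…" values in these messages are MFT file record numbers (consistent with file 0x9 owning the $SII and $SDH indexes, since record 9 is the $Secure metafile on NTFS), one way to make them more useful is to extract them from each saved chkdsk log, convert them to decimal, and check whether the same records turn up run after run. The minimal Python sketch below does that; the function names are hypothetical, and mapping a record number back to a path still needs a separate tool not shown here.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches messages that reference "file 0x<hex>", e.g.
# "Cleaning up instance tags for file 0xda175."
# Assumption: the hex value is the file's MFT record number.
FILE_REF = re.compile(r"\bfile 0x([0-9a-f]+)\b", re.IGNORECASE)


def file_records(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (message, decimal record number) for each file reference found."""
    found = []
    for line in log_lines:
        line = line.strip()
        for match in FILE_REF.finditer(line):
            found.append((line, int(match.group(1), 16)))
    return found


def recurring(runs: Iterable[Iterable[str]]) -> Counter:
    """Count how many separate chkdsk runs mention each record number."""
    counts = Counter()
    for run in runs:
        # A set, so a record mentioned twice in one run is counted once.
        counts.update({record for _, record in file_records(run)})
    return counts
```

Fed the lines of the log above, it reports record 893301 for file 0xda175 and record 9 for file 0x9. Records that show a high count in `recurring(...)` across several days' logs would be the natural place to start comparing against the files the nightly job writes.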

    Wednesday, November 18, 2009 10:38 PM
  • Check this out:

    Leonardo Fagundes
    Thursday, November 19, 2009 5:42 PM
  • Thanks for the pointer!

    However, in this case, we are not seeing the specific errors referenced in that KB article. Plus, we have run chkdsk /f repairs repeatedly on the two servers and that seems to repair it for a time, but then the issue comes back.
    Thursday, November 19, 2009 7:21 PM
    • Proposed as answer by David Shen Friday, November 20, 2009 1:50 PM
    • Marked as answer by David Shen Sunday, November 22, 2009 5:39 AM
    Friday, November 20, 2009 1:42 PM
  • Thanks for that KB article pointer as well!

    In this particular case, however, I'm not certain that issue applies. First, there are two servers involved, and one of them definitely does not fit the conditions outlined in the KB article: it has neither an MFT larger than 4 GB nor more than ~4 million files. Second, the errors we are seeing include some beyond the ones covered by that KB article. For example, the "cleaning up instance tags for file xxx" errors are not addressed.

    Do you still think that issue could apply and that I should install the hotfix?

    Thanks again!

    - Tom
    Wednesday, November 25, 2009 3:01 PM
  • I am not sure whether it applies, and I could not find any other KB articles or references for the errors you are showing.

    The way I see it, you have two options left:
    1) Open a case with Microsoft and work through it with them.
    2) Make a system-state backup, verify the backup, and apply this hotfix.

    Either way, I am not sure I can help you any further than this hotfix.

    Thursday, November 26, 2009 8:05 PM