A disk read error occurred - Musings about an Achilles heel

  • General discussion

  • Well, my laptop died the other day with:-

    A Disk read error occurred

    Press Ctrl+Alt+Del to restart

    So, I dived into the ether to find the solution.  I was amazed at just how many people had this problem and at the huge number of solutions, partial solutions and red herrings on this topic.  I tried all the usual 'fixes': running FIXBOOT, running this, running that, mucking around with this, fiddling with that.  All to no avail.

    What did seem to work was reinstalling the Operating System from scratch.  But I didn't want that.  I'd spent hours configuring my desktop as I liked it and installing tools and handy applications, and I really didn't want to spend loads of time setting everything up again.  So, I decided to delve deeper into the issue and try to find out what the root problem was.  This is a blogette of what I did and my findings....

     

    My first port of call was to find out where the error message was coming from.  BIOS?  OS?  What?  To do this I had to find out just how a modern PC boots up - how does a computer start?  On startup, after doing its self-tests and looking in the BIOS settings to see which disk to boot from, the BIOS loads the first 512 bytes from the boot disk (located at Cylinder 0, Head 0, Sector 1).  Once these are loaded into RAM it starts executing the bytes as machine code.
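
    As a rough illustration of that first step, here is a minimal sketch that reads the first 512-byte sector from a disk image and checks for the 0x55 0xAA marker that most BIOSes look for at the end of a boot sector.  The helper names and the in-memory `fake_disk` are made up for the example; a real check would open a raw image file instead.

```python
import io

SECTOR_SIZE = 512  # bytes per sector on a traditional hard disk


def read_boot_sector(disk):
    """Return the first sector of a disk image (CHS 0/0/1, i.e. LBA 0)."""
    disk.seek(0)
    sector = disk.read(SECTOR_SIZE)
    if len(sector) != SECTOR_SIZE:
        raise ValueError("image is shorter than one sector")
    return sector


def has_boot_signature(sector):
    """A bootable sector ends with the bytes 0x55 0xAA at offsets 510-511."""
    return sector[510:512] == b"\x55\xaa"


# Hypothetical in-memory "disk": one signed sector followed by an empty one.
fake_disk = io.BytesIO(bytes(510) + b"\x55\xaa" + bytes(SECTOR_SIZE))
print(has_boot_signature(read_boot_sector(fake_disk)))
```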

    This machine code has been written by Microsoft and is put on the boot disk as part of installing an operating system (XP/Vista/W7/etc...).  It is VERY basic and simple as, at this point, just about nothing is known about the computer and 512 bytes is not much memory to play with.  Looking at this code I spotted the following texts:- "A disk read error occurred", "NTLDR is missing" and "Press Ctrl+Alt+Del to restart"!!!  So the error text comes from the very first code loaded from the disk!
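
    Spotting those texts amounts to scanning the sector for runs of printable characters.  A small sketch of that idea follows; the `sample` sector is fabricated for the example (zero padding plus the three messages), not a dump of a real boot sector.

```python
import re


def printable_strings(sector, min_len=8):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, sector)]


# Fabricated 512-byte sector: padding, then three NUL-terminated messages.
sample = (
    bytes(300)
    + b"\r\nA disk read error occurred\x00"
    + b"\r\nNTLDR is missing\x00"
    + b"\r\nPress Ctrl+Alt+Del to restart\x00"
).ljust(512, b"\x00")

for text in printable_strings(sample):
    print(text)
```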

    This answered the first question: can the machine see the boot hard drive OK?  Yes it can, as it loaded data from the disk.  Otherwise it wouldn't have known how to display the error text.

    So, the next step was to find out what that code actually does and what could cause the "A disk read error occurred" text to be displayed.

    So, what does it do...

    The first thing this code does is ask the BIOS what the parameters of the disk are.  The BIOS replies, telling the machine code how many drives it thinks it has and the details of the disk being booted from (number of heads, number of cylinders and number of sectors per track).  The BIOS could reply with an error, in which case the machine code makes up the details (it decides the disk has the maximum number of heads, cylinders and sectors per track).

    Using the drive parameters, the machine code calculates the hard drive's total number of sectors.
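
    The arithmetic is just a multiplication.  A quick sketch, assuming the geometry arrives as the maximum cylinder and head indexes plus a sectors-per-track count (which is how Int 13h function 08h is usually documented); the function name and the 1023/254/63 example geometry are chosen for illustration only:

```python
SECTOR_SIZE = 512  # bytes per sector


def total_sectors(max_cylinder, max_head, sectors_per_track):
    """Sector count from the maximum indexes reported for the drive.

    Cylinders and heads are numbered from 0, so each count is index + 1;
    sectors are numbered from 1, so the largest sector number is the count.
    """
    return (max_cylinder + 1) * (max_head + 1) * sectors_per_track


# A common "large disk" geometry: 1024 cylinders, 255 heads, 63 sectors/track.
count = total_sectors(1023, 254, 63)
print(count, count * SECTOR_SIZE)  # 16450560 8422686720
```

    That byte total is where the "8GB boundary" comes from: it is the most that cylinder/head/sector addressing can reach with this geometry.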

    Using this count, the machine code decides if the active partition start is before or after the drive's 8GB boundary (How does it do this?  It only knows the total number of sectors at this point - how does it know where the Active Partition start is?  What are 'Hidden sectors'?).
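
    One likely answer, sketched below: in the FAT/NTFS boot sector layout, 'hidden sectors' is a 4-byte little-endian field at offset 1Ch of the boot sector itself, holding the number of sectors that precede the partition on the disk.  That value is stored on the disk rather than reported by the BIOS, so the code can compare it against the total it just calculated.  The comparison in `needs_extended_read` is an assumption about how the boot code uses the value, and both function names are invented for the example.

```python
import struct

HIDDEN_SECTORS_OFFSET = 0x1C  # 4-byte little-endian field in the BIOS Parameter Block


def hidden_sectors(boot_sector):
    """Number of sectors that precede the partition, as recorded on disk."""
    return struct.unpack_from("<I", boot_sector, HIDDEN_SECTORS_OFFSET)[0]


def needs_extended_read(start_lba, chs_total_sectors):
    """True if the sector lies beyond what CHS addressing can reach (assumed test)."""
    return start_lba >= chs_total_sectors


# Fabricated boot sector whose partition starts at sector 63.
sector = bytearray(512)
struct.pack_into("<I", sector, HIDDEN_SECTORS_OFFSET, 63)

start = hidden_sectors(sector)
print(start, needs_extended_read(start, 16450560))
```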

    Assuming the partition start is before the 8GB boundary, the code then reads 16 sectors (8KB), starting from the first sector after the hidden sectors (I assume!), into memory.  It asks the BIOS to do this one sector at a time, telling it which sector to read and where to put it in memory.  Once it has done this it carries on by running the newly loaded machine code, which I assume searches for and loads the NTLDR file from the disk which, in turn, starts to load the operating system etc.
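
    To ask the BIOS for a sector below the 8GB boundary, a linear sector number has to be turned into a cylinder/head/sector triple.  The standard conversion is sketched here; the `boot_read_plan` helper and the start sector of 63 with a 255-head, 63-sector geometry are illustrative assumptions, not values taken from the failing laptop.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear sector number to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1  # sectors are numbered from 1


def boot_read_plan(start_lba, count, heads, sectors_per_track):
    """CHS address of each sector in a run of consecutive sectors."""
    return [
        lba_to_chs(start_lba + i, heads, sectors_per_track) for i in range(count)
    ]


# 16 sectors starting right after 63 hidden sectors.
for chs in boot_read_plan(63, 16, heads=255, sectors_per_track=63):
    print(chs)
```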

    Every time it asks the BIOS to load one of the 16 sectors, it checks whether the BIOS reported an error.  If it did, the "A Disk read error occurred Press Ctrl+Alt+Del to restart" error is shown.  The problem is that the error could have occurred for a number of reasons.  The BIOS does tell the machine code what the error was, but this is ignored and the ubiquitous 'read error' is shown.  This explains why different people find different solutions: the "A Disk read error occurred" message could have been triggered by different causes.  There may be a faulty IDE cable, which corrupts the cylinder/head/sector address sent to the drive; or one of the 16 sectors being loaded may have gone bad; etc....
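
    To show what gets thrown away, here is a sketch that maps a few of the commonly documented Int 13h status codes to their meanings and reports the first failing read in a run of 16.  The table is a partial list from general BIOS references, not something read out of the boot code, and the list of statuses is simulated.

```python
# Partial table of commonly documented Int 13h status codes (returned in AH).
INT13_STATUS = {
    0x00: "success",
    0x01: "invalid function or parameter",
    0x02: "address mark not found",
    0x04: "sector not found",
    0x10: "uncorrectable CRC or ECC error on read",
    0x20: "controller failure",
    0x40: "seek failed",
    0x80: "timeout (drive not ready)",
}


def describe_status(code):
    """Human-readable meaning of a status code, if it is in the table."""
    return INT13_STATUS.get(code, "unknown status 0x%02X" % code)


def first_failure(statuses):
    """Return (sector index, description) for the first failed read, else None."""
    for index, code in enumerate(statuses):
        if code != 0x00:
            return index, describe_status(code)
    return None


# Simulated run of 16 reads in which the sixth sector fails with an ECC error.
statuses = [0x00] * 16
statuses[5] = 0x10
print(first_failure(statuses))
```

    The boot code collapses every non-zero status into the same message, whereas a table like this would at least separate a bad sector from a seek or controller problem.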

    So, now the questions - for those with bigger brains than myself - that this analysis raises:

    1.  The boot sector code loads the extended 16-sector boot code from the first sector AFTER the 'Hidden Sectors'.  What are these 'Hidden sectors'?  Why does the BIOS (Int 13h function 08h) not include them in the Drive Parameters it returns?

    2.  The big question... In my case I believe one of the 16 boot sectors on the drive is bad and the disk is not storing the data correctly - which is why simply re-writing the boot data doesn't work.  So how do I fix it?  Is it possible or, as the subject header states, is this an Achilles heel of the NTFS boot system, in that if one of the boot sectors on the disk goes bad it cannot be fixed?

     

     

    Thursday, October 13, 2011 10:22 PM

All replies

  • Now things get more mysterious still....

    On the drive in question I took a binary image - using ImageXML on ubcd4win - and then completely re-installed XP, doing a non-quick format, i.e. starting from scratch.  All good, XP installed great and worked nicely.

    I then took another binary image so that I could compare the first 16 sectors - aiming to see where the difference lay and perhaps hack at a binary level to avoid using the defective sector (the $Boot only uses the first 7 sectors and I could, in theory, relocate the code into good sectors and skip the bad; just a bit of fun...).

    But - the two images were effectively identical!!!  The same.  No 0 in one where there is a 1 in the other!!  The only differences were the 4-byte disk signature (in Sector 0) and the NTFS volume serial number (boot sector offset 48h).
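
    For anyone wanting to repeat the comparison, a byte-by-byte diff of the first 16 sectors can be sketched like this.  The two images here are fabricated byte strings standing in for the real dumps, and the function names are invented for the example.

```python
SECTOR_SIZE = 512  # bytes per sector


def differing_offsets(image_a, image_b, sector_count=16):
    """Byte offsets at which the first sector_count sectors differ."""
    length = sector_count * SECTOR_SIZE
    a, b = image_a[:length], image_b[:length]
    if len(a) != len(b):
        raise ValueError("images are different lengths")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def group_by_sector(offsets):
    """Map sector number -> list of differing offsets within that sector."""
    grouped = {}
    for offset in offsets:
        sector, within = divmod(offset, SECTOR_SIZE)
        grouped.setdefault(sector, []).append(within)
    return grouped


# Fabricated images that differ only in two bytes of a serial number at 48h.
before = bytes(16 * SECTOR_SIZE)
after = bytearray(before)
after[0x48] = 0x01
after[0x49] = 0x02

print(group_by_sector(differing_offsets(before, bytes(after))))
```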

    How confused am I now!!  If the bootstrapping code (first 16 sectors) is the same, why does BIOS Int 13h return an error when it tries to load one of the 16 sectors on one installation and not on the other?  Obviously, I am missing something here or have not understood something correctly, but I am stumped about what! :-)

    If anyone can hold my hand and explain what I'm missing here a great many little grey cells - which are currently running around and bumping into each other in a very odd way - will be eternally grateful... lol

    Thursday, October 13, 2011 11:53 PM
  • While digging further I've realised that a fundamental assumption I made was that the error text must be shown by the code that loads the extended 16 sectors.  Wrong!  The loaded 16 sectors include the original single boot sector, and the code in the next 6 sectors loaded can also call the error text display code - so the error may well NOT be caused by the code that loads the 16 sectors using the BIOS Int 13h function!  And, as the two images are the same, it is probably being caused by code executing in the extended sectors.  Ahhhhh!  More digging needed....

    Friday, October 14, 2011 3:58 AM