Windows 10 Pro BSOD - Need Help Troubleshooting

  • Question

  • I've had this system about 4.5 months and everything was running fine until about a month ago, when I started getting frequent WHEA Uncorrectable Errors and BSODs.  I had been running the system with a mild overclock (MSI GameBoost) since the beginning, and for the first 3 months it was entirely stable, so I don't believe the overclock is the root of the problem.  Believing the cause to be an OS issue, I restored a backup of the OS drive I made when I first received the system and then reinstalled my apps. Again everything was fine until yesterday, when I had multiple crashes within a 24-hour period.  There is no consistency as to what programs are running when the crash occurs.  I am really concerned that this may be a hardware issue. How do I figure that out?  Please help.

    Here is a DropBox link to the log files:

    https://www.dropbox.com/sh/8xs0wjq3v87ks8s/AADYfv5JXolBZqjdIYLi-s8la?dl=0

    Friday, October 11, 2019 5:28 PM

Answers

  • The logs displayed corrected and uncorrected hardware errors.

    Overclocking often creates an unstable computer environment.

    This includes unexpected shutdowns and restarts.

    The BSODs were WHEA 0x124 bug checks, and the logs reported overclocking.

    Perform the following steps:

    1) Open administrative command prompt and type or copy and paste:
    2) sfc /scannow
    3) dism /online /cleanup-image /scanhealth
    4) dism /online /cleanup-image /restorehealth
    5) sfc /scannow
    6) chkdsk /scan
    7) wmic recoveros set autoreboot = false
    8) wmic recoveros set DebugInfoType = 7

    9) When these have completed > right click on the top bar or title bar of the administrative command prompt box > left click on edit then select all > right click on the top bar again > left click on edit then copy > paste into the thread

    10) Make sure that there is no overclocking while troubleshooting.

    11) Sometimes there are problems in the BIOS that produce BSODs.

    12) To ensure that there are no improper BIOS settings, please reset the BIOS.

    How to Clear Your Computers CMOS to Reset BIOS Settings:
    https://www.howtogeek.com/131623/how-to-clear-your-computers-cmos-to-reset-bios-settings/
    3 Ways to Reset Your BIOS - wikiHow:
    http://www.wikihow.com/Reset-Your-BIOS


    13) In the left lower corner search type: system or system control > open system control panel > on the left pane click advanced system settings 

    a) > on the advanced tab under startup and recovery > click settings > post an image of the startup and recovery window into the thread

    b) > on the advanced tab under performance > click on settings > on the performance options window > click on the advanced tab > under virtual memory > click on change > post an image of the virtual memory window into the thread

    14) Choose one of the temperature monitoring software applications: Speccy, HWMonitor, SpeedFan:
    Speccy - System Information (Piriform)
    https://www.piriform.com/speccy
    https://www.ccleaner.com/speccy
    HWMONITOR (CPUID)
    http://www.cpuid.com/softwares/hwmonitor.html
    SpeedFan - Access temperature sensor in your computer
    http://www.almico.com/speedfan.php


    15) Run Prime95 - Stress Test Your CPU
    https://www.tenforums.com/tutorials/16474-prime95-stress-test-your-cpu.html
    a) Record the maximum temperature and post it into the thread
    b) Record the uninterrupted test duration and post it into the thread
    c) Aim to test for more than 3 hours, and abort the test if the computer freezes or temperatures climb too high (see link)
    d) Use the Microsoft Snipping Tool to post images into the thread.
    Take Screenshot in Windows 10:
    https://www.tenforums.com/tutorials/6102-take-screenshot-windows-10-a.html

    16) Run HD Tune (free version) (all drives)
    https://www.hdtune.com/
    Post images into the thread for results on these tabs:
    a) Health
    b) Benchmark
    c) Full error scan


    17) Run SeaTools for Windows (long generic test)
    Post an image of the test result into the thread
    http://www.seagate.com/support/downloads/seatools/seatools-win-master/
    http://knowledge.seagate.com/articles/en_US/FAQ/202435en

    18) In the lower left corner search, type:  dxdiag > when the DirectX Diagnostic Tool opens, click on the next page button so that each tab is opened > click on save all information > save to desktop > post a OneDrive or Dropbox share link into the thread

    19) Wait one week before running Prime95 so that you have time to evaluate computer stability / instability with no overclocking.  If there are no BSODs, that will conclude the troubleshooting.  If there are BSODs with WHEA 0x124 after settings are returned to stock, then test the CPU with Prime95 and IPDT.

    https://downloadcenter.intel.com/download/19792/Intel-Processor-Diagnostic-Tool
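
Editor's sketch: steps 2-8 above can also be run as one batch so that every command's output lands in a single file that is easy to paste into the thread. The following Python is a minimal sketch, not part of the original answer. The function name `run_all`, the log file name, and the `runner` parameter are assumptions made for illustration; the commands themselves are copied unchanged from the list.

```python
import subprocess

# Commands copied from steps 2-8 above, in the same order.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["dism", "/online", "/cleanup-image", "/scanhealth"],
    ["dism", "/online", "/cleanup-image", "/restorehealth"],
    ["sfc", "/scannow"],
    ["chkdsk", "/scan"],
    ["wmic", "recoveros", "set", "autoreboot", "=", "false"],
    ["wmic", "recoveros", "set", "DebugInfoType", "=", "7"],
]


def run_all(log_path="repair_log.txt", runner=subprocess.run):
    """Run each command in order and append its output to log_path.

    Returns a list of (command string, exit code) pairs.
    `runner` is a hypothetical hook so the function can be exercised
    without actually touching the system.
    """
    results = []
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        for cmd in REPAIR_COMMANDS:
            # errors="replace" keeps going if a tool prints bytes that do
            # not decode cleanly in the console encoding.
            proc = runner(cmd, capture_output=True, text=True, errors="replace")
            line = " ".join(cmd)
            log.write(f"> {line}\n{proc.stdout}{proc.stderr}\n")
            results.append((line, proc.returncode))
    return results
```

Calling `run_all()` from a Python session started in an administrative command prompt would run the commands one after another; a non-zero exit code in the returned list marks a command worth looking at more closely.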

    Please remember to vote and to mark the replies as answers if they help.

    • Marked as answer by brumma Saturday, October 19, 2019 7:24 PM
    Saturday, October 12, 2019 12:26 AM
  • Confirm that this is the correct spec page for your CPU. It lists a maximum temperature of 100°C.

    https://www.intel.com/content/www/us/en/products/processors/core/i9-processors/i9-9900.html

    Often software testing of hardware does not find faulty hardware and other types of testing are needed such as swaps.

    Download and install:  IPDT

    https://downloadcenter.intel.com/download/19792/Intel-Processor-Diagnostic-Tool

    Post images and results for all steps / tests using share links (OneDrive, Dropbox, or Google Drive)



    Monday, October 14, 2019 5:11 AM
  • "Why did the system allow me to overclock for 3 months before problems developed.  Should I still be concerned that my processor is damaged somehow?"

    When you overclock a processor, the amount of heat it generates also increases considerably. It is possible that running overclocked for three months slowly degraded the fan's or heatsink's ability to dissipate heat from the system. It may be a good idea to add some thermal paste to the bottom of the heatsink and check/clean the fans.

    The tests you've done seem to show a healthy processor now. Did you adjust voltage settings in the BIOS when you overclocked? That may be the root of the issue here.

    "The temperatures seemed rather high during the Prime95 test.  They usually hovered around 60-65c for the duration of the test, but got as high as 90c--Is this an issue or normal for this processor?"

    60°C-65°C is normal for that CPU. The 90°C hits were probably just part of the stress test. 

    -Aaron

    • Marked as answer by brumma Saturday, October 19, 2019 7:24 PM
    Tuesday, October 15, 2019 4:25 PM
  • First scannow: Windows Resource Protection found corrupt files but was unable to fix some of them.

    Second scannow: Windows Resource Protection found corrupt files and successfully repaired them.

    On the ST2000 drive:

    Run SeaTools for Windows (long generic test)
    Post an image of the test result into the thread
    http://www.seagate.com/support/downloads/seatools/seatools-win-master/
    http://knowledge.seagate.com/articles/en_US/FAQ/202435en

    On the Samsung drives, see if there are tests supported for the drives:

    https://www.samsung.com/semiconductor/minisite/ssd/product/consumer/magician/

    Run HD Sentinel (free or trial edition):
    Hard Disk Sentinel - HDD health and temperature monitoring
    https://www.hdsentinel.com/
    Post images of each of these tabs into the thread:
     Overview
     Temperature
     SMART

    In one week, run the administrative command prompt command to collect new log files and post a share link that includes dxdiag.

    New computers, when not overclocked, should not have corrected and/or uncorrected hardware errors (WHEA).

    If these errors occur with components under warranty, then consider a Return Merchandise Authorization (RMA).

    In the lower left corner search, type: reliability > open the reliability control panel

    https://www.howtogeek.com/166911/reliability-monitor-is-the-best-windows-troubleshooting-tool-you-arent-using/


    Wednesday, October 16, 2019 1:45 AM
  • In one week, run the administrative command prompt command and post a share link with the log files into the newest post.

    Manually collect dxdiag and include it in the share link.

    The reliability control panel displayed red circles and yellow triangles.

    Click on these columns to view what happened.

    Please make sure that there is no overclocking for the week so that the logs can report computer stability / instability while the computer is set to stock.

    Cooling and fans should be checked and cleaned as needed.

    The logs had displayed corrected and uncorrected hardware errors.

    New computers should not have WHEA errors.

    If the logs display corrected or uncorrected hardware errors when there is no overclock, then contact the computer or motherboard manufacturer or computer store for a Return Merchandise Authorization (RMA).

    During this one-week period you will check / monitor the reliability control panel, and at the end of the 7-day period the logs will be scanned.
    Friday, October 18, 2019 4:50 AM

All replies

  • Looks like you checked disk health and even downloaded and ran MalwareBytes? I'm assuming you saw nothing out of the ordinary in both cases.

    Event[5428]:

    Log Name: System
    Source: Microsoft-Windows-WHEA-Logger
    Date: 2019-10-04T15:45:41.489
    Event ID: 19
    Task: N/A
    Level: Warning
    Opcode: Info
    Keyword: N/A
    User: S-1-5-19
    User Name: NT AUTHORITY\LOCAL SERVICE
    Computer: Brumma-NZXT
    Description: A corrected hardware error has occurred.
    Reported by component: Processor Core
    Error Source: Corrected Machine Check
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 2

    There are 8 of these errors in your event log. The first was on October 4th, 2019, and a few more happened a few days apart until October 10th, 2019, when it happened four times in a row. Under some of these I also noticed some logs talking about errors in timekeeping. Has your computer also been having issues with keeping the correct time?
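
Editor's sketch: a per-day tally like the one described above can be produced from an exported event log text file. The Python below is a minimal sketch under stated assumptions: it expects records in the same field layout as the event quoted in this post, and the names `RECORD` and `whea_events_per_day` are made up for illustration. The only sample data used is the single record quoted above.

```python
import re
from collections import Counter

# The one record quoted in this post (trimmed to the fields the parser needs).
SAMPLE = (
    "Log Name: System Source: Microsoft-Windows-WHEA-Logger "
    "Date: 2019-10-04T15:45:41.489 Event ID: 19 Task: N/A Level: Warning"
)

# Matches the Source / Date / Event ID run of a WHEA-Logger record.
# \s+ allows the fields to be separated by spaces or by line breaks.
RECORD = re.compile(
    r"Source: Microsoft-Windows-WHEA-Logger\s+"
    r"Date: (?P<date>\d{4}-\d{2}-\d{2})T\S+\s+"
    r"Event ID: (?P<event_id>\d+)"
)


def whea_events_per_day(text, event_id=19):
    """Count WHEA-Logger records with the given event ID, grouped by date."""
    counts = Counter()
    for match in RECORD.finditer(text):
        if int(match.group("event_id")) == event_id:
            counts[match.group("date")] += 1
    return dict(counts)
```

Running `whea_events_per_day` over the full exported log would show whether the corrected errors cluster on particular days, which is useful later when comparing the overclocked period with the week at stock settings.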

    I don't have time to look at the dump files right now, but from the event logs it seems to be a hardware problem to me. I'd recommend updating drivers on all hardware components, updating the motherboard BIOS, and installing all cumulative Windows updates.

    -Aaron


    • Edited by Aaron-PS Friday, October 11, 2019 6:24 PM Added more info
    Friday, October 11, 2019 6:17 PM
  • Thanks, Aaron. I tried all of that before the fresh install a week ago. All to no avail. BIOS and drivers are current, but I held off on Windows updates (except for Security Intelligence Updates) for the past week for fear that they were the cause of the problem. Is there a way to determine if it is indeed a component problem rather than a software/driver issue?

    Thanks,

    Chris

    Friday, October 11, 2019 7:22 PM
  • You can try downloading and using the Processor Diagnostics Tool:  https://downloadcenter.intel.com/product/186605/Intel-Core-i9-9900K-Processor-16M-Cache-up-to-5-00-GHz-

    I'd also use the built in Windows Memory Diagnostic Tool to verify that it isn't a RAM issue. 

    Nowadays, especially in Windows 10, a BSoD is indicative of hardware failure and is usually only a software problem when it comes to drivers. If all drivers are up to date and are all verified and signed it probably is an actual hardware component failing.

    -Aaron

    Friday, October 11, 2019 8:08 PM
  • Hi brumma.

    Both dump files have recorded a 0x00000124 (WHEA_UNCORRECTABLE_ERROR) bug check, which indicates that a fatal hardware error has occurred.
    In both cases, the value of the first parameter indicates that a machine check exception occurred, which is a problem affecting the CPU. I recommend turning off any overclock and running diagnostic tests on the CPU, since the memory dump files have confirmed that the issue occurred on the CPU.
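
Editor's sketch: the first parameter of a 0x124 bug check identifies the kind of error source. The lookup below is a minimal Python sketch, not taken from the dump files in this thread. The descriptions are paraphrased from memory of Microsoft's bug check reference and only the first few values are listed, so treat the table as an assumption to verify against the official documentation; the names `WHEA_0X124_PARAM1` and `describe_param1` are hypothetical.

```python
# Parameter 1 of bug check 0x124 (WHEA_UNCORRECTABLE_ERROR).
# Partial table, paraphrased; verify against the official reference.
WHEA_0X124_PARAM1 = {
    0x0: "Machine check exception",
    0x1: "Corrected machine check exception",
    0x2: "Corrected platform error",
    0x3: "Non-maskable interrupt (NMI) error",
    0x4: "Uncorrectable PCI Express error",
}


def describe_param1(value):
    """Return a short description for parameter 1, or a fallback string."""
    return WHEA_0X124_PARAM1.get(value, f"Unlisted error source 0x{value:X}")
```

A value of 0x0, as described in this post, maps to a machine check exception, which is why the advice focuses on the CPU and on removing the overclock.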

    Bye.


    Luigi Bruno

    MCP, MOS, MTA, MCTS, MCSA, MCSE

    Friday, October 11, 2019 10:19 PM
  • I am currently working my way through the process and am currently running Prime95 with no OC. My CPU temps are getting quite high—above 80c—and I am getting a bit nervous. Should I stop the test? How high is safe for an i9 9900k during the test? Thanks, Chris
    Sunday, October 13, 2019 10:06 PM
  • After troubleshooting over the weekend, here are my test results:

    Steps 1-9)

    Microsoft Windows [Version 10.0.17763.678]
    (c) 2018 Microsoft Corporation. All rights reserved.

    C:\Windows\system32>sfc /scannow

    Beginning system scan.  This process will take some time.

    Beginning verification phase of system scan.
    Verification 100% complete.

    Windows Resource Protection found corrupt files but was unable to fix some of them.
    For online repairs, details are included in the CBS log file located at
    windir\Logs\CBS\CBS.log. For example C:\Windows\Logs\CBS\CBS.log. For offline
    repairs, details are included in the log file provided by the /OFFLOGFILE flag.

    C:\Windows\system32>dism /online /cleanup-image /scanhealth

    Deployment Image Servicing and Management tool
    Version: 10.0.17763.1

    Image Version: 10.0.17763.678

    [==========================100.0%==========================] The component store is repairable.
    The operation completed successfully.

    C:\Windows\system32>dism /online /cleanup-image /restorehealth

    Deployment Image Servicing and Management tool
    Version: 10.0.17763.1

    Image Version: 10.0.17763.678

    [==========================100.0%==========================] The restore operation completed successfully.
    The operation completed successfully.

    C:\Windows\system32>sfc /scannow

    Beginning system scan.  This process will take some time.

    Beginning verification phase of system scan.
    Verification 100% complete.

    Windows Resource Protection found corrupt files and successfully repaired them.
    For online repairs, details are included in the CBS log file located at
    windir\Logs\CBS\CBS.log. For example C:\Windows\Logs\CBS\CBS.log. For offline
    repairs, details are included in the log file provided by the /OFFLOGFILE flag.

    C:\Windows\system32>chkdisk /scan
    'chkdisk' is not recognized as an internal or external command,
    operable program or batch file.

    C:\Windows\system32>chkdsk /scan
    The type of the file system is NTFS.

    Stage 1: Examining basic file system structure ...
      613632 file records processed.
    File verification completed.
      9450 large file records processed.
      0 bad file records processed.

    Stage 2: Examining file name linkage ...
      1409 reparse records processed.
      737382 index entries processed.
    Index verification completed.
      0 unindexed files scanned.
      0 unindexed files recovered to lost and found.
      1409 reparse records processed.

    Stage 3: Examining security descriptors ...
    Security descriptor verification completed.
      61876 data files processed.
    CHKDSK is verifying Usn Journal...
      38693088 USN bytes processed.
    Usn Journal verification completed.

    Windows has scanned the file system and found no problems.
    No further action is required.

     976132095 KB total disk space.
     183203380 KB in 532458 files.
        375568 KB in 61877 indexes.
             0 KB in bad sectors.
        751291 KB in use by the system.
         65536 KB occupied by the log file.
     791801856 KB available on disk.

          4096 bytes in each allocation unit.
     244033023 total allocation units on disk.
     197950464 allocation units available on disk.

    C:\Windows\system32>wmic recoveros set autoreboot = false
    Updating property(s) of '\\BRUMMA-NZXT\ROOT\CIMV2:Win32_OSRecoveryConfiguration.Name="Microsoft Windows 10 Pro|C:\\Windows|\\Device\\Harddisk3\\Partition4"'
    Property(s) update successful.

    C:\Windows\system32>wmic recoveros set DebugInfoType = 7
    Updating property(s) of '\\BRUMMA-NZXT\ROOT\CIMV2:Win32_OSRecoveryConfiguration.Name="Microsoft Windows 10 Pro|C:\\Windows|\\Device\\Harddisk3\\Partition4"'
    Property(s) update successful.

    C:\Windows\system32>

    Link: https://www.dropbox.com/s/zomoiq9ntp3dwwk/debugging.txt?dl=0
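
Editor's sketch: the chkdsk summary above can be cross-checked with a little arithmetic, which is a quick way to confirm the numbers were pasted intact. The Python below is a minimal sketch; every figure is copied from the output above, and the variable names are made up for illustration. It assumes the 65536 KB log file is already counted inside the "in use by the system" figure, since the categories add up to the reported total without it.

```python
# Figures copied from the chkdsk summary above.
BYTES_PER_UNIT = 4096
TOTAL_UNITS = 244_033_023
FREE_UNITS = 197_950_464
REPORTED_TOTAL_KB = 976_132_095
REPORTED_FREE_KB = 791_801_856
USED_KB = {
    "files": 183_203_380,
    "indexes": 375_568,
    "bad_sectors": 0,
    "system": 751_291,  # assumed to include the 65536 KB log file
}


def units_to_kb(units, bytes_per_unit=BYTES_PER_UNIT):
    """Convert a count of allocation units to whole kilobytes."""
    return units * bytes_per_unit // 1024


total_kb = units_to_kb(TOTAL_UNITS)   # size implied by the allocation units
free_kb = units_to_kb(FREE_UNITS)     # free space implied by the free units
accounted_kb = sum(USED_KB.values()) + REPORTED_FREE_KB
free_percent = round(100 * free_kb / total_kb, 1)
```

Here `free_kb` equals the reported available space, `accounted_kb` equals the reported total, and `total_kb` falls 3 KB short of the reported total, presumably because the last few kilobytes do not fill a whole allocation unit. Roughly 81% of the volume is free, so disk space is not a factor in the crashes.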


    Step 10-12) 

    Overclocking is disabled and BIOS was reset to defaults.


    Step 13)





    Step 15 - Prime95 Test)

    a) The maximum temp during the Prime95 test was 90c

    b) Test duration was 4 hrs with no errors

    c) Test remained uninterrupted

    d) Temperatures: https://www.dropbox.com/s/u30vk7kmmed9mu4/HWmonitor.PNG?dl=0
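
Editor's sketch: the Prime95 figures above can be compared against the criteria given earlier in the thread (test for more than 3 hours, and the 100°C maximum temperature quoted for the CPU). The Python below is a minimal sketch; `summarize_run` and its notion of "passed" are this sketch's own assumptions, not a verdict produced by Prime95.

```python
TJUNCTION_MAX_C = 100  # the "temp 100" figure quoted earlier in the thread
MIN_HOURS = 3          # step 15c: aim for more than 3 hours


def summarize_run(max_temp_c, hours, errors):
    """Summarize a stress test run against the thread's criteria."""
    long_enough = hours > MIN_HOURS
    return {
        "headroom_c": TJUNCTION_MAX_C - max_temp_c,
        "long_enough": long_enough,
        # Sketch-level definition: no errors, long enough, below the limit.
        "passed": errors == 0 and long_enough and max_temp_c < TJUNCTION_MAX_C,
    }


# Values reported in this post: 90c peak, 4 hours, no errors.
result = summarize_run(max_temp_c=90, hours=4, errors=0)
```

For the reported run this gives 10°C of headroom at the peak and a pass under the sketch's definition, which lines up with the run finishing without errors.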


    Step 16 - HDTune)

    - All drives were healthy

    https://www.dropbox.com/s/m7t1ouqtn95o9ns/HDTune_Benchmark_________ST2000DM008-2FR1.png?dl=0

    https://www.dropbox.com/s/jn7eqs9u18oj668/HDTune_Error_Scan_________ST2000DM008-2FR1.png?dl=0

    https://www.dropbox.com/s/ydje1cebepeki6c/HDTune_Health_________ST2000DM008-2FR1.png?dl=0

    https://www.dropbox.com/s/d8mz0f71necniu6/HDTune_Benchmark_HGST____HTS721010A9E630.png?dl=0

    https://www.dropbox.com/s/0m4b4csuayu6w6k/HDTune_Error_Scan_HGST____HTS721010A9E630.png?dl=0

    https://www.dropbox.com/s/0gp588th4bz0sgp/HDTune_Health_HGST____HTS721010A9E630.png?dl=0

    https://www.dropbox.com/s/brwtj6dj7cscx4q/HDTune_Benchmark_NVMe____Samsung_SSD_950.png?dl=0

    https://www.dropbox.com/s/1wm5qytle7hvss0/HDTune_Error_Scan_NVMe____Samsung_SSD_950.png?dl=0

    https://www.dropbox.com/s/agacm482435u14k/HDTune_Health_NVMe____Samsung_SSD_950.png?dl=0

    https://www.dropbox.com/s/doez7j9fko3tghk/HDTune_Benchmark_NVMe____Samsung_SSD_970.png?dl=0

    https://www.dropbox.com/s/8pr2g7ze0kvnsz9/HDTune_Error_Scan_NVMe____Samsung_SSD_970.png?dl=0

    https://www.dropbox.com/s/ji6re3acd1pryr8/HDTune_Health_NVMe____Samsung_SSD_970.png?dl=0


    Step 18)

    https://www.dropbox.com/s/smhqnwtuu1m3vog/DxDiag.txt?dl=0


    Intel Processor Diagnostics Test)

    https://www.dropbox.com/s/zniar30bsqgh62j/IPDT_TESTRESULTS.TXT?dl=0

    Everything appears to be stable at this point with no crashes since disabling Game Boost in the BIOS.  However, a few questions remain...

    1) Why did the system allow me to overclock for 3 months before problems developed?  Should I still be concerned that my processor is damaged somehow?

    2) The temperatures seemed rather high during the Prime95 test.  They usually hovered around 60-65c for the duration of the test, but got as high as 90c--Is this an issue or normal for this processor?

    I appreciate everyone's input so far.

    
    • Edited by brumma Tuesday, October 15, 2019 9:20 AM
    Tuesday, October 15, 2019 9:19 AM
  • Hi,

     

    Any update? Please let us know if you need more help.


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Thursday, October 17, 2019 7:17 AM
  • I've had to do some work over the last few days, so haven't had the downtime to run any more tests.  I will be doing so and posting the results in the next day or so.  Thanks!

    Thursday, October 17, 2019 8:30 PM
  • Here are the results from SEA Tools on the ST2000 HDD:

    https://www.dropbox.com/s/wyjivmdpkard9tv/SEA_tools.PNG?dl=0

    I downloaded and ran Samsung Magician with the following results:

    SSD Benchmarks: https://www.dropbox.com/s/svanhiwu67zk54l/SSDs_benchmarks.PNG?dl=0

    970EvoPlus Diagnostic: https://www.dropbox.com/s/bgi28qxqkkf8cv1/970EvoPlus_diag.PNG?dl=0
    HDD Benchmarks: https://www.dropbox.com/s/swyrfict205rrc4/HDDs_benchmarks.PNG?dl=0

    I ran HD Sentinel with the following results:

    970EvoPlus:

    Overview
    Temperature
    S.M.A.R.T.

    950Pro:

    Overview
    Temperature
    S.M.A.R.T.

    ST2000:

    Overview
    Temperature
    S.M.A.R.T.

    HGST:

    Overview
    Temperature
    S.M.A.R.T.

    Finally, I opened the Reliability Control panel. Here are the statistics as of Oct. 17, 2019 (history is limited since the last reinstallation):

    https://www.dropbox.com/s/45rvrllx628i4of/reliability_10-17-19.PNG?dl=0

    As far as next steps are concerned, I have a few more questions. Should I run the tests again with overclocking enabled or disabled? Wouldn't it be a good idea to run the tests with OC in order to pinpoint the problem? If the system remains stable without overclocking, should I assume that there is no problem with the processor, or is that still a possibility?

    Thanks to all for your help so far!

    P.S. As I was typing this reply, my C drive temperature reached into the red at 52C. Is that normal behavior for a 970 Evo Plus, or could this be contributing to my problems?


    • Edited by brumma Friday, October 18, 2019 2:24 AM
    Friday, October 18, 2019 2:07 AM
  • Just a few days later and I've had numerous software crashes as well as a system crash on startup (just froze, no BSoD).  This time it indicated an INTERNAL_POWER_ERROR.  All of the files are in a folder here:

    10/19/2019 error logs

    If you just want the .dmp files and critical events log, I've included links to them here:

    Memory Dump file

    Critical Events log

    In addition, here is the folder containing the application crash logs:

    App crash logs

    I have tried to eliminate the source of any software issues, such as the LEDkeeper.exe listed in several of the errors.  I am continuing with the troubleshooting methodology outlined above and will post any additional results.  I am seriously considering getting an RMA for the system, but would like to have a better idea of which component(s) is causing the problem.

    Thanks again.

    Saturday, October 19, 2019 8:18 PM
  • There were no misbehaving drivers seen in the latest memory dump.

    Please run the log collector and include dxdiag results sometime this week.

    If corrected hardware errors continue without overclocking, plan an RMA.
    Sunday, October 20, 2019 2:09 AM